Pushing ContT to its limits, or how to write advanced goto, generators, and fibers in haskell
In this article, I’ll walk through ContT monad and show how you can bring return
, continue
and other imperative control-flow operators to haskell.
0. Introduction of continuations
First, let’s take a look on how do we chain functions. The easiest way is to simply compose them: (h . g . f) x
. Another common way to do this (not in haskell, though) is to pass a callback to the function. Callback version of composition looks something like this:
callbackify f = \argument callback -> callback (f argument)
f' = callbackify f
g' = callbackify g
h' = callbackify h
chain x callback = ($ x) f' g' h' callback
This style of chaining is called Continuation Passing Style (CPS). Continuation is basically just a fancy name for callback. Also, note that unlike composition, where function knows nothing about how the result is used, in CPS function receives callback explicitly and can call it multiple times or not at all.
1. Continuations in haskell
So now, let’s define a type for continuable values:
data ContT r m a = ContT { runCont :: (a -> m r) -> m r }
a -> m r
function will usually be called cc
– this stands for “current continuation”.
To get used to this definition, let’s consider few examples. The simplest thing we can define is a continuable number:
cont5 :: ContT r m Integer
cont5 = ContT ($ 5)
cont5
is a function that receives a continuations and calls it with a number 5. We can make sure that the number stored is indeed 5 by running this function with print
continuation.
ghci> runCont cont5 print
5
Yep, definitely 5. Now we can try to do something more complex, for example, modify a continuable value:
addOne :: ContT r m Integer -> ContT r m Integer
addOne contX = ContT $
\cc -> runCont contX $
\x -> cc (x + 1)
To do this we build a new continuation based on cc
: \x -> cc (x + 1)
. It intercepts the value returned by contX
, modifies it and calls the true continuation with it. Let’s make sure that addOne cont5
stores number 6:
ghci> runCont (addOne cont5) print
6
To make continuable values easier to work with, I’ll define instances of Functor
, Applicative
and Monad
for them:
instance Functor (ContT r m) where
fmap f contX = ContT $
\cc -> runCont contX $
\x -> cc (f x)
instance Applicative (ContT r m) where
pure a = ContT ($ a)
contF <*> contX = ContT $
\cc -> runCont contF $
\f -> runCont contX $
\x -> cc (f x)
instance Monad (ContT r m) where
return = pure
contX >>= f = ContT $
\cc -> runCont contX $
\x -> runCont (f x) $
\res -> cc res
With those instances functions from above can be redefined as:
cont5 :: ContT r m Integer
cont5 = return 5
addOne :: ContT r m Integer -> ContT r m Integer
addOne contX = do
x <- contX
return (x + 1)
Until now all examples with the continuations were pretty straightforward. Continuations were called exactly once, no skips, no multiple calls, no shenanigans. Next chapter will fix this
2. CallCC
Take another look at the ContT
type:
data ContT r m a = ContT { runCont :: (a -> m r) -> m r }
Did you notice? If we have a value of type m r
, we can just return it without calling cc
at all. Then all actions inside the continuation will be skipped, since it was never called.
Let’s temporarily assign r = ()
, m = IO
and write skip3
– a continuable value that skips everything and just prints 3:
skip3 :: ContT () IO a
skip3 = ContT $ \cc -> print 3 -- ContT (const $ print 3)
By the way, type a
can be anything, since we never use it. Now let’s check that skip3
actually works:
test1 = do
return 5
test2 = do
skip3
return 5
ghci> runCont test1 print
5
ghci> runCont test2 print
3
skip3
is definitely interesting, but because we fixed r
and m
it is not very useful. Why don’t we write something utilizing the same idea, but a bit more flexible.
To do this we have to somehow get a value of type m r
. Let’s again look inside the ContT
:
cont :: ContT r m a
cont = ContT $ \cc -> _ -- ?
cc
has type a -> m r
. That allows us to obtain m r
from a
, so we can define a skip
function inside cont
:
cont :: ContT r m a
cont = ContT $
\cc ->
let skip a = ContT (const $ cc a)
in _ -- ?
skip
has type a -> m r
and skip x
works similarly to skip3
. Inside cont
we can execute a minor function from which we can at any point switch back to the outer ContT
using our skip
function.
So we come to the definition of callCC
(call with current continuation):
type Switch r m a = forall b. a -> ContT r m b
callCC :: (Switch r m a -> ContT r m a) -> ContT r m a
callCC f = ContT $
\cc ->
let switchToCC a = ContT (const $ cc a)
in runCont (f switchToCC) cc
This function will be crucial to almost all following constructions.
Let’s walk through few callCC
examples:
test = callCC $ \exit -> do
lift $ putStrLn "Reachable"
exit ()
lift $ putStrLn "Unreachable"
ghci> runCont test (const $ return ())
Reachable
As you can see, callCC
works as expected, and returns to the outer execution. Also, as promised, here’s break
and continue
operators expressed using callCC
:
test :: ContT r IO ()
test = do
forM_ [1 .. 10] $ \i -> do
callCC $ \continue -> do
when (i == 5) $ do
continue ()
lift $ print i
ghci> runCont test (const $ return ())
1
2
3
4
6
7
8
9
10
test :: ContT r IO ()
test = do
callCC $ \break -> do
forM_ [1 .. 10] $ \i -> do
when (i == 5) $ do
break ()
lift $ print i
ghci> runCont test (const $ return ())
1
2
3
4
Notice that switchToCC
doesn’t just interrupt minor function. Technically it ends current computation and switches to the point right after the callCC
call:
test = do
val {- here -} <- callCC $ \cc -> do
-- ...
-- ...
So if switchToCC
somehow manages to escape the inner function, calling will “teleport” us back to this point. Now let’s see how we can get it out.
3. Label function (aka goto, aka useState)
We want to write a label
function. It has to receive an initial value and return pair (restart, value)
such that:
-
restart v
– runs the computation again, but withvalue
set tov
-
value
– value in current computation. At the first iteration it is set to the initial value, then to a restart value
Actually definition of this function is quite simple:
label :: a -> ContT r m (a -> ContT r m b, a)
label init = callCC $
\switch ->
let restart val = switch (restart, val)
in return (restart, init)
But it allows us to write even more imperative code in haskell. For example, loops:
test = do
(restart, counter) <- label 0
lift $ print counter
when (counter < 10) $ do
restart $ counter + 1
ghci> runCont test (const $ return ())
0
1
2
3
4
5
6
7
8
9
10
We can also define setjmp
/longjmp
. It is even simpler than label
, because we can fully ignore the value param:
setjmp = do
(restart, _) <- label ()
let longjmp = restart ()
return longjmp
test = do
longjmp <- setjmp
lift $ print 10
longjmp
ghci> runCont test (const $ return ())
10
10
10
10
10
10
...
4. Generators, Fibers, and Scheduler
Now we are ready to write even more advanced control flow operators.
Let’s reason about how we can control a generator inside haskell. In ordinary languages keyword yield
is used to do this. We will also use yield
, but here’s the nuances:
- We cannot add a keyword to haskell, so yield must be a function
- It is necessary that
yield x
returns nextyield
function, because if we use the same yield twice, we’ll exit at the same point, which is definitely not what we wanted. - We also have to define
exit
(which also must be updated on each yield) to return back to the calling point
I’ll pack them into a separate structure and update all at once:
data Controls r m x = Controls
{ yield :: x -> ContT r m (Controls r m x)
, exit :: forall b. ContT r m b
}
Then a generator function could look something like that:
test controls = do
controls <- yield controls 1
controls <- yield controls 2
controls <- yield controls 3
exit controls
It may look a bit strange that we call exit
at the end of the generator, but this will be explained a bit later. For now I’ll just forbid non-exit return from generator with types:
type Generator r m x = Controls r m x -> ContT r m Void
Next thing to write is a runToYield
function. As the name suggests, it runs a generator to a next yield. It will return either a pair of value and the next part of the generator or Nothing
if generator has ended:
runToYield :: Generator r m x -> ContT r m (Maybe (x, Generator r m x))
runToYield generator = callCC $ \exitContext -> do
let exit = exitContext Nothing
yield value = callCC $ \continueGenerator ->
exitContext $ Just (value, continueGenerator)
controls = Controls{yield, exit}
generator controls
-- If execution is here, something similar to stack overflow in imperative languages
-- goes on. Even through within haskell further execution is well-defined, it is
-- extremely confusing and unpredictable for a user
-- Also, since generator return type is Void, we must never actually be here
error "Generator exit invariant violated"
Let’s explain this code line by line:
-
callCC $ \exitContext -> do
– We create a new context, and get a function to exit back to the parent one -
exit = exitContext Nothing
– A function to hop back in a parent context from a generator -
yield value = callCC $ \continueGenerator ->
– This function will be called inside a generator, socallCC
captures generator’s context -
exitContext $ Just (value, continueGenerator)
– We return back to the calling context.continueGenerator
stores a generator context, so calling it will continue generator execution,value
is just a value yielded by a generator. -
controls = Controls{yield, exit}
– Just a convenient name -
generator controls
– Actually running the generator -
error "Generator exit invariant violated"
– Types guarantee that this line is unreachable, but just in case
Fibers can be viewed as generators, whose yield function just returns control back to the scheduler, without passing any values. I’ll define few synonyms for them:
type Fiber r m = Generator r m ()
suspend :: Controls r m () -> ContT r m (Controls r m ())
suspend controls = yield controls ()
Next we have to write an actual scheduler to run multiple fibers. I’ll write the simplest possible: it will run every fiber, remove finished, repeat until no fibers left.
scheduler :: [Fiber r m] -> ContT r m ()
scheduler threads = do
let round threads = do
nextThreads <- forM threads $ \thread -> do
res <- runToYield thread
return $ snd <$> res
return $ catMaybes nextThreads -- (filtering fibers)
(loop, threads) <- label threads
threadsLeft <- round threads
unless (length threadsLeft == 0) $ do
loop threadsLeft
All preparations are done, now let’s see if this actually works:
debug str = liftIO $ putStrLn str
fiberA :: Fiber r IO
fiberA controls = do
debug "Started fiber A"
controls <- suspend controls
debug "Running fiber A"
controls <- suspend controls
debug "Exiting fiber A"
exit controls
fiberB :: Fiber r IO
fiberB controls = do
debug "Started fiber B"
controls <- suspend controls
debug "Running fiber B"
controls <- suspend controls
debug "Running fiber B again"
controls <- suspend controls
debug "Exiting fiber B"
exit controls
ghci> runCont (scheduler [fiberA, fiberB]) (const $ return ())
Started fiber A
Started fiber B
Running fiber A
Running fiber B
Exiting fiber A
Running fiber B again
Exiting fiber B
It does! Another mind-blowing use for haskell (won’t recommend to use this in production, through)