Pushing ContT to its limits, or how to write advanced goto, generators, and fibers in haskell

In this article, I’ll walk through ContT monad and show how you can bring return, continue and other imperative control-flow operators to haskell.

0. Introduction of continuations

First, let’s take a look on how do we chain functions. The easiest way is to simply compose them: (h . g . f) x. Another common way to do this (not in haskell, though) is to pass a callback to the function. Callback version of composition looks something like this:

callbackify f = \argument callback -> callback (f argument) 

f' = callbackify f
g' = callbackify g
h' = callbackify h

chain x callback = ($ x) f' g' h' callback

This style of chaining is called Continuation Passing Style (CPS). Continuation is basically just a fancy name for callback. Also, note that unlike composition, where function knows nothing about how the result is used, in CPS function receives callback explicitly and can call it multiple times or not at all.

1. Continuations in haskell

So now, let’s define a type for continuable values:

data ContT r m a = ContT { runCont :: (a -> m r) -> m r }

a -> m r function will usually be called cc – this stands for “current continuation”.

To get used to this definition, let’s consider few examples. The simplest thing we can define is a continuable number:

cont5 :: ContT r m Integer
cont5 = ContT ($ 5)

cont5 is a function that receives a continuations and calls it with a number 5. We can make sure that the number stored is indeed 5 by running this function with print continuation.

ghci> runCont cont5 print
5

Yep, definitely 5. Now we can try to do something more complex, for example, modify a continuable value:

addOne :: ContT r m Integer -> ContT r m Integer
addOne contX = ContT $
  \cc -> runCont contX $
    \x -> cc (x + 1)

To do this we build a new continuation based on cc: \x -> cc (x + 1). It intercepts the value returned by contX, modifies it and calls the true continuation with it. Let’s make sure that addOne cont5 stores number 6:

ghci> runCont (addOne cont5) print
6

To make continuable values easier to work with, I’ll define instances of Functor, Applicative and Monad for them:

instance Functor (ContT r m) where
  fmap f contX = ContT $
    \cc -> runCont contX $
      \x -> cc (f x)

instance Applicative (ContT r m) where
  pure a = ContT ($ a)

  contF <*> contX = ContT $
    \cc -> runCont contF $
      \f -> runCont contX $
        \x -> cc (f x)

instance Monad (ContT r m) where
  return = pure

  contX >>= f = ContT $
    \cc -> runCont contX $
      \x -> runCont (f x) $
        \res -> cc res

With those instances functions from above can be redefined as:

cont5 :: ContT r m Integer
cont5 = return 5

addOne :: ContT r m Integer -> ContT r m Integer
addOne contX = do
  x <- contX
  return (x + 1)

Until now all examples with the continuations were pretty straightforward. Continuations were called exactly once, no skips, no multiple calls, no shenanigans. Next chapter will fix this

2. CallCC

Take another look at the ContT type:

data ContT r m a = ContT { runCont :: (a -> m r) -> m r }

Did you notice? If we have a value of type m r, we can just return it without calling cc at all. Then all actions inside the continuation will be skipped, since it was never called.

Let’s temporarily assign r = (), m = IO and write skip3 – a continuable value that skips everything and just prints 3:

skip3 :: ContT () IO a
skip3 = ContT $ \cc -> print 3 -- ContT (const $ print 3)

By the way, type a can be anything, since we never use it. Now let’s check that skip3 actually works:

test1 = do
  return 5

test2 = do
  skip3
  return 5

ghci> runCont test1 print
5
ghci> runCont test2 print
3

skip3 is definitely interesting, but because we fixed r and m it is not very useful. Why don’t we write something utilizing the same idea, but a bit more flexible.

To do this we have to somehow get a value of type m r. Let’s again look inside the ContT:

cont :: ContT r m a
cont = ContT $ \cc -> _ -- ?

cc has type a -> m r. That allows us to obtain m r from a, so we can define a skip function inside cont:

cont :: ContT r m a
cont = ContT $
  \cc ->
    let skip a = ContT (const $ cc a)
     in _ -- ?

skip has type a -> m r and skip x works similarly to skip3. Inside cont we can execute a minor function from which we can at any point switch back to the outer ContT using our skip function.

So we come to the definition of callCC (call with current continuation):

type Switch r m a = forall b. a -> ContT r m b

callCC :: (Switch r m a -> ContT r m a) -> ContT r m a
callCC f = ContT $
  \cc ->
    let switchToCC a = ContT (const $ cc a)
     in runCont (f switchToCC) cc

This function will be crucial to almost all following constructions.

Let’s walk through few callCC examples:

test = callCC $ \exit -> do
  lift $ putStrLn "Reachable"
  exit ()
  lift $ putStrLn "Unreachable"

ghci> runCont test (const $ return ())
Reachable

As you can see, callCC works as expected, and returns to the outer execution. Also, as promised, here’s break and continue operators expressed using callCC:

test :: ContT r IO ()
test = do
  forM_ [1 .. 10] $ \i -> do
    callCC $ \continue -> do
      when (i == 5) $ do
        continue ()

      lift $ print i

ghci> runCont test (const $ return ())
1
2
3
4
6
7
8
9
10

test :: ContT r IO ()
test = do
  callCC $ \break -> do
    forM_ [1 .. 10] $ \i -> do
      when (i == 5) $ do
        break ()

      lift $ print i

ghci> runCont test (const $ return ())
1
2
3
4

Notice that switchToCC doesn’t just interrupt minor function. Technically it ends current computation and switches to the point right after the callCC call:

test = do
  val {- here -} <- callCC $ \cc -> do
    -- ...
  -- ...

So if switchToCC somehow manages to escape the inner function, calling will “teleport” us back to this point. Now let’s see how we can get it out.

3. Label function (aka goto, aka useState)

We want to write a label function. It has to receive an initial value and return pair (restart, value) such that:

restart v – runs the computation again, but with value set to v
value – value in current computation. At the first iteration it is set to the initial value, then to a restart value

Actually definition of this function is quite simple:

label :: a -> ContT r m (a -> ContT r m b, a)
label init = callCC $
  \switch ->
    let restart val = switch (restart, val)
     in return (restart, init)

But it allows us to write even more imperative code in haskell. For example, loops:

test = do
  (restart, counter) <- label 0

  lift $ print counter

  when (counter < 10) $ do
    restart $ counter + 1

ghci> runCont test (const $ return ())
0
1
2
3
4
5
6
7
8
9
10

We can also define setjmp/longjmp. It is even simpler than label, because we can fully ignore the value param:

setjmp = do
  (restart, _) <- label ()
  let longjmp = restart ()

  return longjmp

test = do
  longjmp <- setjmp
  lift $ print 10

  longjmp

ghci> runCont test (const $ return ())
10
10
10
10
10
10
...

4. Generators, Fibers, and Scheduler

Now we are ready to write even more advanced control flow operators.

Let’s reason about how we can control a generator inside haskell. In ordinary languages keyword yield is used to do this. We will also use yield, but here’s the nuances:

We cannot add a keyword to haskell, so yield must be a function
It is necessary that yield x returns next yield function, because if we use the same yield twice, we’ll exit at the same point, which is definitely not what we wanted.
We also have to define exit (which also must be updated on each yield) to return back to the calling point

I’ll pack them into a separate structure and update all at once:

data Controls r m x = Controls
  { yield :: x -> ContT r m (Controls r m x)
  , exit :: forall b. ContT r m b
  }

Then a generator function could look something like that:

test controls = do
  controls <- yield controls 1
  controls <- yield controls 2
  controls <- yield controls 3
  exit controls

It may look a bit strange that we call exit at the end of the generator, but this will be explained a bit later. For now I’ll just forbid non-exit return from generator with types:

type Generator r m x = Controls r m x -> ContT r m Void

Next thing to write is a runToYield function. As the name suggests, it runs a generator to a next yield. It will return either a pair of value and the next part of the generator or Nothing if generator has ended:

runToYield :: Generator r m x -> ContT r m (Maybe (x, Generator r m x))
runToYield generator = callCC $ \exitContext -> do
  let exit = exitContext Nothing
      yield value = callCC $ \continueGenerator ->
        exitContext $ Just (value, continueGenerator)

      controls = Controls{yield, exit}

  generator controls
  -- If execution is here, something similar to stack overflow in imperative languages
  -- goes on. Even through within haskell further execution is well-defined, it is 
  -- extremely confusing and unpredictable for a user
  
  -- Also, since generator return type is Void, we must never actually be here 
  error "Generator exit invariant violated"

Let’s explain this code line by line:

callCC $ \exitContext -> do – We create a new context, and get a function to exit back to the parent one
exit = exitContext Nothing – A function to hop back in a parent context from a generator
yield value = callCC $ \continueGenerator -> – This function will be called inside a generator, so callCC captures generator’s context
exitContext $ Just (value, continueGenerator) – We return back to the calling context. continueGenerator stores a generator context, so calling it will continue generator execution, value is just a value yielded by a generator.
controls = Controls{yield, exit} – Just a convenient name
generator controls – Actually running the generator
error "Generator exit invariant violated" – Types guarantee that this line is unreachable, but just in case

Fibers can be viewed as generators, whose yield function just returns control back to the scheduler, without passing any values. I’ll define few synonyms for them:

type Fiber r m = Generator r m ()

suspend :: Controls r m () -> ContT r m (Controls r m ())
suspend controls = yield controls ()

Next we have to write an actual scheduler to run multiple fibers. I’ll write the simplest possible: it will run every fiber, remove finished, repeat until no fibers left.

scheduler :: [Fiber r m] -> ContT r m ()
scheduler threads = do
  let round threads = do
        nextThreads <- forM threads $ \thread -> do
          res <- runToYield thread
          return $ snd <$> res

        return $ catMaybes nextThreads -- (filtering fibers)

  (loop, threads) <- label threads
  threadsLeft <- round threads

  unless (length threadsLeft == 0) $ do
    loop threadsLeft

All preparations are done, now let’s see if this actually works:

debug str = liftIO $ putStrLn str

fiberA :: Fiber r IO
fiberA controls = do
  debug "Started fiber A"
  controls <- suspend controls
  debug "Running fiber A"
  controls <- suspend controls
  debug "Exiting fiber A"
  exit controls

fiberB :: Fiber r IO
fiberB controls = do
  debug "Started fiber B"
  controls <- suspend controls
  debug "Running fiber B"
  controls <- suspend controls
  debug "Running fiber B again"
  controls <- suspend controls
  debug "Exiting fiber B"
  exit controls

ghci> runCont (scheduler [fiberA, fiberB]) (const $ return ())
Started fiber A
Started fiber B
Running fiber A
Running fiber B
Exiting fiber A
Running fiber B again
Exiting fiber B

It does! Another mind-blowing use for haskell (won’t recommend to use this in production, through)