Yhc/RTS/Exceptions

From HaskellWiki

RTS Exceptions

Support for 'imprecise exceptions' has recently been added to Yhc. Imprecise exceptions allow any kind of exception (including 'error') to be thrown from pure code and caught in the IO monad.

This page attempts to describe how imprecise exceptions are implemented in the Runtime System.

The most important haskell functions for imprecise exceptions are 'catch' and 'throw'.

catch :: IO a -> (Exception -> IO a) -> IO a
throw :: Exception -> a

catch takes an IO actione to run, and an exception handler. It runs the IO action and if an exception occurs it runs the exception handler. throw simply throws an exception. For example

    ...
    catch doStuff $ \ e -> case e of 
                  ErrorCall _ -> return ()
                  _           -> throw e
    ...

This code will execute the IO action doStuff, and if an exception occurs it will catch it. If that exception resulted from a call to the 'error' function then it does nothing, otherwise it rethrows the exception.

The exception stack

catch blocks can be nested inside each other and throw returns control to the handler for the inner-most catch block. This structure naturally leads us to having a stack of exception handlers with each new catch block pushing a new handler on the stack at the beginning of the block and removing the top handler at the end of the block.

Haskell level implementation

throw

throw is implemented very simply

throw :: Exception -> a
throw = primThrow

primThrow :: a -> b

primCatch cannot be written in Haskell and is instead defined directly in bytecode (see src/runtime/BCKernel/primitive.c) as

primThrow e
   PUSH_ZAP_ARG e
   THROW

The THROW instruction removes the value on the top of the program stack 'e' and removes the exception handler on the top of the exception stack. In then returns control to that exception handler, this strips the program stack back to the place where the exception handler was created. THROW then pushes 'e' on the new top of the program 'stack'.

Diagramatically (see Yhc/RTS/Machine for comparison). Before executing THROW we have:

   ==================
     |   frame    |
     +------------+
     : stack-data :     <- top exception handler points here
     ..............

         . . .

     +------------+
     |   frame    |     <- the frame for primThrow
     +------------+
     : exception  :
     ..............

afterwards we have:

   ==================
     |   frame    |
     +------------+
     : stack-data :    
     ..............
     : exception  :
     ..............

catch

catch is implemented using YHC.Exception.catchException

catchException :: IO a -> (Exception -> IO a) -> IO a
catchException action handler = IO $ \ w -> primCatch
                               (unsafePerformIO action)
                               (\e -> unsafePerformIO (handler e))

catchException simply converts from the IO monad into standard closures and passes them to the primitive 'primCatch'

primCatch :: a -> (b -> a) -> a

primCatch also cannot be written in Haskell, and is instead defined directly in bytecode:

primCatch act h
     NEED_HEAP_32
     CATCH_BEGIN handler
     PUSH_ZAP_ARG act
     EVAL
     CATCH_END
     RETURN
  handler:
     PUSH_ZAP_ARG h
     APPLY 1
     RETURN_EVAL

'CATCH_BEGIN label' creates a new exception handler and pushes it on the top of the exception stack. The new exception handler will return control to the function that executed the CATCH_BEGIN instruction and resume execution at the code given by 'label'.

Having pushed the new exception handler on the stack, primCatch forces evaluation of the action, causing the code inside the catch block to be executed.

If evaluation of the action succeeds without throwing an exception then CATCH_END is executed which removes the handler on the top of the stack (which is necessarily the same handler as was pushed by CATCH_BEGIN).

However, if evaluation of the action results in a call to throw then execution returns to 'handler'. Here we need to remember that THROW pushes the exception thrown on the program stack after stripping back to the exception handler. Thus at 'handler' we know that the exception thrown is on the top of the program stack.

We thus 'PUSH_ZAP_ARG h' to push the handler function on the stack, and 'APPLY 1' to apply the handler function to the exception and finally 'RETURN_EVAL' to call the handler function.

Interpreter implementation

In the C interpreter exception handlers are stored in the heap as standard Haskell heap nodes. The structure is given in src/runtime/BCKernel/node.h

/* an exception handler */<br/>
typedef struct _ExceptionHandlerNode {
  NodeHeader                        node;
  struct _ExceptionHandlerNode*     next;           /* next exception handler in the stack */
  Node*                             vapptr;         /* vapptr of the handler code */
  CodePtr                           ip;             /* ip to jump to for the handler code */
  UInt                              spOffs;         /* offset of sp from G_spBase, offset is easier than ptr here because of GC */
  UInt                              fpOffs;         /* offset of fp from G_spBase, again offsets easier */
}ExceptionHandlerNode;

vapptr, ip and fpOffs is basically the same information as stored in stack frames (see Yhc/RTS/Machine). spOffs is included for completeness although in practice it isn't strictly necessary.

fpOffs and spOffs are offsets from G_spBase rather than direct pointers because program stacks are stored in the heap, and thus might be moved by the garbage collector. The offset from the base of the stack, however, will not be changed by the garbage collector.

The 'next' field allows us to setup a stack of exception handlers. Each process has its own exception stack, so the handler on the top of the stack is stored in the process information structure (see Yhc/RTS/Concurrency).

Ensuring that the ExceptionHandlerNodes are treated correctly by the GC is ensured by simply having CATCH_BEGIN push the created ExceptionHandlerNode on the program stack. There it remains until it's either removed by the corresponding CATCH_END or it is stripped off the stack by THROW. This simple trick means we can avoid having to scan process information structures for pointers to heap nodes.

Changing the type of IO

Before imprecise exceptions the IO type was defined as:

newtype IO a = IO (World -> Either IOError a)

World is a dummy argument to prevent us from accidentally introducing a CAF, and the function either returns (Left err) if there was some error performing the IO action or (Right a) if the IO action succeeded with value 'a'.

However imprecise exceptions allow us to improve this to:

newtype IO a = IO (World -> a)

since we can simply implement throwing and catching IOErrors using 'throw' and 'catch'. This requires a slight change in the code for the IO monad to:

instance Monad IO where
  (IO x) >>= yf = IO $ \ w -> let xv = x w 
                              in xv `seq` case yf xv of
                                            IO y -> y w

  (IO x) >> (IO y) = IO $ \ w -> x w `seq` y w

  return a = IO $ \ w -> a

Here we use Yhc's 'seq' to ensure that impure IO functions are executed in the correct order.

There are also some changes to src/packages/yhc-base-1.0/YHC/_Driver.hs to ensure that no exceptions can possibly 'escape' the program.