Safely running untrusted Haskell code
Obviously, don't run code in the IO monad, just show pure results (or possibly make your own monad that is a restricted subset of IO). But it's a lot more complicated than that...
1 Verifying safety : lambdabot's approach
Since 2004, lambdabot has executed arbitrary strings of Haskell provided by user's of various IRC channels, in particular, the Haskell channel. In order to do this, a particular security policy is required. The policy, and its implementation, is described here.
1.1 The policy
Only allow execution of pure Haskell expressions.
1.2 The implementationThe evaluator is essentially a function,
compiles it, and evaluates the result, returning a String representing the result, back over the network.
This function is implemented as two separate processes:
The driver reads a String from the network, and then subjects it to a simple test:
- The expression is parsed as a Haskell 98 expression, hopefully preventing code injection (is this true? and can any string that can parse as a valid Haskell expression become something more sinister when put in a particular context?)
If the string parses as a Haskell 98 expression, the 'runplugs' process is then forked to evaluate the string, and the following checks are put in place:
- Only a trusted module set is imported, avoiding unsafePerformIO and stToIO and such like.
- Module imports are disallowed
- Time and space limitations on the runplugs process are set by the OS
- The expression is bound to a random identifier (harmless to guess), in order to allow nice line error messages with line pragmas.
- The expression is wrapped in 'show', and must be an instance of Show
- An instance of Show IO is defined, which prints "<IO>", rendering IO impossible (otherwise, showing the IO would result in a type error, and still not run it). Actually instances for IO and (->), using Typeable, are used to show the specific types of all IO and functions (e.g. putStrLn --> <[Char] -> IO ()> )
- The expression type checked, with the show constraint, enforcing purity
- If it type checks, and the type checker doesn't time out, it is compiled to native code with -fasm
- Only -fextended-default-rules are allowed, as language extensions over H98.
- The resulting .o file is dynamically linked into the throw-away runplugs instance
- The value is evaluated inside an exception handler.
- If an exception is thrown, only the first 1024 characters of the exception string are returned.
- If all went well, the first 2048 characters of the shown string are returned to the caller.
A variety of interesting exploits have been found, or thought of, over the years. Those we remember are listed below:
- using newtype recursion to have the typechecker not terminate
- using pathological type inference cases to have the type checker not terminate
- code injection of code fragments that aren't Haskell expressions
- Template Haskell used to run IO actions during type checking
- stToIO to convert a safe ST action, into an IO action that is run
- large strings returned in exceptions
- unsafePerformIO, of course
- throwing a piece of code as an exception, which is evaluated when the exception is shown
- non-terminating code, in a tight loop that doesn't allocate, can't use GHC's threadDelay/scheduler (let f () = f () in f ()) to timeout (must use OS resource limits).
- large array allocations can fill memory
- very large array allocations can integer overflow the storage manager, allowing arbitrary memory access
- creating class instances that violate assumed laws (cf EvilIx)
- various literal strings that print IRC protocol commands could be printed using exceptions.
- if a user guesses the top level identifier the expression is bound to, it can be used to print a silly string
- zombies could be created by multiple runplugs calls, leading to blocking on endless output. the resulting zombies accumulate, eventually leading to DOS.
3 Template Haskell
We believe that Template Haskell can be made safe for users by hiding runIO and reify.