Safely running untrusted Haskell code
Obviously, don't run code in the I/O monad, just show pure results (or possibly make your own monadic type that is a restricted subset of IO). But it's a lot more complicated than that...
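As a rough illustration of the "restricted subset of IO" idea, here is a minimal sketch. The names `RIO`, `UnsafeRIO`, `runRIO`, `say` and `untrusted` are hypothetical and not taken from lambdabot. The sketch assumes the type would live in its own module that exports `RIO` without its constructor, so untrusted code can only combine the whitelisted actions.

```haskell
-- A sketch of a restricted IO type (all names here are hypothetical).
-- In a real setup this would sit in its own module that exports RIO
-- abstractly, i.e. without the UnsafeRIO constructor.
newtype RIO a = UnsafeRIO (IO a)

-- Unwrapping is a plain function rather than a record field, because an
-- exported record field could be used with record-update syntax to build
-- arbitrary RIO values.
runRIO :: RIO a -> IO a
runRIO (UnsafeRIO io) = io

instance Functor RIO where
  fmap f (UnsafeRIO io) = UnsafeRIO (fmap f io)

instance Applicative RIO where
  pure = UnsafeRIO . pure
  UnsafeRIO f <*> UnsafeRIO x = UnsafeRIO (f <*> x)

instance Monad RIO where
  UnsafeRIO m >>= k = UnsafeRIO (m >>= runRIO . k)

-- Whitelisted action: print one line, capped at 200 characters
-- (the cap is an arbitrary choice for this sketch).
say :: String -> RIO ()
say = UnsafeRIO . putStrLn . take 200

-- Stand-in for untrusted code: it can only use `say` and pure values.
untrusted :: RIO Int
untrusted = do
  say "hello from the sandbox"
  pure (sum [1 .. 10])
```

The trusted host would call `runRIO untrusted`; the untrusted side never sees a way to lift an arbitrary `IO` action into `RIO`.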
Verifying safety: lambdabot's approach
Since 2004, lambdabot has executed arbitrary strings of Haskell provided by users of various IRC channels, in particular the Haskell channel. In order to do this, a particular security policy is required. The policy, and its implementation, is described here.
The policy
Only allow execution of pure Haskell expressions.
The implementation
- Note: This section refers to the old Lambdabot evaluator; as of 2009, lambdabot calls out to mueval, which uses many of the same techniques but is structured differently.
The evaluator is essentially a function, eval :: String -> IO String, which takes an arbitrary Haskell string, verifies it, compiles it, and evaluates the result, returning a String representing the result back over the network.
This function is implemented as two separate processes:
The driver reads a String from the network, and then subjects it to a simple test:
- The expression is parsed as a Haskell 98 expression, hopefully preventing code injection (is this true? and can any string that can parse as a valid Haskell expression become something more sinister when put in a particular context?)
- If the string parses as a Haskell 98 expression, the runplugs process is then forked to evaluate the string, and the following checks are put in place:
  - Only a trusted module set is imported, avoiding unsafePerformIO, unsafeIOToST and such like.
  - Module imports are disallowed
  - Time and space limitations on the runplugs process are set by the OS rlimit facility
  - The expression is type checked, enforcing lack of memory errors
  - Because the user code is not at the beginning of the file, malicious {-# LANGUAGE #-} and {-# OPTIONS #-} flags are ignored
  - Only -fextended-default-rules is allowed as a language extension over Haskell 98.
  - The resulting object file is dynamically linked only into the throw-away runplugs instance
  - Even if all went well, only the first 2048 characters of the output string are returned to the caller (no infinite output DoS)
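Two of the checks above, the OS resource limit and the output cap, can be sketched as follows. This is a hedged illustration, not lambdabot's code: it assumes a POSIX system and the `unix` package's `System.Posix.Resource` module, and the helper names `limitCpuSeconds` and `truncateOutput` are made up for this example. Only the CPU-time limit is shown; the number of seconds is left to the caller.

```haskell
import System.Posix.Resource
  ( Resource (ResourceCPUTime)
  , ResourceLimit (ResourceLimit)
  , ResourceLimits (ResourceLimits)
  , setResourceLimit
  )

-- Cap the CPU time (in seconds) of the current process. The idea is to
-- call this in the throw-away evaluator process before any user code
-- runs, so that a non-terminating expression is killed by the OS.
-- (Hypothetical helper; soft and hard limits are set to the same value.)
limitCpuSeconds :: Integer -> IO ()
limitCpuSeconds secs =
  setResourceLimit ResourceCPUTime
    (ResourceLimits (ResourceLimit secs) (ResourceLimit secs))

-- Return at most the first 2048 characters of the output. Because
-- `take` is lazy, this also works on an infinite output string.
truncateOutput :: String -> String
truncateOutput = take 2048
```

The limit is enforced by the kernel rather than by the Haskell runtime, which is why it still applies to loops the GHC scheduler cannot interrupt.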
A few other niceties are provided:
- The expression is bound to a random identifier (harmless to guess), in order to allow nice line error messages with line pragmas.
- The expression is wrapped in show.
- A catch-all instance of Show in terms of Typeable is provided, to display non-displayable objects in a more useful way (e.g. putStrLn ➝ [Char] -> IO ())
- It is compiled to native code with -fasm for speed (compilation time is negligible compared to IRC lag)
- The value is evaluated inside an exception handler; if an exception is thrown, the first 1024 characters of the exception string are returned.
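The last item, evaluating inside an exception handler and truncating the message, might look roughly like the sketch below. It is an assumption-laden illustration using only `Control.Exception` from base; `renderResult`, `safeMessage` and `forceString` are hypothetical names. The output is forced inside `try` so that exceptions hidden in lazy thunks surface in the handler, and the exception text is itself truncated and forced, since showing an exception can run more user code.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Force every character of a string, then return it.
forceString :: String -> String
forceString s = foldr seq () s `seq` s

-- Evaluate the (lazy) result string inside an exception handler.
-- On success, return at most 2048 characters of output; on failure,
-- return at most 1024 characters of the exception text.
renderResult :: String -> IO String
renderResult out = do
  r <- try (evaluate (forceString (take 2048 out)))
  case r of
    Right s -> pure s
    Left e  -> safeMessage e

-- Showing an exception can itself loop, throw, or produce a huge
-- string, so the message is truncated and forced under a second handler.
safeMessage :: SomeException -> IO String
safeMessage e = do
  r <- try (evaluate (forceString (take 1024 (show e))))
         :: IO (Either SomeException String)
  pure (either (const "exception while showing exception") id r)
```

For example, `renderResult (error (cycle "x"))` yields a bounded message instead of an endless one.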
Exploits
A variety of interesting exploits have been found, or thought of, over the years. Those we remember are listed below:
- using newtype recursion to have the inliner not terminate
- using pathological type inference cases to have the type checker not terminate
- code injection of code fragments that aren't Haskell expressions
- Template Haskell used to run IO actions during type checking
- stToIO to convert a safe ST action into an IO action that is run
- large strings returned in exceptions
- unsafePerformIO, of course
- unsafeCoerce#
- throwing a piece of code as an exception, which is evaluated when the exception is shown
- non-terminating code, in a tight loop that doesn't allocate: let f () = f () in f ()
  - can't use GHC's threadDelay/scheduler to timeout (must use OS resource limits).
- large array allocations can fill memory
- very large array allocations can integer overflow the storage manager, allowing arbitrary memory access (this appears to be fixed in GHC 6.8.x)
- creating class instances that violate assumed laws (cf EvilIx)
- various literal strings that print IRC protocol commands could be printed using exceptions.
- if a user guesses the top level identifier the expression is bound to, it can be used to print a silly string
- zombies could be created by multiple runplugs calls, leading to blocking on endless output. The resulting zombies accumulate, eventually leading to DoS (if waitForProcess was broken).
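To make the "instances that violate assumed laws" item more concrete, here is a small sketch of what such an instance could look like. It is an assumption about the general shape of the EvilIx idea, not a reproduction of the original exploit, and the type name `Evil` is made up. The instance is perfectly well typed, yet its `index` ignores the bounds, so any library code that trusts the result without re-checking it could be led outside an array.

```haskell
import Data.Ix (Ix (..))

-- A hypothetical index type with a deliberately law-breaking Ix instance.
newtype Evil = Evil Int
  deriving (Eq, Ord, Show)

instance Ix Evil where
  range (Evil lo, Evil hi) = map Evil [lo .. hi]
  -- Law violation: the result should lie within [0, rangeSize bounds),
  -- but here the bounds are ignored entirely.
  index _ (Evil i) = i
  -- Law violation: claims every index is within every range.
  inRange _ _ = True

-- The offset this instance reports for a far out-of-range index.
bogusOffset :: Int
bogusOffset = index (Evil 0, Evil 3) (Evil 1000)
```

Type checking alone cannot rule this out, which is one reason restricting the imported modules is not sufficient by itself.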
Template Haskell
We believe that Template Haskell can be made safe for users by hiding runIO and reify.
See also
- See a long discussion in Haskell Cafe.
- The GHC ticket for -fsafe