Difference between revisions of "Safely running untrusted Haskell code"

From HaskellWiki
Jump to: navigation, search
(Exploits: ln)
m (Various formatting changes, etc.)
 
(5 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
[[Category:How to]]
 
[[Category:How to]]
Obviously, don't run code in the IO monad, just show pure results (or possibly make your own monad that is a restricted subset of IO). But it's a lot more complicated than that...
+
Obviously, don't run code in the I/O monad, just show pure results (or possibly make your own monadic type that is a restricted subset of <code>IO</code>). But it's a lot more complicated than that...
   
== Verifying safety : lambdabot's approach ==
+
== Verifying safety: lambdabot's approach ==
   
Since 2004, lambdabot has executed arbitrary strings of Haskell provided
+
Since 2004, lambdabot has executed arbitrary strings of Haskell provided by user's of various [[IRC_channel|IRC channels]], in particular, the Haskell channel. In order to do this, a particular security policy is required. The policy, and its implementation, is described here.
by user's of various [[IRC_channel|IRC channels]], in particular, the
 
Haskell channel. In order to do this, a particular security policy is
 
required. The policy, and its implementation, is described here.
 
   
 
=== The policy ===
 
=== The policy ===
Line 12: Line 12:
 
=== The implementation ===
 
=== The implementation ===
   
The evaluator is essentially a function, <hask>eval :: String -> IO
 
  +
:''Note: This section refers to the old Lambdabot evaluator; as of 2009, lambdabot calls out to [http://hackage.haskell.org/package/mueval mueval], which while it uses many of the same techniques, is structured differently.''
String</hask>, which takes a random Haskell string, verifies it,
 
  +
compiles it, and evaluates the result, returning a String representing
+
The evaluator is essentially a function, <hask>eval :: String -> IO String</hask>, which takes a random Haskell string, verifies it, compiles it, and evaluates the result, returning a <code>String</code> representing the result, back over the network.
the result, back over the network.
 
   
 
This function is implemented as two separate processes:
 
This function is implemented as two separate processes:
Line 21: Line 21:
 
* [http://www.cse.unsw.edu.au/~dons/code/lambdabot/scripts/RunPlugs.hs Evaluator binary]
 
* [http://www.cse.unsw.edu.au/~dons/code/lambdabot/scripts/RunPlugs.hs Evaluator binary]
   
The driver reads a String from the network, and then subjects it to a
+
The driver reads a <code>String</code> from the network, and then subjects it to a simple test:
simple test:
 
   
* The expression is parsed as a Haskell 98 expression, hopefully preventing code injection (is this true? and can any string that can parse as a valid Haskell expression become something more sinister when put in a particular context?)
+
# The expression is parsed as a Haskell 98 expression, hopefully preventing code injection (''is this true? and can any string that can parse as a valid Haskell expression become something more sinister when put in a particular context?'')
+
# If the string parses as a Haskell 98 expression, the <code>runplugs</code> process is then forked to evaluate the string, and the following checks are put in place:
If the string parses as a Haskell 98 expression, the 'runplugs' process
+
::{|
is then forked to evaluate the string, and the following checks are put
+
|
in place:
+
* Only a trusted module set is imported, avoiding <code>unsafePerformIO</code>, <code>unsafeIOtoST</code> and such like.
 
* Only a trusted module set is imported, avoiding unsafePerformIO and unsafeIOtoST and such like.
 
 
* Module imports are disallowed
 
* Module imports are disallowed
* Time and space limitations on the runplugs process are set by the OS 'rlimit' facility
+
* Time and space limitations on the <code>runplugs</code> process are set by the OS <code>rlimit</code> facility
 
* The expression type checked, enforcing lack of memory errors
 
* The expression type checked, enforcing lack of memory errors
* Because the user code is not at the beginning of the file, malicious {-# LANGUAGE #-} and {-# OPTIONS #-} flags are ignored
+
* Because the user code is not at the beginning of the file, malicious <code>{-# LANGUAGE #-}</code> and <code>{-# OPTIONS #-}</code> flags are ignored
* Only -fextended-default-rules are allowed, as language extensions over H98.
+
* Only <code>-fextended-default-rules</code> are allowed as language extensions over Haskell 98.
* The resulting .o file is dynamically linked only into the throw-away runplugs instance
+
* The resulting object file is dynamically linked only into the throw-away <code>runplugs</code> instance
* Even if all went well, the first 2048 characters of the shown string are returned to the caller (no infinite output DoS)
+
* Even if all went well, the first 2048 characters of the output string are returned to the caller (no infinite output DoS)
  +
|}
   
 
A few other niceties are provided:
 
A few other niceties are provided:
   
 
* The expression is bound to a random identifier (harmless to guess), in order to allow nice line error messages with line pragmas.
 
* The expression is bound to a random identifier (harmless to guess), in order to allow nice line error messages with line pragmas.
* The expression is wrapped in 'show'.
+
* The expression is wrapped in <code>show</code>.
* A catch-all instance of Show in terms of Typable is provided, to display non-displayable objects in a more useful way (e.g. putStrLn --> <[Char] -> IO ()>)
+
* A catch-all instance of <code>Show</code> in terms of <code>Typable</code> is provided, to display non-displayable objects in a more useful way (e.g. <code>putStrLn</code> <code>[Char] -> IO ()</code>)
* It is compiled to native code with -fasm for speed (compilation time is neglible compared to IRC lag)
+
* It is compiled to native code with <code>-fasm</code> for speed (compilation time is negligible compared to IRC lag)
 
* The value is evaluated inside an exception handler; if an exception is thrown, the first 1024 characters of the exception string are returned.
 
* The value is evaluated inside an exception handler; if an exception is thrown, the first 1024 characters of the exception string are returned.
   
 
== Exploits ==
 
== Exploits ==
   
A variety of interesting exploits have been found, or thought of, over
+
A variety of interesting exploits have been found, or thought of, over the years. Those we remember are listed below:
the years. Those we remember are listed below:
 
   
* using newtype recursion to have the inliner not terminate
+
* using <code>newtype</code> recursion to have the inliner not terminate
 
* using pathological type inference cases to have the type checker not terminate
 
* using pathological type inference cases to have the type checker not terminate
 
* code injection of code fragments that aren't Haskell expressions
 
* code injection of code fragments that aren't Haskell expressions
 
* Template Haskell used to run IO actions during type checking
 
* Template Haskell used to run IO actions during type checking
* stToIO to convert a safe ST action, into an IO action that is run
+
* <code>stToIO</code> to convert a safe <code>ST</code> action, into an <code>IO</code> action that is run
 
* large strings returned in exceptions
 
* large strings returned in exceptions
* unsafePerformIO, of course
+
* <code>unsafePerformIO</code>, of course
* unsafeCoerce#
+
* <code>unsafeCoerce#</code>
 
* throwing a piece of code as an exception, which is evaluated when the exception is shown
 
* throwing a piece of code as an exception, which is evaluated when the exception is shown
* non-terminating code, in a tight loop that doesn't allocate, can't use GHC's threadDelay/scheduler (let f () = f () in f ()) to timeout (must use OS [[resource limits]]).
+
* non-terminating code, in a tight loop that doesn't allocate:
  +
::<code>let f () = f () in f ()</code>
  +
: can't use GHC's <code>threadDelay</code>/scheduler to timeout (must use OS [[resource limits]]).
 
* large array allocations can fill memory
 
* large array allocations can fill memory
* very large array allocations can integer overflow the storage manager, allowing arbitrary memory access
+
* very large array allocations can integer overflow the storage manager, allowing arbitrary memory access (this appears to be fixed in GHC 6.8.x)
* creating class instances that violate assumed laws (cf EvilIx)
+
* creating class instances that violate assumed laws (cf [http://www.haskell.org/pipermail/haskell-cafe/2006-December/019994.html EvilIx])
 
* various literal strings that print IRC protocol commands could be printed using exceptions.
 
* various literal strings that print IRC protocol commands could be printed using exceptions.
 
* if a user guesses the top level identifier the expression is bound to, it can be used to print a silly string
 
* if a user guesses the top level identifier the expression is bound to, it can be used to print a silly string
* zombies could be created by multiple runplugs calls, leading to blocking on endless output. the resulting zombies accumulate, eventually leading to DOS.
+
* zombies could be created by multiple <code>runplugs</code> calls, leading to blocking on endless output. the resulting zombies accumulate, eventually leading to DOS. (if <code>waitForProcess</code> was broken)
+
 
== Template Haskell ==
 
== Template Haskell ==
   
We believe that Template Haskell can be made safe for users by hiding runIO and reify.
+
We believe that Template Haskell can be made safe for users by hiding <code>runIO</code> and <code>reify</code>.
   
 
== See also ==
 
== See also ==
   
 
* See [http://haskell.org/pipermail/haskell-cafe/2007-May/025941.html a long discussion] in Haskell Cafe.
 
* See [http://haskell.org/pipermail/haskell-cafe/2007-May/025941.html a long discussion] in Haskell Cafe.
* The [http://hackage.haskell.org/trac/ghc/ticket/1380 GHC ticket] for -fsafe
+
* The [http://hackage.haskell.org/trac/ghc/ticket/1380 GHC ticket] for <code>-fsafe</code>

Latest revision as of 12:41, 22 June 2021

Obviously, don't run code in the I/O monad, just show pure results (or possibly make your own monadic type that is a restricted subset of IO). But it's a lot more complicated than that...

Verifying safety: lambdabot's approach

Since 2004, lambdabot has executed arbitrary strings of Haskell provided by user's of various IRC channels, in particular, the Haskell channel. In order to do this, a particular security policy is required. The policy, and its implementation, is described here.

The policy

Only allow execution of pure Haskell expressions.

The implementation

Note: This section refers to the old Lambdabot evaluator; as of 2009, lambdabot calls out to mueval, which while it uses many of the same techniques, is structured differently.

The evaluator is essentially a function, eval :: String -> IO String, which takes a random Haskell string, verifies it, compiles it, and evaluates the result, returning a String representing the result, back over the network.

This function is implemented as two separate processes:

The driver reads a String from the network, and then subjects it to a simple test:

  1. The expression is parsed as a Haskell 98 expression, hopefully preventing code injection (is this true? and can any string that can parse as a valid Haskell expression become something more sinister when put in a particular context?)
  2. If the string parses as a Haskell 98 expression, the runplugs process is then forked to evaluate the string, and the following checks are put in place:
  • Only a trusted module set is imported, avoiding unsafePerformIO, unsafeIOtoST and such like.
  • Module imports are disallowed
  • Time and space limitations on the runplugs process are set by the OS rlimit facility
  • The expression type checked, enforcing lack of memory errors
  • Because the user code is not at the beginning of the file, malicious {-# LANGUAGE #-} and {-# OPTIONS #-} flags are ignored
  • Only -fextended-default-rules are allowed as language extensions over Haskell 98.
  • The resulting object file is dynamically linked only into the throw-away runplugs instance
  • Even if all went well, the first 2048 characters of the output string are returned to the caller (no infinite output DoS)

A few other niceties are provided:

  • The expression is bound to a random identifier (harmless to guess), in order to allow nice line error messages with line pragmas.
  • The expression is wrapped in show.
  • A catch-all instance of Show in terms of Typable is provided, to display non-displayable objects in a more useful way (e.g. putStrLn[Char] -> IO ())
  • It is compiled to native code with -fasm for speed (compilation time is negligible compared to IRC lag)
  • The value is evaluated inside an exception handler; if an exception is thrown, the first 1024 characters of the exception string are returned.

Exploits

A variety of interesting exploits have been found, or thought of, over the years. Those we remember are listed below:

  • using newtype recursion to have the inliner not terminate
  • using pathological type inference cases to have the type checker not terminate
  • code injection of code fragments that aren't Haskell expressions
  • Template Haskell used to run IO actions during type checking
  • stToIO to convert a safe ST action, into an IO action that is run
  • large strings returned in exceptions
  • unsafePerformIO, of course
  • unsafeCoerce#
  • throwing a piece of code as an exception, which is evaluated when the exception is shown
  • non-terminating code, in a tight loop that doesn't allocate:
let f () = f () in f ()
can't use GHC's threadDelay/scheduler to timeout (must use OS resource limits).
  • large array allocations can fill memory
  • very large array allocations can integer overflow the storage manager, allowing arbitrary memory access (this appears to be fixed in GHC 6.8.x)
  • creating class instances that violate assumed laws (cf EvilIx)
  • various literal strings that print IRC protocol commands could be printed using exceptions.
  • if a user guesses the top level identifier the expression is bound to, it can be used to print a silly string
  • zombies could be created by multiple runplugs calls, leading to blocking on endless output. the resulting zombies accumulate, eventually leading to DOS. (if waitForProcess was broken)

Template Haskell

We believe that Template Haskell can be made safe for users by hiding runIO and reify.

See also