|
|
(8 intermediate revisions by 5 users not shown) |
Line 1: |
Line 1: |
− | = "throwTo & block statements considered harmful" =
| + | [[Category:GHC|Concurrency]] |
| | | |
− | The following description is by ChrisKuklewicz.
| + | = throwTo & block statements fixed in GHC 6.6.1 = |
| | | |
− | This is a short essay to prove that the current GHC concurrency
| + | The problem which used to be described here has been fixed in GHC HEAD and this will be included in GHC version 6.6.1 |
− | implementation has a critical flaw.
| |
| | | |
− | The key problem is, at least in the presence of block/unblock, that
| + | I have built (Feb 9, 2007) the HEAD version of GHC and this problem is no longer present. |
− | Exceptions are never reliably delivered. And this is not a
| |
− | theoretical case, but can cause a hang in something as innocuous as
| |
− | "program A" given below.
| |
| | | |
− | Simon Marlow wrote:
| + | See the ticket at [http://hackage.haskell.org/trac/ghc/ticket/1047] and the comments near the top of [http://darcs.haskell.org/ghc/rts/Exception.cmm]. |
− | <pre>
| |
− | >> I think people are misunderstanding the nature of a safepoint. The
| |
− | >> safepoint is a point at which you are prepared to have exceptions
| |
− | >> delivered. This does not mean that they *will* be delivered, just that they
| |
− | >> can. If you need to *wait* for an asynchronous exception, then you
| |
− | >> shouldn't be using them at all.
| |
− | >
| |
− | > Right. If a thread mostly runs inside 'block' with the occasional safe
| |
− | > point, then your exceptions are not really asynchronous, they're synchronous.
| |
− | >
| |
− | >
| |
− | > In this case, I'd say a better solution is to have an explicit event queue,
| |
− | > and instead of the safe point take an event from the queue. The action on
| |
− | > receiving an event can be to raise an exception, if necessary.
| |
− | >
| |
− | > Cheers, Simon
| |
− | </pre>
| |
| | | |
− | The implementation of asynchronous signals, as described by the paper | + | The description which was here has been deleted, but is still available in the historical views of this page. |
− | "Asynchronous exceptions in Haskell by Simon Marlow, Simon Peyton Jones, Andy Moran and John Reppy, PLDI'01."
| |
− | is fatally inconsistent with the implementation in GHC 6.4 and GHC 6.6 today.
| |
| | | |
− | The implemented semantics have strictly weaker guarantees and render programs
| + | [[Category:Pages to be removed]] |
− | using asynchronous expressions impossible to write correctly. The semantics in
| |
− | the paper were carefully designed to solve the problem laid out in the first
| |
− | sentence of the abstract:
| |
− | | |
− | "Asynchronous exceptions, such as timeouts, are important for robust, modular
| |
− | programs, but are extremely difficult to program with -- so much so that most
| |
− | programming languages either heavily restrict them or ban them altogether."
| |
− | | |
− | And I believe the paper succeeded. The paper shows how to replace other
| |
− | languages pervasive and intrusive error catching and handling code with much
| |
− | cleaner, clearer, and often more correct code.
| |
− | | |
− | The implementation in GHC has changed the behavior of throwTo from asynchronous
| |
− | (not-interruptible) to synchronous (interruptible?) as discussed in section 8 of
| |
− | the paper. This change, in and of itself, is not the fatal problem; as
| |
− | described in the paper a (forkIO (throwTo ...)) recovers the asynchronous behavior.
| |
− | | |
− | The fatal change between the paper and GHC comes from not following section 7.2
| |
− | as published. Section "7.2 Implementation of throwTo" has two bullet point, and
| |
− | the second bullet point is (retyped, so typos are my own fault):
| |
− | | |
− | "As soon as a thread exits the scope of a 'block', and at regular intervals
| |
− | during execution inside 'unblock', it should check its queue of pending
| |
− | exceptions. If the queue is non-empty, the first exception from the queue
| |
− | should be raised."
| |
− | | |
− | A test of GHC 6.6 shows that this is not the case. Test program A:
| |
− | <haskell>
| |
− | > loop = block (print "alive") >> loop
| |
− | >
| |
− | > main = do tid <- forkIO loop
| |
− | > threadDelay 1
| |
− | > killThread tid
| |
− | </haskell>
| |
− | Program A, compiled with (-threaded) on a single CPU machine never halts. It
| |
− | will print "alive" forever while the the main thread is blocked on "killThread".
| |
− | | |
− | As an aside, removing the threadDelay causes killThread to destroy the child
| |
− | before it can enter the block, thus showing the need to add "forkBlockedIO" or
| |
− | "forkInheritIO" to the library. This can be worked around using an MVar.
| |
− | | |
− | Changing the definition of loop produces Test program B:
| |
− | <haskell>
| |
− | > loop = block (print "alive") >> loop >> yield
| |
− | >
| |
− | > main = do tid <- forkIO loop
| |
− | > threadDelay 1
| |
− | > killThread tid
| |
− | </haskell>
| |
− | This prints "alive" twice before the killThread succeeds.
| |
− | | |
− | The paper demands that when the loop in Program A exits the scope of "block
| |
− | (print a)" that it check a queue of pending exceptions, see that it is
| |
− | non-empty, and raise the exception thrown by killThread. This can also be seen
| |
− | in "Figure 5. Transition Rules for Asynchronous Exceptions", where killThread
| |
− | should use throwTo to create an in-flight exception and exiting the scope of
| |
− | block in the presence of this in-flight exception should raise the exception.
| |
− | | |
− | The implementation in GHC sleeps the main thread at the killThread command, and
| |
− | it is awoken when the block is exited and to succeed in delivering the
| |
− | exception it must execute while the child is still in the unblocked state. But
| |
− | the child re-enters a blocked state too quickly in Program A, so killThread
| |
− | never succeeds. The change in Program B has the child "yield" when unblocked
| |
− | and this gives the main thread a change to succeed.
| |
− | | |
− | This trick using yield to make a safepopint was suggested by Simon Marlow:
| |
− | <pre>
| |
− | > The window in 'unblock (return ())' is tiny, I'm not really surprised if
| |
− | > nothing ever gets through it. You might have more luck with 'unblock yield'.
| |
− | </pre>
| |
− | It has been said on this mailing list thread that needing "yield" to
| |
− | program concurrently is a bug in the user's code and should be
| |
− | replaced by other mechanisms. In a complex program with many threads
| |
− | even the yield trick may not be enough to program reliably. Using
| |
− | | |
− | <haskell>
| |
− | blockY io = do x <- block io
| |
− | yield
| |
− | return x
| |
− | | |
− | unblockY io = unblock (yield >> io)
| |
− | </haskell>
| |
− | | |
− | instead of 'block' and 'unblock' is at least a simple but unreliable workaround.
| |
− | | |
− | This would not be a fatal flaw if there were a simple reliable
| |
− | workaround. The best workaround is the following, which is anything
| |
− | but simple, and amount to re-implementing the mechanisms in the paper:
| |
− | | |
− | * When forking a thread:
| |
− | ** Create an (Chan Exception) (or (TChan Exception)) and pass it to
| |
− | the child thread. Call this the queue.
| |
− | ** Create a ((T)MVar queue) to indicate whether the thread is still
| |
− | alive or not and wrap the whole child action in a 'finally' to
| |
− | ensure this is set to the dead state when the child exits.
| |
− | (Alive iff the queue is in the MVar)
| |
− | ** To be sure the child thread is in the handler, use another MVar
| |
− | or re-use the "alive/dead" MVar to ensure the child is running
| |
− | before returning the ThreadId from the fork operation.
| |
− | | |
− | * Replace all uses throwTo/killThread by
| |
− | ** Test the alive/dead state of the thread, proceeding only on
| |
− | living threads
| |
− | ** Putting the Exception in the queue
| |
− | ** Call throwTo/killThread with the Exception
| |
− | | |
− | * The child thread must:
| |
− | ** implement safepoints by checking the queue
| |
− | ** At each 'block' to 'unblock' transition the child ought to add a
| |
− | safepoint
| |
− | | |
− | This machinery closely recovers the queue of exceptions and the
| |
− | semantics described in the paper. When a thread has died the queue is
| |
− | removed from the alive/dead MVar and can be garbage collected, along | |
− | with any pending exceptions for the dead thread.
| |
− | | |
− | To ensure the above semantics are not violated by the caller, the now
| |
− | fork operation should not return an exposed ThreadId, but should
| |
− | return some proxy type or operation to ensure only the proper
| |
− | throwto/killThread are peformed.
| |
− | | |
− | Next I will demonstrate how the paper's achievement of simple and
| |
− | clear error handling code is lost.
| |
− | | |
− | The child thread is where the workaround becomes a nightmare. It is
| |
− | impossible to test the current 'block' versus 'unblock' state. So the
| |
− | only time the child is certain that a 'block' to 'unblock' transition
| |
− | has occurred is when it is explicit:
| |
− | <haskell>
| |
− | > unblock ( .... block ( ... ) <*> .... )
| |
− | </haskell>
| |
− | The <*> is obviously such a transition and ought to have a safepoint added.
| |
− | | |
− | But is only because the block had the explicit unblock around it. If
| |
− | there were a function such as
| |
− | <haskell>
| |
− | > atomicUpdateFoo = block (...) <?>
| |
− | </haskell>
| |
− | then there is no way to know if the <?> should be a safepoint.
| |
− | Therefore atomicUpdateFoo needs documentation so that the caller knows
| |
− | to add a safepoint in the case it is called from the unblocked case.
| |
− | | |
− | In a more complex example:
| |
− | <haskell>
| |
− | > atomicUpdateFooBar = block (...) <1?> block (...) <2?>
| |
− | </haskell>
| |
− | the caller cannot add <1?> itself, and must therefore pass in code
| |
− | 'safe :: IO ()' to all such functions:
| |
− | <haskell>
| |
− | > atomicUpdateFooBar safe = block (...) >> trans >> block (...) >>
| |
− | </haskell>
| |
− | trans where 'trans' is "return ()" or "checkChan queue" depending on
| |
− | whether the the caller knows it is in a 'block' or 'unblock' state.
| |
− | | |
− | If the parent never needed to use 'throwTo'/'killThread' then the
| |
− | child never needs 'block'/'unblock' and the nightmare is avoided. But
| |
− | if the child uses any operation which can block such as 'accept' or
| |
− | 'getChar' or 'takeMVar' then the parent cannot deliver the signal
| |
− | without waking up the child with a 'throwTo' which quickly leads to
| |
− | either an overwhelming use of 'catch' in the child (like other
| |
− | languages) or use of 'block' and the fragile workaround I just
| |
− | described.
| |
− | | |
− | As an aside, if there was a "isBlockedThreadState" operation, then it
| |
− | would be possible to create a better client workaround:
| |
− | | |
− | <haskell>
| |
− | checkChan queue x = do
| |
− | isBlocked <- isBlockedThreadState -- perhaps needing =<< myThreadId
| |
− | case isBlocked of
| |
− | True -> return x
| |
− | False -> raiseQueue queue x
| |
− | | |
− | raiseQueue queue x = do
| |
− | hasException <- isEmptyChan queue
| |
− | case hasException of
| |
− | False -> return x
| |
− | True -> readChan >>= raise
| |
− | | |
− | checkedBlock queue io = do x <- block io
| |
− | checkChan queue x
| |
− | | |
− | checkedUnblock queue io = checkChan queue >> io
| |
− | </haskell>
| |
− | | |
− | If that existed then the client should use checkedBlock and
| |
− | checkedUnblock instead of block and unblock. The net effect of this
| |
− | workaround is to implement the queue described in the original paper
| |
− | and wrap the forkIO/throwTo/block/unblock operations with new ones
| |
− | that implement the semantics described in the section 7.2 of the
| |
− | paper.
| |
− | | |
− | An ugly side effect of the above is that the modified throwTo also
| |
− | sent the Exception via throwTo as well as the (T)Chan. So it is
| |
− | possible that Exception could be raised twice depending on how the
| |
− | child is written. And the order of exceptions in the channel may not
| |
− | be the order that the throwTo's deliver the asynchronous exceptions.
| |
− | | |
− | To alleviate this somewhat you could have the modified throwTo put the
| |
− | real exception on the (T)Chan and send a GoLookInTheChan asynchronous
| |
− | exception. Reversing this might be possible, but it is not clear how.
| |
− | If we could deterministicly wait for a pending asynchronous signal
| |
− | then we could write a proper safe point to fix "block to unblock"
| |
− | transitions, which we know we cannot do.
| |
− | | |
− | We can try to implement isBlockedThread state by maintaining a stack
| |
− | of block/unblock operations:
| |
− | | |
− | <haskell>
| |
− | checkedBlock stack queue io = do
| |
− | x <- bracket_ (modifyIORef stack (True:))
| |
− | (modifyIORef stack tail)
| |
− | (io)
| |
− | checkChan queue x
| |
− | | |
− | checkedUnblock stack queue io = do
| |
− | checkChan queue ()
| |
− | x <- bracket_ (modifyIORef stack (False:))
| |
− | (modifyIORef stack tail)
| |
− | (io)
| |
− | | |
− | isBlockedThread stack = liftM head (readIORef stack)
| |
− | </haskell>
| |
− | | |
− | Where "stack :: IORef [Bool]" is created and passed to the child by
| |
− | the modified fork operation. The 'stack' above grows without bound
| |
− | during recursion and is not tail recursive, so the stack management
| |
− | should be changed to the algorithm in the paper... in which case one
| |
− | really has recreated the paper's implementation inside Haskell.
| |
− | | |
− | The only alternatives are to
| |
− | * write code that may hang forever at a throwTo/killThread
| |
− | * never use throwTo or killThread making block/unblock irrelevant
| |
− | * never use 'block' (and 'unblock')
| |
− | * only write children with explicit "(unblock ... block (... block
| |
− | () ...) >>= raiseQueue queue >>= ...." where the different between
| |
− | checked and unchecked block is obvious and explicit.
| |
− | * Track and propagate the blocked state as in atomicUpdateFooBar.
| |
− | | |
− | This problem with throwTo/block was not obvious until I stated writing
| |
− | small programs to test the implementation in GHC and thus discovered
| |
− | that the paper's description had mislead me.
| |
− | | |
− | The key problem is, at least in the presence of block/unblock, that
| |
− | Exceptions are never reliably delivered. And this is not a
| |
− | theoretical case, but can cause a hang in something as innocuous as
| |
− | "program A" given above.
| |