Difference between revisions of "GHC/Concurrency/Flaws"
< GHC | Concurrency
Jump to navigation
Jump to search
BrettGiles (talk | contribs) (Reverting to Chris's version due to double edit....) |
m (To be deleted if no new content appears...) |
||
(6 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
+ | [[Category:GHC|Concurrency]] |
||
− | = "throwTo & block statements considered harmful" = |
||
+ | = throwTo & block statements fixed in GHC 6.6.1 = |
||
− | The following description is by ChrisKuklewicz. |
||
+ | The problem which used to be described here has been fixed in GHC HEAD and this will be included in GHC version 6.6.1 |
||
− | This is a short essay to prove that the current GHC concurrency |
||
− | implementation has a critical flaw. |
||
+ | I have built (Feb 9, 2007) the HEAD version of GHC and this problem is no longer present. |
||
− | The key problem is, at least in the presence of block/unblock, that |
||
− | Exceptions are never reliably delivered. And this is not a |
||
− | theoretical case, but can cause a hang in something as innocuous as |
||
− | "program A" given below. |
||
+ | See the ticket at [http://hackage.haskell.org/trac/ghc/ticket/1047] and the comments near the top of [http://darcs.haskell.org/ghc/rts/Exception.cmm]. |
||
− | Simon Marlow wrote: |
||
− | <pre> |
||
− | >> I think people are misunderstanding the nature of a safepoint. The |
||
− | >> safepoint is a point at which you are prepared to have exceptions |
||
− | >> delivered. This does not mean that they *will* be delivered, just that they |
||
− | >> can. If you need to *wait* for an asynchronous exception, then you |
||
− | >> shouldn't be using them at all. |
||
− | > |
||
− | > Right. If a thread mostly runs inside 'block' with the occasional safe |
||
− | > point, then your exceptions are not really asynchronous, they're synchronous. |
||
− | > |
||
− | > |
||
− | > In this case, I'd say a better solution is to have an explicit event queue, |
||
− | > and instead of the safe point take an event from the queue. The action on |
||
− | > receiving an event can be to raise an exception, if necessary. |
||
− | > |
||
− | > Cheers, Simon |
||
− | </pre> |
||
+ | The description which was here has been deleted, but is still available in the historical views of this page. |
||
− | The implementation of asynchronous signals, as described by the paper |
||
− | "Asynchronous exceptions in Haskell by Simon Marlow, Simon Peyton Jones, Andy Moran and John Reppy, PLDI'01." |
||
− | is fatally inconsistent with the implementation in GHC 6.4 and GHC 6.6 today. |
||
+ | [[Category:Pages to be removed]] |
||
− | The implemented semantics have strictly weaker guarantees and render programs |
||
− | using asynchronous expressions impossible to write correctly. The semantics in |
||
− | the paper were carefully designed to solve the problem laid out in the first |
||
− | sentence of the abstract: |
||
− | |||
− | "Asynchronous exceptions, such as timeouts, are important for robust, modular |
||
− | programs, but are extremely difficult to program with -- so much so that most |
||
− | programming languages either heavily restrict them or ban them altogether." |
||
− | |||
− | And I believe the paper succeeded. The paper shows how to replace other |
||
− | languages pervasive and intrusive error catching and handling code with much |
||
− | cleaner, clearer, and often more correct code. |
||
− | |||
− | The implementation in GHC has changed the behavior of throwTo from asynchronous |
||
− | (not-interruptible) to synchronous (interruptible?) as discussed in section 8 of |
||
− | the paper. This change, in and of itself, is not the fatal problem; as |
||
− | described in the paper a (forkIO (throwTo ...)) recovers the asynchronous behavior. |
||
− | |||
− | The fatal change between the paper and GHC comes from not following section 7.2 |
||
− | as published. Section "7.2 Implementation of throwTo" has two bullet point, and |
||
− | the second bullet point is (retyped, so typos are my own fault): |
||
− | |||
− | "As soon as a thread exits the scope of a 'block', and at regular intervals |
||
− | during execution inside 'unblock', it should check its queue of pending |
||
− | exceptions. If the queue is non-empty, the first exception from the queue |
||
− | should be raised." |
||
− | |||
− | A test of GHC 6.6 shows that this is not the case. Test program A: |
||
− | <haskell> |
||
− | > loop = block (print "alive") >> loop |
||
− | > |
||
− | > main = do tid <- forkIO loop |
||
− | > threadDelay 1 |
||
− | > killThread tid |
||
− | </haskell> |
||
− | Program A, compiled with (-threaded) on a single CPU machine never halts. It |
||
− | will print "alive" forever while the the main thread is blocked on "killThread". |
||
− | |||
− | As an aside, removing the threadDelay causes killThread to destroy the child |
||
− | before it can enter the block, thus showing the need to add "forkBlockedIO" or |
||
− | "forkInheritIO" to the library. This can be worked around using an MVar. |
||
− | |||
− | Changing the definition of loop produces Test program B: |
||
− | <haskell> |
||
− | > loop = block (print "alive") >> loop >> yield |
||
− | > |
||
− | > main = do tid <- forkIO loop |
||
− | > threadDelay 1 |
||
− | > killThread tid |
||
− | </haskell> |
||
− | This prints "alive" twice before the killThread succeeds. |
||
− | |||
− | The paper demands that when the loop in Program A exits the scope of "block |
||
− | (print a)" that it check a queue of pending exceptions, see that it is |
||
− | non-empty, and raise the exception thrown by killThread. This can also be seen |
||
− | in "Figure 5. Transition Rules for Asynchronous Exceptions", where killThread |
||
− | should use throwTo to create an in-flight exception and exiting the scope of |
||
− | block in the presence of this in-flight exception should raise the exception. |
||
− | |||
− | The implementation in GHC sleeps the main thread at the killThread command, and |
||
− | it is awoken when the block is exited and to succeed in delivering the |
||
− | exception it must execute while the child is still in the unblocked state. But |
||
− | the child re-enters a blocked state too quickly in Program A, so killThread |
||
− | never succeeds. The change in Program B has the child "yield" when unblocked |
||
− | and this gives the main thread a change to succeed. |
||
− | |||
− | This trick using yield to make a safepopint was suggested by Simon Marlow: |
||
− | <pre> |
||
− | > The window in 'unblock (return ())' is tiny, I'm not really surprised if |
||
− | > nothing ever gets through it. You might have more luck with 'unblock yield'. |
||
− | </pre> |
||
− | It has been said on this mailing list thread that needing "yield" to |
||
− | program concurrently is a bug in the user's code and should be |
||
− | replaced by other mechanisms. In a complex program with many threads |
||
− | even the yield trick may not be enough to program reliably. Using |
||
− | |||
− | <haskell> |
||
− | blockY io = do x <- block io |
||
− | yield |
||
− | return x |
||
− | |||
− | unblockY io = unblock (yield >> io) |
||
− | </haskell> |
||
− | |||
− | instead of 'block' and 'unblock' is at least a simple but unreliable workaround. |
||
− | |||
− | This would not be a fatal flaw if there were a simple reliable |
||
− | workaround. The best workaround is the following, which is anything |
||
− | but simple, and amount to re-implementing the mechanisms in the paper: |
||
− | |||
− | * When forking a thread: |
||
− | ** Create an (Chan Exception) (or (TChan Exception)) and pass it to |
||
− | the child thread. Call this the queue. |
||
− | ** Create a ((T)MVar queue) to indicate whether the thread is still |
||
− | alive or not and wrap the whole child action in a 'finally' to |
||
− | ensure this is set to the dead state when the child exits. |
||
− | (Alive iff the queue is in the MVar) |
||
− | ** To be sure the child thread is in the handler, use another MVar |
||
− | or re-use the "alive/dead" MVar to ensure the child is running |
||
− | before returning the ThreadId from the fork operation. |
||
− | |||
− | * Replace all uses throwTo/killThread by |
||
− | ** Test the alive/dead state of the thread, proceeding only on |
||
− | living threads |
||
− | ** Putting the Exception in the queue |
||
− | ** Call throwTo/killThread with the Exception |
||
− | |||
− | * The child thread must: |
||
− | ** implement safepoints by checking the queue |
||
− | ** At each 'block' to 'unblock' transition the child ought to add a |
||
− | safepoint |
||
− | |||
− | This machinery closely recovers the queue of exceptions and the |
||
− | semantics described in the paper. When a thread has died the queue is |
||
− | removed from the alive/dead MVar and can be garbage collected, along |
||
− | with any pending exceptions for the dead thread. |
||
− | |||
− | To ensure the above semantics are not violated by the caller, the now |
||
− | fork operation should not return an exposed ThreadId, but should |
||
− | return some proxy type or operation to ensure only the proper |
||
− | throwto/killThread are peformed. |
||
− | |||
− | Next I will demonstrate how the paper's achievement of simple and |
||
− | clear error handling code is lost. |
||
− | |||
− | The child thread is where the workaround becomes a nightmare. It is |
||
− | impossible to test the current 'block' versus 'unblock' state. So the |
||
− | only time the child is certain that a 'block' to 'unblock' transition |
||
− | has occurred is when it is explicit: |
||
− | <haskell> |
||
− | > unblock ( .... block ( ... ) <*> .... ) |
||
− | </haskell> |
||
− | The <*> is obviously such a transition and ought to have a safepoint added. |
||
− | |||
− | But is only because the block had the explicit unblock around it. If |
||
− | there were a function such as |
||
− | <haskell> |
||
− | > atomicUpdateFoo = block (...) <?> |
||
− | </haskell> |
||
− | then there is no way to know if the <?> should be a safepoint. |
||
− | Therefore atomicUpdateFoo needs documentation so that the caller knows |
||
− | to add a safepoint in the case it is called from the unblocked case. |
||
− | |||
− | In a more complex example: |
||
− | <haskell> |
||
− | > atomicUpdateFooBar = block (...) <1?> block (...) <2?> |
||
− | </haskell> |
||
− | the caller cannot add <1?> itself, and must therefore pass in code |
||
− | 'safe :: IO ()' to all such functions: |
||
− | <haskell> |
||
− | > atomicUpdateFooBar safe = block (...) >> trans >> block (...) >> |
||
− | </haskell> |
||
− | trans where 'trans' is "return ()" or "checkChan queue" depending on |
||
− | whether the the caller knows it is in a 'block' or 'unblock' state. |
||
− | |||
− | If the parent never needed to use 'throwTo'/'killThread' then the |
||
− | child never needs 'block'/'unblock' and the nightmare is avoided. But |
||
− | if the child uses any operation which can block such as 'accept' or |
||
− | 'getChar' or 'takeMVar' then the parent cannot deliver the signal |
||
− | without waking up the child with a 'throwTo' which quickly leads to |
||
− | either an overwhelming use of 'catch' in the child (like other |
||
− | languages) or use of 'block' and the fragile workaround I just |
||
− | described. |
||
− | |||
− | As an aside, if there was a "isBlockedThreadState" operation, then it |
||
− | would be possible to create a better client workaround: |
||
− | |||
− | <haskell> |
||
− | checkChan queue x = do |
||
− | isBlocked <- isBlockedThreadState -- perhaps needing =<< myThreadId |
||
− | case isBlocked of |
||
− | True -> return x |
||
− | False -> raiseQueue queue x |
||
− | |||
− | raiseQueue queue x = do |
||
− | hasException <- isEmptyChan queue |
||
− | case hasException of |
||
− | False -> return x |
||
− | True -> readChan >>= raise |
||
− | |||
− | checkedBlock queue io = do x <- block io |
||
− | checkChan queue x |
||
− | |||
− | checkedUnblock queue io = checkChan queue >> io |
||
− | </haskell> |
||
− | |||
− | If that existed then the client should use checkedBlock and |
||
− | checkedUnblock instead of block and unblock. The net effect of this |
||
− | workaround is to implement the queue described in the original paper |
||
− | and wrap the forkIO/throwTo/block/unblock operations with new ones |
||
− | that implement the semantics described in the section 7.2 of the |
||
− | paper. |
||
− | |||
− | An ugly side effect of the above is that the modified throwTo also |
||
− | sent the Exception via throwTo as well as the (T)Chan. So it is |
||
− | possible that Exception could be raised twice depending on how the |
||
− | child is written. And the order of exceptions in the channel may not |
||
− | be the order that the throwTo's deliver the asynchronous exceptions. |
||
− | |||
− | To alleviate this somewhat you could have the modified throwTo put the |
||
− | real exception on the (T)Chan and send a GoLookInTheChan asynchronous |
||
− | exception. Reversing this might be possible, but it is not clear how. |
||
− | If we could deterministicly wait for a pending asynchronous signal |
||
− | then we could write a proper safe point to fix "block to unblock" |
||
− | transitions, which we know we cannot do. |
||
− | |||
− | We can try to implement isBlockedThread state by maintaining a stack |
||
− | of block/unblock operations: |
||
− | |||
− | <haskell> |
||
− | checkedBlock stack queue io = do |
||
− | x <- bracket_ (modifyIORef stack (True:)) |
||
− | (modifyIORef stack tail) |
||
− | (io) |
||
− | checkChan queue x |
||
− | |||
− | checkedUnblock stack queue io = do |
||
− | checkChan queue () |
||
− | x <- bracket_ (modifyIORef stack (False:)) |
||
− | (modifyIORef stack tail) |
||
− | (io) |
||
− | |||
− | isBlockedThread stack = liftM head (readIORef stack) |
||
− | </haskell> |
||
− | |||
− | Where "stack :: IORef [Bool]" is created and passed to the child by |
||
− | the modified fork operation. The 'stack' above grows without bound |
||
− | during recursion and is not tail recursive, so the stack management |
||
− | should be changed to the algorithm in the paper... in which case one |
||
− | really has recreated the paper's implementation inside Haskell. |
||
− | |||
− | The only alternatives are to |
||
− | * write code that may hang forever at a throwTo/killThread |
||
− | * never use throwTo or killThread making block/unblock irrelevant |
||
− | * never use 'block' (and 'unblock') |
||
− | * only write children with explicit "(unblock ... block (... block |
||
− | () ...) >>= raiseQueue queue >>= ...." where the different between |
||
− | checked and unchecked block is obvious and explicit. |
||
− | * Track and propagate the blocked state as in atomicUpdateFooBar. |
||
− | |||
− | This problem with throwTo/block was not obvious until I stated writing |
||
− | small programs to test the implementation in GHC and thus discovered |
||
− | that the paper's description had mislead me. |
||
− | |||
− | The key problem is, at least in the presence of block/unblock, that |
||
− | Exceptions are never reliably delivered. And this is not a |
||
− | theoretical case, but can cause a hang in something as innocuous as |
||
− | "program A" given above. |
Latest revision as of 01:16, 26 April 2021
throwTo & block statements fixed in GHC 6.6.1
The problem which used to be described here has been fixed in GHC HEAD and this will be included in GHC version 6.6.1
I have built (Feb 9, 2007) the HEAD version of GHC and this problem is no longer present.
See the ticket at [1] and the comments near the top of [2].
The description which was here has been deleted, but is still available in the historical views of this page.