Personal tools

GHC/Concurrency/Flaws

From HaskellWiki

< GHC | Concurrency(Difference between revisions)
Jump to: navigation, search
(Mark this as having been fixed)
(Fixed -- removed problem description)
 
Line 3: Line 3:
 
= throwTo & block statements fixed in GHC 6.6.1 =
 
= throwTo & block statements fixed in GHC 6.6.1 =
  
The problem described below has been fixed in GHC HEAD and this will be included in GHC version 6.6.1
+
The problem which used to be described here has been fixed in GHC HEAD and this will be included in GHC version 6.6.1
 +
 
 +
I have built (Feb 9, 2007) the HEAD version of GHC and this problem is no longer present.
  
 
See the ticket at [http://hackage.haskell.org/trac/ghc/ticket/1047] and the comments near the top of [http://darcs.haskell.org/ghc/rts/Exception.cmm].
 
See the ticket at [http://hackage.haskell.org/trac/ghc/ticket/1047] and the comments near the top of [http://darcs.haskell.org/ghc/rts/Exception.cmm].
  
The description below will be removed once 6.6.1 is released.
+
The description which was here has been deleted, but is still available in the historical views of this page.
 
+
=== '''Obsolete:''' "throwTo & block statements considered harmful" ===
+
 
+
The following description is by ChrisKuklewicz.
+
 
+
This is a short essay to prove that the current GHC concurrency
+
implementation has a critical flaw.
+
 
+
The key problem is, at least in the presence of block/unblock, that
+
Exceptions are never reliably delivered.  And this is not a
+
theoretical case, but can cause a hang in something as innocuous as
+
"program A" given below.
+
 
+
Simon Marlow wrote:
+
<pre>
+
>> I think people are misunderstanding the nature of a safepoint. The
+
>> safepoint is a point at which you are prepared to have exceptions
+
>> delivered. This does not mean that they *will* be delivered, just that they
+
>> can. If you need to *wait* for an asynchronous exception, then you
+
>> shouldn't be using them at all.
+
>
+
> Right.  If a thread mostly runs inside 'block' with the occasional safe
+
> point, then your exceptions are not really asynchronous, they're synchronous.
+
>
+
>
+
> In this case, I'd say a better solution is to have an explicit event queue,
+
> and instead of the safe point take an event from the queue.  The action on
+
> receiving an event can be to raise an exception, if necessary.
+
>
+
> Cheers, Simon
+
</pre>
+
 
+
The implementation of asynchronous signals, as described by the paper
+
"Asynchronous exceptions in Haskell by Simon Marlow, Simon Peyton Jones, Andy Moran and John Reppy, PLDI'01."
+
is fatally inconsistent with the implementation in GHC 6.4 and GHC 6.6 today.
+
 
+
The implemented semantics have strictly weaker guarantees and render programs
+
using asynchronous expressions impossible to write correctly.  The semantics in
+
the paper were carefully designed to solve the problem laid out in the first
+
sentence of the abstract:
+
 
+
"Asynchronous exceptions, such as timeouts, are important for robust, modular
+
programs, but are extremely difficult to program with -- so much so that most
+
programming languages either heavily restrict them or ban them altogether."
+
 
+
And I believe the paper succeeded.  The paper shows how to replace other
+
languages pervasive and intrusive error catching and handling code with much
+
cleaner, clearer, and often more correct code.
+
 
+
The implementation in GHC has changed the behavior of throwTo from asynchronous
+
(not-interruptible) to synchronous (interruptible?) as discussed in section 8 of
+
the paper.  This change, in and of itself, is not the fatal problem; as
+
described in the paper a (forkIO (throwTo ...)) recovers the asynchronous behavior.
+
 
+
The fatal change between the paper and GHC comes from not following section 7.2
+
as published.  Section "7.2 Implementation of throwTo" has two bullet point, and
+
the second bullet point is (retyped, so typos are my own fault):
+
 
+
"As soon as a thread exits the scope of a 'block', and at regular intervals
+
during execution inside 'unblock', it should check its queue of pending
+
exceptions.  If the queue is non-empty, the first exception from the queue
+
should be raised."
+
 
+
A test of GHC 6.6 shows that this is not the case.  Test program A:
+
<haskell>
+
> loop = block (print "alive") >> loop
+
>
+
> main = do tid <- forkIO loop
+
>          threadDelay 1
+
>          killThread tid
+
</haskell>
+
Program A, compiled with (-threaded) on a single CPU machine never halts.  It
+
will print "alive" forever while the the main thread is blocked on "killThread".
+
 
+
As an aside, removing the threadDelay causes killThread to destroy the child
+
before it can enter the block, thus showing the need to add "forkBlockedIO" or
+
"forkInheritIO" to the library.  This can be worked around using an MVar.
+
 
+
Changing the definition of loop produces Test program B:
+
<haskell>
+
> loop = block (print "alive") >> loop >> yield
+
>
+
> main = do tid <- forkIO loop
+
>          threadDelay 1
+
>          killThread tid
+
</haskell>
+
This prints "alive" twice before the killThread succeeds.
+
 
+
The paper demands that when the loop in Program A exits the scope of "block
+
(print a)" that it check a queue of pending exceptions, see that it is
+
non-empty, and raise the exception thrown by killThread.  This can also be seen
+
in "Figure 5. Transition Rules for Asynchronous Exceptions", where killThread
+
should use throwTo to create an in-flight exception and exiting the scope of
+
block in the presence of this in-flight exception should raise the exception.
+
 
+
The implementation in GHC sleeps the main thread at the killThread command, and
+
it is awoken when the block is exited and to succeed in delivering the
+
exception it must execute while the child is still in the unblocked state.  But
+
the child re-enters a blocked state too quickly in Program A, so killThread
+
never succeeds.  The change in Program B has the child "yield" when unblocked
+
and this gives the main thread a change to succeed.
+
 
+
This trick using yield to make a safepoint was suggested by Simon Marlow:
+
<pre>
+
> The window in 'unblock (return ())' is tiny, I'm not really surprised if
+
> nothing ever gets through it.  You might have more luck with 'unblock yield'.
+
</pre>
+
It has been said on this mailing list thread that needing "yield" to
+
program concurrently is a bug in the user's code and should be
+
replaced by other mechanisms.  In a complex program with many threads
+
even the yield trick may not be enough to program reliably.  Using
+
 
+
<haskell>
+
blockY io = do x <- block io
+
              yield
+
              return x
+
 
+
unblockY io = unblock (yield >> io)
+
</haskell>
+
 
+
instead of 'block' and 'unblock' is at least a simple but unreliable workaround.
+
 
+
This would not be a fatal flaw if there were a simple reliable
+
workaround.  The best workaround is the following, which is anything
+
but simple, and amount to re-implementing the mechanisms in the paper:
+
 
+
  * When forking a thread:
+
  ** Create an (Chan Exception) (or (TChan Exception)) and pass it to
+
    the child thread.  Call this the queue.
+
  ** Create a ((T)MVar queue) to indicate whether the thread is still
+
    alive or not and wrap the whole child action in a 'finally' to
+
    ensure this is set to the dead state when the child exits.
+
    (Alive iff the queue is in the MVar)
+
  ** To be sure the child thread is in the handler, use another MVar
+
    or re-use the "alive/dead" MVar to ensure the child is running
+
    before returning the ThreadId from the fork operation.
+
 
+
  * Replace all uses throwTo/killThread by
+
  ** Test the alive/dead state of the thread, proceeding only on
+
    living threads
+
  ** Putting the Exception in the queue
+
  ** Call throwTo/killThread with the Exception
+
 
+
  * The child thread must:
+
  ** implement safepoints by checking the queue
+
  ** At each 'block' to 'unblock' transition the child ought to add a
+
    safepoint
+
 
+
This machinery closely recovers the queue of exceptions and the
+
semantics described in the paper.  When a thread has died the queue is
+
removed from the alive/dead MVar and can be garbage collected, along
+
with any pending exceptions for the dead thread.
+
 
+
To ensure the above semantics are not violated by the caller, the now
+
fork operation should not return an exposed ThreadId, but should
+
return some proxy type or operation to ensure only the proper
+
throwto/killThread are peformed.
+
 
+
Next I will demonstrate how the paper's achievement of simple and
+
clear error handling code is lost.
+
 
+
The child thread is where the workaround becomes a nightmare.  It is
+
impossible to test the current 'block' versus 'unblock' state.  So the
+
only time the child is certain that a 'block' to 'unblock' transition
+
has occurred is when it is explicit:
+
<haskell>
+
> unblock ( .... block ( ... ) <*> .... )
+
</haskell>
+
The <*> is obviously such a transition and ought to have a safepoint added.
+
 
+
But is only because the block had the explicit unblock around it.  If
+
there were a function such as
+
<haskell>
+
> atomicUpdateFoo = block (...) <?>
+
</haskell>
+
then there is no way to know if the <?> should be a safepoint.
+
Therefore atomicUpdateFoo needs documentation so that the caller knows
+
to add a safepoint in the case it is called from the unblocked case.
+
 
+
In a more complex example:
+
<haskell>
+
> atomicUpdateFooBar = block (...) <1?> block (...) <2?>
+
</haskell>
+
the caller cannot add <1?> itself, and must therefore pass in code
+
'safe :: IO ()' to all such functions:
+
<haskell>
+
> atomicUpdateFooBar safe = block (...) >> trans >> block (...) >>
+
</haskell>
+
trans where 'trans' is "return ()" or "checkChan queue" depending on
+
whether the the caller knows it is in a 'block' or 'unblock' state.
+
 
+
If the parent never needed to use 'throwTo'/'killThread' then the
+
child never needs 'block'/'unblock' and the nightmare is avoided.  But
+
if the child uses any operation which can block such as 'accept' or
+
'getChar' or 'takeMVar' then the parent cannot deliver the signal
+
without waking up the child with a 'throwTo' which quickly leads to
+
either an overwhelming use of 'catch' in the child (like other
+
languages) or use of 'block' and the fragile workaround I just
+
described.
+
 
+
As an aside, if there was a "isBlockedThreadState" operation, then it
+
would be possible to create a better client workaround:
+
 
+
<haskell>
+
checkChan queue x = do
+
  isBlocked <- isBlockedThreadState -- perhaps needing =<< myThreadId
+
  case isBlocked of
+
    True -> return x
+
    False -> raiseQueue queue x
+
 
+
raiseQueue queue x = do
+
  hasException <- isEmptyChan queue
+
  case hasException of
+
    False -> return x
+
    True -> readChan >>= raise
+
 
+
checkedBlock queue io = do x <- block io
+
                          checkChan queue x
+
 
+
checkedUnblock queue io = checkChan queue >> io
+
</haskell>
+
 
+
If that existed then the client should use checkedBlock and
+
checkedUnblock instead of block and unblock.  The net effect of this
+
workaround is to implement the queue described in the original paper
+
and wrap the forkIO/throwTo/block/unblock operations with new ones
+
that implement the semantics described in the section 7.2 of the
+
paper.
+
 
+
An ugly side effect of the above is that the modified throwTo also
+
sent the Exception via throwTo as well as the (T)Chan.  So it is
+
possible that Exception could be raised twice depending on how the
+
child is written.  And the order of exceptions in the channel may not
+
be the order that the throwTo's deliver the asynchronous exceptions.
+
 
+
To alleviate this somewhat you could have the modified throwTo put the
+
real exception on the (T)Chan and send a GoLookInTheChan asynchronous
+
exception.  Reversing this might be possible, but it is not clear how.
+
If we could deterministicly wait for a pending asynchronous signal
+
then we could write a proper safe point to fix "block to unblock"
+
transitions, which we know we cannot do.
+
 
+
We can try to implement isBlockedThread state by maintaining a stack
+
of block/unblock operations:
+
 
+
<haskell>
+
checkedBlock stack queue io = do
+
x <- bracket_ (modifyIORef stack (True:))
+
              (modifyIORef stack tail)
+
              (io)
+
checkChan queue x
+
 
+
checkedUnblock stack queue io = do
+
checkChan queue ()
+
x <- bracket_ (modifyIORef stack (False:))
+
              (modifyIORef stack tail)
+
              (io)
+
 
+
isBlockedThread stack = liftM head (readIORef stack)
+
</haskell>
+
 
+
Where "stack :: IORef [Bool]" is created and passed to the child by
+
the modified fork operation.  The 'stack' above grows without bound
+
during recursion and is not tail recursive, so the stack management
+
should be changed to the algorithm in the paper... in which case one
+
really has recreated the paper's implementation inside Haskell.
+
 
+
The only alternatives are to
+
  * Write code that may hang forever at a throwTo/killThread
+
  * Never use throwTo or killThread making block/unblock irrelevant
+
  * Never use 'block' (and 'unblock')
+
  * Only write children with explicit "(unblock ... block (... block
+
    () ...) >>= raiseQueue queue >>= ...." where the different between
+
    checked and unchecked block is obvious and explicit.
+
  * Track and propagate the blocked state as in atomicUpdateFooBar.
+
 
+
This problem with throwTo/block was not obvious until I stated writing
+
small programs to test the implementation in GHC and thus discovered
+
that the paper's description had mislead me.
+
 
+
The key problem is, at least in the presence of block/unblock, that
+
Exceptions are never reliably delivered.  And this is not a
+
theoretical case, but can cause a hang in something as innocuous as
+
"program A" given above.
+

Latest revision as of 17:07, 9 February 2007


[edit] throwTo & block statements fixed in GHC 6.6.1

The problem which used to be described here has been fixed in GHC HEAD and this will be included in GHC version 6.6.1

I have built (Feb 9, 2007) the HEAD version of GHC and this problem is no longer present.

See the ticket at [1] and the comments near the top of [2].

The description which was here has been deleted, but is still available in the historical views of this page.