Lightweight concurrency
=== Interaction with GC ===

In the vanilla GHC implementation, each capability maintains a '''list of runnable Haskell threads'''. Each generation in the GC also maintains a list of the threads belonging to that generation; at the end of a generational collection, surviving threads are promoted to the next generation. Whenever a new thread is created, it is added to generation 0's thread list.

During a GC, threads are classified into three categories:

* '''Runnable threads:''' Threads that are on the runnable queues. These are considered GC roots.
* '''Reachable threads:''' Threads that are reachable from runnable threads. These threads might be blocked on MVars, STM actions, etc., or might be complete or killed.
* '''Unreachable threads:''' Threads that are not reachable from any root. Unreachable threads might likewise be blocked, complete, or killed.

At the end of a GC, every unreachable thread that is blocked is prepared to receive a <hask>BlockedIndefinitely</hask> exception and is added to its capability's run queue. Note that complete and killed reachable threads survive a collection along with runnable threads, since asynchronous exceptions can still be invoked on them.

In the lightweight concurrency (LWC) implementation, each capability has just a '''single runnable thread'''. Each generation still maintains a list of the threads belonging to it. During a GC, threads are classified only into reachable and unreachable. The RTS knows whether a thread is blocked or complete, since this is made explicit in the [[Lightweight_concurrency#Substrate_primitives|switch]] primitive.

==== Problem 1 - Indefinitely blocked threads ====

In the LWC implementation, since a capability has no runnable queue of threads, how do we raise the <hask>BlockedIndefinitely</hask> exception? We need to distinguish between a thread blocked on an unreachable concurrent data structure and one blocked on an unreachable scheduler.
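The vanilla behaviour described above is observable directly in GHC today: a thread blocked on an MVar that the GC finds reachable only from blocked threads is woken with <hask>BlockedIndefinitelyOnMVar</hask>. A minimal illustration using only the standard <hask>Control.Concurrent</hask> and <hask>Control.Exception</hask> API (not the LWC substrate):

<haskell>
import Control.Concurrent
import Control.Exception

main :: IO ()
main = do
  mv <- newEmptyMVar :: IO (MVar ())
  -- The MVar is reachable only from this blocked thread, so the GC
  -- classifies the thread as indefinitely blocked and resurrects it
  -- with an asynchronous exception.
  r <- try (takeMVar mv)
  case r of
    Left BlockedIndefinitelyOnMVar -> putStrLn "woken with BlockedIndefinitelyOnMVar"
    Right () -> putStrLn "unexpected"
</haskell>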
The programmer makes this distinction explicit through the thread-status argument of the [[Lightweight_concurrency#Substrate_primitives|context switch]] primitives.

===== Blocked on an unreachable concurrent data structure =====

If an MVar is unreachable, the scheduler might still be reachable, and some runnable thread is potentially waiting to pull work off that scheduler. A thread blocked on an unreachable MVar will have thread status <hask>BlockedOnConcDS</hask>. In this case, we can prepare the blocked thread for the asynchronous exception just as in the vanilla implementation. Subsequently, the RTS need only evaluate the blocked thread's unblock action, which will enqueue the blocked thread on its scheduler.

But on which thread do we execute the unblock action? In the LWC implementation, each capability only ever has one thread in its run queue. The solution proposed here is similar to finalizer invocation. We create an array of <hask>IO ()</hask> actions with the following structure:

<haskell>
[unblock_currentThread, unblock_t0, unblock_t1, ..., unblock_tn, switchToNext_currentThread]
</haskell>

where unblock_t0 to unblock_tn correspond to the <hask>unblockThread</hask> upcalls of threads t0 to tn, which are being resurrected with the <hask>BlockedIndefinitelyOnConcDS</hask> exception. unblock_currentThread and switchToNext_currentThread correspond to the <hask>unblockThread</hask> and <hask>switchToNext</hask> upcalls of the (only) thread currently on this capability. Next, we create a helper thread with the following closure, applied to the array constructed previously:

<haskell>
rtsSchedulerBatchIO :: Array (IO ()) -> IO ()
</haskell>

Given an array of <hask>IO ()</hask> actions, <hask>rtsSchedulerBatchIO</hask> performs each IO action one by one. The net effect of executing the new thread is to add the resurrected threads to their corresponding schedulers and to wake up the original thread that was running on this capability.
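In effect, <hask>rtsSchedulerBatchIO</hask> just sequences the array's actions in order. A plausible sketch using <hask>Data.Array</hask> (the <hask>Int</hask> index type is an assumption, as the signature above elides it; the stand-in actions below are for illustration only):

<haskell>
import Data.Array
import Data.IORef

-- Sketch of rtsSchedulerBatchIO: run each queued upcall in order.
-- The Int index type is an assumption; the page's signature omits it.
rtsSchedulerBatchIO :: Array Int (IO ()) -> IO ()
rtsSchedulerBatchIO = sequence_ . elems

main :: IO ()
main = do
  ref <- newIORef []
  let act n = modifyIORef ref (n:)   -- stand-ins for unblock upcalls
  rtsSchedulerBatchIO (listArray (0, 2) [act 1, act 2, act 3])
  order <- readIORef ref
  print (reverse order)              -- the actions ran in array order
</haskell>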
The newly created helper thread inherits the scheduler of the thread that was running on the capability; this is done by copying the upcall handlers. The copy is necessary since the helper thread itself might get blocked due to PTM actions, blackholes, etc.

===== Blocked on an unreachable scheduler =====

This case is a bit tricky. If a thread is blocked on an unreachable scheduler, we need to find a scheduler on which this thread can execute. But which scheduler? The RTS does not know about any other user-level schedulers. We might fall back on vanilla GHC's solution here: prepare the blocked thread for the asynchronous exception and add it to the current capability's queue of threads blocked on a scheduler. At the end of the GC, the RTS first raises the <hask>BlockedIndefinitelyOnScheduler</hask> exception on every thread blocked on a scheduler, and finally switches to the actual computation (the current thread). This solution is not ideal, since it does not eliminate schedulers completely from the RTS.

==== Problem 2 - Detecting deadlock ====

In the vanilla implementation, whenever the RTS finds that there are no runnable threads, a GC is forced, which might release the deadlock: any indefinitely blocked threads will be woken up with asynchronous exceptions. In the LWC implementation, how would the runtime distinguish a scheduler that is actively spinning, looking for more work, from ordinary thread execution? There might be multiple schedulers running on a capability, and no single scheduler might know that all schedulers on the capability are waiting for work. It might not be a good idea to trigger a GC whenever one scheduler runs out of work.

'''Proposal 1'''

Every capability keeps a count of SConts spawned as schedulers and a count of empty schedulers. When these counts become equal, a GC is triggered. If no threads are unblocked by this GC, then we are really deadlocked.
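Proposal 1 amounts to comparing two per-capability counters. A hedged sketch of the check (the names <hask>CapCounters</hask>, <hask>schedulerCount</hask>, and <hask>emptyCount</hask> are invented for illustration; a real implementation would need atomic updates):

<haskell>
import Data.IORef
import Control.Monad (when)

-- Hypothetical per-capability counters for Proposal 1.
data CapCounters = CapCounters
  { schedulerCount :: IORef Int   -- SConts spawned as schedulers
  , emptyCount     :: IORef Int   -- schedulers that found no work
  }

-- Trigger a GC only when every scheduler on the capability is empty.
checkDeadlock :: CapCounters -> IO () -> IO ()
checkDeadlock caps forceGC = do
  s <- readIORef (schedulerCount caps)
  e <- readIORef (emptyCount caps)
  when (s == e && s > 0) forceGC

main :: IO ()
main = do
  caps <- CapCounters <$> newIORef 2 <*> newIORef 2
  checkDeadlock caps (putStrLn "forcing GC: possible deadlock")
</haskell>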
This scheme admits false positives, since a scheduler might be slow to inform the RTS that it has in fact found work. How do we deal with such momentary false positives?

'''Proposal 2'''

Treat the first Haskell thread (proto_thread) created on any capability as a special thread whose only job is to create threads to execute work. It enqueues itself on the scheduler it creates, but none of the threads on that scheduler will switch to the proto_thread unless there is nothing else to switch to. The proto_thread, when resumed, forces a GC. However, this solution assumes that there is a single scheduler data structure at the lowest level per capability.

A capability might not really be deadlocked, since work might be generated by other cores. For example, MVar synchronization might release threads that are then enqueued on schedulers of this capability. Should forcing a GC on deadlock therefore be a global decision across all capabilities?
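As a rough sketch of Proposal 2's idea: the proto_thread runs only when the scheduler has nothing else, so merely being resumed is the deadlock signal. The names <hask>protoThreadStep</hask>, <hask>hasLocalWork</hask>, and <hask>switchToNext</hask> are hypothetical, and <hask>System.Mem.performGC</hask> stands in for the RTS-forced collection:

<haskell>
import System.Mem (performGC)

-- Hypothetical body of the proto_thread: it is switched to only when
-- nothing else is runnable, so being resumed suggests a deadlock.
protoThreadStep :: IO Bool -> IO () -> IO ()
protoThreadStep hasLocalWork switchToNext = do
  busy <- hasLocalWork
  if busy
    then switchToNext   -- someone else can run; hand control back
    else performGC      -- force a GC to resurrect blocked threads

main :: IO ()
main = do
  -- Simulate being resumed with no local work: a collection is forced.
  protoThreadStep (pure False) (putStrLn "unexpected switch")
  putStrLn "proto_thread forced a GC"
</haskell>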