Yhc/RTS/GC

From HaskellWiki
< Yhc‎ | RTS
Revision as of 13:14, 12 May 2006 by TomShackell (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
Part of Yhc

(Download)

The Yhc Garbage collector is based on the Jonkers collector which is a Mark/Sweep/Compacting collector designed to run on systems with limited memory, for more details of the algorithm in general see:

 H. B. M. Jonkers. A fast garbage compaction algorithm. Information Processing Letters, 9(1):25--30, July 1979.

Sadly this paper is a little hard to get hold of now so this page will explain the algorithm in brief and in particular how the Yhc implementation of it works.

The algorithm is divided in 3 phases:

 - Mark     find all the heap nodes that are reachable from the program root.
 - Sweep    go through and calculate the new positions for all the nodes about 
            to be moved and update all the pointers to point to the new 
            positions.
 - Compact  actually move all the heap nodes to their new positions.

Mark

Yhc uses a fairly standard Yhc/RTS/PointerReversal mark phase to find all the heap nodes that are reachable from the program roots (and thus are still alive).

The program roots for Yhc are:

 - For each process stack (see Yhc/RTS/Machine):
      - The stack itself (stacks are heap nodes).
      - All the entries in the stack that are 'stack data' (i.e. not stack          frames).       
      - For each stack frame the vapptr pointer.
 - For each heap global (see Yhc/RTS/Primitive):
      - The contents of the global pointer.
 - For each FInfo node which is reachable from the heap:
      - All the references to heap nodes in the constant table of the FInfo.

The most complicated part is the FInfos. FInfos have constant-tables which contain references to heap nodes and references to other FInfos that the function might call (see Yhc/RTS/Machine).

Thus the constant tables act as another form of program root, however, we only need to save the nodes referenced in a constant table of a FInfo if that FInfo is reachable from some node in the heap. A FInfo is reachable if it is either referenced by a node in the heap or it is referenced from a reachable FInfo.

Thus when a node in the heap is marked, if it is an application to a FInfo then the FInfo is added to end of the list of live FInfos. The first FInfo in this list is stored in the global variable G_firstCaf (the name 'Caf' here is a historical carry over from nhc, it is not only Cafs that are of interest in this process).

The 'link' field in the FInfo (see Yhc/RTS/Machine) is used to link the FInfos in the list together. It also serves as a 'mark bit' for the FInfo; if a FInfo's link field is already set then this FInfo has already been added to the list.

When all the heap nodes reachable from the other (non-FInfo) roots have been marked the FInfo list is traversed. All nodes that are referenced from the constant table are marked and all new FInfos references from the constant table are added to the end of the FInfo list. Thus all reachable FInfos are added to the FInfo list and all heap nodes reachable from those FInfos are marked.