Yhc/RTS/Heap

From HaskellWiki
< Yhc‎ | RTS
Jump to navigation Jump to search
Part of Yhc

(Download)

The Yhc heap is a garbage collected area of memory in which the running program introduces heap nodes. In this document I will try to use 'heap' to exclusively mean this garbage collected part of the heap. Any other memory used by Yhc will be refered to as 'memory'.

Heap Node structure

The only kind of structure in the Yhc heap are heap nodes. The structure of heap nodes is defined in src/runtime/BCKernel/node.h

Every heap node is layed out in memory as

+----------+----------+ . . +----------+
|  header  |  arg 0   |     |  arg n   |
+----------+----------+ . . +----------+

Where each block is exactly one machine word. The header is a bit packed structure with layout

n                     2    0 
+---------------------+----+
|  info ptr           | fl | 
+---------------------+----+

The node flags (fl) occupy bits 0 and 1 of the header, with the rest taken by the info ptr.

Node flags

To understand the meaning of the node flags it's necessary to consider the life cycle of a heap node. There are two types of heap nodes 'constructors'(CON) and 'applications'. Applications are either partial applications (PAP) or thunks awaiting evaluation (THUNK). Constructors are simple data items created by the program.

Constructors have a very simple life cycle, they are introduced into the heap and eventually they are no longer needed and are garbage collected. Applications are more complicated:

  • When an application is introduced into the heap it is either a partial application or a thunk.
  • If it is a partial application it may be applied to new arguments to give a new heap node with the additional arguments. If it is a thunk it may be evaluated.
  • When a thunk is evaluated it becomes a black hole. The purpose of this is cycle detection: if a black hole is evaluated then the application must be dependent on its own value and thus the program wouldn't terminate.
  • After evaluation a thunk is updated with an indirection to its result (either a constructor or another application).
  • Eventually the garbage collector will run and remove the indirections removing the application from the heap.

The node flags of a node reflect this life cycle: there are four possible values for the 2 flag bits, indicating the different states of the node.

  • 0 0 IND - This node is an indirection, 'info ptr' points to the result.
  • 0 1 NORMAL - This is a normal node, 'info ptr' points to the info table.
  • 1 0 GC - This flag has special meaning to the garbage collector
  • 1 1 HOLE - This is a heap node which is currently under evaluation, 'info ptr' points to info table.

Ignoring the GC flag, a constructor always has 'NORMAL' flag and an application goes from NORMAL -> HOLE -> IND.

Node Info

Every heap node which is a 'NORMAL' or a 'HOLE' has an info pointer. This pointer points to information (stored in memory, not the heap) about the heap node (representing by the structure Info in node.h). Most importantly it is the Info structure which identifies the type of the heap node: CON, THUNK or PAP.

Concurrency slightly complicates the issue here because for nodes which are "blocked" the info pointer may point to a list of processes chained together (with the original info at the end of the chain). These are the processes which are waiting on the result of the node being evaluated.

The Info structure for a Node is defined as.

struct _Info {
  HUInt    tag;
};

Tag is a half-word indicating what type of info this is, there are four possible values:

  • CINFO information about a constructor (node is a CON)
  • PINFO information about an application (node is THUNK/PAP)
  • FINFO information about a function (no node will have this info directly, it'll be a PAP instead)
  • XINFO information about an external function (again, no node will have this info directly)

If the tag of an 'info ptr' is a CINFO that the info ptr actually points to a CInfo structure, if the tag is PINFO it points to a PInfo structure, etc.

CInfo

The CInfo structure has the following layout:

struct CInfo {
  HUInt        tag;     /* 'inherited' from Info */
  HUInt        size;    /* number of arguments to the constructor */
  Char*        name;    /* name of the constructor */
  HUInt        number;  /* tag number: e.g. 0 for False, 1 for True, etc. */
  HUInt        flags;   /* flags controlling GC */
};

flags is a combination of properties about the constructor which aid garbage collection:

  • CI_NO_PTRS the arguments of the constructor do not contain pointers (so don't follow them in GC).
  • CI_INTEGER this is actually an integer (and thus variably sized).
  • CI_ARRAY this is actually an array (and thus also variably sized).
  • CI_FOREIGN_PTR this is a foreign ptr (and so requires special treatment in the 'mark' phase).

PInfo

Information about an application to some function

struct PInfo {
  HUInt           tag;   /* inherited from Info */
  QUInt           size;  /* the no. of arguments does this application has */
  QUInt           need;  /* the no. of arguments this application is missing */
};
  

Which function this is an application to is not encoded directly in the PInfo structure and is instead stored by laying out things careful in memory. The information about a function (both PInfo and FInfo structures) is always layed out as:

+-----------------+
| PInfo   0 / n   |            size 0, need n
+-----------------+
| PInfo   1 / n-1 |            size 1, need n-1
+-----------------+
        ...
+-----------------+
| PInfo   n / 0   |            size n-1, need 0
+-----------------+
| FInfo   ....    |            function info table
.                 .
.                 .

This way the corresponding FInfo for a given PInfo is given by

(FInfo*)(pinfo + pinfo->need + 1)

this is calculated by the PINFO_FINFO macro in node.h

FInfo

Information about a function: no application will have an info pointer directly to a FInfo, instead it'll point to a PInfo from which the FInfo can be found.

struct FInfo {
   HUInt        tag;        /* inherited from Info */
   PInfo*       papTable;   /* pointer back to the first item in the
                               paptable for this function */
   FInfo*       link;       /* used by the GC */
   HUInt        arity;      /* function arity */
   HUInt        stack;      /* unused - formely function stack usage */
   Char*        name;       /* function name */
   CodePtr      code;       /* pointer to the bytecode instructions for
                               this function */
   HUInt        numConsts;  /* number of constants in the constant table */
   UByte*       constTypes; /* pointer to a table of constant table
                               type information */
   ConstItem*   constTable; /* constant table constants */
};     

Every function has a constant table which holds references to global objects that can be referenced by a function. For example a function such as:

f = g (1 : [])

would need a reference to the FInfo for 'g', as well as the CInfo for ':' and the heap node that represents '[]'. These references are stored in the constant table.

There are two types of object in the constant table: references to nodes in the heap (C_NODE) and references to other Info structures (C_INFO). The types of each constant are given by the bytes of 'constTypes': each byte gives the type of consequetive constant table items.

The constTable itself is an array of ConstItems which are pointers to either heap nodes or Info structures (obviously depending on the constant type).

NOTE: it is by recursively scanning the constant table of a functions that the module loading system knows what entries to load from the bytecode file (see Yhc/RTS/Modules)

XInfo

XInfo is very similar to a FInfo except it refers to an external (i.e. primitive or FFI) function. The only difference is that instead of a code pointer and constant table it has a C function pointer to the external function. The function is assumed to be of the form:

Node* function(Node*);

Where the function takes the application node (including arguments) and returns the application node of the result (see Yhc/RTS/Primitive and Yhc/RTS/Foreign).

TInfo

TInfo is an info structure for a thread, this is not a real Info structure per-se but it is used to store blocked process lists see Yhc/RTS/Concurrency.