Yhc/RTS/Primitive

From HaskellWiki

Primitives

Some parts of the standard Haskell Prelude and libraries cannot be implemented directly in Haskell and instead have to be implemented in the interpreter itself. These are the "primitive functions".

Compiling primitive functions

An example of a primitive function is the primFloatSin function, declared in Haskell (definition from src/packages/yhc-base-1.0/Prelude.hs) as:

 primFloatSin primitive 1 :: Float -> Float

This declares that, from the C side, it is a primitive taking one argument and, from the Haskell side, it is a function from Float to Float.

The Yhc compiler compiles this into the code:

 PRIMITIVE FUN Main;primFloatSin 
   {
     NEED_HEAP_32
     PUSH_ARG_0	
     EVAL		
     NEED_HEAP_32
     POP_1		
     PRIMITIVE		
     RETURN_EVAL	
     END_CODE		
   }
 ---- ConstTable ---------------
 0 PRIM Main;primFloatSin
 -------------------------------

This code ensures that the argument is fully evaluated and then executes the 'PRIMITIVE' instruction, which performs the primitive operation.

Loading primitive functions

When the interpreter loads, it creates an internal module called "_Builtin" through which all the primitive functions are accessed (see primitive.c in src/runtime/BCKernel). Primitive functions are then referenced from other modules in a similar way to normal functions (see Yhc/RTS/Modules).

The PRIMITIVE instruction

The interpreter executes the PRIMITIVE instruction by using constant table item 0. This constant table item always contains a pointer to an XInfo structure, which represents the primitive function to be evaluated.

The XInfo structure is largely similar to the FInfo structure except instead of having a codeptr and constant table it has the field:

 CFunction        func;       

where a CFunction is defined as:

 typedef Node*(*CFunction)(Node* ap);

A CFunction is thus a pointer to a C function that takes the node currently under evaluation and returns the node to be put on top of the stack (for the RETURN_EVAL).
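The dispatch described above can be illustrated with a minimal, self-contained sketch. It is not the Yhc source: the Node and XInfo layouts, the arity and payload fields, and the names prim_increment, exec_primitive and run_increment are all hypothetical stand-ins chosen for illustration. Only the shape of the CFunction typedef is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the runtime types.
   The real Node and XInfo structures carry more fields. */
typedef struct Node Node;
typedef Node* (*CFunction)(Node* ap);   /* same shape as the typedef above */

typedef struct {
  int       arity;   /* assumed field: number of arguments */
  CFunction func;    /* the C function to call for this primitive */
} XInfo;

struct Node {
  XInfo* info;       /* assumed: what constant table item 0 resolves to */
  Node*  args[1];    /* one argument slot is enough for this sketch */
  int    payload;    /* assumed: stands in for a boxed value */
};

/* A toy primitive: the result's payload is the argument's payload plus one.
   A static node is used so the sketch needs no allocator. */
static Node result_node;
static Node* prim_increment(Node* ap) {
  result_node.info = NULL;
  result_node.args[0] = NULL;
  result_node.payload = ap->args[0]->payload + 1;
  return &result_node;
}

/* What a PRIMITIVE-style step might do: call through the function pointer
   and hand back the node that would be placed on top of the stack. */
static Node* exec_primitive(Node* ap) {
  assert(ap != NULL && ap->info != NULL && ap->info->func != NULL);
  return ap->info->func(ap);
}

/* Convenience helper: builds an application node and runs the primitive. */
static int run_increment(int x) {
  XInfo info = { 1, prim_increment };
  Node  arg  = { NULL, { NULL }, x };
  Node  app  = { &info, { &arg }, 0 };
  return exec_primitive(&app)->payload;
}
```

Calling run_increment(41) goes through the function pointer stored in the info structure, in the same way the text describes the interpreter going through the 'func' field.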

The Wrapper

The 'func' field of the XInfo structure points to a wrapper which acts as an interface between the world of Haskell and the world of C.

The wrapper for the primFloatSin function is called _primFloatSin and is defined in src/runtime/BCKernel/builtin/Prelude.c:

 Node* _primFloatSin(Node* node){ 
   FloatNode* in = (FloatNode*)node->args[0]; 
   REMOVE_IND(in, FloatNode*); 
   return make_float((Float)sin(in->value)); 
 }

(In the actual code a macro is used; the expanded version is shown here.)

The wrapper marshals the arguments from Haskell datatypes into C datatypes, and converts the C result into a Haskell heap node.
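The marshalling steps can be tried out in isolation with the following sketch. It makes several assumptions that are not confirmed by the text: that REMOVE_IND follows chains of indirection nodes, that a node can be modelled with a tag plus one argument slot, and that malloc can stand in for the Haskell heap. All names (Tag, new_node, remove_ind, prim_float_double_sketch, apply_double_through_ind) are hypothetical, and a trivial doubling operation is used in place of sin so that nothing beyond the C standard library is needed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node tags; the real runtime encodes this differently. */
typedef enum { TAG_APP, TAG_IND, TAG_FLOAT } Tag;

typedef struct Node Node;
struct Node {
  Tag   tag;
  Node* args[1];   /* TAG_APP: the argument; TAG_IND: the target node */
  float value;     /* TAG_FLOAT: the unboxed C value */
};

/* malloc stands in for the Haskell heap in this sketch. */
static Node* new_node(Tag tag, Node* arg, float value) {
  Node* n = malloc(sizeof *n);
  assert(n != NULL);
  n->tag = tag;
  n->args[0] = arg;
  n->value = value;
  return n;
}

/* Assumed behaviour of REMOVE_IND: skip over indirection nodes. */
static Node* remove_ind(Node* n) {
  while (n->tag == TAG_IND) n = n->args[0];
  return n;
}

/* Wrapper in the style of _primFloatSin, with doubling instead of sin. */
static Node* prim_float_double_sketch(Node* node) {
  Node* in = remove_ind(node->args[0]);      /* Haskell node -> C value */
  assert(in->tag == TAG_FLOAT);
  float result = in->value * 2.0f;           /* the actual C computation */
  return new_node(TAG_FLOAT, NULL, result);  /* C value -> Haskell node */
}

/* Builds application -> indirection -> float and runs the wrapper. */
static float apply_double_through_ind(float x) {
  Node* boxed = new_node(TAG_FLOAT, NULL, x);
  Node* ind   = new_node(TAG_IND, boxed, 0.0f);
  Node* app   = new_node(TAG_APP, ind, 0.0f);
  Node* out   = prim_float_double_sketch(app);
  float r = out->value;
  free(out); free(app); free(ind); free(boxed);
  return r;
}
```

The helper apply_double_through_ind deliberately places an indirection between the application node and the boxed float, so the wrapper has to skip it before reading the value.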

Global references

Primitive functions can allocate Haskell heap memory using heap_alloc. This introduces a complication, because heap_alloc might trigger a garbage collection. The primitive C code might be holding pointers into the Haskell heap, and a garbage collection is likely to move heap nodes around. The C code might also be holding the only reference to a heap node.

It is therefore essential that the garbage collector knows about C variables holding pointers to Haskell heap nodes. Since there is no way to ask C which variables it has, the primitive C functions must register the heap references they hold.

Fortunately it is not necessary to register every variable in which a C function holds a pointer to a heap node, only those that are live after a heap_alloc.

For example, the function make_just, defined in src/runtime/BCKernel/make.c, is equivalent to the Haskell function:

 make_just :: a -> Maybe a
 make_just x = Just x

The C code for make_just is:

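 Node* make_just(Node* value){
   Node* ret;
   /* register 'value' with the garbage collector: it is still needed after heap_alloc */
   Global gValue = { NULL, &value };
   heap_pushGlobal(&gValue);
   ret = (Node*)heap_alloc(wordsof(Node)+1);   /* might trigger a garbage collection */
   heap_popGlobal();                           /* unregister 'value' again */
   MAKE_NODE(ret, G_infoJust, N_NORMAL);
   INIT_HATNODE(ret, NULL);
   ret->args[0] = value;
   return ret;
 }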

Here gValue holds the reference to the 'value' argument. The variable 'value' is still needed after the heap_alloc so it must be registered with the garbage collector. heap_pushGlobal registers the global with the garbage collector and thus marks 'value' as being another program root. heap_popGlobal unregisters the variable.

Note that whilst ret is a pointer into the Haskell heap, it is not registered. It only holds a useful value after the heap_alloc call, and no further allocation follows, so the node it points to cannot be moved by the garbage collector while make_just is still using it.

One important point is that the list of heap globals is maintained as a stack, so the variables must be popped in the reverse of the order in which they were pushed.
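To make the effect of registration concrete, here is a small self-contained sketch of a stack of registered globals working with a toy moving collector. It is only an illustration under stated assumptions: the Global field names (next, ref) are guesses based on the two-field initialiser in make_just, the two-space collector relocates only the nodes that registered globals point at and does not trace children, and every allocation forces a collection so that the behaviour is deterministic. None of the function names below are from the Yhc runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified heap node: the real Node layout differs. */
typedef struct Node { int tag; int value; struct Node* arg; } Node;

/* Shaped after the { NULL, &value } initialiser; field names are guesses. */
typedef struct Global { struct Global* next; Node** ref; } Global;

enum { SPACE_SIZE = 8 };
static Node    space_a[SPACE_SIZE], space_b[SPACE_SIZE];
static Node*   from_space = space_a;  /* space currently holding live nodes */
static Node*   to_space   = space_b;  /* space the next collection copies into */
static size_t  used       = 0;
static Global* roots      = NULL;     /* top of the stack of registered globals */

static void push_global(Global* g) { g->next = roots; roots = g; }

/* Always removes the most recently pushed entry, hence the LIFO rule. */
static void pop_global(void) { assert(roots != NULL); roots = roots->next; }

/* Toy moving collector: relocates only the nodes that registered globals
   point at (no tracing of children), then poisons the old space. */
static void collect(void) {
  size_t copied = 0;
  for (Global* g = roots; g != NULL; g = g->next) {
    assert(copied < SPACE_SIZE);
    to_space[copied] = **g->ref;     /* copy the node to its new home */
    *g->ref = &to_space[copied++];   /* update the registered C variable */
  }
  for (size_t i = 0; i < SPACE_SIZE; i++) from_space[i].value = -1;
  Node* tmp = from_space; from_space = to_space; to_space = tmp;
  used = copied;
}

/* Collects on every call so that the effect is always visible. */
static Node* alloc_node(void) {
  collect();
  assert(used < SPACE_SIZE);
  return &from_space[used++];
}

static Node* make_int(int v) {
  Node* n = alloc_node();
  n->tag = 0; n->value = v; n->arg = NULL;
  return n;
}

/* Same registration pattern as make_just. */
static Node* make_just_sketch(Node* value) {
  Global gValue = { NULL, &value };
  push_global(&gValue);
  Node* ret = alloc_node();          /* moves the node 'value' points at */
  pop_global();
  ret->tag = 1; ret->value = 0; ret->arg = value;
  return ret;
}

static void reset_heap(void) {
  from_space = space_a; to_space = space_b; used = 0; roots = NULL;
}

/* Registered pointer: returns the payload seen through the Just node. */
static int payload_with_registration(void) {
  reset_heap();
  return make_just_sketch(make_int(42))->arg->value;
}

/* Unregistered pointer: returns whatever the stale pointer now sees. */
static int payload_without_registration(void) {
  reset_heap();
  Node* n = make_int(42);
  (void)alloc_node();                /* collection happens; n is not updated */
  return n->value;
}
```

In this sketch payload_with_registration() yields 42, because the collector rewrites the registered variable when it moves the node, while payload_without_registration() yields the poison value -1, because the unregistered pointer still refers to the old location.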

As well as being used to register variables in primitive functions, heap_pushGlobal is also used to register the few permanent references the interpreter has into the heap. For example, the interpreter has the references G_nodeTrue and G_nodeFalse, which refer to the Haskell values True and False. These are registered in primitive.c and are used by the interpreter in, for example, bytecode instructions that calculate Boolean values.