Ministg

Ministg is an interpreter for a high-level, small-step, operational semantics for the STG machine. The STG machine is the abstract machine at the core of GHC. The operational semantics used in Ministg is taken from the paper Making a fast curry: push/enter vs. eval/apply for higher-order languages by Simon Marlow and Simon Peyton Jones (hereafter referred to as the "fast curry" paper). Please consult the paper for a detailed description of the semantic rules.

Ministg follows the rules in the paper very closely, to the point that each evaluation rule in the paper corresponds to an equation in the interpreter (specifically: equations of the function smallStep). This makes is easy to check that the interpreter follows the rules properly, and it also makes is easy to understand how new rules can be added, and what effect they might have.

Motivation

The main motivations for Ministg are twofold:

To illustrate the dynamic behaviour of the STG machine.
To provide a simple testbed for STG extensions.

It is expected that Ministg will be useful to at least two main groups of people:

Those who are working on the STG machine (and GHC's backend in general).
Those who are learning about the implementation of programming languages (especially of the lazy functional kind).

Feature summary

The following is a summary of the main features of Ministg; they are discussed in more detail below:

Support for both the push/enter and eval/apply rules for higher-order function applications.
Optional tracing of program execution, rendered in HTML.
Optional garbage collection.
Optional standard Prelude of useful functions (much like Haskell).
Optional call-stack tracing and stack dumping on errors.
Program annotations for call-stack tracing, similar to GHC's scc annotation for cost centre stacks.
Optional automatic call-stack annoation of top-level declarations in the program, similar to GHC's --auto-all option for profiling.

Questions, feedback, feature requests, bug reports

Please contact Bernie Pope if you have any questions, feedback feature requests, or bug reports for Ministg. I'm particularly keen to hear from anyone who has tried to use it, and would like to know whether it builds and runs as advertised.

Download

You can get the latest version of the code from the git repository hosted on github using this command:

 git clone https://github.com/bjpop/ministg.git

Or you can install it from hackage http://hackage.haskell.org/package/ministg.

Source language

Ministg accepts programs written in the core syntax from the "fast curry" paper. At the moment programs are not type checked. Here is a definition of a program which sums a list of three numbers:

nil = CON(Nil);
zero = CON(I 0);
one = CON(I 1);
two = CON(I 2);
three = CON(I 3);

plusInt = FUN(x y ->
   case x of {
      I i -> case y of {
               I j -> case plus# i j of {
                         x -> let { result = CON (I x) } in result }}});

foldl = FUN(f acc list ->
   case list of {
      Nil -> acc;
      Cons h t -> let { newAcc = THUNK(f acc h) } in foldl f newAcc t
   });

# lazy sum with a well-known space leak
sum = FUN(list -> foldl plusInt zero list);

list1 = CON(Cons one nil);
list2 = CON(Cons two list1);
list3 = CON(Cons three list2);

main = THUNK(sum list3);

Running the above program yields the result (I 6), which is the integer six.

There are several things to note about the syntax:

Layout is explicit like C: semi-colons delimit statements, and braces group blocks (let and case expressions). Semi-colons are also allowed at the end of the last statement in a block, but are not required.
Line comments begin with a hash character and extend to the end of the line.
Whitespace is ignored except (of course) to delimit variable and keyword tokens.
A few primitive operators are provided, such as plus#. They all end in a hash character.
Integer literals represent unboxed integers. Boxed integers (which simulate the Int type of Haskell) must use the I data constructor.

Modules are not supported, so all of the code for your program must be written in one file. However, for convenience, Ministg provides a Prelude of standard functions, which is automatically imported into your program (but it can be disabled if necessary). Variables defined at the top-level of your source program take precedence of variables defined in the Prelude, so you can re-define Prelude functions if you want to. The example above includes many functions which are provided by the Prelude - we include them explicitly in the example for demonstration purposes only (you would normally just use the ones included from the Prelude).

Arity annotations

When Ministg displays program fragments you might notice that some variable names are decorated like so:

foo_?
bar_3

These indicate the arity of applied functions (the number of arguments that the function needs before it is saturated). An underscore followed by a question mark indicates that the arity is not known. An underscore followed by an integer indicates that the arity is known and is the same as the integer. The arities of applied functions are determined in a phase called "arity analysis" which happens just after the program is parsed. You can dump the program with arity annotations by invoking Ministg with the -d arity command line argument. Arity annotations are not supported in the source language.

Command line arguments

Short form	Long form	Meaning
-a	--annotate	automatically annotate the program with stack markers
-s STYLE	--style=STYLE	evaluation STYLE to use (EA = eval apply, PE = push enter)
-t	--trace	record a trace of program evaluation
	--tracedir=DIR	directory (DIR) to store trace files
-m STEPS	--maxsteps=STEPS	maximum number of evaluation STEPS to record in trace
-c	--callstack	enable call stack tracing
	--noprelude	do not import the Prelude
	--nogc	disable garbage collector
-d DUMPED	--dump=DUMPED	output DUMPED for debugging purposes (ast, parsed, arity)
-v	--version	show version number
-h	--help	get help about using this program

Basic operation

Ministg accepts the command line arguments outlined above. In all cases except when the -h and -v flags are given, ministg requires that the last argument be the name of an input file. For example, if file.stg is the name of a source file in the current working directory, then Ministg can be invoked like so:

ministg file.stg

This will cause Ministg to read the file and, if it is syntactically valid, execute it. If the program successfully terminates, Ministg will print its final value and end.

Execution tracing

One of the most useful features of Ministg is its ability to trace the steps of program execution. If tracing is enabled, Ministg will save each step of execution to a HTML file. Each such file contains the entire state of the STG machine at that point in the computation: namely the code, stack and heap. Here is an example trace. The files are linked together allowing you to step forwards and backwards through the execution.

Tracing is turned off by default. To enable tracing invoke Ministg like so:

ministg -t file.stg

By default the trace files will be saved in a directory called trace, in the current working directory. You can specify an alternative directory to save the trace files using the --tracedir argument. If the trace directory does not exist then Ministg will attempt to create the directory before executing the input program.

The trace files are named stepN.html, where N is a non-negative integer corresponding to the number of evaluation rules which have been applied before the machine entered the state contained in the file. The file step0.html always shows the initial state of the machine.

Due to the fact that Ministg implements a small-step semantics, even short computations can yield a large number of trace files. To avoid filling up your disk Ministg sets an upper bound on the number of trace files it will produce. By default the limit is 1000, but you can change it with the -m command line argument, for example:

ministg -t -m 12 file.stg

will set the maximum number of trace files to 12.

Evaluation style: push/enter versus eval/apply

The "fast curry" paper provides two different sets of rules for the operational semantics, which differ in the way that applications of higher-order functions are handled. Ministg supports both sets of rules. By default it will use the push/enter rules, but you can specify which rule set to use with the -s command line argument.

For example, to use the eval/apply rules, invoke Ministg like so:

ministg -s ea file.stg

or equivalently:

ministg --style=ea file.stg

Assuming there are no bugs in the implementation of Ministg, both rule sets should produce the same final answers for all valid input programs, including non-termination. Obviously they can lead to different execution traces in programs which use higher-order function application.

Runtime errors

Ministg adds an additional kind of heap object to those proposed in the "fast curry" paper. The extra object is called ERROR, and it represents an exceptional value. For convenience Prelude.stg provides a variable binding called error which can be used in your code to simulate a fatal error in execution (just like the error function in Haskell but without the string argument). A program which evaluates ERROR will terminate with a message which says Exception raised!. If the program contains call-stack annotations (described below) then Ministg will also display a call-stack dump (assuming the stack is not empty) like so:

Exception raised!

Call stack:
plusInt
foldl
foldl
sum
main

indicating that the error happened in a call to plusInt which was called by foldl and so on down to main.

Call-stack tracing

One of the goals of Ministg is to provide a simple testbed for extensions to the STG machine. One such extension is call-stack tracing. The idea is to provide a kind of "lexical" call stack which mimics the kind of call stack used in an eager language. Lazy evaluation means that the evaluation stack used by the STG machine is not so useful for diagnosing uncaught exceptions. The main issue is that the evaluation context in which thunks are created can be different to the evaluation context in which thunks are evaluated. The same problem does not occur in eager languages because the construction of (saturated) function applications and their evaluation happens atomically, that is, in the same evaluation context.

Ministg provides a way to add call-stack annotations to programs in a similar way that cost-centre stack annotations can be added to Haskell programs compiled by the GHC compiler. In many ways the call-stack tracing feature of Ministg is just a modification of the cost-centre profiling ideas in the GHC compiler. To understand cost-centre profiling please consult these references:

Execution Proﬁling for Non-strict Functional Languages, Patrick Samson's PhD thesis, 1994.
Profiling large-scale lazy functional programs, by Morgan and Jarvis, Journal of Functional Programming, Volume 8 , Issue 3, 1998.

Stack annotations

Ministg adds an extra kind of expression to the language of programs described in the "fast curry" paper:

stack "annotation" expression

stack is a keyword which takes two arguments:

A double-quoted string annotation. It can be any string, but usually it is the same as a variable name in the program.
An expression to be annotated.

Ignoring call-stack tracing, an annotated expression has the same meaning with the annotation removed. That is, the annotations do not affect the denotation of the program. The annotations do affect the operational semantics however, and Ministg contains an extra small-step evaluation rule to handle them.

A programmer can add stack annotations by hand, but it is often more convenient to have them automatically inserted before the program is executed, similar to the --auto-all option provided by GHC for profiling.

Automatic program annotation

Ministg can automatically add stack annotations to all the top-level definitions in a program, including those in the Prelude. Annotations are added to top-level thunks and functions using the variable name of the thing defined as the string annotation. Data constructors are not annotated. Automatic annotation is enabled via the -a command line argument.

For example, suppose your program contains these two top-level definitions:

fac = FUN(x -> 
         case eqInt x zero of {
            True -> one;
            False -> let { s = THUNK(subInt x one);
                           rec = THUNK(fac s) }
                           in multInt x rec
         });

main = THUNK (fac seven)

After automatic annotation the same definitions will look like this:

fac = FUN(x -> stack "fac"
          case eqInt x zero of {
             True -> one;
             False -> let { s = THUNK(subInt_2 x one);
                            rec = THUNK(fac_1 s) } 
                            in multInt x rec
            });

main = THUNK(stack "main" fac seven)

Viewing the call-stack in the trace files

By default Ministg does not show the call-stack in the trace files produced by execution tracing, however they can be made visible by invoking Ministg with the -c command line argument.

Garbage collection

For convenience Ministg provides garbage collection of heap bindings. This is especially useful for execution tracing because it keeps the heap as small as possible, which makes it easier to view. It also means that Prelude functions which are not used by your program are not kept in the heap. Thus the Prelude can have lots of definitions in it, but execution tracing only shows the ones in current use by your program; so you are not penalised for having an expressive Prelude! The garbage collector is run between every small-step of execution - thus there is not expectation that it will be fast. You can turn the garbage collector off by invoking Ministg with the --nogc command line argument.