STG in Javascript (HaskellWiki, revision of 2008-03-31T21:25:45Z by Vir)
<hr />
<div>[[Category:How to]]<br />
<br />
''Note (Aug 27, 2007)'': This page was started about a year ago. Over time, the focus was changed to integration with Yhc Core, and the work in progress may be observed here: [[Yhc/Javascript]].<br />
<br />
''Disclaimer'': Here are my working notes related to an experiment to execute Haskell programs in a web browser. You may find them bizarre, and even nonsensical. Don't hesitate to discuss them (please use the [[Talk:STG in Javascript]] page). Chances are, at some point a working implementation will be produced.<br />
<br />
The [http://www.squarefree.com/shell/shell.html Javascript Shell] is of great help for this experiment.<br />
<br />
----<br />
<br />
== Aug 22, 2006 ==<br />
<br />
Several people expressed interest in the matter, e. g.: [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017286.html], [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017287.html]. <br />
<br />
A Wiki page [[Hajax]] has been recently created, which summarizes the achievements in the related fields. By these experiments, I am trying to address the problem of Javascript generation out of a Haskell source.<br />
<br />
To achieve this, an existing Haskell compiler, namely [http://haskell.org/nhc98/ nhc98], is being patched to add a Javascript generation facility out of an STG tree: the original compiler generates bytecodes from the same source.<br />
<br />
After unsuccessfully trying several approaches (e. g. Javascript closures, see [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Functions#Nested_functions_and_closures]), it has been decided to implement an STG machine (as described in [http://citeseer.ist.psu.edu/peytonjones92implementing.html]) in Javascript.<br />
<br />
The paper referenced above describes how to implement an STG machine in assembly language (or C). The Javascript implementation uses the same ideas, but takes advantage of the automatic memory management provided by the Javascript runtime, and also of its built-in handling of values more complex than just numbers and arrays of bytes.<br />
<br />
To describe a thunk, a Javascript object of the following structure may be used:<br />
<br />
<pre><br />
thunk = {<br />
_c:function(){ ... }, // code to evaluate a thunk<br />
_1:..., // argument 1<br />
_2:...,<br />
_N:... // argument n<br />
};<br />
</pre><br />
<br />
So, similarly to what is described in the STG paper, the ''_c'' method is used to evaluate a thunk. This method may also self-update the thunk, replacing itself (i. e. ''this._c'') with something else, and returns the result once it becomes known (i. e. at the very end of thunk evaluation).<br />
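<br />
A minimal sketch of such a self-updating thunk, assuming the ''_c''/''_1'' layout shown above. The names ''makeThunk'' and ''evalCount'' are hypothetical and serve only to make the single evaluation visible:<br />
<br />
```javascript
// Hypothetical helper: builds a thunk that computes compute(arg) at most once.
var evalCount = 0; // counts how many times the real computation runs

function makeThunk(compute, arg) {
  return {
    _1: arg,
    _c: function () {
      evalCount++;
      var result = compute(this._1);
      // Self-update: replace the evaluation code with one that
      // just returns the already computed result.
      this._c = function () { return result; };
      this._1 = undefined; // the argument is no longer needed
      return result;
    }
  };
}

var t = makeThunk(function (x) { return x * 2; }, 21);
var first = t._c();  // runs the computation
var second = t._c(); // served by the updated thunk
```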
<br />
Some interesting things may be done by manipulating prototypes of Javascript built-in classes.<br />
<br />
Consider this (Javascript shell log pasted below):<br />
<br />
<pre><br />
<br />
Number.prototype.c=function(){return this};<br />
function(){return this}<br />
(1).c()<br />
1<br />
(2).c()<br />
2<br />
(-999).c()<br />
-999<br />
1<br />
1<br />
2<br />
2<br />
999<br />
999<br />
<br />
</pre><br />
<br />
Thus, simple numeric values are given thunk behavior: calling the ''c'' method on them returns their value as if a thunk were evaluated, and at the same time they may be used in the regular way when passed to Javascript functions outside the Haskell runtime (e. g. DOM manipulation functions).<br />
<br />
A similar trick can be done on Strings and Arrays: for these, the ''c'' method will return the head value (i. e. ''String.charAt(0)'') CONS'ed with the remainder of the String/Array.<br />
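<br />
The string case might look roughly like the sketch below. The representation of a CONS cell as a plain object with ''head'' and ''tail'' fields, the use of ''null'' for the empty list, and the helper ''toArray'' are all assumptions made for illustration; the notes do not fix any of them:<br />
<br />
```javascript
// Assumption: a CONS cell is a plain object {head, tail}, and the
// empty list is the value null. Neither choice comes from the notes.
String.prototype.c = function () {
  if (this.length === 0) {
    return null; // the empty string evaluates to the empty list
  }
  // Head character CONS'ed with the (still unevaluated) rest of the string.
  return { head: this.charAt(0), tail: this.slice(1) };
};

// Walk a string by repeatedly calling c() on the tail.
function toArray(s) {
  var out = [];
  for (var cell = s.c(); cell !== null; cell = cell.tail.c()) {
    out.push(cell.head);
  }
  return out;
}

var cell = "abc".c();
var chars = toArray("abc");
```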
<br />
== Aug 23, 2006 ==<br />
<br />
The first thing to do is to learn how to call primitives. In Javascript, primitives mostly cover built-in arithmetic and the interface to the [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:Math Math] object. Primitives need all their arguments evaluated before they are called, and usually return strict values. So there is no need to build a thunk each time a primitive is called.<br />
<br />
At the moment, the following Haskell code:<br />
<br />
<pre><br />
f :: Int -> Int -> Int<br />
<br />
f a b = (a + b) * (a - b)<br />
<br />
g = f 1 2<br />
</pre><br />
<br />
compiles into (part of the Javascript below was inserted manually):<br />
<br />
<pre><br />
var HMain = {m:"HMain"};<br />
<br />
Number.prototype._c=function(){return this;};<br />
<br />
// Compiled code starts<br />
<br />
HMain.f_T=function(v164,v165){return {_c:HMain.f_C,<br />
_w:"9:1-9:24",<br />
_1:v164,<br />
_2:v165};};<br />
HMain.f_C=function(){<br />
return ((((this._1)._c())+((this._2)._c()))._c())*<br />
((((this._1)._c())-((this._2)._c()))._c());<br />
};<br />
<br />
HMain.g_T=function(){return {_c:HMain.g_C,_w:"11:1-11:9"};};<br />
HMain.g_C=function(){<br />
return HMain.f_T(1,2); // NB should be HMain.f_T(1,2)._c()<br />
};<br />
<br />
// Compiled code ends<br />
<br />
print(HMain.f_T(3,4)._c());<br />
<br />
print(HMain.g_T()._c()._c());<br />
</pre><br />
<br />
<br />
When running, the script produces:<br />
<br />
<pre><br />
Running...<br />
-7<br />
-3<br />
</pre><br />
<br />
So, for each Haskell function, two Javascript functions are created: one creates a thunk when called with arguments (so it is good for saturated calls), the other is the thunk's evaluation function. The latter will be passed around when dealing with partial applications (which will likely involve a special sort of thunk, but we haven't got down to this yet).<br />
<br />
Note that the ''_c()'' method is applied twice to the output of ''HMain.g_T'': ''HMain.g_C'' calls ''f_T'', which returns an unevaluated thunk, and ''g_C'' returns that thunk without evaluating it, so we need to force the evaluation once more to get the final result.<br />
<br />
'''NB''': indeed, the thunk evaluation function for ''HMain.g'' should evaluate the thunk created by ''HMain.f_T''. Laziness will not be lost because ''HMain.g_C'' will not be executed until needed.<br />
<br />
== Sep 12, 2006 ==<br />
<br />
To simplify handling of partial function applications, the thunk format has been changed so that instead of ''_1'', ''_2'', etc. for function arguments, an array named ''_a'' is used. This array always has at least one element, which is ''undefined''. Arguments start at array index 1, so argument ''n'' is accessed as ''this._a[n]''.<br />
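<br />
As an illustration, the function ''f'' from the Aug 23 entry could be written by hand in the new format as follows. This is a sketch, not compiler output; the short names ''f_T'' and ''f_C'' are used here only for brevity:<br />
<br />
```javascript
Number.prototype._c = function () { return this; };

// f a b = (a + b) * (a - b), with arguments kept in the _a array.
var f_T = function (a, b) {
  return { _c: f_C, _a: [undefined, a, b] }; // index 0 is always undefined
};
var f_C = function () {
  var a = this._a[1]._c(); // argument 1
  var b = this._a[2]._c(); // argument 2
  return (a + b) * (a - b);
};

var result = f_T(3, 4)._c();
```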
<br />
For Haskell programs executing in a web browser environment, the analogue of FFI is calling external Javascript functions.<br />
Imagine this Javascript function, which prints its argument on the window status line:<br />
<br />
<pre><br />
// Output an integer value into the window status line<br />
<br />
putStatus = function (i) {window.status = i; return i;};<br />
</pre><br />
<br />
To import such a function into a Haskell program, the following FFI declaration is to be used:<br />
<br />
<pre><br />
foreign import ccall "putStatus" putStatus :: Int -> Int<br />
</pre><br />
<br />
Note the type signature: of course it should be somewhat monadic, but for the moment, nothing has been done to support monads, so this signature is only good for testing purposes.<br />
<br />
The current NHC98-based implementation compiles the above FFI declaration into this:<br />
<br />
<pre><br />
Test2.putStatus_T=function(_1){return {_c:Test2.putStatus_C, _w:"7:1-7:56", <br />
_a:[undefined, _1]};};<br />
Test2.putStatus_C=function(){<br />
return (putStatus)((this._a[1])._c());<br />
};<br />
</pre><br />
<br />
Note that like a primitive, a foreign function evaluates all its arguments before it starts executing.<br />
<br />
A test page illustrating this can be found at:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.html<br />
<br />
When this page is loaded, the window status line should display "456" while the rest of the page remains blank. <br />
The Haskell source for this test page is:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.hs<br />
<br />
== Sep 19, 2006 ==<br />
<br />
Initially, functions compiled from Haskell to Javascript were represented as members of objects (one object per Haskell module). Anticipating some complications with multilevel module hierarchy, and also with functions whose names contain special characters, it has been decided to pass every function identifier through the ''fixStr'' function: in ''nhc98'' it replaces non-alphanumeric characters with their numeric code prefixed with an underscore. So a typical function definition such as:<br />
<br />
<pre><br />
p3 :: Int -> Int -> Int -> Int<br />
p3 a b c = (a + b) * c<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46p3_T=function(v210, v211, v212){return {_c:Test3_46p3_C, <br />
_w:"15:1-15:22", <br />
_a:[undefined, <br />
v210, v211, v212]};};<br />
var Test3_46p3_C=function(){<br />
return (((((this._a[1])._c())+((this._a[2])._c()))._c())*<br />
((this._a[3])._c()))._c();<br />
};<br />
</pre><br />
<br />
Note the function name: ''Test3_46p3_T''; in previous examples it would have been something like ''Test3.p3_T''.<br />
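<br />
The renaming can be imitated in a few lines of Javascript. The function ''mangle'' below is a hypothetical re-implementation written from the description above; the real ''fixStr'' lives inside ''nhc98'' and may differ in details:<br />
<br />
```javascript
// Hypothetical re-implementation of the mangling described above:
// every non-alphanumeric character becomes "_" followed by its
// decimal character code.
function mangle(name) {
  return name.replace(/[^A-Za-z0-9]/g, function (ch) {
    return "_" + ch.charCodeAt(0);
  });
}

var m1 = mangle("Test3.p3");             // "." has code 46
var m2 = mangle("NHC.Internal._apply1"); // "_" has code 95
```

The ''_T'' and ''_C'' suffixes seen in the generated code are appended after this step.<br />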
<br />
Partial function applications need a different thunk format. This kind of thunk holds the function to be applied once the application is saturated (i. e. the number of arguments becomes equal to the function's arity), the number of remaining arguments, and an array of the arguments supplied so far.<br />
<br />
Thus, for a function:<br />
<br />
<pre><br />
w = p3 1<br />
</pre><br />
<br />
the resulting Javascript is:<br />
<br />
<pre><br />
var Test3_46w_T=function(){return {_c:Test3_46w_C, _w:"17:1-17:8", <br />
_a:[undefined]};};<br />
var Test3_46w_C=function(){<br />
return ({_c:function(){return this;}, _s:Test3_46p3_T, _x:2, _a:[1]})._c();<br />
};<br />
</pre><br />
<br />
Such a thunk always evaluates to itself (''_c()''); it holds the function name in its ''_s'' member, number of remaining arguments in its ''_x'' member, and available arguments in its ''_a'' member, only in this case the array does not have ''undefined'' as its zeroth element.<br />
<br />
An application of such a function (''w'') to additional arguments:<br />
<br />
<pre><br />
z = w 2 3<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46z_T=function(){return {_c:Test3_46z_C, _w:"23:1-23:9", <br />
_a:[undefined]};};<br />
var Test3_46z_C=function(){<br />
return (HSRuntime_46doApply((Test3_46w_T())._c(), [2, 3]))._c();<br />
};<br />
</pre><br />
<br />
So, when such an expression is being computed, a special Runtime support function is called, which obtains the partial application thunk via evaluation of its first argument (''(Test3_46w_T())._c()''), and adds the arguments provided (''[2, 3]'') to the list of arguments available so far. If the number of arguments becomes equal to the target function's arity, a normal function application thunk is returned, otherwise another partial application thunk is returned. The Runtime support function looks like this:<br />
<br />
<pre><br />
var HSRuntime_46doApply = function (thunk, targs){<br />
thunk._a = thunk._a.concat (targs);<br />
thunk._x = thunk._x - targs.length;<br />
if (thunk._x > 0) {<br />
return thunk;<br />
} else {<br />
return thunk._s.apply (null, thunk._a);<br />
}<br />
};<br />
</pre><br />
<br />
Note the use of the ''apply'' method. It may be used also with functions that are not methods of some object. The first argument (''this_arg'') may be ''null'' or ''undefined'' as it will not be used by the function applied to the arguments.<br />
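<br />
One thing to watch in ''HSRuntime_46doApply'' as written: it modifies ''_a'' and ''_x'' of the thunk it receives, so a partial application thunk that is shared and applied twice would accumulate arguments. The generated code above avoids this because ''Test3_46w_C'' builds a fresh object on every call. A sketch of a variant that copies instead is given below; the name ''doApplyCopy'' and the shortened names ''p3_T'', ''p3_C'' and ''w'' are hypothetical:<br />
<br />
```javascript
// Hypothetical non-mutating variant of HSRuntime_46doApply.
// Like the original, it does not handle over-saturated applications.
var doApplyCopy = function (thunk, targs) {
  var args = thunk._a.concat(targs); // concat returns a new array
  var remaining = thunk._x - targs.length;
  if (remaining > 0) {
    // Still unsaturated: return a new partial application thunk.
    return { _c: function () { return this; },
             _s: thunk._s, _x: remaining, _a: args };
  }
  // Saturated: build the ordinary thunk for the target function.
  return thunk._s.apply(null, args);
};

Number.prototype._c = function () { return this; };

// p3 a b c = (a + b) * c, in the thunk format used above.
var p3_T = function (a, b, c) {
  return { _c: p3_C, _a: [undefined, a, b, c] };
};
var p3_C = function () {
  return (this._a[1]._c() + this._a[2]._c()) * this._a[3]._c();
};

// w = p3 1, shared between two applications.
var w = { _c: function () { return this; }, _s: p3_T, _x: 2, _a: [1] };
var r1 = doApplyCopy(w, [2, 3])._c(); // (1 + 2) * 3
var r2 = doApplyCopy(w, [4, 5])._c(); // (1 + 4) * 5
```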
<br />
''NHC98'' acts differently when a partial application is not defined as a separate function, but is part of another expression.<br />
<br />
First, some Haskell definitions:<br />
<br />
<pre><br />
z :: Int -> Int<br />
<br />
z = (3 +)<br />
<br />
p :: Int -> Int -> Int<br />
<br />
p = (+)<br />
</pre><br />
<br />
compile into:<br />
<br />
<pre><br />
var Test4_46z_T=function(){return {_c:Test4_46z_C, _w:"9:1-9:8", <br />
_a:[undefined]};};<br />
var Test4_46z_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA181_T, _x:1, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA181_T=function(v178){return {_c:LAMBDA181_C, _w:"9:8", <br />
_a:[undefined, v178]};};<br />
var LAMBDA181_C=function(){<br />
return (((3)._c())+((this._a[1])._c()))._c();<br />
};<br />
<br />
var Test4_46p_T=function(){return {_c:Test4_46p_C, _w:"13:1-13:6", <br />
_a:[undefined]};};<br />
var Test4_46p_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA182_T, _x:2, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA182_T=function(v179, v180){return {_c:LAMBDA182_C, <br />
_w:"13:6", <br />
_a:[undefined, v179, v180]};};<br />
var LAMBDA182_C=function(){<br />
return (((this._a[1])._c())+((this._a[2])._c()))._c();<br />
};<br />
</pre><br />
<br />
Now, when these functions (''p'', ''z'') are used:<br />
<br />
<pre><br />
t4main = putStatus (z (p 6 8)) -- see above for putStatus<br />
</pre><br />
<br />
the generated Javascript is:<br />
<br />
<pre><br />
var Test4_46t4main_T=function(){return {_c:Test4_46t4main_C, <br />
_w:"17:1-17:28", <br />
_a:[undefined]};};<br />
var Test4_46t4main_C=function(){<br />
return (Test4_46putStatus_T(<br />
NHC_46Internal_46_95apply1_T(<br />
Test4_46z_T(), <br />
NHC_46Internal_46_95apply2_T(<br />
Test4_46p_T(), 6, 8)<br />
)))._c();<br />
};<br />
</pre><br />
<br />
For each application of ''p'' and ''z'', an internal function ''NHC_46Internal_46_95apply'''''N'''''_T'' is called, where '''N''' depends on the target function arity. In the Javascript implementation, all these functions are in fact one function (because in Javascript it is possible to determine the number of arguments a function was called with, so there is no need for separate functions for each arity). The internal function extracts its first argument and evaluates it (by calling the ''_c()'' method), getting a partial application thunk. Then, the Runtime support function ''HSRuntime_46doApply'' is called with the thunk and the arguments array:<br />
<br />
<pre><br />
var NHC_46Internal_46_95apply1_T = function() {return __apply__(arguments);};<br />
var NHC_46Internal_46_95apply2_T = function() {return __apply__(arguments);};<br />
...<br />
var __apply__ = function (args) {<br />
var i, targs = new Array();<br />
var thunk = args[0]._c();<br />
for (i = 1; i < args.length; i++) {<br />
targs [i - 1] = args [i];<br />
}<br />
return HSRuntime_46doApply (thunk, targs);<br />
};<br />
</pre><br />
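<br />
To see the pieces work together, here is a hand-written, self-contained sketch that evaluates ''z (p 6 8)'' without the ''putStatus'' call. The short names (''doApply'', ''applyN'', ''pap'', ''lam181_T'' and so on) are hypothetical stand-ins for the generated identifiers, and ''Array.prototype.slice'' replaces the copying loop:<br />
<br />
```javascript
Number.prototype._c = function () { return this; };

var doApply = function (thunk, targs) { // same logic as HSRuntime_46doApply
  thunk._a = thunk._a.concat(targs);
  thunk._x = thunk._x - targs.length;
  return thunk._x > 0 ? thunk : thunk._s.apply(null, thunk._a);
};

// One function serves every arity: 'arguments' holds whatever was passed.
var applyN = function () {
  var thunk = arguments[0]._c();                        // force the function
  var targs = Array.prototype.slice.call(arguments, 1); // remaining arguments
  return doApply(thunk, targs);
};

var pap = function (target, arity) { // hypothetical helper: empty PAP thunk
  return { _c: function () { return this; }, _s: target, _x: arity, _a: [] };
};

// z = (3 +) and p = (+), hand-written in the style of the compiled code.
var lam181_T = function (v) { return { _c: lam181_C, _a: [undefined, v] }; };
var lam181_C = function () { return 3 + this._a[1]._c(); };
var lam182_T = function (a, b) { return { _c: lam182_C, _a: [undefined, a, b] }; };
var lam182_C = function () { return this._a[1]._c() + this._a[2]._c(); };
var z_T = function () { return { _c: function () { return pap(lam181_T, 1); } }; };
var p_T = function () { return { _c: function () { return pap(lam182_T, 2); } }; };

// z (p 6 8)
var result = applyN(z_T(), applyN(p_T(), 6, 8))._c();
```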
<br />
''Note by Dimitry'': Just for clarity, Dimitry's part ends here, and Vir's part starts.<br />
<br />
== Aug 25, 2007 ==<br />
Here's my attempt. I'm going to implement a Haskell to JavaScript compiler based on the STG machine. This turned out to be not such an easy task, so I'd be happy to get some feedback.<br />
<br />
This is an example translation of some Haskell functions to JavaScript. I'm trying to be descriptive, but if I'm not, please ask me or write your suggestions. I'm not quite sure if this code is really correct.<br />
<br />
<pre><br />
// Example of Haskell to JavaScript translation<br />
//<br />
// PAP - Partial Application<br />
// every object (heap object in STG) is called closure here<br />
// the terms closure and function are used interchangeably here<br />
//<br />
<br />
<br />
////////////////////////////////////////////////////////////////<br />
// Run-time system:<br />
<br />
var closure; // current entered closure<br />
var args; // arguments<br />
var RCons; // Constructor tag, constructors set this tag to some value<br />
var RVal; // Some returned value<br />
<br />
Number.prototype.arity = 0;<br />
Number.prototype.code = function () {<br />
RVal = closure;<br />
args = null;<br />
return null;<br />
}<br />
<br />
String.prototype.arity = 0;<br />
String.prototype.code = function ()<br />
{<br />
if (closure.length == 0) {<br />
args = null;<br />
closure = Nil;<br />
return apply;<br />
}<br />
<br />
args = new Array (2);<br />
args[0] = new Number (closure.charCodeAt (0));<br />
args[1] = closure.slice (1, closure.length);<br />
closure = Cons;<br />
return apply;<br />
}<br />
<br />
// mini interpreter is used to implement tail calls<br />
// to jump to some function, we don't call it, but<br />
// return its address instead<br />
function save_continuation_and_run (function_to_run)<br />
{<br />
while (function_to_run != null)<br />
function_to_run = function_to_run ();<br />
}<br />
<br />
// calling convention<br />
// function is pointed by a [closure] global variable<br />
// arguments are in [args] array<br />
function apply ()<br />
{<br />
var f = closure;<br />
var nargs = 0<br />
if (args != null)<br />
nargs = args.length;<br />
<br />
if (f.arity == nargs)<br />
return f.code;<br />
<br />
if (nargs == 0) {<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
// We CAN'T call a function, so we must build a PAP and call continuation!!!<br />
if (f.arity > nargs) {<br />
var supplied_args = args;<br />
args = null;<br />
var pap = {<br />
arity : f.arity - nargs,<br />
code : function () {<br />
var new_args = args;<br />
args = supplied_args<br />
supplied_args = null;<br />
<br />
// not working, type information is lost... :(<br />
//args.push (new_args);<br />
<br />
for (var i = nargs; i < f.arity; i++)<br />
args[i] = new_args[i - nargs];<br />
new_args = null;<br />
closure = f;<br />
return apply;<br />
}<br />
}<br />
<br />
closure = pap;<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
<br />
// f.arity < nargs<br />
<br />
var remaining_args = args.slice (f.arity, nargs);<br />
args.length = f.arity;<br />
<br />
save_continuation_and_run (f.code)<br />
<br />
// closure now points to some new function, we'll try to call it<br />
args = remaining_args;<br />
return apply;<br />
}<br />
<br />
// Updates are invoked essentially like the apply function:<br />
// an updatable thunk pushes a continuation and runs as usual;<br />
// when the continuation is activated, it replaces the closure with the value<br />
// and after that returns to the next continuation<br />
function update()<br />
{<br />
var f = closure;<br />
<br />
save_continuation_and_run (f.realcode);<br />
<br />
f.RCons = RCons;<br />
f.RVal = RVal;<br />
f.args = args;<br />
f.code = updated_code;<br />
f.realcode = null;<br />
return null;<br />
}<br />
<br />
// code of an already updated closure (installed by update above)<br />
function updated_code ()<br />
{<br />
RCons = closure.RCons;<br />
RVal = closure.RVal;<br />
args = closure.args;<br />
return null;<br />
}<br />
<br />
////////////////////////////////////////////////////////////////////<br />
// Examples: STG -> JS<br />
/* add = \a b -> case a of {a -> case b of {b -> primOp + a b}} */<br />
<br />
add = {<br />
arity: 2,<br />
code: function () {<br />
var a = args[0];<br />
var b = args[1];<br />
closure = a;<br />
args = null;<br />
save_continuation_and_run (apply);<br />
var a = RVal;<br />
closure = b;<br />
args = null;<br />
save_continuation_and_run (apply);<br />
var b = RVal;<br />
RVal = a + b;<br />
args = null;<br />
return null;<br />
}<br />
}<br />
<br />
<br />
/*<br />
compose = \f g x -><br />
let gx = g x<br />
in f gx<br />
*/<br />
compose = {<br />
arity: 3,<br />
code: function () {<br />
var f = args[0];<br />
var g = args[1];<br />
var x = args[2];<br />
var gx = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = g;<br />
args = new Array (1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
args = new Array (1);<br />
closure = f;<br />
args[0] = gx;<br />
return apply;<br />
}<br />
}<br />
<br />
ConsTag = 3;<br />
Cons = {<br />
arity : 2,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Nil<br />
RCons = ConsTag;<br />
<br />
// We must return to continuation, arguments are returned in args array<br />
return null;<br />
}<br />
}<br />
<br />
NilTag = 2;<br />
Nil = {<br />
arity : 0,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Cons<br />
RCons = NilTag;<br />
<br />
// We must return to continuation<br />
return null;<br />
}<br />
}<br />
<br />
/*<br />
map = \f xs-><br />
case xs of {<br />
Cons x xs -><br />
let fx = f x<br />
in let mapfxs = map f xs<br />
in Cons fx mapfxs<br />
; Nil -> Nil<br />
}<br />
*/<br />
map = {<br />
arity: 2,<br />
code : function () {<br />
var f = args[0];<br />
var xs = args[1];<br />
//push continuation and enter xs<br />
closure = xs;<br />
args = null;<br />
save_continuation_and_run (xs.code)<br />
switch (RCons) {<br />
case ConsTag:<br />
{<br />
var x = args[0];<br />
var xs = args[1];<br />
var fx = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = f;<br />
args = new Array(1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
var mapfxs = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = map;<br />
args = new Array(2);<br />
args[0] = f;<br />
args[1] = xs;<br />
return apply;<br />
}<br />
}<br />
closure = Cons;<br />
args = new Array(2);<br />
args[0] = fx;<br />
args[1] = mapfxs;<br />
return apply;<br />
}<br />
break;<br />
case NilTag:<br />
closure = Nil;<br />
args = null;<br />
return Nil.code;<br />
break;<br />
}<br />
}<br />
}<br />
<br />
inc3 = {<br />
arity: 0,<br />
code: function () {<br />
args = new Array (1);<br />
args[0] = 3;<br />
closure = add;<br />
return apply;<br />
}<br />
}<br />
<br />
</pre><br />
<br />
<br />
----<br />
<br />
Victor Nazarov<br />
<br />
asviraspossible@gmail.com<br />
<br />
== Aug 29, 2007 ==<br />
<br />
The code from the previous section was updated. Here are some tests I've used to debug this code:<br />
<br />
<pre><br />
args = null;<br />
closure = 1013;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
args = new Array(2);<br />
args[0] = 7;<br />
args[1] = 6;<br />
closure = add;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
args = new Array(1);<br />
closure = inc3;<br />
args[0] = new Number(21);<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
closure = "123";<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
</pre><br />
<br />
The result of this test is the following:<br />
<pre><br />
1013 // Means that JS numbers work as closures using prototype trick<br />
13 // Simple function calls are working<br />
24 // Not so simple calls are working: PAP is properly built and used<br />
3 // Cons - list constructor<br />
3 // Cons - list constructor<br />
3 // Cons - list constructor<br />
2 // Nil - list constructor<br />
</pre><br />
<br />
The last 4 lines show that Javascript strings work properly using the prototype trick. We can observe the structure of the "123" object: Cons 1 (Cons 2 (Cons 3 Nil))<br />
<br />
== Sept 4, 2007 ==<br />
<br />
I've got some feedback from Edward Kmett. Edward claims that simple trampolining as used above is less efficient than Appel's trampoline. Trampolining is a trick to simulate tail calls. Simon Peyton Jones used the same technique as I did in my code (and Dimitry did too). The technique is simple: to call a continuation, return it. A mini-interpreter is used for trampolining on the stack:<br />
<br />
<pre><br />
while (f=f())<br />
;<br />
</pre><br />
<br />
It is efficient in Simon's version because of tweaks specific to the GNU C compiler (so it is not portable). But they are not available in JavaScript.<br />
<br />
Using the same interpreter in JavaScript, I have to return a function to simulate a tail call, and call the interpreter again to simulate a normal call. This seems very inefficient, and Edward claims it is.<br />
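<br />
A small standalone sketch of this simple trampolining, independent of the run-time system above. The names ''run'', ''sumTo'' and ''total'' are hypothetical:<br />
<br />
```javascript
// Mini-interpreter: keep calling whatever the previous step returned.
function run(f) {
  while (f) {
    f = f();
  }
}

var total = 0;

// sumTo(n) adds n + (n-1) + ... + 1 to total, one step per bounce.
// Instead of calling itself (a tail call), each step returns the next one.
function sumTo(n) {
  return function () {
    if (n === 0) {
      return null; // nothing left to run
    }
    total += n;
    return sumTo(n - 1);
  };
}

run(sumTo(100000)); // 100000 steps, constant JavaScript stack depth
```

The cost mentioned above is visible here: every step allocates a function and goes back through the loop in ''run''.<br />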
<br />
The trick is to transform the program to Continuation Passing Style. With this transformation we need no stack at all, so every call is a tail call. We can get rid of the interpreter and just call functions as usual in JavaScript. We can use a counter to count stack frames (function entries), and when we reach the limit, we don't call the continuation directly, but register it as a callback on a timer event (and thus flush the stack). So we make longer jumps on the stack, and some people claim this is more efficient. Moreover, we get a framework for introducing parallel threads: a stack jump becomes a quantum for the thread.<br />
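<br />
A rough sketch of this idea follows. All names (''MAX_DEPTH'', ''callK'', ''countDown'', ''pending'') are hypothetical. To keep the sketch runnable outside a browser, a plain array stands in for the timer queue; in a browser the continuation would be registered with ''setTimeout'' instead:<br />
<br />
```javascript
var MAX_DEPTH = 1000; // hypothetical limit on nested calls
var depth = 0;        // frames entered since the last flush
var pending = [];     // stands in for the browser's timer queue
var flushes = 0;      // how many times the stack was flushed

// Call continuation k with value v, flushing the stack when it gets deep.
function callK(k, v) {
  if (++depth < MAX_DEPTH) {
    return k(v); // ordinary (nested) call
  }
  depth = 0;
  flushes++;
  pending.push(function () { k(v); }); // browser: setTimeout(..., 0)
}

// CPS loop: count n down to 0, then pass acc to the final continuation.
function countDown(n, acc, k) {
  if (n === 0) {
    return callK(k, acc);
  }
  return callK(function (a) { countDown(n - 1, a, k); }, acc + 1);
}

var answer = null;
countDown(50000, 0, function (v) { answer = v; });

// Drain the stand-in timer queue; a browser's event loop would do this.
while (pending.length > 0) {
  pending.shift()();
}
```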
<br />
I'd like to thank Edward for his ideas, and to explore them further.<br />
<br />
== Apr 1, 2008 ==<br />
Not an April joke, just links:<br />
Dmitry Golubovsky's and YHC Team's Javascript backend and Haskell Web Toolkit:<br />
http://yhc06.blogspot.com/2008/03/yhcjavascript-backend.html<br />
<br />
Victor Nazarov's GHC backend: http://vir.mskhug.ru/</div>
IO inside (HaskellWiki, revision of 2007-09-06T12:11:27Z by Vir)
<hr />
<div>Haskell I/O has always been a source of confusion and surprises for new Haskellers. While simple I/O code in Haskell looks very similar to its equivalents in imperative languages, attempts to write somewhat more complex code often result in a total mess. This is because Haskell I/O is really very different internally. Haskell is a pure language and even the I/O system can't break this purity.<br />
<br />
The following text is an attempt to explain the details of Haskell I/O implementations. This explanation should help you eventually master all the smart I/O tricks. Moreover, I've added a detailed explanation of various traps you might encounter along the way. After reading this text, you will receive a "Master of Haskell I/O" degree that is equal to a Bachelor in Computer Science and Mathematics, simultaneously :)<br />
<br />
If you are new to Haskell I/O you may prefer to start by reading the [[Introduction to IO]] page.<br />
<br />
<br />
== Haskell is a pure language ==<br />
<br />
Haskell is a pure language, which means that the result of any function call is fully determined by its arguments. Pseudo-functions like rand() or getchar() in C, which return different results on each call, are simply impossible to write in Haskell. Moreover, Haskell functions can't have side effects, which means that they can't effect any changes to the "real world", like changing files, writing to the screen, printing, sending data over the network, and so on. These two restrictions together mean that any function<br />
call can be omitted, repeated, or replaced by the result of a previous call with the same parameters, and the language '''guarantees''' that all these rearrangements will not change the program result!<br />
<br />
Let's compare this to C: optimizing C compilers try to guess which functions have no side effects and don't depend on mutable global variables. If this guess is wrong, an optimization can change the program's semantics! To avoid this kind of disaster, C optimizers are conservative in their guesses or require hints from the programmer about the purity of functions.<br />
<br />
Compared to an optimizing C compiler, a Haskell compiler is a set of pure mathematical transformations. This results in much better high-level optimization facilities. Moreover, pure mathematical computations can be much more easily divided into several threads that may be executed in parallel, which is increasingly important in these days of multi-core CPUs. Finally, pure computations are less error-prone and easier to verify, which adds to Haskell's robustness and to the speed of program development using Haskell.<br />
<br />
Haskell's purity allows the compiler to call only those functions whose results
are really required to calculate the final value of the top-level function
(i.e., main) - this is called lazy evaluation. It's a great thing for
pure mathematical computations, but how about I/O actions? A function
like (<hask>putStrLn "Press any key to begin formatting"</hask>) can't return any
meaningful result value, so how can we ensure that the compiler will not
omit or reorder its execution? And in general: how can we work with
stateful algorithms and side effects in an entirely lazy language?
This question has had many different solutions proposed in 18 years of
Haskell development (see [[History of Haskell]]), though a solution based on '''''monads''''' is now
the standard.
<br />
<br />
<br />
== What is a monad? ==<br />
<br />
What is a monad? It's something from mathematical category theory, which I<br />
don't know anymore :) In order to understand how monads are used to<br />
solve the problem of I/O and side effects, you don't need to know it. It's<br />
enough to just know elementary mathematics, like I do :)<br />
<br />
Let's imagine that we want to implement in Haskell the well-known<br />
'getchar' function. What type should it have? Let's try:<br />
<br />
<haskell><br />
getchar :: Char<br />
<br />
get2chars = [getchar,getchar]<br />
</haskell><br />
<br />
What will we get with 'getchar' having just the 'Char' type? You can see<br />
all the possible problems in the definition of 'get2chars':<br />
<br />
# Because the Haskell compiler treats all functions as pure (not having side effects), it can avoid "excessive" calls to 'getchar' and use one returned value twice.<br />
# Even if it does make two calls, there is no way to determine which call should be performed first. Do you want to return the two chars in the order in which they were read, or in the opposite order? Nothing in the definition of 'get2chars' answers this question.<br />
<br />
How can these problems be solved, from the programmer's viewpoint?<br />
Let's introduce a fake parameter of 'getchar' to make each call<br />
"different" from the compiler's point of view:<br />
<br />
<haskell><br />
getchar :: Int -> Char<br />
<br />
get2chars = [getchar 1, getchar 2]<br />
</haskell><br />
<br />
Right away, this solves the first problem mentioned above - now the<br />
compiler will make two calls because it sees them as having different<br />
parameters. The whole 'get2chars' function should also have a<br />
fake parameter, otherwise we will have the same problem calling it:<br />
<br />
<haskell><br />
getchar :: Int -> Char<br />
get2chars :: Int -> String<br />
<br />
get2chars _ = [getchar 1, getchar 2]<br />
</haskell><br />
<br />
<br />
Now we need to give the compiler some clue to determine which function it<br />
should call first. The Haskell language doesn't provide any way to express<br />
order of evaluation... except for data dependencies! How about adding an<br />
artificial data dependency which prevents evaluation of the second<br />
'getchar' before the first one? In order to achieve this, we will<br />
return an additional fake result from 'getchar' that will be used as a<br />
parameter for the next 'getchar' call:<br />
<br />
<haskell><br />
getchar :: Int -> (Char, Int)<br />
<br />
get2chars _ = [a,b] where (a,i) = getchar 1<br />
(b,_) = getchar i<br />
</haskell><br />
<br />
So far so good - now we can guarantee that 'a' is read before 'b'<br />
because reading 'b' needs the value ('i') that is returned by reading 'a'!<br />
<br />
We've added a fake parameter to 'get2chars' but the problem is that the<br />
Haskell compiler is too smart! It can believe that the external 'getchar'<br />
function is really dependent on its parameter but for 'get2chars' it<br />
will see that we're just cheating because we throw it away! Therefore it won't feel obliged to execute the calls in the order we want. How can we fix this? How about passing this fake parameter to the 'getchar' function?! In this case<br />
the compiler can't guess that it is really unused :)<br />
<br />
<haskell><br />
get2chars i0 = [a,b] where (a,i1) = getchar i0<br />
                           (b,i2) = getchar i1<br />
</haskell><br />
<br />
<br />
And more - 'get2chars' has all the same purity problems as the 'getchar'<br />
function. If you need to call it two times, you need a way to describe<br />
the order of these calls. Look at:<br />
<br />
<haskell><br />
get4chars = [get2chars 1, get2chars 2] -- order of 'get2chars' calls isn't defined<br />
</haskell><br />
<br />
We already know how to deal with these problems - 'get2chars' should<br />
also return some fake value that can be used to order calls:<br />
<br />
<haskell><br />
get2chars :: Int -> (String, Int)<br />
<br />
get4chars i0 = (a++b) where (a,i1) = get2chars i0<br />
                            (b,i2) = get2chars i1<br />
</haskell><br />
<br />
<br />
But what's the fake value 'get2chars' should return? If we use some integer constant, the excessively-smart Haskell compiler will guess that we're cheating again :) What about returning the value returned by 'getchar'? See:<br />
<br />
<haskell><br />
get2chars :: Int -> (String, Int)<br />
get2chars i0 = ([a,b], i2) where (a,i1) = getchar i0<br />
                                 (b,i2) = getchar i1<br />
</haskell><br />
<br />
Believe it or not, we've just constructed the whole "monadic"<br />
Haskell I/O system.<br />
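<br />
The chain of steps above can be tried out with a small pure model. This is only a sketch under one loud assumption: the fake Int is replaced by the string of not-yet-read input, so that the model has something real to return. The names 'Input', 'getcharP', 'get2charsP' and 'get4charsP' are hypothetical and exist only for this illustration.<br />
<br />
<haskell><br />
-- Assumption: the "fake" value is modelled as the unread input.<br />
type Input = String<br />
<br />
-- Reads one character and hands back the remaining input as the new fake value.<br />
getcharP :: Input -> (Char, Input)<br />
getcharP (c:rest) = (c, rest)<br />
getcharP []       = ('?', [])  -- assumption: '?' stands for "no more input"<br />
<br />
get2charsP :: Input -> (String, Input)<br />
get2charsP i0 = ([a,b], i2)<br />
  where (a,i1) = getcharP i0<br />
        (b,i2) = getcharP i1<br />
<br />
get4charsP :: Input -> (String, Input)<br />
get4charsP i0 = (a++b, i2)<br />
  where (a,i1) = get2charsP i0<br />
        (b,i2) = get2charsP i1<br />
</haskell><br />
<br />
In this model, 'get4charsP "abcdef"' evaluates to '("abcd","ef")': the characters come out in reading order purely because of the data dependency between the fake values.<br />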
<br />
<br />
<br />
== Welcome to the RealWorld, baby :) ==<br />
<br />
The 'main' Haskell function has the type:<br />
<br />
<haskell><br />
main :: RealWorld -> ((), RealWorld)<br />
</haskell><br />
<br />
where 'RealWorld' is a fake type used instead of our Int. It's something<br />
like the baton passed in a relay race. When 'main' calls some IO function,<br />
it passes the "RealWorld" it received as a parameter. All IO functions have<br />
similar types involving RealWorld as a parameter and result. To be<br />
exact, "IO" is a type synonym defined in the following way:<br />
<br />
<haskell><br />
type IO a = RealWorld -> (a, RealWorld)<br />
</haskell><br />
<br />
So, 'main' just has type "IO ()", 'getChar' has type "IO Char" and so<br />
on. You can think of the type "IO Char" as meaning "take the current RealWorld, do something to it, and return a Char and a (possibly changed) RealWorld". Let's look at 'main' calling 'getChar' two times:<br />
<br />
<haskell><br />
getChar :: RealWorld -> (Char, RealWorld)<br />
<br />
main :: RealWorld -> ((), RealWorld)<br />
main world0 = let (a, world1) = getChar world0<br />
                  (b, world2) = getChar world1<br />
              in ((), world2)<br />
</haskell><br />
<br />
<br />
Look at this closely: 'main' passes to the first 'getChar' the "world" it<br />
received. This 'getChar' returns some new value of type RealWorld<br />
that gets used in the next call. Finally, 'main' returns the "world" it got<br />
from the second 'getChar'.<br />
<br />
# Is it possible here to omit any call of 'getChar' if the Char it read is not used? No, because we need to return the "world" that is the result of the second 'getChar' and this in turn requires the "world" returned from the first 'getChar'.<br />
# Is it possible to reorder the 'getChar' calls? No: the second 'getChar' can't be called before the first one because it uses the "world" returned from the first call.<br />
# Is it possible to duplicate calls? In Haskell semantics - yes, but real compilers never duplicate work in such simple cases (otherwise, the programs generated will not have any speed guarantees).<br />
<br />
<br />
As we already said, RealWorld values are used like a baton which gets passed<br />
between all routines called by 'main' in strict order. Inside each<br />
routine called, RealWorld values are used in the same way. Overall, in<br />
order to "compute" the world to be returned from 'main', we should perform<br />
each IO procedure that is called from 'main', directly or indirectly.<br />
This means that each procedure inserted in the chain will be performed<br />
just at the moment (relative to the other IO actions) when we intended it<br />
to be called. Let's consider the following program:<br />
<br />
<haskell><br />
main = do a <- ask "What is your name?"<br />
          b <- ask "How old are you?"<br />
          return ()<br />
<br />
ask s = do putStr s<br />
           readLn<br />
</haskell><br />
<br />
Now you have enough knowledge to rewrite it in a low-level way and<br />
check that each operation that should be performed will really be<br />
performed with the arguments it should have and in the order we expect.<br />
<br />
<br />
But what about conditional execution? No problem. Let's define the<br />
well-known 'when' operation:<br />
<br />
<haskell><br />
when :: Bool -> IO () -> IO ()<br />
when condition action world =<br />
    if condition<br />
      then action world<br />
      else ((), world)<br />
</haskell><br />
<br />
As you can see, we can easily include or exclude from the execution chain<br />
IO procedures (actions) depending on the data values. If 'condition'<br />
is False when 'when' is called, 'action' will never be called because<br />
real Haskell compilers, again, never call functions whose results<br />
are not required to calculate the final result (''i.e.'', here, the final "world" value of 'main').<br />
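<br />
To see the "include or exclude from the chain" effect without any real I/O, here is a minimal sketch. It assumes a toy "world" that merely counts how many actions were performed; 'World', 'Action', 'tick' and 'whenW' are hypothetical names, not library functions.<br />
<br />
<haskell><br />
-- Assumption: the "world" just counts the actions performed so far.<br />
type World = Int<br />
type Action a = World -> (a, World)<br />
<br />
-- An action whose only effect is to advance the world.<br />
tick :: Action ()<br />
tick world = ((), world + 1)<br />
<br />
-- Same definition as 'when' in the text, renamed to avoid clashing with Control.Monad.<br />
whenW :: Bool -> Action () -> Action ()<br />
whenW condition action world =<br />
    if condition<br />
      then action world<br />
      else ((), world)<br />
</haskell><br />
<br />
Here 'whenW True tick 0' gives '((), 1)' while 'whenW False tick 0' gives '((), 0)'. Even 'whenW False undefined 0' is fine, because the action is never touched when the condition is False.<br />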
<br />
Loops and more complex control structures can be implemented in<br />
the same way. Try it as an exercise!<br />
<br />
<br />
Finally, you may want to know how much passing these RealWorld<br />
values around the program costs. It's free! These fake values exist solely for the compiler while it analyzes and optimizes the code, but when it gets to assembly code generation, it "suddenly" realizes that this type is like "()", so<br />
all these parameters and result values can be omitted from the final generated code. Isn't it beautiful? :)<br />
<br />
<br />
<br />
== '>>=' and 'do' notation ==<br />
<br />
All beginners (including me :)) start by thinking that 'do' is some<br />
magic statement that executes IO actions. That's wrong - 'do' is just<br />
syntactic sugar that simplifies the writing of procedures that use IO (and also other monads, but that's beyond the scope of this tutorial). 'do' notation eventually gets translated to statements passing "world" values around like we've manually written above and is used to simplify the gluing of several<br />
IO actions together. You don't need to use 'do' for just one statement; for instance,<br />
<br />
<haskell><br />
main = do putStr "Hello!"<br />
</haskell><br />
<br />
is desugared to:<br />
<br />
<haskell><br />
main = putStr "Hello!"<br />
</haskell><br />
<br />
But nevertheless it's considered Good Style to use 'do' even for one statement<br />
because it simplifies adding new statements in the future.<br />
<br />
<br />
Let's examine how to desugar a 'do' with multiple statements in the<br />
following example: <br />
<br />
<haskell><br />
main = do putStr "What is your name?"<br />
          putStr "How old are you?"<br />
          putStr "Nice day!"<br />
</haskell><br />
<br />
The 'do' statement here just joins several IO actions that should be<br />
performed sequentially. It's translated to sequential applications<br />
of one of the so-called "binding operators", namely '>>':<br />
<br />
<haskell><br />
main = (putStr "What is your name?")<br />
       >> ( (putStr "How old are you?")<br />
            >> (putStr "Nice day!")<br />
          )<br />
</haskell><br />
<br />
This binding operator just combines two IO actions, executing them<br />
sequentially by passing the "world" between them:<br />
<br />
<haskell><br />
(>>) :: IO a -> IO b -> IO b<br />
(action1 >> action2) world0 =<br />
   let (a, world1) = action1 world0<br />
       (b, world2) = action2 world1<br />
   in (b, world2)<br />
</haskell><br />
<br />
If defining operators this way looks strange to you, read this<br />
definition as follows:<br />
<br />
<haskell><br />
action1 >> action2 = action<br />
  where<br />
    action world0 = let (a, world1) = action1 world0<br />
                        (b, world2) = action2 world1<br />
                    in (b, world2)<br />
</haskell><br />
<br />
Now you can substitute the definition of '>>' at the places of its usage<br />
and check that the program constructed by the 'do' desugaring is actually the<br />
same as we could write by manually manipulating "world" values.<br />
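<br />
The substitution can also be checked mechanically. The following sketch assumes a toy "world" that is just the list of strings printed so far; 'putStrW', 'thenW', 'mainSugared' and 'mainManual' are hypothetical names chosen so that nothing clashes with the Prelude.<br />
<br />
<haskell><br />
-- Assumption: the "world" is the list of strings printed so far.<br />
type World = [String]<br />
type Action a = World -> (a, World)<br />
<br />
putStrW :: String -> Action ()<br />
putStrW s world = ((), world ++ [s])<br />
<br />
-- Same definition as '>>' in the text.<br />
thenW :: Action a -> Action b -> Action b<br />
thenW action1 action2 world0 =<br />
    let (_, world1) = action1 world0<br />
        (b, world2) = action2 world1<br />
    in (b, world2)<br />
<br />
-- The desugared 'do' block.<br />
mainSugared :: Action ()<br />
mainSugared = putStrW "What is your name?"<br />
              `thenW` (putStrW "How old are you?"<br />
                       `thenW` putStrW "Nice day!")<br />
<br />
-- The same program with the "world" values threaded by hand.<br />
mainManual :: Action ()<br />
mainManual world0 =<br />
    let ((), world1) = putStrW "What is your name?" world0<br />
        ((), world2) = putStrW "How old are you?" world1<br />
        ((), world3) = putStrW "Nice day!" world2<br />
    in ((), world3)<br />
</haskell><br />
<br />
Applied to the empty world, both versions return the same three strings in the same order, i.e. 'mainSugared [] == mainManual []' holds.<br />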
<br />
<br />
A more complex example involves the binding of variables using "<-":<br />
<br />
<haskell><br />
main = do a <- readLn<br />
          print a<br />
</haskell><br />
<br />
This code is desugared into:<br />
<br />
<haskell><br />
main = readLn<br />
       >>= (\a -> print a)<br />
</haskell><br />
<br />
As you should remember, the '>>' binding operator silently ignores<br />
the value of its first action and returns as an overall result<br />
the result of its second action only. On the other hand, the '>>=' binding operator (note the extra '=' at the end) allows us to use the result of its first action - it gets passed as an additional parameter to the second one! Look at the definition:<br />
<br />
<haskell><br />
(>>=) :: IO a -> (a -> IO b) -> IO b<br />
(action1 >>= action2) world0 =<br />
   let (a, world1) = action1 world0<br />
       (b, world2) = action2 a world1<br />
   in (b, world2)<br />
</haskell><br />
<br />
First, what does the type of the second "action" (more precisely, a function which returns an IO action), namely "a -> IO b", mean? By<br />
substituting the "IO" definition, we get "a -> RealWorld -> (b, RealWorld)".<br />
This means that the second action actually has two parameters<br />
- the type 'a' actually used inside it, and the value of type RealWorld used for sequencing of IO actions. That's always the case - any IO procedure has one<br />
more parameter compared to what you see in its type signature. This<br />
parameter is hidden inside the definition of the type alias "IO".<br />
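<br />
The hidden extra parameter becomes visible in a pure model. This sketch assumes a toy "world" made of the unread input lines and the output so far; 'World', 'getLineW', 'putStrW', 'bindW' and 'echo' are hypothetical names for illustration only.<br />
<br />
<haskell><br />
-- Assumption: the "world" holds the unread input lines and the output so far.<br />
data World = World { pending :: [String], printed :: [String] }<br />
  deriving (Eq, Show)<br />
<br />
type Action a = World -> (a, World)<br />
<br />
getLineW :: Action String<br />
getLineW (World (l:ls) out) = (l, World ls out)<br />
getLineW (World []     out) = ("", World [] out)  -- assumption: empty line at end of input<br />
<br />
-- Declared with one parameter in its type, defined with two.<br />
putStrW :: String -> Action ()<br />
putStrW s (World inp out) = ((), World inp (out ++ [s]))<br />
<br />
-- Same shape as '>>=' in the text; note the two arguments given to action2.<br />
bindW :: Action a -> (a -> Action b) -> Action b<br />
bindW action1 action2 world0 =<br />
    let (a, world1) = action1 world0<br />
        (b, world2) = action2 a world1<br />
    in (b, world2)<br />
<br />
-- Model of "read a line, then print it".<br />
echo :: Action ()<br />
echo = getLineW `bindW` putStrW<br />
</haskell><br />
<br />
Running 'echo (World ["hi","there"] [])' returns '((), World ["there"] ["hi"])': one line was consumed and the same line was "printed".<br />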
<br />
Second, you can use these '>>' and '>>=' operations to simplify your<br />
program. For example, in the code above we don't need to introduce the<br />
variable, because the result of 'readLn' can be sent directly to 'print':<br />
<br />
<haskell><br />
main = readLn >>= print<br />
</haskell><br />
<br />
<br />
And third - as you see, the notation:<br />
<br />
<haskell><br />
do x <- action1<br />
   action2<br />
</haskell><br />
<br />
where 'action1' has type "IO a" and 'action2' has type "IO b",<br />
translates into:<br />
<br />
<haskell><br />
action1 >>= (\x -> action2)<br />
</haskell><br />
<br />
where the second argument of '>>=' has the type "a -> IO b". It's the way<br />
the '<-' binding is processed - the name on the left-hand side of '<-' just becomes a parameter of subsequent operations represented as one large IO action. Note also that if 'action1' has type "IO a" then 'x' will just have type "a"; you can think of the effect of '<-' as "unpacking" the IO value of 'action1' into 'x'. Note also that '<-' is not a true operator; it's pure syntax, just like 'do' itself. Its meaning results only from the way it gets desugared.<br />
<br />
Look at the next example: <br />
<br />
<haskell><br />
main = do putStr "What is your name?"<br />
          a <- readLn<br />
          putStr "How old are you?"<br />
          b <- readLn<br />
          print (a,b)<br />
</haskell><br />
<br />
This code is desugared into:<br />
<br />
<haskell><br />
main = putStr "What is your name?"<br />
       >> readLn<br />
       >>= \a -> putStr "How old are you?"<br />
       >> readLn<br />
       >>= \b -> print (a,b)<br />
</haskell><br />
<br />
I omitted the parentheses here; both the '>>' and the '>>=' operators are<br />
left-associative, but lambda-bindings always stretch as far to the right as possible, which means that the 'a' and 'b' bindings introduced<br />
here are valid for all remaining actions. As an exercise, add the<br />
parentheses yourself and translate this procedure into the low-level<br />
code that explicitly passes "world" values. I think it should be enough to help you finally realize how the 'do' translation and binding operators work.<br />
<br />
<br />
Oh, no! I forgot the third monadic operator - 'return'. It just<br />
combines its two parameters - the value passed and "world":<br />
<br />
<haskell><br />
return :: a -> IO a<br />
return a world0 = (a, world0)<br />
</haskell><br />
<br />
How about translating a simple example of 'return' usage? Say,<br />
<br />
<haskell><br />
main = do a <- readLn<br />
          return (a*2)<br />
</haskell><br />
<br />
<br />
Programmers with an imperative language background often think that<br />
'return' in Haskell, as in other languages, immediately returns from<br />
the IO procedure. As you can see in its definition (and even just from its<br />
type!), such an assumption is totally wrong. The only purpose of using<br />
'return' is to "lift" some value (of type 'a') into the result of<br />
a whole action (of type "IO a") and therefore it should generally be used only as the last executed statement of some IO sequence. For example, try to<br />
translate the following procedure into the corresponding low-level code:<br />
<br />
<haskell><br />
main = do a <- readLn<br />
          when (a>=0) $ do<br />
              return ()<br />
          print "a is negative"<br />
</haskell><br />
<br />
and you will realize that the 'print' statement is executed even for non-negative values of 'a'. If you need to escape from the middle of an IO procedure, you can use the 'if' statement:<br />
<br />
<haskell><br />
main = do a <- readLn<br />
          if (a>=0)<br />
            then return ()<br />
            else print "a is negative"<br />
</haskell><br />
<br />
Moreover, Haskell layout rules allow us to use the following layout:<br />
<br />
<haskell><br />
main = do a <- readLn<br />
          if (a>=0) then return ()<br />
            else do<br />
          print "a is negative"<br />
          ...<br />
</haskell><br />
<br />
that may be useful for escaping from the middle of a longish 'do' statement.<br />
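<br />
If you'd rather check this than take my word for it, here is a small sketch that records messages in a list instead of printing them. The helper 'withLog' and the names 'noEscape' and 'escape' are hypothetical; 'when' is the standard one from Control.Monad.<br />
<br />
<haskell><br />
import Control.Monad (when)<br />
import Data.IORef (modifyIORef, newIORef, readIORef)<br />
<br />
-- Collects the messages a procedure would print, so the result can be inspected.<br />
withLog :: ((String -> IO ()) -> IO ()) -> IO [String]<br />
withLog body = do<br />
    logRef <- newIORef []<br />
    body (\msg -> modifyIORef logRef (++ [msg]))<br />
    readIORef logRef<br />
<br />
-- 'return ()' only ends the inner 'do'; the message is always logged.<br />
noEscape :: Int -> IO [String]<br />
noEscape a = withLog $ \say -> do<br />
    when (a >= 0) $ do<br />
        return ()<br />
    say "a is negative"<br />
<br />
-- With 'if', the message is logged for negative values only.<br />
escape :: Int -> IO [String]<br />
escape a = withLog $ \say -><br />
    if a >= 0<br />
      then return ()<br />
      else say "a is negative"<br />
</haskell><br />
<br />
'noEscape 5' still returns the message, while 'escape 5' returns an empty list and 'escape (-1)' returns the message.<br />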
<br />
<br />
Last exercise: implement a function 'liftM' that lifts operations on<br />
plain values to the operations on monadic ones. Its type signature:<br />
<br />
<haskell><br />
liftM :: (a -> b) -> (IO a -> IO b)<br />
</haskell><br />
<br />
If that's too hard for you, start with the following high-level<br />
definition and rewrite it in low-level fashion:<br />
<br />
<haskell><br />
liftM f action = do x <- action<br />
                    return (f x)<br />
</haskell><br />
<br />
<br />
<br />
== Mutable data (references, arrays, hash tables...) ==<br />
<br />
As you should know, every name in Haskell is bound to one fixed (immutable) value. This greatly simplifies understanding algorithms and code optimization, but it's inappropriate in some cases. As we all know, there are plenty of algorithms that are simpler to implement in terms of updatable<br />
variables, arrays and so on. This means that the value associated with<br />
a variable, for example, can be different at different execution points,<br />
so reading its value can't be considered as a pure function. Imagine,<br />
for example, the following code:<br />
<br />
<haskell><br />
main = do let a0 = readVariable varA<br />
              _ = writeVariable varA 1<br />
              a1 = readVariable varA<br />
          print (a0, a1)<br />
</haskell><br />
<br />
Does this look strange? First, the two calls to 'readVariable' look the same, so the compiler can just reuse the value returned by the first call. Second,<br />
the result of the 'writeVariable' call isn't used so the compiler can (and will!) omit this call completely. To complete the picture, these three calls may be rearranged in any order because they appear to be independent of each<br />
other. This is obviously not what was intended. What's the solution? You already know this - use IO actions! Using IO actions guarantees that:<br />
<br />
# the execution order will be retained as written<br />
# each action will have to be executed<br />
# the result of the "same" action (such as "readVariable varA") will not be reused<br />
<br />
So, the code above really should be written as:<br />
<br />
<haskell><br />
import Data.IORef<br />
<br />
main = do varA <- newIORef 0 -- Create and initialize a new variable<br />
          a0 <- readIORef varA<br />
          writeIORef varA 1<br />
          a1 <- readIORef varA<br />
          print (a0, a1)<br />
</haskell><br />
<br />
Here, 'varA' has the type "IORef Int" which means "a variable (reference) in<br />
the IO monad holding a value of type Int". newIORef creates a new variable<br />
(reference) and returns it, and then read/write actions use this<br />
reference. The value returned by the "readIORef varA" action depends not<br />
only on the variable involved but also on the moment this operation is performed so it can return different values on each call.<br />
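<br />
As one more illustration, here is a sketch of an imperative-style accumulation loop built on an IORef. The function name 'sumWithRef' is made up for this example; 'newIORef', 'modifyIORef' and 'readIORef' come from Data.IORef.<br />
<br />
<haskell><br />
import Data.IORef (modifyIORef, newIORef, readIORef)<br />
<br />
-- Adds up a list by repeatedly updating a mutable accumulator.<br />
sumWithRef :: [Int] -> IO Int<br />
sumWithRef xs = do<br />
    acc <- newIORef 0<br />
    mapM_ (\x -> modifyIORef acc (+ x)) xs<br />
    readIORef acc<br />
</haskell><br />
<br />
Each 'modifyIORef' sees the value left by the previous one, so 'sumWithRef [1..10]' yields 55.<br />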
<br />
Arrays, hash tables and any other ''mutable'' data structures are<br />
defined in the same way - for each of them, there's an operation that creates a new "mutable value" and returns a reference to it. Then special read and write<br />
operations in the IO monad are used. The following code shows an example<br />
using mutable arrays:<br />
<br />
<haskell><br />
import Data.Array.IO<br />
main = do arr <- newArray (1,10) 37 :: IO (IOArray Int Int)<br />
          a <- readArray arr 1<br />
          writeArray arr 1 64<br />
          b <- readArray arr 1<br />
          print (a, b)<br />
</haskell><br />
<br />
Here, an array of 10 elements with 37 as the initial value at each location is created. After reading the value of the first element (index 1) into 'a' this element's value is changed to 64 and then read again into 'b'. As you can see by executing this code, 'a' will be set to 37 and 'b' to 64.<br />
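<br />
A slightly larger sketch shows why mutable arrays are handy: each element is computed from the two written just before it. The function name 'fibs' is hypothetical; 'newArray', 'readArray', 'writeArray' and 'getElems' are the Data.Array.IO operations.<br />
<br />
<haskell><br />
import Control.Monad (forM_)<br />
import Data.Array.IO (IOArray, getElems, newArray, readArray, writeArray)<br />
<br />
-- Fills a mutable array with the first n Fibonacci numbers and returns them as a list.<br />
fibs :: Int -> IO [Integer]<br />
fibs n = do<br />
    arr <- newArray (1, n) 1 :: IO (IOArray Int Integer)<br />
    forM_ [3 .. n] $ \i -> do<br />
        a <- readArray arr (i - 2)<br />
        b <- readArray arr (i - 1)<br />
        writeArray arr i (a + b)<br />
    getElems arr<br />
</haskell><br />
<br />
For instance, 'fibs 8' returns [1,1,2,3,5,8,13,21].<br />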
<br />
<br />
<br />
Other state-dependent operations are also often implemented as IO<br />
actions. For example, a random number generator should return a different<br />
value on each call. It looks natural to give it a type involving IO:<br />
<br />
<haskell><br />
rand :: IO Int<br />
</haskell><br />
<br />
Moreover, when you import C routines you should be careful - if this<br />
routine is impure, i.e. its result depends on something in the "real<br />
world" (file system, memory contents...), internal state and so on,<br />
you should give it an IO type. Otherwise, the compiler can<br />
"optimize" repetitive calls of this procedure with the same parameters! :)<br />
<br />
For example, we can write a non-IO type for:<br />
<br />
<haskell><br />
foreign import ccall<br />
   sin :: Double -> Double<br />
</haskell><br />
<br />
because the result of 'sin' depends only on its argument, but 'tell' needs an IO type:<br />
<br />
<haskell><br />
foreign import ccall<br />
   tell :: Int -> IO Int<br />
</haskell><br />
<br />
If you declare 'tell' as a pure function (without IO) then you may<br />
get the same position on each call! :)<br />
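<br />
For a pair of imports you can actually compile, here is a sketch that swaps 'tell' for the C library's 'rand', which is impure for the same reason (it depends on hidden state). The Haskell-side names 'c_sin', 'c_rand' and 'demoFFI' are my own choice; the C functions are the standard 'sin' from math.h and 'rand' from stdlib.h.<br />
<br />
<haskell><br />
{-# LANGUAGE ForeignFunctionInterface #-}<br />
import Foreign.C.Types (CDouble, CInt)<br />
<br />
-- Pure: the result depends only on the argument.<br />
foreign import ccall unsafe "math.h sin"<br />
    c_sin :: CDouble -> CDouble<br />
<br />
-- Impure: each call advances the C library's hidden generator state.<br />
foreign import ccall unsafe "stdlib.h rand"<br />
    c_rand :: IO CInt<br />
<br />
demoFFI :: IO (CDouble, CInt, CInt)<br />
demoFFI = do<br />
    r1 <- c_rand<br />
    r2 <- c_rand   -- a second, separate call; it is not merged with the first<br />
    return (c_sin 0, r1, r2)<br />
</haskell><br />
<br />
Because 'c_rand' has an IO type, the two calls stay two calls. Had it been declared as a plain 'CInt', the compiler would be entitled to treat it as a constant.<br />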
<br />
== IO actions as values ==<br />
<br />
By this point you should understand why it's impossible to use IO<br />
actions inside non-IO (pure) procedures. Such procedures just don't<br />
get a "baton"; they don't know any "world" value to pass to an IO action.<br />
The RealWorld type is an abstract datatype, so pure functions also can't construct RealWorld values by themselves, and it's a strict type, so 'undefined' also can't be used. So, the prohibition of using IO actions inside pure procedures is just a type system trick (as it usually is in Haskell :)).<br />
<br />
But while pure code can't ''execute'' IO actions, it can work with them<br />
as with any other functional values - they can be stored in data<br />
structures, passed as parameters, returned as results, collected in<br />
lists, and partially applied. But an IO action will remain a<br />
functional value because we can't apply it to the last argument - of<br />
type RealWorld.<br />
<br />
In order to ''execute'' the IO action we need to apply it to some<br />
RealWorld value. That can be done only inside some IO procedure,<br />
in its "actions chain". And real execution of this action will take<br />
place only when this procedure is called as part of the process of<br />
"calculating the final value of world" for 'main'. Look at this example:<br />
<br />
<haskell><br />
main world0 = let get2chars = getChar >> getChar<br />
                  ((), world1) = putStr "Press two keys" world0<br />
                  (answer, world2) = get2chars world1<br />
              in ((), world2)<br />
</haskell><br />
<br />
Here we first bind a value to 'get2chars' and then write a binding<br />
involving 'putStr'. But what's the execution order? It's not defined<br />
by the order of the 'let' bindings, it's defined by the order of processing<br />
"world" values! You can arbitrarily reorder the binding statements - the execution order will be defined by the data dependency with respect to the <br />
"world" values that get passed around. Let's see what this 'main' looks like in the 'do' notation:<br />
<br />
<haskell><br />
main = do let get2chars = getChar >> getChar<br />
          putStr "Press two keys"<br />
          get2chars<br />
          return ()<br />
</haskell><br />
<br />
As you can see, we've eliminated two of the 'let' bindings and left only the one defining 'get2chars'. The non-'let' statements are executed in the exact order in which they're written, because they pass the "world" value from statement to statement as we described above. Thus, this version of the function is much easier to understand because we don't have to mentally figure out the data dependency of the "world" value.<br />
<br />
Moreover, IO actions like 'get2chars' can't be executed directly<br />
because they are functions with a RealWorld parameter. To execute them,<br />
we need to supply the RealWorld parameter, i.e. insert them in the 'main'<br />
chain, placing them in some 'do' sequence executed from 'main' (either directly in the 'main' function, or indirectly in an IO function called from 'main'). Until that's done, they will remain like any function, in partially<br />
evaluated form. And we can work with IO actions as with any other<br />
functions - bind them to names (as we did above), save them in data<br />
structures, pass them as function parameters and return them as results - and<br />
they won't be performed until you give them the magic RealWorld<br />
parameter!<br />
<br />
<br />
<br />
=== Example: a list of IO actions ===<br />
<br />
Let's try defining a list of IO actions:<br />
<br />
<haskell><br />
ioActions :: [IO ()]<br />
ioActions = [(print "Hello!"),<br />
             (putStr "just kidding"),<br />
             (getChar >> return ())<br />
            ]<br />
</haskell><br />
<br />
I used additional parentheses around each action, although they aren't really required. If you still can't believe that these actions won't be executed immediately, just recall the real type of this list:<br />
<br />
<haskell><br />
ioActions :: [RealWorld -> ((), RealWorld)]<br />
</haskell><br />
<br />
Well, now we want to execute some of these actions. No problem, just<br />
insert them into the 'main' chain:<br />
<br />
<haskell><br />
main = do head ioActions<br />
          ioActions !! 1<br />
          last ioActions<br />
</haskell><br />
<br />
Looks strange, right? :) Really, any IO action that you write in a 'do'<br />
statement (or use as a parameter for the '>>'/'>>=' operators) is an expression<br />
returning a result of type 'IO a' for some type 'a'. Typically, you use some function that has the type 'x -> y -> ... -> IO a' and provide all the x, y, etc. parameters. But you're not limited to this standard scenario -<br />
don't forget that Haskell is a functional language and you're free to<br />
compute the functional value required (recall that "IO a" is really a function<br />
type) in any possible way. Here we just extracted several functions<br />
from the list - no problem. This functional value can also be<br />
constructed on-the-fly, as we've done in the previous example - that's also<br />
OK. Want to see this functional value passed as a parameter?<br />
Just look at the definition of 'when'. Hey, we can buy, sell, and rent<br />
these IO actions just like we can with any other functional values! For example, let's define a function that executes all the IO actions in the list:<br />
<br />
<haskell><br />
sequence_ :: [IO a] -> IO ()<br />
sequence_ [] = return ()<br />
sequence_ (x:xs) = do x<br />
                      sequence_ xs<br />
</haskell><br />
<br />
No black magic - we just extract IO actions from the list and insert<br />
them into a chain of IO operations that should be performed one after another (in the same order that they occurred in the list) to "compute the final world value" of the entire 'sequence_' call.<br />
<br />
With the help of 'sequence_', we can rewrite our last 'main' function as:<br />
<br />
<haskell><br />
main = sequence_ ioActions<br />
</haskell><br />
<br />
<br />
Haskell's ability to work with IO actions as with any other<br />
(functional and non-functional) values allows us to define control<br />
structures of arbitrary complexity. Try, for example, to define a control<br />
structure that repeats an action until it returns the 'False' result:<br />
<br />
<haskell><br />
while :: IO Bool -> IO ()<br />
while action = ???<br />
</haskell><br />
<br />
Most programming languages don't allow you to define control structures at all, and those that do often require you to use a macro-expansion system. In Haskell, control structures are just trivial functions anyone can write.<br />
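<br />
To back up that claim without spoiling the 'while' exercise, here is a sketch of two other home-made control structures. The names 'repeatN', 'ifM' and 'demoControl' are hypothetical; nothing here is taken from a library except the IORef operations.<br />
<br />
<haskell><br />
import Data.IORef (modifyIORef, newIORef, readIORef)<br />
<br />
-- Performs the action n times.<br />
repeatN :: Int -> IO () -> IO ()<br />
repeatN n action<br />
    | n <= 0    = return ()<br />
    | otherwise = action >> repeatN (n - 1) action<br />
<br />
-- Chooses between two actions based on a condition that itself needs IO.<br />
ifM :: IO Bool -> IO a -> IO a -> IO a<br />
ifM condition thenAct elseAct = do<br />
    c <- condition<br />
    if c then thenAct else elseAct<br />
<br />
demoControl :: IO Int<br />
demoControl = do<br />
    counter <- newIORef 0<br />
    repeatN 3 (modifyIORef counter (+ 1))<br />
    ifM (fmap (> 2) (readIORef counter))<br />
        (modifyIORef counter (* 10))<br />
        (return ())<br />
    readIORef counter<br />
</haskell><br />
<br />
In 'demoControl' the counter is incremented three times and then, because it exceeds 2, multiplied by 10, so the result is 30.<br />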
<br />
<br />
=== Example: returning an IO action as a result ===<br />
<br />
How about returning an IO action as the result of a function? Well, we've done<br />
this each time we've defined an IO procedure - they all return IO actions<br />
that need a RealWorld value to be performed. While we usually just<br />
execute them as part of a higher-level IO procedure, it's also<br />
possible to just collect them without actual execution:<br />
<br />
<haskell><br />
main = do let a = sequence ioActions<br />
              b = when True (getChar >> return ())<br />
              c = getChar >> getChar<br />
          putStr "These 'let' statements are not executed!"<br />
</haskell><br />
<br />
These assigned IO procedures can be used as parameters to other<br />
procedures, or written to global variables, or processed in some other<br />
way, or just executed later, as we did in the example with 'get2chars'.<br />
<br />
But how about returning a parameterized IO action from an IO procedure? Let's define a procedure that returns the i'th byte from a file represented as a Handle:<br />
<br />
<haskell><br />
readi h i = do hSeek h i AbsoluteSeek<br />
               hGetChar h<br />
</haskell><br />
<br />
So far so good. But how about a procedure that returns the i'th byte of a file<br />
with a given name without reopening it each time?<br />
<br />
<haskell><br />
readfilei :: String -> IO (Integer -> IO Char)<br />
readfilei name = do h <- openFile name ReadMode<br />
                    return (readi h)<br />
</haskell><br />
<br />
As you can see, it's an IO procedure that opens a file and returns...<br />
another IO procedure that will read the specified byte. But we can go<br />
further and include the 'readi' body in 'readfilei':<br />
<br />
<haskell><br />
readfilei name = do h <- openFile name ReadMode<br />
                    let readi h i = do hSeek h i AbsoluteSeek<br />
                                       hGetChar h<br />
                    return (readi h)<br />
</haskell><br />
<br />
That's a little better. But why do we add 'h' as a parameter to 'readi' if it can be obtained from the environment where 'readi' is now defined? An even shorter version is this:<br />
<br />
<haskell><br />
readfilei name = do h <- openFile name ReadMode<br />
                    let readi i = do hSeek h i AbsoluteSeek<br />
                                     hGetChar h<br />
                    return readi<br />
</haskell><br />
<br />
What have we done here? We've built a parameterized IO action involving local<br />
names inside 'readfilei' and returned it as the result. Now it can be<br />
used in the following way:<br />
<br />
<haskell><br />
main = do myfile <- readfilei "test"<br />
          a <- myfile 0<br />
          b <- myfile 1<br />
          print (a,b)<br />
</haskell><br />
<br />
<br />
This way of using IO actions is very typical for Haskell programs - you<br />
just construct one or more IO actions that you need,<br />
with or without parameters, possibly involving the parameters that your<br />
"constructor" received, and return them to the caller. Then these IO actions<br />
can be used in the rest of the program without any knowledge about your<br />
internal implementation strategy. One thing this can be used for is to<br />
partially emulate the OOP (or more precisely, the ADT) programming paradigm.<br />
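<br />
A minimal sketch of this pattern is a counter "object": the constructor creates hidden state and returns only the actions that operate on it. The names 'newCounter' and 'demoCounter' are hypothetical, chosen for this illustration.<br />
<br />
<haskell><br />
import Data.IORef (newIORef, readIORef, writeIORef)<br />
<br />
-- Returns two actions sharing one hidden reference:<br />
-- the first yields the current value and increments it, the second resets it.<br />
newCounter :: Int -> IO (IO Int, IO ())<br />
newCounter start = do<br />
    ref <- newIORef start<br />
    let next = do<br />
            n <- readIORef ref<br />
            writeIORef ref (n + 1)<br />
            return n<br />
        reset = writeIORef ref start<br />
    return (next, reset)<br />
<br />
demoCounter :: IO [Int]<br />
demoCounter = do<br />
    (next, reset) <- newCounter 10<br />
    a <- next<br />
    b <- next<br />
    reset<br />
    c <- next<br />
    return [a, b, c]<br />
</haskell><br />
<br />
'demoCounter' returns [10,11,10]. The caller never sees the IORef, only the two actions, which is about as close to a private field as this style gets.<br />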
<br />
<br />
=== Example: a memory allocator generator ===<br />
<br />
As an example, one of my programs has a module which is a memory suballocator. It receives the address and size of a large memory block and returns two<br />
procedures - one to allocate a subblock of a given size and the other to<br />
free the allocated subblock:<br />
<br />
<haskell><br />
memoryAllocator :: Ptr a -> Int -> IO (Int -> IO (Ptr b),<br />
                                       Ptr c -> IO ())<br />
<br />
memoryAllocator buf size = do ......<br />
                              let alloc size = do ...<br />
                                                  ...<br />
                                  free ptr = do ...<br />
                                                ...<br />
                              return (alloc, free)<br />
</haskell><br />
<br />
How is this implemented? 'alloc' and 'free' work with references<br />
created inside the memoryAllocator procedure. Because the creation of these references is a part of the memoryAllocator IO actions chain, a new independent set of references will be created for each memory block for which<br />
memoryAllocator is called:<br />
<br />
<haskell><br />
memoryAllocator buf size = do start <- newIORef buf<br />
                              end <- newIORef (buf `plusPtr` size)<br />
                              ...<br />
</haskell><br />
<br />
These two references are read and written in the 'alloc' and 'free' definitions (we'll implement a very simple memory allocator for this example):<br />
<br />
<haskell><br />
      ...<br />
      let alloc size = do addr <- readIORef start<br />
                          writeIORef start (addr `plusPtr` size)<br />
                          return addr<br />
<br />
      let free ptr = do writeIORef start ptr<br />
</haskell><br />
<br />
What we've defined here is just a pair of closures that use state<br />
available at the moment of their definition. As you can see, it's as<br />
easy as in any other functional language, despite Haskell's lack<br />
of direct support for impure functions.<br />
<br />
The following example uses procedures, returned by memoryAllocator, to<br />
simultaneously allocate/free blocks in two independent memory buffers:<br />
<br />
<haskell><br />
main = do buf1 <- mallocBytes (2^16)<br />
          buf2 <- mallocBytes (2^20)<br />
          (alloc1, free1) <- memoryAllocator buf1 (2^16)<br />
          (alloc2, free2) <- memoryAllocator buf2 (2^20)<br />
          ptr11 <- alloc1 100<br />
          ptr21 <- alloc2 1000<br />
          free1 ptr11<br />
          free2 ptr21<br />
          ptr12 <- alloc1 100<br />
          ptr22 <- alloc2 1000<br />
</haskell><br />
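<br />
Putting the fragments above together gives something like the following sketch. It is not the original module: the 'castPtr' calls are my addition to satisfy the polymorphic signature, the 'end' reference is left out because this very simple allocator never checks bounds, and the second closure is called 'release' to avoid clashing with 'free' from Foreign.Marshal.Alloc. 'demoAllocator' is a hypothetical test driver.<br />
<br />
<haskell><br />
import Data.IORef (newIORef, readIORef, writeIORef)<br />
import Foreign.Marshal.Alloc (free, mallocBytes)<br />
import Foreign.Ptr (Ptr, castPtr, minusPtr, plusPtr)<br />
<br />
memoryAllocator :: Ptr a -> Int -> IO (Int -> IO (Ptr b), Ptr c -> IO ())<br />
memoryAllocator buf _size = do   -- assumption: the size is ignored (no bounds check)<br />
    start <- newIORef buf<br />
    let alloc n = do<br />
            addr <- readIORef start<br />
            writeIORef start (addr `plusPtr` n)<br />
            return (castPtr addr)<br />
        -- Very naive: "freeing" just moves the start pointer back.<br />
        release ptr = writeIORef start (castPtr ptr)<br />
    return (alloc, release)<br />
<br />
-- Returns the offsets of two allocations relative to the start of the buffer.<br />
demoAllocator :: IO (Int, Int)<br />
demoAllocator = do<br />
    buf <- mallocBytes 1024 :: IO (Ptr ())<br />
    (alloc, release) <- memoryAllocator buf 1024<br />
    p1 <- alloc 100 :: IO (Ptr ())<br />
    p2 <- alloc 50 :: IO (Ptr ())<br />
    release p1<br />
    p3 <- alloc 10 :: IO (Ptr ())<br />
    free buf<br />
    return (p2 `minusPtr` buf, p3 `minusPtr` buf)<br />
</haskell><br />
<br />
'demoAllocator' returns (100, 0): the second block starts right after the first, and after releasing the first block the next allocation reuses the beginning of the buffer.<br />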
<br />
<br />
<br />
=== Example: emulating OOP with record types ===<br />
<br />
Let's implement the classical OOP example: drawing figures. There are<br />
figures of different types: circles, rectangles and so on. The task is<br />
to create a heterogeneous list of figures. All figures in this list should<br />
support the same set of operations: draw, move and so on. We will<br />
represent these operations as IO procedures. Instead of a "class" let's<br />
define a structure containing implementations of all the procedures<br />
required:<br />
<br />
<haskell><br />
data Figure = Figure { draw :: IO (),<br />
                       move :: Displacement -> IO ()<br />
                     }<br />
<br />
type Displacement = (Int, Int) -- horizontal and vertical displacement in points<br />
</haskell><br />
<br />
<br />
The constructor of each figure's type should just return a Figure record:<br />
<br />
<haskell><br />
circle :: Point -> Radius -> IO Figure<br />
rectangle :: Point -> Point -> IO Figure<br />
<br />
type Point = (Int, Int) -- point coordinates<br />
type Radius = Int -- circle radius in points<br />
</haskell><br />
<br />
<br />
We will "draw" figures by just printing their current parameters.<br />
Let's start with a simplified implementation of the 'circle' and 'rectangle'<br />
constructors, without actual 'move' support:<br />
<br />
<haskell><br />
circle center radius = do<br />
    let description = " Circle at "++show center++" with radius "++show radius<br />
    return $ Figure { draw = putStrLn description }<br />
<br />
rectangle from to = do<br />
    let description = " Rectangle "++show from++"-"++show to<br />
    return $ Figure { draw = putStrLn description }<br />
</haskell><br />
<br />
<br />
As you see, each constructor just returns a fixed 'draw' procedure that prints<br />
parameters with which the concrete figure was created. Let's test it:<br />
<br />
<haskell><br />
drawAll :: [Figure] -> IO ()<br />
drawAll figures = do putStrLn "Drawing figures:"<br />
                     mapM_ draw figures<br />
<br />
main = do figures <- sequence [circle (10,10) 5,<br />
                               circle (20,20) 3,<br />
                               rectangle (10,10) (20,20),<br />
                               rectangle (15,15) (40,40)]<br />
          drawAll figures<br />
</haskell><br />
<br />
<br />
Now let's define "full-featured" figures that can actually be<br />
moved around. In order to achieve this, we should provide each figure<br />
with a mutable variable that holds each figure's current screen location. The<br />
type of this variable will be "IORef Point". This variable should be created in the figure constructor and manipulated in IO procedures (closures) enclosed in<br />
the Figure record:<br />
<br />
<haskell><br />
circle center radius = do<br />
    centerVar <- newIORef center<br />
<br />
    let drawF = do center <- readIORef centerVar<br />
                   putStrLn (" Circle at "++show center<br />
                             ++" with radius "++show radius)<br />
<br />
    let moveF (addX,addY) = do (x,y) <- readIORef centerVar<br />
                               writeIORef centerVar (x+addX, y+addY)<br />
<br />
    return $ Figure { draw=drawF, move=moveF }<br />
<br />
<br />
rectangle from to = do<br />
    fromVar <- newIORef from<br />
    toVar <- newIORef to<br />
<br />
    let drawF = do from <- readIORef fromVar<br />
                   to <- readIORef toVar<br />
                   putStrLn (" Rectangle "++show from++"-"++show to)<br />
<br />
    let moveF (addX,addY) = do (fromX,fromY) <- readIORef fromVar<br />
                               (toX,toY) <- readIORef toVar<br />
                               writeIORef fromVar (fromX+addX, fromY+addY)<br />
                               writeIORef toVar (toX+addX, toY+addY)<br />
<br />
    return $ Figure { draw=drawF, move=moveF }<br />
</haskell><br />
<br />
<br />
Now we can test the code which moves figures around:<br />
<br />
<haskell><br />
main = do figures <- sequence [circle (10,10) 5,<br />
                               rectangle (10,10) (20,20)]<br />
          drawAll figures<br />
          mapM_ (\fig -> move fig (10,10)) figures<br />
          drawAll figures<br />
</haskell><br />
<br />
<br />
It's important to realize that we are not limited to including only IO actions<br />
in a record that's intended to simulate a C++/Java-style interface. The record can also include values, IORefs, pure functions - in short, any type of data. For example, we can easily add to the Figure interface fields for area and origin:<br />
<br />
<haskell><br />
data Figure = Figure { draw :: IO (),<br />
                       move :: Displacement -> IO (),<br />
                       area :: Double,<br />
                       origin :: IORef Point<br />
                     }<br />
</haskell><br />
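<br />
As a hedged illustration of how a constructor might fill in such extra fields, here is a self-contained sketch for circles only. Computing 'area' once at construction time and exposing the very same IORef that 'draw' and 'move' use are assumptions of this sketch, and 'demo' is a made-up helper:<br />
<br />
<haskell><br />
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)<br />
<br />
type Point        = (Int, Int)<br />
type Displacement = (Int, Int)<br />
type Radius       = Int<br />
<br />
data Figure = Figure { draw   :: IO (),<br />
                       move   :: Displacement -> IO (),<br />
                       area   :: Double,<br />
                       origin :: IORef Point<br />
                     }<br />
<br />
circle :: Point -> Radius -> IO Figure<br />
circle center radius = do<br />
    centerVar <- newIORef center<br />
    let drawF = do c <- readIORef centerVar<br />
                   putStrLn (" Circle at " ++ show c ++ " with radius " ++ show radius)<br />
        moveF (addX, addY) = modifyIORef centerVar (\(x, y) -> (x + addX, y + addY))<br />
    return $ Figure { draw   = drawF,<br />
                      move   = moveF,<br />
                      area   = pi * fromIntegral radius ^ (2 :: Int),  -- a pure value<br />
                      origin = centerVar }                             -- shared mutable state<br />
<br />
-- Moves a circle, draws it, and reports where its origin ended up.<br />
demo :: IO Point<br />
demo = do<br />
    fig <- circle (10, 10) 5<br />
    move fig (1, 2)<br />
    draw fig<br />
    readIORef (origin fig)<br />
</haskell><br />
<br />
Because 'origin' is the same reference that 'move' updates, code that holds a Figure can observe the current position without going through 'draw'.<br />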
<br />
<br />
<br />
== Dark side of IO monad ==<br />
=== unsafePerformIO ===<br />
<br />
Programmers coming from an imperative language background often look for a way to execute IO actions inside a pure procedure. But what does this mean?<br />
Imagine that you're trying to write a procedure that reads the contents of a file with a given name, and you try to write it as a pure (non-IO) function:<br />
<br />
<haskell><br />
readContents :: Filename -> String<br />
</haskell><br />
<br />
Defining readContents as a pure function will certainly simplify the code that uses it. But it will also create problems for the compiler:<br />
<br />
# This call is not inserted in a sequence of "world transformations", so the compiler doesn't know at what exact moment you want to execute this action. For example, if the file has one kind of contents at the beginning of the program and another at the end - which contents do you want to see? You have no idea when (or even if) this function is going to get invoked, because Haskell sees this function as pure and feels free to reorder the execution of any or all pure functions as needed.<br />
# Attempts to read the contents of files with the same name can be factored (''i.e.'' reduced to a single call) despite the fact that the file (or the current directory) can be changed between calls. Again, Haskell considers all non-IO functions to be pure and feels free to omit multiple calls with the same parameters.<br />
<br />
So, implementing pure functions that interact with the Real World is<br />
considered to be Bad Behavior. Good boys and girls never do it ;)<br />
<br />
<br />
Nevertheless, there are (semi-official) ways to use IO actions inside<br />
of pure functions. As you should remember, this is prohibited by<br />
requiring the RealWorld "baton" in order to call an IO action. Pure functions don't have the baton, but there is a special "magic" procedure that produces this baton from nowhere, uses it to call an IO action and then throws the resulting "world" away! It's a little low-level magic :) This very special (and dangerous) procedure is:<br />
<br />
<haskell><br />
unsafePerformIO :: IO a -> a<br />
</haskell><br />
<br />
Let's look at its (possible) definition:<br />
<br />
<haskell><br />
unsafePerformIO :: (RealWorld -> (a, RealWorld)) -> a<br />
unsafePerformIO action = let (a, world1) = action createNewWorld<br />
                         in a<br />
</haskell><br />
<br />
where 'createNewWorld' is an internal function producing a new value of<br />
the RealWorld type.<br />
<br />
Using unsafePerformIO, you can easily write pure functions that do<br />
I/O inside. But don't do this without a real need, and remember to<br />
follow this rule: the compiler doesn't know that you are cheating; it still<br />
considers each non-IO function to be a pure one. Therefore, all the usual<br />
optimization rules can (and will!) be applied to its execution. So<br />
you must ensure that:<br />
<br />
# The result of each call depends only on its arguments.<br />
# You don't rely on side-effects of this function, which may not be executed if its results are not needed.<br />
<br />
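One widely used idiom that is usually considered compatible with these rules is a top-level mutable variable. The sketch below is only an illustration: the names 'counter', 'nextId' and 'demo' are made up for this example. The NOINLINE pragma is important here, because without it the compiler could inline 'counter' at each use site and so create several independent references:<br />
<br />
<haskell><br />
import Data.IORef (IORef, atomicModifyIORef', newIORef)<br />
import System.IO.Unsafe (unsafePerformIO)<br />
<br />
-- A single global reference, created the first time it is needed.<br />
{-# NOINLINE counter #-}<br />
counter :: IORef Int<br />
counter = unsafePerformIO (newIORef 0)<br />
<br />
-- All access to the variable still happens inside ordinary IO actions.<br />
nextId :: IO Int<br />
nextId = atomicModifyIORef' counter (\n -> (n + 1, n))<br />
<br />
demo :: IO [Int]<br />
demo = do a <- nextId<br />
          b <- nextId<br />
          c <- nextId<br />
          return [a, b, c]<br />
</haskell><br />
<br />
Note that only the creation of the reference goes through 'unsafePerformIO'; reading and updating it stays in the 'main' chain, so the order of those operations is still guaranteed.<br />
<br />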
<br />
Let's investigate this problem more deeply. Function evaluation in Haskell<br />
is determined by a value's necessity - the language computes only the values that are really required to calculate the final result. But what does this mean with respect to the 'main' function? To "calculate the final world's" value, you need to perform all the intermediate IO actions that are included in the 'main' chain. By using 'unsafePerformIO' we call IO actions outside of this chain. What guarantee do we have that they will be run at all? None. The only time they will be run is if running them is required to compute the overall function result (which in turn should be required to perform some action in the<br />
'main' chain). This is an example of Haskell's evaluation-by-need strategy. Now you should clearly see the difference:<br />
<br />
- An IO action inside an IO procedure is guaranteed to execute as long as<br />
it is (directly or indirectly) inside the 'main' chain - even when its result isn't used (because the implicit "world" value it returns ''will'' be used). You directly specify the order of the action's execution inside the IO procedure. Data dependencies are simulated via the implicit "world" values that are passed from each IO action to the next.<br />
<br />
- An IO action inside 'unsafePerformIO' will be performed only if<br />
the result of this operation is really used. The evaluation order is not<br />
guaranteed and you should not rely on it (except when you're sure about<br />
whatever data dependencies may exist).<br />
<br />
<br />
I should also say that inside an 'unsafePerformIO' call you can organize<br />
a small internal chain of IO actions with the help of the same binding<br />
operators and/or 'do' syntactic sugar we've seen above. For example, here's a particularly convoluted way to compute the integer that comes after zero:<br />
<br />
<haskell><br />
one :: Int<br />
one = unsafePerformIO $ do var <- newIORef 0<br />
                           modifyIORef var (+1)<br />
                           readIORef var<br />
</haskell><br />
<br />
and in this case ALL the operations in this chain will be performed as<br />
long as the result of the 'unsafePerformIO' call is needed. To ensure this,<br />
the actual 'unsafePerformIO' implementation evaluates the "world" returned<br />
by the 'action':<br />
<br />
<haskell><br />
unsafePerformIO action = let (a,world1) = action createNewWorld<br />
                         in (world1 `seq` a)<br />
</haskell><br />
<br />
(The 'seq' operation strictly evaluates its first argument before<br />
returning the value of the second one).<br />
<br />
<br />
=== inlinePerformIO ===<br />
<br />
inlinePerformIO has the same definition as unsafePerformIO but with the addition of an INLINE pragma:<br />
<haskell><br />
-- | Just like unsafePerformIO, but we inline it. Big performance gains as<br />
-- it exposes lots of things to further inlining<br />
{-# INLINE inlinePerformIO #-}<br />
inlinePerformIO action = let (a, world1) = action createNewWorld<br />
                         in (world1 `seq` a)<br />
</haskell><br />
<br />
Semantically inlinePerformIO = unsafePerformIO<br />
in as much as either of those have any semantics at all.<br />
<br />
The difference of course is that inlinePerformIO is even less safe than<br />
unsafePerformIO. While ghc will try not to duplicate or common up<br />
different uses of unsafePerformIO, we aggressively inline<br />
inlinePerformIO. So you can really only use it where the IO content is<br />
really properly pure, like reading from an immutable memory buffer (as<br />
in the case of ByteStrings). However things like allocating new buffers<br />
should not be done inside inlinePerformIO since that can easily be<br />
floated out and performed just once for the whole program, so you end up<br />
with many things sharing the same buffer, which would be bad.<br />
<br />
So the rule of thumb is that IO things wrapped in unsafePerformIO have<br />
to be externally pure while with inlinePerformIO it has to be really<br />
really pure or it'll all go horribly wrong.<br />
<br />
That said, here's some really hairy code. This should frighten any pure<br />
functional programmer...<br />
<br />
<haskell><br />
write :: Int -> (Ptr Word8 -> IO ()) -> Put ()<br />
write !n body = Put $ \c buf@(Buffer fp o u l) -><br />
  if n <= l<br />
    then write' c fp o u l<br />
    else write' (flushOld c n fp o u) (newBuffer c n) 0 0 0<br />
<br />
  where {-# NOINLINE write' #-}<br />
        write' c !fp !o !u !l =<br />
          -- warning: this is a tad hardcore<br />
          inlinePerformIO<br />
            (withForeignPtr fp<br />
              (\p -> body $! (p `plusPtr` (o+u))))<br />
          `seq` c () (Buffer fp o (u+n) (l-n))<br />
</haskell><br />
<br />
it's used like:<br />
<haskell><br />
word8 w = write 1 (\p -> poke p w)<br />
</haskell><br />
<br />
This does not adhere to my rule of thumb above. Don't ask exactly why we<br />
claim it's safe :-) (and if anyone really wants to know, ask Ross<br />
Paterson who did it first in the Builder monoid)<br />
<br />
=== unsafeInterleaveIO ===<br />
<br />
But there is an even stranger operation called 'unsafeInterleaveIO' that<br />
gets the "official baton", makes its own pirate copy, and then runs<br />
an "illegal" relay-race in parallel with the main one! I can't talk further<br />
about its behavior without causing grief and indignation, so it's no surprise<br />
that this operation is widely used in countries that are hotbeds of software piracy such as Russia and China! ;) Don't even ask me - I won't say anything more about this dirty trick I use all the time ;)<br />
<br />
One can use unsafePerformIO (not unsafeInterleaveIO) to perform I/O<br />
operations not in a predefined order but on demand. For example, the<br />
following code:<br />
<br />
<haskell><br />
do let c = unsafePerformIO getChar<br />
   do_proc c<br />
</haskell><br />
<br />
will perform the getChar I/O call only when the value of c is really required<br />
by the code, i.e. this call will be performed lazily, like any usual<br />
Haskell computation.<br />
<br />
Now imagine the following code:<br />
<br />
<haskell><br />
do let s = [unsafePerformIO getChar, unsafePerformIO getChar, unsafePerformIO getChar]<br />
   do_proc s<br />
</haskell><br />
<br />
The three chars inside this list will be computed on demand too, and this<br />
means that their values will depend on the order in which they are consumed. This<br />
is not what we usually want :)<br />
<br />
<br />
unsafeInterleaveIO solves this problem - it performs I/O only on<br />
demand but lets you define the exact ''internal'' execution order for the parts<br />
of your data structure. This is why I wrote that unsafeInterleaveIO makes<br />
an illegal copy of the baton :)<br />
<br />
First, unsafeInterleaveIO takes an (IO a) action as a parameter and returns<br />
another (IO a) action, whose result of type 'a' you bind as usual:<br />
<br />
<haskell><br />
do str <- unsafeInterleaveIO myGetContents<br />
</haskell><br />
<br />
Second, unsafeInterleaveIO doesn't perform any action immediately; it<br />
only creates a box of type 'a' which, when its value is requested, will<br />
perform the action specified as the parameter.<br />
<br />
Third, this action by itself may compute the whole value immediately<br />
or... use unsafeInterleaveIO again to defer calculation of some<br />
sub-components:<br />
<br />
<haskell><br />
myGetContents = do<br />
   c <- getChar<br />
   s <- unsafeInterleaveIO myGetContents<br />
   return (c:s)<br />
</haskell><br />
<br />
This code will be executed only at the moment when the value of str is<br />
really demanded. At that moment, getChar will be performed (with its<br />
result assigned to c) and one more lazy IO box will be created - for s.<br />
This box again contains a link to the myGetContents call.<br />
<br />
Then a list cell is returned that contains the one char read and a link to<br />
the myGetContents call as a way to compute the rest of the list. Only at the<br />
moment when the next value in the list is required will this operation be<br />
performed again.<br />
<br />
As a final result, we are unable to read the second char in the list before<br />
the first one, but the reading as a whole remains lazy. Bingo!<br />
<br />
<br />
PS: Of course, actual code should include EOF checking. Also note that<br />
you can read many chars/records at each call:<br />
<br />
<haskell><br />
myGetContents = do<br />
   c <- replicateM 512 getChar<br />
   s <- unsafeInterleaveIO myGetContents<br />
   return (c++s)<br />
</haskell><br />
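<br />
Here is one possible sketch of the EOF checking mentioned in the PS. The helper 'lazyUnfold' and the 'demo' action are hypothetical names introduced only for this illustration. The reading action is passed in as a parameter, so the same code can be tried on an in-memory source instead of stdin:<br />
<br />
<haskell><br />
import Data.IORef (modifyIORef, newIORef, readIORef, writeIORef)<br />
import System.IO (isEOF)<br />
import System.IO.Unsafe (unsafeInterleaveIO)<br />
<br />
-- Builds a lazy list from an action that yields Nothing at the end of input.<br />
lazyUnfold :: IO (Maybe a) -> IO [a]<br />
lazyUnfold next = unsafeInterleaveIO $ do<br />
    item <- next<br />
    case item of<br />
        Nothing -> return []<br />
        Just x  -> do rest <- lazyUnfold next   -- deferred again, nothing is read here<br />
                      return (x : rest)<br />
<br />
-- The stdin version: the list simply ends at end of file.<br />
myGetContents :: IO String<br />
myGetContents = lazyUnfold $ do<br />
    eof <- isEOF<br />
    if eof then return Nothing else fmap Just getChar<br />
<br />
-- Counts how many times the source is really asked for an item.<br />
demo :: IO (String, Int)<br />
demo = do<br />
    source   <- newIORef "abcdef"<br />
    countVar <- newIORef (0 :: Int)<br />
    let next = do modifyIORef countVar (+1)<br />
                  s <- readIORef source<br />
                  case s of<br />
                      []     -> return Nothing<br />
                      (c:cs) -> writeIORef source cs >> return (Just c)<br />
    str <- lazyUnfold next<br />
    let prefix = take 2 str<br />
    n <- length prefix `seq` readIORef countVar   -- force two list cells, then count<br />
    return (prefix, n)<br />
</haskell><br />
<br />
In 'demo' only two list cells are ever demanded, so the source is consulted only twice even though six characters are available.<br />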
<br />
== Welcome to the machine: the actual [[GHC]] implementation ==<br />
<br />
A little disclaimer: I should say that I'm not describing<br />
here exactly what a monad is (I don't even completely understand it myself) and my explanation shows only one ''possible'' way to implement the IO monad in<br />
Haskell. For example, the hbc Haskell compiler implements the IO monad via<br />
continuations. I also haven't said anything about exception handling,<br />
which is a natural part of the "monad" concept. You can read the "All About<br />
Monads" guide to learn more about these topics.<br />
<br />
But there is some good news: first, the IO monad understanding you've just acquired will work with any implementation and with many other monads. You just can't work with RealWorld<br />
values directly.<br />
<br />
Second, the IO monad implementation described here is really used in the GHC,<br />
yhc/nhc (Hugs/jhc, too?) compilers. Here is the actual IO definition<br />
from the GHC sources:<br />
<br />
<haskell><br />
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))<br />
</haskell><br />
<br />
It uses the "State# RealWorld" type instead of our RealWorld, it uses the "(# #)" unboxed tuple for optimization, and it adds an IO data constructor<br />
around the type. Nevertheless, there are no significant changes from the standpoint of our explanation. Knowing the principle of "chaining" IO actions via fake "state of the world" values, you can now easily understand and write low-level implementations of GHC I/O operations.<br />
<br />
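As a small illustration of such low-level code, here is a sketch of sequencing two actions written directly against this representation. It assumes a GHC whose GHC.Base module exports the IO constructor and that the UnboxedTuples extension is enabled. The names "thenIO'" and 'demo' are made up for this example:<br />
<br />
<haskell><br />
{-# LANGUAGE UnboxedTuples #-}<br />
import Data.IORef (modifyIORef, newIORef, readIORef)<br />
import GHC.Base (IO (..))<br />
<br />
-- Runs the first action, drops its result, and hands the new "world"<br />
-- to the second action.<br />
thenIO' :: IO a -> IO b -> IO b<br />
thenIO' (IO first) (IO second) =<br />
    IO (\world0 -> case first world0 of<br />
                     (# world1, _ #) -> second world1)<br />
<br />
demo :: IO Int<br />
demo = do<br />
    var <- newIORef (0 :: Int)<br />
    modifyIORef var (+ 1) `thenIO'` readIORef var<br />
</haskell><br />
<br />
The 'case' expression plays the role of the 'let' in our earlier definitions: the second action cannot start before the first one has produced its "world".<br />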
<br />
=== The [[Yhc]]/nhc98 implementation ===<br />
<br />
<haskell><br />
data World = World<br />
newtype IO a = IO (World -> Either IOError a)<br />
</haskell><br />
<br />
This implementation makes the "World" disappear somewhat, and returns either a<br />
result of type "a" or, if an error occurs, an "IOError". The World can be left out of the right-hand side of the function only because the compiler knows special things about the IO type, and won't overoptimise it.<br />
<br />
<br />
== Further reading ==<br />
<br />
This tutorial is largely based on Simon Peyton Jones' paper [http://research.microsoft.com/%7Esimonpj/Papers/marktoberdorf Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell]. I hope that my tutorial improves his original explanation of the Haskell I/O system and brings it closer to the point of view of beginning Haskell programmers. But if you need to learn about concurrency, exceptions and FFI in Haskell/GHC, the original paper is the best source of information.<br />
<br />
You can find more information about concurrency, FFI and STM at the [[GHC/Concurrency#Starting points]] page.<br />
<br />
The [[Arrays]] page contains exhaustive explanations about using mutable arrays.<br />
<br />
Look also at the [[Books and tutorials#Using Monads]] page, which contains tutorials and papers really describing these mysterious monads :)<br />
<br />
An explanation of the basic monad functions, with examples, can be found in the reference guide [http://members.chello.nl/hjgtuyl/tourdemonad.html A tour of the Haskell Monad functions], by Henk-Jan van Tuyl.<br />
<br />
Do you have more questions? Ask in the [http://www.haskell.org/mailman/listinfo/haskell-cafe haskell-cafe mailing list].<br />
<br />
<br />
== To-do list ==<br />
<br />
If you are interested in adding more information to this manual, please add your questions/topics here.<br />
<br />
Topics:<br />
* fixIO and 'mdo'<br />
* ST monad<br />
* Q monad<br />
<br />
Questions:<br />
* split '>>='/'>>'/return section and 'do' section, more examples of using binding operators<br />
* IORef detailed explanation (==const*), usage examples, syntax sugar, unboxed refs<br />
* control structures developing - much more examples<br />
* unsafePerformIO usage examples: global variable, ByteString, other examples<br />
* actual GHC implementation - how to write low-level routines on example of newIORef implementation<br />
<br />
This manual is a collective work, so feel free to add more information to it yourself. The final goal is to collectively develop a comprehensive manual for using the IO monad.<br />
<br />
----<br />
<br />
[[Category:Tutorials]]</div>Virhttps://wiki.haskell.org/index.php?title=IO_inside&diff=15433IO inside2007-09-06T11:53:19Z<p>Vir: </p>
<hr />
<div>Haskell I/O has always been a source of confusion and surprises for new Haskellers. While simple I/O code in Haskell looks very similar to its equivalents in imperative languages, attempts to write somewhat more complex code often result in a total mess. This is because Haskell I/O is really very different internally. Haskell is a pure language and even the I/O system can't break this purity.<br />
<br />
The following text is an attempt to explain the details of Haskell I/O implementations. This explanation should help you eventually master all the smart I/O tricks. Moreover, I've added a detailed explanation of various traps you might encounter along the way. After reading this text, you will receive a "Master of Haskell I/O" degree that is equal to a Bachelor in Computer Science and Mathematics, simultaneously :)<br />
<br />
If you are new to Haskell I/O you may prefer to start by reading the [[Introduction to IO]] page.<br />
<br />
<br />
== Haskell is a pure language ==<br />
<br />
Haskell is a pure language, which means that the result of any function call is fully determined by its arguments. Pseudo-functions like rand() or getchar() in C, which return different results on each call, are simply impossible to write in Haskell. Moreover, Haskell functions can't have side effects, which means that they can't effect any changes to the "real world", like changing files, writing to the screen, printing, sending data over the network, and so on. These two restrictions together mean that any function<br />
call can be omitted, repeated, or replaced by the result of a previous call with the same parameters, and the language '''guarantees''' that all these rearrangements will not change the program result!<br />
<br />
Let's compare this to C: optimizing C compilers try to guess which functions have no side effects and don't depend on mutable global variables. If this guess is wrong, an optimization can change the program's semantics! To avoid this kind of disaster, C optimizers are conservative in their guesses or require hints from the programmer about the purity of functions.<br />
<br />
Compared to an optimizing C compiler, a Haskell compiler is a set of pure mathematical transformations. This results in much better high-level optimization facilities. Moreover, pure mathematical computations can be much more easily divided into several threads that may be executed in parallel, which is increasingly important in these days of multi-core CPUs. Finally, pure computations are less error-prone and easier to verify, which adds to Haskell's robustness and to the speed of program development using Haskell.<br />
<br />
Haskell's purity allows the compiler to call only those functions whose results<br />
are really required to calculate the final value of a high-level function<br />
(i.e., main) - this is called lazy evaluation. It's a great thing for<br />
pure mathematical computations, but how about I/O actions? A function<br />
like (<hask>putStrLn "Press any key to begin formatting"</hask>) can't return any<br />
meaningful result value, so how can we ensure that the compiler will not<br />
omit or reorder its execution? And in general: how can we work with<br />
stateful algorithms and side effects in an entirely lazy language?<br />
This question has had many different solutions proposed in 18 years of<br />
Haskell development (see [[History of Haskell]]), though a solution based on '''''monads''''' is now<br />
the standard.<br />
<br />
<br />
<br />
== What is a monad? ==<br />
<br />
What is a monad? It's something from mathematical category theory, which I<br />
don't know anymore :) In order to understand how monads are used to<br />
solve the problem of I/O and side effects, you don't need to know it. It's<br />
enough to just know elementary mathematics, like I do :)<br />
<br />
Let's imagine that we want to implement in Haskell the well-known<br />
'getchar' function. What type should it have? Let's try:<br />
<br />
<haskell><br />
getchar :: Char<br />
<br />
get2chars = [getchar,getchar]<br />
</haskell><br />
<br />
What will we get with 'getchar' having just the 'Char' type? You can see<br />
all the possible problems in the definition of 'get2chars':<br />
<br />
# Because the Haskell compiler treats all functions as pure (not having side effects), it can avoid "excessive" calls to 'getchar' and use one returned value twice.<br />
# Even if it does make two calls, there is no way to determine which call should be performed first. Do you want to return the two chars in the order in which they were read, or in the opposite order? Nothing in the definition of 'get2chars' answers this question.<br />
<br />
How can these problems be solved, from the programmer's viewpoint?<br />
Let's introduce a fake parameter of 'getchar' to make each call<br />
"different" from the compiler's point of view:<br />
<br />
<haskell><br />
getchar :: Int -> Char<br />
<br />
get2chars = [getchar 1, getchar 2]<br />
</haskell><br />
<br />
Right away, this solves the first problem mentioned above - now the<br />
compiler will make two calls because it sees them as having different<br />
parameters. The whole 'get2chars' function should also have a<br />
fake parameter, otherwise we will have the same problem calling it:<br />
<br />
<haskell><br />
getchar :: Int -> Char<br />
get2chars :: Int -> String<br />
<br />
get2chars _ = [getchar 1, getchar 2]<br />
</haskell><br />
<br />
<br />
Now we need to give the compiler some clue to determine which function it<br />
should call first. The Haskell language doesn't provide any way to express<br />
order of evaluation... except for data dependencies! How about adding an<br />
artificial data dependency which prevents evaluation of the second<br />
'getchar' before the first one? In order to achieve this, we will<br />
return an additional fake result from 'getchar' that will be used as a<br />
parameter for the next 'getchar' call:<br />
<br />
<haskell><br />
getchar :: Int -> (Char, Int)<br />
<br />
get2chars _ = [a,b] where (a,i) = getchar 1<br />
                          (b,_) = getchar i<br />
</haskell><br />
<br />
So far so good - now we can guarantee that 'a' is read before 'b'<br />
because reading 'b' needs the value ('i') that is returned by reading 'a'!<br />
<br />
We've added a fake parameter to 'get2chars' but the problem is that the<br />
Haskell compiler is too smart! It can believe that the external 'getchar'<br />
function is really dependent on its parameter but for 'get2chars' it<br />
will see that we're just cheating because we throw it away! Therefore it won't feel obliged to execute the calls in the order we want. How can we fix this? How about passing this fake parameter to the 'getchar' function?! In this case<br />
the compiler can't guess that it is really unused :)<br />
<br />
<haskell><br />
get2chars i0 = [a,b] where (a,i1) = getchar i0<br />
                           (b,i2) = getchar i1<br />
</haskell><br />
<br />
<br />
And more - 'get2chars' has all the same purity problems as the 'getchar'<br />
function. If you need to call it two times, you need a way to describe<br />
the order of these calls. Look at:<br />
<br />
<haskell><br />
get4chars = [get2chars 1, get2chars 2] -- order of 'get2chars' calls isn't defined<br />
</haskell><br />
<br />
We already know how to deal with these problems - 'get2chars' should<br />
also return some fake value that can be used to order calls:<br />
<br />
<haskell><br />
get2chars :: Int -> (String, Int)<br />
<br />
get4chars i0 = (a++b) where (a,i1) = get2chars i0<br />
                            (b,i2) = get2chars i1<br />
</haskell><br />
<br />
<br />
But what's the fake value 'get2chars' should return? If we use some integer constant, the excessively-smart Haskell compiler will guess that we're cheating again :) What about returning the value returned by 'getchar'? See:<br />
<br />
<haskell><br />
get2chars :: Int -> (String, Int)<br />
get2chars i0 = ([a,b], i2) where (a,i1) = getchar i0<br />
                                 (b,i2) = getchar i1<br />
</haskell><br />
<br />
Believe it or not, but we've just constructed the whole "monadic"<br />
Haskell I/O system.<br />
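<br />
To play with this construction in a real compiler, the fake Int can be replaced by something that actually carries data. The sketch below is only a pure model: it assumes that the "world" is simply the string of characters not yet read, and the '?' placeholder returned on empty input is also an assumption of this sketch:<br />
<br />
<haskell><br />
type World = String   -- assumption: the "world" is just the unread input<br />
<br />
getchar :: World -> (Char, World)<br />
getchar (c:rest) = (c, rest)<br />
getchar []       = ('?', [])    -- placeholder for "no more input"<br />
<br />
get2chars :: World -> (String, World)<br />
get2chars w0 = ([a,b], w2) where (a,w1) = getchar w0<br />
                                 (b,w2) = getchar w1<br />
<br />
get4chars :: World -> (String, World)<br />
get4chars w0 = (a++b, w2) where (a,w1) = get2chars w0<br />
                                (b,w2) = get2chars w1<br />
</haskell><br />
<br />
In this model, get4chars "abcdef" evaluates to ("abcd","ef"): the characters come out in reading order purely because of the data dependencies between the intermediate worlds.<br />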
<br />
<br />
<br />
== Welcome to the RealWorld, baby :) ==<br />
<br />
The 'main' Haskell function has the type:<br />
<br />
<haskell><br />
main :: RealWorld -> ((), RealWorld)<br />
</haskell><br />
<br />
where 'RealWorld' is a fake type used instead of our Int. It's something<br />
like the baton passed in a relay race. When 'main' calls some IO function,<br />
it passes the "RealWorld" it received as a parameter. All IO functions have<br />
similar types involving RealWorld as a parameter and result. To be<br />
exact, "IO" is a type synonym defined in the following way:<br />
<br />
<haskell><br />
type IO a = RealWorld -> (a, RealWorld)<br />
</haskell><br />
<br />
So, 'main' just has type "IO ()", 'getChar' has type "IO Char" and so<br />
on. You can think of the type "IO Char" as meaning "take the current RealWorld, do something to it, and return a Char and a (possibly changed) RealWorld". Let's look at 'main' calling 'getChar' two times:<br />
<br />
<haskell><br />
getChar :: RealWorld -> (Char, RealWorld)<br />
<br />
main :: RealWorld -> ((), RealWorld)<br />
main world0 = let (a, world1) = getChar world0<br />
                  (b, world2) = getChar world1<br />
              in ((), world2)<br />
</haskell><br />
<br />
<br />
Look at this closely: 'main' passes to the first 'getChar' the "world" it<br />
received. This 'getChar' returns some new value of type RealWorld<br />
that gets used in the next call. Finally, 'main' returns the "world" it got<br />
from the second 'getChar'.<br />
<br />
# Is it possible here to omit any call of 'getChar' if the Char it read is not used? No, because we need to return the "world" that is the result of the second 'getChar' and this in turn requires the "world" returned from the first 'getChar'.<br />
# Is it possible to reorder the 'getChar' calls? No: the second 'getChar' can't be called before the first one because it uses the "world" returned from the first call.<br />
# Is it possible to duplicate calls? In Haskell semantics - yes, but real compilers never duplicate work in such simple cases (otherwise, the programs generated will not have any speed guarantees).<br />
<br />
<br />
As we already said, RealWorld values are used like a baton which gets passed<br />
between all routines called by 'main' in strict order. Inside each<br />
routine called, RealWorld values are used in the same way. Overall, in<br />
order to "compute" the world to be returned from 'main', we should perform<br />
each IO procedure that is called from 'main', directly or indirectly.<br />
This means that each procedure inserted in the chain will be performed<br />
just at the moment (relative to the other IO actions) when we intended it<br />
to be called. Let's consider the following program:<br />
<br />
<haskell><br />
main = do a <- ask "What is your name?"<br />
          b <- ask "How old are you?"<br />
          return ()<br />
<br />
ask s = do putStr s<br />
           readLn<br />
</haskell><br />
<br />
Now you have enough knowledge to rewrite it in a low-level way and<br />
check that each operation that should be performed will really be<br />
performed with the arguments it should have and in the order we expect.<br />
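<br />
One way to check such a rewrite mechanically is to run it against a toy model of the world. Everything in the following sketch is an assumption made for illustration: the World record with pending input lines and an output log, the Action synonym standing in for IO, and the W-suffixed names. For simplicity the model returns each input line as a plain String instead of parsing it as 'readLn' does:<br />
<br />
<haskell><br />
data World = World { input :: [String], output :: [String] } deriving (Eq, Show)<br />
<br />
type Action a = World -> (a, World)<br />
<br />
putStrW :: String -> Action ()<br />
putStrW s w = ((), w { output = output w ++ [s] })<br />
<br />
getLineW :: Action String<br />
getLineW w = case input w of<br />
    (l:ls) -> (l, w { input = ls })<br />
    []     -> ("", w)<br />
<br />
ask :: String -> Action String<br />
ask s world0 = let ((), world1)     = putStrW s world0<br />
                   (answer, world2) = getLineW world1<br />
               in (answer, world2)<br />
<br />
mainW :: Action ()<br />
mainW world0 = let (_a, world1) = ask "What is your name?" world0<br />
                   (_b, world2) = ask "How old are you?" world1<br />
               in ((), world2)<br />
</haskell><br />
<br />
Applying 'mainW' to a world with two pending input lines yields a final world whose output log contains both questions in the expected order and whose input is empty, even though the answers themselves are never used.<br />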
<br />
<br />
But what about conditional execution? No problem. Let's define the<br />
well-known 'when' operation:<br />
<br />
<haskell><br />
when :: Bool -> IO () -> IO ()<br />
when condition action world =<br />
    if condition<br />
      then action world<br />
      else ((), world)<br />
</haskell><br />
<br />
As you can see, we can easily include or exclude from the execution chain<br />
IO procedures (actions) depending on the data values. If 'condition'<br />
is False when 'when' is called, 'action' will never be called because<br />
real Haskell compilers, again, never call functions whose results<br />
are not required to calculate the final result (''i.e.'', here, the final "world" value of 'main').<br />
<br />
Loops and more complex control structures can be implemented in<br />
the same way. Try it as an exercise!<br />
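<br />
If you get stuck, here is one possible solution for a simple counted loop, written against a toy model rather than the real IO type. The World type (a log of strings), the Action synonym and the names 'say', 'whenW' and 'repeatW' are all assumptions of this sketch:<br />
<br />
<haskell><br />
type World    = [String]              -- assumption: a log of performed outputs<br />
type Action a = World -> (a, World)<br />
<br />
say :: String -> Action ()<br />
say s world = ((), world ++ [s])<br />
<br />
whenW :: Bool -> Action () -> Action ()<br />
whenW condition action world =<br />
    if condition<br />
      then action world<br />
      else ((), world)<br />
<br />
-- Performs the action n times, threading the world through every iteration.<br />
repeatW :: Int -> Action () -> Action ()<br />
repeatW n action world0<br />
    | n <= 0    = ((), world0)<br />
    | otherwise = let ((), world1) = action world0<br />
                  in repeatW (n - 1) action world1<br />
</haskell><br />
<br />
Each iteration receives the world produced by the previous one, so the iterations can be neither reordered nor skipped.<br />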
<br />
<br />
Finally, you may want to know how much passing these RealWorld<br />
values around the program costs. It's free! These fake values exist solely for the compiler while it analyzes and optimizes the code, but when it gets to assembly code generation, it "suddenly" realizes that this type is like "()", so<br />
all these parameters and result values can be omitted from the final generated code. Isn't it beautiful? :)<br />
<br />
<br />
<br />
== '>>=' and 'do' notation ==<br />
<br />
All beginners (including me :)) start by thinking that 'do' is some<br />
magic statement that executes IO actions. That's wrong - 'do' is just<br />
syntactic sugar that simplifies the writing of procedures that use IO (and also other monads, but that's beyond the scope of this tutorial). 'do' notation eventually gets translated to statements passing "world" values around like we've manually written above and is used to simplify the gluing of several<br />
IO actions together. You don't need to use 'do' for just one statement; for instance,<br />
<br />
<haskell><br />
main = do putStr "Hello!"<br />
</haskell><br />
<br />
is desugared to:<br />
<br />
<haskell><br />
main = putStr "Hello!"<br />
</haskell><br />
<br />
But nevertheless it's considered Good Style to use 'do' even for one statement<br />
because it simplifies adding new statements in the future.<br />
<br />
<br />
Let's examine how to desugar a 'do' with multiple statements in the<br />
following example: <br />
<br />
<haskell><br />
main = do putStr "What is your name?"<br />
          putStr "How old are you?"<br />
          putStr "Nice day!"<br />
</haskell><br />
<br />
The 'do' statement here just joins several IO actions that should be<br />
performed sequentially. It's translated to sequential applications<br />
of one of the so-called "binding operators", namely '>>':<br />
<br />
<haskell><br />
main = (putStr "What is your name?")<br />
       >> ( (putStr "How old are you?")<br />
            >> (putStr "Nice day!")<br />
          )<br />
</haskell><br />
<br />
This binding operator just combines two IO actions, executing them<br />
sequentially by passing the "world" between them:<br />
<br />
<haskell><br />
(>>) :: IO a -> IO b -> IO b<br />
(action1 >> action2) world0 =<br />
    let (a, world1) = action1 world0<br />
        (b, world2) = action2 world1<br />
    in (b, world2)<br />
</haskell><br />
<br />
If defining operators this way looks strange to you, read this<br />
definition as follows:<br />
<br />
<haskell><br />
action1 >> action2 = action<br />
  where<br />
    action world0 = let (a, world1) = action1 world0<br />
                        (b, world2) = action2 world1<br />
                    in (b, world2)<br />
</haskell><br />
<br />
Now you can substitute the definition of '>>' at the places of its usage<br />
and check that the program constructed by the 'do' desugaring is actually the<br />
same as we could write by manually manipulating "world" values.<br />
<br />
<br />
A more complex example involves the binding of variables using "<-":<br />
<br />
<haskell><br />
main = do a <- readLn<br />
          print a<br />
</haskell><br />
<br />
This code is desugared into:<br />
<br />
<haskell><br />
main = readLn<br />
       >>= (\a -> print a)<br />
</haskell><br />
<br />
As you should remember, the '>>' binding operator silently ignores<br />
the value of its first action and returns as an overall result<br />
the result of its second action only. On the other hand, the '>>=' binding operator (note the extra '=' at the end) allows us to use the result of its first action - it gets passed as an additional parameter to the second one! Look at the definition:<br />
<br />
<haskell><br />
(>>=) :: IO a -> (a -> IO b) -> IO b<br />
(action1 >>= action2) world0 =<br />
    let (a, world1) = action1 world0<br />
        (b, world2) = action2 a world1<br />
    in (b, world2)<br />
</haskell><br />
<br />
First, what does the type of the second "action" (more precisely, a function which returns an IO action), namely "a -> IO b", mean? By<br />
substituting the "IO" definition, we get "a -> RealWorld -> (b, RealWorld)".<br />
This means that the second action actually has two parameters<br />
- the value of type 'a' actually used inside it, and the value of type RealWorld used for sequencing of IO actions. That's always the case - any IO procedure has one<br />
more parameter compared to what you see in its type signature. This<br />
parameter is hidden inside the definition of the type alias "IO".<br />
<br />
Second, you can use these '>>' and '>>=' operations to simplify your<br />
program. For example, in the code above we don't need to introduce the<br />
variable, because the result of 'readLn' can be sent directly to 'print':<br />
<br />
<haskell><br />
main = readLn >>= print<br />
</haskell><br />
<br />
<br />
And third - as you see, the notation:<br />
<br />
<haskell><br />
do x <- action1<br />
   action2<br />
</haskell><br />
<br />
where 'action1' has type "IO a" and 'action2' has type "IO b",<br />
translates into:<br />
<br />
<haskell><br />
action1 >>= (\x -> action2)<br />
</haskell><br />
<br />
where the second argument of '>>=' has the type "a -> IO b". It's the way<br />
the '<-' binding is processed - the name on the left-hand side of '<-' just becomes a parameter of subsequent operations represented as one large IO action. Note also that if 'action1' has type "IO a" then 'x' will just have type "a"; you can think of the effect of '<-' as "unpacking" the IO value of 'action1' into 'x'. Note also that '<-' is not a true operator; it's pure syntax, just like 'do' itself. Its meaning results only from the way it gets desugared.<br />
<br />
Look at the next example: <br />
<br />
<haskell><br />
main = do putStr "What is your name?"<br />
          a <- readLn<br />
          putStr "How old are you?"<br />
          b <- readLn<br />
          print (a,b)<br />
</haskell><br />
<br />
This code is desugared into:<br />
<br />
<haskell><br />
main = putStr "What is your name?"<br />
       >> readLn<br />
       >>= \a -> putStr "How old are you?"<br />
       >> readLn<br />
       >>= \b -> print (a,b)<br />
</haskell><br />
<br />
I omitted the parentheses here; both the '>>' and the '>>=' operators are<br />
left-associative, but lambda-bindings always stretch as far to the right as possible, which means that the 'a' and 'b' bindings introduced<br />
here are valid for all remaining actions. As an exercise, add the<br />
parentheses yourself and translate this procedure into the low-level<br />
code that explicitly passes "world" values. I think it should be enough to help you finally realize how the 'do' translation and binding operators work.<br />
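<br />
For comparison, here is one way the parentheses fall out, as a sketch. The action is named 'dialogue' (a made-up name) rather than 'main', and the type annotations on 'a' and 'b' are an addition so that 'readLn' knows what to parse:<br />
<br />
```haskell
-- '>>' and '>>=' share the same precedence and associate to the left,
-- so "p >> r >>= f" groups as "(p >> r) >>= f". Each lambda body then
-- extends to the end of the expression, which keeps 'a' in scope for 'print'.
dialogue :: IO ()
dialogue =
  (putStr "What is your name?" >> readLn)
    >>= (\a -> (putStr "How old are you?" >> readLn)
                 >>= (\b -> print (a :: String, b :: Int)))
```
<br />
Because 'a' is read with 'readLn' at type String here, the name has to be typed in with quotes, like any other Haskell string literal.<br />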
<br />
<br />
Oh, no! I forgot the third monadic operator - 'return'. It just<br />
combines its two parameters - the value passed and "world":<br />
<br />
<haskell><br />
return :: a -> IO a<br />
return a world0 = (a, world0)<br />
</haskell><br />
<br />
How about translating a simple example of 'return' usage? Say,<br />
<br />
<haskell><br />
main = do a <- readLn<br />
          return (a*2)<br />
</haskell><br />
<br />
<br />
Programmers with an imperative language background often think that<br />
'return' in Haskell, as in other languages, immediately returns from<br />
the IO procedure. As you can see in its definition (and even just from its<br />
type!), such an assumption is totally wrong. The only purpose of using<br />
'return' is to "lift" some value (of type 'a') into the result of<br />
a whole action (of type "IO a") and therefore it should generally be used only as the last executed statement of some IO sequence. For example, try to<br />
translate the following procedure into the corresponding low-level code:<br />
<br />
<haskell><br />
main = do a <- readLn<br />
          when (a>=0) $ do<br />
              return ()<br />
          print "a is negative"<br />
</haskell><br />
<br />
and you will realize that the 'print' statement is executed even for non-negative values of 'a'. If you need to escape from the middle of an IO procedure, you can use the 'if' statement:<br />
<br />
<haskell><br />
main = do a <- readLn<br />
          if (a>=0)<br />
            then return ()<br />
            else print "a is negative"<br />
</haskell><br />
<br />
Moreover, Haskell layout rules allow us to use the following layout:<br />
<br />
<haskell><br />
main = do a <- readLn<br />
          if (a>=0) then return ()<br />
            else do<br />
          print "a is negative"<br />
          ...<br />
</haskell><br />
<br />
that may be useful for escaping from the middle of a longish 'do' statement.<br />
<br />
<br />
Last exercise: implement a function 'liftM' that lifts operations on<br />
plain values to the operations on monadic ones. Its type signature:<br />
<br />
<haskell><br />
liftM :: (a -> b) -> (IO a -> IO b)<br />
</haskell><br />
<br />
If that's too hard for you, start with the following high-level<br />
definition and rewrite it in low-level fashion:<br />
<br />
<haskell><br />
liftM f action = do x <- action<br />
                    return (f x)<br />
</haskell><br />
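<br />
A possible low-level answer is sketched below. To keep it runnable, the "world" is modelled as a plain counter; 'World', 'Action', 'liftM'' and 'tick' are names invented for this sketch and are not part of any library:<br />
<br />
```haskell
-- Toy model (an assumption for illustration): the "world" is just a counter
-- of how many actions have run so far.
type World = Int
type Action a = World -> (a, World)

-- Low-level liftM: run the action, then apply f to its result.
-- f never sees the world; the world is simply passed along.
liftM' :: (a -> b) -> (Action a -> Action b)
liftM' f action world0 =
  let (x, world1) = action world0
  in (f x, world1)

-- A sample action: returns the current counter and advances the world.
tick :: Action Int
tick world = (world, world + 1)
```
<br />
With these definitions, 'liftM' (*2) tick 5' evaluates to '(10, 6)': the result of 'tick' is doubled, while the world advances exactly as it would without 'liftM''.<br />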
<br />
<br />
<br />
== Mutable data (references, arrays, hash tables...) ==<br />
<br />
As you should know, every name in Haskell is bound to one fixed (immutable) value. This greatly simplifies understanding algorithms and code optimization, but it's inappropriate in some cases. As we all know, there are plenty of algorithms that are simpler to implement in terms of updatable<br />
variables, arrays and so on. This means that the value associated with<br />
a variable, for example, can be different at different execution points,<br />
so reading its value can't be considered as a pure function. Imagine,<br />
for example, the following code:<br />
<br />
<haskell><br />
main = do let a0 = readVariable varA<br />
              _  = writeVariable varA 1<br />
              a1 = readVariable varA<br />
          print (a0, a1)<br />
</haskell><br />
<br />
Does this look strange? First, the two calls to 'readVariable' look the same, so the compiler can just reuse the value returned by the first call. Second,<br />
the result of the 'writeVariable' call isn't used so the compiler can (and will!) omit this call completely. To complete the picture, these three calls may be rearranged in any order because they appear to be independent of each<br />
other. This is obviously not what was intended. What's the solution? You already know this - use IO actions! Using IO actions guarantees that:<br />
<br />
# the execution order will be retained as written<br />
# each action will have to be executed<br />
# the result of the "same" action (such as "readVariable varA") will not be reused<br />
<br />
So, the code above really should be written as:<br />
<br />
<haskell><br />
import Data.IORef<br />
main = do varA <- newIORef 0  -- Create and initialize a new variable<br />
          a0 <- readIORef varA<br />
          writeIORef varA 1<br />
          a1 <- readIORef varA<br />
          print (a0, a1)<br />
</haskell><br />
<br />
Here, 'varA' has the type "IORef Int" which means "a variable (reference) in<br />
the IO monad holding a value of type Int". newIORef creates a new variable<br />
(reference) and returns it, and then read/write actions use this<br />
reference. The value returned by the "readIORef varA" action depends not<br />
only on the variable involved but also on the moment this operation is performed so it can return different values on each call.<br />
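<br />
As a further sketch of the same idea, here is an imperative-style accumulation loop. 'sumWithRef' is a made-up helper; 'newIORef', 'modifyIORef' and 'readIORef' come from Data.IORef:<br />
<br />
```haskell
import Data.IORef (newIORef, modifyIORef, readIORef)

-- Sum a list the imperative way: one mutable accumulator, updated per element.
sumWithRef :: [Int] -> IO Int
sumWithRef xs = do
  acc <- newIORef 0
  mapM_ (\x -> modifyIORef acc (+ x)) xs   -- each update is a separate IO action
  readIORef acc                             -- read the final value
```
<br />
Running 'sumWithRef [1,2,3,4]' inside 'main' yields 10. The reference 'acc' never leaves the procedure, so callers only see an ordinary "IO Int" action.<br />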
<br />
Arrays, hash tables and any other ''mutable'' data structures are<br />
defined in the same way - for each of them, there's an operation that creates a new "mutable value" and returns a reference to it. Then special read and write<br />
operations in the IO monad are used. The following code shows an example<br />
using mutable arrays:<br />
<br />
<haskell><br />
import Data.Array.IO<br />
main = do arr <- newArray (1,10) 37 :: IO (IOArray Int Int)<br />
          a <- readArray arr 1<br />
          writeArray arr 1 64<br />
          b <- readArray arr 1<br />
          print (a, b)<br />
</haskell><br />
<br />
Here, an array of 10 elements with 37 as the initial value at each location is created. After reading the value of the first element (index 1) into 'a' this element's value is changed to 64 and then read again into 'b'. As you can see by executing this code, 'a' will be set to 37 and 'b' to 64.<br />
<br />
<br />
<br />
Other state-dependent operations are also often implemented as IO<br />
actions. For example, a random number generator should return a different<br />
value on each call. It looks natural to give it a type involving IO:<br />
<br />
<haskell><br />
rand :: IO Int<br />
</haskell><br />
<br />
Moreover, when you import C routines you should be careful - if this<br />
routine is impure, i.e. its result depends on something in the "real<br />
world" (file system, memory contents...), internal state and so on,<br />
you should give it an IO type. Otherwise, the compiler can<br />
"optimize" repetitive calls of this procedure with the same parameters! :)<br />
<br />
For example, we can write a non-IO type for:<br />
<br />
<haskell><br />
foreign import ccall<br />
  sin :: Double -> Double<br />
</haskell><br />
<br />
because the result of 'sin' depends only on its argument, but 'tell' must be given an IO type because the position it reports can change between calls:<br />
<br />
<haskell><br />
foreign import ccall<br />
  tell :: Int -> IO Int<br />
</haskell><br />
<br />
If you declare 'tell' as a pure function (without IO) then you may<br />
get the same position on each call! :)<br />
<br />
== IO actions as values ==<br />
<br />
By this point you should understand why it's impossible to use IO<br />
actions inside non-IO (pure) procedures. Such procedures just don't<br />
get a "baton"; they don't know any "world" value to pass to an IO action.<br />
The RealWorld type is an abstract datatype, so pure functions also can't construct RealWorld values by themselves, and it's a strict type, so 'undefined' also can't be used. So, the prohibition of using IO actions inside pure procedures is just a type system trick (as it usually is in Haskell :)).<br />
<br />
But while pure code can't ''execute'' IO actions, it can work with them<br />
as with any other functional values - they can be stored in data<br />
structures, passed as parameters, returned as results, collected in<br />
lists, and partially applied. But an IO action will remain a<br />
functional value because we can't apply it to the last argument - of<br />
type RealWorld.<br />
<br />
In order to ''execute'' the IO action we need to apply it to some<br />
RealWorld value. That can be done only inside some IO procedure,<br />
in its "actions chain". And real execution of this action will take<br />
place only when this procedure is called as part of the process of<br />
"calculating the final value of world" for 'main'. Look at this example:<br />
<br />
<haskell><br />
main world0 = let get2chars = getChar >> getChar<br />
                  ((), world1) = putStr "Press two keys" world0<br />
                  (answer, world2) = get2chars world1<br />
              in ((), world2)<br />
</haskell><br />
<br />
Here we first bind a value to 'get2chars' and then write a binding<br />
involving 'putStr'. But what's the execution order? It's not defined<br />
by the order of the 'let' bindings, it's defined by the order of processing<br />
"world" values! You can arbitrarily reorder the binding statements - the execution order will be defined by the data dependency with respect to the <br />
"world" values that get passed around. Let's see what this 'main' looks like in the 'do' notation:<br />
<br />
<haskell><br />
main = do let get2chars = getChar >> getChar<br />
          putStr "Press two keys"<br />
          get2chars<br />
          return ()<br />
</haskell><br />
<br />
As you can see, we've eliminated two of the 'let' bindings and left only the one defining 'get2chars'. The non-'let' statements are executed in the exact order in which they're written, because they pass the "world" value from statement to statement as we described above. Thus, this version of the function is much easier to understand because we don't have to mentally figure out the data dependency of the "world" value.<br />
<br />
Moreover, IO actions like 'get2chars' can't be executed directly<br />
because they are functions with a RealWorld parameter. To execute them,<br />
we need to supply the RealWorld parameter, i.e. insert them in the 'main'<br />
chain, placing them in some 'do' sequence executed from 'main' (either directly in the 'main' function, or indirectly in an IO function called from 'main'). Until that's done, they will remain like any function, in partially<br />
evaluated form. And we can work with IO actions as with any other<br />
functions - bind them to names (as we did above), save them in data<br />
structures, pass them as function parameters and return them as results - and<br />
they won't be performed until you give them the magic RealWorld<br />
parameter!<br />
<br />
<br />
<br />
=== Example: a list of IO actions ===<br />
<br />
Let's try defining a list of IO actions:<br />
<br />
<haskell><br />
ioActions :: [IO ()]<br />
ioActions = [(print "Hello!"),<br />
             (putStr "just kidding"),<br />
             (getChar >> return ())<br />
            ]<br />
</haskell><br />
<br />
I used additional parentheses around each action, although they aren't really required. If you still can't believe that these actions won't be executed immediately, just recall the real type of this list:<br />
<br />
<haskell><br />
ioActions :: [RealWorld -> ((), RealWorld)]<br />
</haskell><br />
<br />
Well, now we want to execute some of these actions. No problem, just<br />
insert them into the 'main' chain:<br />
<br />
<haskell><br />
main = do head ioActions<br />
          ioActions !! 1<br />
          last ioActions<br />
</haskell><br />
<br />
Looks strange, right? :) Really, any IO action that you write in a 'do'<br />
statement (or use as a parameter for the '>>'/'>>=' operators) is an expression<br />
returning a result of type 'IO a' for some type 'a'. Typically, you use some function that has the type 'x -> y -> ... -> IO a' and provide all the x, y, etc. parameters. But you're not limited to this standard scenario -<br />
don't forget that Haskell is a functional language and you're free to<br />
compute the functional value required (recall that "IO a" is really a function<br />
type) in any possible way. Here we just extracted several functions<br />
from the list - no problem. This functional value can also be<br />
constructed on-the-fly, as we've done in the previous example - that's also<br />
OK. Want to see this functional value passed as a parameter?<br />
Just look at the definition of 'when'. Hey, we can buy, sell, and rent<br />
these IO actions just like we can with any other functional values! For example, let's define a function that executes all the IO actions in the list:<br />
<br />
<haskell><br />
sequence_ :: [IO a] -> IO ()<br />
sequence_ [] = return ()<br />
sequence_ (x:xs) = do x<br />
                      sequence_ xs<br />
</haskell><br />
<br />
No black magic - we just extract IO actions from the list and insert<br />
them into a chain of IO operations that should be performed one after another (in the same order that they occurred in the list) to "compute the final world value" of the entire 'sequence_' call.<br />
<br />
With the help of 'sequence_', we can rewrite our last 'main' function as:<br />
<br />
<haskell><br />
main = sequence_ ioActions<br />
</haskell><br />
<br />
<br />
Haskell's ability to work with IO actions as with any other<br />
(functional and non-functional) values allows us to define control<br />
structures of arbitrary complexity. Try, for example, to define a control<br />
structure that repeats an action until it returns the 'False' result:<br />
<br />
<haskell><br />
while :: IO Bool -> IO ()<br />
while action = ???<br />
</haskell><br />
<br />
Most programming languages don't allow you to define control structures at all, and those that do often require you to use a macro-expansion system. In Haskell, control structures are just trivial functions anyone can write.<br />
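<br />
One possible solution to the 'while' exercise is sketched below. 'countTo' is a made-up helper, added only so that there is a loop body to run:<br />
<br />
```haskell
import Data.IORef (newIORef, modifyIORef, readIORef)

-- Repeat the action until it returns False.
while :: IO Bool -> IO ()
while action = do
  continue <- action
  if continue then while action else return ()

-- Hypothetical demo: increment a counter until it reaches the limit,
-- then report how many times the loop body ran.
countTo :: Int -> IO Int
countTo limit = do
  counter <- newIORef 0
  while $ do
    modifyIORef counter (+ 1)
    n <- readIORef counter
    return (n < limit)
  readIORef counter
```
<br />
Here 'countTo 3' runs the body three times and returns 3. Note that 'while' is an ordinary recursive function: the loop body is passed to it as a value of type "IO Bool".<br />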
<br />
<br />
=== Example: returning an IO action as a result ===<br />
<br />
How about returning an IO action as the result of a function? Well, we've done<br />
this each time we've defined an IO procedure - they all return IO actions<br />
that need a RealWorld value to be performed. While we usually just<br />
execute them as part of a higher-level IO procedure, it's also<br />
possible to just collect them without actual execution:<br />
<br />
<haskell><br />
main = do let a = sequence ioActions<br />
              b = when True (getChar >> return ())<br />
              c = getChar >> getChar<br />
          putStr "These 'let' statements are not executed!"<br />
</haskell><br />
<br />
These assigned IO procedures can be used as parameters to other<br />
procedures, or written to global variables, or processed in some other<br />
way, or just executed later, as we did in the example with 'get2chars'.<br />
<br />
But how about returning a parameterized IO action from an IO procedure? Let's define a procedure that returns the i'th byte from a file represented as a Handle:<br />
<br />
<haskell><br />
readi h i = do hSeek h AbsoluteSeek i<br />
               hGetChar h<br />
</haskell><br />
<br />
So far so good. But how about a procedure that returns the i'th byte of a file<br />
with a given name without reopening it each time?<br />
<br />
<haskell><br />
readfilei :: String -> IO (Integer -> IO Char)<br />
readfilei name = do h <- openFile name ReadMode<br />
                    return (readi h)<br />
</haskell><br />
<br />
As you can see, it's an IO procedure that opens a file and returns...<br />
another IO procedure that will read the specified byte. But we can go<br />
further and include the 'readi' body in 'readfilei':<br />
<br />
<haskell><br />
readfilei name = do h <- openFile name ReadMode<br />
                    let readi h i = do hSeek h AbsoluteSeek i<br />
                                       hGetChar h<br />
                    return (readi h)<br />
</haskell><br />
<br />
That's a little better. But why do we add 'h' as a parameter to 'readi' if it can be obtained from the environment where 'readi' is now defined? An even shorter version is this:<br />
<br />
<haskell><br />
readfilei name = do h <- openFile name ReadMode<br />
                    let readi i = do hSeek h AbsoluteSeek i<br />
                                     hGetChar h<br />
                    return readi<br />
</haskell><br />
<br />
What have we done here? We've built a parameterized IO action involving local<br />
names inside 'readfilei' and returned it as the result. Now it can be<br />
used in the following way:<br />
<br />
<haskell><br />
main = do myfile <- readfilei "test"<br />
          a <- myfile 0<br />
          b <- myfile 1<br />
          print (a,b)<br />
</haskell><br />
<br />
<br />
This way of using IO actions is very typical for Haskell programs - you<br />
just construct one or more IO actions that you need,<br />
with or without parameters, possibly involving the parameters that your<br />
"constructor" received, and return them to the caller. Then these IO actions<br />
can be used in the rest of the program without any knowledge about your<br />
internal implementation strategy. One thing this can be used for is to<br />
partially emulate the OOP (or more precisely, the ADT) programming paradigm.<br />
<br />
<br />
=== Example: a memory allocator generator ===<br />
<br />
As an example, one of my programs has a module which is a memory suballocator. It receives the address and size of a large memory block and returns two<br />
procedures - one to allocate a subblock of a given size and the other to<br />
free the allocated subblock:<br />
<br />
<haskell><br />
memoryAllocator :: Ptr a -> Int -> IO (Int -> IO (Ptr b),<br />
                                       Ptr c -> IO ())<br />
<br />
memoryAllocator buf size = do ......<br />
                              let alloc size = do ...<br />
                                                  ...<br />
                                  free ptr = do ...<br />
                                                ...<br />
                              return (alloc, free)<br />
</haskell><br />
<br />
How is this implemented? 'alloc' and 'free' work with references<br />
created inside the memoryAllocator procedure. Because the creation of these references is a part of the memoryAllocator IO actions chain, a new independent set of references will be created for each memory block for which<br />
memoryAllocator is called:<br />
<br />
<haskell><br />
memoryAllocator buf size = do start <- newIORef buf<br />
                              end <- newIORef (buf `plusPtr` size)<br />
                              ...<br />
</haskell><br />
<br />
These two references are read and written in the 'alloc' and 'free' definitions (we'll implement a very simple memory allocator for this example):<br />
<br />
<haskell><br />
      ...<br />
      let alloc size = do addr <- readIORef start<br />
                          writeIORef start (addr `plusPtr` size)<br />
                          return addr<br />
<br />
      let free ptr = do writeIORef start ptr<br />
</haskell><br />
<br />
What we've defined here is just a pair of closures that use state<br />
available at the moment of their definition. As you can see, it's as<br />
easy as in any other functional language, despite Haskell's lack<br />
of direct support for impure functions.<br />
<br />
The following example uses procedures, returned by memoryAllocator, to<br />
simultaneously allocate/free blocks in two independent memory buffers:<br />
<br />
<haskell><br />
main = do buf1 <- mallocBytes (2^16)<br />
          buf2 <- mallocBytes (2^20)<br />
          (alloc1, free1) <- memoryAllocator buf1 (2^16)<br />
          (alloc2, free2) <- memoryAllocator buf2 (2^20)<br />
          ptr11 <- alloc1 100<br />
          ptr21 <- alloc2 1000<br />
          free1 ptr11<br />
          free2 ptr21<br />
          ptr12 <- alloc1 100<br />
          ptr22 <- alloc2 1000<br />
          return ()<br />
</haskell><br />
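<br />
Putting the fragments together gives something like the following sketch. It departs from the text in three labelled ways: both returned procedures use the same pointer type "Ptr a" (so the code type-checks without casts), the 'end' reference is left out because this simple version never reads it, and the second procedure is called 'release' to avoid clashing with the library's 'free'. 'demo' is a made-up check:<br />
<br />
```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr, plusPtr)

-- A very simple bump allocator over one buffer. No bounds checking is done,
-- so the size argument is accepted but not used in this sketch.
memoryAllocator :: Ptr a -> Int -> IO (Int -> IO (Ptr a), Ptr a -> IO ())
memoryAllocator buf _size = do
  start <- newIORef buf                 -- fresh state for every call
  let alloc n = do addr <- readIORef start
                   writeIORef start (addr `plusPtr` n)
                   return addr
      release ptr = writeIORef start ptr
  return (alloc, release)

-- Hypothetical check: two allocators over separate buffers do not interfere.
demo :: IO Bool
demo = do
  buf1 <- mallocBytes 64 :: IO (Ptr ())
  buf2 <- mallocBytes 64 :: IO (Ptr ())
  (alloc1, free1)  <- memoryAllocator buf1 64
  (alloc2, _free2) <- memoryAllocator buf2 64
  p1 <- alloc1 16
  q1 <- alloc2 8
  p2 <- alloc1 16
  free1 p1                              -- rewinds the first allocator to p1
  p3 <- alloc1 8
  free buf1
  free buf2
  return (p1 == buf1 && p2 == buf1 `plusPtr` 16 && p3 == buf1 && q1 == buf2)
```
<br />
Each call to 'memoryAllocator' runs its own 'newIORef', which is why the two allocators in 'demo' keep separate positions.<br />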
<br />
<br />
<br />
=== Example: emulating OOP with record types ===<br />
<br />
Let's implement the classical OOP example: drawing figures. There are<br />
figures of different types: circles, rectangles and so on. The task is<br />
to create a heterogeneous list of figures. All figures in this list should<br />
support the same set of operations: draw, move and so on. We will<br />
represent these operations as IO procedures. Instead of a "class" let's<br />
define a structure containing implementations of all the procedures<br />
required:<br />
<br />
<haskell><br />
data Figure = Figure { draw :: IO (),<br />
                       move :: Displacement -> IO ()<br />
                     }<br />
<br />
type Displacement = (Int, Int) -- horizontal and vertical displacement in points<br />
</haskell><br />
<br />
<br />
The constructor of each figure's type should just return a Figure record:<br />
<br />
<haskell><br />
circle :: Point -> Radius -> IO Figure<br />
rectangle :: Point -> Point -> IO Figure<br />
<br />
type Point = (Int, Int) -- point coordinates<br />
type Radius = Int -- circle radius in points<br />
</haskell><br />
<br />
<br />
We will "draw" figures by just printing their current parameters.<br />
Let's start with a simplified implementation of the 'circle' and 'rectangle'<br />
constructors, without actual 'move' support:<br />
<br />
<haskell><br />
circle center radius = do<br />
    let description = " Circle at "++show center++" with radius "++show radius<br />
    return $ Figure { draw = putStrLn description }<br />
<br />
rectangle from to = do<br />
    let description = " Rectangle "++show from++"-"++show to<br />
    return $ Figure { draw = putStrLn description }<br />
</haskell><br />
<br />
<br />
As you see, each constructor just returns a fixed 'draw' procedure that prints<br />
the parameters with which the concrete figure was created. Let's test it:<br />
<br />
<haskell><br />
drawAll :: [Figure] -> IO ()<br />
drawAll figures = do putStrLn "Drawing figures:"<br />
                     mapM_ draw figures<br />
<br />
main = do figures <- sequence [circle (10,10) 5,<br />
                               circle (20,20) 3,<br />
                               rectangle (10,10) (20,20),<br />
                               rectangle (15,15) (40,40)]<br />
          drawAll figures<br />
</haskell><br />
<br />
<br />
Now let's define "full-featured" figures that can actually be<br />
moved around. In order to achieve this, we should provide each figure<br />
with a mutable variable that holds each figure's current screen location. The<br />
type of this variable will be "IORef Point". This variable should be created in the figure constructor and manipulated in IO procedures (closures) enclosed in<br />
the Figure record:<br />
<br />
<haskell><br />
circle center radius = do<br />
    centerVar <- newIORef center<br />
<br />
    let drawF = do center <- readIORef centerVar<br />
                   putStrLn (" Circle at "++show center<br />
                             ++" with radius "++show radius)<br />
<br />
    let moveF (addX,addY) = do (x,y) <- readIORef centerVar<br />
                               writeIORef centerVar (x+addX, y+addY)<br />
<br />
    return $ Figure { draw=drawF, move=moveF }<br />
<br />
<br />
rectangle from to = do<br />
    fromVar <- newIORef from<br />
    toVar <- newIORef to<br />
<br />
    let drawF = do from <- readIORef fromVar<br />
                   to <- readIORef toVar<br />
                   putStrLn (" Rectangle "++show from++"-"++show to)<br />
<br />
    let moveF (addX,addY) = do (fromX,fromY) <- readIORef fromVar<br />
                               (toX,toY) <- readIORef toVar<br />
                               writeIORef fromVar (fromX+addX, fromY+addY)<br />
                               writeIORef toVar (toX+addX, toY+addY)<br />
<br />
    return $ Figure { draw=drawF, move=moveF }<br />
</haskell><br />
<br />
<br />
Now we can test the code which moves figures around:<br />
<br />
<haskell><br />
main = do figures <- sequence [circle (10,10) 5,<br />
                               rectangle (10,10) (20,20)]<br />
          drawAll figures<br />
          mapM_ (\fig -> move fig (10,10)) figures<br />
          drawAll figures<br />
</haskell><br />
<br />
<br />
It's important to realize that we are not limited to including only IO actions<br />
in a record that's intended to simulate a C++/Java-style interface. The record can also include values, IORefs, pure functions - in short, any type of data. For example, we can easily add to the Figure interface fields for area and origin:<br />
<br />
<haskell><br />
data Figure = Figure { draw :: IO (),<br />
                       move :: Displacement -> IO (),<br />
                       area :: Double,<br />
                       origin :: IORef Point<br />
                     }<br />
</haskell><br />
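<br />
A constructor for this extended record might look like the sketch below. How 'area' and 'origin' are filled in is an assumption made for illustration: 'area' is computed once from the radius, and 'origin' exposes the same reference that 'draw' and 'move' already share:<br />
<br />
```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)

type Point = (Int, Int)
type Displacement = (Int, Int)
type Radius = Int

data Figure = Figure { draw   :: IO ()
                     , move   :: Displacement -> IO ()
                     , area   :: Double        -- pure value, fixed at construction
                     , origin :: IORef Point   -- mutable, shared with draw and move
                     }

circle :: Point -> Radius -> IO Figure
circle center radius = do
  centerVar <- newIORef center
  let drawF = do c <- readIORef centerVar
                 putStrLn (" Circle at " ++ show c ++ " with radius " ++ show radius)
      moveF (addX, addY) = modifyIORef centerVar (\(x, y) -> (x + addX, y + addY))
  return Figure { draw   = drawF
                , move   = moveF
                , area   = pi * fromIntegral radius ^ (2 :: Int)
                , origin = centerVar
                }
```
<br />
After 'move fig (3,4)' on a circle created at (0,0), reading 'origin fig' with 'readIORef' gives (3,4), while 'area fig' stays the same because it is an ordinary immutable field.<br />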
<br />
<br />
<br />
== Dark side of IO monad ==<br />
=== unsafePerformIO ===<br />
<br />
Programmers coming from an imperative language background often look for a way to execute IO actions inside a pure procedure. But what does this mean?<br />
Imagine that you're trying to write a procedure that reads the contents of a file with a given name, and you try to write it as a pure (non-IO) function:<br />
<br />
<haskell><br />
readContents :: Filename -> String<br />
</haskell><br />
<br />
Defining readContents as a pure function will certainly simplify the code that uses it. But it will also create problems for the compiler:<br />
<br />
# This call is not inserted in a sequence of "world transformations", so the compiler doesn't know at what exact moment you want to execute this action. For example, if the file has one kind of contents at the beginning of the program and another at the end - which contents do you want to see? You have no idea when (or even if) this function is going to get invoked, because Haskell sees this function as pure and feels free to reorder the execution of any or all pure functions as needed.<br />
# Attempts to read the contents of files with the same name can be factored (''i.e.'' reduced to a single call) despite the fact that the file (or the current directory) can be changed between calls. Again, Haskell considers all non-IO functions to be pure and feels free to omit multiple calls with the same parameters.<br />
<br />
So, implementing pure functions that interact with the Real World is<br />
considered to be Bad Behavior. Good boys and girls never do it ;)<br />
<br />
<br />
Nevertheless, there are (semi-official) ways to use IO actions inside<br />
of pure functions. As you should remember, this is prohibited by<br />
requiring the RealWorld "baton" in order to call an IO action. Pure functions don't have the baton, but there is a special "magic" procedure that produces this baton from nowhere, uses it to call an IO action and then throws the resulting "world" away! It's a little low-level magic :) This very special (and dangerous) procedure is:<br />
<br />
<haskell><br />
unsafePerformIO :: IO a -> a<br />
</haskell><br />
<br />
Let's look at its (possible) definition:<br />
<br />
<haskell><br />
unsafePerformIO :: (RealWorld -> (a, RealWorld)) -> a<br />
unsafePerformIO action = let (a, world1) = action createNewWorld<br />
                         in a<br />
</haskell><br />
<br />
where 'createNewWorld' is an internal function producing a new value of<br />
the RealWorld type.<br />
<br />
Using unsafePerformIO, you can easily write pure functions that do<br />
I/O inside. But don't do this without a real need, and remember to<br />
follow this rule: the compiler doesn't know that you are cheating; it still<br />
considers each non-IO function to be a pure one. Therefore, all the usual<br />
optimization rules can (and will!) be applied to its execution. So<br />
you must ensure that:<br />
<br />
# The result of each call depends only on its arguments.<br />
# You don't rely on side-effects of this function, which may not be executed if its results are not needed.<br />
<br />
<br />
Let's investigate this problem more deeply. Function evaluation in Haskell<br />
is determined by a value's necessity - the language computes only the values that are really required to calculate the final result. But what does this mean with respect to the 'main' function? To "calculate the final world's" value, you need to perform all the intermediate IO actions that are included in the 'main' chain. By using 'unsafePerformIO' we call IO actions outside of this chain. What guarantee do we have that they will be run at all? None. The only time they will be run is if running them is required to compute the overall function result (which in turn should be required to perform some action in the<br />
'main' chain). This is an example of Haskell's evaluation-by-need strategy. Now you should clearly see the difference:<br />
<br />
- An IO action inside an IO procedure is guaranteed to execute as long as<br />
it is (directly or indirectly) inside the 'main' chain - even when its result isn't used (because the implicit "world" value it returns ''will'' be used). You directly specify the order of the action's execution inside the IO procedure. Data dependencies are simulated via the implicit "world" values that are passed from each IO action to the next.<br />
<br />
- An IO action inside 'unsafePerformIO' will be performed only if the<br />
result of this operation is really used. The evaluation order is not<br />
guaranteed and you should not rely on it (except when you're sure about<br />
whatever data dependencies may exist).<br />
<br />
<br />
I should also say that inside an 'unsafePerformIO' call you can organize<br />
a small internal chain of IO actions with the help of the same binding<br />
operators and/or 'do' syntactic sugar we've seen above. For example, here's a particularly convoluted way to compute the integer that comes after zero:<br />
<br />
<haskell><br />
one :: Int<br />
one = unsafePerformIO $ do var <- newIORef 0<br />
                           modifyIORef var (+1)<br />
                           readIORef var<br />
</haskell><br />
<br />
and in this case ALL the operations in this chain will be performed as<br />
long as the result of the 'unsafePerformIO' call is needed. To ensure this,<br />
the actual 'unsafePerformIO' implementation evaluates the "world" returned<br />
by the 'action':<br />
<br />
<haskell><br />
unsafePerformIO action = let (a,world1) = action createNewWorld<br />
                         in (world1 `seq` a)<br />
</haskell><br />
<br />
(The 'seq' operation strictly evaluates its first argument before<br />
returning the value of the second one).<br />
<br />
<br />
=== inlinePerformIO ===<br />
<br />
inlinePerformIO has the same definition as unsafePerformIO, but with the addition of an INLINE pragma:<br />
<haskell><br />
-- | Just like unsafePerformIO, but we inline it. Big performance gains as<br />
-- it exposes lots of things to further inlining<br />
{-# INLINE inlinePerformIO #-}<br />
inlinePerformIO action = let (a, world1) = action createNewWorld<br />
                         in (world1 `seq` a)<br />
</haskell><br />
<br />
Semantically inlinePerformIO = unsafePerformIO<br />
in as much as either of those have any semantics at all.<br />
<br />
The difference of course is that inlinePerformIO is even less safe than<br />
unsafePerformIO. While ghc will try not to duplicate or common up<br />
different uses of unsafePerformIO, we aggressively inline<br />
inlinePerformIO. So you can really only use it where the IO content is<br />
really properly pure, like reading from an immutable memory buffer (as<br />
in the case of ByteStrings). However things like allocating new buffers<br />
should not be done inside inlinePerformIO since that can easily be<br />
floated out and performed just once for the whole program, so you end up<br />
with many things sharing the same buffer, which would be bad.<br />
<br />
So the rule of thumb is that IO things wrapped in unsafePerformIO have<br />
to be externally pure while with inlinePerformIO it has to be really<br />
really pure or it'll all go horribly wrong.<br />
<br />
That said, here's some really hairy code. This should frighten any pure<br />
functional programmer...<br />
<br />
<haskell><br />
write :: Int -> (Ptr Word8 -> IO ()) -> Put ()<br />
write !n body = Put $ \c buf@(Buffer fp o u l) -><br />
  if n <= l<br />
    then write' c fp o u l<br />
    else write' (flushOld c n fp o u) (newBuffer c n) 0 0 0<br />
<br />
  where {-# NOINLINE write' #-}<br />
        write' c !fp !o !u !l =<br />
          -- warning: this is a tad hardcore<br />
          inlinePerformIO<br />
            (withForeignPtr fp<br />
              (\p -> body $! (p `plusPtr` (o+u))))<br />
          `seq` c () (Buffer fp o (u+n) (l-n))<br />
</haskell><br />
<br />
it's used like:<br />
<haskell><br />
word8 w = write 1 (\p -> poke p w)<br />
</haskell><br />
<br />
This does not adhere to my rule of thumb above. Don't ask exactly why we<br />
claim it's safe :-) (and if anyone really wants to know, ask Ross<br />
Paterson who did it first in the Builder monoid)<br />
<br />
=== unsafeInterleaveIO ===<br />
<br />
But there is an even stranger operation called 'unsafeInterleaveIO' that<br />
gets the "official baton", makes its own pirate copy, and then runs<br />
an "illegal" relay-race in parallel with the main one! I can't talk further<br />
about its behavior without causing grief and indignation, so it's no surprise<br />
that this operation is widely used in countries that are hotbeds of software piracy such as Russia and China! ;) Don't even ask me - I won't say anything more about this dirty trick I use all the time ;)<br />
<br />
One can use unsafePerformIO (not unsafeInterleaveIO) to perform I/O<br />
operations not in a predefined order but on demand. For example, the<br />
following code:<br />
<br />
<haskell><br />
do let c = unsafePerformIO getChar<br />
   do_proc c<br />
</haskell><br />
<br />
will perform the getChar I/O call only when the value of c is really required<br />
by the code, i.e. this call will be performed lazily, like any usual<br />
Haskell computation.<br />
<br />
Now imagine the following code:<br />
<br />
<haskell><br />
do let s = [unsafePerformIO getChar, unsafePerformIO getChar, unsafePerformIO getChar]<br />
   do_proc s<br />
</haskell><br />
<br />
The three chars inside this list will be computed on demand too, and this<br />
means that their values will depend on the order in which they are consumed. This<br />
is not what we usually want :)<br />
<br />
<br />
unsafeInterleaveIO solves this problem: it performs I/O only on<br />
demand, but lets you define the exact *internal* execution order for the parts<br />
of your data structure. This is why I wrote that unsafeInterleaveIO makes<br />
an illegal copy of the baton :)<br />
<br />
First, unsafeInterleaveIO takes an (IO a) action as a parameter and returns<br />
another (IO a) action, whose result is a value of type 'a':<br />
<br />
<haskell><br />
do str <- unsafeInterleaveIO myGetContents<br />
</haskell><br />
<br />
Second, unsafeInterleaveIO doesn't perform any action immediately; it<br />
only creates a box of type 'a' which, when its value is requested, will<br />
perform the action specified as a parameter.<br />
<br />
Third, this action by itself may compute the whole value immediately<br />
or... use unsafeInterleaveIO again to defer calculation of some<br />
sub-components:<br />
<br />
<haskell><br />
myGetContents = do<br />
   c <- getChar<br />
   s <- unsafeInterleaveIO myGetContents<br />
   return (c:s)<br />
</haskell><br />
<br />
This code will be executed only at the moment when the value of str is<br />
really demanded. At that moment, getChar will be performed (with its<br />
result assigned to c) and one more lazy IO box will be created, for s.<br />
This box again contains a link to the myGetContents call.<br />
<br />
Then a list cell is returned that contains the one char read and a link to<br />
the myGetContents call as the way to compute the rest of the list. Only at the<br />
moment when the next value in the list is required will this operation be<br />
performed again.<br />
<br />
As a final result, it is impossible to read the second char in the list before<br />
the first one, yet the reading as a whole remains lazy. Bingo!<br />
<br />
<br />
PS: of course, actual code should include EOF checking. Also note that<br />
you can read many chars/records at each call:<br />
<br />
<haskell><br />
myGetContents = do<br />
   c <- replicateM 512 getChar<br />
   s <- unsafeInterleaveIO myGetContents<br />
   return (c++s)<br />
</haskell><br />
<br />
== Welcome to the machine: the actual [[GHC]] implementation ==<br />
<br />
A little disclaimer: I should say that I'm not describing<br />
here exactly what a monad is (I don't even completely understand it myself) and my explanation shows only one _possible_ way to implement the IO monad in<br />
Haskell. For example, the hbc Haskell compiler implements monads via<br />
continuations. I also haven't said anything about exception handling,<br />
which is a natural part of the "monad" concept. You can read the "All About<br />
Monads" guide to learn more about these topics.<br />
<br />
But there is some good news: first, the monad understanding you've just acquired will work with any implementation. You just can't work with RealWorld<br />
values directly.<br />
<br />
Second, the IO monad implementation described here is really used in the GHC,<br />
yhc/nhc (Hugs/jhc, too?) compilers. Here is the actual IO definition<br />
from the GHC sources:<br />
<br />
<haskell><br />
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))<br />
</haskell><br />
<br />
It uses the "State# RealWorld" type instead of our RealWorld, it uses the "(# #)" unboxed tuple for optimization, and it adds an IO data constructor<br />
around the type. Nevertheless, there are no significant changes from the standpoint of our explanation. Knowing the principle of "chaining" IO actions via fake "state of the world" values, you can now easily understand and write low-level implementations of GHC I/O operations.<br />
<br />
<br />
=== The [[Yhc]]/nhc98 implementation ===<br />
<br />
<haskell><br />
data World = World<br />
newtype IO a = IO (World -> Either IOError a)<br />
</haskell><br />
<br />
This implementation makes the "World" disappear somewhat, and returns either a<br />
result of type "a" or, if an error occurs, an "IOError". Omitting the World from the right-hand side of the function is possible only because the compiler knows special things about the IO type and won't overoptimise it.<br />
<br />
<br />
== Further reading ==<br />
<br />
This tutorial is largely based on Simon Peyton Jones' paper [http://research.microsoft.com/%7Esimonpj/Papers/marktoberdorf Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell]. I hope that my tutorial improves his original explanation of the Haskell I/O system and brings it closer to the point of view of beginning Haskell programmers. But if you need to learn about concurrency, exceptions and FFI in Haskell/GHC, the original paper is the best source of information.<br />
<br />
You can find more information about concurrency, FFI and STM at the [[GHC/Concurrency#Starting points]] page.<br />
<br />
The [[Arrays]] page contains exhaustive explanations about using mutable arrays.<br />
<br />
Look also at the [[Books and tutorials#Using Monads]] page, which contains tutorials and papers really describing these mysterious monads :)<br />
<br />
An explanation of the basic monad functions, with examples, can be found in the reference guide [http://members.chello.nl/hjgtuyl/tourdemonad.html A tour of the Haskell Monad functions], by Henk-Jan van Tuyl.<br />
<br />
Do you have more questions? Ask in the [http://www.haskell.org/mailman/listinfo/haskell-cafe haskell-cafe mailing list].<br />
<br />
<br />
== To-do list ==<br />
<br />
If you are interested in adding more information to this manual, please add your questions/topics here.<br />
<br />
Topics:<br />
* fixIO and 'mdo'<br />
* ST monad<br />
* Q monad<br />
<br />
Questions:<br />
* split '>>='/'>>'/return section and 'do' section, more examples of using binding operators<br />
* IORef detailed explanation (==const*), usage examples, syntax sugar, unboxed refs<br />
* control structures developing - much more examples<br />
* unsafePerformIO usage examples: global variable, ByteString, other examples<br />
* actual GHC implementation - how to write low-level routines on example of newIORef implementation<br />
<br />
This manual is a collective work, so feel free to add more information to it yourself. The final goal is to collectively develop a comprehensive manual for using the IO monad.<br />
<br />
----<br />
<br />
[[Category:Tutorials]]</div>Virhttps://wiki.haskell.org/index.php?title=STG_in_Javascript&diff=15411STG in Javascript2007-09-04T19:51:46Z<p>Vir: </p>
<hr />
<div>[[Category:How to]]<br />
<br />
''Note (Aug 27, 2007)'': This page was started about a year ago. Over time, the focus was changed to integration with Yhc Core, and the work in progress may be observed here: [[Yhc/Javascript]].<br />
<br />
''Disclaimer'': Here are my working notes related to an experiment to execute Haskell programs in a web browser. You may find them bizarre, and even nonsensical. Don't hesitate to discuss them (please use the [[Talk:STG in Javascript]] page). Chances are, at some point a working implementation will be produced.<br />
<br />
The [http://www.squarefree.com/shell/shell.html Javascript Shell] is of great help for this experiment.<br />
<br />
----<br />
<br />
== Aug 22, 2006 ==<br />
<br />
Several people expressed interest in the matter, e. g.: [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017286.html], [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017287.html]. <br />
<br />
A Wiki page [[Hajax]] has been recently created, which summarizes the achievements in the related fields. By these experiments, I am trying to address the problem of Javascript generation out of a Haskell source.<br />
<br />
To achieve this, an existing Haskell compiler, namely [http://haskell.org/nhc98/ nhc98], is being patched to add a Javascript generation facility out of a STG tree: the original compiler generates bytecodes from the same source.<br />
<br />
After unsuccessfully trying several approaches (e. g. Javascript closures, see [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Functions#Nested_functions_and_closures]), it was decided to implement an STG machine (as described in [http://citeseer.ist.psu.edu/peytonjones92implementing.html]) in Javascript.<br />
<br />
The paper referenced above describes how to implement an STG machine in assembly language (or C). The Javascript implementation uses the same ideas, but takes advantage of the automatic memory management provided by the Javascript runtime, and also of its built-in handling of values more complex than just numbers and arrays of bytes.<br />
<br />
To describe a thunk, a Javascript object of the following structure may be used:<br />
<br />
<pre><br />
thunk = {<br />
_c:function(){ ... }, // code to evaluate a thunk<br />
_1:..., // argument 1<br />
_2:...,<br />
_N:... // argument n<br />
};<br />
</pre><br />
<br />
So, similarly to what is described in the STG paper, the ''_c'' method is used to evaluate a thunk. This method may also perform a self-update of the thunk, replacing itself (i. e. ''this._c'') with something else, and returns the result once it becomes known (i. e. at the very end of thunk evaluation).<br />
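<br />
The self-update can be sketched as follows. This is only an illustration under assumptions: ''makeThunk'' is a hypothetical helper that does not appear in the generated code, and the replacement method simply returns the cached value.<br />

```javascript
// Hypothetical helper (not part of the compiler output): builds a thunk
// whose _c method overwrites itself after the first evaluation.
function makeThunk(compute) {
  return {
    _c: function () {
      var value = compute();                     // evaluate once
      this._c = function () { return value; };   // self-update
      return value;
    }
  };
}

var evaluations = 0;
var t = makeThunk(function () { evaluations++; return 6 * 7; });
var first = t._c();    // runs the computation
var second = t._c();   // served by the replaced _c, no recomputation
```

After the first call, the thunk behaves like an already evaluated value: the expensive code is no longer reachable from it.<br />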
<br />
Some interesting things may be done by manipulating prototypes of Javascript built-in classes.<br />
<br />
Consider this (Javascript shell log pasted below):<br />
<br />
<pre><br />
<br />
Number.prototype.c=function(){return this};<br />
function(){return this}<br />
(1).c()<br />
1<br />
(2).c()<br />
2<br />
(-999).c()<br />
-999<br />
1<br />
1<br />
2<br />
2<br />
999<br />
999<br />
<br />
</pre><br />
<br />
Thus, simple numeric values are given thunk behavior: by calling the ''c'' method on them, their value is returned as if a thunk were evaluated, and at the same time they may be used in a regular way when passed to Javascript functions outside the Haskell runtime (e. g. DOM manipulation functions).<br />
<br />
A similar trick can be done on Strings and Arrays: for these, the ''c'' method will return a head value (i. e. ''String.charAt(0)'') CONS'ed with the remainder of the String/Array.<br />
<br />
== Aug 23, 2006 ==<br />
<br />
The first thing to do is to learn how to call primitives. In Javascript,<br />
primitives mostly cover built-in arithmetic and the interface to the [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:Math Math] object. Primitives need all their arguments evaluated before they are called, and usually return strict values. So there is no need to build a thunk each time a primitive is called.<br />
<br />
At the moment, the following Haskell code:<br />
<br />
<pre><br />
f :: Int -> Int -> Int<br />
<br />
f a b = (a + b) * (a - b)<br />
<br />
g = f 1 2<br />
</pre><br />
<br />
compiles into (part of the Javascript below was inserted manually):<br />
<br />
<pre><br />
var HMain = {m:"HMain"};<br />
<br />
Number.prototype._c=function(){return this;};<br />
<br />
// Compiled code starts<br />
<br />
HMain.f_T=function(v164,v165){return {_c:HMain.f_C,<br />
_w:"9:1-9:24",<br />
_1:v164,<br />
_2:v165};};<br />
HMain.f_C=function(){<br />
return ((((this._1)._c())+((this._2)._c()))._c())*<br />
((((this._1)._c())-((this._2)._c()))._c());<br />
};<br />
<br />
HMain.g_T=function(){return {_c:HMain.g_C,_w:"11:1-11:9"};};<br />
HMain.g_C=function(){<br />
return HMain.f_T(1,2); // NB should be HMain.f_T(1,2)._c()<br />
};<br />
<br />
// Compiled code ends<br />
<br />
print(HMain.f_T(3,4)._c());<br />
<br />
print(HMain.g_T()._c()._c());<br />
</pre><br />
<br />
<br />
When running, the script produces:<br />
<br />
<pre><br />
Running...<br />
-7<br />
-3<br />
</pre><br />
<br />
So, for each Haskell function, two Javascript functions are created: one creates a thunk when called with arguments (so it is good for saturated calls), another is the thunk's evaluation function. The latter will be passed around when dealing with partial applications (which will likely involve special sort of thunks, but we haven't got down to this as of yet).<br />
<br />
Note that the ''_c()'' method is applied twice to the output from ''HMain.g_T'': the evaluation function of ''g'' calls ''f_T'', which returns an unevaluated thunk, so we need to force that thunk a second time to get the final result.<br />
<br />
'''NB''': indeed, the thunk evaluation function for ''HMain.g'' should evaluate the thunk created by ''HMain.f_T''. Laziness will not be lost because ''HMain.g_C'' will not be executed until needed.<br />
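<br />
With that correction applied, the example can be written as the following hand-made sketch (not actual compiler output). The ''_w'' source position members are left out, and ''valueOf()'' is used in the numeric ''_c'' so that a primitive number is returned, which is an adjustment made for this sketch.<br />

```javascript
// Numbers evaluate to themselves; valueOf() unwraps the Number object
// that non-strict code receives as `this`.
Number.prototype._c = function () { return this.valueOf(); };

var HMain = {};

// f a b = (a + b) * (a - b)
HMain.f_T = function (a, b) { return { _c: HMain.f_C, _1: a, _2: b }; };
HMain.f_C = function () {
  return (this._1._c() + this._2._c()) * (this._1._c() - this._2._c());
};

// g = f 1 2, with the evaluation function forcing the thunk built by f_T
HMain.g_T = function () { return { _c: HMain.g_C }; };
HMain.g_C = function () { return HMain.f_T(1, 2)._c(); };

var fResult = HMain.f_T(3, 4)._c();   // (3 + 4) * (3 - 4)
var gResult = HMain.g_T()._c();       // a single _c() is now enough
```

Nothing is computed when ''HMain.g_T()'' is called; the multiplication happens only inside ''_c()''.<br />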
<br />
== Sep 12, 2006 ==<br />
<br />
To simplify handling of partial function applications, the thunk format has been changed so that, instead of ''_1'', ''_2'', etc. for function arguments, an array named ''_a'' is used. This array always has at least one element, which is ''undefined''. Arguments start at array index 1, so argument ''n'' is accessed as ''this._a[n]''.<br />
<br />
For Haskell programs executing in a web browser environment, the analogue of FFI is calling external Javascript functions.<br />
Imagine this Javascript function which prints its argument on the window status line:<br />
<br />
<pre><br />
// Output an integer value into the window status line<br />
<br />
putStatus = function (i) {window.status = i; return i;};<br />
</pre><br />
<br />
To import such a function into a Haskell program, the following FFI declaration is to be used:<br />
<br />
<pre><br />
foreign import ccall "putStatus" putStatus :: Int -> Int<br />
</pre><br />
<br />
Note the type signature: of course it should be somewhat monadic, but for the moment, nothing has been done to support monads, so this signature is only good for testing purposes.<br />
<br />
The current NHC98-based implementation compiles the above FFI declaration into this:<br />
<br />
<pre><br />
Test2.putStatus_T=function(_1){return {_c:Test2.putStatus_C, _w:"7:1-7:56", <br />
_a:[undefined, _1]};};<br />
Test2.putStatus_C=function(){<br />
return (putStatus)((this._a[1])._c());<br />
};<br />
</pre><br />
<br />
Note that like a primitive, a foreign function evaluates all its arguments before it starts executing.<br />
<br />
A test page illustrating this can be found at:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.html<br />
<br />
When this page is loaded, the window status line should display "456" while the rest of the page remains blank. <br />
The Haskell source for this test page is:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.hs<br />
<br />
== Sep 19, 2006 ==<br />
<br />
Initially, functions compiled from Haskell to Javascript were represented as members of objects (one object per Haskell module). Anticipating some complications with multilevel module hierarchy, and also with functions whose names contain special characters, it has been decided to pass every function identifier through the ''fixStr'' function: in ''nhc98'' it replaces non-alphanumeric characters with their numeric code prefixed with an underscore. So a typical function definition such as:<br />
<br />
<pre><br />
p3 :: Int -> Int -> Int -> Int<br />
p3 a b c = (a + b) * c;<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46p3_T=function(v210, v211, v212){return {_c:Test3_46p3_C, <br />
_w:"15:1-15:22", <br />
_a:[undefined, <br />
v210, v211, v212]};};<br />
var Test3_46p3_C=function(){<br />
return (((((this._a[1])._c())+((this._a[2])._c()))._c())*<br />
((this._a[3])._c()))._c();<br />
};<br />
</pre><br />
<br />
Note the function name: ''Test3_46p3_T''; in previous examples it would have been something like ''Test3.p3_T''.<br />
<br />
Partial function applications need a different thunk format. This kind of thunk holds the function to be applied once the application becomes saturated (i. e. the number of arguments becomes equal to the function arity), the number of remaining arguments, and an array of the arguments supplied so far.<br />
<br />
Thus, for a function:<br />
<br />
<pre><br />
w = p3 1<br />
</pre><br />
<br />
resulting Javascript is:<br />
<br />
<pre><br />
var Test3_46w_T=function(){return {_c:Test3_46w_C, _w:"17:1-17:8", <br />
_a:[undefined]};};<br />
var Test3_46w_C=function(){<br />
return ({_c:function(){return this;}, _s:Test3_46p3_T, _x:2, _a:[1]})._c();<br />
};<br />
</pre><br />
<br />
Such a thunk always evaluates to itself (''_c()''); it holds the function name in its ''_s'' member, number of remaining arguments in its ''_x'' member, and available arguments in its ''_a'' member, only in this case the array does not have ''undefined'' as its zeroth element.<br />
<br />
An application of such a function (''w'') to additional arguments:<br />
<br />
<pre><br />
z = w 2 3<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46z_T=function(){return {_c:Test3_46z_C, _w:"23:1-23:9", <br />
_a:[undefined]};};<br />
var Test3_46z_C=function(){<br />
return (HSRuntime_46doApply((Test3_46w_T())._c(), [2, 3]))._c();<br />
};<br />
</pre><br />
<br />
So, when such an expression is being computed, a special Runtime support function is called, which obtains the partial application thunk via evaluation of its first argument (''(Test3_46w_T())._c()''), and adds the arguments provided (''[2, 3]'') to the list of arguments available so far. If the number of arguments becomes equal to the target function arity, a normal function application thunk is returned; otherwise another partial application thunk is returned. The Runtime support function looks like this:<br />
<br />
<pre><br />
var HSRuntime_46doApply = function (thunk, targs){<br />
thunk._a = thunk._a.concat (targs);<br />
thunk._x = thunk._x - targs.length;<br />
if (thunk._x > 0) {<br />
return thunk;<br />
} else {<br />
return thunk._s.apply (null, thunk._a);<br />
}<br />
};<br />
</pre><br />
<br />
Note the use of the ''apply'' method. It may be used also with functions that are not methods of some object. The first argument (''this_arg'') may be ''null'' or ''undefined'' as it will not be used by the function applied to the arguments.<br />
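<br />
The function above updates the partial application thunk in place. If the same thunk object were ever shared between two applications, the second one would see the arguments added by the first. A possible variant that copies instead of mutating is sketched below; ''doApplyCopy'', ''add3_T'' and ''pap'' are hypothetical names used only for this illustration.<br />

```javascript
// Hypothetical variant of doApply that leaves the original thunk untouched.
function doApplyCopy(thunk, targs) {
  var allArgs = thunk._a.concat(targs);     // concat returns a new array
  var remaining = thunk._x - targs.length;
  if (remaining > 0) {
    // still unsaturated: return a fresh partial application thunk
    return { _c: function () { return this; },
             _s: thunk._s, _x: remaining, _a: allArgs };
  }
  return thunk._s.apply(null, allArgs);     // saturated: build the real thunk
}

// Illustrative three-argument function in the same two-function style.
var add3_T = function (a, b, c) {
  return { _c: function () { return a + b + c; } };
};
var pap = { _c: function () { return this; }, _s: add3_T, _x: 3, _a: [] };

var p1 = doApplyCopy(pap, [1]);            // needs two more arguments
var r1 = doApplyCopy(p1, [2, 3])._c();     // 1 + 2 + 3
var r2 = doApplyCopy(p1, [10, 20])._c();   // p1 is reusable: 1 + 10 + 20
```

The cost is one extra object per unsaturated application, in exchange for partial applications that can be applied more than once.<br />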
<br />
''NHC98'' acts differently when a partial application is not defined as a separate function, but is part of another expression.<br />
<br />
First, some Haskell definitions:<br />
<br />
<pre><br />
z :: Int -> Int<br />
<br />
z = (3 +)<br />
<br />
p :: Int -> Int -> Int<br />
<br />
p = (+)<br />
</pre><br />
<br />
compile into:<br />
<br />
<pre><br />
var Test4_46z_T=function(){return {_c:Test4_46z_C, _w:"9:1-9:8", <br />
_a:[undefined]};};<br />
var Test4_46z_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA181_T, _x:1, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA181_T=function(v178){return {_c:LAMBDA181_C, _w:"9:8", <br />
_a:[undefined, v178]};};<br />
var LAMBDA181_C=function(){<br />
return (((3)._c())+((this._a[1])._c()))._c();<br />
};<br />
<br />
var Test4_46p_T=function(){return {_c:Test4_46p_C, _w:"13:1-13:6", <br />
_a:[undefined]};};<br />
var Test4_46p_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA182_T, _x:2, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA182_T=function(v179, v180){return {_c:LAMBDA182_C, <br />
_w:"13:6", <br />
_a:[undefined, v179, v180]};};<br />
var LAMBDA182_C=function(){<br />
return (((this._a[1])._c())+((this._a[2])._c()))._c();<br />
};<br />
</pre><br />
<br />
Now, when these functions (''p'', ''z'') are used:<br />
<br />
<pre><br />
t4main = putStatus (z (p 6 8)) -- see above for putStatus<br />
</pre><br />
<br />
the generated Javascript is:<br />
<br />
<pre><br />
var Test4_46t4main_T=function(){return {_c:Test4_46t4main_C, <br />
_w:"17:1-17:28", <br />
_a:[undefined]};};<br />
var Test4_46t4main_C=function(){<br />
return (Test4_46putStatus_T(<br />
NHC_46Internal_46_95apply1_T(<br />
Test4_46z_T(), <br />
NHC_46Internal_46_95apply2_T(<br />
Test4_46p_T(), 6, 8)<br />
)))._c();<br />
};<br />
</pre><br />
<br />
For each application of ''p'' and ''z'', an internal function ''NHC_46Internal_46_95apply'''''N'''''_T'' is called where '''N''' depends on the target function arity. In the Javascript implementation, all these functions are in fact one function (because in Javascript it is possible to determine the number of arguments a function was called with, so there is no need for separate functions for each arity). The internal function extracts its first argument and evaluates it (by calling the ''_c()'' method), getting a partial application thunk. Then, the Runtime support function ''HSRuntime_46doApply'' is called with the thunk and arguments array:<br />
<br />
<pre><br />
var NHC_46Internal_46_95apply1_T = function() {return __apply__(arguments);};<br />
var NHC_46Internal_46_95apply2_T = function() {return __apply__(arguments);};<br />
...<br />
var __apply__ = function (args) {<br />
var i, targs = new Array();<br />
var thunk = args[0]._c();<br />
for (i = 1; i < args.length; i++) {<br />
targs [i - 1] = args [i];<br />
}<br />
return HSRuntime_46doApply (thunk, targs);<br />
};<br />
</pre><br />
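<br />
The copying loop can also be expressed with ''Array.prototype.slice''. The sketch below is self-contained: ''doApply'' repeats the logic of the Runtime support function, while ''applyN'', ''plus_T'' and ''p_T'' are hypothetical names chosen for the illustration.<br />

```javascript
// Same logic as the Runtime support function, repeated so the sketch runs alone.
var doApply = function (thunk, targs) {
  thunk._a = thunk._a.concat(targs);
  thunk._x = thunk._x - targs.length;
  return thunk._x > 0 ? thunk : thunk._s.apply(null, thunk._a);
};

// One function serves every arity, since `arguments` reveals how many were passed.
var applyN = function () {
  var thunk = arguments[0]._c();                          // evaluate to a PAP thunk
  var targs = Array.prototype.slice.call(arguments, 1);   // everything after it
  return doApply(thunk, targs);
};

// Hypothetical two-argument function and a thunk that evaluates to its PAP.
var plus_T = function (a, b) { return { _c: function () { return a + b; } }; };
var p_T = function () {
  return { _c: function () {
    return { _c: function () { return this; }, _s: plus_T, _x: 2, _a: [] };
  } };
};

var sum = applyN(p_T(), 6, 8)._c();   // saturated at once
var partial = applyN(p_T(), 6);       // one argument still missing
```

Because a partial application thunk evaluates to itself, ''partial'' can be passed back to ''applyN'' together with the missing argument.<br />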
<br />
''Note by Dimitry'': Just for clarity, Dimitry's part ends here, and Vir's part starts.<br />
<br />
== Aug 25, 2007 ==<br />
Here's my attempt. I'm going to implement a Haskell to Javascript compiler, based on the STG machine. This turned out to be not such an easy task, so I'd be happy to get some feedback.<br />
<br />
This is an example translation of some Haskell functions to JavaScript. I'm trying to be descriptive, but if I'm not, please ask me or write your suggestions. I'm not quite sure if this code is really correct.<br />
<br />
<pre><br />
// Example of Haskell to JavaScript translation<br />
//<br />
// PAP - Partial Application<br />
// every object (heap object in STG) is called closure here<br />
// closure and function are used interchangeably here<br />
//<br />
<br />
<br />
////////////////////////////////////////////////////////////////<br />
// Run-time system:<br />
<br />
var closure; // current entered closure<br />
var args; // arguments<br />
var RCons; // Constructor tag, constructors set this tag to some value<br />
var RVal; // Some returned value<br />
<br />
Number.prototype.arity = 0;<br />
Number.prototype.code = function () {<br />
RVal = closure;<br />
args = null;<br />
return null;<br />
}<br />
<br />
String.prototype.arity = 0;<br />
String.prototype.code = function ()<br />
{<br />
if (closure.length == 0) {<br />
args = null;<br />
closure = Nil;<br />
return apply;<br />
}<br />
<br />
args = new Array (2);<br />
args[0] = new Number (closure.charCodeAt (0));<br />
args[1] = closure.slice (1, closure.length);<br />
closure = Cons;<br />
return apply;<br />
}<br />
<br />
// mini interpreter is used to implement tail calls<br />
// to jump to some function, we don't call it, but<br />
// return its address instead<br />
function save_continuation_and_run (function_to_run)<br />
{<br />
while (function_to_run != null)<br />
function_to_run = function_to_run ();<br />
}<br />
<br />
// calling convention<br />
// function is pointed by a [closure] global variable<br />
// arguments are in [args] array<br />
function apply ()<br />
{<br />
var f = closure;<br />
var nargs = 0<br />
if (args != null)<br />
nargs = args.length;<br />
<br />
if (f.arity == nargs)<br />
return f.code;<br />
<br />
if (nargs == 0) {<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
// We CAN'T call a function, so we must build a PAP and call continuation!!!<br />
if (f.arity > nargs) {<br />
var supplied_args = args;<br />
args = null;<br />
var pap = {<br />
arity : f.arity - nargs,<br />
code : function () {<br />
var new_args = args;<br />
args = supplied_args<br />
supplied_args = null;<br />
<br />
// not working, type information is lost... :(<br />
//args.push (new_args);<br />
<br />
for (var i = nargs; i < f.arity; i++)<br />
args[i] = new_args[i - nargs];<br />
new_args = null;<br />
closure = f;<br />
return apply;<br />
}<br />
}<br />
<br />
closure = pap;<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
<br />
// f.arity < nargs<br />
<br />
var remaining_args = args.slice (f.arity, nargs);<br />
args.length = f.arity;<br />
<br />
save_continuation_and_run (f.code)<br />
<br />
// closure now points to some new function, we'll try to call it<br />
args = remaining_args;<br />
return apply;<br />
}<br />
<br />
// Updates are called and used essentially as apply function<br />
// updatable thunks pushes continuation and runs as usual<br />
// when continuation activates it replaces the closure with the value<br />
// after that it returns to the next continuation<br />
function update()<br />
{<br />
var f = closure;<br />
<br />
save_continuation_and_run (f.realcode);<br />
<br />
f.RCons = RCons;<br />
f.RVal = RVal;<br />
f.args = args;<br />
f.code = update_code;<br />
f.realcode = null;<br />
return null;<br />
}<br />
<br />
function update_code ()<br />
{<br />
RCons = closure.RCons;<br />
RVal = closure.RVal;<br />
args = closure.args;<br />
return null;<br />
}<br />
<br />
////////////////////////////////////////////////////////////////////<br />
// Examples: STG -> JS<br />
/* add = \a b -> case a of {a -> case b of {b -> primOp + a b}} */<br />
<br />
add = {<br />
arity: 2,<br />
code: function () {<br />
var a = args[0];<br />
var b = args[1];<br />
closure = a;<br />
args = null;<br />
save_continuation_and_run (apply);<br />
var a = RVal;<br />
closure = b;<br />
args = null;<br />
save_continuation_and_run (apply);<br />
var b = RVal;<br />
RVal = a + b;<br />
args = null;<br />
return null;<br />
}<br />
}<br />
<br />
<br />
/*<br />
compose = \f g x -><br />
let gx = g x<br />
in f gx<br />
*/<br />
compose = {<br />
arity: 3,<br />
code: function () {<br />
var f = args[0];<br />
var g = args[1];<br />
var x = args[2];<br />
var gx = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = g;<br />
args = new Array (1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
args = new Array (1);<br />
closure = f;<br />
args[0] = gx;<br />
return apply;<br />
}<br />
}<br />
<br />
ConsTag = 3;<br />
Cons = {<br />
arity : 2,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Nil<br />
RCons = ConsTag;<br />
<br />
// We must return to continuation, arguments are returned in args array<br />
return null;<br />
}<br />
}<br />
<br />
NilTag = 2;<br />
Nil = {<br />
arity : 0,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Cons<br />
RCons = NilTag;<br />
<br />
// We must return to continuation<br />
return null;<br />
}<br />
}<br />
<br />
/*<br />
map = \f xs-><br />
case xs of {<br />
Cons x xs -><br />
let fx = f x<br />
in let mapfxs = map f xs<br />
in Cons fx mapfxs<br />
; Nil -> Nil<br />
}<br />
*/<br />
map = {<br />
arity: 2,<br />
code : function () {<br />
var f = args[0];<br />
var xs = args[1];<br />
//push continuation and enter xs<br />
closure = xs;<br />
args = null;<br />
save_continuation_and_run (xs.code)<br />
switch (RCons) {<br />
case ConsTag:<br />
{<br />
var x = args[0];<br />
var xs = args[1];<br />
var fx = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = f;<br />
args = new Array(1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
var mapfxs = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = map;<br />
args = new Array(2);<br />
args[0] = f;<br />
args[1] = xs;<br />
return apply;<br />
}<br />
}<br />
closure = Cons;<br />
args = new Array(2);<br />
args[0] = fx;<br />
args[1] = mapfxs;<br />
return apply;<br />
}<br />
break;<br />
case NilTag:<br />
closure = Nil;<br />
args = null;<br />
return Nil.code;<br />
break;<br />
}<br />
}<br />
}<br />
<br />
inc3 = {<br />
arity: 0,<br />
code: function () {<br />
args = new Array (1);<br />
args[0] = 3;<br />
closure = add;<br />
return apply;<br />
}<br />
}<br />
<br />
</pre><br />
<br />
<br />
----<br />
<br />
Victor Nazarov<br />
<br />
asviraspossible@gmail.com<br />
<br />
== Aug 29, 2007 ==<br />
<br />
Code from previous section was updated. Here are some tests I've used to debug this code:<br />
<br />
<pre><br />
args = null;<br />
closure = 1013;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
args = new Array(2);<br />
args[0] = 7;<br />
args[1] = 6;<br />
closure = add;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
args = new Array(1);<br />
closure = inc3;<br />
args[0] = new Number(21);<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
closure = "123";<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
</pre><br />
<br />
The result of this test is the following:<br />
<pre><br />
1013 // Means that JS numbers work as closures using prototype trick<br />
13 // Simple function calls are working<br />
24 // Not so simple calls are working: PAP is properly built and used<br />
3 // Cons - list constructor<br />
3 // Cons - list constructor<br />
3 // Cons - list constructor<br />
2 // Nil - list constructor<br />
</pre><br />
<br />
The last 4 lines show that Javascript strings work properly using the prototype trick. We can observe the structure of the "123" object: Cons '1' (Cons '2' (Cons '3' Nil)), where each element is stored as a character code.<br />
<br />
== Sept 4, 2007 ==<br />
<br />
I've got some feedback from Edward Kmett. Edward claims that simple trampolining, as used in my code, is less efficient than Appel's trampoline. Trampolining is a trick to simulate tail calls. Simon Peyton Jones used the same technique as I did in my code (and as Dimitry did too). The technique is simple: instead of calling a continuation, a function returns it. A mini-interpreter then does the trampolining on the stack:<br />
<br />
<pre><br />
while (f=f())<br />
;<br />
</pre><br />
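<br />
As a self-contained illustration of the loop above, here is a minimal sketch of two mutually recursive functions driven by such a mini-interpreter. The names ''counter'', ''answer'', ''even'', ''odd'' and ''trampoline'' are made up for this example and are not part of the run-time system described on this page.<br />
<br />
```javascript
// Hypothetical example: state lives in globals, in the style of the run-time system.
var counter;   // number still to be examined
var answer;    // result register

// Each function does one step and returns the next function to run,
// or null when there is nothing left to do.
function even() {
    if (counter === 0) { answer = true; return null; }
    counter--;
    return odd;
}

function odd() {
    if (counter === 0) { answer = false; return null; }
    counter--;
    return even;
}

// The mini-interpreter: every "tail call" returns to this loop first,
// so the JavaScript stack does not grow with the number of steps.
function trampoline(f) {
    while (f !== null)
        f = f();
}

counter = 100001;
trampoline(even);   // answer now tells whether 100001 is even
```
<br />
A direct recursive version of ''even''/''odd'' would need one stack frame per step; here each step costs one extra return to the loop instead.<br />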
<br />
It is efficient in Simon's version because of tweaks specific to the GNU C compiler (so it is not portable), but these tweaks are not available in JavaScript.<br />
<br />
Using the same interpreter in JavaScript, I have to return a function to simulate a tail call, and call the interpreter again to simulate a normal call. This seems very inefficient, and Edward claims it is.<br />
<br />
The trick is to transform the program into Continuation Passing Style. After this transformation we need no stack at all, because every call is a tail call. We can get rid of the interpreter and just call functions as usual in JavaScript. We can use a counter to count stack frames (function entries), and when we reach the limit, we don't call the continuation directly, but register it as a callback on a timer event (and thus flush the stack). So we make longer jumps on the stack, and some people claim this is more efficient. Moreover, we get a framework for introducing parallel threads: a stack jump becomes a quantum for the thread.<br />
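<br />
The counting idea can be sketched as follows. This is only an illustration under stated assumptions: ''MAX_DEPTH'', ''depth'', ''pending'', ''jump'', ''run'' and ''sumTo'' are hypothetical names, and a plain queue stands in for the timer, so that the sketch also runs outside a browser. In a browser, the deferred call would be registered with ''setTimeout'' instead of being pushed onto ''pending''.<br />
<br />
```javascript
// Hypothetical sketch of CPS calls with a frame counter.
var MAX_DEPTH = 500;   // how many nested calls we allow before flushing the stack
var depth = 0;         // nested calls made since the last flush
var pending = [];      // stand-in for timer callbacks

// Perform a tail call: directly while the stack is shallow,
// otherwise postpone it so that the current stack can unwind first.
function jump(next) {
    if (depth < MAX_DEPTH) {
        depth++;
        next();
    } else {
        pending.push(next);
    }
}

// Start a computation and keep running postponed calls on a fresh stack.
function run(start) {
    pending.push(start);
    while (pending.length > 0) {
        depth = 0;
        pending.shift()();
    }
}

// CPS version of "sum of 1..n": the result is passed to the continuation k.
function sumTo(n, acc, k) {
    if (n === 0)
        jump(function () { k(acc); });
    else
        jump(function () { sumTo(n - 1, acc + n, k); });
}

var result;
run(function () { sumTo(100000, 0, function (v) { result = v; }); });
```
<br />
Compared with the mini-interpreter, control returns to the outer loop only once per ''MAX_DEPTH'' calls rather than after every call, which is the "longer jumps" effect described above.<br />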
<br />
I'd like to thank Edward for his ideas, and to explore them further.</div>Virhttps://wiki.haskell.org/index.php?title=STG_in_Javascript&diff=15329STG in Javascript2007-08-29T14:20:36Z<p>Vir: </p>
<hr />
<div>[[Category:How to]]<br />
<br />
''Note (Aug 27, 2007)'': This page was started about a year ago. Over time, the focus was changed to integration with Yhc Core, and the work in progress may be observed here: [[Yhc/Javascript]].<br />
<br />
''Disclaimer'': Here are my working notes related to an experiment to execute Haskell programs in a web browser. You may find them bizarre, and even nonsensical. Don't hesitate to discuss them (please use the [[Talk:STG in Javascript]] page). Chances are, at some point a working implementation will be produced.<br />
<br />
The [http://www.squarefree.com/shell/shell.html Javascript Shell] is of great help for this experiment.<br />
<br />
----<br />
<br />
== Aug 22, 2006 ==<br />
<br />
Several people expressed interest in the matter, e. g.: [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017286.html], [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017287.html]. <br />
<br />
A Wiki page [[Hajax]] has been recently created, which summarizes the achievements in the related fields. By these experiments, I am trying to address the problem of Javascript generation out of a Haskell source.<br />
<br />
To achieve this, an existing Haskell compiler, namely [http://haskell.org/nhc98/ nhc98], is being patched to add a Javascript generation facility out of a STG tree: the original compiler generates bytecodes from the same source.<br />
<br />
After unsuccessfully trying several approaches (e. g. Javascript closures, see [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Functions#Nested_functions_and_closures]), it has been decided to implement an STG machine (as described in [http://citeseer.ist.psu.edu/peytonjones92implementing.html]) in Javascript.<br />
<br />
The above-referenced paper describes how to implement an STG machine in assembly language (or C). The Javascript implementation uses the same ideas, but takes advantage of the automatic memory management provided by the Javascript runtime, and also of its built-in handling of values more complex than just numbers and arrays of bytes.<br />
<br />
To describe a thunk, a Javascript object of the following structure may be used:<br />
<br />
<pre><br />
thunk = {<br />
_c:function(){ ... }, // code to evaluate a thunk<br />
_1:..., // argument 1<br />
_2:...,<br />
_N:... // argument n<br />
};<br />
</pre><br />
<br />
So, similarly to what is described in the STG paper, the ''_c'' method is used to evaluate a thunk. This method may also self-update the thunk, replacing itself (i. e. ''this._c'') with something else, and returning the result once it becomes known (i. e. at the very end of thunk evaluation).<br />
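<br />
A minimal sketch of such a self-updating thunk, assuming the evaluation method is the ''_c'' member of the structure shown above. The helper ''makeThunk'' and the counter ''evaluations'' are hypothetical and exist only for this illustration.<br />
<br />
```javascript
var evaluations = 0;   // counts how often the delayed code really runs

// Hypothetical helper: wraps a computation into a thunk object.
function makeThunk(compute) {
    return {
        _c: function () {
            var value = compute();
            // self-update: later calls return the known value directly
            this._c = function () { return value; };
            return value;
        }
    };
}

var t = makeThunk(function () { evaluations++; return 6 * 7; });
var first = t._c();    // runs the computation
var second = t._c();   // served by the replaced _c, no recomputation
```
<br />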
<br />
Some interesting things may be done by manipulating prototypes of Javascript built-in classes.<br />
<br />
Consider this (Javascript shell log pasted below):<br />
<br />
<pre><br />
<br />
Number.prototype.c=function(){return this};<br />
function(){return this}<br />
(1).c()<br />
1<br />
(2).c()<br />
2<br />
(-999).c()<br />
-999<br />
1<br />
1<br />
2<br />
2<br />
999<br />
999<br />
<br />
</pre><br />
<br />
Thus, simple numeric values are given thunk behavior: calling the ''c'' method on them returns their value as if a thunk were evaluated, and at the same time they may be used in the regular way when passed to Javascript functions outside the Haskell runtime (e. g. DOM manipulation functions).<br />
<br />
A similar trick can be done with Strings and Arrays: for these, the ''c'' method will return the head value (i. e. ''String.charAt(0)'') CONS'ed with the remainder of the String/Array.<br />
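<br />
For strings, the idea might look like the sketch below. The cell layout (''tag'', ''head'', ''tail'') and the helper ''toArray'' are assumptions made for this example only; the notes do not fix a representation at this point.<br />
<br />
```javascript
// Hypothetical sketch: evaluating a string yields either Nil or
// a Cons cell of the first character and the (unevaluated) rest.
String.prototype.c = function () {
    if (this.length === 0)
        return { tag: "Nil" };
    return { tag: "Cons", head: this.charAt(0), tail: this.slice(1) };
};

// Walk the list by evaluating one tail at a time.
function toArray(s) {
    var out = [];
    var cell = s.c();
    while (cell.tag === "Cons") {
        out.push(cell.head);
        cell = cell.tail.c();
    }
    return out;
}

var chars = toArray("abc");
```
<br />
The tail stays an ordinary Javascript string until its own ''c'' method is called, so the same value can still be handed to regular Javascript functions.<br />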
<br />
== Aug 23, 2006 ==<br />
<br />
First thing to do is to learn how to call primitives. In Javascript,<br />
primitives mostly cover built-in arithmetics and interface to the [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:Math Math] object. Primitives need all their arguments evaluated before they are called, and usually return strict values. So there is no need to build a thunk each time a primitive is called.<br />
<br />
At the moment, the following Haskell code:<br />
<br />
<pre><br />
f :: Int -> Int -> Int<br />
<br />
f a b = (a + b) * (a - b)<br />
<br />
g = f 1 2<br />
</pre><br />
<br />
compiles into (part of the Javascript below was inserted manually):<br />
<br />
<pre><br />
var HMain = {m:"HMain"};<br />
<br />
Number.prototype._c=function(){return this;};<br />
<br />
// Compiled code starts<br />
<br />
HMain.f_T=function(v164,v165){return {_c:HMain.f_C,<br />
_w:"9:1-9:24",<br />
_1:v164,<br />
_2:v165};};<br />
HMain.f_C=function(){<br />
return ((((this._1)._c())+((this._2)._c()))._c())*<br />
((((this._1)._c())-((this._2)._c()))._c());<br />
};<br />
<br />
HMain.g_T=function(){return {_c:HMain.g_C,_w:"11:1-11:9"};};<br />
HMain.g_C=function(){<br />
return HMain.f_T(1,2); // NB should be HMain.f_T(1,2)._c()<br />
};<br />
<br />
// Compiled code ends<br />
<br />
print(HMain.f_T(3,4)._c());<br />
<br />
print(HMain.g_T()._c()._c());<br />
</pre><br />
<br />
<br />
When running, the script produces:<br />
<br />
<pre><br />
Running...<br />
-7<br />
-3<br />
</pre><br />
<br />
So, for each Haskell function, two Javascript functions are created: one creates a thunk when called with arguments (so it is good for saturated calls), another is the thunk's evaluation function. The latter will be passed around when dealing with partial applications (which will likely involve special sort of thunks, but we haven't got down to this as of yet).<br />
<br />
Note that the ''_c()'' method is applied twice to the output from ''HMain.g_T'': the function calls ''f_T'' which returns an unevaluated thunk, but this result is not used, so we need to force the evaluation to get the final result.<br />
<br />
'''NB''': indeed, the thunk evaluation function for ''HMain.g'' should evaluate the thunk created by ''HMain.f_T''. Laziness will not be lost because ''HMain.g_C'' will not be executed until needed.<br />
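<br />
With that change applied by hand, the example could look like the sketch below (the ''_w'' source positions are left out for brevity, and the variable ''g'' is introduced only for this illustration). A single ''_c()'' call on the thunk for ''g'' is then enough.<br />
<br />
```javascript
var HMain = {m: "HMain"};

Number.prototype._c = function () { return this; };

HMain.f_T = function (v164, v165) {
    return {_c: HMain.f_C, _1: v164, _2: v165};
};
HMain.f_C = function () {
    return ((this._1)._c() + (this._2)._c()) *
           ((this._1)._c() - (this._2)._c());
};

HMain.g_T = function () { return {_c: HMain.g_C}; };
HMain.g_C = function () {
    // evaluate the thunk built by f_T instead of returning it unevaluated
    return HMain.f_T(1, 2)._c();
};

var g = HMain.g_T()._c();   // one _c() call suffices now
```
<br />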
<br />
== Sep 12, 2006 ==<br />
<br />
To simplify handling of partial function applications, format of thunk has been changed so that instead of ''_1'', ''_2'', etc. for function argument, an array named ''_a'' is used. This array always has at least one element which is ''undefined''. Arguments start with array element indexed at 1, so to access an argument ''n'', the following needs to be used: ''this._a[n]''.<br />
<br />
For Haskell programs executing in a web browser environment, the analogue of the FFI is calling external Javascript functions.<br />
Imagine this Javascript function which prints its argument on the window status line:<br />
<br />
<pre><br />
// Output an integer value into the window status line<br />
<br />
putStatus = function (i) {window.status = i; return i;};<br />
</pre><br />
<br />
To import such a function into a Haskell program, the following FFI declaration is to be used:<br />
<br />
<pre><br />
foreign import ccall "putStatus" putStatus :: Int -> Int<br />
</pre><br />
<br />
Note the type signature: of course it should be somewhat monadic, but for the moment, nothing has been done to support monads, so this signature is only good for testing purposes.<br />
<br />
The current NHC98-based implementation compiles the above FFI declaration into this:<br />
<br />
<pre><br />
Test2.putStatus_T=function(_1){return {_c:Test2.putStatus_C, _w:"7:1-7:56", <br />
_a:[undefined, _1]};};<br />
Test2.putStatus_C=function(){<br />
return (putStatus)((this._a[1])._c());<br />
};<br />
</pre><br />
<br />
Note that like a primitive, a foreign function evaluates all its arguments before it starts executing.<br />
<br />
A test page illustrating this can be found at:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.html<br />
<br />
When this page is loaded, the window status line should display "456" while the rest of the page remains blank. <br />
The Haskell source for this test page is:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.hs<br />
<br />
== Sep 19, 2006 ==<br />
<br />
Initially, functions compiled from Haskell to Javascript were represented as members of objects (one object per Haskell module). Anticipating some complications with multilevel module hierarchies, and also with functions whose names contain special characters, it has been decided to pass every function identifier through the ''fixStr'' function: in ''nhc98'' it replaces non-alphanumeric characters with their numeric code prefixed with an underscore. So a typical function definition such as:<br />
<br />
<pre><br />
p3 :: Int -> Int -> Int -> Int<br />
p3 a b c = (a + b) * c;<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46p3_T=function(v210, v211, v212){return {_c:Test3_46p3_C, <br />
_w:"15:1-15:22", <br />
_a:[undefined, <br />
v210, v211, v212]};};<br />
var Test3_46p3_C=function(){<br />
return (((((this._a[1])._c())+((this._a[2])._c()))._c())*<br />
((this._a[3])._c()))._c();<br />
};<br />
</pre><br />
<br />
Note the function name: ''Test3_46p3_T''; in previous examples it would have been something like ''Test3.p3_T''.<br />
<br />
Partial function applications need a different thunk format. This kind of thunk holds the function to be applied to its arguments when the application will be saturated (number of arguments becomes equal to function arity), number of remaining arguments, and an array of arguments so far.<br />
<br />
Thus, for a function:<br />
<br />
<pre><br />
w = p3 1<br />
</pre><br />
<br />
resulting Javascript is:<br />
<br />
<pre><br />
var Test3_46w_T=function(){return {_c:Test3_46w_C, _w:"17:1-17:8", <br />
_a:[undefined]};};<br />
var Test3_46w_C=function(){<br />
return ({_c:function(){return this;}, _s:Test3_46p3_T, _x:2, _a:[1]})._c();<br />
};<br />
</pre><br />
<br />
Such a thunk always evaluates to itself (''_c()''); it holds the function name in its ''_s'' member, number of remaining arguments in its ''_x'' member, and available arguments in its ''_a'' member, only in this case the array does not have ''undefined'' as its zeroth element.<br />
<br />
An application of such a function (''w'') to additional arguments:<br />
<br />
<pre><br />
z = w 2 3<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46z_T=function(){return {_c:Test3_46z_C, _w:"23:1-23:9", <br />
_a:[undefined]};};<br />
var Test3_46z_C=function(){<br />
return (HSRuntime_46doApply((Test3_46w_T())._c(), [2, 3]))._c();<br />
};<br />
</pre><br />
<br />
So, when such an expression is being computed, a special Runtime support function is called, which obtains the partial application thunk via evaluation of its first argument (''(Test3_46w_T())._c()''), and adds the arguments provided (''[2, 3]'') to the list of arguments available so far. If the number of arguments becomes equal to the target function arity, a normal function application thunk is returned, otherwise another partial application thunk is returned. The Runtime support function looks like this:<br />
<br />
<pre><br />
var HSRuntime_46doApply = function (thunk, targs){<br />
thunk._a = thunk._a.concat (targs);<br />
thunk._x = thunk._x - targs.length;<br />
if (thunk._x > 0) {<br />
return thunk;<br />
} else {<br />
return thunk._s.apply (null, thunk._a);<br />
}<br />
};<br />
</pre><br />
<br />
Note the use of the ''apply'' method. It may be used also with functions that are not methods of some object. The first argument (''this_arg'') may be ''null'' or ''undefined'' as it will not be used by the function applied to the arguments.<br />
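<br />
To see the pieces work together, here is a self-contained sketch that saturates the partial application ''p3 1'' with the arguments ''2'' and ''3''. It uses shortened names (''p3_T'', ''p3_C'', ''doApply'') and, as a deliberate variation that is not what the runtime above does, builds a fresh thunk instead of modifying the one passed in, so that the same partial application can be applied more than once.<br />
<br />
```javascript
Number.prototype._c = function () { return this; };

// p3 a b c = (a + b) * c, in the thunk format described above
var p3_T = function (a, b, c) {
    return {_c: p3_C, _a: [undefined, a, b, c]};
};
var p3_C = function () {
    return ((this._a[1])._c() + (this._a[2])._c()) * (this._a[3])._c();
};

// Variation of the runtime function: copies instead of updating in place.
var doApply = function (thunk, targs) {
    var args = thunk._a.concat(targs);
    var remaining = thunk._x - targs.length;
    if (remaining > 0) {
        // still unsaturated: return a new partial application thunk
        return {_c: function () { return this; },
                _s: thunk._s, _x: remaining, _a: args};
    }
    // saturated: build the normal application thunk
    return thunk._s.apply(null, args);
};

// w = p3 1
var w = {_c: function () { return this; }, _s: p3_T, _x: 2, _a: [1]};

var z = doApply(w, [2, 3]);   // z = w 2 3, still an unevaluated thunk
var value = z._c();           // forces (1 + 2) * 3
```
<br />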
<br />
''NHC98'' acts differently when a partial application is not defined as a separate function, but is part of another expression.<br />
<br />
First, some Haskell definitions:<br />
<br />
<pre><br />
z :: Int -> Int<br />
<br />
z = (3 +)<br />
<br />
p :: Int -> Int -> Int<br />
<br />
p = (+)<br />
</pre><br />
<br />
compile into:<br />
<br />
<pre><br />
var Test4_46z_T=function(){return {_c:Test4_46z_C, _w:"9:1-9:8", <br />
_a:[undefined]};};<br />
var Test4_46z_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA181_T, _x:1, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA181_T=function(v178){return {_c:LAMBDA181_C, _w:"9:8", <br />
_a:[undefined, v178]};};<br />
var LAMBDA181_C=function(){<br />
return (((3)._c())+((this._a[1])._c()))._c();<br />
};<br />
<br />
var Test4_46p_T=function(){return {_c:Test4_46p_C, _w:"13:1-13:6", <br />
_a:[undefined]};};<br />
var Test4_46p_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA182_T, _x:2, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA182_T=function(v179, v180){return {_c:LAMBDA182_C, <br />
_w:"13:6", <br />
_a:[undefined, v179, v180]};};<br />
var LAMBDA182_C=function(){<br />
return (((this._a[1])._c())+((this._a[2])._c()))._c();<br />
};<br />
</pre><br />
<br />
Now, when these functions (''p'', ''z'') are used:<br />
<br />
<pre><br />
t4main = putStatus (z (p 6 8)) -- see above for putStatus<br />
</pre><br />
<br />
the generated Javascript is:<br />
<br />
<pre><br />
var Test4_46t4main_T=function(){return {_c:Test4_46t4main_C, <br />
_w:"17:1-17:28", <br />
_a:[undefined]};};<br />
var Test4_46t4main_C=function(){<br />
return (Test4_46putStatus_T(<br />
NHC_46Internal_46_95apply1_T(<br />
Test4_46z_T(), <br />
NHC_46Internal_46_95apply2_T(<br />
Test4_46p_T(), 6, 8)<br />
)))._c();<br />
};<br />
</pre><br />
<br />
For each application of ''p'' and ''z'', an internal function ''NHC_46Internal_46_95apply'''''N'''''_T'' is called where '''N''' depends on the target function arity. In the Javascript implementation, all these functions are in fact one function (because in Javascript it is possible to determine the number of arguments a function was called with, so there is no need for separate functions for each arity). The internal function extracts its first argument and evaluates it (by calling the ''_c()'' method), getting a partial application thunk. Then, the Runtime support function ''HSRuntime_46doApply'' is called with the thunk and arguments array:<br />
<br />
<pre><br />
var NHC_46Internal_46_95apply1_T = function() {return __apply__(arguments);};<br />
var NHC_46Internal_46_95apply2_T = function() {return __apply__(arguments);};<br />
...<br />
var __apply__ = function (args) {<br />
var i, targs = new Array();<br />
var thunk = args[0]._c();<br />
for (i = 1; i < args.length; i++) {<br />
targs [i - 1] = args [i];<br />
}<br />
return HSRuntime_46doApply (thunk, targs);<br />
};<br />
</pre><br />
<br />
''Note by Dimitry'': Just for clarity, Dimitry's part ends here, and Vir's part starts.<br />
<br />
== Aug 25, 2007 ==<br />
Here's my attempt. I'm going to implement a Haskell to JavaScript compiler based on the STG machine. This turned out to be not such an easy task, so I'd be happy to get some feedback.<br />
<br />
This is an example translation of some Haskell functions to JavaScript. I'm trying to be descriptive, but if I'm not, please ask me or write your suggestions. I'm not quite sure that this code is really correct.<br />
<br />
<pre><br />
// Example of Haskell to JavaScript translation<br />
//<br />
// PAP - Partial Application<br />
// every object (heap object in STG) is called closure here<br />
// closure and function are used interchangeably here<br />
//<br />
<br />
<br />
////////////////////////////////////////////////////////////////<br />
// Run-time system:<br />
<br />
var closure; // current entered closure<br />
var args; // arguments<br />
var RCons; // Constructor tag, constructors set this tag to some value<br />
var RVal; // Some returned value<br />
<br />
Number.prototype.arity = 0;<br />
Number.prototype.code = function () {<br />
RVal = closure;<br />
args = null;<br />
return null;<br />
}<br />
<br />
String.prototype.arity = 0;<br />
String.prototype.code = function ()<br />
{<br />
if (closure.length == 0) {<br />
args = null;<br />
closure = Nil;<br />
return apply;<br />
}<br />
<br />
args = new Array (2);<br />
args[0] = new Number (closure.charCodeAt (0));<br />
args[1] = closure.slice (1, closure.length);<br />
closure = Cons;<br />
return apply;<br />
}<br />
<br />
// a mini-interpreter is used to implement tail calls<br />
// to jump to some function, we don't call it, but<br />
// return its address instead<br />
function save_continuation_and_run (function_to_run)<br />
{<br />
while (function_to_run != null)<br />
function_to_run = function_to_run ();<br />
}<br />
<br />
// calling convention<br />
// function is pointed by a [closure] global variable<br />
// arguments are in [args] array<br />
function apply ()<br />
{<br />
var f = closure;<br />
var nargs = 0<br />
if (args != null)<br />
nargs = args.length;<br />
<br />
if (f.arity == nargs)<br />
return f.code;<br />
<br />
if (nargs == 0) {<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
// We CAN'T call a function, so we must build a PAP and call continuation!!!<br />
if (f.arity > nargs) {<br />
var supplied_args = args;<br />
args = null;<br />
var pap = {<br />
arity : f.arity - nargs,<br />
code : function () {<br />
var new_args = args;<br />
args = supplied_args<br />
supplied_args = null;<br />
<br />
// not working, type information is lost... :(<br />
//args.push (new_args);<br />
<br />
for (var i = nargs; i < f.arity; i++)<br />
args[i] = new_args[i - nargs];<br />
new_args = null;<br />
closure = f;<br />
return apply;<br />
}<br />
}<br />
<br />
closure = pap;<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
<br />
// f.arity < nargs<br />
<br />
var remaining_args = args.slice (f.arity, nargs);<br />
args.length = f.arity;<br />
<br />
save_continuation_and_run (f.code)<br />
<br />
// closure now points to some new function, we'll try to call it<br />
args = remaining_args;<br />
return apply;<br />
}<br />
<br />
// Updates are called and used essentially like the apply function<br />
// an updatable thunk pushes a continuation and runs as usual<br />
// when the continuation activates, it replaces the closure with the value<br />
// after that it returns to the next continuation<br />
function update()<br />
{<br />
var f = closure;<br />
<br />
save_continuation_and_run (f.realcode);<br />
<br />
f.RCons = RCons;<br />
f.RVal = RVal;<br />
f.args = args;<br />
f.code = update_code;<br />
f.realcode = null;<br />
return null;<br />
}<br />
<br />
function update_code ()<br />
{<br />
RCons = closure.RCons;<br />
RVal = closure.RVal;<br />
args = closure.args;<br />
return null;<br />
}<br />
<br />
////////////////////////////////////////////////////////////////////<br />
// Examples: STG -> JS<br />
/* add = \a b -> case a of {a -> case b of {b -> primOp + a b}} */<br />
<br />
add = {<br />
arity: 2,<br />
code: function () {<br />
var a = args[0];<br />
var b = args[1];<br />
closure = a;<br />
args = null;<br />
save_continuation_and_run (apply);<br />
var a = RVal;<br />
closure = b;<br />
args = null;<br />
save_continuation_and_run (apply);<br />
var b = RVal;<br />
RVal = a + b;<br />
args = null;<br />
return null;<br />
}<br />
}<br />
<br />
<br />
/*<br />
compose = \f g x -><br />
let gx = g x<br />
in f gx<br />
*/<br />
compose = {<br />
arity: 3,<br />
code: function () {<br />
var f = args[0];<br />
var g = args[1];<br />
var x = args[2];<br />
var gx = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = g;<br />
args = new Array (1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
args = new Array (1);<br />
closure = f;<br />
args[0] = gx;<br />
return apply;<br />
}<br />
}<br />
<br />
ConsTag = 3;<br />
Cons = {<br />
arity : 2,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Nil<br />
RCons = ConsTag;<br />
<br />
// We must return to continuation, arguments are returned in args array<br />
return null;<br />
}<br />
}<br />
<br />
NilTag = 2;<br />
Nil = {<br />
arity : 0,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Cons<br />
RCons = NilTag;<br />
<br />
// We must return to continuation<br />
return null;<br />
}<br />
}<br />
<br />
/*<br />
map = \f xs-><br />
case xs of {<br />
Cons x xs -><br />
let fx = f x<br />
in let mapfxs = map f xs<br />
in Cons fx mapfxs<br />
; Nil -> Nil<br />
}<br />
*/<br />
map = {<br />
arity: 2,<br />
code : function () {<br />
var f = args[0];<br />
var xs = args[1];<br />
//push continuation and enter xs<br />
closure = xs;<br />
args = null;<br />
save_continuation_and_run (xs.code)<br />
switch (RCons) {<br />
case ConsTag:<br />
{<br />
var x = args[0];<br />
var xs = args[1];<br />
var fx = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = f;<br />
args = new Array(1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
var mapfxs = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = map;<br />
args = new Array(2);<br />
args[0] = f;<br />
args[1] = xs;<br />
return apply;<br />
}<br />
}<br />
closure = Cons;<br />
args = new Array(2);<br />
args[0] = fx;<br />
args[1] = mapfxs;<br />
return apply;<br />
}<br />
break;<br />
case NilTag:<br />
closure = Nil;<br />
args = null;<br />
return Nil.code;<br />
break;<br />
}<br />
}<br />
}<br />
<br />
inc3 = {<br />
arity: 0,<br />
code: function () {<br />
args = new Array (1);<br />
args[0] = 3;<br />
closure = add;<br />
return apply;<br />
}<br />
}<br />
<br />
</pre><br />
<br />
<br />
----<br />
<br />
Victor Nazarov<br />
<br />
asviraspossible@gmail.com<br />
<br />
== Aug 29, 2007 ==<br />
<br />
Code from previous section was updated. Here are some tests I've used to debug this code:<br />
<br />
<pre><br />
args = null;<br />
closure = 1013;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
args = new Array(2);<br />
args[0] = 7;<br />
args[1] = 6;<br />
closure = add;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
args = new Array(1);<br />
closure = inc3;<br />
args[0] = new Number(21);<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
closure = "123";<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
</pre><br />
<br />
The result of this test is the following:<br />
<pre><br />
1013 // Means that JS numbers work as closures using prototype trick<br />
13 // Simple function calls are working<br />
24 // Not so simple calls are working: PAP is properly built and used<br />
3 // Cons - list constructor<br />
3 // Cons - list constructor<br />
3 // Cons - list constructor<br />
2 // Nil - list constructor<br />
</pre><br />
<br />
The last 4 lines show that Javascript strings work properly using the prototype trick. We can observe the structure of the "123" object: Cons '1' (Cons '2' (Cons '3' Nil)), where each element is stored as a character code.</div>Virhttps://wiki.haskell.org/index.php?title=STG_in_Javascript&diff=15328STG in Javascript2007-08-29T14:13:32Z<p>Vir: </p>
<hr />
<div>[[Category:How to]]<br />
<br />
''Note (Aug 27, 2007)'': This page was started about a year ago. Over time, the focus was changed to integration with Yhc Core, and the work in progress may be observed here: [[Yhc/Javascript]].<br />
<br />
''Disclaimer'': Here are my working notes related to an experiment to execute Haskell programs in a web browser. You may find them bizarre, and even nonsensical. Don't hesitate to discuss them (please use the [[Talk:STG in Javascript]] page). Chances are, at some point a working implementation will be produced.<br />
<br />
The [http://www.squarefree.com/shell/shell.html Javascript Shell] is of great help for this experiment.<br />
<br />
----<br />
<br />
== Aug 22, 2006 ==<br />
<br />
Several people expressed interest in the matter, e. g.: [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017286.html], [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017287.html]. <br />
<br />
A Wiki page [[Hajax]] has been recently created, which summarizes the achievements in the related fields. By these experiments, I am trying to address the problem of Javascript generation out of a Haskell source.<br />
<br />
To achieve this, an existing Haskell compiler, namely [http://haskell.org/nhc98/ nhc98], is being patched to add a Javascript generation facility out of a STG tree: the original compiler generates bytecodes from the same source.<br />
<br />
After unsuccessfully trying several approaches (e. g. Javascript closures, see [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Functions#Nested_functions_and_closures]), it has been decided to implement an STG machine (as described in [http://citeseer.ist.psu.edu/peytonjones92implementing.html]) in Javascript.<br />
<br />
The above-referenced paper describes how to implement an STG machine in assembly language (or C). The Javascript implementation uses the same ideas, but takes advantage of the automatic memory management provided by the Javascript runtime, and also of its built-in handling of values more complex than just numbers and arrays of bytes.<br />
<br />
To describe a thunk, a Javascript object of the following structure may be used:<br />
<br />
<pre><br />
thunk = {<br />
_c:function(){ ... }, // code to evaluate a thunk<br />
_1:..., // argument 1<br />
_2:...,<br />
_N:... // argument n<br />
};<br />
</pre><br />
<br />
So, similarly to what is described in the STG paper, the ''_c'' method is used to evaluate a thunk. This method may also self-update the thunk, replacing itself (i. e. ''this._c'') with something else, and returning the result once it becomes known (i. e. at the very end of thunk evaluation).<br />
<br />
Some interesting things may be done by manipulating prototypes of Javascript built-in classes.<br />
<br />
Consider this (Javascript shell log pasted below):<br />
<br />
<pre><br />
<br />
Number.prototype.c=function(){return this};<br />
function(){return this}<br />
(1).c()<br />
1<br />
(2).c()<br />
2<br />
(-999).c()<br />
-999<br />
1<br />
1<br />
2<br />
2<br />
999<br />
999<br />
<br />
</pre><br />
<br />
Thus, simple numeric values are given thunk behavior: calling the ''c'' method on them returns their value as if a thunk were evaluated, and at the same time they may be used in the regular way when passed to Javascript functions outside the Haskell runtime (e. g. DOM manipulation functions).<br />
<br />
A similar trick can be done with Strings and Arrays: for these, the ''c'' method will return the head value (i. e. ''String.charAt(0)'') CONS'ed with the remainder of the String/Array.<br />
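<br />
As a rough sketch of the String case: the cons-cell layout ''{head, tail}'', the use of ''null'' for the empty list, and the helper ''toArray'' are all assumptions made for this example, since no representation has been fixed at this point in the notes.<br />
<br />
```javascript
// Assumed representation: a cons cell is {head, tail}; the empty list is null.
String.prototype.c = function () {
  var s = String(this);               // work on the primitive string value
  if (s.length === 0) { return null; }
  return { head: s.charAt(0), tail: s.slice(1) };
};

// Walk a string by forcing one cell at a time.
var toArray = function (str) {
  var out = [];
  var cell = str.c();
  while (cell !== null) {
    out.push(cell.head);
    cell = cell.tail.c();             // the tail is again a String, so it has c()
  }
  return out;
};

var chars = toArray("abc");
var empty = "".c();
```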
<br />
== Aug 23, 2006 ==<br />
<br />
The first thing to do is to learn how to call primitives. In Javascript, primitives mostly cover built-in arithmetic and the interface to the [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:Math Math] object. Primitives need all their arguments evaluated before they are called, and usually return strict values, so there is no need to build a thunk each time a primitive is called.<br />
<br />
At the moment, the following Haskell code:<br />
<br />
<pre><br />
f :: Int -> Int -> Int<br />
<br />
f a b = (a + b) * (a - b)<br />
<br />
g = f 1 2<br />
</pre><br />
<br />
compiles into (part of the Javascript below was inserted manually):<br />
<br />
<pre><br />
var HMain = {m:"HMain"};<br />
<br />
Number.prototype._c=function(){return this;};<br />
<br />
// Compiled code starts<br />
<br />
HMain.f_T=function(v164,v165){return {_c:HMain.f_C,<br />
_w:"9:1-9:24",<br />
_1:v164,<br />
_2:v165};};<br />
HMain.f_C=function(){<br />
return ((((this._1)._c())+((this._2)._c()))._c())*<br />
((((this._1)._c())-((this._2)._c()))._c());<br />
};<br />
<br />
HMain.g_T=function(){return {_c:HMain.g_C,_w:"11:1-11:9"};};<br />
HMain.g_C=function(){<br />
return HMain.f_T(1,2); // NB should be HMain.f_T(1,2)._c()<br />
};<br />
<br />
// Compiled code ends<br />
<br />
print(HMain.f_T(3,4)._c());<br />
<br />
print(HMain.g_T()._c()._c());<br />
</pre><br />
<br />
<br />
When running, the script produces:<br />
<br />
<pre><br />
Running...<br />
-7<br />
-3<br />
</pre><br />
<br />
So, for each Haskell function, two Javascript functions are created: one creates a thunk when called with arguments (so it is good for saturated calls), the other is the thunk's evaluation function. The latter will be passed around when dealing with partial applications (which will likely involve a special sort of thunk, but we have not gotten to this yet).<br />
<br />
Note that the ''_c()'' method is applied twice to the output of ''HMain.g_T'': ''HMain.g_C'' calls ''f_T'', which returns an unevaluated thunk, and nothing forces that thunk inside ''HMain.g_C'', so the evaluation has to be forced once more to get the final result.<br />
<br />
'''NB''': indeed, the thunk evaluation function for ''HMain.g'' should evaluate the thunk created by ''HMain.f_T''. Laziness will not be lost because ''HMain.g_C'' will not be executed until needed.<br />
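<br />
The corrected form suggested in the NB can be sketched as follows. This is a hand-written variant, not compiler output; the ''_w'' position strings are omitted, and ''_c'' on numbers uses ''valueOf()'' (an assumption of this sketch) so that a primitive number is returned.<br />
<br />
```javascript
var HMain = { m: "HMain" };

// Assumption: valueOf() so that evaluating a number yields a primitive.
Number.prototype._c = function () { return this.valueOf(); };

HMain.f_T = function (a, b) { return { _c: HMain.f_C, _1: a, _2: b }; };
HMain.f_C = function () {
  return (this._1._c() + this._2._c()) * (this._1._c() - this._2._c());
};

HMain.g_T = function () { return { _c: HMain.g_C }; };
HMain.g_C = function () {
  return HMain.f_T(1, 2)._c();   // force the inner thunk here
};

var gThunk = HMain.g_T();   // nothing has been computed yet
var gValue = gThunk._c();   // a single _c() now yields the number
```
<br />
Building ''gThunk'' still performs no arithmetic, so laziness is kept; only the number of ''_c()'' calls needed by the consumer changes.<br />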
<br />
== Sep 12, 2006 ==<br />
<br />
To simplify handling of partial function applications, the thunk format has been changed: instead of ''_1'', ''_2'', etc. for function arguments, an array named ''_a'' is used. This array always has at least one element, ''_a[0]'', which is ''undefined''. Arguments start at index 1, so argument ''n'' is accessed as ''this._a[n]''.<br />
<br />
For Haskell programs executing in a web browser, the analogue of the FFI is calling external Javascript functions.<br />
Consider this Javascript function, which prints its argument on the window status line:<br />
<br />
<pre><br />
// Output an integer value into the window status line<br />
<br />
putStatus = function (i) {window.status = i; return i;};<br />
</pre><br />
<br />
To import such a function into a Haskell program, the following FFI declaration is used:<br />
<br />
<pre><br />
foreign import ccall "putStatus" putStatus :: Int -> Int<br />
</pre><br />
<br />
Note the type signature: it should of course be monadic, but so far nothing has been done to support monads, so this signature is only good for testing purposes.<br />
<br />
The current NHC98-based implementation compiles the above FFI declaration into this:<br />
<br />
<pre><br />
Test2.putStatus_T=function(_1){return {_c:Test2.putStatus_C, _w:"7:1-7:56", <br />
_a:[undefined, _1]};};<br />
Test2.putStatus_C=function(){<br />
return (putStatus)((this._a[1])._c());<br />
};<br />
</pre><br />
<br />
Note that like a primitive, a foreign function evaluates all its arguments before it starts executing.<br />
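<br />
The evaluation order can be observed with a small stand-alone sketch. Here ''putStatus'' is replaced by a stand-in that records its argument instead of writing to ''window.status'', so that the code also runs outside a browser; ''statusLine'', ''forced'' and ''argThunk'' are names made up for this example.<br />
<br />
```javascript
// Assumption: valueOf() so that evaluating a number yields a primitive.
Number.prototype._c = function () { return this.valueOf(); };

// Stand-in for the browser function: records the value instead of
// touching window.status.
var statusLine = null;
var putStatus = function (i) { statusLine = i; return i; };

var Test2 = {};
Test2.putStatus_T = function (_1) {
  return { _c: Test2.putStatus_C, _a: [undefined, _1] };
};
Test2.putStatus_C = function () {
  return putStatus(this._a[1]._c());   // the argument is forced before the call
};

// An argument thunk that notes when it gets evaluated.
var forced = false;
var argThunk = { _c: function () { forced = true; return 456; }, _a: [undefined] };

var call = Test2.putStatus_T(argThunk);  // builds a thunk, runs nothing
var forcedBefore = forced;               // still false at this point
var result = call._c();                  // forces argThunk, then calls putStatus
```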
<br />
A test page illustrating this can be found at:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.html<br />
<br />
When this page is loaded, the window status line should display "456" while the rest of the page remains blank. <br />
The Haskell source for this test page is:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.hs<br />
<br />
== Sep 19, 2006 ==<br />
<br />
Initially, functions compiled from Haskell to Javascript were represented as members of objects (one object per Haskell module). Anticipating some complications with a multilevel module hierarchy, and also with functions whose names contain special characters, it has been decided to pass every function identifier through the ''fixStr'' function: in ''nhc98'' it replaces non-alphanumeric characters with their numeric code prefixed with an underscore. So a typical function definition such as:<br />
<br />
<pre><br />
p3 :: Int -> Int -> Int -> Int<br />
p3 a b c = (a + b) * c;<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46p3_T=function(v210, v211, v212){return {_c:Test3_46p3_C, <br />
_w:"15:1-15:22", <br />
_a:[undefined, <br />
v210, v211, v212]};};<br />
var Test3_46p3_C=function(){<br />
return (((((this._a[1])._c())+((this._a[2])._c()))._c())*<br />
((this._a[3])._c()))._c();<br />
};<br />
</pre><br />
<br />
Note the function name: ''Test3_46p3_T''; in previous examples it would have been something like ''Test3.p3_T''.<br />
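<br />
The mangling rule can be imitated in Javascript as shown below. This is a hypothetical re-implementation written only from the description above; the real ''fixStr'' lives inside ''nhc98'' and may differ in detail.<br />
<br />
```javascript
// Hypothetical re-implementation of the rule described above: every
// non-alphanumeric character becomes "_" followed by its character code.
var fixStr = function (name) {
  return name.replace(/[^A-Za-z0-9]/g, function (ch) {
    return "_" + ch.charCodeAt(0);
  });
};

var mangled = fixStr("Test3.p3") + "_T";               // "." has code 46
var internal = fixStr("NHC.Internal._apply1") + "_T";  // "_" has code 95
```
<br />
Since the underscore itself is mangled too, a name produced this way cannot be confused with one that merely happens to contain ''_46''.<br />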
<br />
Partial function applications need a different thunk format. This kind of thunk holds the function to be applied once the application becomes saturated (i. e. the number of arguments becomes equal to the function's arity), the number of arguments still missing, and an array of the arguments supplied so far.<br />
<br />
Thus, for a function:<br />
<br />
<pre><br />
w = p3 1<br />
</pre><br />
<br />
resulting Javascript is:<br />
<br />
<pre><br />
var Test3_46w_T=function(){return {_c:Test3_46w_C, _w:"17:1-17:8", <br />
_a:[undefined]};};<br />
var Test3_46w_C=function(){<br />
return ({_c:function(){return this;}, _s:Test3_46p3_T, _x:2, _a:[1]})._c();<br />
};<br />
</pre><br />
<br />
Such a thunk always evaluates to itself (''_c()'' returns ''this''); it holds the target function in its ''_s'' member, the number of remaining arguments in its ''_x'' member, and the arguments available so far in its ''_a'' member. Note that in this case the array does not have ''undefined'' as its zeroth element.<br />
<br />
An application of such a function (''w'') to additional arguments:<br />
<br />
<pre><br />
z = w 2 3<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46z_T=function(){return {_c:Test3_46z_C, _w:"23:1-23:9", <br />
_a:[undefined]};};<br />
var Test3_46z_C=function(){<br />
return (HSRuntime_46doApply((Test3_46w_T())._c(), [2, 3]))._c();<br />
};<br />
</pre><br />
<br />
So, when such an expression is computed, a special Runtime support function is called. It receives the partial application thunk, obtained by evaluating ''(Test3_46w_T())._c()'', and adds the arguments provided (''[2, 3]'') to the list of arguments available so far. If the number of arguments becomes equal to the target function's arity, a normal function application thunk is returned; otherwise another partial application thunk is returned. The Runtime support function looks like this:<br />
<br />
<pre><br />
var HSRuntime_46doApply = function (thunk, targs){<br />
thunk._a = thunk._a.concat (targs);<br />
thunk._x = thunk._x - targs.length;<br />
if (thunk._x > 0) {<br />
return thunk;<br />
} else {<br />
return thunk._s.apply (null, thunk._a);<br />
}<br />
};<br />
</pre><br />
<br />
Note the use of the ''apply'' method. It may be used also with functions that are not methods of some object. The first argument (''this_arg'') may be ''null'' or ''undefined'' as it will not be used by the function applied to the arguments.<br />
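<br />
One thing to watch: the function above updates ''thunk._a'' and ''thunk._x'' in place, so applying the same partial application twice would appear to see the arguments of the first application. A possible variant that copies instead is sketched here; ''doApplyCopy'' and the shortened names ''p3_T'', ''p3_C'' and ''w'' are assumptions of this sketch, not names used by the compiler.<br />
<br />
```javascript
// Assumption: valueOf() so that evaluating a number yields a primitive.
Number.prototype._c = function () { return this.valueOf(); };

// p3 a b c = (a + b) * c
var p3_T = function (a, b, c) {
  return { _c: p3_C, _a: [undefined, a, b, c] };
};
var p3_C = function () {
  return (this._a[1]._c() + this._a[2]._c()) * this._a[3]._c();
};

// Variant of doApply that leaves the original partial application untouched.
var doApplyCopy = function (pap, targs) {
  var args = pap._a.concat(targs);
  var remaining = pap._x - targs.length;
  if (remaining > 0) {
    // still unsaturated: return a fresh partial application thunk
    return { _c: function () { return this; }, _s: pap._s, _x: remaining, _a: args };
  }
  // saturated: build the ordinary application thunk
  return pap._s.apply(null, args);
};

// w = p3 1
var w = { _c: function () { return this; }, _s: p3_T, _x: 2, _a: [1] };

var z1 = doApplyCopy(w, [2, 3])._c();   // (1 + 2) * 3
var z2 = doApplyCopy(w, [4, 5])._c();   // (1 + 4) * 5, w is reused unchanged
var step = doApplyCopy(w, [2]);         // one argument is still missing
```
<br />
As in the original function, arguments beyond the target function's arity are not handled specially here.<br />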
<br />
''NHC98'' acts differently when a partial application is not defined as a separate function, but is part of another expression.<br />
<br />
First, some Haskell definitions:<br />
<br />
<pre><br />
z :: Int -> Int<br />
<br />
z = (3 +)<br />
<br />
p :: Int -> Int -> Int<br />
<br />
p = (+)<br />
</pre><br />
<br />
compile into:<br />
<br />
<pre><br />
var Test4_46z_T=function(){return {_c:Test4_46z_C, _w:"9:1-9:8", <br />
_a:[undefined]};};<br />
var Test4_46z_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA181_T, _x:1, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA181_T=function(v178){return {_c:LAMBDA181_C, _w:"9:8", <br />
_a:[undefined, v178]};};<br />
var LAMBDA181_C=function(){<br />
return (((3)._c())+((this._a[1])._c()))._c();<br />
};<br />
<br />
var Test4_46p_T=function(){return {_c:Test4_46p_C, _w:"13:1-13:6", <br />
_a:[undefined]};};<br />
var Test4_46p_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA182_T, _x:2, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA182_T=function(v179, v180){return {_c:LAMBDA182_C, <br />
_w:"13:6", <br />
_a:[undefined, v179, v180]};};<br />
var LAMBDA182_C=function(){<br />
return (((this._a[1])._c())+((this._a[2])._c()))._c();<br />
};<br />
</pre><br />
<br />
Now, when these functions (''p'', ''z'') are used:<br />
<br />
<pre><br />
t4main = putStatus (z (p 6 8)) -- see above for putStatus<br />
</pre><br />
<br />
the generated Javascript is:<br />
<br />
<pre><br />
var Test4_46t4main_T=function(){return {_c:Test4_46t4main_C, <br />
_w:"17:1-17:28", <br />
_a:[undefined]};};<br />
var Test4_46t4main_C=function(){<br />
return (Test4_46putStatus_T(<br />
NHC_46Internal_46_95apply1_T(<br />
Test4_46z_T(), <br />
NHC_46Internal_46_95apply2_T(<br />
Test4_46p_T(), 6, 8)<br />
)))._c();<br />
};<br />
</pre><br />
<br />
For each application of ''p'' and ''z'', an internal function ''NHC_46Internal_46_95apply'''''N'''''_T'' is called where '''N''' depends on the target function arity. In the Javascript implementation, all these functions are in fact one function (in Javascript it is possible to determine the number of arguments a function was called with, so there is no need for a separate function per arity). The internal function takes its first argument and evaluates it (by calling the ''_c()'' method), getting a partial application thunk. Then the Runtime support function ''HSRuntime_46doApply'' is called with the thunk and the array of remaining arguments:<br />
<br />
<pre><br />
var NHC_46Internal_46_95apply1_T = function() {return __apply__(arguments);};<br />
var NHC_46Internal_46_95apply2_T = function() {return __apply__(arguments);};<br />
...<br />
var __apply__ = function (args) {<br />
var i, targs = new Array();<br />
var thunk = args[0]._c();<br />
for (i = 1; i < args.length; i++) {<br />
targs [i - 1] = args [i];<br />
}<br />
return HSRuntime_46doApply (thunk, targs);<br />
};<br />
</pre><br />
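<br />
To see the pieces working together, the expression ''z (p 6 8)'' can be traced with a compact stand-alone sketch. All names here (''makePap'', ''applyN'', ''plus_T'', ''add3_T'', ''p_T'', ''z_T'') are simplified stand-ins chosen for this example; ''doApply'' is written without in-place mutation, and the ''putStatus'' call is left out.<br />
<br />
```javascript
// Assumption: valueOf() so that evaluating a number yields a primitive.
Number.prototype._c = function () { return this.valueOf(); };

// A partial application thunk: evaluates to itself, remembers target and arity.
var makePap = function (target, arity) {
  return { _c: function () { return this; }, _s: target, _x: arity, _a: [] };
};

// Same idea as HSRuntime_46doApply, written without mutating the thunk.
var doApply = function (pap, targs) {
  var args = pap._a.concat(targs);
  var remaining = pap._x - targs.length;
  if (remaining > 0) {
    return { _c: pap._c, _s: pap._s, _x: remaining, _a: args };
  }
  return pap._s.apply(null, args);
};

// One variadic function in place of apply1, apply2, ...
var applyN = function (fnThunk) {
  var targs = Array.prototype.slice.call(arguments, 1);  // everything after the function
  return doApply(fnThunk._c(), targs);
};

// p = (+)
var plus_T = function (a, b) {
  return { _c: function () { return this._a[1]._c() + this._a[2]._c(); },
           _a: [undefined, a, b] };
};
var p_T = function () { return { _c: function () { return makePap(plus_T, 2); } }; };

// z = (3 +)
var add3_T = function (a) {
  return { _c: function () { return 3 + this._a[1]._c(); }, _a: [undefined, a] };
};
var z_T = function () { return { _c: function () { return makePap(add3_T, 1); } }; };

var result = applyN(z_T(), applyN(p_T(), 6, 8))._c();  // z (p 6 8)
```
<br />
The inner ''applyN'' call returns an unevaluated thunk for ''6 + 8''; the addition itself only happens when the outer thunk is forced by the final ''_c()''.<br />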
<br />
''Note by Dimitry'': Just for clarity, Dimitry's part ends here, and Vir's part starts.<br />
<br />
== Aug 25, 2007 ==<br />
Here's my attempt. I'm going to implement a Haskell to Javascript compiler based on the STG machine. This turned out to be not such an easy task, so I'd be happy to get some feedback.<br />
<br />
This is an example translation of some Haskell functions to JavaScript. I'm trying to be descriptive, but if I'm not, please ask me or write down your suggestions. I'm not quite sure that this code is really correct.<br />
<br />
<pre><br />
// Example of Haskell to JavaScript translation<br />
//<br />
// PAP - Partial Application<br />
// every object (heap object in STG) is called a closure here<br />
// the terms closure and function are used interchangeably here<br />
//<br />
<br />
<br />
////////////////////////////////////////////////////////////////<br />
// Run-time system:<br />
<br />
var closure; // current entered closure<br />
var args; // arguments<br />
var RCons; // Constructor tag, constructors set this tag to some value<br />
var RVal; // Some returned value<br />
<br />
Number.prototype.arity = 0;<br />
Number.prototype.code = function () {<br />
RVal = closure;<br />
args = null;<br />
return null;<br />
}<br />
<br />
String.prototype.arity = 0;<br />
String.prototype.code = function ()<br />
{<br />
if (closure.length == 0) {<br />
args = null;<br />
closure = Nil;<br />
return apply;<br />
}<br />
<br />
args = new Array (2);<br />
args[0] = new Number (closure.charCodeAt (0));<br />
args[1] = closure.slice (1, closure.length);<br />
closure = Cons;<br />
return apply;<br />
}<br />
<br />
// a mini interpreter is used to implement tail calls:<br />
// to jump to some function, we don't call it, but<br />
// return its address instead<br />
function save_continuation_and_run (function_to_run)<br />
{<br />
while (function_to_run != null)<br />
function_to_run = function_to_run ();<br />
}<br />
<br />
// calling convention<br />
// function is pointed by a [closure] global variable<br />
// arguments are in [args] array<br />
function apply ()<br />
{<br />
var f = closure;<br />
var nargs = 0<br />
if (args != null)<br />
nargs = args.length;<br />
<br />
if (f.arity == nargs)<br />
return f.code;<br />
<br />
if (nargs == 0) {<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
// We CAN'T call a function, so we must build a PAP and call continuation!!!<br />
if (f.arity > nargs) {<br />
var supplied_args = args;<br />
args = null;<br />
var pap = {<br />
arity : f.arity - nargs,<br />
code : function () {<br />
var new_args = args;<br />
args = supplied_args<br />
supplied_args = null;<br />
<br />
// not working, type information is lost... :(<br />
//args.push (new_args);<br />
<br />
for (i = nargs; i < f.arity; i++)<br />
args[i] = new_args[i - nargs];<br />
new_args = null;<br />
closure = f;<br />
return apply;<br />
}<br />
}<br />
<br />
closure = pap;<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
<br />
// f.arity < nargs<br />
<br />
var remaining_args = args.slice (f.arity, nargs);<br />
args.length = f.arity;<br />
<br />
save_continuation_and_run (f.code)<br />
<br />
// closure now points to some new function, we'll try to call it<br />
args = remaining_args;<br />
return apply;<br />
}<br />
<br />
// Updates are invoked essentially like the apply function:<br />
// an updatable thunk pushes a continuation and runs as usual;<br />
// when the continuation is activated, it replaces the closure with the value<br />
// and then returns to the next continuation<br />
function update()<br />
{<br />
var f = closure;<br />
<br />
save_continuation_and_run (f.realcode);<br />
<br />
f.RCons = RCons;<br />
f.RVal = RVal;<br />
f.args = args;<br />
f.code = update_code;<br />
f.realcode = null;<br />
return null;<br />
}<br />
<br />
function update_code ()<br />
{<br />
RCons = closure.RCons;<br />
RVal = closure.RVal;<br />
args = closure.args;<br />
return null;<br />
}<br />
<br />
////////////////////////////////////////////////////////////////////<br />
// Examples: STG -> JS<br />
/* add = \a b -> case a of {a -> case b of {b -> primOp + a b}} */<br />
<br />
add = {<br />
arity: 2,<br />
code: function () {<br />
var a = args[0];<br />
var b = args[1];<br />
closure = a;<br />
args = null;<br />
save_continuation_and_run (apply);<br />
var a = RVal;<br />
closure = b;<br />
args = null;<br />
save_continuation_and_run (apply);<br />
var b = RVal;<br />
RVal = a + b;<br />
args = null;<br />
return null;<br />
}<br />
}<br />
<br />
<br />
/*<br />
compose = \f g x -><br />
let gx = g x<br />
in f gx<br />
*/<br />
compose = {<br />
arity: 3,<br />
code: function () {<br />
var f = args[0];<br />
var g = args[1];<br />
var x = args[2];<br />
var gx = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = g;<br />
args = new Array (1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
args = new Array (1);<br />
closure = f;<br />
args[0] = gx;<br />
return apply;<br />
}<br />
}<br />
<br />
ConsTag = 3;<br />
Cons = {<br />
arity : 2,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Nil<br />
RCons = ConsTag;<br />
<br />
// We must return to continuation, arguments are returned in args array<br />
return null;<br />
}<br />
}<br />
<br />
NilTag = 2;<br />
Nil = {<br />
arity : 0,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Cons<br />
RCons = NilTag;<br />
<br />
// We must return to continuation<br />
return null;<br />
}<br />
}<br />
<br />
/*<br />
map = \f xs-><br />
case xs of {<br />
Cons x xs -><br />
let fx = f x<br />
in let mapfxs = map f xs<br />
in Cons fx mapfxs<br />
; Nil -> Nil<br />
}<br />
*/<br />
map = {<br />
arity: 2,<br />
code : function () {<br />
var f = args[0];<br />
var xs = args[1];<br />
//push continuation and enter xs<br />
closure = xs;<br />
args = null;<br />
save_continuation_and_run (xs.code)<br />
switch (RCons) {<br />
case ConsTag:<br />
{<br />
var x = args[0];<br />
var xs = args[1];<br />
var fx = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = f;<br />
args = new Array(1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
var mapfxs = {<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = map;<br />
args = new Array(2);<br />
args[0] = f;<br />
args[1] = xs;<br />
return apply;<br />
}<br />
}<br />
closure = Cons;<br />
args = new Array(2);<br />
args[0] = fx;<br />
args[1] = mapfxs;<br />
return apply;<br />
}<br />
break;<br />
case NilTag:<br />
closure = Nil;<br />
args = null;<br />
return Nil.code;<br />
break;<br />
}<br />
}<br />
}<br />
<br />
inc3 = {<br />
arity: 0,<br />
code: function () {<br />
args = new Array (1);<br />
args[0] = 3;<br />
closure = add;<br />
return apply;<br />
}<br />
}<br />
<br />
</pre><br />
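<br />
The mini interpreter used for tail calls in the code above can be looked at in isolation. In this sketch ''trampoline'', ''countUp'', ''counter'' and ''limit'' are illustrative names only; the loop has the same shape as ''save_continuation_and_run''.<br />
<br />
```javascript
// Each step returns the next function to run, or null when there is
// nothing left to do.
var trampoline = function (step) {
  while (step !== null) {
    step = step();
  }
};

var counter = 0;
var limit = 100000;

// A "tail call" to itself: the function does not call countUp,
// it returns it and lets the trampoline make the jump.
var countUp = function () {
  if (counter === limit) { return null; }
  counter++;
  return countUp;
};

trampoline(countUp);
```
<br />
Because every step returns to the loop before the next one starts, the Javascript call stack does not grow with the number of jumps.<br />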
<br />
<br />
----<br />
<br />
Victor Nazarov<br />
<br />
asviraspossible@gmail.com<br />
<br />
== Aug 29, 2007 ==<br />
<br />
The code from the previous section has been updated. Here are some tests run against this code:<br />
<br />
<pre><br />
args = null;<br />
closure = 1013;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
args = new Array(2);<br />
args[0] = 7;<br />
args[1] = 6;<br />
closure = add;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
args = new Array(1);<br />
closure = inc3;<br />
args[0] = new Number(21);<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RVal + "<br />");<br />
<br />
closure = "123";<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
<br />
closure = args[1];<br />
args = null;<br />
save_continuation_and_run (apply);<br />
<br />
document.write (RCons + "<br />");<br />
</pre></div>Virhttps://wiki.haskell.org/index.php?title=STG_in_Javascript&diff=15325STG in Javascript2007-08-29T12:25:11Z<p>Vir: </p>
<hr />
<div>[[Category:How to]]<br />
<br />
''Note (Aug 27, 2007)'': This page was started about a year ago. Over time, the focus was changed to integration with Yhc Core, and the work in progress may be observed here: [[Yhc/Javascript]].<br />
<br />
''Disclaimer'': Here are my working notes related to an experiment to execute Haskell programs in a web browser. You may find them bizarre, and even nonsensical. Don't hesitate to discuss them (please use the [[Talk:STG in Javascript]] page). Chances are, at some point a working implementation will be produced.<br />
<br />
The [http://www.squarefree.com/shell/shell.html Javascript Shell] is of great help for this experiment.<br />
<br />
----<br />
<br />
== Aug 22, 2006 ==<br />
<br />
Several people expressed interest in the matter, e. g.: [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017286.html], [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017287.html]. <br />
<br />
A wiki page, [[Hajax]], was recently created to summarize the achievements in related fields. With these experiments, I am trying to address the problem of generating Javascript from Haskell source.<br />
<br />
To achieve this, an existing Haskell compiler, namely [http://haskell.org/nhc98/ nhc98], is being patched to add a facility that generates Javascript from an STG tree; the original compiler generates bytecode from the same source.<br />
<br />
After unsuccessfully trying several approaches (e. g. Javascript closures, see [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Functions#Nested_functions_and_closures]), it has been decided to implement an STG machine (as described in [http://citeseer.ist.psu.edu/peytonjones92implementing.html]) in Javascript.<br />
<br />
The paper referenced above describes how to implement an STG machine in assembly language (or C). The Javascript implementation uses the same ideas, but takes advantage of the automatic memory management provided by the Javascript runtime, and of its built-in handling of values more complex than just numbers and arrays of bytes.<br />
<br />
To describe a thunk, a Javascript object of the following structure may be used:<br />
<br />
<pre><br />
thunk = {<br />
_c:function(){ ... }, // code to evaluate a thunk<br />
_1:..., // argument 1<br />
_2:...,<br />
_N:... // argument n<br />
};<br />
</pre><br />
<br />
So, similarly to what is described in the STG paper, the ''_c'' method is used to evaluate a thunk. This method may also perform a self-update of the thunk, replacing itself (i. e. ''this._c'') with something else once the result becomes known (i. e. at the very end of thunk evaluation), and then return that result.<br />
<br />
Some interesting things may be done by manipulating prototypes of Javascript built-in classes.<br />
<br />
Consider this (Javascript shell log pasted below):<br />
<br />
<pre><br />
<br />
Number.prototype.c=function(){return this};<br />
function(){return this}<br />
(1).c()<br />
1<br />
(2).c()<br />
2<br />
(-999).c()<br />
-999<br />
1<br />
1<br />
2<br />
2<br />
999<br />
999<br />
<br />
</pre><br />
<br />
Thus, simple numeric values are given thunk behavior: calling the ''c'' method on them returns their value as if a thunk were evaluated, and at the same time they may be used in the regular way when passed to Javascript functions outside the Haskell runtime (e. g. DOM manipulation functions).<br />
<br />
A similar trick can be done with Strings and Arrays: for these, the ''c'' method will return the head value (i. e. ''String.charAt(0)'') CONS'ed with the remainder of the String/Array.<br />
<br />
== Aug 23, 2006 ==<br />
<br />
The first thing to do is to learn how to call primitives. In Javascript, primitives mostly cover built-in arithmetic and the interface to the [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:Math Math] object. Primitives need all their arguments evaluated before they are called, and usually return strict values, so there is no need to build a thunk each time a primitive is called.<br />
<br />
At the moment, the following Haskell code:<br />
<br />
<pre><br />
f :: Int -> Int -> Int<br />
<br />
f a b = (a + b) * (a - b)<br />
<br />
g = f 1 2<br />
</pre><br />
<br />
compiles into (part of the Javascript below was inserted manually):<br />
<br />
<pre><br />
var HMain = {m:"HMain"};<br />
<br />
Number.prototype._c=function(){return this;};<br />
<br />
// Compiled code starts<br />
<br />
HMain.f_T=function(v164,v165){return {_c:HMain.f_C,<br />
_w:"9:1-9:24",<br />
_1:v164,<br />
_2:v165};};<br />
HMain.f_C=function(){<br />
return ((((this._1)._c())+((this._2)._c()))._c())*<br />
((((this._1)._c())-((this._2)._c()))._c());<br />
};<br />
<br />
HMain.g_T=function(){return {_c:HMain.g_C,_w:"11:1-11:9"};};<br />
HMain.g_C=function(){<br />
return HMain.f_T(1,2); // NB should be HMain.f_T(1,2)._c()<br />
};<br />
<br />
// Compiled code ends<br />
<br />
print(HMain.f_T(3,4)._c());<br />
<br />
print(HMain.g_T()._c()._c());<br />
</pre><br />
<br />
<br />
When running, the script produces:<br />
<br />
<pre><br />
Running...<br />
-7<br />
-3<br />
</pre><br />
<br />
So, for each Haskell function, two Javascript functions are created: one creates a thunk when called with arguments (so it is good for saturated calls), the other is the thunk's evaluation function. The latter will be passed around when dealing with partial applications (which will likely involve a special sort of thunk, but we have not gotten to this yet).<br />
<br />
Note that the ''_c()'' method is applied twice to the output of ''HMain.g_T'': ''HMain.g_C'' calls ''f_T'', which returns an unevaluated thunk, and nothing forces that thunk inside ''HMain.g_C'', so the evaluation has to be forced once more to get the final result.<br />
<br />
'''NB''': indeed, the thunk evaluation function for ''HMain.g'' should evaluate the thunk created by ''HMain.f_T''. Laziness will not be lost because ''HMain.g_C'' will not be executed until needed.<br />
<br />
== Sep 12, 2006 ==<br />
<br />
To simplify handling of partial function applications, the thunk format has been changed: instead of ''_1'', ''_2'', etc. for function arguments, an array named ''_a'' is used. This array always has at least one element, ''_a[0]'', which is ''undefined''. Arguments start at index 1, so argument ''n'' is accessed as ''this._a[n]''.<br />
<br />
For Haskell programs executing in a web browser, the analogue of the FFI is calling external Javascript functions.<br />
Consider this Javascript function, which prints its argument on the window status line:<br />
<br />
<pre><br />
// Output an integer value into the window status line<br />
<br />
putStatus = function (i) {window.status = i; return i;};<br />
</pre><br />
<br />
To import such a function into a Haskell program, the following FFI declaration is used:<br />
<br />
<pre><br />
foreign import ccall "putStatus" putStatus :: Int -> Int<br />
</pre><br />
<br />
Note the type signature: it should of course be monadic, but so far nothing has been done to support monads, so this signature is only good for testing purposes.<br />
<br />
The current NHC98-based implementation compiles the above FFI declaration into this:<br />
<br />
<pre><br />
Test2.putStatus_T=function(_1){return {_c:Test2.putStatus_C, _w:"7:1-7:56", <br />
_a:[undefined, _1]};};<br />
Test2.putStatus_C=function(){<br />
return (putStatus)((this._a[1])._c());<br />
};<br />
</pre><br />
<br />
Note that like a primitive, a foreign function evaluates all its arguments before it starts executing.<br />
<br />
A test page illustrating this can be found at:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.html<br />
<br />
When this page is loaded, the window status line should display "456" while the rest of the page remains blank. <br />
The Haskell source for this test page is:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.hs<br />
<br />
== Sep 19, 2006 ==<br />
<br />
Initially, functions compiled from Haskell to Javascript were represented as members of objects (one object per Haskell module). Anticipating some complications with a multilevel module hierarchy, and also with functions whose names contain special characters, it has been decided to pass every function identifier through the ''fixStr'' function: in ''nhc98'' it replaces non-alphanumeric characters with their numeric code prefixed with an underscore. So a typical function definition such as:<br />
<br />
<pre><br />
p3 :: Int -> Int -> Int -> Int<br />
p3 a b c = (a + b) * c;<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46p3_T=function(v210, v211, v212){return {_c:Test3_46p3_C, <br />
_w:"15:1-15:22", <br />
_a:[undefined, <br />
v210, v211, v212]};};<br />
var Test3_46p3_C=function(){<br />
return (((((this._a[1])._c())+((this._a[2])._c()))._c())*<br />
((this._a[3])._c()))._c();<br />
};<br />
</pre><br />
<br />
Note the function name: ''Test3_46p3_T''; in previous examples it would have been something like ''Test3.p3_T''.<br />
<br />
Partial function applications need a different thunk format. This kind of thunk holds the function to be applied once the application becomes saturated (i. e. the number of arguments becomes equal to the function's arity), the number of arguments still missing, and an array of the arguments supplied so far.<br />
<br />
Thus, for a function:<br />
<br />
<pre><br />
w = p3 1<br />
</pre><br />
<br />
resulting Javascript is:<br />
<br />
<pre><br />
var Test3_46w_T=function(){return {_c:Test3_46w_C, _w:"17:1-17:8", <br />
_a:[undefined]};};<br />
var Test3_46w_C=function(){<br />
return ({_c:function(){return this;}, _s:Test3_46p3_T, _x:2, _a:[1]})._c();<br />
};<br />
</pre><br />
<br />
Such a thunk always evaluates to itself (''_c()'' returns ''this''); it holds the target function in its ''_s'' member, the number of remaining arguments in its ''_x'' member, and the arguments available so far in its ''_a'' member. Note that in this case the array does not have ''undefined'' as its zeroth element.<br />
<br />
An application of such a function (''w'') to additional arguments:<br />
<br />
<pre><br />
z = w 2 3<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46z_T=function(){return {_c:Test3_46z_C, _w:"23:1-23:9", <br />
_a:[undefined]};};<br />
var Test3_46z_C=function(){<br />
return (HSRuntime_46doApply((Test3_46w_T())._c(), [2, 3]))._c();<br />
};<br />
</pre><br />
<br />
So, when such an expression is computed, a special Runtime support function is called. It receives the partial application thunk, obtained by evaluating ''(Test3_46w_T())._c()'', and adds the arguments provided (''[2, 3]'') to the list of arguments available so far. If the number of arguments becomes equal to the target function's arity, a normal function application thunk is returned; otherwise another partial application thunk is returned. The Runtime support function looks like this:<br />
<br />
<pre><br />
var HSRuntime_46doApply = function (thunk, targs){<br />
thunk._a = thunk._a.concat (targs);<br />
thunk._x = thunk._x - targs.length;<br />
if (thunk._x > 0) {<br />
return thunk;<br />
} else {<br />
return thunk._s.apply (null, thunk._a);<br />
}<br />
};<br />
</pre><br />
<br />
Note the use of the ''apply'' method. It may be used also with functions that are not methods of some object. The first argument (''this_arg'') may be ''null'' or ''undefined'' as it will not be used by the function applied to the arguments.<br />
<br />
''NHC98'' acts differently when a partial application is not defined as a separate function, but is part of another expression.<br />
<br />
First, some Haskell definitions:<br />
<br />
<pre><br />
z :: Int -> Int<br />
<br />
z = (3 +)<br />
<br />
p :: Int -> Int -> Int<br />
<br />
p = (+)<br />
</pre><br />
<br />
compile into:<br />
<br />
<pre><br />
var Test4_46z_T=function(){return {_c:Test4_46z_C, _w:"9:1-9:8", <br />
_a:[undefined]};};<br />
var Test4_46z_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA181_T, _x:1, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA181_T=function(v178){return {_c:LAMBDA181_C, _w:"9:8", <br />
_a:[undefined, v178]};};<br />
var LAMBDA181_C=function(){<br />
return (((3)._c())+((this._a[1])._c()))._c();<br />
};<br />
<br />
var Test4_46p_T=function(){return {_c:Test4_46p_C, _w:"13:1-13:6", <br />
_a:[undefined]};};<br />
var Test4_46p_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA182_T, _x:2, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA182_T=function(v179, v180){return {_c:LAMBDA182_C, <br />
_w:"13:6", <br />
_a:[undefined, v179, v180]};};<br />
var LAMBDA182_C=function(){<br />
return (((this._a[1])._c())+((this._a[2])._c()))._c();<br />
};<br />
</pre><br />
<br />
Now, when these functions (''p'', ''z'') are used:<br />
<br />
<pre><br />
t4main = putStatus (z (p 6 8)) -- see above for putStatus<br />
</pre><br />
<br />
the generated Javascript is:<br />
<br />
<pre><br />
var Test4_46t4main_T=function(){return {_c:Test4_46t4main_C, <br />
_w:"17:1-17:28", <br />
_a:[undefined]};};<br />
var Test4_46t4main_C=function(){<br />
return (Test4_46putStatus_T(<br />
NHC_46Internal_46_95apply1_T(<br />
Test4_46z_T(), <br />
NHC_46Internal_46_95apply2_T(<br />
Test4_46p_T(), 6, 8)<br />
)))._c();<br />
};<br />
</pre><br />
<br />
For each application of ''p'' and ''z'', an internal function ''NHC_46Internal_46_95apply'''''N'''''_T'' is called where '''N''' depends on the target function arity. In the Javascript implementation, all these functions are in fact one function (in Javascript it is possible to determine the number of arguments a function was called with, so there is no need for separate functions for each arity). The internal function extracts its first argument and evaluates it (by calling the ''_c()'' method), getting a partial application thunk. Then, the Runtime support function ''HSRuntime_46doApply'' is called with the thunk and arguments array:<br />
<br />
<pre><br />
var NHC_46Internal_46_95apply1_T = function() {return __apply__(arguments);};<br />
var NHC_46Internal_46_95apply2_T = function() {return __apply__(arguments);};<br />
...<br />
var __apply__ = function (args) {<br />
var i, targs = new Array();<br />
var thunk = args[0]._c();<br />
for (i = 1; i < args.length; i++) {<br />
targs [i - 1] = args [i];<br />
}<br />
return HSRuntime_46doApply (thunk, targs);<br />
};<br />
</pre><br />
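<br />
The copying loop in ''__apply__'' can also be written with ''Array.prototype.slice'', since ''arguments'' is array-like. A small sketch, assuming a hypothetical helper name ''applyN'' and using plain strings in place of thunks:<br />

```javascript
// Hypothetical helper: one function serves every arity.
function applyN() {
  // arguments is array-like; slice.call copies everything after the first element
  var targs = Array.prototype.slice.call(arguments, 1);
  return {thunk: arguments[0], targs: targs};
}

var one = applyN("z", 14);      // called the way apply1 would be
var two = applyN("p", 6, 8);    // called the way apply2 would be
```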
<br />
== Aug 25, 2007 ==<br />
Here's my attempt. I'm going to implement a Haskell to JavaScript compiler based on the STG machine. This turned out to be not so easy a task, so I'd be happy to get some feedback.<br />
<br />
This is an example translation of some Haskell functions to JavaScript. I'm trying to be descriptive, but if I'm not, please ask me or write your suggestions. I'm not quite sure if this code is really correct.<br />
<br />
<pre><br />
// Example of Haskell to JavaScript translation<br />
//<br />
// PAP - Partial Application<br />
// every object (heap object in STG) is called closure here<br />
// closure and function are used interchangeably here<br />
//<br />
<br />
<br />
////////////////////////////////////////////////////////////////<br />
// Run-time system:<br />
<br />
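var FunctionType = 1;<br />
var ThunkType = 2;<br />
var ConstructorType = 3;<br />
// the tags below are used later in this listing but were never defined; values are assumed<br />
var ValueType = 4;<br />
var PAPType = 5;<br />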
<br />
<br />
var closure; // current entered closure<br />
var args; // arguments<br />
var RCons; // Constructor tag, constructors set this tag to some value<br />
var RVal; // Some returned value<br />
<br />
Number.prototype.type = ValueType;<br />
Number.prototype.arity = 0;<br />
Number.prototype.code = function () {<br />
RVal = this;<br />
return null;<br />
}<br />
<br />
// mini interpreter is used to implement tail calls<br />
// to jump to some function, we don't call it, but<br />
// return its address instead<br />
function save_continuation_and_run (function_to_run)<br />
{<br />
while (function_to_run != null)<br />
function_to_run = function_to_run ();<br />
}<br />
<br />
// calling convention<br />
// function is pointed by a [closure] global variable<br />
// arguments are in [args] array<br />
function apply ()<br />
{<br />
var f = closure;<br />
var nargs = 0<br />
if (args != null)<br />
nargs = args.length;<br />
<br />
if (f.arity == nargs)<br />
return f.code;<br />
<br />
if (nargs == 0) {<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
// We CAN'T call a function, so we must build a PAP and call continuation!!!<br />
if (f.arity > nargs) {<br />
var supplied_args = args;<br />
args = null;<br />
var pap = {<br />
type : PAPType,<br />
arity : f.arity - nargs,<br />
code : function () {<br />
// closure variable is pointing to this pap right now;<br />
// args holds the newly supplied arguments<br />
// concat builds a fresh array, so the pap can be entered more than once<br />
args = supplied_args.concat (args);<br />
closure = f;<br />
return f.code;<br />
}<br />
};<br />
<br />
closure = pap;<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
<br />
// closure.arity < nargs<br />
var remaining_args = args.slice(closure.arity, nargs);<br />
args.length = closure.arity;<br />
<br />
save_continuation_and_run (closure.code)<br />
<br />
// closure now points to some new function, we'll try to call it<br />
args = remaining_args;<br />
return apply;<br />
}<br />
<br />
// Updates are called and used essentially as apply function<br />
// updatable thunks pushes continuation and runs as usual<br />
// when continuation activates it replaces the closure with the value<br />
// after that it returns to the next continuation<br />
function update()<br />
{<br />
var f = closure;<br />
<br />
save_continuation_and_run (f.realcode);<br />
<br />
f.RCons = RCons;<br />
f.RVal = RVal;<br />
f.args = args;<br />
f.code = update_code;<br />
f.realcode = null;<br />
return null;<br />
}<br />
<br />
function update_code ()<br />
{<br />
RCons = closure.RCons;<br />
RVal = closure.RVal;<br />
args = closure.args;<br />
return null;<br />
}<br />
<br />
////////////////////////////////////////////////////////////////////<br />
// Examples: STG -> JS<br />
/* add = \a b -> case a of {a -> case b of {b -> primOp + a b}} */<br />
<br />
add = {<br />
type: FunctionType,<br />
arity: 2,<br />
code: function () {<br />
var a = args[0];<br />
var b = args[1];<br />
closure = a;<br />
save_continuation_and_run (a.code);<br />
var _a = RVal;<br />
closure = b;<br />
save_continuation_and_run (b.code);<br />
var _b = RVal;<br />
RVal = _a + _b; // add the evaluated values, not the closures<br />
args = null;<br />
return null;<br />
}<br />
}<br />
<br />
<br />
/*<br />
compose = \f g x -><br />
let gx = g x<br />
in f gx<br />
*/<br />
compose = {<br />
type: FunctionType,<br />
arity: 3, // compose takes f, g and x<br />
code: function () {<br />
var f = args[0];<br />
var g = args[1];<br />
var x = args[2];<br />
var gx = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = g;<br />
args = new Array (1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
args = new Array (1);<br />
closure = f;<br />
args[0] = gx;<br />
return apply;<br />
}<br />
}<br />
<br />
cons = {<br />
type : ConstructorType,<br />
arity : 2,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Nil<br />
RCons = ConsTag;<br />
<br />
// We must return to continuation, arguments are returned in args array<br />
return null;<br />
}<br />
}<br />
<br />
nil = {<br />
type : ConstructorType,<br />
arity : 0,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Cons<br />
RCons = NilTag;<br />
<br />
// We must return to continuation<br />
return null;<br />
}<br />
}<br />
<br />
/*<br />
map = \f xs-><br />
case xs of {<br />
Cons x xs -><br />
let fx = f x<br />
in let mapfxs = map f xs<br />
in Cons fx mapfxs<br />
; Nil -> nil<br />
}<br />
*/<br />
map = {<br />
type : FunctionType,<br />
arity: 2,<br />
code : function () {<br />
var f = args[0];<br />
var xs = args[1];<br />
//push continuation and enter xs<br />
closure = xs;<br />
args = null;<br />
save_continuation_and_run (xs.code)<br />
switch (RCons) {<br />
case ConsTag:<br />
{<br />
var x = args[0];<br />
var xs = args[1];<br />
var fx = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = f;<br />
args = new Array(1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
var mapfxs = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : update,<br />
realcode : function () {<br />
closure = map;<br />
args = new Array(2);<br />
args[0] = f;<br />
args[1] = xs;<br />
return apply;<br />
}<br />
}<br />
closure = cons;<br />
args = new Array(2);<br />
args[0] = fx;<br />
args[1] = mapfxs;<br />
return apply;<br />
}<br />
break;<br />
case NilTag:<br />
return nil.code;<br />
break;<br />
}<br />
}<br />
}<br />
<br />
</pre><br />
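<br />
The mini interpreter used in the listing above is a trampoline: a step never calls the next step, it returns it, and a driver loop keeps invoking whatever comes back until it gets ''null''. A self-contained sketch of the idea follows; the names ''run'' and ''loop'' and the counter are illustrative assumptions only:<br />

```javascript
// Minimal trampoline: a step returns the next step, or null to stop.
function run(step) {
  while (step !== null) {
    step = step();
  }
}

var total = 0;
var n = 100000;

// Sums n + (n - 1) + ... + 1 without nesting JavaScript calls.
function loop() {
  if (n === 0) return null;
  total += n;
  n -= 1;
  return loop;   // "jump" to loop instead of calling it
}

run(loop);
```

Written as direct recursion, the same loop could exceed the JavaScript call stack limit for a large enough ''n''; with the trampoline the stack depth stays constant.<br />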
<br />
<br />
----<br />
<br />
Victor Nazarov<br />
<br />
asviraspossible@gmail.com</div>Virhttps://wiki.haskell.org/index.php?title=STG_in_Javascript&diff=15290STG in Javascript2007-08-28T11:07:05Z<p>Vir: </p>
<hr />
<div>[[Category:How to]]<br />
<br />
''Note (Aug 27, 2007)'': This page was started about a year ago. Over time, the focus was changed to integration with Yhc Core, and the work in progress may be observed here: [[Yhc/Javascript]].<br />
<br />
''Disclaimer'': Here are my working notes related to an experiment to execute Haskell programs in a web browser. You may find them bizarre, and even nonsensical. Don't hesitate to discuss them (please use the [[Talk:STG in Javascript]] page). Chances are, at some point a working implementation will be produced.<br />
<br />
The [http://www.squarefree.com/shell/shell.html Javascript Shell] is of great help for this experiment.<br />
<br />
----<br />
<br />
== Aug 22, 2006 ==<br />
<br />
Several people expressed interest in the matter, e. g.: [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017286.html], [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017287.html]. <br />
<br />
A Wiki page [[Hajax]] has been recently created, which summarizes the achievements in the related fields. By these experiments, I am trying to address the problem of Javascript generation out of a Haskell source.<br />
<br />
To achieve this, an existing Haskell compiler, namely [http://haskell.org/nhc98/ nhc98], is being patched to add a Javascript generation facility out of a STG tree: the original compiler generates bytecodes from the same source.<br />
<br />
After unsuccessfully trying several approaches (e. g. Javascript closures, see [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Functions#Nested_functions_and_closures]), it has been decided to implement an STG machine (as described in [http://citeseer.ist.psu.edu/peytonjones92implementing.html]) in Javascript.<br />
<br />
The paper referenced above describes how to implement an STG machine in assembly language (or C). The Javascript implementation uses the same ideas, but takes advantage of automatic memory management provided by the Javascript runtime, and also of built-in handling of values more complex than just numbers and arrays of bytes.<br />
<br />
To describe a thunk, a Javascript object of the following structure may be used:<br />
<br />
<pre><br />
thunk = {<br />
_c:function(){ ... }, // code to evaluate a thunk<br />
_1:..., // argument 1<br />
_2:...,<br />
_N:... // argument n<br />
};<br />
</pre><br />
<br />
So, similarly to what is described in the STG paper, the ''_c'' method is used to evaluate a thunk. This method may also perform a self-update of the thunk, replacing itself (i. e. ''this._c'') with something else once the result becomes known (i. e. at the very end of thunk evaluation).<br />
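<br />
A minimal sketch of such a self-updating thunk, assuming a hypothetical helper ''makeThunk'' and a counter that exists only to show the computation runs once:<br />

```javascript
var evaluations = 0;

// Hypothetical helper: wraps a computation in an object with a _c method.
function makeThunk(compute) {
  return {
    _c: function () {
      var value = compute();
      evaluations += 1;
      // self-update: later calls return the cached value directly
      this._c = function () { return value; };
      return value;
    }
  };
}

var t = makeThunk(function () { return 6 * 7; });
var first = t._c();    // runs the computation
var second = t._c();   // served by the replaced _c
```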
<br />
Some interesting things may be done by manipulating prototypes of Javascript built-in classes.<br />
<br />
Consider this (Javascript shell log pasted below):<br />
<br />
<pre><br />
<br />
Number.prototype.c=function(){return this};<br />
function(){return this}<br />
(1).c()<br />
1<br />
(2).c()<br />
2<br />
(-999).c()<br />
-999<br />
1<br />
1<br />
2<br />
2<br />
999<br />
999<br />
<br />
</pre><br />
<br />
Thus, simple numeric values are given thunk behavior: calling the ''c'' method on them returns their value as if a thunk were evaluated, and at the same time they may be used in a regular way when passed to Javascript functions outside the Haskell runtime (e. g. DOM manipulation functions).<br />
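<br />
One detail worth keeping in mind: in non-strict Javascript code, ''this'' inside a method called on a number is a ''Number'' wrapper object, not the primitive. Arithmetic still works on it, but a sketch that returns ''this.valueOf()'' hands back the primitive in either mode (the method name ''_c'' follows the later sections of these notes):<br />

```javascript
// Give every number thunk-like behavior; valueOf() unwraps the Number object.
Number.prototype._c = function () { return this.valueOf(); };

var sum = (3)._c() + (4)._c();   // evaluating "number thunks" and adding them
var kind = typeof (5)._c();      // primitive, thanks to valueOf()
```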
<br />
A similar trick can be done on Strings and Arrays: for these, the ''c'' method will return a head value (i. e. ''String.charAt(0)'') CONS'ed with the remainder of a String/Array.<br />
<br />
== Aug 23, 2006 ==<br />
<br />
The first thing to do is to learn how to call primitives. In Javascript,<br />
primitives mostly cover built-in arithmetic and the interface to the [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:Math Math] object. Primitives need all their arguments evaluated before they are called, and usually return strict values. So there is no need to build a thunk each time a primitive is called.<br />
<br />
At the moment, the following Haskell code:<br />
<br />
<pre><br />
f :: Int -> Int -> Int<br />
<br />
f a b = (a + b) * (a - b)<br />
<br />
g = f 1 2<br />
</pre><br />
<br />
compiles into (part of the Javascript below was inserted manually):<br />
<br />
<pre><br />
var HMain = {m:"HMain"};<br />
<br />
Number.prototype._c=function(){return this;};<br />
<br />
// Compiled code starts<br />
<br />
HMain.f_T=function(v164,v165){return {_c:HMain.f_C,<br />
_w:"9:1-9:24",<br />
_1:v164,<br />
_2:v165};};<br />
HMain.f_C=function(){<br />
return ((((this._1)._c())+((this._2)._c()))._c())*<br />
((((this._1)._c())-((this._2)._c()))._c());<br />
};<br />
<br />
HMain.g_T=function(){return {_c:HMain.g_C,_w:"11:1-11:9"};};<br />
HMain.g_C=function(){<br />
return HMain.f_T(1,2); // NB should be HMain.f_T(1,2)._c()<br />
};<br />
<br />
// Compiled code ends<br />
<br />
print(HMain.f_T(3,4)._c());<br />
<br />
print(HMain.g_T()._c()._c());<br />
</pre><br />
<br />
<br />
When running, the script produces:<br />
<br />
<pre><br />
Running...<br />
-7<br />
-3<br />
</pre><br />
<br />
So, for each Haskell function, two Javascript functions are created: one creates a thunk when called with arguments (so it is good for saturated calls), another is the thunk's evaluation function. The latter will be passed around when dealing with partial applications (which will likely involve special sort of thunks, but we haven't got down to this as of yet).<br />
<br />
Note that the ''_c()'' method is applied twice to the output from ''HMain.g_T'': the function calls ''f_T'' which returns an unevaluated thunk, but this result is not used, so we need to force the evaluation to get the final result.<br />
<br />
'''NB''': indeed, the thunk evaluation function for ''HMain.g'' should evaluate the thunk created by ''HMain.f_T''. Laziness will not be lost because ''HMain.g_C'' will not be executed until needed.<br />
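<br />
With that correction applied, a single ''_c()'' call on the result of ''g_T'' is enough. The sketch below repeats the example without the ''HMain'' object and the ''_w'' members, which is a simplification made here, not the compiler's output:<br />

```javascript
Number.prototype._c = function () { return this.valueOf(); };

// f a b = (a + b) * (a - b)
var f_T = function (a, b) { return {_c: f_C, _1: a, _2: b}; };
var f_C = function () {
  return ((this._1)._c() + (this._2)._c()) * ((this._1)._c() - (this._2)._c());
};

// g = f 1 2, with the evaluation function forcing the thunk built by f_T
var g_T = function () { return {_c: g_C}; };
var g_C = function () { return f_T(1, 2)._c(); };

var fResult = f_T(3, 4)._c();   // (3 + 4) * (3 - 4)
var gResult = g_T()._c();       // (1 + 2) * (1 - 2)
```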
<br />
== Sep 12, 2006 ==<br />
<br />
To simplify handling of partial function applications, the thunk format has been changed so that instead of ''_1'', ''_2'', etc. for function arguments, an array named ''_a'' is used. This array always has at least one element, which is ''undefined''. Arguments start with the array element indexed at 1, so to access argument ''n'', the following needs to be used: ''this._a[n]''.<br />
<br />
For Haskell programs executing in a web browser environment, the analogue of FFI is calling external Javascript functions.<br />
Imagine this Javascript function which prints its argument on the window status line:<br />
<br />
<pre><br />
// Output an integer value into the window status line<br />
<br />
putStatus = function (i) {window.status = i; return i;};<br />
</pre><br />
<br />
To import such a function into a Haskell program, the following FFI declaration is to be used:<br />
<br />
<pre><br />
foreign import ccall "putStatus" putStatus :: Int -> Int<br />
</pre><br />
<br />
Note the type signature: of course it should be somewhat monadic, but for the moment, nothing has been done to support monads, so this signature is only good for testing purposes.<br />
<br />
The current NHC98-based implementation compiles the above FFI declaration into this:<br />
<br />
<pre><br />
Test2.putStatus_T=function(_1){return {_c:Test2.putStatus_C, _w:"7:1-7:56", <br />
_a:[undefined, _1]};};<br />
Test2.putStatus_C=function(){<br />
return (putStatus)((this._a[1])._c());<br />
};<br />
</pre><br />
<br />
Note that like a primitive, a foreign function evaluates all its arguments before it starts executing.<br />
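<br />
The wrapper can be tried outside a browser by replacing ''window.status'' with an ordinary variable. In the sketch below, ''statusLine'' and the unqualified names ''putStatus_T'' and ''putStatus_C'' are assumptions made for the test, not the generated identifiers:<br />

```javascript
Number.prototype._c = function () { return this.valueOf(); };

// Stand-in for the browser function: records the value instead of showing it.
var statusLine = null;
var putStatus = function (i) { statusLine = i; return i; };

// Same shape as the compiled FFI wrapper
var putStatus_T = function (_1) { return {_c: putStatus_C, _a: [undefined, _1]}; };
var putStatus_C = function () { return putStatus((this._a[1])._c()); };

var thunk = putStatus_T(456);
var before = statusLine;     // still null: building the thunk runs nothing
var result = thunk._c();     // evaluates the argument, then calls putStatus
```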
<br />
A test page illustrating this can be found at:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.html<br />
<br />
When this page is loaded, the window status line should display "456" while the rest of the page remains blank. <br />
The Haskell source for this test page is:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.hs<br />
<br />
== Sep 19, 2006 ==<br />
<br />
Initially, functions compiled from Haskell to Javascript were represented as members of objects (one object per Haskell module). Anticipating some complications with multilevel module hierarchy, and also with functions whose names contain special characters, it has been decided to pass every function identifier through the ''fixStr'' function: in ''nhc98'' it replaces non-alphanumeric characters with their numeric code prefixed with an underscore. So a typical function definition such as:<br />
<br />
<pre><br />
p3 :: Int -> Int -> Int -> Int<br />
p3 a b c = (a + b) * c;<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46p3_T=function(v210, v211, v212){return {_c:Test3_46p3_C, <br />
_w:"15:1-15:22", <br />
_a:[undefined, <br />
v210, v211, v212]};};<br />
var Test3_46p3_C=function(){<br />
return (((((this._a[1])._c())+((this._a[2])._c()))._c())*<br />
((this._a[3])._c()))._c();<br />
};<br />
</pre><br />
<br />
Note the function name: ''Test3_46p3_T''; in previous examples it would have been something like ''Test3.p3_T''.<br />
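<br />
The renaming can be imitated in Javascript to check the names that appear in these notes. The function below is only a sketch of the rule as described (each non-alphanumeric character becomes an underscore followed by its character code); the real ''fixStr'' in ''nhc98'' is written in Haskell and may differ in details:<br />

```javascript
// Sketch of the mangling rule; not the nhc98 implementation.
function fixStr(name) {
  return name.replace(/[^A-Za-z0-9]/g, function (ch) {
    return "_" + ch.charCodeAt(0);
  });
}

var a = fixStr("Test3.p3");               // '.' has code 46
var b = fixStr("NHC.Internal._apply1");   // '_' has code 95
```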
<br />
Partial function applications need a different thunk format. This kind of thunk holds the function to be applied once the application becomes saturated (the number of arguments equals the function arity), the number of remaining arguments, and an array of the arguments supplied so far.<br />
<br />
Thus, for a function:<br />
<br />
<pre><br />
w = p3 1<br />
</pre><br />
<br />
resulting Javascript is:<br />
<br />
<pre><br />
var Test3_46w_T=function(){return {_c:Test3_46w_C, _w:"17:1-17:8", <br />
_a:[undefined]};};<br />
var Test3_46w_C=function(){<br />
return ({_c:function(){return this;}, _s:Test3_46p3_T, _x:2, _a:[1]})._c();<br />
};<br />
</pre><br />
<br />
Such a thunk always evaluates to itself (its ''_c()'' method returns ''this''); it holds the target function in its ''_s'' member, the number of remaining arguments in its ''_x'' member, and the arguments available so far in its ''_a'' member. Unlike in ordinary thunks, this array does not have ''undefined'' as its zeroth element.<br />
<br />
An application of such a function (''w'') to additional arguments:<br />
<br />
<pre><br />
z = w 2 3<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46z_T=function(){return {_c:Test3_46z_C, _w:"23:1-23:9", <br />
_a:[undefined]};};<br />
var Test3_46z_C=function(){<br />
return (HSRuntime_46doApply((Test3_46w_T())._c(), [2, 3]))._c();<br />
};<br />
</pre><br />
<br />
So, when such an expression is being computed, a special Runtime support function is called. It receives the partial application thunk obtained by evaluating its first argument (''(Test3_46w_T())._c()'') and adds the arguments provided (''[2, 3]'') to the list of arguments available so far. If the number of arguments becomes equal to the target function arity, a normal function application thunk is returned; otherwise another partial application thunk is returned. The Runtime support function looks like this:<br />
<br />
<pre><br />
var HSRuntime_46doApply = function (thunk, targs){<br />
thunk._a = thunk._a.concat (targs);<br />
thunk._x = thunk._x - targs.length;<br />
if (thunk._x > 0) {<br />
return thunk;<br />
} else {<br />
return thunk._s.apply (null, thunk._a);<br />
}<br />
};<br />
</pre><br />
<br />
Note the use of the ''apply'' method. It may be used also with functions that are not methods of some object. The first argument (''this_arg'') may be ''null'' or ''undefined'' as it will not be used by the function applied to the arguments.<br />
<br />
''NHC98'' acts differently when a partial application is not defined as a separate function, but is part of another expression.<br />
<br />
First, some Haskell definitions:<br />
<br />
<pre><br />
z :: Int -> Int<br />
<br />
z = (3 +)<br />
<br />
p :: Int -> Int -> Int<br />
<br />
p = (+)<br />
</pre><br />
<br />
compile into:<br />
<br />
<pre><br />
var Test4_46z_T=function(){return {_c:Test4_46z_C, _w:"9:1-9:8", <br />
_a:[undefined]};};<br />
var Test4_46z_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA181_T, _x:1, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA181_T=function(v178){return {_c:LAMBDA181_C, _w:"9:8", <br />
_a:[undefined, v178]};};<br />
var LAMBDA181_C=function(){<br />
return (((3)._c())+((this._a[1])._c()))._c();<br />
};<br />
<br />
var Test4_46p_T=function(){return {_c:Test4_46p_C, _w:"13:1-13:6", <br />
_a:[undefined]};};<br />
var Test4_46p_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA182_T, _x:2, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA182_T=function(v179, v180){return {_c:LAMBDA182_C, <br />
_w:"13:6", <br />
_a:[undefined, v179, v180]};};<br />
var LAMBDA182_C=function(){<br />
return (((this._a[1])._c())+((this._a[2])._c()))._c();<br />
};<br />
</pre><br />
<br />
Now, when these functions (''p'', ''z'') are used:<br />
<br />
<pre><br />
t4main = putStatus (z (p 6 8)) -- see above for putStatus<br />
</pre><br />
<br />
the generated Javascript is:<br />
<br />
<pre><br />
var Test4_46t4main_T=function(){return {_c:Test4_46t4main_C, <br />
_w:"17:1-17:28", <br />
_a:[undefined]};};<br />
var Test4_46t4main_C=function(){<br />
return (Test4_46putStatus_T(<br />
NHC_46Internal_46_95apply1_T(<br />
Test4_46z_T(), <br />
NHC_46Internal_46_95apply2_T(<br />
Test4_46p_T(), 6, 8)<br />
)))._c();<br />
};<br />
</pre><br />
<br />
For each application of ''p'' and ''z'', an internal function ''NHC_46Internal_46_95apply'''''N'''''_T'' is called where '''N''' depends on the target function arity. In the Javascript implementation, all these functions are in fact one function (in Javascript it is possible to determine the number of arguments a function was called with, so there is no need for separate functions for each arity). The internal function extracts its first argument and evaluates it (by calling the ''_c()'' method), getting a partial application thunk. Then, the Runtime support function ''HSRuntime_46doApply'' is called with the thunk and arguments array:<br />
<br />
<pre><br />
var NHC_46Internal_46_95apply1_T = function() {return __apply__(arguments);};<br />
var NHC_46Internal_46_95apply2_T = function() {return __apply__(arguments);};<br />
...<br />
var __apply__ = function (args) {<br />
var i, targs = new Array();<br />
var thunk = args[0]._c();<br />
for (i = 1; i < args.length; i++) {<br />
targs [i - 1] = args [i];<br />
}<br />
return HSRuntime_46doApply (thunk, targs);<br />
};<br />
</pre><br />
<br />
== Aug 25, 2007 ==<br />
Here's my attempt. I'm going to implement a Haskell to JavaScript compiler based on the STG machine. This turned out to be not so easy a task, so I'd be happy to get some feedback.<br />
<br />
This is an example translation of some Haskell functions to JavaScript. I'm trying to be descriptive, but if I'm not, please ask me or write your suggestions. I'm not quite sure if this code is really correct.<br />
<br />
<pre><br />
// Example of Haskell to JavaScript translation<br />
// updates are not considered yet<br />
//<br />
// PAP - Partial Application<br />
// every object (heap object in STG) is called closure here<br />
// closure and function are usually used interchangeably here<br />
//<br />
<br />
<br />
////////////////////////////////////////////////////////////////<br />
// Run-time system:<br />
<br />
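var FunctionType = 1;<br />
var ThunkType = 2;<br />
var ConstructorType = 3;<br />
// the tags below are used later in this listing but were never defined; values are assumed<br />
var ValueType = 4;<br />
var PAPType = 5;<br />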
<br />
<br />
var closure; // current entered closure<br />
var args; // arguments<br />
var RCons; // Constructor tag, constructors set this tag to some value<br />
var RVal; // Some returned value<br />
<br />
Number.prototype.type = ValueType;<br />
Number.prototype.arity = 0;<br />
Number.prototype.code = function () {<br />
RVal = this;<br />
return null;<br />
}<br />
<br />
// mini interpreter is used to implement tail calls<br />
// to jump to some function, we don't call it, but<br />
// return its address instead<br />
function save_continuation_and_run (function_to_run)<br />
{<br />
while (function_to_run != null)<br />
function_to_run = function_to_run ();<br />
}<br />
<br />
// calling convention<br />
// function is pointed by a [closure] global variable<br />
// arguments are in [args] array<br />
function apply ()<br />
{<br />
var f = closure;<br />
var nargs = 0<br />
if (args != null)<br />
nargs = args.length;<br />
<br />
if (f.arity == nargs)<br />
return f.code;<br />
<br />
if (nargs == 0) {<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
// We CAN'T call a function, so we must build a PAP and call continuation!!!<br />
if (f.arity > nargs) {<br />
var supplied_args = args;<br />
args = null;<br />
var pap = {<br />
type : PAPType,<br />
arity : f.arity - nargs,<br />
code : function () {<br />
var i;<br />
var new_args = args;<br />
args = new Array (f.arity);<br />
for (i = 0; i < supplied_args.length; i++)<br />
args[i] = supplied_args[i];<br />
<br />
// closure variable is pointing to this pap right now<br />
// supplied_args is kept (not set to null): its length is still needed here<br />
for (i = 0; i < closure.arity; i++)<br />
args[supplied_args.length + i] = new_args[i];<br />
closure = f;<br />
return f.code;<br />
}<br />
};<br />
<br />
closure = pap;<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
<br />
// closure.arity < nargs<br />
var remaining_args = new Array (nargs - closure.arity);<br />
var i;<br />
for (i = 0; i < remaining_args.length; i++) {<br />
remaining_args[i] = args[i + closure.arity];<br />
args[i + closure.arity] = null;<br />
}<br />
<br />
// assigning to length truncates a JavaScript array<br />
args.length = closure.arity;<br />
<br />
save_continuation_and_run (closure.code)<br />
<br />
// closure now points to some new function, we'll try to call it<br />
args = remaining_args;<br />
return apply;<br />
}<br />
<br />
// Updates are called and used essentially as apply function<br />
// updatable thunks pushes continuation and runs as usual<br />
// when continuation activates it replaces the closure with the value<br />
// after that it returns to the next continuation<br />
function update()<br />
{<br />
...<br />
}<br />
<br />
<br />
////////////////////////////////////////////////////////////////////<br />
// Examples: STG -> JS<br />
/* add = \a b -> case a of {a -> case b of {b -> primOp + a b}} */<br />
<br />
add = {<br />
type: FunctionType,<br />
arity: 2,<br />
code: function () {<br />
var a = args[0];<br />
var b = args[1];<br />
closure = a;<br />
save_continuation_and_run (a.code);<br />
switch (RVal) {<br />
default: {<br />
var a = RVal;<br />
closure = b;<br />
save_continuation_and_run (b.code);<br />
switch (RVal) {<br />
default: {<br />
var b = RVal;<br />
RVal = a + b;<br />
return null;<br />
}<br />
}<br />
}<br />
}<br />
}<br />
}<br />
/*<br />
really, code is equivalent to<br />
code: function () {<br />
var a = args[0];<br />
var b = args[1];<br />
closure = a;<br />
save_continuation_and_run (a.code);<br />
var a = RVal;<br />
closure = b;<br />
save_continuation_and_run (b.code);<br />
var b = RVal;<br />
RVal = a + b;<br />
return null;<br />
}<br />
*/<br />
<br />
<br />
/*<br />
compose = \f g x -><br />
let gx = g x<br />
in f gx<br />
*/<br />
compose = {<br />
type: FunctionType,<br />
arity: 3, // compose takes f, g and x<br />
code: function () {<br />
var f = args[0];<br />
var g = args[1];<br />
var x = args[2];<br />
var gx = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : function () {<br />
closure = g;<br />
args = new Array (1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
args = new Array (1);<br />
closure = f;<br />
args[0] = gx;<br />
return apply;<br />
}<br />
}<br />
<br />
cons = {<br />
type : ConstructorType,<br />
arity : 2,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Nil<br />
RCons = ConsTag;<br />
<br />
// We must return to continuation, arguments are returned in args array<br />
return null;<br />
}<br />
}<br />
<br />
nil = {<br />
type : ConstructorType,<br />
arity : 0,<br />
code : function () {<br />
// This is tag to distinguish this constructor from Cons<br />
RCons = NilTag;<br />
<br />
// We must return to continuation<br />
return null;<br />
}<br />
}<br />
<br />
/*<br />
map = \f xs-><br />
case xs of {<br />
Cons x xs -><br />
let fx = f x<br />
in let mapfxs = map f xs<br />
in Cons fx mapfxs<br />
; Nil -> nil<br />
}<br />
*/<br />
map = {<br />
type : FunctionType,<br />
arity: 2,<br />
code : function () {<br />
var f = args[0];<br />
var xs = args[1];<br />
//push continuation and enter xs<br />
closure = xs;<br />
args = null;<br />
save_continuation_and_run (xs.code)<br />
switch (RCons) {<br />
case ConsTag:<br />
{<br />
var x = args[0];<br />
var xs = args[1];<br />
var fx = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : function () {<br />
closure = f;<br />
args = new Array(1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
}<br />
var mapfxs = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : function () {<br />
closure = map;<br />
args = new Array(2);<br />
args[0] = f;<br />
args[1] = xs;<br />
return apply;<br />
}<br />
}<br />
closure = cons;<br />
args = new Array(2);<br />
args[0] = fx;<br />
args[1] = mapfxs;<br />
return apply;<br />
}<br />
break;<br />
case NilTag:<br />
return nil.code;<br />
break;<br />
}<br />
}<br />
}<br />
<br />
</pre><br />
<br />
<br />
----<br />
<br />
Victor Nazarov<br />
<br />
asviraspossible@gmail.com</div>Virhttps://wiki.haskell.org/index.php?title=STG_in_Javascript&diff=15289STG in Javascript2007-08-28T11:03:03Z<p>Vir: </p>
<hr />
<div>[[Category:How to]]<br />
<br />
''Note (Aug 27, 2007)'': This page was started about a year ago. Over time, the focus was changed to integration with Yhc Core, and the work in progress may be observed here: [[Yhc/Javascript]].<br />
<br />
''Disclaimer'': Here are my working notes related to an experiment to execute Haskell programs in a web browser. You may find them bizarre, and even nonsensical. Don't hesitate to discuss them (please use the [[Talk:STG in Javascript]] page). Chances are, at some point a working implementation will be produced.<br />
<br />
The [http://www.squarefree.com/shell/shell.html Javascript Shell] is of great help for this experiment.<br />
<br />
----<br />
<br />
== Aug 22, 2006 ==<br />
<br />
Several people expressed interest in the matter, e. g.: [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017286.html], [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017287.html]. <br />
<br />
A Wiki page [[Hajax]] has been recently created, which summarizes the achievements in the related fields. By these experiments, I am trying to address the problem of Javascript generation out of a Haskell source.<br />
<br />
To achieve this, an existing Haskell compiler, namely [http://haskell.org/nhc98/ nhc98], is being patched to add a Javascript generation facility working from an STG tree: the original compiler generates bytecodes from the same source.<br />
<br />
After unsuccessfully trying several approaches (e. g. Javascript closures, see [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Functions#Nested_functions_and_closures]), it has been decided to implement an STG machine (as described in [http://citeseer.ist.psu.edu/peytonjones92implementing.html]) in Javascript.<br />
<br />
The paper referenced above describes how to implement an STG machine in assembly language (or C). The Javascript implementation uses the same ideas, but takes advantage of the automatic memory management provided by the Javascript runtime, and also of its built-in handling of values more complex than just numbers and arrays of bytes.<br />
<br />
To describe a thunk, a Javascript object of the following structure may be used:<br />
<br />
<pre><br />
thunk = {<br />
_c:function(){ ... }, // code to evaluate a thunk<br />
_1:..., // argument 1<br />
_2:...,<br />
_N:... // argument n<br />
};<br />
</pre><br />
<br />
So, similarly to what is described in the STG paper, the ''c'' method is used to evaluate a thunk. This method may also perform a self-update of the thunk, replacing itself (i. e. ''this.c'') with something else and returning the result once it becomes known (i. e. at the very end of thunk evaluation).<br />
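<br />
As a rough illustration of the self-update idea, here is a minimal sketch. It uses the ''_c'' member name from the structure shown above; ''makeThunk'' and ''evalCount'' are made-up names for this example only, not part of any generated code.<br />
<br />
```javascript
// Sketch of a self-updating thunk (makeThunk and evalCount are illustrative).
var evalCount = 0;

function makeThunk(compute) {
  return {
    _c: function () {
      var value = compute();      // evaluate the suspended computation once
      this._c = function () {     // self-update: replace the evaluation code
        return value;             // so later calls return the cached result
      };
      return value;
    }
  };
}

var t = makeThunk(function () { evalCount++; return 6 * 7; });
var first = t._c();   // runs the computation
var second = t._c();  // served from the updated thunk
```
<br />
After both calls, ''evalCount'' is still 1, because the second call never reaches the original computation.<br />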
<br />
Some interesting things may be done by manipulating prototypes of Javascript built-in classes.<br />
<br />
Consider this (Javascript shell log pasted below):<br />
<br />
<pre><br />
<br />
Number.prototype.c=function(){return this};<br />
function(){return this}<br />
(1).c()<br />
1<br />
(2).c()<br />
2<br />
(-999).c()<br />
-999<br />
1<br />
1<br />
2<br />
2<br />
999<br />
999<br />
<br />
</pre><br />
<br />
Thus, simple numeric values are given thunk behavior: by calling the ''c'' method on them, their value is returned as if a thunk were evaluated, and at the same time they may be used in the regular way when passed to Javascript functions outside the Haskell runtime (e. g. DOM manipulation functions).<br />
<br />
A similar trick can be done with Strings and Arrays: for these, the ''c'' method will return a head value (i. e. ''String.charAt(0)'') CONS'ed with the remainder of the String/Array.<br />
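<br />
A possible shape of this trick for Strings is sketched below. The notes do not fix a representation for list cells, so the ''tag'', ''head'' and ''tail'' fields are assumptions made only for this example.<br />
<br />
```javascript
// Hypothetical list-cell representation: { tag, head, tail }.
String.prototype.c = function () {
  if (this.length === 0) {
    return { tag: "Nil" };                 // empty string behaves like []
  }
  return {
    tag: "Cons",
    head: this.charAt(0),                  // first character
    tail: this.substring(1)                // remainder, itself evaluable with .c()
  };
};

var cell = "abc".c();
var next = cell.tail.c();
```
<br />
Here ''cell.head'' is "a" and ''cell.tail'' is the plain string "bc", which can still be passed to ordinary Javascript functions.<br />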
<br />
== Aug 23, 2006 ==<br />
<br />
The first thing to do is to learn how to call primitives. In Javascript,
primitives mostly cover built-in arithmetic and the interface to the [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:Math Math] object. Primitives need all their arguments evaluated before they are called, and usually return strict values. So there is no need to build a thunk each time a primitive is called.<br />
<br />
At the moment, the following Haskell code:<br />
<br />
<pre><br />
f :: Int -> Int -> Int<br />
<br />
f a b = (a + b) * (a - b)<br />
<br />
g = f 1 2<br />
</pre><br />
<br />
compiles into (part of the Javascript below was inserted manually):<br />
<br />
<pre><br />
var HMain = {m:"HMain"};<br />
<br />
Number.prototype._c=function(){return this;};<br />
<br />
// Compiled code starts<br />
<br />
HMain.f_T=function(v164,v165){return {_c:HMain.f_C,<br />
_w:"9:1-9:24",<br />
_1:v164,<br />
_2:v165};};<br />
HMain.f_C=function(){<br />
return ((((this._1)._c())+((this._2)._c()))._c())*<br />
((((this._1)._c())-((this._2)._c()))._c());<br />
};<br />
<br />
HMain.g_T=function(){return {_c:HMain.g_C,_w:"11:1-11:9"};};<br />
HMain.g_C=function(){<br />
return HMain.f_T(1,2); // NB should be HMain.f_T(1,2)._c()<br />
};<br />
<br />
// Compiled code ends<br />
<br />
print(HMain.f_T(3,4)._c());<br />
<br />
print(HMain.g_T()._c()._c());<br />
</pre><br />
<br />
<br />
When running, the script produces:<br />
<br />
<pre><br />
Running...<br />
-7<br />
-3<br />
</pre><br />
<br />
So, for each Haskell function, two Javascript functions are created: one creates a thunk when called with arguments (so it is good for saturated calls), another is the thunk's evaluation function. The latter will be passed around when dealing with partial applications (which will likely involve special sort of thunks, but we haven't got down to this as of yet).<br />
<br />
Note that the ''_c()'' method is applied twice to the output from ''HMain.g_T'': its evaluation function calls ''f_T'', which returns an unevaluated thunk that nothing forces, so we need to force the evaluation ourselves to get the final result.<br />
<br />
'''NB''': indeed, the thunk evaluation function for ''HMain.g'' should evaluate the thunk created by ''HMain.f_T''. Laziness will not be lost because ''HMain.g_C'' will not be executed until needed.<br />
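<br />
The correction suggested in the NB can be sketched as follows. This is a hand-written simplification of the generated code above (the ''_w'' position member is dropped), not actual compiler output.<br />
<br />
```javascript
var HMain = {};
Number.prototype._c = function () { return this; };

HMain.f_T = function (a, b) { return { _c: HMain.f_C, _1: a, _2: b }; };
HMain.f_C = function () {
  return (this._1._c() + this._2._c()) * (this._1._c() - this._2._c());
};

HMain.g_T = function () { return { _c: HMain.g_C }; };
HMain.g_C = function () {
  return HMain.f_T(1, 2)._c();   // force the inner thunk here
};

var g = HMain.g_T();             // nothing has been evaluated yet
var result = g._c();             // a single _c() now yields -3
```
<br />
Creating ''g'' still does no work; only the call to ''_c()'' triggers the arithmetic.<br />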
<br />
== Sep 12, 2006 ==<br />
<br />
To simplify handling of partial function applications, the thunk format has been changed so that instead of ''_1'', ''_2'', etc. for function arguments, an array named ''_a'' is used. This array always has at least one element, which is ''undefined''. Arguments start at array index 1, so to access argument ''n'', the following needs to be used: ''this._a[n]''.<br />
<br />
For Haskell programs executing in a web browser environment, the analogue of FFI is calling external Javascript functions.<br />
Imagine this Javascript function which prints its argument on the window status line:<br />
<br />
<pre><br />
// Output an integer value into the window status line<br />
<br />
putStatus = function (i) {window.status = i; return i;};<br />
</pre><br />
<br />
To import such a function into a Haskell program, the following FFI declaration is to be used:<br />
<br />
<pre><br />
foreign import ccall "putStatus" putStatus :: Int -> Int<br />
</pre><br />
<br />
Note the type signature: of course it should be somewhat monadic, but for the moment, nothing has been done to support monads, so this signature is only good for testing purposes.<br />
<br />
The current NHC98-based implementation compiles the above FFI declaration into this:<br />
<br />
<pre><br />
Test2.putStatus_T=function(_1){return {_c:Test2.putStatus_C, _w:"7:1-7:56", <br />
_a:[undefined, _1]};};<br />
Test2.putStatus_C=function(){<br />
return (putStatus)((this._a[1])._c());<br />
};<br />
</pre><br />
<br />
Note that like a primitive, a foreign function evaluates all its arguments before it starts executing.<br />
<br />
A test page illustrating this can be found at:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.html<br />
<br />
When this page is loaded, the window status line should display "456" while the rest of the page remains blank. <br />
The Haskell source for this test page is:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.hs<br />
<br />
== Sep 19, 2006 ==<br />
<br />
Initially, functions compiled from Haskell to Javascript were represented as members of objects (one object per Haskell module). Anticipating some complications with multilevel module hierarchy, and also with functions whose names contain special characters, it has been decided to pass every function identifier through the ''fixStr'' function: in ''nhc98'' it replaces non-alphanumeric characters with their numeric code prefixed with an underscore. So a typical function definition looks like:<br />
<br />
<pre><br />
p3 :: Int -> Int -> Int -> Int<br />
p3 a b c = (a + b) * c;<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46p3_T=function(v210, v211, v212){return {_c:Test3_46p3_C, <br />
_w:"15:1-15:22", <br />
_a:[undefined, <br />
v210, v211, v212]};};<br />
var Test3_46p3_C=function(){<br />
return (((((this._a[1])._c())+((this._a[2])._c()))._c())*<br />
((this._a[3])._c()))._c();<br />
};<br />
</pre><br />
<br />
Note the function name: ''Test3_46p3_T''; in previous examples it would have been something like ''Test3.p3_T''.<br />
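<br />
The renaming can be approximated in a few lines of Javascript. This is only a sketch of the behaviour described above; ''mangle'' is a made-up name, and the real ''fixStr'' inside ''nhc98'' may differ in details.<br />
<br />
```javascript
// Replace every non-alphanumeric character with "_" plus its decimal code.
function mangle(name) {
  return name.replace(/[^A-Za-z0-9]/g, function (ch) {
    return "_" + ch.charCodeAt(0);
  });
}

var a = mangle("Test3.p3");               // "Test3_46p3"
var b = mangle("NHC.Internal._apply1");   // "NHC_46Internal_46_95apply1"
```
<br />
Note that the underscore itself is encoded too (as ''_95''), which matches names such as ''NHC_46Internal_46_95apply1'' seen in the generated code.<br />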
<br />
Partial function applications need a different thunk format. This kind of thunk holds the function to be applied to its arguments once the application becomes saturated (the number of arguments becomes equal to the function arity), the number of remaining arguments, and an array of the arguments supplied so far.<br />
<br />
Thus, for a function:<br />
<br />
<pre><br />
w = p3 1<br />
</pre><br />
<br />
resulting Javascript is:<br />
<br />
<pre><br />
var Test3_46w_T=function(){return {_c:Test3_46w_C, _w:"17:1-17:8", <br />
_a:[undefined]};};<br />
var Test3_46w_C=function(){<br />
return ({_c:function(){return this;}, _s:Test3_46p3_T, _x:2, _a:[1]})._c();<br />
};<br />
</pre><br />
<br />
Such a thunk always evaluates to itself (''_c()''); it holds the function name in its ''_s'' member, number of remaining arguments in its ''_x'' member, and available arguments in its ''_a'' member, only in this case the array does not have ''undefined'' as its zeroth element.<br />
<br />
An application of such a function (''w'') to additional arguments:<br />
<br />
<pre><br />
z = w 2 3<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46z_T=function(){return {_c:Test3_46z_C, _w:"23:1-23:9", <br />
_a:[undefined]};};<br />
var Test3_46z_C=function(){<br />
return (HSRuntime_46doApply((Test3_46w_T())._c(), [2, 3]))._c();<br />
};<br />
</pre><br />
<br />
So, when such an expression is being computed, a special Runtime support function is called. It receives the partial application thunk obtained by evaluating the first argument (''(Test3_46w_T())._c()'') and adds the arguments provided (''[2, 3]'') to the list of arguments available so far. If the number of arguments becomes equal to the target function arity, a normal function application thunk is returned; otherwise another partial application thunk is returned. The Runtime support function looks like this:<br />
<br />
<pre><br />
var HSRuntime_46doApply = function (thunk, targs){<br />
thunk._a = thunk._a.concat (targs);<br />
thunk._x = thunk._x - targs.length;<br />
if (thunk._x > 0) {<br />
return thunk;<br />
} else {<br />
return thunk._s.apply (null, thunk._a);<br />
}<br />
};<br />
</pre><br />
<br />
Note the use of the ''apply'' method. It may be used also with functions that are not methods of some object. The first argument (''this_arg'') may be ''null'' or ''undefined'' as it will not be used by the function applied to the arguments.<br />
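<br />
One thing to watch for: the support function above updates the partial application thunk in place, which would give wrong results if the same thunk object were ever applied twice. A possible non-mutating variant is sketched below; ''doApplyPure'' and the shortened ''p3_T''/''p3_C'' names are hypothetical, and over-saturated calls are not handled, just as in the original.<br />
<br />
```javascript
Number.prototype._c = function () { return this; };

// p3 a b c = (a + b) * c, in the thunk format used above
var p3_C = function () {
  return (this._a[1]._c() + this._a[2]._c()) * this._a[3]._c();
};
var p3_T = function (a, b, c) {
  return { _c: p3_C, _a: [undefined, a, b, c] };
};

// Copies the argument list instead of modifying the thunk it was given
var doApplyPure = function (pap, targs) {
  var all = pap._a.concat(targs);
  var remaining = pap._x - targs.length;
  if (remaining > 0) {
    return { _c: function () { return this; }, _s: pap._s, _x: remaining, _a: all };
  }
  return pap._s.apply(null, all);
};

var w = { _c: function () { return this; }, _s: p3_T, _x: 2, _a: [1] };
var r1 = doApplyPure(w, [2, 3])._c();   // (1 + 2) * 3
var r2 = doApplyPure(w, [4, 5])._c();   // (1 + 4) * 5, w itself is unchanged
```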
<br />
''NHC98'' acts differently when a partial application is not defined as a separate function, but is part of another expression.<br />
<br />
First, some Haskell definitions:<br />
<br />
<pre><br />
z :: Int -> Int<br />
<br />
z = (3 +)<br />
<br />
p :: Int -> Int -> Int<br />
<br />
p = (+)<br />
</pre><br />
<br />
compile into:<br />
<br />
<pre><br />
var Test4_46z_T=function(){return {_c:Test4_46z_C, _w:"9:1-9:8", <br />
_a:[undefined]};};<br />
var Test4_46z_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA181_T, _x:1, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA181_T=function(v178){return {_c:LAMBDA181_C, _w:"9:8", <br />
_a:[undefined, v178]};};<br />
var LAMBDA181_C=function(){<br />
return (((3)._c())+((this._a[1])._c()))._c();<br />
};<br />
<br />
var Test4_46p_T=function(){return {_c:Test4_46p_C, _w:"13:1-13:6", <br />
_a:[undefined]};};<br />
var Test4_46p_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA182_T, _x:2, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA182_T=function(v179, v180){return {_c:LAMBDA182_C, <br />
_w:"13:6", <br />
_a:[undefined, v179, v180]};};<br />
var LAMBDA182_C=function(){<br />
return (((this._a[1])._c())+((this._a[2])._c()))._c();<br />
};<br />
</pre><br />
<br />
Now, when these functions (''p'', ''z'') are used:<br />
<br />
<pre><br />
t4main = putStatus (z (p 6 8)) -- see above for putStatus<br />
</pre><br />
<br />
the generated Javascript is:<br />
<br />
<pre><br />
var Test4_46t4main_T=function(){return {_c:Test4_46t4main_C, <br />
_w:"17:1-17:28", <br />
_a:[undefined]};};<br />
var Test4_46t4main_C=function(){<br />
return (Test4_46putStatus_T(<br />
NHC_46Internal_46_95apply1_T(<br />
Test4_46z_T(), <br />
NHC_46Internal_46_95apply2_T(<br />
Test4_46p_T(), 6, 8)<br />
)))._c();<br />
};<br />
</pre><br />
<br />
For each application of ''p'' and ''z'', an internal function ''NHC_46Internal_46_95apply'''''N'''''_T'' is called where '''N''' depends on the target function arity. In the Javascript implementation, all these functions are in fact one function (in Javascript it is possible to determine the number of arguments a function was called with, so there is no need for separate functions for each arity). The internal function extracts its first argument and evaluates it (by calling the ''_c()'' method), getting a partial application thunk. Then, the Runtime support function ''HSRuntime_46doApply'' is called with the thunk and arguments array:<br />
<br />
<pre><br />
var NHC_46Internal_46_95apply1_T = function() {return __apply__(arguments);};<br />
var NHC_46Internal_46_95apply2_T = function() {return __apply__(arguments);};<br />
...<br />
var __apply__ = function (args) {<br />
var i, targs = new Array();<br />
var thunk = args[0]._c();<br />
for (i = 1; i < args.length; i++) {<br />
targs [i - 1] = args [i];<br />
}<br />
return HSRuntime_46doApply (thunk, targs);<br />
};<br />
</pre><br />
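<br />
To see the pieces working together outside a browser, the following sketch evaluates ''putStatus (z (p 6 8))'' by hand. All names are shortened stand-ins for the generated ones, ''makePAP'' is a hypothetical helper, and ''putStatus'' writes to a plain variable instead of ''window.status''.<br />
<br />
```javascript
Number.prototype._c = function () { return this; };

// Hypothetical helper: builds an empty partial application thunk
var makePAP = function (fn, remaining) {
  return { _c: function () { return this; }, _s: fn, _x: remaining, _a: [] };
};

var doApply = function (thunk, targs) {
  thunk._a = thunk._a.concat(targs);
  thunk._x = thunk._x - targs.length;
  return thunk._x > 0 ? thunk : thunk._s.apply(null, thunk._a);
};

// One function serves every arity: the extra arguments are read from `arguments`
var applyN = function () {
  var thunk = arguments[0]._c();
  var targs = Array.prototype.slice.call(arguments, 1);
  return doApply(thunk, targs);
};

// z = (3 +)
var lam1_C = function () { return 3 + this._a[1]._c(); };
var lam1_T = function (x) { return { _c: lam1_C, _a: [undefined, x] }; };
var z_T = function () { return { _c: function () { return makePAP(lam1_T, 1); } }; };

// p = (+)
var lam2_C = function () { return this._a[1]._c() + this._a[2]._c(); };
var lam2_T = function (x, y) { return { _c: lam2_C, _a: [undefined, x, y] }; };
var p_T = function () { return { _c: function () { return makePAP(lam2_T, 2); } }; };

// Stand-in for the foreign function: records the value instead of showing it
var statusLine = null;
var putStatus = function (i) { statusLine = i; return i; };

var result = putStatus(applyN(z_T(), applyN(p_T(), 6, 8))._c());   // 3 + (6 + 8)
```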
<br />
== Aug 25, 2007 ==<br />
Here's my attempt. I'm going to implement a Haskell to JavaScript compiler based on the STG machine. This turned out to be not such an easy task, so I'd be happy to get some feedback.<br />
<br />
This is an example translation of some Haskell functions to JavaScript. I'm trying to be descriptive, but if I'm not, please ask me or write your suggestions. I'm not quite sure if this code is really correct.<br />
<br />
<pre><br />
// Example of Haskell to JavaScript translation<br />
// updates are not considered yet<br />
//<br />
// PAP - Partial Application<br />
// every object (heap object in STG) is called a closure here<br />
// closure and function are usually interchangeable here<br />
//<br />
<br />
<br />
////////////////////////////////////////////////////////////////<br />
// Run-time system:<br />
<br />
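var FunctionType = 1;<br />
var ThunkType = 2;<br />
var ConstructorType = 3;<br />
// ValueType and PAPType are used below; the values chosen here are assumptions<br />
var ValueType = 4;<br />
var PAPType = 5;<br />
<br />
// Constructor tags used by cons, nil and map; the values are assumptions as well<br />
var ConsTag = 1;<br />
var NilTag = 2;<br />
<br />
<br />
var closure; // current entered closure<br />
var args; // arguments<br />
var RCons; // Constructor tag, constructors set this tag to some value<br />
var RVal; // Some returned value<br />
<br />
Number.prototype.type = ValueType;<br />
Number.prototype.arity = 0;<br />
Number.prototype.code = function () {<br />
    RVal = this.valueOf(); // unbox, so that RVal holds a primitive number<br />
    return null;<br />
};<br />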
<br />
// a mini interpreter is used to implement tail calls:<br />
// to jump to some function, we don't call it, but<br />
// return its address instead<br />
function save_continuation_and_run (function_to_run)<br />
{<br />
    while (function_to_run != null)<br />
        function_to_run = function_to_run ();<br />
}<br />
<br />
// calling convention<br />
// the function is pointed to by the [closure] global variable<br />
// arguments are in the [args] array<br />
function apply ()<br />
{<br />
    var f = closure;<br />
    var i;<br />
    var nargs = 0;<br />
    if (args != null)<br />
        nargs = args.length;<br />
<br />
    if (f.arity == nargs)<br />
        return f.code;<br />
<br />
    if (nargs == 0) {<br />
        // we don't know what to do, so run a continuation<br />
        return null;<br />
    }<br />
    // We CAN'T call the function yet, so we must build a PAP and call the continuation<br />
    if (f.arity > nargs) {<br />
        var supplied_args = args;<br />
        args = null;<br />
        var pap = {<br />
            type : PAPType,<br />
            arity : f.arity - nargs,<br />
            code : function () {<br />
                var new_args = args;<br />
                var j;<br />
                args = new Array (f.arity);<br />
                for (j = 0; j < supplied_args.length; j++)<br />
                    args[j] = supplied_args[j];<br />
<br />
                // supplied_args is kept: its length is needed below,<br />
                // and the pap may be entered more than once<br />
                // the closure variable is pointing to this pap right now<br />
                for (j = 0; j < closure.arity; j++)<br />
                    args[supplied_args.length + j] = new_args[j];<br />
                new_args = null;<br />
                closure = f;<br />
                return f.code;<br />
            }<br />
        };<br />
<br />
        closure = pap;<br />
        // we don't know what to do, so run a continuation<br />
        return null;<br />
    }<br />
<br />
    // closure.arity < nargs<br />
    var remaining_args = new Array (nargs - closure.arity);<br />
    for (i = 0; i < remaining_args.length; i++) {<br />
        remaining_args[i] = args[i + closure.arity];<br />
        args[i + closure.arity] = null;<br />
    }<br />
<br />
    // assigning to length truncates a JavaScript array<br />
    args.length = closure.arity;<br />
<br />
    save_continuation_and_run (closure.code);<br />
<br />
    // closure now points to some new function, we'll try to call it<br />
    args = remaining_args;<br />
    return apply;<br />
}<br />
<br />
// Updates are called and used essentially like the apply function:<br />
// an updatable thunk pushes a continuation and runs as usual;<br />
// when the continuation is activated, it replaces the closure with the value<br />
// and then returns to the next continuation<br />
function update()<br />
{<br />
...<br />
}<br />
<br />
<br />
////////////////////////////////////////////////////////////////////<br />
// Examples: STG -> JS<br />
/* add = \a b -> case a of {a -> case b of {b -> primOp + a b}} */<br />
<br />
add = {<br />
    type: FunctionType,<br />
    arity: 2,<br />
    code: function () {<br />
        var a = args[0];<br />
        var b = args[1];<br />
        var continuation1 = function () {<br />
            closure = a;<br />
            save_continuation_and_run (a.code);<br />
            switch (RVal) {<br />
                default: {<br />
                    // named a_val so that the hoisted var does not shadow the outer a<br />
                    var a_val = RVal;<br />
                    var continuation2 = function () {<br />
                        closure = b;<br />
                        save_continuation_and_run (b.code);<br />
                        switch (RVal) {<br />
                            default: {<br />
                                var b_val = RVal;<br />
                                RVal = a_val + b_val;<br />
                                return null;<br />
                            }<br />
                        }<br />
                    };<br />
                    return continuation2;<br />
                }<br />
            }<br />
        };<br />
        return continuation1;<br />
    }<br />
    /*<br />
    really, code is equivalent to<br />
    code: function () {<br />
        var a = args[0];<br />
        var b = args[1];<br />
        closure = a;<br />
        save_continuation_and_run (a.code);<br />
        a = RVal;<br />
        closure = b;<br />
        save_continuation_and_run (b.code);<br />
        b = RVal;<br />
        RVal = a + b;<br />
        return null;<br />
    }<br />
    */<br />
};<br />
<br />
<br />
/*<br />
compose = \f g x -><br />
let gx = g x<br />
in f gx<br />
*/<br />
compose = {<br />
    type: FunctionType,<br />
    arity: 3, // f, g and x<br />
    code: function () {<br />
        var f = args[0];<br />
        var g = args[1];<br />
        var x = args[2];<br />
        var gx = {<br />
            type : ThunkType,<br />
            arity : 0,<br />
            code : function () {<br />
                closure = g;<br />
                args = new Array (1);<br />
                args[0] = x;<br />
                return apply;<br />
            }<br />
        };<br />
        args = new Array (1);<br />
        closure = f;<br />
        args[0] = gx;<br />
        return apply;<br />
    }<br />
};<br />
<br />
cons = {<br />
    type : ConstructorType,<br />
    arity : 2,<br />
    code : function () {<br />
        // This is the tag to distinguish this constructor from Nil<br />
        RCons = ConsTag;<br />
<br />
        // We must return to the continuation, arguments are returned in the args array<br />
        return null;<br />
    }<br />
};<br />
<br />
nil = {<br />
    type : ConstructorType,<br />
    arity : 0,<br />
    code : function () {<br />
        // This is the tag to distinguish this constructor from Cons<br />
        RCons = NilTag;<br />
<br />
        // We must return to the continuation<br />
        return null;<br />
    }<br />
};<br />
<br />
/*<br />
map = \f xs-><br />
case xs of {<br />
Cons x xs -><br />
let fx = f x<br />
in let mapfxs = map f xs<br />
in Cons fx mapfxs<br />
; Nil -> nil<br />
}<br />
*/<br />
map = {<br />
    type : FunctionType,<br />
    arity: 2,<br />
    code : function () {<br />
        var f = args[0];<br />
        var xs = args[1];<br />
        // push a continuation and enter xs<br />
        closure = xs;<br />
        args = null;<br />
        save_continuation_and_run (xs.code);<br />
        switch (RCons) {<br />
            case ConsTag:<br />
            {<br />
                var x = args[0];<br />
                xs = args[1]; // from here on xs is the tail of the list<br />
                var fx = {<br />
                    type : ThunkType,<br />
                    arity : 0,<br />
                    code : function () {<br />
                        closure = f;<br />
                        args = new Array(1);<br />
                        args[0] = x;<br />
                        return apply;<br />
                    }<br />
                };<br />
                var mapfxs = {<br />
                    type : ThunkType,<br />
                    arity : 0,<br />
                    code : function () {<br />
                        closure = map;<br />
                        args = new Array(2);<br />
                        args[0] = f;<br />
                        args[1] = xs;<br />
                        return apply;<br />
                    }<br />
                };<br />
                closure = cons;<br />
                args = new Array(2);<br />
                args[0] = fx;<br />
                args[1] = mapfxs;<br />
                return apply;<br />
            }<br />
            case NilTag:<br />
                return nil.code;<br />
        }<br />
    }<br />
};<br />
<br />
</pre><br />
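<br />
The mini interpreter idea can be tried in isolation. The sketch below repeats the loop from ''save_continuation_and_run'' under the shorter name ''run''; ''countDown'' and ''counter'' are made-up names for the demonstration.<br />
<br />
```javascript
// Same loop as save_continuation_and_run: keep calling whatever is returned
function run(step) {
  while (step != null) {
    step = step();
  }
}

var counter = 0;

// A "tail call" is expressed by returning the next function instead of calling it
function countDown(n) {
  return function () {
    if (n === 0) {
      return null;            // nothing left to run
    }
    counter++;
    return countDown(n - 1);  // jump, not call: the stack does not grow
  };
}

run(countDown(100000));
```
<br />
A directly recursive version of the same loop may exhaust the Javascript call stack at this depth, while the returned-function version keeps the stack depth constant.<br />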
<br />
<br />
----<br />
<br />
Victor Nazarov<br />
<br />
asviraspossible@gmail.com</div>Virhttps://wiki.haskell.org/index.php?title=STG_in_Javascript&diff=15270STG in Javascript2007-08-26T02:18:46Z<p>Vir: </p>
<hr />
<div>[[Category:How to]]<br />
<br />
''Disclaimer'': Here are my working notes related to an experiment to execute Haskell programs in a web browser. You may find them bizarre, and even nonsensical. Don't hesitate to discuss them (please use the [[Talk:STG in Javascript]] page). Chances are, at some point a working implementation will be produced.<br />
<br />
The [http://www.squarefree.com/shell/shell.html Javascript Shell] is of great help for this experiment.<br />
<br />
----<br />
<br />
== Aug 22, 2006 ==<br />
<br />
Several people expressed interest in the matter, e. g.: [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017286.html], [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017287.html]. <br />
<br />
A Wiki page [[Hajax]] has been recently created, which summarizes the achievements in the related fields. By these experiments, I am trying to address the problem of Javascript generation out of a Haskell source.<br />
<br />
To achieve this, an existing Haskell compiler, namely [http://haskell.org/nhc98/ nhc98], is being patched to add a Javascript generation facility working from an STG tree: the original compiler generates bytecodes from the same source.<br />
<br />
After unsuccessfully trying several approaches (e. g. Javascript closures, see [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Functions#Nested_functions_and_closures]), it has been decided to implement an STG machine (as described in [http://citeseer.ist.psu.edu/peytonjones92implementing.html]) in Javascript.<br />
<br />
The paper referenced above describes how to implement an STG machine in assembly language (or C). The Javascript implementation uses the same ideas, but takes advantage of the automatic memory management provided by the Javascript runtime, and also of its built-in handling of values more complex than just numbers and arrays of bytes.<br />
<br />
To describe a thunk, a Javascript object of the following structure may be used:<br />
<br />
<pre><br />
thunk = {<br />
_c:function(){ ... }, // code to evaluate a thunk<br />
_1:..., // argument 1<br />
_2:...,<br />
_N:... // argument n<br />
};<br />
</pre><br />
<br />
So, similarly to what is described in the STG paper, the ''c'' method is used to evaluate a thunk. This method may also perform a self-update of the thunk, replacing itself (i. e. ''this.c'') with something else and returning the result once it becomes known (i. e. at the very end of thunk evaluation).<br />
<br />
Some interesting things may be done by manipulating prototypes of Javascript built-in classes.<br />
<br />
Consider this (Javascript shell log pasted below):<br />
<br />
<pre><br />
<br />
Number.prototype.c=function(){return this};<br />
function(){return this}<br />
(1).c()<br />
1<br />
(2).c()<br />
2<br />
(-999).c()<br />
-999<br />
1<br />
1<br />
2<br />
2<br />
999<br />
999<br />
<br />
</pre><br />
<br />
Thus, simple numeric values are given thunk behavior: by calling the ''c'' method on them, their value is returned as if a thunk were evaluated, and at the same time they may be used in the regular way when passed to Javascript functions outside the Haskell runtime (e. g. DOM manipulation functions).<br />
<br />
A similar trick can be done with Strings and Arrays: for these, the ''c'' method will return a head value (i. e. ''String.charAt(0)'') CONS'ed with the remainder of the String/Array.<br />
<br />
== Aug 23, 2006 ==<br />
<br />
The first thing to do is to learn how to call primitives. In Javascript,
primitives mostly cover built-in arithmetic and the interface to the [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:Math Math] object. Primitives need all their arguments evaluated before they are called, and usually return strict values. So there is no need to build a thunk each time a primitive is called.<br />
<br />
At the moment, the following Haskell code:<br />
<br />
<pre><br />
f :: Int -> Int -> Int<br />
<br />
f a b = (a + b) * (a - b)<br />
<br />
g = f 1 2<br />
</pre><br />
<br />
compiles into (part of the Javascript below was inserted manually):<br />
<br />
<pre><br />
var HMain = {m:"HMain"};<br />
<br />
Number.prototype._c=function(){return this;};<br />
<br />
// Compiled code starts<br />
<br />
HMain.f_T=function(v164,v165){return {_c:HMain.f_C,<br />
_w:"9:1-9:24",<br />
_1:v164,<br />
_2:v165};};<br />
HMain.f_C=function(){<br />
return ((((this._1)._c())+((this._2)._c()))._c())*<br />
((((this._1)._c())-((this._2)._c()))._c());<br />
};<br />
<br />
HMain.g_T=function(){return {_c:HMain.g_C,_w:"11:1-11:9"};};<br />
HMain.g_C=function(){<br />
return HMain.f_T(1,2); // NB should be HMain.f_T(1,2)._c()<br />
};<br />
<br />
// Compiled code ends<br />
<br />
print(HMain.f_T(3,4)._c());<br />
<br />
print(HMain.g_T()._c()._c());<br />
</pre><br />
<br />
<br />
When running, the script produces:<br />
<br />
<pre><br />
Running...<br />
-7<br />
-3<br />
</pre><br />
<br />
So, for each Haskell function, two Javascript functions are created: one creates a thunk when called with arguments (so it is good for saturated calls), another is the thunk's evaluation function. The latter will be passed around when dealing with partial applications (which will likely involve special sort of thunks, but we haven't got down to this as of yet).<br />
<br />
Note that the ''_c()'' method is applied twice to the output from ''HMain.g_T'': its evaluation function calls ''f_T'', which returns an unevaluated thunk that nothing forces, so we need to force the evaluation ourselves to get the final result.<br />
<br />
'''NB''': indeed, the thunk evaluation function for ''HMain.g'' should evaluate the thunk created by ''HMain.f_T''. Laziness will not be lost because ''HMain.g_C'' will not be executed until needed.<br />
<br />
== Sep 12, 2006 ==<br />
<br />
To simplify handling of partial function applications, the thunk format has been changed so that instead of ''_1'', ''_2'', etc. for function arguments, an array named ''_a'' is used. This array always has at least one element, which is ''undefined''. Arguments start at array index 1, so to access argument ''n'', the following needs to be used: ''this._a[n]''.<br />
<br />
For Haskell programs executing in a web browser environment, the analogue of FFI is calling external Javascript functions.<br />
Imagine this Javascript function which prints its argument on the window status line:<br />
<br />
<pre><br />
// Output an integer value into the window status line<br />
<br />
putStatus = function (i) {window.status = i; return i;};<br />
</pre><br />
<br />
To import such a function into a Haskell program, the following FFI declaration is to be used:<br />
<br />
<pre><br />
foreign import ccall "putStatus" putStatus :: Int -> Int<br />
</pre><br />
<br />
Note the type signature: of course it should be somewhat monadic, but for the moment, nothing has been done to support monads, so this signature is only good for testing purposes.<br />
<br />
The current NHC98-based implementation compiles the above FFI declaration into this:<br />
<br />
<pre><br />
Test2.putStatus_T=function(_1){return {_c:Test2.putStatus_C, _w:"7:1-7:56", <br />
_a:[undefined, _1]};};<br />
Test2.putStatus_C=function(){<br />
return (putStatus)((this._a[1])._c());<br />
};<br />
</pre><br />
<br />
Note that like a primitive, a foreign function evaluates all its arguments before it starts executing.<br />
<br />
A test page illustrating this can be found at:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.html<br />
<br />
When this page is loaded, the window status line should display "456" while the rest of the page remains blank. <br />
The Haskell source for this test page is:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.hs<br />
<br />
== Sep 19, 2006 ==<br />
<br />
Initially, functions compiled from Haskell to Javascript were represented as members of objects (one object per Haskell module). Anticipating some complications with multilevel module hierarchy, and also with functions whose names contain special characters, it has been decided to pass every function identifier through the ''fixStr'' function: in ''nhc98'' it replaces non-alphanumeric characters with their numeric code prefixed with an underscore. So a typical function definition looks like:<br />
<br />
<pre><br />
p3 :: Int -> Int -> Int -> Int<br />
p3 a b c = (a + b) * c;<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46p3_T=function(v210, v211, v212){return {_c:Test3_46p3_C, <br />
_w:"15:1-15:22", <br />
_a:[undefined, <br />
v210, v211, v212]};};<br />
var Test3_46p3_C=function(){<br />
return (((((this._a[1])._c())+((this._a[2])._c()))._c())*<br />
((this._a[3])._c()))._c();<br />
};<br />
</pre><br />
<br />
Note the function name: ''Test3_46p3_T''; in previous examples it would have been something like ''Test3.p3_T''.<br />
<br />
Partial function applications need a different thunk format. This kind of thunk holds the function to be applied to its arguments once the application becomes saturated (the number of arguments becomes equal to the function arity), the number of remaining arguments, and an array of the arguments supplied so far.<br />
<br />
Thus, for a function:<br />
<br />
<pre><br />
w = p3 1<br />
</pre><br />
<br />
resulting Javascript is:<br />
<br />
<pre><br />
var Test3_46w_T=function(){return {_c:Test3_46w_C, _w:"17:1-17:8", <br />
_a:[undefined]};};<br />
var Test3_46w_C=function(){<br />
return ({_c:function(){return this;}, _s:Test3_46p3_T, _x:2, _a:[1]})._c();<br />
};<br />
</pre><br />
<br />
Such a thunk always evaluates to itself (''_c()''); it holds the function name in its ''_s'' member, number of remaining arguments in its ''_x'' member, and available arguments in its ''_a'' member, only in this case the array does not have ''undefined'' as its zeroth element.<br />
<br />
An application of such a function (''w'') to additional arguments:<br />
<br />
<pre><br />
z = w 2 3<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46z_T=function(){return {_c:Test3_46z_C, _w:"23:1-23:9", <br />
_a:[undefined]};};<br />
var Test3_46z_C=function(){<br />
return (HSRuntime_46doApply((Test3_46w_T())._c(), [2, 3]))._c();<br />
};<br />
</pre><br />
<br />
So, when such an expression is being computed, a special Runtime support function is called. It receives the partial application thunk obtained by evaluating the first argument (''(Test3_46w_T())._c()'') and adds the arguments provided (''[2, 3]'') to the list of arguments available so far. If the number of arguments becomes equal to the target function arity, a normal function application thunk is returned; otherwise another partial application thunk is returned. The Runtime support function looks like this:<br />
<br />
<pre><br />
var HSRuntime_46doApply = function (thunk, targs){<br />
thunk._a = thunk._a.concat (targs);<br />
thunk._x = thunk._x - targs.length;<br />
if (thunk._x > 0) {<br />
return thunk;<br />
} else {<br />
return thunk._s.apply (null, thunk._a);<br />
}<br />
};<br />
</pre><br />
<br />
Note the use of the ''apply'' method. It may be used also with functions that are not methods of some object. The first argument (''this_arg'') may be ''null'' or ''undefined'' as it will not be used by the function applied to the arguments.<br />
<br />
''NHC98'' acts differently when a partial application is not defined as a separate function, but is part of another expression.<br />
<br />
First, some Haskell definitions:<br />
<br />
<pre><br />
z :: Int -> Int<br />
<br />
z = (3 +)<br />
<br />
p :: Int -> Int -> Int<br />
<br />
p = (+)<br />
</pre><br />
<br />
compile into:<br />
<br />
<pre><br />
var Test4_46z_T=function(){return {_c:Test4_46z_C, _w:"9:1-9:8", <br />
_a:[undefined]};};<br />
var Test4_46z_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA181_T, _x:1, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA181_T=function(v178){return {_c:LAMBDA181_C, _w:"9:8", <br />
_a:[undefined, v178]};};<br />
var LAMBDA181_C=function(){<br />
return (((3)._c())+((this._a[1])._c()))._c();<br />
};<br />
<br />
var Test4_46p_T=function(){return {_c:Test4_46p_C, _w:"13:1-13:6", <br />
_a:[undefined]};};<br />
var Test4_46p_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA182_T, _x:2, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA182_T=function(v179, v180){return {_c:LAMBDA182_C, <br />
_w:"13:6", <br />
_a:[undefined, v179, v180]};};<br />
var LAMBDA182_C=function(){<br />
return (((this._a[1])._c())+((this._a[2])._c()))._c();<br />
};<br />
</pre><br />
<br />
Now, when these functions (''p'', ''z'') are used:<br />
<br />
<pre><br />
t4main = putStatus (z (p 6 8)) -- see above for putStatus<br />
</pre><br />
<br />
the generated Javascript is:<br />
<br />
<pre><br />
var Test4_46t4main_T=function(){return {_c:Test4_46t4main_C, <br />
_w:"17:1-17:28", <br />
_a:[undefined]};};<br />
var Test4_46t4main_C=function(){<br />
return (Test4_46putStatus_T(<br />
NHC_46Internal_46_95apply1_T(<br />
Test4_46z_T(), <br />
NHC_46Internal_46_95apply2_T(<br />
Test4_46p_T(), 6, 8)<br />
)))._c();<br />
};<br />
</pre><br />
<br />
For each application of ''p'' and ''z'', an internal function ''NHC_46Internal_46_95apply'''''N'''''_T'' is called, where '''N''' depends on the target function arity. In the Javascript implementation, all these functions share one implementation: Javascript can determine the number of arguments a function was called with, so there is no need for a separate function per arity. The internal function takes its first argument and evaluates it (by calling the ''_c()'' method), getting a partial application thunk. Then the Runtime support function ''HSRuntime_46doApply'' is called with the thunk and the array of remaining arguments:<br />
<br />
<pre><br />
var NHC_46Internal_46_95apply1_T = function() {return __apply__(arguments);};<br />
var NHC_46Internal_46_95apply2_T = function() {return __apply__(arguments);};<br />
...<br />
var __apply__ = function (args) {<br />
var i, targs = new Array();<br />
var thunk = args[0]._c();<br />
for (i = 1; i < args.length; i++) {<br />
targs [i - 1] = args [i];<br />
}<br />
return HSRuntime_46doApply (thunk, targs);<br />
};<br />
</pre><br />
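<br />
To see the pieces working together, here is a small self-contained sketch that evaluates ''z (p 6 8)'' by hand. The helpers ''pap'', ''plus_T'', ''add3_T'' and ''applyN_T'' are hypothetical stand-ins for the generated code, and ''Array.prototype.slice'' is used instead of the explicit copying loop; the behavior is intended to be the same.<br />
<br />
```javascript
// Assumption: numbers evaluate to themselves, as elsewhere in these notes.
Number.prototype._c = function () { return this.valueOf(); };

// The runtime support function, as given earlier on this page.
var HSRuntime_46doApply = function (thunk, targs) {
  thunk._a = thunk._a.concat(targs);
  thunk._x = thunk._x - targs.length;
  if (thunk._x > 0) {
    return thunk;
  } else {
    return thunk._s.apply(null, thunk._a);
  }
};

// One helper serves every arity: "arguments" holds however many values were passed.
var applyN_T = function () {
  var thunk = arguments[0]._c();                         // force the function position
  var targs = Array.prototype.slice.call(arguments, 1);  // the rest are the arguments
  return HSRuntime_46doApply(thunk, targs);
};

// Builds a partial application thunk with no arguments supplied yet.
var pap = function (fn, arity) {
  return {_c: function () { return this; }, _s: fn, _x: arity, _a: []};
};

// Hand-written stand-ins for the compiled p = (+) and z = (3 +).
var plus_T = function (a, b) {
  return {_a: [undefined, a, b],
          _c: function () { return this._a[1]._c() + this._a[2]._c(); }};
};
var add3_T = function (a) {
  return {_a: [undefined, a],
          _c: function () { return 3 + this._a[1]._c(); }};
};
var p_T = function () { return pap(plus_T, 2); };
var z_T = function () { return pap(add3_T, 1); };

// z (p 6 8): the inner application stays an unevaluated thunk until z forces it.
var result = applyN_T(z_T(), applyN_T(p_T(), 6, 8))._c();
```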
<br />
== Aug 25, 2007 ==<br />
Here's my attempt. I'm going to implement a Haskell-to-Javascript compiler based on the STG machine. This has turned out to be not such an easy task, so I'd be happy to get some feedback.<br />
<br />
This is an example translation of some Haskell functions to JavaScript. I'm trying to be descriptive, but if I'm not, please ask me or write your suggestions. I'm not quite sure that this code is really correct.<br />
<br />
<pre><br />
// Example of Haskell to JavaScript translation<br />
// updates are not considered yet<br />
//<br />
// PAP - Partial Application<br />
// every object (heap object in STG) is called closure here<br />
// closure and function are usually interchangeable here<br />
//<br />
<br />
<br />
////////////////////////////////////////////////////////////////<br />
// Run-time system:<br />
<br />
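var FunctionType = 1;<br />
var ThunkType = 2;<br />
var ConstructorType = 3;<br />
var PAPType = 4; // partial application, built by apply() below (value chosen arbitrarily)<br />
var ValueType = 5; // plain value, used by Number.prototype below (value chosen arbitrarily)<br />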
<br />
<br />
var closure; // current entered closure<br />
var args; // arguments<br />
var RCons; // Constructor tag, constructors set this tag to some value<br />
var RVal; // Some returned value<br />
<br />
Number.prototype.type = ValueType;<br />
Number.prototype.arity = 0;<br />
Number.prototype.code = function () {<br />
RVal = this;<br />
return null;<br />
}<br />
<br />
// a mini interpreter is used to implement tail calls:<br />
// to jump to some function, we don't call it, but<br />
// return its address instead<br />
function save_continuation_and_run (function_to_run)<br />
{<br />
while (function_to_run != null)<br />
function_to_run = function_to_run ();<br />
}<br />
<br />
// calling convention<br />
// function is pointed by a [closure] global variable<br />
// arguments are in [args] array<br />
function apply ()<br />
{<br />
var f = closure;<br />
var nargs = args.length;<br />
<br />
if (f.arity == nargs)<br />
return f.code;<br />
<br />
if (nargs == 0) {<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
// We CAN'T call a function, so we must build a PAP and call continuation!!!<br />
if (f.arity > nargs) {<br />
var supplied_args = args;<br />
args = null;<br />
var pap = {<br />
type : PAPType,<br />
arity : f.arity - nargs,<br />
code : function () {<br />
var i;<br />
var new_args = args;<br />
args = new Array (f.arity);<br />
// supplied_args is kept alive: it is read again below,<br />
// and the PAP may be entered more than once<br />
for (i = 0; i < supplied_args.length; i++)<br />
args[i] = supplied_args[i];<br />
<br />
// closure variable is pointing to this pap right now<br />
for (i = 0; i < closure.arity; i++)<br />
args[supplied_args.length + i] = new_args[i];<br />
new_args = null;<br />
closure = f;<br />
return f.code;<br />
}<br />
};<br />
<br />
closure = pap;<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
<br />
// closure.arity < nargs<br />
var remaining_args = new Array (nargs - closure.arity);<br />
var i;<br />
for (i = 0; i < remaining_args.length; i++) {<br />
remaining_args[i] = args[i + closure.arity];<br />
args[i + closure.arity] = null;<br />
}<br />
<br />
// assigning to length truncates the array in Javascript<br />
args.length = closure.arity;<br />
<br />
save_continuation_and_run (closure.code)<br />
<br />
// closure now points to some new function, we'll try to call it<br />
args = remaining_args;<br />
return apply;<br />
}<br />
<br />
// update is called and used essentially like the apply function:<br />
// an updatable thunk pushes a continuation and runs as usual;<br />
// when the continuation is activated, it replaces the closure with the value<br />
// and then returns to the next continuation<br />
function update()<br />
{<br />
...<br />
}<br />
<br />
<br />
////////////////////////////////////////////////////////////////////<br />
// Examples: STG -> JS<br />
/* add = \a b -> case a of {a -> case b of {b -> primOp + a b}} */<br />
<br />
add = {<br />
type: FunctionType,<br />
arity: 2,<br />
code: function () {<br />
var a = args[0];<br />
var b = args[1];<br />
var continuation1 = function () {<br />
closure = a;<br />
save_continuation_and_run (a.code);<br />
switch (RVal) {<br />
default: {<br />
// distinct names: a nested "var a" would be hoisted and hide the closure a<br />
var a_val = RVal;<br />
var continuation2 = function () {<br />
closure = b;<br />
save_continuation_and_run (b.code);<br />
switch (RVal) {<br />
default: {<br />
var b_val = RVal;<br />
RVal = a_val + b_val;<br />
return null;<br />
}<br />
}<br />
}<br />
return continuation2;<br />
}<br />
}<br />
}<br />
return continuation1;<br />
}<br />
/*<br />
really, code is equivalent to<br />
code: function () {<br />
var a = args[0];<br />
var b = args[1];<br />
closure = a;<br />
save_continuation_and_run (a.code);<br />
var a = RVal;<br />
closure = b;<br />
save_continuation_and_run (b.code);<br />
var b = RVal;<br />
RVal = a + b;<br />
return null;<br />
}<br />
*/<br />
}<br />
<br />
<br />
/*<br />
compose = \f g x -><br />
let gx = g x<br />
in f gx<br />
*/<br />
compose = {<br />
type: FunctionType,<br />
arity: 3,<br />
code: function () {<br />
var f = args[0];<br />
var g = args[1];<br />
var x = args[2];<br />
var gx = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : function () {<br />
// enter g with the single argument x<br />
closure = g;<br />
args = new Array (1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
};<br />
// enter f with the single argument gx<br />
closure = f;<br />
args = new Array (1);<br />
args[0] = gx;<br />
return apply;<br />
}<br />
}<br />
<br />
// constructor tags (values chosen arbitrarily for this example)<br />
var ConsTag = 1;<br />
var NilTag = 2;<br />
<br />
cons = {<br />
type : ConstructorType,<br />
arity : 2,<br />
code : function () {<br />
// This tag distinguishes this constructor from Nil<br />
RCons = ConsTag;<br />
<br />
// We must return to the continuation; arguments are returned in the args array<br />
return null;<br />
}<br />
}<br />
<br />
nil = {<br />
type : ConstructorType,<br />
arity : 0,<br />
code : function () {<br />
// This tag distinguishes this constructor from Cons<br />
RCons = NilTag;<br />
<br />
// We must return to the continuation<br />
return null;<br />
}<br />
}<br />
<br />
/*<br />
map = \f xs-><br />
case xs of {<br />
Cons x xs -><br />
let fx = f x<br />
in let mapfxs = map f xs<br />
in Cons fx mapfxs<br />
; Nil -> nil<br />
}<br />
*/<br />
map = {<br />
type : FunctionType,<br />
arity: 2,<br />
code : function () {<br />
var f = args[0];<br />
var xs = args[1];<br />
var continuation = function () {<br />
//push continuation and enter xs<br />
closure = xs;<br />
save_continuation_and_run (xs.code);<br />
switch (RCons) {<br />
case ConsTag:<br />
{<br />
// distinct name: a nested "var xs" would be hoisted and hide the list being scrutinised<br />
var x = args[0];<br />
var rest = args[1];<br />
var fx = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : function () {<br />
closure = f;<br />
args = new Array(1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
};<br />
var mapfxs = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : function () {<br />
closure = map;<br />
args = new Array(2);<br />
args[0] = f;<br />
args[1] = rest;<br />
return apply;<br />
}<br />
};<br />
closure = cons;<br />
args = new Array(2);<br />
args[0] = fx;<br />
args[1] = mapfxs;<br />
return apply;<br />
}<br />
case NilTag:<br />
return nil.code;<br />
}<br />
};<br />
args = null;<br />
return continuation; // returning a continuation means jumping to it<br />
}<br />
}<br />
<br />
</pre><br />
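<br />
The mini interpreter used in the run-time system above is what is commonly called a trampoline: each step returns the next function to run instead of calling it, so the Javascript call stack does not grow with the number of jumps. A minimal standalone sketch of the idea follows; ''run'', ''countDown'' and ''counter'' are illustrative names only.<br />
<br />
```javascript
// Same loop as save_continuation_and_run: keep calling until a step returns null.
function run(function_to_run) {
  while (function_to_run != null) {
    function_to_run = function_to_run();
  }
}

var counter = 0;

// Each step does a little work and returns the next step instead of calling it.
function countDown(n) {
  return function () {
    if (n === 0) {
      return null;            // nothing left to jump to
    }
    counter++;
    return countDown(n - 1);  // a "jump", not a nested call
  };
}

// A directly recursive version with this many nested calls could
// exhaust the call stack; the loop above runs in constant stack depth.
run(countDown(100000));
```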
<br />
<br />
----<br />
<br />
Victor Nazarov<br />
<br />
asviraspossible@gmail.com</div>Virhttps://wiki.haskell.org/index.php?title=STG_in_Javascript&diff=15269STG in Javascript2007-08-26T01:28:47Z<p>Vir: </p>
<hr />
<div>[[Category:How to]]<br />
<br />
''Disclaimer'': Here are my working notes related to an experiment to execute Haskell programs in a web browser. You may find them bizarre, and even nonsensical. Don't hesitate to discuss them (please use the [[Talk:STG in Javascript]] page). Chances are, at some point a working implementation will be produced.<br />
<br />
The [http://www.squarefree.com/shell/shell.html Javascript Shell] is of great help for this experiment.<br />
<br />
----<br />
<br />
== Aug 22, 2006 ==<br />
<br />
Several people expressed interest in the matter, e. g.: [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017286.html], [http://www.haskell.org//pipermail/haskell-cafe/2006-August/017287.html]. <br />
<br />
A Wiki page [[Hajax]] has been recently created, which summarizes the achievements in the related fields. By these experiments, I am trying to address the problem of Javascript generation out of a Haskell source.<br />
<br />
To achieve this, an existing Haskell compiler, namely [http://haskell.org/nhc98/ nhc98], is being patched to add a Javascript generation facility out of a STG tree: the original compiler generates bytecodes from the same source.<br />
<br />
After several approaches were tried unsuccessfully (e. g. Javascript closures, see [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Functions#Nested_functions_and_closures]), it has been decided to implement an STG machine (as described in [http://citeseer.ist.psu.edu/peytonjones92implementing.html]) in Javascript.<br />
<br />
The paper referenced above describes how to implement an STG machine in assembly language (or C). The Javascript implementation uses the same ideas, but takes advantage of the automatic memory management provided by the Javascript runtime, and also of its built-in handling of values more complex than just numbers and arrays of bytes.<br />
<br />
To describe a thunk, a Javascript object of the following structure may be used:<br />
<br />
<pre><br />
thunk = {<br />
_c:function(){ ... }, // code to evaluate a thunk<br />
_1:..., // argument 1<br />
_2:...,<br />
_N:... // argument n<br />
};<br />
</pre><br />
<br />
So, similarly to what is described in the STG paper, the ''_c'' method is used to evaluate a thunk. This method may also perform a self-update of the thunk, replacing itself (i. e. ''this._c'') with something else once the result becomes known (i. e. at the very end of thunk evaluation), and returning that result.<br />
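<br />
As a rough illustration of such a self-update (a sketch only; ''makeThunk'' and ''evaluations'' are hypothetical names, and the real generated code may look different), the evaluation method can overwrite itself with a function that simply returns the cached value:<br />
<br />
```javascript
var evaluations = 0;  // counts how many times the expensive computation runs

// Builds a thunk whose _c method replaces itself after the first evaluation.
function makeThunk(compute) {
  return {
    _c: function () {
      var value = compute();
      // Self-update: later calls return the cached value directly.
      this._c = function () { return value; };
      return value;
    }
  };
}

var t = makeThunk(function () { evaluations++; return 6 * 7; });
var r1 = t._c();  // runs the computation
var r2 = t._c();  // served by the updated _c
```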
<br />
Some interesting things may be done by manipulating prototypes of Javascript built-in classes.<br />
<br />
Consider this (Javascript shell log pasted below):<br />
<br />
<pre><br />
<br />
Number.prototype.c=function(){return this};<br />
function(){return this}<br />
(1).c()<br />
1<br />
(2).c()<br />
2<br />
(-999).c()<br />
-999<br />
1<br />
1<br />
2<br />
2<br />
999<br />
999<br />
<br />
</pre><br />
<br />
Thus, simple numeric values are given thunk behavior: calling the ''c'' method on them returns their value as if a thunk were evaluated, and at the same time they may be used in the regular way when passed to Javascript functions outside the Haskell runtime (e. g. DOM manipulation functions).<br />
<br />
A similar trick can be done with Strings and Arrays: for these, the ''c'' method will return the head value (i. e. ''String.charAt(0)'') CONS'ed with the remainder of the String/Array.<br />
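<br />
A possible shape for the String case is sketched below. The representation of a CONS cell as an object with ''tag'', ''head'' and ''tail'' members is purely an assumption made for this example; the notes do not fix one.<br />
<br />
```javascript
// Hypothetical: give strings list-like thunk behavior.
// An empty string evaluates to Nil, anything else to Cons(head, tail).
String.prototype.c = function () {
  if (this.length === 0) {
    return {tag: "Nil"};
  }
  return {tag: "Cons", head: this.charAt(0), tail: this.substring(1)};
};

var cell = "abc".c();     // first character plus the remainder
var next = cell.tail.c(); // the tail is again a string, so it can be evaluated the same way
var end = "".c();
```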
<br />
== Aug 23, 2006 ==<br />
<br />
The first thing to do is to learn how to call primitives. In Javascript, primitives mostly cover built-in arithmetic and the interface to the [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:Math Math] object. Primitives need all their arguments evaluated before they are called, and usually return strict values. So there is no need to build a thunk each time a primitive is called.<br />
<br />
At the moment, the following Haskell code:<br />
<br />
<pre><br />
f :: Int -> Int -> Int<br />
<br />
f a b = (a + b) * (a - b)<br />
<br />
g = f 1 2<br />
</pre><br />
<br />
compiles into (part of the Javascript below was inserted manually):<br />
<br />
<pre><br />
var HMain = {m:"HMain"};<br />
<br />
Number.prototype._c=function(){return this;};<br />
<br />
// Compiled code starts<br />
<br />
HMain.f_T=function(v164,v165){return {_c:HMain.f_C,<br />
_w:"9:1-9:24",<br />
_1:v164,<br />
_2:v165};};<br />
HMain.f_C=function(){<br />
return ((((this._1)._c())+((this._2)._c()))._c())*<br />
((((this._1)._c())-((this._2)._c()))._c());<br />
};<br />
<br />
HMain.g_T=function(){return {_c:HMain.g_C,_w:"11:1-11:9"};};<br />
HMain.g_C=function(){<br />
return HMain.f_T(1,2); // NB should be HMain.f_T(1,2)._c()<br />
};<br />
<br />
// Compiler code ends<br />
<br />
print(HMain.f_T(3,4)._c());<br />
<br />
print(HMain.g_T()._c()._c());<br />
</pre><br />
<br />
<br />
When running, the script produces:<br />
<br />
<pre><br />
Running...<br />
-7<br />
-3<br />
</pre><br />
<br />
So, for each Haskell function, two Javascript functions are created: one creates a thunk when called with arguments (so it is good for saturated calls), the other is the thunk's evaluation function. The latter will be passed around when dealing with partial applications (which will likely involve a special sort of thunk, but we haven't got to this yet).<br />
<br />
Note that the ''_c()'' method is applied twice to the output from ''HMain.g_T'': the evaluation function ''HMain.g_C'' calls ''f_T'', which returns an unevaluated thunk, so we need to force that thunk once more to get the final result.<br />
<br />
'''NB''': indeed, the thunk evaluation function for ''HMain.g'' should evaluate the thunk created by ''HMain.f_T''. Laziness will not be lost because ''HMain.g_C'' will not be executed until needed.<br />
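<br />
With that correction applied, a single ''_c()'' call is enough. The following is a trimmed-down sketch of the example above (the ''_w'' members and the ''print'' calls are left out, and ''Number.prototype._c'' returns the primitive value so the result is a plain number):<br />
<br />
```javascript
var HMain = {m: "HMain"};

// Numbers evaluate to themselves.
Number.prototype._c = function () { return this.valueOf(); };

// f a b = (a + b) * (a - b)
HMain.f_T = function (v164, v165) {
  return {_c: HMain.f_C, _1: v164, _2: v165};
};
HMain.f_C = function () {
  return (this._1._c() + this._2._c()) * (this._1._c() - this._2._c());
};

// g = f 1 2, with the evaluation function forcing the thunk built by f_T
HMain.g_T = function () { return {_c: HMain.g_C}; };
HMain.g_C = function () {
  return HMain.f_T(1, 2)._c();
};

var g = HMain.g_T()._c();   // one _c() is now sufficient
var f34 = HMain.f_T(3, 4)._c();
```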
<br />
== Sep 12, 2006 ==<br />
<br />
To simplify handling of partial function applications, the thunk format has been changed: instead of the members ''_1'', ''_2'', etc. for function arguments, an array named ''_a'' is used. This array always has at least one element, which is ''undefined''. Arguments start at array index 1, so argument ''n'' is accessed as ''this._a[n]''.<br />
<br />
For Haskell programs executing in a web browser environment, the analog of the FFI is calling external Javascript functions.<br />
Imagine this Javascript function, which prints its argument on the window status line:<br />
<br />
<pre><br />
// Output an integer value into the window status line<br />
<br />
putStatus = function (i) {window.status = i; return i;};<br />
</pre><br />
<br />
To import such a function into a Haskell program, the following FFI declaration is to be used:<br />
<br />
<pre><br />
foreign import ccall "putStatus" putStatus :: Int -> Int<br />
</pre><br />
<br />
Note the type signature: of course it should be somewhat monadic, but for the moment, nothing has been done to support monads, so this signature is only good for testing purposes.<br />
<br />
The current NHC98-based implementation compiles the above FFI declaration into this:<br />
<br />
<pre><br />
Test2.putStatus_T=function(_1){return {_c:Test2.putStatus_C, _w:"7:1-7:56", <br />
_a:[undefined, _1]};};<br />
Test2.putStatus_C=function(){<br />
return (putStatus)((this._a[1])._c());<br />
};<br />
</pre><br />
<br />
Note that like a primitive, a foreign function evaluates all its arguments before it starts executing.<br />
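<br />
The strictness of the foreign call can be observed outside a browser with a small stand-in. In the sketch below, ''statusLine'' replaces ''window.status'' and ''lazyArg'' is a hand-made thunk; both are assumptions made so that the example is self-contained.<br />
<br />
```javascript
// Numbers evaluate to themselves.
Number.prototype._c = function () { return this.valueOf(); };

// Stand-in for window.status, so the sketch does not need a browser.
var statusLine = null;
var putStatus = function (i) { statusLine = i; return i; };

// Same shape as the generated wrapper: the thunk stores its argument unevaluated...
var putStatus_T = function (_1) {
  return {_c: putStatus_C, _a: [undefined, _1]};
};
// ...and the evaluation function forces it just before the foreign call.
var putStatus_C = function () {
  return putStatus(this._a[1]._c());
};

var forced = 0;
var lazyArg = {_c: function () { forced++; return 400 + 56; }};

var thunk = putStatus_T(lazyArg);  // building the thunk forces nothing
var forcedBefore = forced;
var shown = thunk._c();            // evaluation forces the argument, then calls putStatus
```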
<br />
A test page illustrating this can be found at:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.html<br />
<br />
When this page is loaded, the window status line should display "456" while the rest of the page remains blank. <br />
The Haskell source for this test page is:<br />
<br />
http://www.golubovsky.org/repos/nhcjs/test2.hs<br />
<br />
== Sep 19, 2006 ==<br />
<br />
Initially, functions compiled from Haskell to Javascript were represented as members of objects (one object per Haskell module). Anticipating some complications with multilevel module hierarchies, and also with functions whose names contain special characters, it has been decided to pass every function identifier through the ''fixStr'' function: in ''nhc98'' it replaces non-alphanumeric characters with their numeric code prefixed with an underscore. So a typical function definition:<br />
<br />
<pre><br />
p3 :: Int -> Int -> Int -> Int<br />
p3 a b c = (a + b) * c;<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46p3_T=function(v210, v211, v212){return {_c:Test3_46p3_C, <br />
_w:"15:1-15:22", <br />
_a:[undefined, <br />
v210, v211, v212]};};<br />
var Test3_46p3_C=function(){<br />
return (((((this._a[1])._c())+((this._a[2])._c()))._c())*<br />
((this._a[3])._c()))._c();<br />
};<br />
</pre><br />
<br />
Note the function name: ''Test3_46p3_T''; in previous examples it would have been something like ''Test3.p3_T''.<br />
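<br />
The mangling rule can be imitated in a few lines of Javascript. This is only a sketch of the behavior described above, not the actual ''fixStr'' from ''nhc98'' (which is part of the compiler itself); it assumes the numeric code is the decimal character code.<br />
<br />
```javascript
// Replace every non-alphanumeric character with "_" followed by its character code.
function fixStr(name) {
  return name.replace(/[^A-Za-z0-9]/g, function (ch) {
    return "_" + ch.charCodeAt(0);
  });
}

var mangled = fixStr("Test3.p3") + "_T";
var internal = fixStr("NHC.Internal._apply1") + "_T";
```
<br />
Note that the underscore itself is not alphanumeric, so it is encoded as well (''_95''), which keeps the mangled names unambiguous.<br />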
<br />
Partial function applications need a different thunk format. This kind of thunk holds the function to be applied once the application becomes saturated (i. e. the number of arguments becomes equal to the function's arity), the number of remaining arguments, and an array of the arguments supplied so far.<br />
<br />
Thus, for a function:<br />
<br />
<pre><br />
w = p3 1<br />
</pre><br />
<br />
resulting Javascript is:<br />
<br />
<pre><br />
var Test3_46w_T=function(){return {_c:Test3_46w_C, _w:"17:1-17:8", <br />
_a:[undefined]};};<br />
var Test3_46w_C=function(){<br />
return ({_c:function(){return this;}, _s:Test3_46p3_T, _x:2, _a:[1]})._c();<br />
};<br />
</pre><br />
<br />
Such a thunk always evaluates to itself (its ''_c()'' method returns ''this''). It holds the target function in its ''_s'' member, the number of remaining arguments in its ''_x'' member, and the arguments supplied so far in its ''_a'' member. Unlike an ordinary thunk, its ''_a'' array does not have ''undefined'' as its zeroth element.<br />
<br />
An application of such a function (''w'') to additional arguments:<br />
<br />
<pre><br />
z = w 2 3<br />
</pre><br />
<br />
compiles into:<br />
<br />
<pre><br />
var Test3_46z_T=function(){return {_c:Test3_46z_C, _w:"23:1-23:9", <br />
_a:[undefined]};};<br />
var Test3_46z_C=function(){<br />
return (HSRuntime_46doApply((Test3_46w_T())._c(), [2, 3]))._c();<br />
};<br />
</pre><br />
<br />
So, when such an expression is computed, a special Runtime support function is called. It receives the partial application thunk, obtained by evaluating its first argument (''(Test3_46w_T())._c()''), and appends the arguments provided (''[2, 3]'') to the list of arguments available so far. If the number of arguments becomes equal to the target function's arity, a normal function application thunk is returned; otherwise, another partial application thunk is returned. The Runtime support function looks like this:<br />
<br />
<pre><br />
var HSRuntime_46doApply = function (thunk, targs){<br />
thunk._a = thunk._a.concat (targs);<br />
thunk._x = thunk._x - targs.length;<br />
if (thunk._x > 0) {<br />
return thunk;<br />
} else {<br />
return thunk._s.apply (null, thunk._a);<br />
}<br />
};<br />
</pre><br />
<br />
Note the use of the ''apply'' method. It may be used also with functions that are not methods of some object. The first argument (''this_arg'') may be ''null'' or ''undefined'' as it will not be used by the function applied to the arguments.<br />
<br />
''NHC98'' acts differently when a partial application is not defined as a separate function, but is part of another expression.<br />
<br />
First, some Haskell definitions:<br />
<br />
<pre><br />
z :: Int -> Int<br />
<br />
z = (3 +)<br />
<br />
p :: Int -> Int -> Int<br />
<br />
p = (+)<br />
</pre><br />
<br />
compile into:<br />
<br />
<pre><br />
var Test4_46z_T=function(){return {_c:Test4_46z_C, _w:"9:1-9:8", <br />
_a:[undefined]};};<br />
var Test4_46z_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA181_T, _x:1, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA181_T=function(v178){return {_c:LAMBDA181_C, _w:"9:8", <br />
_a:[undefined, v178]};};<br />
var LAMBDA181_C=function(){<br />
return (((3)._c())+((this._a[1])._c()))._c();<br />
};<br />
<br />
var Test4_46p_T=function(){return {_c:Test4_46p_C, _w:"13:1-13:6", <br />
_a:[undefined]};};<br />
var Test4_46p_C=function(){<br />
return ({_c:function(){return this;}, _s:LAMBDA182_T, _x:2, _a:[]})._c();<br />
};<br />
<br />
var LAMBDA182_T=function(v179, v180){return {_c:LAMBDA182_C, <br />
_w:"13:6", <br />
_a:[undefined, v179, v180]};};<br />
var LAMBDA182_C=function(){<br />
return (((this._a[1])._c())+((this._a[2])._c()))._c();<br />
};<br />
</pre><br />
<br />
Now, when these functions (''p'', ''z'') are used:<br />
<br />
<pre><br />
t4main = putStatus (z (p 6 8)) -- see above for putStatus<br />
</pre><br />
<br />
the generated Javascript is:<br />
<br />
<pre><br />
var Test4_46t4main_T=function(){return {_c:Test4_46t4main_C, <br />
_w:"17:1-17:28", <br />
_a:[undefined]};};<br />
var Test4_46t4main_C=function(){<br />
return (Test4_46putStatus_T(<br />
NHC_46Internal_46_95apply1_T(<br />
Test4_46z_T(), <br />
NHC_46Internal_46_95apply2_T(<br />
Test4_46p_T(), 6, 8)<br />
)))._c();<br />
};<br />
</pre><br />
<br />
For each application of ''p'' and ''z'', an internal function ''NHC_46Internal_46_95apply'''''N'''''_T'' is called, where '''N''' depends on the target function arity. In the Javascript implementation, all these functions share one implementation: Javascript can determine the number of arguments a function was called with, so there is no need for a separate function per arity. The internal function takes its first argument and evaluates it (by calling the ''_c()'' method), getting a partial application thunk. Then the Runtime support function ''HSRuntime_46doApply'' is called with the thunk and the array of remaining arguments:<br />
<br />
<pre><br />
var NHC_46Internal_46_95apply1_T = function() {return __apply__(arguments);};<br />
var NHC_46Internal_46_95apply2_T = function() {return __apply__(arguments);};<br />
...<br />
var __apply__ = function (args) {<br />
var i, targs = new Array();<br />
var thunk = args[0]._c();<br />
for (i = 1; i < args.length; i++) {<br />
targs [i - 1] = args [i];<br />
}<br />
return HSRuntime_46doApply (thunk, targs);<br />
};<br />
</pre><br />
<br />
== Aug 25, 2007 ==<br />
Here's my attempt. I'm going to implement a Haskell-to-Javascript compiler based on the STG machine. This has turned out to be not such an easy task, so I'd be happy to get some feedback.<br />
<br />
This is an example translation of some Haskell functions to JavaScript. I'm trying to be descriptive, but if I'm not, please ask me or write your suggestions. I'm not quite sure that this code is really correct.<br />
<br />
<pre><br />
// Example of Haskell to JavaScript translation<br />
// updates are not considered yet<br />
//<br />
// PAP -- Partial Application<br />
// every object (heap object in STG) is called closure here<br />
// closure and function are usually interchangeable here<br />
//<br />
// Victor Nazarov (vir@comtv.ru)<br />
<br />
<br />
var closure; // current entered closure<br />
var args; // arguments<br />
<br />
// mini interpreter is used to implement tail calls<br />
// to jump to some function, we don't call it, but<br />
// return its address instead<br />
function save_continuation_and_run (function_to_run)<br />
{<br />
while (function_to_run != null)<br />
function_to_run = function_to_run ();<br />
}<br />
<br />
// calling convention<br />
// function is pointed by a [closure] global variable<br />
// arguments are in [args] array<br />
function apply ()<br />
{<br />
var f = closure;<br />
var nargs = args.length;<br />
<br />
if (f.arity == nargs)<br />
return f.code;<br />
// We CAN'T call a function, so we must build a PAP and call continuation!!!<br />
if (f.arity > nargs) {<br />
var supplied_args = args;<br />
args = null;<br />
var pap = {<br />
type : PAPType,<br />
arity : f.arity - nargs,<br />
code : function () {<br />
var i;<br />
var new_args = args;<br />
args = new Array (f.arity);<br />
// supplied_args is kept alive: it is read again below,<br />
// and the PAP may be entered more than once<br />
for (i = 0; i < supplied_args.length; i++)<br />
args[i] = supplied_args[i];<br />
<br />
// closure variable is pointing to this pap right now<br />
for (i = 0; i < closure.arity; i++)<br />
args[supplied_args.length + i] = new_args[i];<br />
new_args = null;<br />
closure = f;<br />
return f.code;<br />
}<br />
};<br />
<br />
closure = pap;<br />
// we don't know what to do, so run a continuation<br />
return null;<br />
}<br />
<br />
// closure.arity < nargs<br />
var remaining_args = new Array (nargs - closure.arity);<br />
var i;<br />
for (i = 0; i < remaining_args.length; i++) {<br />
remaining_args[i] = args[i + closure.arity];<br />
args[i + closure.arity] = null;<br />
}<br />
<br />
// assigning to length truncates the array in Javascript<br />
args.length = closure.arity;<br />
<br />
save_continuation_and_run (closure.code)<br />
<br />
// closure now points to some new function, we'll try to call it<br />
args = remaining_args;<br />
return apply;<br />
}<br />
<br />
<br />
/*<br />
compose = \f g x -><br />
let gx = g x<br />
in f gx<br />
*/<br />
compose = {<br />
type: FunctionType,<br />
arity: 3,<br />
code: function () {<br />
var f = args[0];<br />
var g = args[1];<br />
var x = args[2];<br />
var gx = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : function () {<br />
// enter g with the single argument x<br />
closure = g;<br />
args = new Array (1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
};<br />
// enter f with the single argument gx<br />
closure = f;<br />
args = new Array (1);<br />
args[0] = gx;<br />
return apply;<br />
}<br />
}<br />
<br />
cons = {<br />
type : ConstructorType,<br />
arity : 2,<br />
code : function () {<br />
// This tag distinguishes this constructor from Nil<br />
RTag = ConsTag;<br />
<br />
// We must return to the continuation; arguments are returned in the args array<br />
return null;<br />
}<br />
}<br />
<br />
nil = {<br />
type : ConstructorType,<br />
arity : 0,<br />
code : function () {<br />
// This tag distinguishes this constructor from Cons<br />
RTag = NilTag;<br />
<br />
// We must return to the continuation<br />
return null;<br />
}<br />
}<br />
<br />
/*<br />
map = \f xs-><br />
case xs of {<br />
Cons x xs -><br />
let fx = f x<br />
in let mapfxs = map f xs<br />
in Cons fx mapfxs<br />
; Nil -> nil<br />
}<br />
*/<br />
map = {<br />
type : FunctionType,<br />
arity: 2,<br />
code : function () {<br />
var f = args[0];<br />
var xs = args[1];<br />
var continuation = function () {<br />
//push continuation and enter xs<br />
closure = xs;<br />
save_continuation_and_run (xs.code);<br />
switch (RTag) {<br />
case ConsTag:<br />
{<br />
// distinct name: a nested "var xs" would be hoisted and hide the list being scrutinised<br />
var x = args[0];<br />
var rest = args[1];<br />
var fx = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : function () {<br />
closure = f;<br />
args = new Array(1);<br />
args[0] = x;<br />
return apply;<br />
}<br />
};<br />
var mapfxs = {<br />
type : ThunkType,<br />
arity : 0,<br />
code : function () {<br />
closure = map;<br />
args = new Array(2);<br />
args[0] = f;<br />
args[1] = rest;<br />
return apply;<br />
}<br />
};<br />
closure = cons;<br />
args = new Array(2);<br />
args[0] = fx;<br />
args[1] = mapfxs;<br />
return apply;<br />
}<br />
case NilTag:<br />
return nil.code;<br />
}<br />
};<br />
args = null;<br />
return continuation; // returning a continuation means jumping to it<br />
}<br />
}<br />
</pre><br />
<br />
<br />
----<br />
<br />
Victor Nazarov<br />
<br />
asviraspossible@gmail.com</div>Vir