Yhc/TMR

From HaskellWiki
Jump to navigation Jump to search

Authors: Neil Mitchell, Tom Shackell, Dimitry Golubovsky

This is a draft of the Yhc TMR article, deadline April 13th. It isn't intended as a wiki article beyond the listed authors (although if you want to fix some spelling, we don't mind!). If you are interested in helping email the Yhc list.

The beginning

In the beginning there was the nhc compiler, which had a number of issues. We fixed some of them.

Author: Tom/Neil/Andrew

How we started up Yhc, this is the section that would have been in the History of Haskell paper if they had done a Yhc section :)

Include the transition from CVS -> york Darcs -> haskell.org Darcs

Portability concerns

From the beginning portability was a prime concern, while the original nhc was only running on Linux v old.old, and never Windows, Yhc was fully portable by design.

Author: Tom, Andrew

Why portability is such a concern, details of our ports system. Include our scons architecture, buildbot system etc. Mention that Yhc runs under Hugs, and indeed some of the developers use Hugs.


Yhc.Core

Yhc.Core is one of our most successful libraries to date. The original nhc compiler used an intermediate core language called PosLambda – a basic lambda calculus extended with positional information. Although the language was very similar to a subset of Haskell, it wasn’t the same. In particular there were unusual constructs such as FatBar, and all names were stored in a symbol table.

Rather than attempt to change the PosLambda language, a task that would have been decidedly painful, we chose instead to write a Core language from scratch, being inspired by both PosLambda and GHC Core. We then wrote a converter from PosLambda to Yhc.Core.

In particular our idealised Core language differs from GHC Core in a number of ways:

  • Untyped – originally this was a restriction of PosLambda, but now some of us see this as a feature (others still see this as a bug).
  • Syntactically a subset of Haskell.
  • Minimal name mangling.

All these features combine to create a Core language which resembles Haskell much more than Core languages in other Haskell compilers. This very low barrier to entry means that with merely the Haddock description of our abstract syntax tree, most Haskell programmers can feel at home with relatively little effort.

Having reduced the barrier to entry for our Core language, the number of projects depending on it has increased. Consistently we have attempted to add facilities to the libraries for common tasks, rather than duplicating them separately in projects. As a result the Core library now has facilities for dealing with primitives, removing recursive lets, reachability analysis, strictness analysis, simplification, inlining and more.

One of the first features we added to Core was whole program linking – any Haskell program, regardless of the number of modules, can be collapsed into one single Yhc.Core module. While this breaks separate compilation, if allows many types of analysis and transformation to be performed in a simpler manner. Naturally, if a technique turns out to be successful breaking the dependence on whole program compilation is a worthy goal – but this approach allows developers to pay that cost only when it is needed.

A Small Example

To give a flavour of what Core looks like, it is easiest to start with a small program:

head2 (x:xs) = x
 
map2 f [] = []
map2 f (x:xs) = f x : map2 f xs
 
test x = map2 head2 x

Compiling this with yhc -showcore Sample.hs generates:

Sample.head2 v220 =
    case v220 of
        (:) v221 v222 -> v221
        _ -> Prelude.error Sample._LAMBDA228
 
Sample._LAMBDA228 =
    "Sample: Pattern match failure in function at 9:1-9:15."
 
Sample.map2 v223 v224 =
    case v224 of
        [] -> []
        (:) v225 v226 -> (:) (v223 v225) (Sample.map2 v223 v226)
 
Sample.test v227 = Sample.map2 Sample.head2 v227

The generated Core looks very much like Haskell, and can be treated as such. The generated Core is the simplest form of Haskell, with many restrictions:

  • Case statements only examine their outermost constructor
  • No type classes
  • No where
  • No nested functions (only top level)
  • All names are fully qualified

Yhc.Core.Overlay

Describe the overlay here

Evaluator

Here is a simple Yhc.Core evaluator:

import System
import Yhc.Core

norm               :: CoreExpr -> CoreExpr
norm (CoreCon c)   =  CoreApp (CoreCon c) []
norm x             =  x

try              :: CoreExpr -> (CoreExpr, CoreExpr) -> [CoreExpr]
try e (pat, rhs) =  case (norm e, norm pat) of
                       (CoreApp (CoreCon f) as, CoreApp (CoreCon g) bs)
                         | f == g     -> [CoreLet (zip (vars bs) as) rhs]
                       (e, CoreVar v) -> [CoreLet [(v,e)] rhs]
                       (a,b)
                         | isCoreConst a && a == b -> [rhs]
                       _              -> []
  where
    vars         =  map fromCoreVar

match      :: CoreExpr -> [(CoreExpr, CoreExpr)] -> CoreExpr
match e as =  head (concatMap (try (norm e)) as)

hnf                             :: Core -> CoreExpr -> CoreExpr
hnf p (CoreCase e as)           =  hnf p (match (hnf p e) as)
hnf p (CoreLet ds e)            =  hnf p (replaceFreeVars ds e)
hnf p (CoreCon c)               =  CoreCon c
hnf p (CoreFun f)               =  hnf p (CoreLam bs body)
  where
    CoreFunc _ bs body          =  coreFunc p f
hnf p (CoreLam [] e)            =  hnf p e
hnf p (CoreApp (CoreCon c) as)  =  CoreApp (CoreCon c) as
hnf p (CoreApp f [])            =  hnf p f
hnf p (CoreApp f (a:as))        =
  case hnf p f of
    CoreLam [] e                -> hnf p (CoreApp e (a:as))
    CoreLam (b:bs) e            -> hnf p (CoreLet [(b,a)] (CoreApp
                                              (CoreLam bs e) as))
hnf p (CorePos _ e)             =  hnf p e
hnf p e                         =  e

nf     :: Core -> CoreExpr -> CoreExpr
nf p e =  case hnf p e of
            CoreCon c -> CoreCon c
            CoreApp (CoreCon c) es -> CoreApp (CoreCon c) (map (nf p) es)
            e -> e

main = do [filename] <- getArgs
          core <- loadCore filename
          let core' = removeRecursiveLet core
          print (nf core' (CoreFun "main"))

Javascript backend

The Javascript backend is a unique feature of Yhc, something which our light weight approach makes easier.

Author: Dimitry

the ideas behind it, the Javascript FFI, browser compatability, the approach

The idea to write a converter from Haskell to Javascript has been floating around for a while [add links]. Many people expressed interest in such feature, but no practical implementation was visible.

Initial goals of this subproject were:

  • To develop a conversion program that converts the Yhc Core to Javascript, thus making it possible to execute arbitrary Haskell code within a web browser
  • To develop an unsafe interface layer for quick access to Javascript objects with ability to wrap arbitrary Javascript code into a Haskell-callable function
  • To develop a typesafe interface layer on top of the unsafe interface layer for access to the Document Object Model (DOM) available to Javascript executed in a web browser
  • To develop or adopt an existing GUI library or toolkit working on top of the typesafe DOM layer for actual development of clientside Web applications.

General concepts

The Javascript backend converts a linked and optimized Yhc Core file into a piece of Javascript code to be embedded in a XHTML document. The Javascript code generator attempts to translate Core expressions to Javascript expressions one-to-one with minor optimizations on its own, taking advantage of Javascript capability to pass functions around as values.

Three kinds of functions are present in the Javascript backend:

  • Unsafe functions that embed pieces of Javascript directly into the generated code: these functions pay no respect to types of arguments passed, and may force evaluation of their arguments if needed.
  • Typesafe wrappers that provide type signatures for unsafe functions. Such wrappers are either handwritten, or automatically generated from external interface specifications (such as the Document Object Model interface)
  • Regular library functions. These either come unmodified from the standard packages that come with Yhc, or are substituted by the Javascript backend using the Core overlay technique. An example of such a function is the toUpper function which is hooked up to the Javascript implementation supporting Unicode (the original library function currently works correctly only for the Latin1 range of characters).

Unsafe interfaces

The core part of unsafe interface to Javascript (or, in other words, Javascript FFI) is a pseudo-function unsafeJS.

The function has a type signature:

foreign import primitive unsafeJS :: String -> a

Which means that it takes a string. Type of the return value does not matter: the function itself is never executed. Its applications are detected by ycr2js at the time of Javascript generation.

The unsafeJS function should be called with a string literal. Neither explicitly coded (with (:)) list of characters nor concatenation of two or more strings will work. The converter will report an error in this situation.

A valid example of using unsafeJS is shown below:

global_YHC'_Primitive'_primIntSignum :: Int -> Int

global_YHC'_Primitive'_primIntSignum a = unsafeJS
  "var ea = exprEval(a); if (ea>0) return 1; else if (ea<0) return -1; else return 0;"

This is a Javascript overlay (in the sense that it overlays the default Prelude definition of the signum function) of a function that returns sign of an Int value.

The string literal unsafeJS is applied to is the Javascript code to be wrapped.

Below is the Javascript representation of this function found in generated code.

strIdx["F_hy"] = "YHC.Primitive.primIntSignum";
...
var F_hy=new HSFun("F_hy", 1, function(a){
var ea = exprEval(a); if (ea>0) return 1; else if (ea<0) return -1; else return 0;});

Typesafe wrappers

These functions add type safety on top of unsafe interface to Javascript. Sometimes they are defined within the same module as unsafe interfaces themselves, thus avoiding the exposure of unsafe interfaces to programmers.

An example of a handwritten wrapper is a function to create a new JSRef (a mechanism similar to IORef, but specific to Javascript).

data JSRef a

newJSRef :: a -> CPS b (JSRef a)

newJSRef a = toCPE (newJSRef' a)
newJSRef' a = unsafeJS "return {_val:a};"

Technically, a JSRef is a Javascript object with a property named _val that holds a persistent reference to some value. On the unsafe side, invoking a constructor for such an object would be sufficient. It is however desired that:

  • calls to functions creating such persistent references are properly sequenced with calls to funcitons using these references, and
  • type of values referred to were known to the Haskell compiler.

The unsafe part is implemented by the function newJSRef' which merely calls unsafeJS with proper Javascript constructor. The wrapper part newJSRef wraps the unsafe function into a CPS-style function, and is given a proper type signature, so the compiler is better informed.

In some cases, such typesafe wrappers may be generated automatically, using some external interface specifications provided by third parties for their APIs.

As an example of such API, the W3C DOM interface may be taken. For instance, this piece of OMG IDL:

  interface Text : CharacterData {
    Text               splitText(in unsigned long offset)
                                        raises(DOMException);
  };

is converted into:

data TText = TText

...

instance CText TText
 
instance CCharacterData TText
 
instance CNode TText

...

splitText :: (CText this, CText zz) => this -> Int -> CPS c zz
splitText a b = toCPE (splitText' a b)
splitText' a b
  = unsafeJS "return((exprEval(a)).splitText(exprEval(b)));"

again, giving the Haskell compiler better control over types of this function's (initially type-agnostic) arguments.

Usage of Continuation Passing Style

Initially it was attempted to build a monadic framework. The JS monad was designed to play the same role as the IO monad plays in "regular" Haskell programming. There were however arguments in favor of using CPS style:

  • CPS style involves less overhead as each expression passes its continustion iself instead of bind which takes the expression and invokes the continuation
  • CPS style results in Javascript patterns that are easy to detect and optimize for (this is one of the future plans).
  • The Fudgets library internals are written in CPS style, so taking CPS as general approach to programming is believed to make adoption of Fudgets easier.

Integration with DOM

The Web Consortium provides OMG IDL files to describe the API to use with the Document Object Model (DOM). An utility was designed, based on HaskellDirect, to parse these files and convert them to set of Haskell modules. The way interface inheritance is reflected differs from the original HaskellDirect way: in HaskellDirect this was achieved by declaration of "nested" algebraic data types, while the Javascript backend utility takes advantage of Haskell typeclasses, representing DOM types with phantom types, and declaring them instances of appropriate class(es).

Unicode support

Despite the fact that all modern Web browsers support Unicode, this is not the case with Javascript: no access to Unicode characters' properties is provided. In the same time it is impossible for a Haskell application running in a browser not to have access to such information. The approach used is the same as used in Hugs and GHC: the Unicode characters database file from Unicode Consortium was converted into a set of Javascript arrays, each array entry representing a range of character code values, or a case conversion rule for a range (for this implementation, Unicode support was limited to character category, and simple case conversions). First, a range is found by character code using binary search; then character category and case conversion distances (values to add to character core to convert between upper and lower cases) are retrieved from the range entry. The whole set of arrays adds about 70 kilobytes to the web page size, if embedded inside a <script> tag.

Using the Core overlay technique, Haskell character functions (like toUpper, isAlpha, etc.) were hooked up to the Javascript implementations supporting Unicode. This did not result in considerable slowdowns, rather, some browsers even showed minor speedup in heavy functions like read::String -> Int.

Browsers compatibility

Compatibility with major browsers such as Netscape/Mozilla/Firefox and Microsoft Internet Explorer, and also Opera was observed. Compatibility with Safari has not been reached so far.

Future plan: Fudgets

It is planned to port some portion of Fudgets, so it becomes possible to write Web applications using this library. Several experiments showed that the Stream Processors (SP), and some parts of Fudget Kernel layers worked within a Javascript application. More problems are expected with porting the toplevel widgets due to differences in many concepts between Web browser and X Window, for which the Fudgets library was originally developed.

Wacky features

Yhc is going in many interesting directions. Some of these directions are likely to become very important in the future, some are likely to fade away. Yhc is a genuine research bed for brand new ideas.

Author: All

When you don't spend all the time on wacky type systems, you get a lot more time left to work on Wacky other stuff. Include Java interpetter, .NET back end, Javascript back end, Python interpretter, Hat debugging, yhi-stack, whole program optimisation. Lots of these things are breeding grounds for various useful technologies, and most are marching towards genuine usefulness.

The Future

The original structure of nhc was as one big set of modules – some were broadly divided into type checking/parsing etc, but the overall structure and grouping was weaker than in other compilers – particularly GHC which has a very well defined structure. One of our first actions was to split up the code into hierarchical modules, introducing Type.*, Parse.* etc to better divide the components.

Some of the sections are shared – for example both hpc and Hat share the nhc Parse modules, and could equally share the Yhc parse modules. This direction is obviously attractive – by opening up more parts of the compiler for reuse in other projects we gain in many ways:

  • The authors of other project benefit from having libraries to build upon, not needlessly replicating the functionality already present in Yhc.
  • We benefit from having more people focused and dependent on our code, resulting in a bigger community.
  • We get additional bug reports.
  • By structuring code as a library, “hacks” that before might have been tolerated quickly take on a whole new aura of ugliness.
  • By isolating each component, we minimize the amount of whole compiler knowledge required.

This direction attracts us, and we see this as the future direction of our compiler.

Interestingly, almost at the same time the GHC developers have introduced a GHC API. We do not wish to replicate the GHC API, in particular their API is more complex than ours. We also wish to design an API, and then construct some future version of Yhc around the API we develop – rather than exporting an API.

For the next major Yhc version we would hope to rewrite the parser, type checker, renamer and desugarer – leaving only the new ByteCode and Core aspects still intact. This is clearly a lot of work, and we view this as a very long term plan. For the moment we hope to push the Core and ByteCode features of Yhc into new areas.

In the meantime things like libary compatability and Haskell' loom more closely on horizon. Our progress towards these goals relies on the help of volunteers.

Acknowledgements

Thanks to everyone who has submitted a patch, become a buildbot, reported bugs or done anything else to benefit the Yhc project. We've put together a list of most of the people (if we've missed you, please shout, and we'll add your name in future versions of this document!)

Andrew Wilkinson, Bernie Pope, Bob Davie, Brian Alliet, Christopher Lane Hinson, Dimitry Golubovsky, Gabor Greif, Goetz Isenmann, Isaac Dupree, Kartik Vaddadi, Krasimir Angelov, Malcolm Wallace, Michal Palka, Mike Dodds, Neil Mitchell, Robert Dockins, Samuel Bronson, Stefan O'Rear, Thorkil Naur, Tom Shackell