Libraries and tools/HJS

From HaskellWiki
Jump to navigation Jump to search

HJS - Haskell Javascript Parser and Interpreter

Current Status - March 2007

Most program fragments can be parsed. This includes a Haskell in Javascript example, Yhc/Javascript, and JS implemented in JS files, Narcissus

A smaller set of program fragments can be run. This includes a Javascript inheritance framework

Working toward running the YCR2JS and Narcissus files.

Current version is available from HackageDB

Later: Review the parser code, inefficiencies and poor Haskell constructs.


HJS is based on the grammar and behaviour specified in ECMA-262, 3rd Edition with additions and modifications from JavaScript 1.5.

Javascript is typically associated with the web dynamic pages and has been called The World's Most Misunderstood Programming Language


-- Debug drops into debugger on start, Trace shows
-- tracing and ShowHeap will print the heap at program completion.
data RunFlag = Debug | Trace | ShowHeap deriving (Show,Eq)

-- Run all files in 'testsuite'
runTest :: IO ()

-- Run file
runFile :: [RunFlag] -> FilePath -> IO ()

-- Run supplied string.
runString :: [RunFlag] -> String -> IO Bool


With Monad Transformers and Modular Interpreters as the starting point.


This is the type that carries primitive values, objects (in the form of an ID), references and other internal data.

type Value =  
       -- Primitive values
              Either Int                       
            ( Either String 
            ( Either Bool                              
            ( Either Undefined 
            ( Either Null 
       -- Internal values
            ( Either ObjId                     -- Object ID
            ( Either Ref                       -- Reference
            ( Either [ObjId]                   -- List of object IDs
            ( Either CallValue                 -- Code, used for the 'Call' property of Function object. 
            ( Either [String]                  -- List of argument names
	    ( Either BreakContinue
            ()   ))))))))))

CallValue allows us to easily incorporate builtin functions along side Javascript functions.

data CallValue = CallJS [SourceElement] | CallBuiltIn (InterpM Value)

The class

class SubType sub sup where
    inj :: sub -> sup
    prj :: sup -> Maybe sub

allows us to inject into a Value type, and project out of it.


-- Context is a scope chain, variable object, this and current function
type Ctx     = ([ObjId], ObjId, ObjId, ObjId)

-- State is a context stack, object store and line and column of current statement being used.
type JSState = ([Ctx], Map ObjId Object,(Int,Int))

-- Compose state and error. The order is important, if it where the other way round state would be lost.
type InterpM = ErrorT Throwable (StateT JSState IO)


The lexer and parser are two seperate instances of a Parsec parser. The tokens which are returned by the Lexer are:

data Token
      | TokenInt Int
      | TokenIdent String
      | TokenStringLit String
      | TokenNL
      | TokenRegex (String,String)
      | TokenEof
      | TokenRID String
      | TokenROP String
 deriving (Show,Eq)

The lexer returns a list of tuples (SourcePos,Token).

The actual parser constructs follow the grammer rules but are altered to get around left recursion and a whole swath of them ignored by using Parsec's builin expression handling.

Statements are decorated with line and column numbers to aid locating of run time errors. As a statement is interpreted this information is added to the state for use in error reporting.

Two challenges were literal regular expressions and automatic semi-colon insertion.

The parser was previously written using Happy and Alex but I found the resulting Haskell file large, slow to compile and slow to run. Using Parsec gives me more control.

Object Framework

Object is a record type with an ID and a map of properties:

data Object = Object {
                      idd :: ObjId,
		      prototype :: Maybe ObjId,
		      klass :: String,
		      value :: Maybe Value,
		      properties :: M.Map String (Value,[Attribute]) 
		     } deriving Show

Properties on an object have a list of attributes which indicate if the property is to be included in enumeration, is writable etc. There are functions to get and set property values. The 'getProperty' function will traverse up the __proto__ chain looking for the property.

getProperty :: ObjId -> String -> InterpM Value
putProperty :: ObjId -> String -> Value -> InterpM ()

Interpreter Class

All syntax constructs are instances of this class.

class InterpC t where
    interp  :: t -> InterpM Value

Some examples are:

instance (InterpC t1, InterpC t2) => InterpC (Either t1 t2) where
   interp (Left x) = interp x
   interp (Right x) = interp x

interp (Assign left AssignNormal right) = do 
		                                  v <- interp right
                                                  r <- interp left
                                                  putValue r v
                                                  return v

interp (DoWhile s e ) = do
                            vv <- (interp s) `catchError` handleBreakContinue 
                            case (prj vv) of
                                  (Just Break)  -> return (inj (0::Int))
                                  _   -> do 
                                            b <- interp e >>= toRealBool
                                            case b of
                                                  False -> return vv
                                                  _ -> interp (DoWhile s e )

Note: Can the DoWhile case be simplified?

Exceptions, Breaks, Continues and Returns

These are handled by throwing and catching (in the Error monad) values of the type:

data Throwable = ThrowReturn Value 
	       | ThrowBreak (Maybe String)
	       | ThrowContinue (Maybe String)
	       | ThrowException Value
	       | ThrowTypeError String
	       | ThrowInternalError String 
		 deriving Show

For each of the above there is a specific handling function. If the value does not match the correct constructor then the error is re-thrown. For example

throwReturn v = throwError (ThrowReturn v)
handleReturn (ThrowReturn v) = return v
handleReturn s = throwError s

Type Conversion

Type conversion within the Value type is done using the Convert class:

class Convert a where
   typeOf :: a -> [Int]
   typeOf _ = []
   toBoolean :: a -> InterpM Value
   toBoolean _ = return $ inj False
   toNumber :: a -> InterpM Value
   toNumber _ = inj (0::Int)
   toString :: a -> InterpM Value
   toString _ = return $ inj ""
   toPrimitive :: PrimHint -> a  -> InterpM Value
   toPrimitive  _ i = return $ undefinedValue

For instance, toNumber will convert a Value containing something to one containing a number. These operate in the Monad as some will make calls to functions or get properties.

In order to get these to work with Value values, native Haskell types and Either are made instances of Convert

Conversion to native Haskell types is done using the functions

toRealInt :: Convert a => a -> InterpM Int
toRealString :: Convert a => a -> InterpM String
toRealBool :: Convert a => a -> InterpM Bool

Tracing and Debugging

As the first Monad in the Monad stack IO, it is easy to do tracing:

traceM :: String -> InterpM ()
traceM s = do 
               f <- getFlags
               case elem Trace f of
                   True -> liftIO $ putStrLn s 
                   False -> return ()

Run with 'Trace' in the run flags list to get tracing.

A simple debugger is also builtin. Running with 'Debug' in the run flags will cause execution to drop into the debug on program start. The following type represents the debugger commands.

data DebugAction = DBBreak Int          -- Set a break point at a line
		  | DBContinue          -- Continue execution to next break point
		  | StepOver            -- Step over a statement
		  | StepInto            -- Step into a statement (not implemented yet)
		  | PrintObj Int        -- Print object with ID
		  | PrintHeap           -- Print the whole heap
		  | PrintVar String     -- Print variable
		  | PrintLine           -- Print current statement
		  | PrintStack          -- Print the stack (not implemented yet)
                  | Eval String         -- Eval the supplied JS
		    deriving (Show,Eq)