Haskell Quiz/Bytecode Compiler/Solution Justin Bailey

From HaskellWiki
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

This solution isn't the cleanest or quite tested, but if you load it in hugs and run:

compile $ parse $ tokenize $ stmt1
compile $ parse $ tokenize $ stmt2
...
compile $ parse $ tokenize $ stmt8

You'll see the byte codes generated. I didn't implement any optimizations (i.e. using SWAP).

Sadly, the parsing algorithm does not preserve precedence correctly. For example, parsing "2*2+2" produces

> parse $ tokenize "2*2+2"
Statement Mult (Statement Plus (Val 2) (Val 2)) (Val 2)

Which is incorrect. Any suggestions?

The code below is literate Haskell.

\begin{code}
import Data.Char(isAlpha, isDigit, isSeparator)

data Op = Plus | Minus | Mult | Div | Pow | Mod
  deriving (Show, Eq)
  
data Token = RightParen 
           | LeftParen
           | Number Integer
           | Operator Op
  deriving (Show, Eq)

data Expression = Statement Op Expression Expression
           | Val Integer
           | Empty
  deriving (Show)

data Bytecode = NOOP | CONST Integer | LCONST Integer
            | ADD
            | SUB
            | MUL
            | POW
            | DIV
            | MOD
            | SWAP
  deriving (Show)

expr1 = Statement Plus (Val 1) (Val 2)
expr2 = Statement Mult (Statement Plus (Val 1) (Val 2))  (Val 3)

-- Take a statement and evaluate 
eval (Val n) = n
eval (Statement op left right)
        | op == Mult = eval left * eval right
        | op == Minus = eval left - eval right
        | op == Plus = eval left + eval right
        | op == Div = eval left `div` eval right
        | op == Pow = eval left ^ eval right
        | op == Mod = eval left `mod` eval right

-- Take a string and tokenize it
tokenize [] = []
tokenize ('(':xs) = LeftParen : tokenize xs
tokenize (')':xs) = RightParen : tokenize xs
tokenize (x:xs) | isDigit x = let (val, remaining) = span isDigit (x:xs)
                              in Number (read val) : tokenize remaining
                | isOperator x = toOperator x : tokenize xs
                | isSeparator x = tokenize (dropWhile isSeparator xs)
                where
                  makeOp '+' = Just $ Operator Plus
                  makeOp '-' = Just $ Operator Minus
                  makeOp '*' = Just $ Operator Mult
                  makeOp '/' = Just $ Operator Div
                  makeOp '^' = Just $ Operator Pow
                  makeOp '%' = Just $ Operator Mod
                  makeOp _ = Nothing
                  toOperator x = case makeOp x of
                                      Just x -> x
                                      _ -> error "Bad operator."
                  isOperator x = case makeOp x of
                                      Just x -> True
                                      _ -> False

stmt1 = "1 + 2" -- 3
stmt2 = "1 + 2 * 3" -- 7
stmt3 = "(1 + 2) * 3" -- 9
stmt4 = "4 - 5 * 2" -- -6
stmt5 = "5 * (2 - 4)" -- -10
stmt6 = "(1*3)*4*(5*6)" -- 360
stmt7 = "2^(2+(3/2)^2)" -- 8
stmt8 = "(10%3)*(2+2)" -- 4

{-
 Based on http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm

 E -> E + E
 E -> E * E
 E -> E / E
 E -> E - E
 E -> E % E
 E -> E ^ E
 E -> ( E )
 E -> n
 
 Transform to
 
  E --> Exp(0) 
  Exp(p) --> P {B Exp(q)} 
  P --> "(" E ")" | v
  B --> "+" | "-"  | "*" |"/" | "^" | "%"

 Precedence
 
  ^       3
  *, /, % 2
  +, -    1 

-}

-- define precdence of operators
precedence Plus = 1
precedence Minus = 1
precedence Mult = 2
precedence Div = 3
precedence Mod = 2
precedence Pow = 3

-- Precedence comparison - gets precedence of 
-- given operator and determines if its greater than the value given.
(>*) op val = precedence op >= val  

-- Precedence addition - for left associative operators,
-- return its precedence + 1. For righ associative, just return the operators
-- precedence.
prec_add p@(Pow) = precedence p
prec_add p = 1 + precedence p

parse [] = error "Can't parse empty list of tokens"
parse tokens = fst $ parseE tokens 0
  where parseE tokens prec
          = let (p, remaining) = parseP tokens prec
            in
              if remaining == []
              then (p, remaining)
              else case head remaining of
                Operator op ->
                  if op >* prec
                  then
                    let (right, rest) = parseE (tail remaining) $ prec_add op
                    in (Statement op p right, rest)
                  else let (left, rest) = parseE (tail remaining) $ prec
                    in (Statement op left p, rest)
                _ -> (p, remaining)
        parseP ((Number n):ts) prec = (Val n, ts)
        parseP ((LeftParen):ts) prec
          = let (e, remaining) = parseE ts 0
            in (e, tail remaining)

compile stmt = compile' stmt []

compile' (Statement op left right) instr =
  let li = compile' left instr
      ri = compile' right instr
      lri = li ++ ri
  in case op of
    Plus -> lri ++ [ADD]
    Minus -> lri ++ [SUB]
    Mult -> lri ++ [MUL]
    Div -> lri ++ [DIV]
    Mod -> lri ++ [MOD]
    Pow -> lri ++ [POW]
compile' (Val n) instr = instr ++ [LCONST n]

\end{code}