Personal tools

Parsing expressions and statements

From HaskellWiki

(Difference between revisions)
Jump to: navigation, search
(update to new Haskell Platform (parsec-3))
(Made easier for copy/paste)
(One intermediate revision by one user not shown)

Latest revision as of 06:35, 3 February 2012

We use the Parsec library to parse a small grammar of expressions and statements. The main purpose is to showcase
, which cover a large class of common applications.

(Parsec comes with Haskell Platform; also available as parsec.)

We will use these modules:

> import Control.Applicative((<*))
> import Text.Parsec
> import Text.Parsec.String
> import Text.Parsec.Expr
> import Text.Parsec.Token
> import Text.Parsec.Language


[edit] 1 Sample grammar

Here is the expression grammar:

expr  ::= var | const | ( expr ) | unop expr | expr duop expr
var   ::= letter { letter | digit }*
const ::= true | false
unop  ::= ~
duop  ::= & | =

Operator precedence from high to low: ~, &, =. Both binary operators are left-associative.

Here is the statement grammar:

stmt ::= nop | var := expr | if expr then stmt else stmt fi | while expr do stmt od
       | stmt { ; stmt }+

In addition to these grammars, we will also allow block comments like {- this is a comment -}.

We will parse these grammars into an internal representation (abstract syntax tree). (In some applications you skip the internal representation and go straight to evaluation or code generation or... but let's fix just one goal here.) Here is an internal representation:

> data Expr = Var String | Con Bool | Uno Unop Expr | Duo Duop Expr Expr
>     deriving Show
> data Unop = Not deriving Show
> data Duop = And | Iff deriving Show
> data Stmt = Nop | String := Expr | If Expr Stmt Stmt | While Expr Stmt
>           | Seq [Stmt]
>     deriving Show

[edit] 2 Sample parser

Here is the plan. Fill in a record to specify comments, roles of characters (what goes into identifiers, what goes into operators), reserved words, reserved operators. Give this record to
and get a record of utility parsers that enable us to work at the token level (rather than the character level). The expression parser is obtained with the help of
. The statement parser is written as a recursive descent parser.

[edit] 2.1 Define symbols

is the name of the record type we have to fill in. Many presets are provided so that we can pick one and just customize a few fields. A minimalist preset is
and we change it with:
def = emptyDef{ commentStart = "{-"
              , commentEnd = "-}"
              , identStart = letter
              , identLetter = alphaNum
              , opStart = oneOf "~&=:"
              , opLetter = oneOf "~&=:"
              , reservedOpNames = ["~", "&", "=", ":="]
              , reservedNames = ["true", "false", "nop",
                                 "if", "then", "else", "fi",
                                 "while", "do", "od"]

[edit] 2.2 Make token parser

When we pass the above record to
, the return value is a record of type
. Its fields are little parsers that we can use. They parse and return various tokens (identifiers, operators, reserved things, all sorts of brackets) and skip comments as we have specified. They also eat whitespaces after the tokens. Using them, our expression parser and statement parser can be written at the token level and without worrying about whitespaces. The only caveat: they don't eat whitespaces at the very, very, very beginning, and we have to do that ourselves, but even that is made easy. There are only a few of the many provided parsers we will use here. Using record pattern matching, we call
makeTokenParser def
and pick out just those we need:
TokenParser{ parens = m_parens
           , identifier = m_identifier
           , reservedOp = m_reservedOp
           , reserved = m_reserved
           , semiSep1 = m_semiSep1
           , whiteSpace = m_whiteSpace } = makeTokenParser def

Here is an overview of what these parsers do:

m_parens p
parses an open parenthesis, then runs
, then parses a close parenthesis, and returns what
parses and returns an identifier, checking that it does not clash with a reserved word.
by example:
m_reservedOp ":="
checks that the next token is the
reserved operator.
by example:
m_reserved "od"
checks that the next token is the
reserved word.
m_semiSep1 p
parses and returns a semicolon-separated sequence of one or more
eats whitespaces.

You are encouraged to explore these and other parsers provided in the record. They are all very handy.

[edit] 2.3 Expression parser

The expression parser is obtained from
. We do not need to write our own recursive descent parser or perform left/right-factoring.
exprparser :: Parser Expr
exprparser = buildExpressionParser table term <?> "expression"
table = [ [Prefix (m_reservedOp "~" >> return (Uno Not))]
        , [Infix (m_reservedOp "&" >> return (Duo And)) AssocLeft]
        , [Infix (m_reservedOp "=" >> return (Duo Iff)) AssocLeft]
term = m_parens exprparser
       <|> fmap Var m_identifier
       <|> (m_reserved "true" >> return (Con True))
       <|> (m_reserved "false" >> return (Con False))

We give operator precedence (by ordering them in the table), association, "semantic action", and how to parse an "atomic term" including the parenthesized case (the only place we do a recursive descent, and it's trivial). It is possible to place several operators at the same precedence, though not shown here. The general format for the table is hard to explain but easy to illustrate and copy.

[edit] 2.4 Statement parser

The statement parser is best done by recursive descent, with special treatment to the semicolon-separated list (easy with
), utilizing the handy token parsers and the expression parser. We also remember to skip whitespaces just once at the very, very, very beginning:
mainparser :: Parser Stmt
mainparser = m_whiteSpace >> stmtparser <* eof
      stmtparser :: Parser Stmt
      stmtparser = fmap Seq (m_semiSep1 stmt1)
      stmt1 = (m_reserved "nop" >> return Nop)
              <|> do { v <- m_identifier
                     ; m_reservedOp ":="
                     ; e <- exprparser
                     ; return (v := e)
              <|> do { m_reserved "if"
                     ; b <- exprparser
                     ; m_reserved "then"
                     ; p <- stmtparser
                     ; m_reserved "else"
                     ; q <- stmtparser
                     ; m_reserved "fi"
                     ; return (If b p q)
              <|> do { m_reserved "while"
                     ; b <- exprparser
                     ; m_reserved "do"
                     ; p <- stmtparser
                     ; m_reserved "od"
                     ; return (While b p)

[edit] 3 Sample usage

Finally, here is a little piece of code for casual testing. This function parses the string parameter and outputs either a parse error or the answer.

> play :: String -> IO ()
> play inp = case parse mainparser "" inp of
>              { Left err -> print err
>              ; Right ans -> print ans
>              }