Difference between revisions of "A practical Template Haskell Tutorial"

From HaskellWiki
Jump to navigation Jump to search
m (→‎Reification: Fix syntax)
m (→‎Reification: Fix syntax)
Line 246: Line 246:
 
genFmap :: [Con] -> Q Dec
 
genFmap :: [Con] -> Q Dec
 
genFmap cs
 
genFmap cs
= do funD `fmap (map genFmapClause cs)
+
= do funD 'fmap (map genFmapClause cs)
   
 
genFmapClause :: Con -> Q Clause
 
genFmapClause :: Con -> Q Clause

Revision as of 21:10, 21 June 2018

This tutorial explores the Glasgow Haskell Compiler's compile-time meta programming in Template Haskell. It motivates use cases for meta programming and explains the different Template Haskell features on simple toy programs. The aim is to give an overview of Template Haskell's functionality in an example-driven manner.

Introduction

Template Haskell (TH) is the standard framework for doing type-safe, compile-time meta programming in the Glasgow Haskell Compiler (GHC). It allows writing Haskell meta programs, which are evaluated at compile-time, and which produce Haskell programs as the results of their execution.

Template Haskell was conceived by Tim Sheard and Simon Peyton Jones[1] by drawing on the ideas of Lisp macros, but in the typed setting of Haskell. Since then, the original implementation has evolved quite a bit[2][3]. Most notably, in 2007 Geoffrey Mainland added support for quasi quoting[4], which makes the embedding of domain specific languages into the Haskell host language much easier.

As it exists today, Template Haskell has two main areas of application: Haskell code generation at compile-time and facilitating the embedding of domain specific languages.

As a code generator, Template Haskell empowers a user to write many, syntactically different, programs all at once by means of a single meta program. All that is needed is a uniform, algorithmic description to create the different result programs. And the meta program then precisely implements the algorithm to compute all the different result programs as its output. This proves useful for example to avoid writing the same repetitive, boilerplate code over and over again. To this end, Template Haskell is used (among many others) in the aeson library to automatically derive a data type's ToJSON and FromJSON instances for JSON serialization; and in the lens library to mechanically create a data type's lenses.

As a framework for creating domain specific languages (EDSLs), Template Haskell allows a user to embed programs written in another programming language inside of a Haskell program. This enables writing parts of the program in the concrete, domain specific syntax of a different programming language. It has the benefit to think about -- and to express -- domain specific problems in the language best suited for the task. In particular, it lets a user focus on the domain specific problem and removes all additional language burdens induced by inconvenient syntax, unsuited control constructs, etc. Programs from the embedded language are parsed and translated into corresponding (but syntactically heavier) Haskell code at compile-time by Template Haskell. In this sense, (e.g.,) the shakespearean template languages from the shakespeare library use Template Haskell at their core. They expose succinct domain specific languages to write HTML, CSS, and Javascript code inside of a Haskell based web application.

Template Haskell by Examples

In this section, we will review the Template Haskell features to write meta programs. The first set of examples show-cases Template Haskell's potential as a code generator; the second set of examples highlights its facilities to create embedded domain specific languages (EDSLs). All examples require GHC's language extension TemplateHaskell to be enabled.

To avoid confusion in the sequel, we distinguish between meta programs and object programs. Meta programs are the Haskell programs that run at compile-time and which generate Template Haskell object programs as the results of their execution; they are the programs that devise or manipulate other programs by some algorithmic means. Object programs, on the other hand, are the Template Haskell programs manipulated and built by the Haskell meta programs at compile-time.

Template Haskell as a Code Generator

As an introductory example, consider Haskell's Prelude function curry :: ((a,b) -> c) -> a -> b -> c, which converts a function taking a pair to its curried equivalent. Unfortunately, there are no Prelude functions that provide the same currying functionality for functions taking arbitrary -tuples. Moreover, having to write more than a few of these functions manually is, while trivial, a very repetitive and cumbersome task. Instead we wish to generate needed curry3, curry5, or curry8 functions through a single meta program on demand. Template Haskell lets us do just this. The idea is to write a meta function curryN :: Int -> Q Exp which, given a number n >= 1, constructs the source code for an -ary curry function:

{-# LANGUAGE TemplateHaskell #-}
import Control.Monad
import Language.Haskell.TH

curryN :: Int -> Q Exp
curryN n = do
  f  <- newName "f"
  xs <- replicateM n (newName "x")
  let args = map VarP (f:xs)
      ntup = TupE (map VarE xs)
  return $ LamE args (AppE (VarE f) ntup)

For input , meta function curryN builds a lambda abstraction LamE that pattern matches against a function f and argument variables x1, x2, ..., xn; in its body, it then applies function f to the -tuple (x1, x2, ..., xn) derived from the pattern matched variables. The names used to refer to the variables f and x1 through xn are generated monadically by function newName :: String -> Q Name to always generate fresh names not used anywhere else. Hence, the value returned by curryN is a monadic computation of type Q Exp. When executed, this monadic computation yields an expression Exp representing the object program of an -ary curry function. For example, (curryN 3) returns a monadic computation that yields an expression representing a curry3 function of type ((a, b, c) -> d) -> a -> b -> c -> d in abstract syntax.[5]


To run a meta program like curryN at compile-time, we enclose it with Template Haskell's splice operator $ by writing (e.g.,) $(curryN 3). This evaluates the meta program curryN 3 and puts the resulting object program \f x1 x2 x3 -> f (x1, x2, x3) in place of the splice. In general, the splice operator $ can be applied to any monadic Q computation, hereby performing this computation at compile-time and inserting the resulting object program as real Haskell code. To ensure type safety, meta programs to be run are type checked beforehand to indeed yield a valid Template Haskell object program. Hence, only imported, fully-typechecked meta programs can be run via the splice operator $: in particular, we have to evaluate the meta program $(curryN 3) in a separate module to where curryN is defined.

To generate function declarations for the first curry functions, we can devise a further meta program on top of curryN as follows:

genCurries :: Int -> Q [Dec]
genCurries n = forM [1..n] mkCurryDec
  where mkCurryDec ith = do
          cury <- curryN ith
          let name = mkName $ "curry" ++ show ith
          return $ FunD name [Clause [] (NormalB cury) []]

Running $(genCurries 20) will then splice in the first 20 curry functions at compile-time, namely:

curry1  = \ f x1 -> f (x1)
curry2  = \ f x1 x2 -> f (x1, x2)
curry3  = \ f x1 x2 x3 -> f (x1, x2, x3)
curry4  = \ f x1 x2 x3 x4 -> f (x1, x2, x3, x4)
...
curry20 = \ f x1 x2 ... x20 -> f (x1, x2, ..., x20)

Note that in this case, genCurries returns a list of top-level function declarations that bind the anonymous lambda abstractions built by curryN. Also, to name the function bindings, we use function mkName :: String -> Name instead of newName :: String -> Q Name. The reason is that here we want to generate functions curry1 to curry20 with exactly the prescribed names, so they can be captured and referred to from other parts of the program.

Evaluating Haskell (meta) programs at compile-time and splicing in the generated object programs as regular Haskell code is the first central building block of Template Haskell. The two other core mechanisms are exhibited by the implementations of curryN and genCurries: algebraic data types and the quotation monad Q.

First, object programs created by Template Haskell are represented as regular algebraic data types, describing a program in the form of an abstract syntax tree. The Template Haskell library provides algebraic data types Exp, Pat, Dec, and Type to represent Haskell's surface syntax of expressions, patterns, declarations, and types, respectively. Virtually every concrete Haskell syntactic construct has a corresponding abstract syntax constructor in one of the four ADTs. Furthermore, all Haskell identifiers are represented by the abstract Name data type. By representing object programs as regular algebraic data types (and thus as data), normal Haskell can be used as the meta programming language to build object programs.

Second, TH object programs are built inside the quotation monad Q. This monad is performed by the splice operator "$" at compile-time as part of evaluating the meta program. In the examples so far, the Q monad was only needed to provide fresh identifiers with function newName :: String -> Q Name for the generated Haskell expressions. The other main feature that requires a monadic construction of object programs is reification, which allows to query compile-time information during the object program's construction. We will explain reification in detail later.

Thus, Template Haskell's core functionality constitutes evaluating object programs with "$" and building them from algebraic data types inside the quotation monad Q. However, constructing object programs in terms of their abstract syntax trees is quite verbose and leads to clumsy meta programs. Therefore the Template Haskell API also provides two further interfaces to build object programs more conveniently: syntax construction functions and quotation brackets.

Syntax construction functions directly relate to the syntax constructors from the algebraic data types Exp, Pat, Dec, and Type for representing Haskell code. However, they hide the monadic nature of building object programs. For example, recall our definition of the genCurries meta function from above:

genCurries :: Int -> Q [Dec]
genCurries n = forM [1..n] mkCurryDec
  where mkCurryDec ith = do
          cury <- curryN ith
          let name = mkName $ "curry" ++ show ith
          return $ FunD name [Clause [] (NormalB cury) []]

To use the object program generated by the sub call to curryN in the larger context of the returned function declaration, we have to first perform curryN and bind its result to cury. The reason is that we have to account for curryN's generation of fresh names before we can continue. Using syntax construction functions instead of data constructors, however, abstracts from the monadic construction of genCurries, thus making its code a little shorter:

genCurries :: Int -> Q [Dec]
genCurries n = forM [1..n] mkCurryDec
  where mkCurryDec ith = funD name [clause [] (normalB (curryN ith)) []]
          where name = mkName $ "curry" ++ show ith

The new funD, clause, and normalB functions directly correspond to the formerly used FunD, Clause, and NormalB constructors. The only difference lies in their types:

FunD :: Name -> [Clause] -> Dec funD :: Name -> [Q Clause] -> Q Dec
Clause :: [Pat] -> Body -> Clause clause :: [Q Pat] -> Q Body -> Q Clause
NormalB :: Exp -> Body normalB :: Q Exp -> Q Body

While the syntax constructors work with raw TH expressions, the syntax construction functions expect their monadic counterparts. They construct a TH object program directly in Q, thus freeing the API consumer from doing the monadic wrapping and unwrapping manually. For every syntax constructor, there is a corresponding monadic syntax construction function provided.

On top of syntax construction functions, quotation brackets are a further shortcut for representing Haskell code. They allow to specify an object program using just regular Haskell syntax by enclosing it inside oxford brackets [| .. |]. That way, object programs can be specified yet much more succinctly. For example, a meta program building a Haskell expression for the identity function is still quite verbose, if expressed with either ADTs or syntax construction functions:

genId :: Q Exp
genId = do
  x <- newName "x"
  lamE [varP x] (varE x)

Using quotation brackets, writing the same meta program can be abbreviated much further as:

genId' :: Q Exp
genId' = [| \x -> x |]

Quotation brackets quote regular Haskell code as the corresponding object program fragments inside the Q monad. There are quotation brackets for quoting Haskell expressions ([e| .. |]|), patterns ([p| .. |]), declarations ([d| .. |]), and types ([t| .. |]). Writing [| .. |] is hereby just another way of saying [e| .. |]. Using quotation brackets in a sense lifts Haskell's concrete syntax into corresponding object program expressions inside the Q monad. By doing so, quotation brackets represent the dual of the already introduced splice operator $: Evaluating a meta program with "$" splices in the generated object program as real Haskell code; in contrast, quotation brackets [| .. |] turn real Haskell code into an object program. Consequently, quotation brackets and the splice operator cancel each other out. The equation $([| e |]) = e holds for all expressions e and similar equations hold for declarations, and types[2].

In addition, there is support for quoting Haskell (value and type) identifiers as corresponding Names inside Template Haskell. This allows to refer to regular Haskell identifiers from within TH object programs. For example, writing 'genId yields a TH Name referring to the genId identifier. Similarly, ''Q gives a Name referring to the Q type identifier.

Generic Maps

As a second example that uses both syntax construction functions as well as quotation brackets, let's consider a meta program mapN :: Int -> Q Dec to build "generic" map functions at compile-time. Invoking $(mapN 1) should generate the well-known standard function map :: (a -> b) -> [a] -> [b]; evaluating $(mapN 2) should splice in a binary map function of type (a -> b -> c) -> [a] -> [b] -> [c], and so on.[6]

mapN :: Int -> Q Dec
mapN n
  | n >= 1    = funD name [cl1, cl2]
  | otherwise = fail "mapN: argument n may not be <= 0."
  where
    name = mkName $ "map" ++ show n
    cl1  = do f  <- newName "f"
              xs <- replicateM n (newName "x")
              ys <- replicateM n (newName "ys")
              let argPatts  = varP f : consPatts
                  consPatts = [ [p| $(varP x) : $(varP ys) |]
                              | (x,ys) <- xs `zip` ys ]
                  apply     = foldl (\ g x -> [| $g $(varE x) |])
                  first     = apply (varE f) xs
                  rest      = apply (varE name) (f:ys)
              clause argPatts (normalB [| $first : $rest |]) []
    cl2  = clause (replicate (n+1) wildP) (normalB (conE '[])) []

The implementation of mapN is very much in the spirit of meta function curryN from the first example. For instance, evaluating splice $(mapN 3) splices in the following map function at compile-time:

map3 f (x:xs) (y:ys) (z:zs) = f x y z : map3 f xs ys zs
map3 _ _      _      _      = []

Nonetheless, meta function mapN exhibits a couple of new Template Haskell features: First, quotation brackets and splices are used in several places to abbreviate the object program construction. For example, helper definition apply used to generate map3's body f x y z : map3 f xs ys zs shows the use of quotation brackets; it also highlights how splicing ($) and quotes ([| .. |]) cancel each other out. Second, identifier quotes (namely, '[]) are used to create an object program Name that refers to Haskell's built-in list constructor []. Third, the example advertises how all three APIs for building Template Haskell object programs can be interleaved. The lowermost verbose API of building a raw TH data value inside the quotation monad Q can be abbreviated, where possible, with syntax constructor functions and quotation brackets.

Lastly, the mapN example exemplifies how Haskell's static scoping is extended to object programs. The scoping principle for object programs is just as in normal Haskell: Identifiers are bound to their lexically enclosing binders in scope at the point the object program is defined. Quotation brackets and splices don't alter static scopes, even though splices may bring an object program into scope at a location, where a conflicting closure is present. For example, consider this snippet:

x :: Int
x = 42

static :: Q Exp
static = [| x |]

plus42 :: Int -> Int
plus42 x = $static + x

Here the occurrence of x in static refers to the global identifier x that is lexically in scope during its definition. Splicing in static into a different scope later where a different local x is present (i.e., plus42's local identifier x), doesn't alter the link between static's x and the global identifier x.

The only exception to static scoping in Template Haskell are the names generated by mkName :: String -> Name. These names implement dynamic scoping and can be captured in spliced-in code. Changing the previous snippet to

x :: Int
x = 42

dynamic :: Q Exp
dynamic = VarE (mkName "x")

times2 :: Int -> Int
times2 x = $dynamic + x

results in the identifier x spliced in by $dynamic to be bound to the closest x in scope. Hence, its binder is times2's local identifier x and not the global x.

Reification

The final major Template Haskell feature not yet described is program reification. Briefly, reification allows a meta program to query compile-time information about other program parts while constructing the object program. It allows the meta program to inspect other program pieces to answer questions such as: "what's this variable's type?", "what are the class instances of this type class?", or "which constructors does this data type have and and how do they look like?". The main use case is to generate boilerplate code which auto-completes manually written code. A prime example is to generically derive type class instances from bare data type definitions.

Suppose we've defined the following polymorphic data types for representing potentially erroneous values, lists, and binary trees, respectively:

data Result e a = Err e | Ok a
data List     a = Nil | Cons a (List a)
data Tree     a = Leaf a | Node (Tree a) a (Tree a)

Moreover, suppose we want to derive Functor instances for all of these types. Deriving these instances manually is straightforward, but writing them all out by hand is quite cumbersome. Especially since writing a Functor instance follows the same pattern across all of the above types and in fact any type T a.

To make a type constructor T an instance of Functor, one needs to implement method fmap :: (a -> b) -> T a -> T b. Its definition is hereby precisely determined by parametricity and the functor laws: By parametricity, all values of type a must be replaced according to the provided function with values of type b. Furthermore, by the functor laws, all other shapes of the input value of type T a must be preserved when transforming it to the output value of type T b.

Meta function deriveFunctor :: Name -> Q [Dec] below implements the idea of this algorithm:

data Deriving = Deriving { tyCon :: Name, tyVar :: Name }

deriveFunctor :: Name -> Q [Dec]
deriveFunctor ty
  = do (TyConI tyCon) <- reify ty
       (tyConName, tyVars, cs) <- case tyCon of
         DataD _ nm tyVars cs _   -> return (nm, tyVars, cs)
         NewtypeD _ nm tyVars c _ -> return (nm, tyVars, [c])
         _ -> fail "deriveFunctor: tyCon may not be a type synonym."

       let (KindedTV tyVar StarT) = last tyVars
           instanceType           = conT ''Functor `appT`
             (foldl apply (conT tyConName) (init tyVars))

       putQ $ Deriving tyConName tyVar
       sequence [instanceD (return []) instanceType [genFmap cs]]
  where
    apply t (PlainTV name)    = appT t (varT name)
    apply t (KindedTV name _) = appT t (varT name)

Given the name of a type constructor (e.g. Result, List, etc.), deriveFunctor derives the code for this type constructor's Functor instance. For example, running the splice $(deriveFunctor ''Tree) generates the following code:

instance Functor Tree where
  fmap f (Leaf x)     = Leaf (f x)
  fmap f (Node l x r) = Node (fmap f l) (f x) (fmap f r)

Meta function deriveFunctor shows reification in action. It calls function reify :: Name -> Q Info on the input type constructor's name to yield information about this data type's definition. Using reify, it thus learns whether the data type was defined using the data or newtype keyword, which constructors it defines and what their shapes are. Based on the learned structure, deriveFunctor is then able to generate a suitable definition of fmap and its different clauses via the auxiliaries genFmap, genFmapClause, and newField, defined below. These auxiliary definitions generate one fmap clause for each of the data type's constructors. And each clause then transforms its constructor by recursively modifying all of the constructor's fields of type a through fmap's function f, while retaining all other shapes.

genFmap :: [Con] -> Q Dec
genFmap cs
  = do funD 'fmap (map genFmapClause cs)

genFmapClause :: Con -> Q Clause
genFmapClause c@(NormalC name fieldTypes)
  = do f          <- newName "f"
       fieldNames <- replicateM (length fieldTypes) (newName "x")

       let pats = varP f:[conP name (map varP fieldNames)]
           body = normalB $ appsE $
             conE name : map (newField f) (zip fieldNames fieldTypes)

       clause pats body []

newField :: Name -> (Name, StrictType) -> Q Exp
newField f (x, (_, fieldType))
  = do Just (Deriving typeCon typeVar) <- getQ
       case fieldType of
         VarT typeVar' | typeVar' == typeVar ->
           [| $(varE f) $(varE x) |]
         ty `AppT` VarT typeVar' |
           leftmost ty == (ConT typeCon) && typeVar' == typeVar ->
             [| fmap $(varE f) $(varE x) |]
         _ -> [| $(varE x) |]

leftmost :: Type -> Type
leftmost (AppT ty1 _) = leftmost ty1
leftmost ty           = ty

In more detail, deriveFunctor works as follows. First, via reify it observes the input data type's name tyConName, its declared type variables tyVars, and its exposed constructors cs. It then determines the data type's right-most type variable tyVar and stores it together with the data type's type constructor name tyConName in the Q monad's user state. This state information is retrieved later again from inside auxiliary definition newField. Next, deriveFunctor derives a Functor's fmap definition using auxiliary genFmap. For each of the input data type's value constructors cs, genFmap generates an fmap clause using helper function genFmapClause. The latter recursively maps the provided function f :: a -> b over all of a constructor's fields of type a, while leaving all other fields untouched. Each field is hereby modified through f or left unchanged by auxiliary newField based on the field's type: if a field's type is a (which is stored in the retrieved tyVar inside function newField), then f needs to be applied to it; otherwise it needs to remain unchanged.


In an analogous manner to deriveFunctor, a function deriveFoldable :: Name -> Q [Dec] can be devised to derive a data type's Foldable instance. All that is needed is to provide a definition for function foldMap :: Monoid m => (a -> m) -> T a -> m. Again, foldMap's definition follows directly from a data type's bare definition, which can be observed by means of reification. This highlights particularly how the functionality offered by Template Haskell provides a low-level API into the GHC compiler to manipulate abstract syntax trees at compile-time. This mechanism is quite powerful and even allows to simulate some of GHC's offered language extensions, e.g., -XDeriveFunctor and -XDeriveFoldable, to be implemented as a library on top of Template Haskell.

Template Haskell for building Embedded Domain specific Languages (EDSLs)

To see Template Haskell's potential for building an EDSL, consider the problem of pattern matching text with regular expressions. Suppose, as part of a Haskell program we need to devise many different regular expressions and use them to pattern match text fragments. Regular expressions are easily defined by an algebraic data type capturing their structure, as well as an evaluator checking whether a regular expression matches some input string. [7]

data RegExp
  = Char (Set Char)    -- [a], [abc], [a-z]; matches a single character from the specified class
  | Alt RegExp RegExp  -- r1 | r2 (alternation); matches either r1 or r2
  | Seq RegExp RegExp  -- r1 r2 (concatenation); matches r1 followed by r2
  | Star RegExp        -- r* (Kleene star); matches r zero or more times
  | Empty              -- matches only the empty string
  | Void               -- matches nothing (always fails)
  | Var String         -- a variable holding another regexp (explained later)
  deriving Show

match :: RegExp -> String -> Bool
match r s = nullable (foldl deriv r s)

The evaluator match is hereby based on the concept of derivatives[8]: an initial regular expression r matches an input string s, if r matches the first character of s and its derivative regular expression (deriv r) matches the remainder of s:

nullable :: RegExp -> Bool
nullable (Char _)    = False
nullable (Alt r1 r2) = nullable r1 || nullable r2
nullable (Seq r1 r2) = nullable r1 && nullable r2
nullable (Star _)    = True
nullable Empty       = True
nullable Void        = False
nullable (Var _)     = False

deriv :: RegExp -> Char -> RegExp
deriv (Char cs) c
  | c `Set.member` cs = Empty
  | otherwise         = Void
deriv (Alt r1 r2) c   = Alt (deriv r1 c) (deriv r2 c)
deriv (Seq r1 r2) c
  | nullable r1       = Alt (Seq (deriv r1 c) r2) (deriv r2 c)
  | otherwise         = Seq (deriv r1 c) r2
deriv (Star r) c      = deriv (Alt Empty (Seq r (Star r))) c
deriv Empty _         = Void
deriv Void _          = Void
deriv (Var _) _       = Void

The RegExp data type and the match function solve the initially posed problem of providing regular expressions in Haskell. However, specifying regular expressions in abstract syntax is extremely tedious. For example, consider defining a regular expression for checking the wellformedness of email addresses ending with the top level domain .com. In its usual concrete syntax, such a regular expression is easily defined as ([a-z]|[0-9])*@([a-z]|[0-9])*.com, but writing it in terms of the RegExp dataype is verbose and unintuitive. Moreover, parsing functions like

  • compile :: String -> RegExp, or
  • compile' :: String -> Either CompileError RegExp

do not remedy the problem of working with regular expressions in concrete syntax. Due to "compiling" regular expressions at run time, they don't provide any compile-time type-safety guarantees that the input raw expression is wellformed; thus they lead to either run time exceptions for illformed regular expressions (e.g., compile) or induce a tedious handling for compiled regexes (e.g., compile').

To preserve type safety and yet to be able to use regular expressions conveniently, we want to embed the concrete regular expression syntax into the Haskell host language. This can be done via Template Haskell's quasi quotes and furthermore enabling the QuasiQuotes extension. This allows defining quasi quotes for regular expressions, denoted [regex| .. |], where anything inside the quasi quote is considered part of an embedded regular expression language. Using quasi quotes, we can then specify the regex for email addresses from above naturally as follows:

validDotComMail :: RegExp
validDotComMail = [regex|([a-z]|[0-9])*@([a-z]|[0-9])*.com|]

We can even compose regular expressions easily from smaller building blocks:

alphaNum, validDotComMail' :: RegExp
alphaNum         = [regex|[a-z]|[0-9]|]
validDotComMail' = [regex|${alphaNum}*@${alphaNum}*.com|]

Writing ${alphaNum} interpolates the regex referred to by alphaNum into the larger regex validDotComMail'. In essence, this means that we can define our own notion of splicing values from the Haskell meta language into the embedded object language of regular expressions. We can go further and even allow to run Haskell code when interpolating with ${..}. For example, refining our wellformedness check for .com mail addresses, we might want to ensure at least one character to occur on either side of the "@" symbol:

chars, validDotComMail'' :: RegExp
chars             = [regex|[a-z]|[A-Z]|[0-9]|[-_.]|]
validDotComMail'' = [regex|${plus chars}@${plus chars}.com|]

plus :: RegExp -> RegExp
plus r = Seq r (Star r)

Here, plus corresponds to the usual regex combinator that requires a given regex to occur at least once. Note how plus is defined as a regular Haskell function and then used inside of the embedded regex language to build the regular expression for validDotComMail''.

Intuitively, a quasi quote like [regex| .. |] converts an embedded language's concrete syntax to Haskell code at compile-time. It is defined by a quasi quoter, which is a parser for the embedded language. Its task is to parse the embedded language's syntax into a corresponding Template Haskell expression and then to splice this expression as real Haskell code in place of the quasi quote. The conversion of embedded language code to corresponding Haskell code hereby happens before typechecking the Haskell module. Hence, trying to splice in malformed embedded language fragments will raise a Haskell type error at compile-time.

The quasi quoter regex for our embedded language of regular expressions can be defined as follows:

regex :: QuasiQuoter
regex = QuasiQuoter {
    quoteExp  = compile
  , quotePat  = notHandled "patterns"
  , quoteType = notHandled "types"
  , quoteDec  = notHandled "declarations"
  }
  where notHandled things = error $
          things ++ " are not handled by the regex quasiquoter."

compile :: String -> Q Exp
compile s =
  case P.parse regexParser "" s of
    Left  err    -> fail (show err)
    Right regexp -> [e| regexp |]

That is, formally a QuasiQuoter consists of four parsers,

quoteExp  :: String -> Q Exp
quotePat  :: String -> Q Pat
quoteType :: String -> Q Type
quoteDec  :: String -> Q Dec

to parse raw strings of the embedded language into the different categories of Haskell syntax. In this example, however, we only want to splice embedded regular expressions into the context of Haskell expressions, so we only define the quoteExp parser in the regex quasi quoter. This parser compiles an embedded regular expression given as a string into a corresponding Template Haskell expression.

Compilation by the compile function proceeds in two stages: First, we parse the input string regex into a corresponding RegExp value. Second, we encode this RegExp value as a Haskell expression in Template Haskell's Q Exp type. It is the second step that allows us to interpolate variables (or even code) from the Haskell host language into the EDSL for regular expressions.

Parsing a raw regular expression into a corresponding RegExp value is a routine task using (e.g.) the parsec library:

regexParser :: Parsec String () RegExp
regexParser = alts <* eof where
  atom       = try var <|> char
  var        = Var <$> (string "${" *> many1 (noneOf "}") <* P.char '}')
  char       = charclass <|> singlechar
  singlechar = (Char . Set.singleton) <$> noneOf specials
  charclass  = fmap (Char . Set.fromList) $
                 P.char '[' *> content <* P.char ']'
  content    = try (concat <$> many1 range)
                 <|> many1 (noneOf specials)
  range      = enumFromTo
                 <$> (noneOf specials <* P.char '-')
                 <*> noneOf specials
  alts       = try (Alt <$> seqs <*> (P.char '|' *> alts)) <|> seqs
  seqs       = try (Seq <$> star <*> seqs) <|> star
  star       = try (Star <$> (atom <* P.char '*'))
                 <|> try (Star <$> (P.char '(' *> alts <* string ")*"))
                 <|> atom
  specials   = "[]()*|"

To represent regular expressions of type RegExp as Template Haskell expressions of type Q Exp, Template Haskell's Lift typeclass is used. Its method lift :: Lift a => a -> Q Exp lifts values from the Haskell meta language (e.g., a RegExp value) into Template Haskell's expression language (i.e., a Q Exp value). The lift function is implicitly invoked by quote [e| regexp |] in function compile.


Most of the lifting is a direct encoding of the syntactic structure of the RegExp value; the only interesting case is when lifting the regular expression variable Var vars. In this case, we treat the words in the string vars as referring to identifiers from the Haskell host language, which we apply in a left associative manner to each other. Doing this enables interpolation of Haskell identifiers or even simple forms of Haskell expressions into our EDSL of regular expressions as shown by the regexes validDotComMail', and validDotComMail'' above.

instance Lift a => Lift (Set a) where
  lift set = appE (varE `Set.fromList) (lift (Set.toList set))

instance Lift RegExp where
  -- lift :: RegExp -> Q Exp
  lift (Char cs)     = apply `Char  [lift cs]
  lift (Alt r1 r2)   = apply `Alt   (map lift [r1, r2])
  lift (Seq r1 r2)   = apply `Seq   (map lift [r1, r2])
  lift (Star r1)     = apply `Star  (map lift [r1])
  lift Empty         = apply `Empty []
  lift Void          = apply `Void  []
  lift (Var vars)    = foldl1 appE $ map (varE . mkName) (words vars)

apply :: Name -> [Q Exp] -> Q Exp
apply n = foldl appE (conE n)

These two steps constitute the conversion of raw string regular expressions into Template Haskell expressions inside of the compile function and define the regex quasiquoter. Whenever we write a quasi quote like [regex| .. |] in a Haskell expression context, regex's parser quoteExp converts the regex EDSL into a Template Haskell expression Q Exp and splices in the result as a wellformed RegExp value. This example shows how Template Haskell and quasi quotes can be used to define a type-safe, domain specific language for regular expressions.

Shakespearean Templates

In much the same manner as in the last example, Template Haskell and quasi quotes are used in Michael Snoyman's shakespeare library[9][10]. It defines embedded templating languages for working with the internet's web languages from within a Haskell web application. In particular, the shakespeare library provides the template languages Hamlet, Cassius, and Julius for writing embedded HTML, CSS, and Javascript code, respectively. All three templating languages internally work quite similarly to the previous example's EDSL for regular expressions: quasi quotes allow one to write HTML, CSS, or JavaScript code in concrete (though slightly modified) syntax inside of Haskell. Moreover, identifiers from the Haskell host language as well as code fragments can be interpolated into the template languages at compile-time. In the remainder we will briefly show-case the shakespeare library's templating language Hamlet for creating HTML documents; the other templating languages Cassius and Julius are similar.

To create and output a simple web page from inside a Haskell application, the following is enough:

import Data.Text
import Text.Hamlet
import Text.Blaze.Html.Renderer.String

data Page = Home | About | Github

mkUrls :: Page -> [(Text, Text)] -> Text
mkUrls Home   _ = "/home.html"
mkUrls About  _ = "/about.html"
mkUrls Github _ = "https://www.github.com/bollmann"

webPage :: Text -> Text -> HtmlUrl Page
webPage title content = [hamlet|
  <html>
    <head>
      <title>#{Text.toUpper title}
    <body>
      <h1>#{title}
      <div>Welcome to my Shakespearean Templates page!
      <hr>
      <div>Links:
        <ul>
          <a href=@{Home}>My Homepage
          <a href=@{About}>About me
          <a href=@{Github}>Check out my Github
      <hr>
      <div>#{content}
 |]

main = putStrLn $ renderHtml $
  webPage "Hello Shakespeare!" "Hello World!" mkUrls

Running this Haskell program, outputs an HTML page as specified by the Hamlet templating language, embedded through quasi quote [hamlet| .. |] in function webPage. Hamlet closely resembles real HTML syntax, but is even more terse: instead of a closing HTML tag, Hamlet uses indentation to indicate the span of the tag. Furthermore, Hamlet allows to interpolate code or identifiers from the Haskell host language when creating an HTML template. Interpolation of Haskell code into Hamlet is done by writing #{ .. }. In the above example, the HTML page's title and content are interpolated from Haskell identifiers. Note particularly how in the webpage's title we uppercase the interpolated title using Haskell's Text.toUpper function inside of the Hamlet language.

In addition to this standard interpolation, Hamlet can also interpolate links by writing @{..}. These links are specified as values of the Page data type inside the template and the mkUrls render function translates them to real URLs later. Hamlet's URL interpolation has commonly be phrased as creating "type-safe URLs". One reason is that, just like with normal variable interpolation, all interpolated links have to exist and be type correct at compile-time; in this case, links must be values of the Page data type. Hence, as soon as a link's constructor shape is changed, the compiler statically forces us to update all references to this link as well. Furthermore, there is only one distinct place in the code to maintain or update a link's raw URL, thus minimizing the risk of dead URLs.

For example, suppose we want to add more external links to our web page. We could model this fact by changing the Page data type to

data Page = Home | About | External ExternalPage
data ExternalPage = Github | Haskell | Reddit

and, moreover, changing the mkUrls renderer function to account for the new links:

mkUrls :: Page -> [(Text, Text)] -> Text
mkUrls Home            _ = "/home.html"
mkUrls About           _ = "/about.html"
mkUrls (External page) _ = mkExternalUrls page

mkExternalUrls :: ExternalPage -> Text
mkExternalUrls Github  = "https://www.github.com"
mkExternalUrls Haskell = "http://www.haskell.org"
mkExternalUrls Reddit  = "http://www.reddit.com/r/haskell"

Doing just these changes, will then cause a compile-time error in our webPage template, since we haven't updated the Github reference to our newly adjusted link structure. Hence, the compiler reminds (and in fact forces) us to update all locations in the code that used the old Github link to now use the new External Github (as well as optionally the External Haskell, etc.) links.

Finally, Hamlet allows to use some control constructs like if conditionals, for loops, and let bindings to embed basic business logic into a webpage's template. Michael Snoyman gives a gentle (and much more in-depth) introduction to shakespearean templates and Yesod[9][11].

References

  1. Tim Sheard and Simon Peyton Jones. Template Meta-Programming for Haskell. SIGPLAN Not., 37(12):60-75, December 2002. URL: https://www.microsoft.com/en-us/research/publication/template-meta-programming-for-haskell/?from=http://research.microsoft.com/~simonpj/papers/meta-haskell/
  2. 2.0 2.1 Tim Sheard and Simon Peyton Jones. Notes on Template Haskell, Version 2. URL: https://www.haskell.org/ghc/docs/papers/th2.ps, 2003.
  3. Simon Peyton Jones. Major Proposed Revision of Template Haskell. URL: https://ghc.haskell.org/trac/ghc/blog/Template%20Haskell%20Proposal, 2010
  4. Geoffrey Mainland. Why it's nice to be quoted: Quasiquoting for Haskell. In Proceedings of the ACM SIGPLAN Workshop on Haskell, Haskell '07, pages 73-82, New York, NY, USA, 2007. ACM
  5. Note that meta function curryN cannot be written in normal Haskell per se as the type for a generated -ary curry function depends on . Thus, the definition of curryN requires dependent types to be expressed in Haskell, a feature not yet present. However, there already exist ingenious alternatives to "faking" dependent types in Haskell; see for instance this paper for a solution to simulate functions like curryN without dependent types).
  6. Note that -ary maps are better written using Applicative Functors and ZipLists, as this allows to define them first-class from within regular Haskell. For understanding Template Haskell as a code generator, this example is still useful though.
  7. This example draws on Penn's CIS 552 Advanced Programming course, specifically Assignment 5: http://www.seas.upenn.edu/~cis552/current/hw/hw05/Main.html.
  8. Janusz A. Brzozowski. Derivatives of regular expressions. J. ACM, 11(4):481–494, October 1964.
  9. 9.0 9.1 Michael Snoyman. Shakespearean Templates. URL: http://www.yesodweb.com/book/shakespearean-templates [Accessed: May 2016].
  10. Michael Snoyman. The shakespeare Haskell library. URL: http://hackage.haskell.org/package/shakespeare [Accessed: May 2016].
  11. Michael Snoyman. Haskell and Yesod. URL: http://www.yesodweb.com/book-1.4 [Accessed: May 2016].

Cite error: <ref> tag with name "dep-tys" defined in <references> is not used in prior text.