# Generalised algebraic datatype

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

The GHC Users Guide has a GADTs section.

## Motivating example

Generalised Algebraic Datatypes (GADTs) are datatypes for which a constructor has a non standard type. Indeed, in type systems incorporating GADTs, there are very few restrictions on the type that the data constructors can take. To show you how this could be useful, we will implement an evaluator for the typed SK calculus. Note that the K combinator is operationally similar to $\lambda\;x\;y\;.\;x$ and, similarly, S is similar to the combinator $\lambda\;x\;y\;z\;.\;x\;z\;(\;y\;z\;)$ which, in simply typed lambda calculus, have types $a \rightarrow b \rightarrow a$ and $(a \rightarrow b \rightarrow c) \rightarrow (a \rightarrow b) \rightarrow a \rightarrow c$ Without GADTs we would have to write something like this:

data Term = K | S | Term :@ Term
infixl 6 :@


With GADTs, however, we can have the terms carry around more type information and create more interesting terms, like so:

data Term x where
K :: Term (a -> b -> a)
S :: Term ((a -> b -> c)  -> (a -> b) -> a -> c)
(:@) :: Term (a -> b) -> (Term a) -> Term b
infixl 6 :@


now we can write a small step evaluator:

eval::Term a -> Term a
eval (K :@ x :@ y) = x
eval (S :@ x :@ y :@ z) = x :@ z :@ (y :@ z)
eval x = x


Since the types of the so-called object language, being the typed SK calculus, are mimicked by the type system in our meta language, being Haskell, we have a pretty convincing argument that the evaluator won't mangle our types. We say that typing is preserved under evaluation (preservation.) Note that this is an argument and not a proof.

This, however, comes at a price: let's see what happens when you try to convert strings into our object language:

parse "K" = K
parse "S" = S


you'll get a nasty error like so:

   Occurs check: cannot construct the infinite type: c = b -> c
Expected type: Term ((a -> b -> c) -> (a -> b) -> a -> b -> c)
Inferred type: Term ((a -> b -> c) -> (a -> b) -> a -> c)
In the definition of foo': foo "S" = S


One could, however, reason that parse has type: String -> exists a. Term a, see also Existential type.

## Example with lists

here's another, smaller example:

data Empty
data NonEmpty
data List x y where
Nil :: List a Empty
Cons:: a -> List a b ->  List a NonEmpty

safeHead:: List x NonEmpty -> x
safeHead (Cons a b) = a


now safeHead can only be applied to non empty lists, and will never evaluate to bottom. This too comes at a cost; consider the function:

silly 0 = Nil
silly 1 = Cons 1 Nil


yields an objection from ghc:

Couldn't match Empty' against NonEmpty'
Expected type: List a Empty
Inferred type: List a NonEmpty
In the application Cons 1 Nil'
In the definition of silly': silly 1 = Cons 1 Nil


## Parsing Example

Note that GADTs provide a rather nice platform for embedded domain specific languages. In particular, they allow an EDSL to use Haskell's type system for its own purposes. As a simple example, we might have an EDSL for simple (regexp-like) parsers that looks something like:

data Parser tok a where
Zero :: Parser tok ()
One :: Parser tok ()
Check :: (tok -> Bool) -> Parser tok tok
Satisfy :: ([tok] -> Bool) -> Parser tok [tok]
Push :: tok -> Parser tok a -> Parser tok a
Plus :: Parser tok a -> Parser tok b -> Parser tok (Either a b)
Times :: Parser tok a -> Parser tok b -> Parser tok (a,b)
Star :: Parser tok a -> Parser tok [a]


An evaluator/parser is then straightforward. Below it's written monadically for convenience, but this also means that we could generalise the return type to being in any MonadPlus. Note that an advantage of this representation which we don't show here is that we could also write a function which applies algebraic rules to the structure to try to simplify the parser before running it. (Though if we were really concerned with efficiency, we'd probably also need a couple more primitives.)

parse :: Parser tok a -> [tok] -> Maybe a

-- Zero always fails.
parse Zero ts = mzero

-- One matches only the empty string.
parse One [] = return ()
parse One _  = mzero

-- Check p matches a string with exactly one token t such that p t holds.
parse (Check p) [t] = if p t then return t else mzero
parse (Check p) _ = mzero

-- Satisfy p any string such that p ts holds.
parse (Satisfy p) xs = if p xs then return xs else mzero

-- Push t x matches a string ts when x matches (t:ts).
parse (Push t x) ts = parse x (t:ts)

-- Plus x y matches when either x or y does.
parse (Plus x y) ts = liftM Left (parse x ts) mplus liftM Right (parse y ts)

-- Times x y matches the concatenation of x and y.
parse (Times x y) [] = liftM2 (,) (parse x []) (parse y [])
parse (Times x y) (t:ts) =
parse (Times (Push t x) y) ts mplus
liftM2 (,) (parse x []) (parse y (t:ts))

-- Star x matches zero or more copies of x.
parse (Star x) [] = return []
parse (Star x) (t:ts) = do
(v,vs) <- parse (Times x (Star x)) (t:ts)
return (v:vs)


Finally, we might define some examples:

token x = Check (== x)
string xs = Satisfy (== xs)

p = Times (token 'a') (token 'b')
p1 = Times (Star (token 'a')) (Star (token 'b'))
p2 = Star p1

blocks :: (Eq tok) => Parser tok [[tok]]
blocks = Star (Satisfy allEqual)
where allEqual xs = and (zipWith (==) xs (drop 1 xs))

evenOdd = Plus (Star (Times (Check even) (Check odd)))
(Star (Times (Check odd) (Check even)))


Testing this in ghci:

*Main> parse p "ab"
Just ('a','b')
*Main> parse p "ac"
Nothing
*Main> parse p1 "aaabbbb"
Just ("aaa","bbbb")
*Main> parse p2 "aaabbbbaabbbbbbbaaabbabab"
Just [("aaa","bbbb"),("aa","bbbbbbb"),("aaa","bb"),("a","b"),("a","b")]
*Main> :t p2
p2 :: Parser Char [([Char], [Char])]
*Main> parse blocks "aaaabbbbbbbbcccccddd"
Just ["aaaa","bbbbbbbb","ccccc","ddd"]
*Main> parse evenOdd [0..9]
Just (Left [(0,1),(2,3),(4,5),(6,7),(8,9)])
*Main> parse evenOdd [1..10]
Just (Right [(1,2),(3,4),(5,6),(7,8),(9,10)])

`