Difference between revisions of "Generalised algebraic datatype"

From HaskellWiki
Jump to navigation Jump to search
(Updated the link to "First-Class Phantom Types by James Cheney and Ralf Hinze")
(35 intermediate revisions by 14 users not shown)
Line 1: Line 1:
  +
== Papers ==
* A short descriptions on generalised algebraic datatypes here [http://haskell.org/ghc/docs/latest/html/users_guide/gadt.html as GHC language features]
 
  +
* Another description with links on [http://hackage.haskell.org/trac/haskell-prime/wiki/GADTs Haskell' wiki]
 
  +
See also [[Research papers/Type systems#Generalised Algebraic Data Types (GADTs)|research papers on type systems]].
  +
  +
* A short description on generalised algebraic datatypes here [http://www.haskell.org/ghc/docs/latest/html/users_guide/data-type-extensions.html#gadt as GHC language features].
  +
* Another description with links on [http://hackage.haskell.org/trac/haskell-prime/wiki/GADTs Haskell' wiki].
  +
* [http://ecommons.library.cornell.edu/handle/1813/5614 First-Class Phantom Types] by James Cheney and Ralf Hinze
  +
* [http://cristal.inria.fr/~fpottier/publis/pottier-regis-gianas-05.pdf Stratified type inference for generalized algebraic data types] by François Pottier and Yann Régis-Gianas. It contains also a lot of links to other papers on GADTs.
  +
* [http://research.microsoft.com/en-us/um/people/simonpj/papers/gadt/index.htm Simple unification-based type inference for GADTs] by Simon Peyton Jones, Dimitrios Vytiniotis, Stephanie Weirich, and Geoffrey Washburn. (Revised April 2006.)
  +
* [http://www.cs.mu.oz.au/~sulzmann/manuscript/simple-translate-gadts.ps Translating Generalized Algebraic Data Types to System F] written by [http://www.cs.mu.oz.au/~sulzmann/ Martin Sulzmann] and Meng Wang. [http://www.cs.mu.oz.au/~sulzmann/2005.html Many other papers]. The talk mentions also the notion of [[phantom type]], and [[existential type]], and [[type witness]].
   
 
== Motivating example ==
 
== Motivating example ==
Line 14: Line 22:
 
Without GADTs we would have to write something like this:
 
Without GADTs we would have to write something like this:
 
<haskell>
 
<haskell>
data Term = K | S | :@ Term Term
+
data Term = K | S | Term :@ Term
 
infixl 6 :@
 
infixl 6 :@
 
</haskell>
 
</haskell>
Line 33: Line 41:
 
eval x = x
 
eval x = x
 
</haskell>
 
</haskell>
Since the types of the so-called object language, being the typed SK calculus, are mimicked by the type system in our meta language, being haskell, we have a pretty convincing argument that the evaluator won't mangle our types. We say that typing is preserved under evaluation (preservation.) Note that this is an argument and not a proof.
+
Since the types of the so-called object language, being the typed SK calculus, are mimicked by the type system in our meta language, being Haskell, we have a pretty convincing argument that the evaluator won't mangle our types. We say that typing is preserved under evaluation (preservation.) Note that this is an argument and not a proof.
   
 
This, however, comes at a price: let's see what happens when you try to convert strings into our object language:
 
This, however, comes at a price: let's see what happens when you try to convert strings into our object language:
Line 47: Line 55:
 
In the definition of `foo': foo "S" = S
 
In the definition of `foo': foo "S" = S
   
One could, however, reason that parse has type: String -> exists a. Term a
+
One could, however, reason that parse has type: <hask>String -> exists a. Term a</hask>, see also [[Existential type]].
   
 
== Example with lists ==
 
== Example with lists ==
Line 63: Line 71:
 
</haskell>
 
</haskell>
   
now myhead can only be applied to non empty lists. This too comes at a cost; consider the function:
+
now safeHead can only be applied to non empty lists, and will never evaluate to bottom. This too comes at a cost; consider the function:
   
 
<haskell>
 
<haskell>
Line 77: Line 85:
 
In the definition of `silly': silly 1 = Cons 1 Nil
 
In the definition of `silly': silly 1 = Cons 1 Nil
   
  +
== Parsing Example ==
  +
  +
Note that GADTs provide a rather nice platform for embedded domain specific languages. In particular, they allow an [[EDSL]] to use Haskell's type system for its own purposes. As a simple example, we might have an EDSL for simple (regexp-like) parsers that looks something like:
  +
  +
<haskell>
  +
data Parser tok a where
  +
Zero :: Parser tok ()
  +
One :: Parser tok ()
  +
Check :: (tok -> Bool) -> Parser tok tok
  +
Satisfy :: ([tok] -> Bool) -> Parser tok [tok]
  +
Push :: tok -> Parser tok a -> Parser tok a
  +
Plus :: Parser tok a -> Parser tok b -> Parser tok (Either a b)
  +
Times :: Parser tok a -> Parser tok b -> Parser tok (a,b)
  +
Star :: Parser tok a -> Parser tok [a]
  +
</haskell>
  +
  +
An evaluator/parser is then straightforward. Below it's written monadically for
  +
convenience, but this also means that we could generalise the return type to being in any MonadPlus. Note that an advantage of this representation which we don't show here is that we could also write a function which applies algebraic rules to the structure to try to simplify the parser before running it. (Though if we were really concerned with efficiency, we'd probably also need a couple more primitives.)
  +
  +
<haskell>
  +
parse :: Parser tok a -> [tok] -> Maybe a
  +
  +
-- Zero always fails.
  +
parse Zero ts = mzero
  +
  +
-- One matches only the empty string.
  +
parse One [] = return ()
  +
parse One _ = mzero
  +
  +
-- Check p matches a string with exactly one token t such that p t holds.
  +
parse (Check p) [t] = if p t then return t else mzero
  +
parse (Check p) _ = mzero
  +
  +
-- Satisfy p any string such that p ts holds.
  +
parse (Satisfy p) xs = if p xs then return xs else mzero
  +
  +
-- Push t x matches a string ts when x matches (t:ts).
  +
parse (Push t x) ts = parse x (t:ts)
  +
  +
-- Plus x y matches when either x or y does.
  +
parse (Plus x y) ts = liftM Left (parse x ts) `mplus` liftM Right (parse y ts)
  +
  +
-- Times x y matches the concatenation of x and y.
  +
parse (Times x y) [] = liftM2 (,) (parse x []) (parse y [])
  +
parse (Times x y) (t:ts) =
  +
parse (Times (Push t x) y) ts `mplus`
  +
liftM2 (,) (parse x []) (parse y (t:ts))
  +
  +
-- Star x matches zero or more copies of x.
  +
parse (Star x) [] = return []
  +
parse (Star x) (t:ts) = do
  +
(v,vs) <- parse (Times x (Star x)) (t:ts)
  +
return (v:vs)
  +
</haskell>
  +
  +
Finally, we might define some examples:
  +
  +
<haskell>
  +
token x = Check (== x)
  +
string xs = Satisfy (== xs)
  +
  +
p = Times (token 'a') (token 'b')
  +
p1 = Times (Star (token 'a')) (Star (token 'b'))
  +
p2 = Star p1
  +
  +
blocks :: (Eq tok) => Parser tok [[tok]]
  +
blocks = Star (Satisfy allEqual)
  +
where allEqual xs = and (zipWith (==) xs (drop 1 xs))
  +
  +
evenOdd = Plus (Star (Times (Check even) (Check odd)))
  +
(Star (Times (Check odd) (Check even)))
  +
  +
</haskell>
  +
  +
Testing this in ghci:
  +
<pre>
  +
*Main> parse p "ab"
  +
Just ('a','b')
  +
*Main> parse p "ac"
  +
Nothing
  +
*Main> parse p1 "aaabbbb"
  +
Just ("aaa","bbbb")
  +
*Main> parse p2 "aaabbbbaabbbbbbbaaabbabab"
  +
Just [("aaa","bbbb"),("aa","bbbbbbb"),("aaa","bb"),("a","b"),("a","b")]
  +
*Main> :t p2
  +
p2 :: Parser Char [([Char], [Char])]
  +
*Main> parse blocks "aaaabbbbbbbbcccccddd"
  +
Just ["aaaa","bbbbbbbb","ccccc","ddd"]
  +
*Main> parse evenOdd [0..9]
  +
Just (Left [(0,1),(2,3),(4,5),(6,7),(8,9)])
  +
*Main> parse evenOdd [1..10]
  +
Just (Right [(1,2),(3,4),(5,6),(7,8),(9,10)])
  +
  +
</pre>
  +
  +
== Projects containing GADTs ==
  +
  +
Papers on [[Libraries and tools/Database interfaces/HaskellDB|HaskellDB]] describe problems when GADTs can help (but HaskellDB solves these problems with [[phantom type]]s).
   
  +
[[Darcs]] represents motivating examples for GADTs, too -- and uses them.
== Example ==
 
  +
The motivations are described in David Roundy's FOSDEM slides ([http://physics.oregonstate.edu/~roundyd/talks/fosdem-2006.pdf Implementing the Darcs Patch Formalism and Verifying It]) (see p. 11, 13--14.). The talk mentions also the notions of [[phantom type]], and [[existential type]], and [[type witness]] (see p. 15).
An example: it seems to me that generalised algebraic datatypes can provide a nice solution to a problem described in the documentation of [[Libraries and tools/Database interfaces/HaskellDB|HaskellDB]] project: in Daan Leijen and Erik Meijer's [http://www.haskell.org/haskellDB/doc.html paper] (see PostScript version) on the [http://www.haskell.org/haskellDB/ original HaskellDB] page: making typeful (safe) representation of terms of another language (here: SQL). In this example, the problem has been solved in a funny way with [[Phantom type]]
 
* we make first an untyped language,
 
* and then a typed one on top of it.
 
So we we destroy and rebuild -- is it a nice topic for a myth or scifi where a dreamworld is simulated on top of a previously homogenized world to look like the original?
 
   
  +
== See also ==
But solving the problem with GADTs seems to be a more direct way (maybe that's why [http://research.microsoft.com/Users/simonpj/papers/gadt/index.htm Simple unification-based type inference for GADTs] mentions that they are also called as ''first-class phantom types''?)
 
   
  +
* [[Algebraic data type]]
== Related concepts ==
 
  +
* [[GADTs for dummies]]
There are other developed tricks with types in [[Type]], and another way to a more general framework in [[Dependent type]]s. Epigram is a fully dependently typed language, and its [http://www.e-pig.org/downloads/epigram-notes.pdf Epigram tutorial] (section 6.1) mentions that Haskell is closely related to Epigram, and attributes this relatedness e.g. exactly to the presence of GADTs.
 
  +
* [http://en.wikibooks.org/wiki/Haskell/GADT The Haskell Wikibook section about GADT]
  +
* [http://apfelmus.nfshost.com/blog/2010/06/01-gadts-video.html A video that explains GADTs]
   
The more general problem (representing the terms of a language with the terms of another language) can develop surprising things, e.g. ''quines'' (self-replicating or self-representing programs). More details and links on quines can be seen in the section [[Combinatory logic#Self-replication, quines, reflective programming|Self-replication, quines, reflective programming]] of the page [[Combinatory logic]].
 
   
[[Category:Language]]</haskell>
+
[[Category:Glossary]]
  +
[[Category:Language extensions]]

Revision as of 07:21, 22 November 2012

Papers

See also research papers on type systems.

Motivating example

Generalised Algebraic Datatypes (GADTs) are datatypes for which a constructor has a non standard type. Indeed, in type systems incorporating GADTs, there are very few restrictions on the type that the data constructors can take. To show you how this could be useful, we will implement an evaluator for the typed SK calculus. Note that the K combinator is operationally similar to and, similarly, S is similar to the combinator which, in simply typed lambda calculus, have types and Without GADTs we would have to write something like this:

data Term = K | S | Term :@ Term 
infixl 6 :@

With GADTs, however, we can have the terms carry around more type information and create more interesting terms, like so:

data Term x where
    K :: Term (a -> b -> a)
    S :: Term ((a -> b -> c)  -> (a -> b) -> a -> c)
    Const :: a -> Term a
    (:@) :: Term (a -> b) -> (Term a) -> Term b
infixl 6 :@

now we can write a small step evaluator:

eval::Term a -> Term a
eval (K :@ x :@ y) = x
eval (S :@ x :@ y :@ z) = x :@ z :@ (y :@ z)
eval x = x

Since the types of the so-called object language, being the typed SK calculus, are mimicked by the type system in our meta language, being Haskell, we have a pretty convincing argument that the evaluator won't mangle our types. We say that typing is preserved under evaluation (preservation.) Note that this is an argument and not a proof.

This, however, comes at a price: let's see what happens when you try to convert strings into our object language:

parse "K" = K
parse "S" = S

you'll get a nasty error like so:

   Occurs check: cannot construct the infinite type: c = b -> c
     Expected type: Term ((a -> b -> c) -> (a -> b) -> a -> b -> c)
     Inferred type: Term ((a -> b -> c) -> (a -> b) -> a -> c)
   In the definition of `foo': foo "S" = S

One could, however, reason that parse has type: String -> exists a. Term a, see also Existential type.

Example with lists

here's another, smaller example:

data Empty
data NonEmpty
data List x y where
     Nil :: List a Empty
     Cons:: a -> List a b ->  List a NonEmpty

safeHead:: List x NonEmpty -> x
safeHead (Cons a b) = a

now safeHead can only be applied to non empty lists, and will never evaluate to bottom. This too comes at a cost; consider the function:

silly 0 = Nil
silly 1 = Cons 1 Nil

yields an objection from ghc:

Couldn't match `Empty' against `NonEmpty'
     Expected type: List a Empty
     Inferred type: List a NonEmpty
   In the application `Cons 1 Nil'
   In the definition of `silly': silly 1 = Cons 1 Nil

Parsing Example

Note that GADTs provide a rather nice platform for embedded domain specific languages. In particular, they allow an EDSL to use Haskell's type system for its own purposes. As a simple example, we might have an EDSL for simple (regexp-like) parsers that looks something like:

data Parser tok a where
    Zero :: Parser tok ()
    One :: Parser tok ()
    Check :: (tok -> Bool) -> Parser tok tok
    Satisfy :: ([tok] -> Bool) -> Parser tok [tok]
    Push :: tok -> Parser tok a -> Parser tok a
    Plus :: Parser tok a -> Parser tok b -> Parser tok (Either a b)
    Times :: Parser tok a -> Parser tok b -> Parser tok (a,b)
    Star :: Parser tok a -> Parser tok [a]

An evaluator/parser is then straightforward. Below it's written monadically for convenience, but this also means that we could generalise the return type to being in any MonadPlus. Note that an advantage of this representation which we don't show here is that we could also write a function which applies algebraic rules to the structure to try to simplify the parser before running it. (Though if we were really concerned with efficiency, we'd probably also need a couple more primitives.)

parse :: Parser tok a -> [tok] -> Maybe a

-- Zero always fails.
parse Zero ts = mzero

-- One matches only the empty string.
parse One [] = return ()
parse One _  = mzero

-- Check p matches a string with exactly one token t such that p t holds.
parse (Check p) [t] = if p t then return t else mzero
parse (Check p) _ = mzero

-- Satisfy p any string such that p ts holds.
parse (Satisfy p) xs = if p xs then return xs else mzero

-- Push t x matches a string ts when x matches (t:ts).
parse (Push t x) ts = parse x (t:ts)

-- Plus x y matches when either x or y does.
parse (Plus x y) ts = liftM Left (parse x ts) `mplus` liftM Right (parse y ts)

-- Times x y matches the concatenation of x and y.
parse (Times x y) [] = liftM2 (,) (parse x []) (parse y [])
parse (Times x y) (t:ts) = 
    parse (Times (Push t x) y) ts `mplus`
    liftM2 (,) (parse x []) (parse y (t:ts))

-- Star x matches zero or more copies of x.
parse (Star x) [] = return []
parse (Star x) (t:ts) = do
    (v,vs) <- parse (Times x (Star x)) (t:ts)
    return (v:vs)

Finally, we might define some examples:

token x = Check (== x)
string xs = Satisfy (== xs)

p = Times (token 'a') (token 'b')
p1 = Times (Star (token 'a')) (Star (token 'b'))
p2 = Star p1

blocks :: (Eq tok) => Parser tok [[tok]]
blocks = Star (Satisfy allEqual)
    where allEqual xs = and (zipWith (==) xs (drop 1 xs))

evenOdd = Plus (Star (Times (Check even) (Check odd)))
               (Star (Times (Check odd) (Check even)))

Testing this in ghci:

*Main> parse p "ab"
Just ('a','b')
*Main> parse p "ac"
Nothing
*Main> parse p1 "aaabbbb"
Just ("aaa","bbbb")
*Main> parse p2 "aaabbbbaabbbbbbbaaabbabab"
Just [("aaa","bbbb"),("aa","bbbbbbb"),("aaa","bb"),("a","b"),("a","b")]
*Main> :t p2
p2 :: Parser Char [([Char], [Char])]
*Main> parse blocks "aaaabbbbbbbbcccccddd"
Just ["aaaa","bbbbbbbb","ccccc","ddd"]
*Main> parse evenOdd [0..9]
Just (Left [(0,1),(2,3),(4,5),(6,7),(8,9)])
*Main> parse evenOdd [1..10]
Just (Right [(1,2),(3,4),(5,6),(7,8),(9,10)])

Projects containing GADTs

Papers on HaskellDB describe problems when GADTs can help (but HaskellDB solves these problems with phantom types).

Darcs represents motivating examples for GADTs, too -- and uses them. The motivations are described in David Roundy's FOSDEM slides (Implementing the Darcs Patch Formalism and Verifying It) (see p. 11, 13--14.). The talk mentions also the notions of phantom type, and existential type, and type witness (see p. 15).

See also