Difference between revisions of "GADTs for dummies"
(Added categories; some minor edits) 
(Disoriented.) 

Line 162:  Line 162:  
possible values of "Collection Int" along with "[Int]" and "Set Int". 
possible values of "Collection Int" along with "[Int]" and "Set Int". 

−  +  At first glance, it seems that multiplevalue functions are meaningless  they 

−  +  can't be used to define datatypes, because we need concrete types here. But on 

−  +  the second look :) we can find them useful to define type constraints and 

−  +  type families. 

We can also represent multiplevalue function as predicate: 
We can also represent multiplevalue function as predicate: 

Line 303:  Line 303:  
−  +  In practice, typelevel arithmetic by itself is not very useful. It becomes a 

−  +  fantastic tool when combined with another feature that type classes provide  

−  +  member functions. For example: 

<haskell> 
<haskell> 

Line 336:  Line 336:  
== Back to GADTs == 
== Back to GADTs == 

−  If you are wonder how relates all these interesting type manipulations 
+  If you are wonder how relates all these interesting type manipulations to 
−  +  GADTs, now you come to the answer. As you know, Haskell contains highly 

−  +  developed ways to express datatodata functions. Now we also know that 

−  +  Haskell contains rich facilities to write typetotype functions in form of 

−  type 
+  "type" statements and type classes. But how do "data" statements fits in this 
−  +  infrastructure? 

−  My answer: they just defines typetodata constructors translation. 
+  My answer: they just defines typetodata constructors translation. Moreover, 
−  +  this translation may give multiple results. Say, the following definition: 

−  following definition: 

<haskell> 
<haskell> 

Line 374:  Line 374:  
</haskell> 
</haskell> 

−  But how 
+  But how flexible are "data" definitions? As you should remember, "type" 
−  +  definitions was very limited in their features, while type classes, vice 

−  +  versa, much more developed than ordinary Haskell functions facilities. What 

−  +  about features of "data" definitions examined as sort of functions? 

−  examined as sort of functions? 

−  On the one 
+  On the one hand, they supports multiple statements and multiple results and 
−  +  can be recursive, like the "List" definition above. On the other, that's all  

−  +  no pattern matching or even type constants on the left side and no guards. 

−  the left side and no guards. 

−  Lack of pattern matching means that left side can contain only free 
+  Lack of pattern matching means that left side can contain only free type 
−  +  variables, that in turn means that left sides of all "data" statements for one 

−  +  type will be essentially the same. Therefore, repeated left sides in 

−  +  multistatement "data" definitions are omitted and instead of 

−  and instead of 

<haskell> 
<haskell> 

Line 401:  Line 401:  
−  And here finally comes the GADTs! It's just a way to define data types 
+  And here finally comes the GADTs! It's just a way to define data types using 
−  +  pattern matching and constants on the left side of "data" statements! How 

−  +  about this: 

<haskell> 
<haskell> 

Line 411:  Line 411:  
</haskell> 
</haskell> 

−  Amazed? After all, GADTs seems really very simple and obvious 
+  Amazed? After all, GADTs seems really very simple and obvious extension to 
−  +  data type definition facilities. 

Line 420:  Line 420:  
−  The idea is to allow a data constructor's return type to 
+  The idea is to allow a data constructor's return type to be specified 
−  +  directly: 

<haskell> 
<haskell> 

Line 430:  Line 430:  
</haskell> 
</haskell> 

−  In a function that performs pattern matching on Term, the pattern 
+  In a function that performs pattern matching on Term, the pattern match gives 
−  +  type as well as value information. For example, consider this function: 

−  consider this function: 

<haskell> 
<haskell> 

Line 440:  Line 440:  
</haskell> 
</haskell> 

−  If the argument matches Lit, it must have been built with a 
+  If the argument matches Lit, it must have been built with a Lit constructor, 
−  +  so type 'a' must be Int, and hence we can return 'i' (an Int) in the right 

−  +  hand side. The same objections applies to the Pair constructor. 

−  constructor. 

Revision as of 12:10, 17 April 2009
For a long time, I didn't understand what GADTs are and how they can be used. It was a sort of conspiracy of silence  people who understand GADTs think
it is all obvious, and don't need any explanation, but I still
couldn't understand.
Now I have an idea how it works, and think that it was really obvious :) and I want to share my understanding  may be my way to realize GADTs can help someone else. See also Generalised algebraic datatype
Contents
Type functions
A "data" declaration is a way to declare both type constructor and data constructors. For example,
data Either a b = Left a  Right b
declares type constructor "Either" and two data constructors "Left" and "Right". Ordinary Haskell functions works with data constructors:
isLeft (Left a) = True
isLeft (Right b) = False
but there is also an analogous way to work with type constructors!
type X a = Either a a
declares a TYPE FUNCTION named "X". Its parameter "a" must be some type and it returns some type as its result. We can't use "X" on data values, but we can use it on type values. Type constructors declared with "data" statements and type functions declared with "type" statements used together to build arbitrarily complex types. In such "computations" type constructors serves as basic "values" and type functions as a way to process them.
Indeed, type functions in Haskell are very limited compared to ordinary functions  they don't support pattern matching, nor multiple statements, nor recursion.
Hypothetical Haskell extension  Fullfeatured type functions
Let's build hypothetical Haskell extension, that mimics for type functions the wellknown ways to define ordinary functions, including pattern matching:
type F [a] = Set a
multiple statements (this is meaningful only in presence of pattern matching):
type F Bool = Char
F String = Int
and recursion (which again needs pattern matching and multiple statements):
type F [a] = F a
F (Map a b) = F b
F (Set a) = F a
F a = a
As you may already have guessed, this last definition calculates a simple base type of arbitrarilynested collections, e.g.:
F [[[Set Int]]] =
F [[Set Int]] =
F [Set Int] =
F (Set Int) =
F Int =
Int
Let's don't forget about statement guards:
type F a  IsSimple a == TrueType = a
Here we define type function F only for simple datatypes by using in guard type function "IsSimple":
type IsSimple Bool = TrueType
IsSimple Int = TrueType
....
IsSimple Double = TrueType
IsSimple a = FalseType
data TrueType = T
data FalseType = F
These definitions seem a bit odd, and while we are in imaginary land, let's consider a way to write this shorter:
type F a  IsSimple a = a
type IsSimple Bool
IsSimple Int
....
IsSimple Double
Here, we just defined list of simple types, the implied result of all written statements for "IsSimple" is True value, and False value for anything else. Essentially, "IsSimple" is no less than TYPE PREDICATE!
I really love it! :) How about constructing a predicate that traverses a complex type trying to decide whether it contains "Int" anywhere?
type HasInt Int
HasInt [a] = HasInt a
HasInt (Set a) = HasInt a
HasInt (Map a b)  HasInt a
HasInt (Map a b)  HasInt b
or a type function that substitutes one type with another inside arbitrarydeep types:
type Replace t a b  t==a = b
Replace [t] a b = [Replace t a b]
Replace (Set t) a b = Set (Replace t a b)
Replace (Map t1 t2) a b = Map (Replace t1 a b) (Replace t2 a b)
Replace t a b = t
One more hypothetical extension  multivalue type functions
Let's add more fun! We will introduce one more hypothetical Haskell extension  type functions that may have MULTIPLE VALUES. Say,
type Collection a = [a]
Collection a = Set a
Collection a = Map b a
So, "Collection Int" has "[Int]", "Set Int" and "Map String Int" as its values, i.e. different collection types with elements of type "Int".
Pay attention to the last statement of the "Collection" definition, where we've used type variable "b" that was not mentioned on the left side nor defined in any other way. It's perfectly possible  anyway "Collection" function has multiple values, so using on the right side some free variable that can be replaced with any type is not a problem at all  the "Map Bool Int", "Map [Int] Int" and "Map Int Int" all are possible values of "Collection Int" along with "[Int]" and "Set Int".
At first glance, it seems that multiplevalue functions are meaningless  they can't be used to define datatypes, because we need concrete types here. But on the second look :) we can find them useful to define type constraints and type families.
We can also represent multiplevalue function as predicate:
type Collection a [a]
Collection a (Set a)
Collection a (Map b a)
If you remember Prolog, you should guess that predicate, in contrast to function, is multipurpose thing  it can be used to deduce any parameter from other ones. For example, in this hypothetical definition:
head  Collection Int a :: a > Int
we define 'head' function for any Collection containing Ints.
And in this, again, hypothetical definition:
data Safe c  Collection c a = Safe c a
we deduced element type 'a' from collection type 'c' passed as the parameter to the type constructor.
Back to real Haskell  type classes
Reading all those glorious examples you may be wondering  why Haskell don't yet supports fullfeatured type functions? Hold your breath... Haskell already contains them and at least GHC implements all the mentioned abilities more than 10 years ago! They just was named... TYPE CLASSES! Let's translate all our examples to their language:
class IsSimple a
instance IsSimple Bool
instance IsSimple Int
....
instance IsSimple Double
Haskell'98 standard supports type classes with only one parameter that limits us to defining only type predicates like this one. But GHC and Hugs supports multiparameter type classes that allows us to define arbitrarilycomplex type functions
class Collection a c
instance Collection a [a]
instance Collection a (Set a)
instance Collection a (Map b a)
All the "hypothetical" Haskell extensions we investigated earlier  actually implemented at the type class level!
Pattern matching:
instance Collection a [a]
Multiple statements:
instance Collection a [a]
instance Collection a (Set a)
Recursion:
instance (Collection a c) => Collection a [c]
Pattern guards:
instance (IsSimple a) => Collection a (UArray a)
Let's define type class which contains any collection which uses Int as its elements or indexes:
class HasInt a
instance HasInt Int
instance (HasInt a) => HasInt [a]
instance (HasInt a) => HasInt (Map a b)
instance (HasInt b) => HasInt (Map a b)
Anther example is a class that replaces all occurrences of 'a' with
'b' in type 't' and return result as 'res':
class Replace t a b res
instance Replace t a a t
instance Replace [t] a b [Replace t a b]
instance (Replace t a b res)
=> Replace (Set t) a b (Set res)
instance (Replace t1 a b res1, Replace t2 a b res2)
=> Replace (Map t1 t2) a b (Map res1 res2)
instance Replace t a b t
You can compare it to the hypothetical definition we gave earlier. It's important to note that type class instances, as opposite to function statements, are not checked in order. Instead, most _specific_ instance automatically selected. So, in Replace case, the last instance that is most general will be selected only if all other are failed to match and that is that we want.
In many other cases this automatic selection is not powerful enough and we are forced to use some artificial tricks or complain to the language developers. The two most wellknown language extensions proposed to solve such problems are instance priorities, which allow to explicitly specify instance selection order, and '/=' constraints, which can be used to explicitly prohibit unwanted matches:
instance Replace t a a t
instance (a/=b) => Replace [t] a b [Replace t a b]
instance (a/=b, t/=[_]) => Replace t a b t
You can check that these instances are no more overlaps.
In practice, typelevel arithmetic by itself is not very useful. It becomes a
fantastic tool when combined with another feature that type classes provide 
member functions. For example:
class Collection a c where
foldr1 :: (a > a > a) > c > a
class Num a where
(+) :: a > a > a
sum :: (Num a, Collection a c) => c > a
sum = foldr1 (+)
I'll be also glad to see possibility to use type classes in data
declarations like this:
data Safe c = (Collection c a) => Safe c a
but afaik this is also not yet implemented
UNIFICATION
...
Back to GADTs
If you are wonder how relates all these interesting type manipulations to GADTs, now you come to the answer. As you know, Haskell contains highly developed ways to express datatodata functions. Now we also know that Haskell contains rich facilities to write typetotype functions in form of "type" statements and type classes. But how do "data" statements fits in this infrastructure?
My answer: they just defines typetodata constructors translation. Moreover, this translation may give multiple results. Say, the following definition:
data Maybe a = Just a  Nothing
defines typetodata constructors function "Maybe" that has parameter "a" and for each "a" has two possible results  "Just a" and "Nothing". We can rewrite it in the same hypothetical syntax that was used above for multivalue type functions:
data Maybe a = Just a
Maybe a = Nothing
Or how about this:
data List a = Cons a (List a)
List a = Nil
and this:
data Either a b = Left a
Either a b = Right b
But how flexible are "data" definitions? As you should remember, "type" definitions was very limited in their features, while type classes, vice versa, much more developed than ordinary Haskell functions facilities. What about features of "data" definitions examined as sort of functions?
On the one hand, they supports multiple statements and multiple results and can be recursive, like the "List" definition above. On the other, that's all  no pattern matching or even type constants on the left side and no guards.
Lack of pattern matching means that left side can contain only free type variables, that in turn means that left sides of all "data" statements for one type will be essentially the same. Therefore, repeated left sides in multistatement "data" definitions are omitted and instead of
data Either a b = Left a
Either a b = Right b
we write just
data Either a b = Left a
 Right b
And here finally comes the GADTs! It's just a way to define data types using
pattern matching and constants on the left side of "data" statements! How
about this:
data T String = D1 Int
T Bool = D2
T [a] = D3 (a,a)
Amazed? After all, GADTs seems really very simple and obvious extension to data type definition facilities.
The idea is to allow a data constructor's return type to be specified directly:
data Term a where
Lit :: Int > Term Int
Pair :: Term a > Term b > Term (a,b)
...
In a function that performs pattern matching on Term, the pattern match gives type as well as value information. For example, consider this function:
eval :: Term a > a
eval (Lit i) = i
eval (Pair a b) = (eval a, eval b)
...
If the argument matches Lit, it must have been built with a Lit constructor, so type 'a' must be Int, and hence we can return 'i' (an Int) in the right hand side. The same objections applies to the Pair constructor.
Further reading
The best paper on type level arithmetic using type classes I've seen is "Faking it: simulating dependent types in Haskell" ( http://www.cs.nott.ac.uk/~ctm/faking.ps.gz ). Most part of my article is just duplicates his work.
The great demonstration of typelevel arithmetic is TypeNats package which "defines typelevel natural numbers and arithmetic operations on them including addition, subtraction, multiplication, division and GCD" ( darcs get partial tag '0.1' http://www.eecs.tufts.edu/~rdocki01/typenats/ )
I should also mention here Oleg Kiselyov page on typelevel programming in Haskell: http://okmij.org/ftp/Haskell/types.html
There are plenty of GADTrelated papers, but best for beginners
remains the "Fun with phantom types"
(http://www.informatik.unibonn.de/~ralf/publications/With.pdf).
Phantom types is another name of GADT. You should also know that this
paper uses old GADT syntax. This paper is mustread because it
contains numerous examples of practical GADT usage  theme completely
omitted from my article.
Other GADTrelated papers i know:
"Dynamic Optimization for Functional Reactive Programming using Generalized Algebraic Data Types" http://www.cs.nott.ac.uk/~nhn/Publications/icfp2005.pdf
"Phantom types" (actually more scientific version of "Fun with phantom types") http://citeseer.ist.psu.edu/rd/0,606209,1,0.25,Download/http:qSqqSqwww.informatik.unibonn.deqSq~ralfqSqpublicationsqSqPhantom.ps.gz
"Phantom types and subtyping" http://arxiv.org/ps/cs.PL/0403034
"Existentially quantified type classes" by Stuckey, Sulzmann and Wazny (URL?)
Random rubbish from previous versions of article
data family Map k :: * > *
data instance Map () v = MapUnit (Maybe v)
data instance Map (a, b) v = MapPair (Map a (Map b v))
let's consider wellknown 'data' declarations:
data T a = D a a Int
it can be seen as function 'T' from type 'a' to some data constructor.
'T Bool', for example, gives result 'D Bool Bool Int', while
'T [Int]' gives result 'D [Int] [Int] Int'.
'data' declaration can also have several "results", say
data Either a b = Left a  Right b
and "result" of 'Either Int String' can be either "Left Int" or "Right String"
Well, to give compiler confidence that 'a' can be deduced in just one way from 'c', we can add some form of hint:
type Collection :: a c  c>a
Collection a [a]
Collection a (Set a)
Collection a (Map b a)
The first line i added tell the compiler that Collection predicate has two parameters and the second parameter determines the first. Based on this restriction, compiler can detect and prohibit attempts to define different element types for the same collection:
type Collection :: a c  c>a
Collection a (Map b a)
Collection b (Map b a)  error! prohibits functional dependency
Of course, Collection is just a function from 'c' to 'a', but if we will define it directly as a function:
type Collection [a] = a
Collection (Set a) = a
Collection (Map b a) = a
 it can't be used in 'head' definition above. Moreover, using functional dependencies we can define bidirectional functions:
type TwoTimesBigger :: a b  a>b, b>a
TwoTimesBigger Int8 Int16
TwoTimesBigger Int16 Int32
TwoTimesBigger Int32 Int64
or predicates with 3, 4 or more parameters with any relations between them. It's a great power!