https://wiki.haskell.org/api.php?action=feedcontributions&user=Goens&feedformat=atomHaskellWiki - User contributions [en]2021-10-18T02:35:31ZUser contributionsMediaWiki 1.27.4https://wiki.haskell.org/index.php?title=GADTs_for_dummies&diff=60119GADTs for dummies2015-09-16T21:57:25Z<p>Goens: typo</p>
<hr />
<div>[[Category:Tutorials]]<br />
[[Category:Language extensions]]<br />
<br />
For a long time, I didn't understand what GADTs were or how they could be used. It was sort of a conspiracy of silence &mdash; people who understood GADTs thought<br />
that everything was obvious and didn't need further explanation, but I still<br />
couldn't understand them.<br />
<br />
Now that I have an idea of how it works, I think that it was really obvious. :) So, I want to share my understanding of GADTs. Maybe the way I realized how GADTs work could help<br />
someone else. See also [[Generalised algebraic datatype]]<br />
<br />
== Type functions ==<br />
<br />
A "data" declaration is a way to declare both type constructor and data<br />
constructors. For example,<br />
<br />
<haskell><br />
data Either a b = Left a | Right b<br />
</haskell><br />
<br />
declares type constructor "Either" and two data constructors "Left"<br />
and "Right". Ordinary Haskell functions work with data constructors:<br />
<br />
<haskell><br />
isLeft (Left a) = True<br />
isLeft (Right b) = False<br />
</haskell><br />
<br />
but there is also an analogous way to work with type constructors!<br />
<br />
<haskell><br />
type X a = Either a a<br />
</haskell><br />
<br />
declares a TYPE FUNCTION named "X". Its parameter "a" must be some type<br />
and it returns some type as its result. We can't use "X" on data values,<br />
but we can use it on type values. Type constructors declared with<br />
"data" statements and type functions declared with "type" statements<br />
can be used together to build arbitrarily complex types. In such<br />
"computations" type constructors serves as basic "values" and type<br />
functions as a way to process them.<br />
<br />
Indeed, type functions in Haskell are very limited compared to<br />
ordinary functions - they don't support pattern matching,<br />
nor multiple statements, nor recursion.<br />
<br />
== Hypothetical Haskell extension - Full-featured type functions ==<br />
<br />
Let's build a hypothetical Haskell extension that mimics, for type<br />
functions, the well-known ways to define ordinary functions, including<br />
pattern matching:<br />
<br />
<haskell><br />
type F [a] = Set a<br />
</haskell><br />
<br />
multiple statements (this is meaningful only in the presence of pattern matching):<br />
<br />
<haskell><br />
type F Bool = Char<br />
F String = Int<br />
</haskell><br />
<br />
and recursion (which again needs pattern matching and multiple statements):<br />
<br />
<haskell><br />
type F [a] = F a<br />
F (Map a b) = F b<br />
F (Set a) = F a<br />
F a = a<br />
</haskell><br />
<br />
As you may already have guessed, this last definition calculates a simple base type of arbitrarily-nested collections, e.g.: <br />
<br />
<haskell><br />
F [[[Set Int]]] = <br />
F [[Set Int]] =<br />
F [Set Int] = <br />
F (Set Int) =<br />
F Int =<br />
Int<br />
</haskell><br />
<br />
Let's not forget about statement guards:<br />
<br />
<haskell><br />
type F a | IsSimple a == TrueType = a<br />
</haskell><br />
<br />
Here we define type function F only for simple datatypes by using a<br />
guard type function "IsSimple":<br />
<br />
<haskell><br />
type IsSimple Bool = TrueType<br />
IsSimple Int = TrueType<br />
....<br />
IsSimple Double = TrueType<br />
IsSimple a = FalseType<br />
<br />
data TrueType = T<br />
data FalseType = F<br />
</haskell><br />
<br />
These definitions seem a bit odd, and while we are in imaginary land,<br />
let's consider a way to write this shorter:<br />
<br />
<haskell><br />
type F a | IsSimple a = a<br />
<br />
type IsSimple Bool<br />
IsSimple Int<br />
....<br />
IsSimple Double<br />
</haskell><br />
<br />
Here, we defined a list of simple types. The implied result of all<br />
written statements for "IsSimple" is True, and False for<br />
everything else. Essentially, "IsSimple" is a TYPE PREDICATE!<br />
<br />
I really love it! :) How about constructing a predicate that traverses a<br />
complex type trying to decide whether it contains "Int" anywhere?<br />
<br />
<haskell><br />
type HasInt Int<br />
HasInt [a] = HasInt a<br />
HasInt (Set a) = HasInt a<br />
HasInt (Map a b) | HasInt a<br />
HasInt (Map a b) | HasInt b<br />
</haskell><br />
<br />
or a type function that substitutes one type with another inside<br />
arbitrarily-deep types:<br />
<br />
<haskell><br />
type Replace t a b | t==a = b<br />
Replace [t] a b = [Replace t a b]<br />
Replace (Set t) a b = Set (Replace t a b)<br />
Replace (Map t1 t2) a b = Map (Replace t1 a b) (Replace t2 a b)<br />
Replace t a b = t<br />
</haskell><br />
<br />
== One more hypothetical extension - multi-value type functions ==<br />
<br />
Let's add more fun! We will introduce one more hypothetical Haskell<br />
extension - type functions that may have MULTIPLE VALUES. Say,<br />
<br />
<haskell><br />
type Collection a = [a]<br />
Collection a = Set a<br />
Collection a = Map b a<br />
</haskell><br />
<br />
So, "Collection Int" has "[Int]", "Set Int" and "Map String Int" as<br />
its values, i.e. different collection types with elements of type<br />
"Int".<br />
<br />
Pay attention to the last statement of the "Collection" definition, where<br />
we used the type variable "b" that was not mentioned on the left side,<br />
nor defined in any other way. Since it's perfectly possible for the<br />
"Collection" function to have multiple values, using some free variable on<br />
the right side that can be replaced with any type is not a problem<br />
at all. "Map Bool Int", "Map [Int] Int" and "Map Int Int" all are<br />
possible values of "Collection Int" along with "[Int]" and "Set Int".<br />
<br />
At first glance, it seems that multiple-value functions are meaningless - they<br />
can't be used to define datatypes, because we need concrete types here. But<br />
if we take another look, they can be useful to define type constraints and<br />
type families.<br />
<br />
We can also represent a multiple-value function as a predicate:<br />
<br />
<haskell><br />
type Collection a [a]<br />
Collection a (Set a)<br />
Collection a (Map b a)<br />
</haskell><br />
<br />
If you're familiar with Prolog, you should know that a predicate, in contrast to<br />
a function, is a multi-directional thing - it can be used to deduce any<br />
parameter from the other ones. For example, in this hypothetical definition:<br />
<br />
<haskell><br />
head | Collection Int a :: a -> Int<br />
</haskell><br />
<br />
we define a 'head' function for any Collection containing Ints.<br />
<br />
And in this, again, hypothetical definition:<br />
<br />
<haskell><br />
data Safe c | Collection c a = Safe c a<br />
</haskell><br />
<br />
we deduced element type 'a' from collection type 'c' passed as the<br />
parameter to the type constructor.<br />
<br />
<br />
<br />
== Back to real Haskell - type classes ==<br />
<br />
After reading about all of these glorious examples, you may be wondering<br />
"Why doesn't Haskell support full-featured type functions?" Hold your breath...<br />
Haskell already contains them, and GHC has implemented all of the<br />
capabilities mentioned above for more than 10 years! They were just named...<br />
TYPE CLASSES! Let's translate all of our examples to their language:<br />
<br />
<haskell><br />
class IsSimple a<br />
instance IsSimple Bool<br />
instance IsSimple Int<br />
....<br />
instance IsSimple Double<br />
</haskell><br />
<br />
The Haskell'98 standard supports type classes with only one parameter.<br />
That limits us to only defining type predicates like this one. But GHC and<br />
Hugs support multi-parameter type classes that allow us to define<br />
arbitrarily-complex type functions<br />
<br />
<haskell><br />
class Collection a c<br />
instance Collection a [a]<br />
instance Collection a (Set a)<br />
instance Collection a (Map b a)<br />
</haskell><br />
<br />
All of the "hypothetical" Haskell extensions we investigated earlier are<br />
actually implemented at the type class level!<br />
<br />
Pattern matching:<br />
<br />
<haskell><br />
instance Collection a [a]<br />
</haskell><br />
<br />
Multiple statements:<br />
<br />
<haskell><br />
instance Collection a [a]<br />
instance Collection a (Set a)<br />
</haskell><br />
<br />
Recursion:<br />
<br />
<haskell><br />
instance (Collection a c) => Collection a [c]<br />
</haskell><br />
<br />
Pattern guards:<br />
<br />
<haskell><br />
instance (IsSimple a) => Collection a (UArray a)<br />
</haskell><br />
<br />
<br />
<br />
Let's define a type class which contains any collection which uses Int in<br />
its elements or indexes:<br />
<br />
<haskell><br />
class HasInt a<br />
instance HasInt Int<br />
instance (HasInt a) => HasInt [a]<br />
instance (HasInt a) => HasInt (Map a b)<br />
instance (HasInt b) => HasInt (Map a b)<br />
</haskell><br />
<br />
<br />
Another example is a class that replaces all occurrences of 'a' with<br />
'b' in type 't' and return the result as 'res':<br />
<br />
<haskell><br />
class Replace t a b res<br />
instance Replace t a a t<br />
instance (Replace t a b res)<br />
=> Replace [t] a b [res]<br />
instance (Replace t a b res)<br />
=> Replace (Set t) a b (Set res)<br />
instance (Replace t1 a b res1, Replace t2 a b res2)<br />
=> Replace (Map t1 t2) a b (Map res1 res2)<br />
instance Replace t a b t<br />
</haskell><br />
<br />
You can compare it to the hypothetical definition we gave earlier.<br />
It's important to note that type class instances, as opposed to<br />
function statements, are not checked in order. Instead, the most<br />
_specific_ instance is automatically selected. So, in the Replace case, the<br />
last instance, which is the most general instance, will be selected only if all the others<br />
fail to match, which is what we want.<br />
<br />
In many other cases this automatic selection is not powerful enough<br />
and we are forced to use some artificial tricks or complain to the<br />
language developers. The two most well-known language extensions<br />
proposed to solve such problems are instance priorities, which allow<br />
us to explicitly specify instance selection order, and '/=' constraints,<br />
which can be used to explicitly prohibit unwanted matches:<br />
<br />
<haskell><br />
instance Replace t a a t<br />
instance (a/=b) => Replace [t] a b [Replace t a b]<br />
instance (a/=b, t/=[_]) => Replace t a b t<br />
</haskell><br />
<br />
You can check that these instances no longer overlap.<br />
<br />
<br />
In practice, type-level arithmetic by itself is not very useful. It becomes a<br />
fantastic tool when combined with another feature that type classes provide -<br />
member functions. For example:<br />
<br />
<haskell><br />
class Collection a c where<br />
foldr1 :: (a -> a -> a) -> c -> a<br />
<br />
class Num a where<br />
(+) :: a -> a -> a<br />
<br />
sum :: (Num a, Collection a c) => c -> a<br />
sum = foldr1 (+)<br />
</haskell><br />
<br />
<br />
I'll also be glad to see the possibility of using type classes in data<br />
declarations, like this:<br />
<br />
<haskell><br />
data Safe c = (Collection c a) => Safe c a<br />
</haskell><br />
<br />
but as far as I know, this is not yet implemented.<br />
<br />
<br />
UNIFICATION<br />
...<br />
<br />
== Back to GADTs ==<br />
<br />
If you are wondering how all of these interesting type manipulations relate to<br />
GADTs, here is the answer. As you know, Haskell contains highly<br />
developed ways to express data-to-data functions. We also know that<br />
Haskell contains rich facilities to write type-to-type functions in the form of<br />
"type" statements and type classes. But how do "data" statements fit into this<br />
infrastructure?<br />
<br />
My answer: they just define a type-to-data constructor translation. Moreover,<br />
this translation may give multiple results. Say, the following definition:<br />
<br />
<haskell><br />
data Maybe a = Just a | Nothing<br />
</haskell><br />
<br />
defines type-to-data constructors function "Maybe" that has a parameter<br />
"a" and for each "a" has two possible results - "Just a" and<br />
"Nothing". We can rewrite it in the same hypothetical syntax that was<br />
used above for multi-value type functions:<br />
<br />
<haskell><br />
data Maybe a = Just a<br />
Maybe a = Nothing<br />
</haskell><br />
<br />
Or how about this:<br />
<br />
<haskell><br />
data List a = Cons a (List a)<br />
List a = Nil<br />
</haskell><br />
<br />
and this:<br />
<br />
<haskell><br />
data Either a b = Left a<br />
Either a b = Right b<br />
</haskell><br />
<br />
But how flexible are "data" definitions? As you should remember, "type"<br />
definitions were very limited in their features, while type classes,<br />
on the other hand, were more developed than ordinary Haskell functions<br />
facilities. What about features of "data" definitions examined as sort of functions?<br />
<br />
On the one hand, they supports multiple statements and multiple results and<br />
can be recursive, like the "List" definition above. On the other, that's all -<br />
no pattern matching or even type constants on the left side and no guards.<br />
<br />
Lack of pattern matching means that the left side can contain only free type<br />
variables. That in turn means that the left sides of all "data" statements for a<br />
type will be essentially the same. Therefore, repeated left sides in<br />
multi-statement "data" definitions are omitted and instead of<br />
<br />
<haskell><br />
data Either a b = Left a<br />
Either a b = Right b<br />
</haskell><br />
<br />
we write just<br />
<br />
<haskell><br />
data Either a b = Left a<br />
| Right b<br />
</haskell><br />
<br />
<br />
And here we finally come to GADTs! It's just a way to define data types using<br />
pattern matching and constants on the left side of "data" statements!<br />
Let's say we want to do this:<br />
<br />
<haskell><br />
data T String = D1 Int<br />
T Bool = D2<br />
T [a] = D3 (a,a)<br />
</haskell><br />
<br />
<br />
We cannot do this using a standard data definition. So, now we must use a GADT definition:<br />
<br />
<haskell><br />
data T a where<br />
D1 :: Int -> T String<br />
D2 :: T Bool<br />
D3 :: (a,a) -> T [a]<br />
</haskell><br />
<br />
Amazed? After all, GADTs seem to be a really simple and obvious extension to<br />
data type definition facilities.<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
The idea here is to allow a data constructor's return type to be specified<br />
directly:<br />
<br />
<haskell><br />
data Term a where<br />
Lit :: Int -> Term Int<br />
Pair :: Term a -> Term b -> Term (a,b)<br />
...<br />
</haskell><br />
<br />
In a function that performs pattern matching on Term, the pattern match gives<br />
type as well as value information. For example, consider this function:<br />
<br />
<haskell><br />
eval :: Term a -> a<br />
eval (Lit i) = i<br />
eval (Pair a b) = (eval a, eval b)<br />
...<br />
</haskell><br />
<br />
If the argument matches Lit, it must have been built with a Lit constructor,<br />
so type 'a' must be Int, and hence we can return 'i' (an Int) in the right<br />
hand side. The same thing applies to the Pair constructor.<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Further reading ==<br />
<br />
The best paper on type level arithmetic using type classes I've seen<br />
is "Faking it: simulating dependent types in Haskell"<br />
( http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.2636 ). Most of<br />
this article comes from his work.<br />
<br />
A great demonstration of type-level arithmetic is in the TypeNats package,<br />
which "defines type-level natural numbers and arithmetic operations on<br />
them including addition, subtraction, multiplication, division and GCD"<br />
( darcs get --partial --tag '0.1' http://www.eecs.tufts.edu/~rdocki01/typenats/ )<br />
<br />
I should also mention here [http://okmij.org/ftp/Haskell/types.html Oleg Kiselyov's page on type-level programming in Haskell].<br />
<br />
<br />
There are plenty of GADT-related papers, but the best one for beginners<br />
is [http://www.cs.ox.ac.uk/ralf.hinze/publications/With.pdf "Fun with phantom types"].<br />
Phantom types is another name of GADT. You should also know that this<br />
paper uses old GADT syntax. This paper is a must-read because it<br />
contains numerous examples of practical GADT usage - a theme completely<br />
omitted from my article.<br />
<br />
<br />
Other GADT-related papers:<br />
<br />
* [http://www.cs.nott.ac.uk/~nhn/Publications/icfp2005.pdf "Dynamic Optimization for Functional Reactive Programming using Generalized Algebraic Data Types"]<br />
<br />
* [http://citeseer.ist.psu.edu/rd/0,606209,1,0.25,Download/http:qSqqSqwww.informatik.uni-bonn.deqSq~ralfqSqpublicationsqSqPhantom.ps.gz "Phantom Types"] (actually a more scientific version of "Fun with phantom types")<br />
<br />
* [http://arxiv.org/ps/cs.PL/0403034 "Phantom types and subtyping"]<br />
<br />
* "Existentially quantified type classes" by Stuckey, Sulzmann and Wazny (URL?)</div>Goens