Difference between revisions of "GHC/Type families"
m 
(fix link to gadt in user guide) 

(81 intermediate revisions by 39 users not shown)  
Line 1:  Line 1:  
−  [[Category:GHCIndexed types]] 

+  {{GHCUsersGuideglasgow_extstypefamiliesa Type Family section}} 

−  == Type families in GHC == 

−  Indexed type families are a new addition to GHC's type extensions. They are currently an '''experimental''' feature whose design is bound to still change to some degree. Type families are a generalisation of [http://www.cse.unsw.edu.au/~chak/papers/CKPM05.html associated data types] and [http://www.cse.unsw.edu.au/~chak/papers/CKP05.html associated type synonyms], which are sketched in a [http://www.cse.unsw.edu.au/~chak/papers/SSPC07.html recent draft paper]. They essentially provide typeindexed data types and named functions on types, which are useful for generic programming and highly parameterised library interfaces as well as interfaces with enhanced static information, much like dependent types. 

−  '''NB:''' Type families are currently only available in GHC's development version. Type synonym families are especially new and only recommended for early adopters. In any case, feedback of what works and what doesn't is most appreciated. Please [http://hackage.haskell.org/trac/ghc/query?status=new&status=assigned&status=reopened&group=priority&type=bug&order=id&desc=1 report bugs] via the GHC bug tracker, ideally accompanied by a small example program that demonstrates the problem. 

+  Indexed type families, or '''type families''' for short, are a Haskell extension supporting adhoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of [[Type classtype classes]]: families are used to define overloaded ''data'' in the same way that classes are used to define overloaded ''functions''. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types. 

−  == What do I need to use type families? == 

+  Type families come in two flavors: ''data families'' and ''type synonym families''. Data families are the indexed form of data and newtype definitions. Type synonym families are the indexed form of type synonyms. Each of these flavors can be defined in a standalone manner or ''associated'' with a type class. Standalone definitions are more general, while associated types can more clearly express how a type is used and lead to better error messages. 

−  Indexed type families are implemented in GHC's HEAD (from version 6.9), which you can obtain in binary form from the [http://www.haskell.org/ghc/dist/current/dist/ nightly snapshots] or in source form from the [http://hackage.haskell.org/trac/ghc/wiki/Building/GettingTheSources source repository]. Please use the the [mailto:glasgowhaskellusers@haskell.org GHC mailing list] for questions or for the discussion of this language extension and its description on this wiki page. 

+  ''NB: see also Simon's [https://www.haskell.org/ghc/blog/20100930LetGeneralisationInGhc7.html blog entry on let generalisation] for a significant change in the policy for let generalisation, driven by the type family extension. In brief: a few programs will puzzlingly fail to compile with <tt>XTypeFamilies</tt> even though the code is legal Haskell 98.'' 

−  
−  To enable type families in GHC 6.9, you need to supply the compiler option <tt>XTypeFamilies</tt> (or <tt>fglasgowexts</tt>, which implies it). NB: The option used to be called <tt>findexedtypes</tt> and then <tt>ftypefamilies</tt> before GHC's change in policy for options enabling language extensions. 

== What are type families? == 
== What are type families? == 

−  Indexed type families, or type families for short, are type constructors that represent ''sets of types.'' Set members are denoted by supplying the type family constructor with type parameters, which are called ''type indicies''. The difference between vanilla parametrised type constructors and family constructors is much like between parametrically polymorphic functions and (adhoc polymorphic) methods of type classes. Parametric polymorphic functions behave the same at all type instances, whereas class methods can change their behaviour in dependence on the class type parameters. Similarly, vanilla type constructors imply the same data representation for all type instances, but family constructors can have varying representation types for varying type indicies. 

+  The concept of a type family comes from type theory. An indexed type family in type theory is a partial function at the type level. Applying the function to parameters (called ''type indices'') yields a type. Type families permit a program to compute what data constructors it will operate on, rather than having them fixed statically (as with simple type systems) or treated as opaque unknowns (as with parametrically polymorphic types). 

+  
+  Type families are to vanilla data types what type class methods are to regular functions. Vanilla polymorphic data types and functions have a single definition, which is used at all type instances. Classes and type families, on the other hand, have an interface definition and any number of instance definitions. A type family's interface definition declares its [[kind]] and its arity, or the number of type indices it takes. Instance definitions define the type family over some part of the domain. 

+  
+  As a simple example of how type families differ from ordinary parametric data types, consider a strict list type. We can represent a list of <hask>Char</hask> in the usual way, with cons cells. We can do the same thing to represent a list of <hask>()</hask>, but since a strict <hask>()</hask> value carries no useful information, it would be more efficient to just store the length of the list. This can't be done with an ordinary parametric data type, because the data constructors used to represent the list would depend on the list's type parameter: if it's <hask>Char</hask> then the list consists of cons cells; if it's <hask>()</hask>, then the list consists of a single integer. We basically want to select between two different data types based on a type parameter. Using type families, this list type could be declared as follows: 

+  
+  <haskell> 

+   Declare a listlike data family 

+  data family XList a 

+  
+   Declare a listlike instance for Char 

+  data instance XList Char = XCons !Char !(XList Char)  XNil 

+  
+   Declare a numberlike instance for () 

+  data instance XList () = XListUnit !Int 

+  </haskell> 

+  
+  The righthand sides of the two <code>data instance</code> declarations are exactly ordinary data definitions. In fact, a <code>data instance</code> declaration is nothing more than a shorthand for a <code>data</code> declaration followed by a <code>type instance</code> (see below) declaration. However, they define two instances of the same parametric data type, <hask>XList Char</hask> and <hask>XList ()</hask>, whereas ordinary data declarations define completely unrelated types. A recent [[Simonpj/Talk:FunWithTypeFunstutorial paper]] has more indepth examples of programming with type families. 

+  
+  [[GADT]]s bear some similarity to type families, in the sense that they allow a parametric type's constructors to depend on the type's parameters. However, all GADT constructors must be defined in one place, whereas type families can be extended. Functional dependencies are similar to type families, and many type classes that use functional dependencies can be equivalently expressed with type families. Type families provide a more functional style of typelevel programming than the relational style of functional dependencies. 

+  
+  == Requirements to use type families == 

−  Indexed type families come in two flavours: ''data families'' and ''type synonym families''. They are the indexed family variants of algebraic data types and type synonyms, respectively. The instances of data families can be data types and newtypes. 

+  Type families are a GHC extension enabled with the <code>XTypeFamilies</code> flag or the <code>{# LANGUAGE TypeFamilies #}</code> option. The first stable release of GHC that properly supports type families is 6.10.1. Release 6.8 includes an early partial deprecated implementation. 

== An associated data type example == 
== An associated data type example == 

−  As an example, consider Ralf Hinze's [http://www. 
+  As an example, consider Ralf Hinze's [http://www.cs.ox.ac.uk/ralf.hinze/publications/GGTries.ps.gz generalised tries], a form of generic finite maps. 
=== The class declaration === 
=== The class declaration === 

Line 32:  Line 49:  
insert :: k > v > GMap k v > GMap k v 
insert :: k > v > GMap k v > GMap k v 

</haskell> 
</haskell> 

−  The interesting part is the ''associated data family'' declaration of the class. It gives a [http://www.haskell.org/ghc/docs/latest/html/users_guide/type 
+  The interesting part is the ''associated data family'' declaration of the class. It gives a [http://www.haskell.org/ghc/docs/latest/html/users_guide/typefamilies.html#datafamilydeclarations ''kind signature''] (here <hask>* > *</hask>) for the associated data type <hask>GMap k</hask>  analogous to how methods receive a type signature in a class declaration. 
What it is important to notice is that the first parameter of the associated type <hask>GMap</hask> coincides with the class parameter of <hask>GMapKey</hask>. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multiparameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts. 
What it is important to notice is that the first parameter of the associated type <hask>GMap</hask> coincides with the class parameter of <hask>GMapKey</hask>. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multiparameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts. 

Line 40:  Line 57:  
=== An Int instance === 
=== An Int instance === 

−  To use Ints as keys into generic maps, we declare an instance that simply uses <hask>Data. 
+  To use Ints as keys into generic maps, we declare an instance that simply uses <hask>Data.IntMap</hask>, thusly: 
<haskell> 
<haskell> 

instance GMapKey Int where 
instance GMapKey Int where 

−  data GMap Int v = GMapInt (Data. 
+  data GMap Int v = GMapInt (Data.IntMap.IntMap v) 
−  empty = GMapInt Data. 
+  empty = GMapInt Data.IntMap.empty 
−  lookup k (GMapInt m) 
+  lookup k (GMapInt m) = Data.IntMap.lookup k m 
−  insert k v (GMapInt m) = GMapInt (Data. 
+  insert k v (GMapInt m) = GMapInt (Data.IntMap.insert k v m) 
</haskell> 
</haskell> 

−  The <hask>Int</hask> instance of the associated data type <hask>GMap</hask> needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here <hask>v</hask>). As mentioned before any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type. 
+  The <hask>Int</hask> instance of the associated data type <hask>GMap</hask> needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here <hask>v</hask>). As mentioned before, any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type. 
−  NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a 
+  NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundamental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future. 
As an exercise, implement an instance for <hask>Char</hask> that maps back to the <hask>Int</hask> instance using the conversion functions <hask>Char.ord</hask> and <hask>Char.chr</hask>. 
As an exercise, implement an instance for <hask>Char</hask> that maps back to the <hask>Int</hask> instance using the conversion functions <hask>Char.ord</hask> and <hask>Char.chr</hask>. 

Line 68:  Line 85:  
=== Product and sum instances === 
=== Product and sum instances === 

−  +  Next, let us define the instances for pairs and sums (i.e., <hask>Either</hask>): 

<haskell> 
<haskell> 

instance (GMapKey a, GMapKey b) => GMapKey (a, b) where 
instance (GMapKey a, GMapKey b) => GMapKey (a, b) where 

Line 84:  Line 101:  
lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2 
lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2 

insert (Left a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2 
insert (Left a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2 

−  insert (Right 
+  insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2) 
</haskell> 
</haskell> 

−  If you find this code algorithmically surprising, I'd suggest to have a look at [http://www. 
+  If you find this code algorithmically surprising, I'd suggest to have a look at [http://www.cs.ox.ac.uk/ralf.hinze/publications/index.html#J4 Ralf Hinze's paper]. The only novelty concerning associated types, in these two instances, is that the instances have a context <hask>(GMapKey a, GMapKey b)</hask>. Consequently, the right hand sides of the associated type declarations can use <hask>GMap</hask> recursively at the key types <hask>a</hask> and <hask>b</hask>  not unlike the method definitions use the class methods recursively at the types for which the class is given in the instance context. 
=== Using a generic map === 
=== Using a generic map === 

−  Finally, some code building and 
+  Finally, some code building and querying a generic map: 
<haskell> 
<haskell> 

myGMap :: GMap (Int, Either Char ()) String 
myGMap :: GMap (Int, Either Char ()) String 

Line 105:  Line 122:  
=== Download the code === 
=== Download the code === 

−  If you want to play with this example without copying it off the wiki, just download the [http://darcs.haskell.org/testsuite/tests/ghcregress/indexedtypes/should_run/GMapAssoc.hs 
+  If you want to play with this example without copying it off the wiki, just download the source code[http://darcs.haskell.org/testsuite/tests/ghcregress/indexedtypes/should_run/GMapAssoc.hs] for <hask>GMap</hask> from GHC's test suite. 
== Detailed definition of data families == 
== Detailed definition of data families == 

−  Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type 
+  Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the typeindices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were  possibly accidentally  omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types. 
=== Family declarations === 
=== Family declarations === 

Line 121:  Line 138:  
data family Array e 
data family Array e 

</haskell> 
</haskell> 

−  Named arguments can also be given explicit kind signatures if needed. Just as with [ 
+  Named arguments can also be given explicit kind signatures if needed. Just as with [https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#generalisedalgebraicdatatypesgadts GADT declarations] named arguments are entirely optional, so that we can declare <hask>Array</hask> alternatively with 
<haskell> 
<haskell> 

data family Array :: * > * 
data family Array :: * > * 

Line 128:  Line 145:  
==== Associated family declarations ==== 
==== Associated family declarations ==== 

−  When a data family is declared as part of a type class, we drop the <hask>family</hask> 
+  When a data family is declared as part of a type class, we drop the <hask>family</hask> keyword. The <hask>GMap</hask> declaration takes the following form 
<haskell> 
<haskell> 

class GMapKey k where 
class GMapKey k where 

Line 134:  Line 151:  
... 
... 

</haskell> 
</haskell> 

−  In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type 
+  In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as typeindices. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. In other words: '''the named type parameters of the data declaration must be a permutation of a subset of the class variables'''. 
+  
+  Example is admissible: 

<haskell> 
<haskell> 

−  class C a b c where 
+  class C a b c where { data T c a :: * }  OK 
−  data T 
+  class C a b c where { data T a a :: * }  Bad: repeated variable 
+  class D a where { data T a x :: * }  Bad: x is not a class variable 

+  class D a where { data T a :: * > * }  OK 

</haskell> 
</haskell> 

Line 148:  Line 165:  
In this example, the declaration has only one variant. In general, it can be any number. 
In this example, the declaration has only one variant. In general, it can be any number. 

−  Data and newtype instance declarations are only legit when an appropriate family declaration is in scope  just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although 
+  Data and newtype instance declarations are only legit when an appropriate family declaration is in scope  just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although all data families are declared with the <hask>data</hask> keyword, instances can be either <hask>data</hask> or <hask>newtype</hask>s, or a mix of both. 
Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible: 
Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible: 

Line 155:  Line 172:  
data instance T Int = A 
data instance T Int = A 

data instance T Char = B 
data instance T Char = B 

−  +  nonsense :: T a > Int 

−  +  nonsense A = 1  WRONG: These two equations together... 

−  +  nonsense B = 2  ...will produce a type error. 

</haskell> 
</haskell> 

−  Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families 
+  Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families  in contrast to GADTs  are ''open''; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct. 
==== Associated type instances ==== 
==== Associated type instances ==== 

Line 166:  Line 183:  
<haskell> 
<haskell> 

instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where 
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where 

−  data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v 
+  data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v) 
... 
... 

</haskell> 
</haskell> 

−  The most important point about associated family instances is that the type 
+  The most important point about associated family instances is that the type indices corresponding to class parameters must be identical to the type given in the instance head; here this is the first argument of <hask>GMap</hask>, namely <hask>Either a b</hask>, which coincides with the only class parameter. Any parameters to the family constructor that do not correspond to class parameters, need to be variables in every instance; here this is the variable <hask>v</hask>. 
−  Instances for an associated family can only appear as part of 
+  Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared  just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type. 
==== Scoping of class parameters ==== 
==== Scoping of class parameters ==== 

Line 194:  Line 211:  
deriving Show 
deriving Show 

</haskell> 
</haskell> 

−  which 
+  which implicitly defines an instance of the form 
<haskell> 
<haskell> 

instance Show v => Show (GMap () v) where ... 
instance Show v => Show (GMap () v) where ... 

</haskell> 
</haskell> 

−  Note that class instances are always for particular ''instances'' of a data family and never for an entire family as a whole. This for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of ''different'' instances of a single type family. It would require a form of extensible case construct. 
+  Note that class instances are always for particular ''instances'' of a data family and never for an entire family as a whole. This is for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of ''different'' instances of a single type family. It would require a form of extensible case construct. 
==== Overlap ==== 
==== Overlap ==== 

Line 219:  Line 236:  
* <hask>module GMap (GMapKey) where...</hask>: Exports just the class name. 
* <hask>module GMap (GMapKey) where...</hask>: Exports just the class name. 

* <hask>module GMap (GMapKey(..)) where...</hask>: Exports the class, the associated type <hask>GMap</hask> and the member functions <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>. None of the data constructors is exported. 
* <hask>module GMap (GMapKey(..)) where...</hask>: Exports the class, the associated type <hask>GMap</hask> and the member functions <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>. None of the data constructors is exported. 

−  * <hask>module GMap (GMapKey(..), GMap(..)) where...</hask>: As before, but also exports all the data constructors <hask>GMapInt</hask>, <hask>GMapChar</hask>, <hask>GMapUnit</hask>, <hask>GMapPair</hask>, and <hask> 
+  * <hask>module GMap (GMapKey(..), GMap(..)) where...</hask>: As before, but also exports all the data constructors <hask>GMapInt</hask>, <hask>GMapChar</hask>, <hask>GMapUnit</hask>, <hask>GMapPair</hask>, and <hask>GMapEither</hask>. 
* <hask>module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...</hask>: As before. 
* <hask>module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...</hask>: As before. 

* <hask>module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...</hask>: As before. 
* <hask>module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...</hask>: As before. 

Line 258:  Line 275:  
=== An <hask>instance</hask>=== 
=== An <hask>instance</hask>=== 

−  Instances change 
+  Instances change correspondingly. An instance of the twoparameter class 
<haskell> 
<haskell> 

instance Eq e => Collects e [e] where 
instance Eq e => Collects e [e] where 

Line 269:  Line 286:  
toList l = l 
toList l = l 

</haskell> 
</haskell> 

−  +  becomes an instance of a singleparameter class, where the dependent type parameter turns into an associated type instance declaration: 

<haskell> 
<haskell> 

instance Eq e => Collects [e] where 
instance Eq e => Collects [e] where 

Line 298:  Line 315:  
== Detailed definition of type synonym families == 
== Detailed definition of type synonym families == 

−  Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the type 
+  Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the typeindices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were  possibly accidentally  omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types. 
=== Family declarations === 
=== Family declarations === 

Line 313:  Line 330:  
<haskell> 
<haskell> 

type family F a b :: * > *  F's arity is 2, 
type family F a b :: * > *  F's arity is 2, 

−   although 
+   although its overall kind is * > * > * > * 
</haskell> 
</haskell> 

Given this declaration the following are examples of wellformed and malformed types: 
Given this declaration the following are examples of wellformed and malformed types: 

Line 322:  Line 339:  
F Bool  WRONG: unsaturated application 
F Bool  WRONG: unsaturated application 

</haskell> 
</haskell> 

+  
+  A toplevel type family can be declared as open or closed. (Associated type 

+  families are always open.) A closed type family has all of its equations 

+  defined in one place and cannot be extended, whereas an open family can have 

+  instances spread across modules. The advantage of a closed family is that 

+  its equations are tried in order, similar to a termlevel function definition: 

+  <haskell> 

+  type family G a where 

+  G Int = Bool 

+  G a = Char 

+  </haskell> 

+  With this definition, the type <hask>G Int</hask> becomes <hask>Bool</hask> 

+  and, say, <hask>G Double</hask> becomes <hask>Char</hask>. See also [http://ghc.haskell.org/trac/ghc/wiki/NewAxioms here] for more information about closed type families. 

==== Associated family declarations ==== 
==== Associated family declarations ==== 

Line 331:  Line 361:  
... 
... 

</haskell> 
</haskell> 

−  The argument names of the type family must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. Hence, the following contrieved example is admissible: 

+  Exactly as in the case of an associated data declaration, '''the named type parameters must be a permutation of a subset of the class parameters'''. Examples 

<haskell> 
<haskell> 

−  class C a b c where 
+  class C a b c where { type T c a :: * }  OK 
−  type T 
+  class D a where { type T a x :: * }  No: x is not a class parameter 
+  class D a where { type T a :: * > * }  OK 

</haskell> 
</haskell> 

−  These rules are exactly as for associated data families. 

−  === 
+  === Type instance declarations === 
−  Instance declarations of type families are very similar to standard type synonym declarations. The only two differences are that the keyword <hask>type</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be nonvariable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible  these are the exact same requirements as for data instances. For example, the <hask>[e]</hask> instance for <hask>Elem</hask> is 
+  Instance declarations of open type families are very similar to standard type synonym declarations. The only two differences are that the keyword <hask>type</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be nonvariable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible  these are the exact same requirements as for data instances. For example, the <hask>[e]</hask> instance for <hask>Elem</hask> is 
<haskell> 
<haskell> 

type instance Elem [e] = e 
type instance Elem [e] = e 

</haskell> 
</haskell> 

−  Type family instance declarations are only legitimate when an appropriate family declaration is in scope  just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration, and the number of type parameters in an instance declaration must match the number of type parameters in the family declaration. Finally, the righthand side of a type instance must be a monotype; i.e., it may not include explicit foralls. Here are some examples of admissible and illegal type instances: 

+  A type family instance declaration must satisfy the following rules: 

+  * An appropriate family declaration is in scope  just like class instances require the class declaration to be visible. 

+  * The instance declaration conforms to the kind determined by its family declaration 

+  * The number of type parameters in an instance declaration matches the number of type parameters in the family declaration. 

+  * The righthand side of a type instance must be a monotype (i.e., it may not include foralls) and after the expansion of all saturated vanilla type synonyms, no synonyms, except family synonyms may remain. 

+  
+  Here are some examples of admissible and illegal type instances and closed families: 

<haskell> 
<haskell> 

type family F a :: * 
type family F a :: * 

Line 353:  Line 388:  
type instance F (forall a. (a, b)) = b  WRONG: a forall type appears in a type parameter 
type instance F (forall a. (a, b)) = b  WRONG: a forall type appears in a type parameter 

type instance F Float = forall a.a  WRONG: righthand side may not be a forall type 
type instance F Float = forall a.a  WRONG: righthand side may not be a forall type 

+  
+  type family F2 a where  OK! 

+  F2 (Maybe Int) = Int 

+  F2 (Maybe Bool) = Bool 

+  F2 (Maybe a) = String 

type family G a b :: * > * 
type family G a b :: * > * 

Line 358:  Line 398:  
type instance G Int Char Float = Double  WRONG: must be two type parameters 
type instance G Int Char Float = Double  WRONG: must be two type parameters 

</haskell> 
</haskell> 

+  
+  ==== Closed family simplification ==== 

+  
+  Included in ghc starting 7.8.1. 

+  
+  When dealing with closed families, simplifying the type is harder than just finding a lefthand side that matches and replacing that with a righthand side. GHC will select an equation to use in a given type family application (the "target") if and only if the following 2 conditions hold: 

+  
+  # There is a substitution from the variables in the equation's LHS that makes the lefthand side of the branch coincide with the target. 

+  # For each previous equation in the family: either the LHS of that equation is ''apart'' from the type family application, '''or''' the equation is ''compatible'' with the chosen equation. 

+  
+  Now, we define ''apart'' and ''compatible'': 

+  # Two types are ''apart'' when one cannot simplify to the other, even after arbitrary typefamily simplifications 

+  # Two equations are ''compatible'' if, either, their LHSs are apart or their LHSs unify and their RHSs are the same under the substitution induced by the unification. 

+  
+  Some examples are in order: 

+  <haskell> 

+  type family F a where 

+  F Int = Bool 

+  F Bool = Char 

+  F a = Bool 

+  
+  type family And (a :: Bool) (b :: Bool) :: Bool where 

+  And False c = False 

+  And True d = d 

+  And e False = False 

+  And f True = f 

+  And g g = g 

+  </haskell> 

+  
+  In <hask>F</hask>, all pairs of equations are compatible except the second and third. The first two are compatible because their LHSs are apart. The first and third are compatible because the unifying substitution leads the RHSs to be the same. But, the second and third are not compatible because neither of these conditions holds. As a result, GHC will not use the third equation to simplify a target unless that target is apart from <hask>Bool</hask>. 

+  
+  In <hask>And</hask>, ''every'' pair of equations is compatible, meaning GHC never has to make the extra apartness check during simplification. 

+  
+  Why do all of this? It's a matter of type safety. Consider this example: 

+  
+  <haskell> 

+  type family J a b where 

+  J a a = Int 

+  J a b = Bool 

+  </haskell> 

+  
+  Say GHC selected the second branch just because the first doesn't apply at the moment, because two type variables are distinct. The problem is that those variables might later be instantiated at the same value, and then the first branch would have applied. You can convince this sort of inconsistency to produce <hask>unsafeCoerce</hask>. 

+  
+  It gets worse. GHC has no internal notion of inequality, so it can't use previous, failed termlevel GADT pattern matches to refine its type assumptions. For example: 

+  
+  <haskell> 

+  data G :: * > * where 

+  GInt :: G Int 

+  GBool :: G Bool 

+  
+  type family Foo (a :: *) :: * where 

+  Foo Int = Char 

+  Foo a = Double 

+  
+  bar :: G a > Foo a 

+  bar GInt = 'x' 

+  bar _ = 3.14 

+  </haskell> 

+  
+  The last line will fail to typecheck, because GHC doesn't know that the type variable <hask>a</hask> can't be <hask>Int</hask> here, even though it's obvious. The only general way to fix this is to have inequality evidence introduced into GHC, and that's a big deal and we don't know if we have the motivation for such a change yet. 

==== Associated type instances ==== 
==== Associated type instances ==== 

Line 369:  Line 469:  
The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is <hask>[e]</hask>, which coincides with the only class parameter. 
The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is <hask>[e]</hask>, which coincides with the only class parameter. 

−  Instances for an associated family can only appear as part of 
+  Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared  just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type. 
==== Overlap ==== 
==== Overlap ==== 

−  The instance declarations of 
+  The instance declarations of an open type family used in a single program must be compatible, in the form defined above. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety. 
−  Here are two 
+  Here are two examples to illustrate the condition under which overlap is permitted. 
<haskell> 
<haskell> 

type instance F (a, Int) = [a] 
type instance F (a, Int) = [a] 

Line 386:  Line 486:  
==== Decidability ==== 
==== Decidability ==== 

−  In order to guarantee that type inference in the presence of type families decidable, we need to place a number of additional restrictions on the formation of type instance declarations. Instance declarations have the general form 
+  In order to guarantee that type inference in the presence of type families is decidable, we need to place a number of additional restrictions on the formation of type instance declarations (c.f., Definition 5 (Relaxed Conditions) of [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking with Open Type Functions]). Instance declarations have the general form 
<haskell> 
<haskell> 

type instance F t1 .. tn = t 
type instance F t1 .. tn = t 

</haskell> 
</haskell> 

where we require that for every type family application <hask>(G s1 .. sm)</hask> in <hask>t</hask>, 
where we require that for every type family application <hask>(G s1 .. sm)</hask> in <hask>t</hask>, 

−  # the total number of symbols (type constructors and type variables) in <hask>s1 .. sm</hask> is strictly smaller than in <hask>t1 .. tn</hask>, 

+  # <hask>s1 .. sm</hask> do not contain any type family constructors, 

−  # 
+  # the total number of symbols (data type constructors and type variables) in <hask>s1 .. sm</hask> is strictly smaller than in <hask>t1 .. tn</hask>, and 
−  # <hask>s1 .. sm</hask> 
+  # for every type variable <hask>a</hask>, <hask>a</hask> occurs in <hask>s1 .. sm</hask> at most as often as in <hask>t1 .. tn</hask>. 
−  These restrictions are 
+  These restrictions are easily verified and ensure termination of type inference. However, they are not sufficient to guarantee completeness of type inference in the presence of, so called, ''loopy equalities'', such as <hask>a ~ [F a]</hask>, where a recursive occurrence of a type variable is underneath a family application and data constructor application  see the above mentioned paper for details. 
+  
+  If the option <tt>XUndecidableInstances</tt> is passed to the compiler, the above restrictions are not enforced and it is on the programmer to ensure termination of the normalisation of type families during type inference. 

=== Equality constraints === 
=== Equality constraints === 

−  Type context can include equality constraints of the form <hask>t1 ~ t2</hask>, which 
+  Type context can include equality constraints of the form <hask>t1 ~ t2</hask>, which denote that the types <hask>t1</hask> and <hask>t2</hask> need to be the same. In the presence of type families, whether two types are equal cannot generally be decided locally. Hence, the contexts of function signatures may include equality constraints, as in the following example: 
<haskell> 
<haskell> 

sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 > c2 > c2 
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 > c2 > c2 

</haskell> 
</haskell> 

−  where we require that the element type of <hask>c1</hask> and <hask>c2</hask> are the same. In general, the types <hask>t1</hask> and <hask>t2</hask> of an equality constraint may be arbitrary 
+  where we require that the element type of <hask>c1</hask> and <hask>c2</hask> are the same. In general, the types <hask>t1</hask> and <hask>t2</hask> of an equality constraint may be arbitrary monotypes; i.e., they may not contain any quantifiers, independent of whether higherrank types are otherwise enabled. 
Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form 
Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form 

Line 413:  Line 513:  
type F a 
type F a 

</haskell> 
</haskell> 

−  That is, we represent every functional dependency (FD) <hask>a1 .. an > b</hask> by an FD type family <hask>F a1 .. an</hask> and a superclass context equality <hask>F a1 .. an ~ b</hask>. In class instances, we define the type instances of FD families in accordance with the class head. Method signatures are not affected by that process. 
+  That is, we represent every functional dependency (FD) <hask>a1 .. an > b</hask> by an FD type family <hask>F a1 .. an</hask> and a superclass context equality <hask>F a1 .. an ~ b</hask>, essentially giving a name to the functional dependency. In class instances, we define the type instances of FD families in accordance with the class head. Method signatures are not affected by that process. 
+  
+  == Frequently asked questions == 

+  
+  === Comparing type families and functional dependencies === 

+  
+  Functional dependencies cover some of the same territory as type families. How do the two compare? 

+  There are some articles about this question: 

+  
+  * Experiences in converting functional dependencies to type families: "[[Functional dependencies vs. type families]]" 

+  * [https://ghc.haskell.org/trac/ghc/wiki/TFvsFD GHC trac] on a comparison of functional dependencies and type families 

+  
+  === Injectivity, type inference, and ambiguity === 

+  
+  A common problem is if: 

+  
+  <haskell> 

+  type family F a 

+  
+  f :: F a > F a 

+  f = undefined 

+  
+  g :: F Int > F Int 

+  g x = f x 

+  </haskell> 

+  
+  Then compiler complains about the definition of <tt>g</tt>: 

+  
+  <code> 

+  Couldn't match expected type `F Int' against inferred type `F a1'. 

+  </code> 

+  
+  In typechecking <tt>g</tt>'s right hand side GHC discovers (by instantiating <tt>f</tt>'s type with a fresh type variable) that it has type <tt>F a1 > F a1</tt> for some asyetunknown type <tt>a1</tt>. Now it tries to make the inferred type match <tt>g</tt>'s type signature. Well, you say, just make <tt>a1</tt> equal to <tt>Int</tt> and you are done. True, but what if there were these instances 

+  <haskell> 

+  type instance F Int = Bool 

+  type instance F Char = Bool 

+  </haskell> 

+  Then making <tt>a1</tt> equal to <tt>Char</tt> would ''also'' make the two types equal. Because there is (potentially) more than one choice, the program is rejected. 

+  
+  However (and confusingly) if you omit the type signature on <tt>g</tt> altogether, thus 

+  <haskell> 

+  f :: F a > F a 

+  f = undefined 

+  
+  g x = f x 

+  </haskell> 

+  GHC will happily infer the type <tt>g :: F a > F a</tt>. But you can't ''write'' that type signature or, indeed, the more specific one above. (Arguably this behaviour, where GHC ''infers'' a type it can't ''check'', is very confusing. I suppose we could make GHC reject both programs, with and without type signatures.) 

+  
+  '''What is the problem?''' The nub of the issue is this: knowing that <tt>F t1</tt>=<tt>F t2</tt> does ''not'' imply that <tt>t1</tt> = <tt>t2</tt>. 

+  The difficulty is that the type function <tt>F</tt> need not be ''injective''; it can map two distinct types to the same type. For an injective type constructor like <tt>Maybe</tt>, if we know that <tt>Maybe t1</tt> = <tt>Maybe t2</tt>, then we know that <tt>t1</tt> = <tt>t2</tt>. But not so for noninjective type functions. 

+  
+  The problem starts with <tt>f</tt>. Its type is ''ambiguous''; even if I know the argument and result types for <tt>f</tt>, I cannot use that to find the type at which <tt>a</tt> should be instantiated. (So arguably, <tt>f</tt> should be rejected as having an ambiguous type, and probably will be in future.) The situation is well known in type classes: 

+  <haskell> 

+  bad :: (Read a, Show a) => String > String 

+  bad x = show (read x) 

+  </haskell> 

+  At a call of <tt>bad</tt> one cannot tell at what type <tt>a</tt> should be instantiated. 

+  
+  The only solution is to avoid ambiguous types. In the type signature of a function, 

+  * Ensure that every type variable occurs in the part after the "<tt>=></tt>" 

+  * Ensure that every type variable appears at least once outside a type function call. 

+  Alternatively, you can use data families, which create new types and are therefore injective. The following code works: 

+  
+  <haskell>data family F a 

+  
+  f :: F a > F a 

+  f = undefined 

+  
+  g :: F Int > F Int 

+  g x = f x</haskell> 

== References == 
== References == 

Line 420:  Line 520:  
* [http://www.cse.unsw.edu.au/~chak/papers/CKP05.html Associated Type Synonyms.] Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In ''Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 241253, 2005. 
* [http://www.cse.unsw.edu.au/~chak/papers/CKP05.html Associated Type Synonyms.] Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In ''Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 241253, 2005. 

* [http://www.cse.unsw.edu.au/~chak/papers/SCPD07.html System F with Type Equality Coercions.] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In ''Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation'', ACM Press, 2007. 
* [http://www.cse.unsw.edu.au/~chak/papers/SCPD07.html System F with Type Equality Coercions.] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In ''Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation'', ACM Press, 2007. 

−  * [http://www.cse.unsw.edu.au/~chak/papers/ 
+  * [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking With Open Type Functions.] Tom Schrijvers, Simon PeytonJones, Manuel M. T. Chakravarty, Martin Sulzmann. In ''Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 5162, 2008. 
+  * [[Simonpj/Talk:FunWithTypeFuns  Fun with Type Functions]] Oleg Kiselyov, Simon Peyton Jones, Chungchieh Shan (the source for this paper can be found at http://patchtag.com/r/schoenfinkel/typefunctions/wiki ) 

[[Category:Typelevel programming]] 
[[Category:Typelevel programming]] 

−  [[Category:Language 
+  [[Category:Language extensions]] 
+  [[Category:GHCIndexed types]] 
Latest revision as of 05:58, 23 July 2020
The GHC Users Guide has a Type Family section.
Indexed type families, or type families for short, are a Haskell extension supporting adhoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of type classes: families are used to define overloaded data in the same way that classes are used to define overloaded functions. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types.
Type families come in two flavors: data families and type synonym families. Data families are the indexed form of data and newtype definitions. Type synonym families are the indexed form of type synonyms. Each of these flavors can be defined in a standalone manner or associated with a type class. Standalone definitions are more general, while associated types can more clearly express how a type is used and lead to better error messages.
NB: see also Simon's blog entry on let generalisation for a significant change in the policy for let generalisation, driven by the type family extension. In brief: a few programs will puzzlingly fail to compile with XTypeFamilies even though the code is legal Haskell 98.
Contents
What are type families?
The concept of a type family comes from type theory. An indexed type family in type theory is a partial function at the type level. Applying the function to parameters (called type indices) yields a type. Type families permit a program to compute what data constructors it will operate on, rather than having them fixed statically (as with simple type systems) or treated as opaque unknowns (as with parametrically polymorphic types).
Type families are to vanilla data types what type class methods are to regular functions. Vanilla polymorphic data types and functions have a single definition, which is used at all type instances. Classes and type families, on the other hand, have an interface definition and any number of instance definitions. A type family's interface definition declares its kind and its arity, or the number of type indices it takes. Instance definitions define the type family over some part of the domain.
As a simple example of how type families differ from ordinary parametric data types, consider a strict list type. We can represent a list of Char
in the usual way, with cons cells. We can do the same thing to represent a list of ()
, but since a strict ()
value carries no useful information, it would be more efficient to just store the length of the list. This can't be done with an ordinary parametric data type, because the data constructors used to represent the list would depend on the list's type parameter: if it's Char
then the list consists of cons cells; if it's ()
, then the list consists of a single integer. We basically want to select between two different data types based on a type parameter. Using type families, this list type could be declared as follows:
 Declare a listlike data family
data family XList a
 Declare a listlike instance for Char
data instance XList Char = XCons !Char !(XList Char)  XNil
 Declare a numberlike instance for ()
data instance XList () = XListUnit !Int
The righthand sides of the two data instance
declarations are exactly ordinary data definitions. In fact, a data instance
declaration is nothing more than a shorthand for a data
declaration followed by a type instance
(see below) declaration. However, they define two instances of the same parametric data type, XList Char
and XList ()
, whereas ordinary data declarations define completely unrelated types. A recent tutorial paper has more indepth examples of programming with type families.
GADTs bear some similarity to type families, in the sense that they allow a parametric type's constructors to depend on the type's parameters. However, all GADT constructors must be defined in one place, whereas type families can be extended. Functional dependencies are similar to type families, and many type classes that use functional dependencies can be equivalently expressed with type families. Type families provide a more functional style of typelevel programming than the relational style of functional dependencies.
Requirements to use type families
Type families are a GHC extension enabled with the XTypeFamilies
flag or the {# LANGUAGE TypeFamilies #}
option. The first stable release of GHC that properly supports type families is 6.10.1. Release 6.8 includes an early partial deprecated implementation.
An associated data type example
As an example, consider Ralf Hinze's generalised tries, a form of generic finite maps.
The class declaration
We define a type class whose instances are the types that we can use as keys in our generic maps:
class GMapKey k where
data GMap k :: * > *
empty :: GMap k v
lookup :: k > GMap k v > Maybe v
insert :: k > v > GMap k v > GMap k v
The interesting part is the associated data family declaration of the class. It gives a kind signature (here * > *
) for the associated data type GMap k
 analogous to how methods receive a type signature in a class declaration.
What it is important to notice is that the first parameter of the associated type GMap
coincides with the class parameter of GMapKey
. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multiparameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts.
The second important point is that as GMap k
has kind * > *
and k
(implicitly) has kind *
, the type constructor GMap
(without an argument) has kind * > * > *
. Consequently, we see that GMap
is applied to two arguments in the signatures of the methods empty
, lookup
, and insert
.
An Int instance
To use Ints as keys into generic maps, we declare an instance that simply uses Data.IntMap
, thusly:
instance GMapKey Int where
data GMap Int v = GMapInt (Data.IntMap.IntMap v)
empty = GMapInt Data.IntMap.empty
lookup k (GMapInt m) = Data.IntMap.lookup k m
insert k v (GMapInt m) = GMapInt (Data.IntMap.insert k v m)
The Int
instance of the associated data type GMap
needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here v
). As mentioned before, any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type.
NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundamental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future.
As an exercise, implement an instance for Char
that maps back to the Int
instance using the conversion functions Char.ord
and Char.chr
.
A unit instance
Generic definitions, apart from elementary types, typically cover units, products, and sums. We start here with the unit instance for GMap
:
instance GMapKey () where
data GMap () v = GMapUnit (Maybe v)
empty = GMapUnit Nothing
lookup () (GMapUnit v) = v
insert () v (GMapUnit _) = GMapUnit $ Just v
For unit, the map is just a Maybe
value.
Product and sum instances
Next, let us define the instances for pairs and sums (i.e., Either
):
instance (GMapKey a, GMapKey b) => GMapKey (a, b) where
data GMap (a, b) v = GMapPair (GMap a (GMap b v))
empty = GMapPair empty
lookup (a, b) (GMapPair gm) = lookup a gm >>= lookup b
insert (a, b) v (GMapPair gm) = GMapPair $ case lookup a gm of
Nothing > insert a (insert b v empty) gm
Just gm2 > insert a (insert b v gm2 ) gm
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)
empty = GMapEither empty empty
lookup (Left a) (GMapEither gm1 _gm2) = lookup a gm1
lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2
insert (Left a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2
insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2)
If you find this code algorithmically surprising, I'd suggest to have a look at Ralf Hinze's paper. The only novelty concerning associated types, in these two instances, is that the instances have a context (GMapKey a, GMapKey b)
. Consequently, the right hand sides of the associated type declarations can use GMap
recursively at the key types a
and b
 not unlike the method definitions use the class methods recursively at the types for which the class is given in the instance context.
Using a generic map
Finally, some code building and querying a generic map:
myGMap :: GMap (Int, Either Char ()) String
myGMap = insert (5, Left 'c') "(5, Left 'c')" $
insert (4, Right ()) "(4, Right ())" $
insert (5, Right ()) "This is the one!" $
insert (5, Right ()) "This is the two!" $
insert (6, Right ()) "(6, Right ())" $
insert (5, Left 'a') "(5, Left 'a')" $
empty
main = putStrLn $ maybe "Couldn't find key!" id $ lookup (5, Right ()) myGMap
Download the code
If you want to play with this example without copying it off the wiki, just download the source code[1] for GMap
from GHC's test suite.
Detailed definition of data families
Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the typeindices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were  possibly accidentally  omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.
Family declarations
Indexed data families are introduced by a signature, such as
data family GMap k :: * > *
The special family
distinguishes family from standard data declarations. The result kind annotation is optional and, as usual, defaults to *
if omitted. An example is
data family Array e
Named arguments can also be given explicit kind signatures if needed. Just as with GADT declarations named arguments are entirely optional, so that we can declare Array
alternatively with
data family Array :: * > *
Associated family declarations
When a data family is declared as part of a type class, we drop the family
keyword. The GMap
declaration takes the following form
class GMapKey k where
data GMap k :: * > *
...
In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as typeindices. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. In other words: the named type parameters of the data declaration must be a permutation of a subset of the class variables.
Example is admissible:
class C a b c where { data T c a :: * }  OK
class C a b c where { data T a a :: * }  Bad: repeated variable
class D a where { data T a x :: * }  Bad: x is not a class variable
class D a where { data T a :: * > * }  OK
Instance declarations
Instance declarations of data and newtype families are very similar to standard data and newtype declarations. The only two differences are that the keyword data
or newtype
is followed by instance
and that some or all of the type arguments can be nonvariable types, but may not contain forall types or type synonym families. However, data families are generally allowed in type parameters, and type synonyms are allowed as long as they are fully applied and expand to a type that is itself admissible  exactly as this is required for occurrences of type synonyms in class instance parameters. For example, the Either
instance for GMap
is
data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)
In this example, the declaration has only one variant. In general, it can be any number.
Data and newtype instance declarations are only legit when an appropriate family declaration is in scope  just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although all data families are declared with the data
keyword, instances can be either data
or newtype
s, or a mix of both.
Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:
data family T a
data instance T Int = A
data instance T Char = B
nonsense :: T a > Int
nonsense A = 1  WRONG: These two equations together...
nonsense B = 2  ...will produce a type error.
Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families  in contrast to GADTs  are open; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct.
Associated type instances
When an associated family instance is declared within a type class instance, we drop the instance
keyword in the family instance. So, the Either
instance for GMap
becomes:
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)
...
The most important point about associated family instances is that the type indices corresponding to class parameters must be identical to the type given in the instance head; here this is the first argument of GMap
, namely Either a b
, which coincides with the only class parameter. Any parameters to the family constructor that do not correspond to class parameters, need to be variables in every instance; here this is the variable v
.
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared  just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as undefined
, can assume the type.
Scoping of class parameters
In the case of multiparameter type classes, the visibility of class parameters in the righthand side of associated family instances depends solely on the parameters of the data family. As an example, consider the simple class declaration
class C a b where
data T a
Only one of the two class parameters is a parameter to the data family. Hence, the following instance declaration is invalid:
instance C [c] d where
data T [c] = MkT (c, d)  WRONG!! 'd' is not in scope
Here, the righthand side of the data instance mentions the type variable d
that does not occur in its lefthand side. We cannot admit such data instances as they would compromise type safety.
Type class instances of family instances
Type class instances of instances of data families can be defined as usual, and in particular data instance declarations can have deriving
clauses. For example, we can write
data GMap () v = GMapUnit (Maybe v)
deriving Show
which implicitly defines an instance of the form
instance Show v => Show (GMap () v) where ...
Note that class instances are always for particular instances of a data family and never for an entire family as a whole. This is for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of different instances of a single type family. It would require a form of extensible case construct.
Overlap
The instance declarations of a data family used in a single program may not overlap at all, independent of whether they are associated or not. In contrast to type class instances, this is not only a matter of consistency, but one of type safety.
Import and export
The association of data constructors with type families is more dynamic than that is the case with standard data and newtype declarations. In the standard case, the notation T(..)
in an import or export list denotes the type constructor and all the data constructors introduced in its declaration. However, a family declaration never introduces any data constructors; instead, data constructors are introduced by family instances. As a result, which data constructors are associated with a type family depends on the currently visible instance declarations for that family. Consequently, an import or export item of the form T(..)
denotes the family constructor and all currently visible data constructors  in the case of an export item, these may be either imported or defined in the current module. The treatment of import and export items that explicitly list data constructors, such as GMap(GMapEither)
, is analogous.
Associated families
As expected, an import or export item of the form C(..)
denotes all of the class' methods and associated types. However, when associated types are explicitly listed as subitems of a class, we need some new syntax, as uppercase identifiers as subitems are usually data constructors, not type constructors. To clarify that we denote types here, each associated type name needs to be prefixed by the keyword type
. So for example, when explicitly listing the components of the GMapKey
class, we write GMapKey(type GMap, empty, lookup, insert)
.
Examples
Assuming our running GMapKey
class example, let us look at some export lists and their meaning:

module GMap (GMapKey) where...
: Exports just the class name. 
module GMap (GMapKey(..)) where...
: Exports the class, the associated typeGMap
and the member functionsempty
,lookup
, andinsert
. None of the data constructors is exported. 
module GMap (GMapKey(..), GMap(..)) where...
: As before, but also exports all the data constructorsGMapInt
,GMapChar
,GMapUnit
,GMapPair
, andGMapEither
. 
module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...
: As before. 
module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...
: As before.
Finally, you can write GMapKey(type GMap)
to denote both the class GMapKey
as well as its associated type GMap
. However, you cannot write GMapKey(type GMap(..))
— i.e., subcomponent specifications cannot be nested. To specify GMap
's data constructors, you have to list it separately.
Instances
Family instances are implicitly exported, just like class instances. However, this applies only to the heads of instances, not to the data constructors an instance defines.
An associated type synonym example
Type synonym families are an alternative to functional dependencies, which makes functional dependency examples well suited to introduce type synonym families. In fact, type families are a more functional way to express the same as functional dependencies (despite the name!), as they replace the relational notation of functional dependencies by an expressionoriented notation; i.e., functions on types are really represented by functions and not relations.
The class
declaration
Here's an example from Mark Jones' seminal paper on functional dependencies:
class Collects e ce  ce > e where
empty :: ce
insert :: e > ce > ce
member :: e > ce > Bool
toList :: ce > [e]
With associated type synonyms we can write this as
class Collects ce where
type Elem ce
empty :: ce
insert :: Elem ce > ce > ce
member :: Elem ce > ce > Bool
toList :: ce > [Elem ce]
Instead of the multiparameter type class, we use a single parameter class, and the parameter e
turned into an associated type synonym Elem ce
.
An instance
Instances change correspondingly. An instance of the twoparameter class
instance Eq e => Collects e [e] where
empty = []
insert e l = (e:l)
member e [] = False
member e (x:xs)
 e == x = True
 otherwise = member e xs
toList l = l
becomes an instance of a singleparameter class, where the dependent type parameter turns into an associated type instance declaration:
instance Eq e => Collects [e] where
type Elem [e] = e
empty = []
insert e l = (e:l)
member e [] = False
member e (x:xs)
 e == x = True
 otherwise = member e xs
toList l = l
Using generic collections
With Functional Dependencies the code would be:
sumCollects :: (Collects e c1, Collects e c2) => c1 > c2 > c2
sumCollects c1 c2 = foldr insert c2 (toList c1)
In contrast, with associated type synonyms, we get:
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 > c2 > c2
sumCollects c1 c2 = foldr insert c2 (toList c1)
Detailed definition of type synonym families
Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the typeindices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were  possibly accidentally  omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.
Family declarations
Indexed type families are introduced by a signature, such as
type family Elem c :: *
The special family
distinguishes family from standard type declarations. The result kind annotation is optional and, as usual, defaults to *
if omitted. An example is
type family Elem c
Parameters can also be given explicit kind signatures if needed. We call the number of parameters in a type family declaration, the family's arity, and all applications of a type family must be fully saturated w.r.t. to that arity. This requirement is unlike ordinary type synonyms and it implies that the kind of a type family is not sufficient to determine a family's arity, and hence in general, also insufficient to determine whether a type family application is well formed. As an example, consider the following declaration:
type family F a b :: * > *  F's arity is 2,
 although its overall kind is * > * > * > *
Given this declaration the following are examples of wellformed and malformed types:
F Char [Int]  OK! Kind: * > *
F Char [Int] Bool  OK! Kind: *
F IO Bool  WRONG: kind mismatch in the first argument
F Bool  WRONG: unsaturated application
A toplevel type family can be declared as open or closed. (Associated type families are always open.) A closed type family has all of its equations defined in one place and cannot be extended, whereas an open family can have instances spread across modules. The advantage of a closed family is that its equations are tried in order, similar to a termlevel function definition:
type family G a where
G Int = Bool
G a = Char
With this definition, the type G Int
becomes Bool
and, say, G Double
becomes Char
. See also here for more information about closed type families.
Associated family declarations
When a type family is declared as part of a type class, we drop the family
special. The Elem
declaration takes the following form
class Collects ce where
type Elem ce :: *
...
Exactly as in the case of an associated data declaration, the named type parameters must be a permutation of a subset of the class parameters. Examples
class C a b c where { type T c a :: * }  OK
class D a where { type T a x :: * }  No: x is not a class parameter
class D a where { type T a :: * > * }  OK
Type instance declarations
Instance declarations of open type families are very similar to standard type synonym declarations. The only two differences are that the keyword type
is followed by instance
and that some or all of the type arguments can be nonvariable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible  these are the exact same requirements as for data instances. For example, the [e]
instance for Elem
is
type instance Elem [e] = e
A type family instance declaration must satisfy the following rules:
 An appropriate family declaration is in scope  just like class instances require the class declaration to be visible.
 The instance declaration conforms to the kind determined by its family declaration
 The number of type parameters in an instance declaration matches the number of type parameters in the family declaration.
 The righthand side of a type instance must be a monotype (i.e., it may not include foralls) and after the expansion of all saturated vanilla type synonyms, no synonyms, except family synonyms may remain.
Here are some examples of admissible and illegal type instances and closed families:
type family F a :: *
type instance F [Int] = Int  OK!
type instance F String = Char  OK!
type instance F (F a) = a  WRONG: type parameter mentions a type family
type instance F (forall a. (a, b)) = b  WRONG: a forall type appears in a type parameter
type instance F Float = forall a.a  WRONG: righthand side may not be a forall type
type family F2 a where  OK!
F2 (Maybe Int) = Int
F2 (Maybe Bool) = Bool
F2 (Maybe a) = String
type family G a b :: * > *
type instance G Int = (,)  WRONG: must be two type parameters
type instance G Int Char Float = Double  WRONG: must be two type parameters
Closed family simplification
Included in ghc starting 7.8.1.
When dealing with closed families, simplifying the type is harder than just finding a lefthand side that matches and replacing that with a righthand side. GHC will select an equation to use in a given type family application (the "target") if and only if the following 2 conditions hold:
 There is a substitution from the variables in the equation's LHS that makes the lefthand side of the branch coincide with the target.
 For each previous equation in the family: either the LHS of that equation is apart from the type family application, or the equation is compatible with the chosen equation.
Now, we define apart and compatible:
 Two types are apart when one cannot simplify to the other, even after arbitrary typefamily simplifications
 Two equations are compatible if, either, their LHSs are apart or their LHSs unify and their RHSs are the same under the substitution induced by the unification.
Some examples are in order:
type family F a where
F Int = Bool
F Bool = Char
F a = Bool
type family And (a :: Bool) (b :: Bool) :: Bool where
And False c = False
And True d = d
And e False = False
And f True = f
And g g = g
In F
, all pairs of equations are compatible except the second and third. The first two are compatible because their LHSs are apart. The first and third are compatible because the unifying substitution leads the RHSs to be the same. But, the second and third are not compatible because neither of these conditions holds. As a result, GHC will not use the third equation to simplify a target unless that target is apart from Bool
.
In And
, every pair of equations is compatible, meaning GHC never has to make the extra apartness check during simplification.
Why do all of this? It's a matter of type safety. Consider this example:
type family J a b where
J a a = Int
J a b = Bool
Say GHC selected the second branch just because the first doesn't apply at the moment, because two type variables are distinct. The problem is that those variables might later be instantiated at the same value, and then the first branch would have applied. You can convince this sort of inconsistency to produce unsafeCoerce
.
It gets worse. GHC has no internal notion of inequality, so it can't use previous, failed termlevel GADT pattern matches to refine its type assumptions. For example:
data G :: * > * where
GInt :: G Int
GBool :: G Bool
type family Foo (a :: *) :: * where
Foo Int = Char
Foo a = Double
bar :: G a > Foo a
bar GInt = 'x'
bar _ = 3.14
The last line will fail to typecheck, because GHC doesn't know that the type variable a
can't be Int
here, even though it's obvious. The only general way to fix this is to have inequality evidence introduced into GHC, and that's a big deal and we don't know if we have the motivation for such a change yet.
Associated type instances
When an associated family instance is declared within a type class instance, we drop the instance
keyword in the family instance. So, the [e]
instance for Elem
becomes:
instance (Eq (Elem [e])) => Collects ([e]) where
type Elem [e] = e
...
The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is [e]
, which coincides with the only class parameter.
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared  just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as undefined
, can assume the type.
Overlap
The instance declarations of an open type family used in a single program must be compatible, in the form defined above. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety.
Here are two examples to illustrate the condition under which overlap is permitted.
type instance F (a, Int) = [a]
type instance F (Int, b) = [b]  overlap permitted
type instance G (a, Int) = [a]
type instance G (Char, a) = [a]  ILLEGAL overlap, as [Char] /= [Int]
Decidability
In order to guarantee that type inference in the presence of type families is decidable, we need to place a number of additional restrictions on the formation of type instance declarations (c.f., Definition 5 (Relaxed Conditions) of Type Checking with Open Type Functions). Instance declarations have the general form
type instance F t1 .. tn = t
where we require that for every type family application (G s1 .. sm)
in t
,

s1 .. sm
do not contain any type family constructors,  the total number of symbols (data type constructors and type variables) in
s1 .. sm
is strictly smaller than int1 .. tn
, and  for every type variable
a
,a
occurs ins1 .. sm
at most as often as int1 .. tn
.
These restrictions are easily verified and ensure termination of type inference. However, they are not sufficient to guarantee completeness of type inference in the presence of, so called, loopy equalities, such as a ~ [F a]
, where a recursive occurrence of a type variable is underneath a family application and data constructor application  see the above mentioned paper for details.
If the option XUndecidableInstances is passed to the compiler, the above restrictions are not enforced and it is on the programmer to ensure termination of the normalisation of type families during type inference.
Equality constraints
Type context can include equality constraints of the form t1 ~ t2
, which denote that the types t1
and t2
need to be the same. In the presence of type families, whether two types are equal cannot generally be decided locally. Hence, the contexts of function signatures may include equality constraints, as in the following example:
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 > c2 > c2
where we require that the element type of c1
and c2
are the same. In general, the types t1
and t2
of an equality constraint may be arbitrary monotypes; i.e., they may not contain any quantifiers, independent of whether higherrank types are otherwise enabled.
Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form
class C a b  a > b
to
class (F a ~ b) => C a b where
type F a
That is, we represent every functional dependency (FD) a1 .. an > b
by an FD type family F a1 .. an
and a superclass context equality F a1 .. an ~ b
, essentially giving a name to the functional dependency. In class instances, we define the type instances of FD families in accordance with the class head. Method signatures are not affected by that process.
Frequently asked questions
Comparing type families and functional dependencies
Functional dependencies cover some of the same territory as type families. How do the two compare? There are some articles about this question:
 Experiences in converting functional dependencies to type families: "Functional dependencies vs. type families"
 GHC trac on a comparison of functional dependencies and type families
Injectivity, type inference, and ambiguity
A common problem is if:
type family F a
f :: F a > F a
f = undefined
g :: F Int > F Int
g x = f x
Then compiler complains about the definition of g:
Couldn't match expected type `F Int' against inferred type `F a1'.
In typechecking g's right hand side GHC discovers (by instantiating f's type with a fresh type variable) that it has type F a1 > F a1 for some asyetunknown type a1. Now it tries to make the inferred type match g's type signature. Well, you say, just make a1 equal to Int and you are done. True, but what if there were these instances
type instance F Int = Bool
type instance F Char = Bool
Then making a1 equal to Char would also make the two types equal. Because there is (potentially) more than one choice, the program is rejected.
However (and confusingly) if you omit the type signature on g altogether, thus
f :: F a > F a
f = undefined
g x = f x
GHC will happily infer the type g :: F a > F a. But you can't write that type signature or, indeed, the more specific one above. (Arguably this behaviour, where GHC infers a type it can't check, is very confusing. I suppose we could make GHC reject both programs, with and without type signatures.)
What is the problem? The nub of the issue is this: knowing that F t1=F t2 does not imply that t1 = t2. The difficulty is that the type function F need not be injective; it can map two distinct types to the same type. For an injective type constructor like Maybe, if we know that Maybe t1 = Maybe t2, then we know that t1 = t2. But not so for noninjective type functions.
The problem starts with f. Its type is ambiguous; even if I know the argument and result types for f, I cannot use that to find the type at which a should be instantiated. (So arguably, f should be rejected as having an ambiguous type, and probably will be in future.) The situation is well known in type classes:
bad :: (Read a, Show a) => String > String
bad x = show (read x)
At a call of bad one cannot tell at what type a should be instantiated.
The only solution is to avoid ambiguous types. In the type signature of a function,
 Ensure that every type variable occurs in the part after the "=>"
 Ensure that every type variable appears at least once outside a type function call.
Alternatively, you can use data families, which create new types and are therefore injective. The following code works:
data family F a
f :: F a > F a
f = undefined
g :: F Int > F Int
g x = f x
References
 Associated Types with Class. Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In Proceedings of The 32nd Annual ACM SIGPLANSIGACT Symposium on Principles of Programming Languages (POPL'05), pages 113, ACM Press, 2005.
 Associated Type Synonyms. Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming, ACM Press, pages 241253, 2005.
 System F with Type Equality Coercions. Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation, ACM Press, 2007.
 Type Checking With Open Type Functions. Tom Schrijvers, Simon PeytonJones, Manuel M. T. Chakravarty, Martin Sulzmann. In Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming, ACM Press, pages 5162, 2008.
 Fun with Type Functions Oleg Kiselyov, Simon Peyton Jones, Chungchieh Shan (the source for this paper can be found at http://patchtag.com/r/schoenfinkel/typefunctions/wiki )