# GHC/Type families

### From HaskellWiki

m (correction) |
(→Type instance declarations: example with branched instance simplification) |
||

(30 intermediate revisions by 15 users not shown) | |||

Line 1: | Line 1: | ||

− | [[ | + | Indexed type families, or '''type families''' for short, are a Haskell extension supporting ad-hoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of [[Type class|type classes]]: families are used to define overloaded ''data'' in the same way that classes are used to define overloaded ''functions''. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types. |

+ | |||

+ | Type families come in two flavors: ''data families'' and ''type synonym families''. Data families are the indexed form of data and newtype definitions. Type synonym families are the indexed form of type synonyms. Each of these flavors can be defined in a standalone manner or ''associated'' with a type class. Standalone definitions are more general, while associated types can more clearly express how a type is used and lead to better error messages. | ||

+ | |||

+ | ''NB: see also Simon's [http://hackage.haskell.org/trac/ghc/blog/LetGeneralisationInGhc7 blog entry on let generalisation] for a significant change in the policy for let generalisation, driven by the type family extension. In brief: a few programs will puzzlingly fail to compile with <tt>-XTypeFamilies</tt> even though the code is legal Haskell 98.'' | ||

== What are type families? == | == What are type families? == | ||

− | + | The concept of a type family comes from type theory. An indexed type family in type theory is a partial function at the type level. Applying the function to parameters (called ''type indices'') yields a type. Type families permit a program to compute what data constructors it will operate on, rather than having them fixed statically (as with simple type systems) or treated as opaque unknowns (as with parametrically polymorphic types). | |

− | + | Type families are to vanilla data types what type class methods are to regular functions. Vanilla polymorphic data types and functions have a single definition, which is used at all type instances. Classes and type families, on the other hand, have an interface definition and any number of instance definitions. A type family's interface definition declares its [[kind]] and its arity, or the number of type indices it takes. Instance definitions define the type family over some part of the domain. | |

− | + | As a simple example of how type families differ from ordinary parametric data types, consider a strict list type. We can represent a list of <hask>Char</hask> in the usual way, with cons cells. We can do the same thing to represent a list of <hask>()</hask>, but since a strict <hask>()</hask> value carries no useful information, it would be more efficient to just store the length of the list. This can't be done with an ordinary parametric data type, because the data constructors used to represent the list would depend on the list's type parameter: if it's <hask>Char</hask> then the list consists of cons cells; if it's <hask>()</hask>, then the list consists of a single integer. We basically want to select between two different data types based on a type parameter. Using type families, this list type could be declared as follows: | |

+ | |||

+ | <haskell> | ||

+ | -- Declare a list-like data family | ||

+ | data family XList a | ||

+ | |||

+ | -- Declare a list-like instance for Char | ||

+ | data instance XList Char = XCons !Char !(XList Char) | XNil | ||

+ | |||

+ | -- Declare a number-like instance for () | ||

+ | data instance XList () = XListUnit !Int | ||

+ | </haskell> | ||

− | + | The right-hand sides of the two <code>data instance</code> declarations are exactly ordinary data definitions. In fact, a <code>data instance</code> declaration is nothing more than a shorthand for a <code>data</code> declaration followed by a <code>type instance</code> (see below) declaration. However, they define two instances of the same parametric data type, <hask>XList Char</hask> and <hask>XList ()</hask>, whereas ordinary data declarations define completely unrelated types. A recent [[Simonpj/Talk:FunWithTypeFuns|tutorial paper]] has more in-depth examples of programming with type families. | |

− | + | [[GADT]]s bear some similarity to type families, in the sense that they allow a parametric type's constructors to depend on the type's parameters. However, all GADT constructors must be defined in one place, whereas type families can be extended. Functional dependences are similar to type families, and many type classes that use functional dependences can be equivalently expressed with type families. Type families provide a more functional style of type-level programming than the relational style of functional dependences. | |

== What do I need to use type families? == | == What do I need to use type families? == | ||

− | The first stable release of GHC that properly supports type families is 6.10.1. ( | + | Type families are a GHC extension enabled with the <code>-XTypeFamilies</code> flag or the <code>{-# LANGUAGE TypeFamilies #-}</code> pragma. The first stable release of GHC that properly supports type families is 6.10.1. (The 6.8 release included an early partial implementation, but its use is deprecated.) Please [http://hackage.haskell.org/trac/ghc/query?status=new&status=assigned&status=reopened&group=priority&type=bug&order=id&desc=1 report bugs] via the GHC bug tracker, ideally accompanied by a small example program that demonstrates the problem. Use the [mailto:glasgow-haskell-users@haskell.org GHC mailing list] for questions or for a discussion of this language extension and its description on this wiki page. |

== An associated data type example == | == An associated data type example == | ||

Line 31: | Line 46: | ||

insert :: k -> v -> GMap k v -> GMap k v | insert :: k -> v -> GMap k v -> GMap k v | ||

</haskell> | </haskell> | ||

− | The interesting part is the ''associated data family'' declaration of the class. It gives a [http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html#data-family-declarations ''kind signature''] (here <hask>* -> *</hask>) for the associated data type <hask>GMap k</hask> - | + | The interesting part is the ''associated data family'' declaration of the class. It gives a [http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html#data-family-declarations ''kind signature''] (here <hask>* -> *</hask>) for the associated data type <hask>GMap k</hask> - analogous to how methods receive a type signature in a class declaration. |

What it is important to notice is that the first parameter of the associated type <hask>GMap</hask> coincides with the class parameter of <hask>GMapKey</hask>. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multi-parameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts. | What it is important to notice is that the first parameter of the associated type <hask>GMap</hask> coincides with the class parameter of <hask>GMapKey</hask>. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multi-parameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts. | ||

Line 85: | Line 100: | ||

insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2) | insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2) | ||

</haskell> | </haskell> | ||

− | If you find this code algorithmically surprising, I'd suggest to have a look at [http://www. | + | If you find this code algorithmically surprising, I'd suggest to have a look at [http://www.cs.ox.ac.uk/ralf.hinze/publications/index.html#J4 Ralf Hinze's paper]. The only novelty concerning associated types, in these two instances, is that the instances have a context <hask>(GMapKey a, GMapKey b)</hask>. Consequently, the right hand sides of the associated type declarations can use <hask>GMap</hask> recursively at the key types <hask>a</hask> and <hask>b</hask> - not unlike the method definitions use the class methods recursively at the types for which the class is given in the instance context. |

=== Using a generic map === | === Using a generic map === | ||

Line 104: | Line 119: | ||

=== Download the code === | === Download the code === | ||

− | If you want to play with this example without copying it off the wiki, just download the [http://darcs.haskell.org/testsuite/tests/ghc-regress/indexed-types/should_run/GMapAssoc.hs | + | If you want to play with this example without copying it off the wiki, just download the source code[http://darcs.haskell.org/testsuite/tests/ghc-regress/indexed-types/should_run/GMapAssoc.hs] for <hask>GMap</hask> from GHC's test suite. |

== Detailed definition of data families == | == Detailed definition of data families == | ||

Line 169: | Line 184: | ||

<haskell> | <haskell> | ||

instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where | instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where | ||

− | data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v | + | data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v) |

... | ... | ||

</haskell> | </haskell> | ||

Line 341: | Line 356: | ||

</haskell> | </haskell> | ||

− | === | + | === Type instance declarations === |

− | + | There are two forms of type family instance declaration: unbranched and | |

+ | branched. Branched instances list any number of alternatives, to be | ||

+ | checked in order from top to bottom, similarly to normal function | ||

+ | declarations. Unbranched instances supply only one left-hand side. | ||

+ | |||

+ | Unbranched instance declarations of type families are very similar to standard type synonym declarations. The only two differences are that the keyword <hask>type</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible - these are the exact same requirements as for data instances. For example, the <hask>[e]</hask> instance for <hask>Elem</hask> is | ||

<haskell> | <haskell> | ||

type instance Elem [e] = e | type instance Elem [e] = e | ||

</haskell> | </haskell> | ||

+ | Branched instance declarations, on the other hand, allow many different | ||

+ | left-hand-side type patterns. These patterns are tried in order, from | ||

+ | top to bottom, when simplifying a type family application. A branched instance | ||

+ | declaration is introduced by <hask>type instance where</hask>. For example: | ||

+ | <haskell> | ||

+ | type instance where | ||

+ | F Int = Double | ||

+ | F Bool = Char | ||

+ | F a = String | ||

+ | </haskell> | ||

+ | |||

+ | In this example, we declare an instance for <hask>F</hask> such | ||

+ | that <hask>F Int</hask> simplifies to <hask>Double</hask>, | ||

+ | <hask>F Bool</hask> simplifies to <hask>Char</hask>, and for | ||

+ | any other type <hask>a</hask> that is known not to be | ||

+ | <hask>Int</hask> or <hask>Bool</hask>, <hask>F a</hask> | ||

+ | simplifies to <hask>String</hask>. Note that GHC must | ||

+ | be sure that <hask>a</hask> cannot unify with | ||

+ | <hask>Int</hask> or <hask>Bool</hask> in that last case; if | ||

+ | a programmer specifies just <hask>F a</hask> in their code, GHC will | ||

+ | not be able to simplify the type. After all, <hask>a</hask> might later | ||

+ | be instantiated with <hask>Int</hask>. | ||

+ | |||

+ | Branched instances and unbranched instances may be mixed freely for the same | ||

+ | type family. | ||

+ | |||

A type family instance declaration must satisfy the following rules: | A type family instance declaration must satisfy the following rules: | ||

* An appropriate family declaration is in scope - just like class instances require the class declaration to be visible. | * An appropriate family declaration is in scope - just like class instances require the class declaration to be visible. | ||

Line 362: | Line 408: | ||

type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter | type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter | ||

type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type | type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type | ||

+ | type instance where -- OK! | ||

+ | F (Maybe Int) = Int | ||

+ | F (Maybe Bool) = Bool | ||

+ | F (Maybe a) = String | ||

+ | type instance where -- WRONG: conflicts with earlier instances (see below) | ||

+ | F Int = Float | ||

+ | F a = [a] | ||

type family G a b :: * -> * | type family G a b :: * -> * | ||

Line 367: | Line 420: | ||

type instance G Int Char Float = Double -- WRONG: must be two type parameters | type instance G Int Char Float = Double -- WRONG: must be two type parameters | ||

</haskell> | </haskell> | ||

+ | |||

+ | ==== Branched instance simplification ==== | ||

+ | |||

+ | When dealing with branched instances, simplifying the type is harder than just finding a left-hand side that matches and replacing that with a right-hand side. GHC will select a type instance branch to use in a given type family application if and only if the following 2 conditions hold: | ||

+ | |||

+ | # There is a substitution from the variables in the branch statement that makes the left-hand side of the branch coincide with the type family application. | ||

+ | # There exists ''no'' substitution from the variables in the branch statement ''and the variables in the type family application'' that make the left-hand side of the branch coincide with the (substituted) type family application. | ||

+ | |||

+ | Another way to state (2) is that the left-hand side of the branch must not ''unify'' with the type family application. | ||

+ | |||

+ | Why do this? It's a matter of type safety. Consider this example: | ||

+ | |||

+ | <haskell> | ||

+ | type family J a b | ||

+ | type instance where | ||

+ | J a a = Int | ||

+ | J a b = Bool | ||

+ | </haskell> | ||

+ | |||

+ | Say GHC selected the second branch just because the first doesn't apply at the moment, because two type variables are distinct. The problem is that those variables might later be instantiated at the same value, and then the first branch would have applied. You can convince this sort of inconsistency to produce <hask>unsafeCoerce</hask>. | ||

+ | |||

+ | It gets worse. GHC has no internal notion of inequality, so it can't use previous, failed term-level GADT pattern matches to refine its type assumptions. For example: | ||

+ | |||

+ | <haskell> | ||

+ | data G :: * -> * where | ||

+ | GInt :: G Int | ||

+ | GBool :: G Bool | ||

+ | |||

+ | type family Foo (a :: *) :: * | ||

+ | type instance where | ||

+ | Foo Int = Char | ||

+ | Foo a = Double | ||

+ | |||

+ | bar :: G a -> Foo a | ||

+ | bar GInt = 'x' | ||

+ | bar _ = 3.14 | ||

+ | </haskell> | ||

+ | |||

+ | The last line will fail to typecheck, because GHC doesn't know that the type variable <hask>a</hask> can't be <hask>Int</hask> here, even though it's obvious. The only general way to fix this is to have inequality evidence introduced into GHC, and that's a big deal and we don't know if we have the motivation for such a change yet. | ||

==== Associated type instances ==== | ==== Associated type instances ==== | ||

Line 379: | Line 471: | ||

Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type. | Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type. | ||

+ | |||

+ | Branched instances are not allowed for associated types. | ||

==== Overlap ==== | ==== Overlap ==== | ||

− | The instance declarations of a type family used in a single program may only overlap if the right-hand sides of the overlapping instances coincide for the overlapping types. More formally, two instance declarations overlap if there is a substitution that makes the left-hand sides of the instances syntactically the same. Whenever that is the case, the right-hand sides of the instances must also be syntactically equal under the same substitution. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety. | + | The unbranched instance declarations of a type family used in a single program may only overlap if the right-hand sides of the overlapping instances coincide for the overlapping types. More formally, two instance declarations overlap if there is a substitution that makes the left-hand sides of the instances syntactically the same. Whenever that is the case, if the instances are unbranched, the right-hand sides of the instances must also be syntactically equal under the same substitution. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety. Branched instances are not permitted to overlap with any other instances, branched or unbranched. |

Here are two examples to illustrate the condition under which overlap is permitted. | Here are two examples to illustrate the condition under which overlap is permitted. | ||

Line 391: | Line 485: | ||

type instance G (a, Int) = [a] | type instance G (a, Int) = [a] | ||

type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int] | type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int] | ||

+ | |||

+ | type instance H Int = Int | ||

+ | type instance where -- ILLEGAL overlap, as branched instances may not overlap | ||

+ | H a = a | ||

</haskell> | </haskell> | ||

Line 429: | Line 527: | ||

== Frequently asked questions == | == Frequently asked questions == | ||

+ | |||

+ | === Comparing type families and functional dependencies === | ||

+ | |||

+ | Functional dependencies cover some of the same territory as type families. How do the two compare? | ||

+ | There are some articles about this question: | ||

+ | |||

+ | * Experiences in converting functional dependencies to type families: "[[Functional dependencies vs. type families]]" | ||

+ | * [http://hackage.haskell.org/trac/ghc/wiki/TFvsFD GHC trac] on a comparison of functional dependencies and type families | ||

=== Injectivity, type inference, and ambiguity === | === Injectivity, type inference, and ambiguity === | ||

Line 446: | Line 552: | ||

Couldn't match expected type `F Int' against inferred type `F a1' | Couldn't match expected type `F Int' against inferred type `F a1' | ||

</haskell> | </haskell> | ||

− | In type-checking <tt>g</tt>'s right hand side GHC discovers (by instantiating f's type with a fresh type variable) that it has type <tt>F a1 -> F a1</tt> for some as-yet-unknown type <tt>a1</tt>. Now it tries to make the inferred type match <tt>g</tt>'s type signature. Well, you say, just make < | + | In type-checking <tt>g</tt>'s right hand side GHC discovers (by instantiating <tt>f</tt>'s type with a fresh type variable) that it has type <tt>F a1 -> F a1</tt> for some as-yet-unknown type <tt>a1</tt>. Now it tries to make the inferred type match <tt>g</tt>'s type signature. Well, you say, just make <tt>a1</tt> equal to <tt>Int</tt> and you are done. True, but what if there were these instances |

<haskell> | <haskell> | ||

type instance F Int = Bool | type instance F Int = Bool | ||

type instance F Char = Bool | type instance F Char = Bool | ||

</haskell> | </haskell> | ||

− | Then making <tt>a1</tt> equal to <tt>Char</tt> would ''also'' make the two types equal. Because there is more than one choice, the program is rejected. | + | Then making <tt>a1</tt> equal to <tt>Char</tt> would ''also'' make the two types equal. Because there is (potentially) more than one choice, the program is rejected. |

+ | However (and confusingly) if you omit the type signature on <tt>g</tt> altogether, thus | ||

+ | <haskell> | ||

+ | f :: F a -> F a | ||

+ | f = undefined | ||

+ | |||

+ | g x = f x | ||

+ | </haskell> | ||

+ | GHC will happily infer the type <tt>g :: F a -> F a</tt>. But you can't ''write'' that type signature or, indeed, the more specific one above. (Arguably this behaviour, where GHC ''infers'' a type it can't ''check'', is very confusing. I suppose we could make GHC reject both programs, with and without type signatures.) | ||

+ | |||

+ | '''What is the problem?''' The nub of the issue is this: knowing that <tt>F t1</tt>=<tt>F t2</tt> does ''not'' imply that <tt>t1</tt> = <tt>t2</tt>. | ||

The difficulty is that the type function <tt>F</tt> need not be ''injective''; it can map two distinct types to the same type. For an injective type constructor like <tt>Maybe</tt>, if we know that <tt>Maybe t1</tt> = <tt>Maybe t2</tt>, then we know that <tt>t1</tt> = <tt>t2</tt>. But not so for non-injective type functions. | The difficulty is that the type function <tt>F</tt> need not be ''injective''; it can map two distinct types to the same type. For an injective type constructor like <tt>Maybe</tt>, if we know that <tt>Maybe t1</tt> = <tt>Maybe t2</tt>, then we know that <tt>t1</tt> = <tt>t2</tt>. But not so for non-injective type functions. | ||

Line 464: | Line 580: | ||

The only solution is to avoid ambiguous types. In the type signature of a function, | The only solution is to avoid ambiguous types. In the type signature of a function, | ||

* Ensure that every type variable occurs in the part after the "<tt>=></tt>" | * Ensure that every type variable occurs in the part after the "<tt>=></tt>" | ||

− | * Ensure that every type variable appears at least once outside a type function call. | + | * Ensure that every type variable appears at least once outside a type function call. |

+ | Alternatively, you can use data families, which create new types and are therefore injective. The following code works: | ||

− | + | <haskell>data family F a | |

− | <haskell> | + | |

− | f :: F a -> | + | f :: F a -> F a |

f = undefined | f = undefined | ||

− | + | ||

− | g :: F | + | g :: F Int -> F Int |

− | g x = | + | g x = f x</haskell> |

− | </haskell> | + | |

− | + | ||

== References == | == References == | ||

Line 482: | Line 597: | ||

* [http://www.cse.unsw.edu.au/~chak/papers/SCPD07.html System F with Type Equality Coercions.] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In ''Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation'', ACM Press, 2007. | * [http://www.cse.unsw.edu.au/~chak/papers/SCPD07.html System F with Type Equality Coercions.] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In ''Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation'', ACM Press, 2007. | ||

* [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking With Open Type Functions.] Tom Schrijvers, Simon Peyton-Jones, Manuel M. T. Chakravarty, Martin Sulzmann. In ''Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 51-62, 2008. | * [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking With Open Type Functions.] Tom Schrijvers, Simon Peyton-Jones, Manuel M. T. Chakravarty, Martin Sulzmann. In ''Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 51-62, 2008. | ||

+ | * [[Simonpj/Talk:FunWithTypeFuns | Fun with Type Functions]] Oleg Kiselyov, Simon Peyton Jones, Chung-chieh Shan (the source for this paper can be found at http://patch-tag.com/r/schoenfinkel/typefunctions/wiki ) | ||

[[Category:Type-level programming]] | [[Category:Type-level programming]] | ||

− | [[Category:Language | + | [[Category:Language extensions]] |

+ | [[Category:GHC|Indexed types]] |

## Revision as of 14:58, 4 April 2013

Indexed type families, or **type families** for short, are a Haskell extension supporting ad-hoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of type classes: families are used to define overloaded *data* in the same way that classes are used to define overloaded *functions*. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types.

Type families come in two flavors: *data families* and *type synonym families*. Data families are the indexed form of data and newtype definitions. Type synonym families are the indexed form of type synonyms. Each of these flavors can be defined in a standalone manner or *associated* with a type class. Standalone definitions are more general, while associated types can more clearly express how a type is used and lead to better error messages.

*NB: see also Simon's blog entry on let generalisation for a significant change in the policy for let generalisation, driven by the type family extension. In brief: a few programs will puzzlingly fail to compile with -XTypeFamilies even though the code is legal Haskell 98.*

## Contents |

## 1 What are type families?

The concept of a type family comes from type theory. An indexed type family in type theory is a partial function at the type level. Applying the function to parameters (called *type indices*) yields a type. Type families permit a program to compute what data constructors it will operate on, rather than having them fixed statically (as with simple type systems) or treated as opaque unknowns (as with parametrically polymorphic types).

Type families are to vanilla data types what type class methods are to regular functions. Vanilla polymorphic data types and functions have a single definition, which is used at all type instances. Classes and type families, on the other hand, have an interface definition and any number of instance definitions. A type family's interface definition declares its kind and its arity, or the number of type indices it takes. Instance definitions define the type family over some part of the domain.

As a simple example of how type families differ from ordinary parametric data types, consider a strict list type. We can represent a list of-- Declare a list-like data family data family XList a -- Declare a list-like instance for Char data instance XList Char = XCons !Char !(XList Char) | XNil -- Declare a number-like instance for () data instance XList () = XListUnit !Int

`data instance`

declarations are exactly ordinary data definitions. In fact, a `data instance`

declaration is nothing more than a shorthand for a `data`

declaration followed by a `type instance`

(see below) declaration. However, they define two instances of the same parametric data type, GADTs bear some similarity to type families, in the sense that they allow a parametric type's constructors to depend on the type's parameters. However, all GADT constructors must be defined in one place, whereas type families can be extended. Functional dependences are similar to type families, and many type classes that use functional dependences can be equivalently expressed with type families. Type families provide a more functional style of type-level programming than the relational style of functional dependences.

## 2 What do I need to use type families?

Type families are a GHC extension enabled with the `-XTypeFamilies`

flag or the `{-# LANGUAGE TypeFamilies #-}`

pragma. The first stable release of GHC that properly supports type families is 6.10.1. (The 6.8 release included an early partial implementation, but its use is deprecated.) Please report bugs via the GHC bug tracker, ideally accompanied by a small example program that demonstrates the problem. Use the GHC mailing list for questions or for a discussion of this language extension and its description on this wiki page.

## 3 An associated data type example

As an example, consider Ralf Hinze's generalised tries, a form of generic finite maps.

### 3.1 The class declaration

We define a type class whose instances are the types that we can use as keys in our generic maps:

class GMapKey k where data GMap k :: * -> * empty :: GMap k v lookup :: k -> GMap k v -> Maybe v insert :: k -> v -> GMap k v -> GMap k v

*associated data family*declaration of the class. It gives a

*kind signature*(here

### 3.2 An Int instance

To use Ints as keys into generic maps, we declare an instance that simply usesinstance GMapKey Int where data GMap Int v = GMapInt (Data.IntMap.IntMap v) empty = GMapInt Data.IntMap.empty lookup k (GMapInt m) = Data.IntMap.lookup k m insert k v (GMapInt m) = GMapInt (Data.IntMap.insert k v m)

NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundamental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future.

As an exercise, implement an instance for### 3.3 A unit instance

Generic definitions, apart from elementary types, typically cover units, products, and sums. We start here with the unit instance forinstance GMapKey () where data GMap () v = GMapUnit (Maybe v) empty = GMapUnit Nothing lookup () (GMapUnit v) = v insert () v (GMapUnit _) = GMapUnit $ Just v

### 3.4 Product and sum instances

Next, let us define the instances for pairs and sums (i.e.,instance (GMapKey a, GMapKey b) => GMapKey (a, b) where data GMap (a, b) v = GMapPair (GMap a (GMap b v)) empty = GMapPair empty lookup (a, b) (GMapPair gm) = lookup a gm >>= lookup b insert (a, b) v (GMapPair gm) = GMapPair $ case lookup a gm of Nothing -> insert a (insert b v empty) gm Just gm2 -> insert a (insert b v gm2 ) gm instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v) empty = GMapEither empty empty lookup (Left a) (GMapEither gm1 _gm2) = lookup a gm1 lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2 insert (Left a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2 insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2)

### 3.5 Using a generic map

Finally, some code building and querying a generic map:

myGMap :: GMap (Int, Either Char ()) String myGMap = insert (5, Left 'c') "(5, Left 'c')" $ insert (4, Right ()) "(4, Right ())" $ insert (5, Right ()) "This is the one!" $ insert (5, Right ()) "This is the two!" $ insert (6, Right ()) "(6, Right ())" $ insert (5, Left 'a') "(5, Left 'a')" $ empty main = putStrLn $ maybe "Couldn't find key!" id $ lookup (5, Right ()) myGMap

### 3.6 Download the code

If you want to play with this example without copying it off the wiki, just download the source code[1] for## 4 Detailed definition of data families

Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.

### 4.1 Family declarations

Indexed data families are introduced by a signature, such as

data family GMap k :: * -> *

`data family Array e`

data family Array :: * -> *

#### 4.1.1 Associated family declarations

When a data family is declared as part of a type class, we drop theclass GMapKey k where data GMap k :: * -> * ...

In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type-indices. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. In other words: **the named type parameters of the data declaration must be a permutation of a subset of the class variables**.

Example is admissible:

class C a b c where { data T c a :: * } -- OK class C a b c where { data T a a :: * } -- Bad: repeated variable class D a where { data T a x :: * } -- Bad: x is not a class variable class D a where { data T a :: * -> * } -- OK

### 4.2 Instance declarations

Instance declarations of data and newtype families are very similar to standard data and newtype declarations. The only two differences are that the keyworddata instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)

In this example, the declaration has only one variant. In general, it can be any number.

Data and newtype instance declarations are only legit when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although all data families are declared with theEven if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:

data family T a data instance T Int = A data instance T Char = B nonsense :: T a -> Int nonsense A = 1 -- WRONG: These two equations together... nonsense B = 2 -- ...will produce a type error.

Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families - in contrast to GADTs - are *open*; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct.

#### 4.2.1 Associated type instances

When an associated family instance is declared within a type class instance, we drop theinstance (GMapKey a, GMapKey b) => GMapKey (Either a b) where data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v) ...

#### 4.2.2 Scoping of class parameters

In the case of multi-parameter type classes, the visibility of class parameters in the right-hand side of associated family instances depends *solely* on the parameters of the data family. As an example, consider the simple class declaration

class C a b where data T a

Only one of the two class parameters is a parameter to the data family. Hence, the following instance declaration is invalid:

instance C [c] d where data T [c] = MkT (c, d) -- WRONG!! 'd' is not in scope

#### 4.2.3 Type class instances of family instances

Type class instances of instances of data families can be defined as usual, and in particular data instance declarations can havedata GMap () v = GMapUnit (Maybe v) deriving Show

which implicitly defines an instance of the form

instance Show v => Show (GMap () v) where ...

Note that class instances are always for particular *instances* of a data family and never for an entire family as a whole. This is for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of *different* instances of a single type family. It would require a form of extensible case construct.

#### 4.2.4 Overlap

The instance declarations of a data family used in a single program may not overlap at all, independent of whether they are associated or not. In contrast to type class instances, this is not only a matter of consistency, but one of type safety.

### 4.3 Import and export

The association of data constructors with type families is more dynamic than that is the case with standard data and newtype declarations. In the standard case, the notation#### 4.3.1 Associated families

As expected, an import or export item of the form#### 4.3.2 Examples

Assuming our running- : Exports just the class name.module GMap (GMapKey) where...
- : Exports the class, the associated typemodule GMap (GMapKey(..)) where...and the member functionsGMap,empty, andlookup. None of the data constructors is exported.insert
- : As before, but also exports all the data constructorsmodule GMap (GMapKey(..), GMap(..)) where...,GMapInt,GMapChar,GMapUnit, andGMapPair.GMapEither
- : As before.module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...
- : As before.module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...

#### 4.3.3 Instances

Family instances are implicitly exported, just like class instances. However, this applies only to the heads of instances, not to the data constructors an instance defines.

## 5 An associated type synonym example

Type synonym families are an alternative to functional dependencies, which makes functional dependency examples well suited to introduce type synonym families. In fact, type families are a more functional way to express the same as functional dependencies (despite the name!), as they replace the relational notation of functional dependencies by an expression-oriented notation; i.e., functions on types are really represented by functions and not relations.

### 5.1 The class declaration

Here's an example from Mark Jones' seminal paper on functional dependencies:

class Collects e ce | ce -> e where empty :: ce insert :: e -> ce -> ce member :: e -> ce -> Bool toList :: ce -> [e]

With associated type synonyms we can write this as

class Collects ce where type Elem ce empty :: ce insert :: Elem ce -> ce -> ce member :: Elem ce -> ce -> Bool toList :: ce -> [Elem ce]

### 5.2 An instance

Instances change correspondingly. An instance of the two-parameter class

instance Eq e => Collects e [e] where empty = [] insert e l = (e:l) member e [] = False member e (x:xs) | e == x = True | otherwise = member e xs toList l = l

becomes an instance of a single-parameter class, where the dependent type parameter turns into an associated type instance declaration:

instance Eq e => Collects [e] where type Elem [e] = e empty = [] insert e l = (e:l) member e [] = False member e (x:xs) | e == x = True | otherwise = member e xs toList l = l

### 5.3 Using generic collections

With Functional Dependencies the code would be:

sumCollects :: (Collects e c1, Collects e c2) => c1 -> c2 -> c2 sumCollects c1 c2 = foldr insert c2 (toList c1)

In contrast, with associated type synonyms, we get:

sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2 sumCollects c1 c2 = foldr insert c2 (toList c1)

## 6 Detailed definition of type synonym families

Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.

### 6.1 Family declarations

Indexed type families are introduced by a signature, such as

type family Elem c :: *

`type family Elem c`

Parameters can also be given explicit kind signatures if needed. We call the number of parameters in a type family declaration, the family's arity, and all applications of a type family must be fully saturated w.r.t. to that arity. This requirement is unlike ordinary type synonyms and it implies that the kind of a type family is not sufficient to determine a family's arity, and hence in general, also insufficient to determine whether a type family application is well formed. As an example, consider the following declaration:

type family F a b :: * -> * -- F's arity is 2, -- although its overall kind is * -> * -> * -> *

Given this declaration the following are examples of well-formed and malformed types:

F Char [Int] -- OK! Kind: * -> * F Char [Int] Bool -- OK! Kind: * F IO Bool -- WRONG: kind mismatch in the first argument F Bool -- WRONG: unsaturated application

#### 6.1.1 Associated family declarations

When a type family is declared as part of a type class, we drop theclass Collects ce where type Elem ce :: * ...

Exactly as in the case of an associated data declaration, **the named type parameters must be a permutation of a subset of the class parameters**. Examples

class C a b c where { type T c a :: * } -- OK class D a where { type T a x :: * } -- No: x is not a class parameter class D a where { type T a :: * -> * } -- OK

### 6.2 Type instance declarations

There are two forms of type family instance declaration: unbranched and branched. Branched instances list any number of alternatives, to be checked in order from top to bottom, similarly to normal function declarations. Unbranched instances supply only one left-hand side.

Unbranched instance declarations of type families are very similar to standard type synonym declarations. The only two differences are that the keywordtype instance Elem [e] = e

Branched instance declarations, on the other hand, allow many different left-hand-side type patterns. These patterns are tried in order, from top to bottom, when simplifying a type family application. A branched instance

declaration is introduced bytype instance where F Int = Double F Bool = Char F a = String

Branched instances and unbranched instances may be mixed freely for the same type family.

A type family instance declaration must satisfy the following rules:

- An appropriate family declaration is in scope - just like class instances require the class declaration to be visible.
- The instance declaration conforms to the kind determined by its family declaration
- The number of type parameters in an instance declaration matches the number of type parameters in the family declaration.
- The right-hand side of a type instance must be a monotype (i.e., it may not include foralls) and after the expansion of all saturated vanilla type synonyms, no synonyms, except family synonyms may remain.

Here are some examples of admissible and illegal type instances:

type family F a :: * type instance F [Int] = Int -- OK! type instance F String = Char -- OK! type instance F (F a) = a -- WRONG: type parameter mentions a type family type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type type instance where -- OK! F (Maybe Int) = Int F (Maybe Bool) = Bool F (Maybe a) = String type instance where -- WRONG: conflicts with earlier instances (see below) F Int = Float F a = [a] type family G a b :: * -> * type instance G Int = (,) -- WRONG: must be two type parameters type instance G Int Char Float = Double -- WRONG: must be two type parameters

#### 6.2.1 Branched instance simplification

When dealing with branched instances, simplifying the type is harder than just finding a left-hand side that matches and replacing that with a right-hand side. GHC will select a type instance branch to use in a given type family application if and only if the following 2 conditions hold:

- There is a substitution from the variables in the branch statement that makes the left-hand side of the branch coincide with the type family application.
- There exists
*no*substitution from the variables in the branch statement*and the variables in the type family application*that make the left-hand side of the branch coincide with the (substituted) type family application.

Another way to state (2) is that the left-hand side of the branch must not *unify* with the type family application.

Why do this? It's a matter of type safety. Consider this example:

type family J a b type instance where J a a = Int J a b = Bool

It gets worse. GHC has no internal notion of inequality, so it can't use previous, failed term-level GADT pattern matches to refine its type assumptions. For example:

data G :: * -> * where GInt :: G Int GBool :: G Bool type family Foo (a :: *) :: * type instance where Foo Int = Char Foo a = Double bar :: G a -> Foo a bar GInt = 'x' bar _ = 3.14

#### 6.2.2 Associated type instances

When an associated family instance is declared within a type class instance, we drop theinstance (Eq (Elem [e])) => Collects ([e]) where type Elem [e] = e ...

Branched instances are not allowed for associated types.

#### 6.2.3 Overlap

The unbranched instance declarations of a type family used in a single program may only overlap if the right-hand sides of the overlapping instances coincide for the overlapping types. More formally, two instance declarations overlap if there is a substitution that makes the left-hand sides of the instances syntactically the same. Whenever that is the case, if the instances are unbranched, the right-hand sides of the instances must also be syntactically equal under the same substitution. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety. Branched instances are not permitted to overlap with any other instances, branched or unbranched.

Here are two examples to illustrate the condition under which overlap is permitted.

type instance F (a, Int) = [a] type instance F (Int, b) = [b] -- overlap permitted type instance G (a, Int) = [a] type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int] type instance H Int = Int type instance where -- ILLEGAL overlap, as branched instances may not overlap H a = a

#### 6.2.4 Decidability

In order to guarantee that type inference in the presence of type families is decidable, we need to place a number of additional restrictions on the formation of type instance declarations (c.f., Definition 5 (Relaxed Conditions) of Type Checking with Open Type Functions). Instance declarations have the general form

type instance F t1 .. tn = t

- do not contain any type family constructors,s1 .. sm
- the total number of symbols (data type constructors and type variables) in is strictly smaller than ins1 .. sm, andt1 .. tn
- for every type variable ,aoccurs inaat most as often as ins1 .. sm.t1 .. tn

*loopy equalities*, such as

If the option `-XUndecidableInstances` is passed to the compiler, the above restrictions are not enforced and it is on the programmer to ensure termination of the normalisation of type families during type inference.

### 6.3 Equality constraints

Type context can include equality constraints of the formsumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2

Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form

class C a b | a -> b

to

class (F a ~ b) => C a b where type F a

NB: Equalities in superclass contexts are not fully implemented in the GHC 6.10 branch.

## 7 Frequently asked questions

### 7.1 Comparing type families and functional dependencies

Functional dependencies cover some of the same territory as type families. How do the two compare? There are some articles about this question:

- Experiences in converting functional dependencies to type families: "Functional dependencies vs. type families"
- GHC trac on a comparison of functional dependencies and type families

### 7.2 Injectivity, type inference, and ambiguity

A common problem is this

type family F a f :: F a -> F a f = undefined g :: F Int -> F Int g x = f x

The compiler complains about the definition of `g` saying

Couldn't match expected type `F Int' against inferred type `F a1'

In type-checking `g`'s right hand side GHC discovers (by instantiating `f`'s type with a fresh type variable) that it has type `F a1 -> F a1` for some as-yet-unknown type `a1`. Now it tries to make the inferred type match `g`'s type signature. Well, you say, just make `a1` equal to `Int` and you are done. True, but what if there were these instances

type instance F Int = Bool type instance F Char = Bool

Then making `a1` equal to `Char` would *also* make the two types equal. Because there is (potentially) more than one choice, the program is rejected.

However (and confusingly) if you omit the type signature on `g` altogether, thus

f :: F a -> F a f = undefined g x = f x

GHC will happily infer the type `g :: F a -> F a`. But you can't *write* that type signature or, indeed, the more specific one above. (Arguably this behaviour, where GHC *infers* a type it can't *check*, is very confusing. I suppose we could make GHC reject both programs, with and without type signatures.)

**What is the problem?** The nub of the issue is this: knowing that `F t1`=`F t2` does *not* imply that `t1` = `t2`.
The difficulty is that the type function `F` need not be *injective*; it can map two distinct types to the same type. For an injective type constructor like `Maybe`, if we know that `Maybe t1` = `Maybe t2`, then we know that `t1` = `t2`. But not so for non-injective type functions.

The problem starts with `f`. Its type is *ambiguous*; even if I know the argument and result types for `f`, I cannot use that to find the type at which `a` should be instantiated. (So arguably, `f` should be rejected as having an ambiguous type, and probably will be in future.) The situation is well known in type classes:

bad :: (Read a, Show a) => String -> String bad x = show (read x)

At a call of `bad` one cannot tell at what type `a` should be instantiated.

The only solution is to avoid ambiguous types. In the type signature of a function,

- Ensure that every type variable occurs in the part after the "
`=>`" - Ensure that every type variable appears at least once outside a type function call.

Alternatively, you can use data families, which create new types and are therefore injective. The following code works:

data family F a f :: F a -> F a f = undefined g :: F Int -> F Int g x = f x

## 8 References

- Associated Types with Class. Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In
*Proceedings of The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05)*, pages 1-13, ACM Press, 2005. - Associated Type Synonyms. Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In
*Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming*, ACM Press, pages 241-253, 2005. - System F with Type Equality Coercions. Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In
*Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation*, ACM Press, 2007. - Type Checking With Open Type Functions. Tom Schrijvers, Simon Peyton-Jones, Manuel M. T. Chakravarty, Martin Sulzmann. In
*Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming*, ACM Press, pages 51-62, 2008. - Fun with Type Functions Oleg Kiselyov, Simon Peyton Jones, Chung-chieh Shan (the source for this paper can be found at http://patch-tag.com/r/schoenfinkel/typefunctions/wiki )