- 1 Data and type families in GHC
- 2 What do I need to use data families?
- 3 An associated data type example
- 4 Definition of the type system extension
- 5 A associated type synonym example
- 6 Open issues
Data and type families in GHC
Data and type families are a new addition to GHC's type extensions. They are currently an experimental feature and so their design may still change to some degree. The current implementation covers data families (including newtypes), which can either be on the toplevel or inside class declarations - the latter are also known as associated data types.
NB: Please note that type families (and hence also associated type synonyms) are not fully implemented yet. Any attempt to use them will lead to strange error messages. For the moment, you can only use
What do I need to use data families?
Indexed type families are implemented in GHC's HEAD (from version 6.7), which you can obtain in binary form from the nightly snapshots or in source form from the source repository. To enable indexed type families in GHC 6.7, you need to supply the compiler option -ftype-families (or -fglasgow-exts, which implies it). NB: The option used to be called -findexed-types for a while.
An associated data type example
As an example, consider Ralf Hinze's generalised tries, a form of generic finite maps.
The class declaration
We define a type class whose instances are the types that we can use as keys in our generic maps:
class GMapKey k where data GMap k :: * -> * empty :: GMap k v lookup :: k -> GMap k v -> Maybe v insert :: k -> v -> GMap k v -> GMap k v
The interesting part is the associated data family declaration of the class. It gives a kind signature (here
* -> *) for the associated data type
GMap k - analog to how methods receive a type signature in a class declaration.
What it is important to notice is that the first parameter of the associated type
GMap coincides with the class parameter of
GMapKey. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multi-parameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts.
The second important point is that as
GMap k has kind
* -> * and
k (implicitly) has kind
*, the type constructor
GMap (without an argument) has kind
* -> * -> *. Consequently, we see that
GMap is applied to two arguments in the signatures of the methods
An Int instance
To use Ints as keys into generic maps, we declare an instance that simply uses
instance GMapKey Int where data GMap Int v = GMapInt (Data.Map.Map Int v) empty = GMapInt Data.Map.empty lookup k (GMapInt m) = Data.Map.lookup k m insert k v (GMapInt m) = GMapInt (Data.Map.insert k v m)
Int instance of the associated data type
GMap needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here
v). As mentioned before any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type.
NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundemental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future.
As an exercise, implement an instance for
Char that maps back to the
Int instance using the conversion functions
A unit instance
Generic definitions, apart from elementary types, typically cover units, products, and sums. We start here with the unit instance for
instance GMapKey () where data GMap () v = GMapUnit (Maybe v) empty = GMapUnit Nothing lookup () (GMapUnit v) = v insert () v (GMapUnit _) = GMapUnit $ Just v
For unit, the map is just a
Product and sum instances
Nest, let us define the instances for pairs and sums (i.e.,
instance (GMapKey a, GMapKey b) => GMapKey (a, b) where data GMap (a, b) v = GMapPair (GMap a (GMap b v)) empty = GMapPair empty lookup (a, b) (GMapPair gm) = lookup a gm >>= lookup b insert (a, b) v (GMapPair gm) = GMapPair $ case lookup a gm of Nothing -> insert a (insert b v empty) gm Just gm2 -> insert a (insert b v gm2 ) gm instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v) empty = GMapEither empty empty lookup (Left a) (GMapEither gm1 _gm2) = lookup a gm1 lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2 insert (Left a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2 insert (Right a) v (GMapEither gm1 gm2) = GMapEither gm1 (insert a v gm2)
If you find this code algorithmically surprising, I'd suggest to have a look at Ralf Hinze's paper. The only novelty concerning associated types, in these two instances, is that the instances have a context
(GMapKey a, GMapKey b). Consequently, the right hand sides of the associated type declarations can use
GMap recursively at the key types
b - not unlike the method definitions use the class methods recursively at the types for which the class is given in the instance context.
Using a generic map
Finally, some code building and quering a generic map:
myGMap :: GMap (Int, Either Char ()) String myGMap = insert (5, Left 'c') "(5, Left 'c')" $ insert (4, Right ()) "(4, Right ())" $ insert (5, Right ()) "This is the one!" $ insert (5, Right ()) "This is the two!" $ insert (6, Right ()) "(6, Right ())" $ insert (5, Left 'a') "(5, Left 'a')" $ empty main = putStrLn $ maybe "Couldn't find key!" id $ lookup (5, Right ()) myGMap
Download the code
If you want to play with this example without copying it off the wiki, just download the source code for
GMap from GHC's test suite.
Definition of the type system extension
[NB: The following material will eventually become documentation for type families in the GHC User Manual. Please help to improve it by forwarding suggestions to email@example.com.]
Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type-indexes to coincide with the class parameters. However, the later can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.
Indexed data families are introduced by a signature, such as
data family GMap k :: * -> *
family distinguishes family from standard data declarations. The result kind annotation is optional and, as usual, defaults to
* if omitted. An example is
data family Array e
Named arguments can also be given explicit kind signatures if needed. Just as with GADT declarations named arguments are entirely optional, so that we can declare
Array alternatively with
data family Array :: * -> *
Associated family declarations
When a data family is declared as part of a type class, we drop the
family special. The
GMap declaration takes the following form
class GMapKey k where data GMap k :: * -> * ...
In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type-indexes. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. Hence, the following contrieved example is admissible:
class C a b c where data T c a :: *
If kind signatures are required for argument variables, they need to be given in the class head.
Instance declarations of data and newtype families are very similar to standard data and newtype declarations. The only two differences are that the keyword
newtype is followed by
instance and that some or all of the type arguments can be non-variable types. For example, the
Either instance for
data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)
In this example, the declaration has only one variant. In general, it can be be any number.
Data and newtype instance declarations are only legit when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although, all data families are declared with the
data keyword, instances can be either
newtypes, or a mix of both.
Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:
data family T a data instance T Int = A data instance T Char = B nonsence :: T a -> Int nonsence A = 1 -- WRONG: These two equations together... nonsence B = 2 -- ...will produce a type error.
Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families are - in contrast to GADTs - are open; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct.
Associated type instances
When an associated family instance is declared within a type class instance, we drop the
instance keyword in the family instance. So, the
Either instance for
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v ...
The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is first argument of
Either a b, which coincides with the only class parameter. Any parameters to the family constructor that do not correspond to class parameters, need to be variables in every instance; here this is the variable
Instances for an associated family can only appear as part of instances declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as
undefined, can assume the type.
Class instances of type instances
Type class instances of instances of data families can be defined as usual, and in particular data instance declarations can have
deriving clauses. For example, we can write
data GMap () v = GMapUnit (Maybe v) deriving Show
which implcitly defines an instance of the form
instance Show v => Show (GMap () v) where ...
Note that class instances are always for particular instances of a data family and never for an entire family as a whole. This for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of different instances of a single type family. It would require a form of extensible case construct.
The instance declarations of a data family used in a single program may not overlap at all, independent of whether they are associated or not. In contrast to type class instances, this is not only a matter of consistency, but one of type safety.
Import and export
The association of data constructors with type families is more dynamic than that is the case with standard data and newtype declarations. In the standard case, the notation
T(..) in an import or export list denotes the type constructor and all the data constructors introduced in its declaration. However, a family declaration never introduces any data constructors; instead, data constructors are introduced by family instances. As a result, which data constructors are associated with a type family depends on the currently visible instance declarations for that family. Consequently, an import or export item of the form
T(..) denotes the family constructor and all currently visible data constructors - in the case of an export item, these may be either imported or defined in the current module. The treatment of import and export items that explicitly list data constructors, such as
GMap(GMapEither), is analogous.
As expected, an import or export item of the form
C(..) denotes all of the class' methods and associated types. However, when associated types are explicitly listed as subitems of a class, we need some new syntax, as uppercase identifiers as subitems are usually data constructors, not type constructors. To clarify that we denote types here, each associated type name needs to be prefixed by the keyword
type. So for example, when explicitly listing the components of the
GMapKey class, we write
GMapKey(type GMap, empty, lookup, insert).
Assuming our running
GMapKey class example, let us look at some export lists and their meaning:
module GMap (GMapKey) where...: Exports just the class name.
module GMap (GMapKey(..)) where...: Exports the class, the associated type
GMapand the member functions
insert. None of the data constructors is exported.
module GMap (GMapKey(..), GMap(..)) where...: As before, but also exports all the data constructors
module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...: As before.
module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...: As before.
Finally, you can write
GMapKey(type GMap) to denote both the class
GMapKey as well as its associated type
GMap. However, you cannot write
GMapKey(type GMap(..)) — i.e., sub-component specifications cannot be nested. To specify
GMap's data constructors, you have to list it separately.
Family instances are implicitly exported, just like class instances. However, this applies only to the heads of instances, not to the data constructors an instance defines.
A associated type synonym example
Associated data types ar much better behaved than functional dependencies. Hence, GHC's type checking and inference are more powerful. Here is an example of how to port your existing functional depenency programs to associated type synonyms.
Here's an example from the original Functional Dependencies paper:
class Collects e ce | ce -> e where empty :: ce insert :: e -> ce -> ce member :: e -> ce -> Bool toList :: ce -> [e]
With associated type synonyms we can write this as:
class Collects ce where type Elem ce empty :: ce insert :: Elem ce -> ce -> ce member :: Elem ce -> ce -> Bool toList :: ce -> [Elem ce]
That's right, we have dropped from a multi-paramter type class
to a single parameter type class. The parameter
has become an associated type
Here's an instance, with functional dependencies:
instance Eq e => Collects e [e] where empty =  insert e l = (e:l) member e  = False member e (x:xs) | e == x = True | otherwise = member e xs toList l = l
With associated type synonyms, this becomes:
instance Eq e => Collects [e] where type Elem [e] = e empty =  insert e l = (e:l) member e  = False member e (x:xs) | e == x = True | otherwise = member e xs toList l = l
Using generic collections
With Functional Dependencies the code would be:
sumCollects :: (Collects e c1,Collects e c2) => c1 -> c2 -> c2 sumCollects c1 c2 = foldr insert c2 (toList c1)
In contrast, with associated type synonyms, we get:
sumCollects :: (Collects c1,Collects c2,Elem c1 ~ Elem c2) => c1 -> c2 -> c2 sumCollects c1 c2 = foldr insert c2 (toList c1)