Difference between revisions of "GHC.Generics"

From HaskellWiki
Jump to navigation Jump to search
(→‎Example: serialization: Fix broken link to generic programming comparison report)
 
(49 intermediate revisions by 8 users not shown)
Line 1: Line 1:
GHC 7.2 includes support for ''datatype-generic programming''. This means you can have more classes for which you do not have to give an instance, like <hask>Show</hask> or <hask>Functor</hask>. This is accomplished through two new features, enabled with two new flags: <tt>DeriveRepresentable</tt> and <tt>DefaultSignatures</tt>. We'll show how this all works in a detailed example.
+
GHC 7.2 includes improved support for ''datatype-generic programming'' through two features, enabled with two flags: <tt>DeriveGeneric</tt> and <tt>DefaultSignatures</tt>. We show how this all works in this page, starting with a detailed example.
   
== Serialization ==
+
== Example: serialization ==
   
 
Suppose you are writing a class for serialization of data. You have a type <hask>Bit</hask> representing bits, and a class <hask>Serialize</hask>:
 
Suppose you are writing a class for serialization of data. You have a type <hask>Bit</hask> representing bits, and a class <hask>Serialize</hask>:
Line 31: Line 31:
 
He will have to specify an <hask>instance Serialize (UserTree a) where ...</hask> himself. This, however, is tedious, especially because most instances will probably be rather trivial, and should be derived automatically.
 
He will have to specify an <hask>instance Serialize (UserTree a) where ...</hask> himself. This, however, is tedious, especially because most instances will probably be rather trivial, and should be derived automatically.
   
It is here that generic programming can help you. If you are familiar with [http://www.cs.uu.nl/wiki/bin/view/GenericProgramming/SYB SYB] you could use it at this stage, but now we'll see how to do this with the new generic deriving mechanism.
+
It is here that generic programming can help you. If you are familiar with [http://www.cs.uu.nl/research/techreps/repo/CS-2008/2008-010.pdf Scrap your boilerplate (SYB)] you could use it at this stage, but now we'll see how to do this with the GHC Generic features.
   
 
=== Generic serialization ===
 
=== Generic serialization ===
Line 101: Line 101:
 
<haskell>
 
<haskell>
 
class GSerialize f where
 
class GSerialize f where
gput :: f a -> [Bin]
+
gput :: f a -> [Bit]
 
</haskell>
 
</haskell>
   
Line 111: Line 111:
 
</haskell>
 
</haskell>
   
The serializing multiple arguments is simply the concatenation of each of the individual serializations:
+
The serialization of multiple arguments is simply the concatenation of each of the individual serializations:
   
 
<haskell>
 
<haskell>
Line 136: Line 136:
   
 
<haskell>
 
<haskell>
instance (Serialize a) => GSerialize (K1 i c a) where
+
instance (Serialize a) => GSerialize (K1 i a) where
 
gput (K1 x) = put x
 
gput (K1 x) = put x
 
</haskell>
 
</haskell>
   
So, if a user datatype as a parameter which is instantiated to <hask>Int</hask>, at this stage we will use the library instance for <hask>Serialize Int</hask>.
+
So, if a user datatype has a parameter which is instantiated to <hask>Int</hask>, at this stage we will use the library instance for <hask>Serialize Int</hask>.
   
 
==== Default implementations ====
 
==== Default implementations ====
Line 149: Line 149:
   
 
<haskell>
 
<haskell>
class Representable0 a where
+
class Generic a where
 
-- Encode the representation of a user datatype
 
-- Encode the representation of a user datatype
type Rep0 a :: * -> *
+
type Rep a :: * -> *
 
-- Convert from the datatype to its representation
 
-- Convert from the datatype to its representation
from0 :: a -> (Rep0 a) x
+
from :: a -> (Rep a) x
 
-- Convert from the representation to the datatype
 
-- Convert from the representation to the datatype
to0 :: (Rep0 a) x -> a
+
to :: (Rep a) x -> a
 
</haskell>
 
</haskell>
   
Line 161: Line 161:
   
 
<haskell>
 
<haskell>
instance Representable0 (UserTree a) where
+
instance Generic (UserTree a) where
type Rep0 (UserTree a) = RepUserTree a
+
type Rep (UserTree a) = RepUserTree a
   
from0 Leaf = L1 U1
+
from Leaf = L1 U1
from0 (Node a l r) = R1 (a :*: l :*: r)
+
from (Node a l r) = R1 (a :*: l :*: r)
   
to0 (L1 U1) = Leaf
+
to (L1 U1) = Leaf
to0 (R1 (a :*: l :*: r)) = Node a l r
+
to (R1 (a :*: l :*: r)) = Node a l r
 
</haskell>
 
</haskell>
   
 
(Note that we are using the simpler representation <hask>RepUserTree</hask> instead of the real representation <hask>RealRepUserTree</hask>, just for simplicity.)
 
(Note that we are using the simpler representation <hask>RepUserTree</hask> instead of the real representation <hask>RealRepUserTree</hask>, just for simplicity.)
   
Equipped with a <hask>Representable0</hask> instance, we are ready to tell the compiler how it can serialize any representable type:
+
Equipped with a <hask>Generic</hask> instance, we are ready to tell the compiler how it can serialize any representable type:
   
 
<haskell>
 
<haskell>
putDefault :: (Representable0 a, GSerialize (Rep0 a)) => a -> [Bit]
+
putDefault :: (Generic a, GSerialize (Rep a)) => a -> [Bit]
putDefault a = gput (from0 a)
+
putDefault a = gput (from a)
 
</haskell>
 
</haskell>
   
The type of <hask>putDefault</hask> says that we can serialize any <tt>a</tt> into a list of bits, as long as that <tt>a</tt> is <hask>Representable0</hask>, and its representation <hask>Rep0 a</hask> has a <hask>GSerialize</hask> instance. The implementation is very simple: first convert the value to its representation using <hask>from</hask>, and then call <hask>gput</hask> on that representation.
+
The type of <hask>putDefault</hask> says that we can serialize any <tt>a</tt> into a list of bits, as long as that <tt>a</tt> is <hask>Generic</hask>, and its representation <hask>Rep a</hask> has a <hask>GSerialize</hask> instance. The implementation is very simple: first convert the value to its representation using <hask>from</hask>, and then call <hask>gput</hask> on that representation.
   
 
However, we still have to write a <hask>Serialize</hask> instance for the user dataype:
 
However, we still have to write a <hask>Serialize</hask> instance for the user dataype:
Line 195: Line 195:
 
==== Deriving representations ====
 
==== Deriving representations ====
   
The <hask>Representable0</hask> class, and all the representation types, come with GHC in the <tt>GHC.Generics</tt> module. GHC can also derive <hask>Representable0</hask> for user types, so all the user has to do is:
+
The <hask>Generic</hask> class, and all the representation types, come with GHC in the <tt>GHC.Generics</tt> module. GHC can also derive <hask>Generic</hask> for user types, so all the user has to do is:
   
 
<haskell>
 
<haskell>
{-# LANGUAGE DeriveRepresentable #-}
+
{-# LANGUAGE DeriveGeneric #-}
   
 
data UserTree a = Node a (UserTree a) (UserTree a) | Leaf
 
data UserTree a = Node a (UserTree a) (UserTree a) | Leaf
deriving Representable0
+
deriving Generic
 
</haskell>
 
</haskell>
   
(Standlone deriving also works fine, and you can use it for types you have not defined yourself, but are imported from somewhere else.) You will need the new <tt>DeriveRepresentable</tt> language pragma.
+
(Standalone deriving also works fine, and you can use it for types you have not defined yourself, but are imported from somewhere else.) You will need the new <tt>DeriveGeneric</tt> language pragma.
   
 
==== More general default methods ====
 
==== More general default methods ====
   
We don't want the user to have to write the <hask>instance Serialize (UserTree a)</hask> himself, since most of the times it will just be <hask>putDefault</hask>. However, we cannot make <hask>putDefault</hask> the default implementation of the <hask>put</hask> method, because that would require adding the <hask>(Representable0 a, GSerialize (Rep0 a))</hask> constraint to the class head. This would restrict the ability to give ad-hoc instances for types that are not representable, for instance.
+
We don't want the user to have to write the <hask>instance Serialize (UserTree a)</hask> himself, since most of the times it will just be <hask>putDefault</hask>. However, we cannot make <hask>putDefault</hask> the default implementation of the <hask>put</hask> method, because that would require adding the <hask>(Generic a, GSerialize (Rep a))</hask> constraint to the class head. This would restrict the ability to give ad-hoc instances for types that are not representable, for instance.
   
 
We solved this by allowing the user to give a different signature for default methods:
 
We solved this by allowing the user to give a different signature for default methods:
Line 218: Line 218:
 
put :: a -> [Bit]
 
put :: a -> [Bit]
   
default put :: (Representable0 a, GSerialize (Rep0 a)) => a -> [Bit]
+
default put :: (Generic a, GSerialize (Rep a)) => a -> [Bit]
put a = gput (from0 a)
+
put a = gput (from a)
 
</haskell>
 
</haskell>
   
Line 230: Line 230:
 
</haskell>
 
</haskell>
   
GHC fills out the implementation for <hask>put</hask> using the default method. It will type-check correctly because we have a <hask>Representable0</hask> instance for <hask>UserTree</hask>, and <hask>GSerialize</hask> instances for all the representation types.
+
GHC fills out the implementation for <hask>put</hask> using the default method. It will type-check correctly because we have a <hask>Generic</hask> instance for <hask>UserTree</hask>, and <hask>GSerialize</hask> instances for all the representation types.
  +
  +
=== Complete working example ===
  +
  +
<haskell>
  +
{-# LANGUAGE DefaultSignatures, DeriveGeneric, TypeOperators, FlexibleContexts #-}
  +
  +
import GHC.Generics
  +
import Data.Bits
  +
  +
  +
data Bit = O | I deriving Show
  +
  +
class Serialize a where
  +
put :: a -> [Bit]
  +
  +
default put :: (Generic a, GSerialize (Rep a)) => a -> [Bit]
  +
put a = gput (from a)
  +
  +
get :: [Bit] -> (a, [Bit])
  +
  +
default get :: (Generic a, GSerialize (Rep a)) => [Bit] -> (a, [Bit])
  +
get xs = (to x, xs')
  +
where (x, xs') = gget xs
  +
  +
class GSerialize f where
  +
gput :: f a -> [Bit]
  +
gget :: [Bit] -> (f a, [Bit])
  +
  +
-- | Unit: used for constructors without arguments
  +
instance GSerialize U1 where
  +
gput U1 = []
  +
gget xs = (U1, xs)
  +
  +
-- | Products: encode multiple arguments to constructors
  +
instance (GSerialize a, GSerialize b) => GSerialize (a :*: b) where
  +
gput (a :*: b) = gput a ++ gput b
  +
gget xs = (a :*: b, xs'')
  +
where (a, xs') = gget xs
  +
(b, xs'') = gget xs'
  +
  +
-- | Sums: encode choice between constructors
  +
instance (GSerialize a, GSerialize b) => GSerialize (a :+: b) where
  +
gput (L1 x) = O : gput x
  +
gput (R1 x) = I : gput x
  +
gget (O:xs) = (L1 x, xs')
  +
where (x, xs') = gget xs
  +
gget (I:xs) = (R1 x, xs')
  +
where (x, xs') = gget xs
  +
  +
-- | Meta-information (constructor names, etc.)
  +
instance (GSerialize a) => GSerialize (M1 i c a) where
  +
gput (M1 x) = gput x
  +
gget xs = (M1 x, xs')
  +
where (x, xs') = gget xs
  +
  +
-- | Constants, additional parameters and recursion of kind *
  +
instance (Serialize a) => GSerialize (K1 i a) where
  +
gput (K1 x) = put x
  +
gget xs = (K1 x, xs')
  +
where (x, xs') = get xs
  +
  +
instance Serialize Bool where
  +
put True = [I]
  +
put False = [O]
  +
get (I:xs) = (True, xs)
  +
get (O:xs) = (False, xs)
  +
  +
--
  +
-- Try it out. (Normally this would be in a separate module.)
  +
--
  +
  +
data UserTree a = Node a (UserTree a) (UserTree a) | Leaf
  +
deriving (Generic, Show)
  +
  +
instance (Serialize a) => Serialize (UserTree a)
  +
  +
main = do
  +
let xs = put True
  +
print (fst . get $ xs :: Bool)
  +
let ys = put (Leaf :: UserTree Bool)
  +
print (fst . get $ ys :: UserTree Bool)
  +
let zs = put (Node False Leaf Leaf :: UserTree Bool)
  +
print (fst . get $ zs :: UserTree Bool)
  +
</haskell>
   
 
== Different perspectives ==
 
== Different perspectives ==
Line 248: Line 332:
 
=== The generic programmer ===
 
=== The generic programmer ===
   
If you are a library author and are eager to make your classes easy to instantiate by your users, then you should invest some time in defining instances for each of the representation types of <tt>GHC.Generics</tt> and defining a generic default method. See the example for <tt>Serialize</tt> above, and the [http://dreixel.net/research/pdf/gdmh.pdf original paper] for many other examples (but make sure to check the [[Changes from the paper]] section).
+
If you are a library author and are eager to make your classes easy to instantiate by your users, then you should invest some time in defining instances for each of the representation types of <tt>GHC.Generics</tt> and defining a generic default method. See the example for <tt>Serialize</tt> above, and the [http://dreixel.net/research/pdf/gdmh.pdf original paper] for many other examples (but make sure to check the [[#Changes from the paper|changes from the paper]]).
   
 
=== The GHC hacker ===
 
=== The GHC hacker ===
  +
  +
If you are working on the GHC source code, you might find it useful to know what kind of changes were made. There is a [http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/GenericDeriving Trac wiki page] with a lower-level overview of things and also keeping track of what still needs to be done.
   
 
== Changes from the paper ==
 
== Changes from the paper ==
  +
  +
In the [http://dreixel.net/research/pdf/gdmh.pdf paper] we describe the implementation in [http://www.cs.uu.nl/wiki/UHC UHC]. The implementation in GHC is slightly different:
  +
  +
* <tt>Representable0</tt> and <tt>Representable1</tt> have become <tt>Generic</tt> and <tt>Generic1</tt>, respectively. <tt>from0</tt>, <tt>to0</tt>, and <tt>Rep0</tt> also lost the <tt>0</tt> at the end of their names.
  +
  +
* We are using type families, so the <tt>Generic</tt> and <tt>Generic1</tt> type classes have only one type argument. So, in GHC the classes look like what we describe in the "Avoiding extensions" part of Section 2.3 of the paper. This change affects only a generic function writer, and not a generic function user.
  +
  +
* Default definitions (Section 3.3) work differently. In GHC we don't use a <tt>DERIVABLE</tt> pragma; instead, a type class can declare a ''generic default method'', which is akin to a standard default method, but includes a default type signature. This removes the need for a separate default definition and a pragma. For example, the <tt>Encode</tt> class of Section 3.1 is now:
  +
<haskell>
  +
class Encode a where
  +
encode :: a -> [Bit]
  +
default encode :: (Generic a, Encode1 (Rep a)) => a -> [Bit]
  +
encode = encode1 . from
  +
</haskell>
  +
  +
* To derive generic functionality to a user type, the user no longer uses <hask>deriving instance</hask> (Section 4.6.1). Instead, the user gives an instance without defining the method; GHC then uses the generic default. For instance:
  +
<haskell>
  +
instance Encode [a] -- works if there is an instance Generic [a]
  +
</haskell>
   
 
== Limitations ==
 
== Limitations ==
   
  +
We cannot derive <tt>Generic</tt> instances for:
To be written.
 
  +
* Datatypes with a context;
  +
* Existentially-quantified datatypes;
  +
* GADTs.
  +
  +
  +
[[Category:GHC]]
  +
[[Category:Language extensions]]

Latest revision as of 20:24, 19 October 2020

GHC 7.2 includes improved support for datatype-generic programming through two features, enabled with two flags: DeriveGeneric and DefaultSignatures. We show how this all works in this page, starting with a detailed example.

Example: serialization

Suppose you are writing a class for serialization of data. You have a type Bit representing bits, and a class Serialize:

data Bit = O | I

class Serialize a where
  put :: a -> [Bit]

You might have written some instances already:

instance Serialize Int where
  put i = serializeInt i

instance Serialize a => Serialize [a] where
  put []    = []
  put (h:t) = put h ++ put t

A user of your library, however, will have his own datatypes, like:

data UserTree a = Node a (UserTree a) (UserTree a) | Leaf

He will have to specify an instance Serialize (UserTree a) where ... himself. This, however, is tedious, especially because most instances will probably be rather trivial, and should be derived automatically.

It is here that generic programming can help you. If you are familiar with Scrap your boilerplate (SYB) you could use it at this stage, but now we'll see how to do this with the GHC Generic features.

Generic serialization

First you have to tell the compiler how to serialize any datatype, in general. Since Haskell datatypes have a regular structure, this means you can just explain how to serialize a few basic datatypes.

Representation types

We can represent most Haskell datatypes using only the following primitive types:

-- | Unit: used for constructors without arguments
data U1 p = U1

-- | Constants, additional parameters and recursion of kind *
newtype K1 i c p = K1 { unK1 :: c }

-- | Meta-information (constructor names, etc.)
newtype M1 i c f p = M1 { unM1 :: f p }

-- | Sums: encode choice between constructors
infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)

-- | Products: encode multiple arguments to constructors
infixr 6 :*:
data (:*:) f g p = f p :*: g p

For starters, try to ignore the p parameter in all types; it's there just for future compatibility. The easiest way to understand how you can use these types to represent others is to see an example. Let's represent the UserTree type shown before:

type RepUserTree a =
  -- A UserTree is either a Leaf, which has no arguments
      U1
  -- ... or it is a Node, which has three arguments that we put in a product
  :+: a :*: UserTree a :*: UserTree a

Simple, right? Different constructors become alternatives of a sum, and multiple arguments become products. In fact, we want to have some more information in the representation, like datatype and constructor names, and to know if a product argument is a parameter or a type. We use the other primitives for this, and the representation looks more like:

type RealRepUserTree a =
  -- Information about the datatype
  M1 D Data_UserTree (
  -- Leaf, with information about the constructor
      M1 C Con_Leaf U1
  -- Node, with information about the constructor
  :+: M1 C Con_Node (
            -- Constructor argument, which could have information
            -- about a record selector label
            M1 S NoSelector (
              -- Argument, tagged with P because it is a parameter
              K1 P a)
        -- Another argument, tagged with R because it is 
        -- a recursive occurrence of a type
        :*: M1 S NoSelector (K1 R (UserTree a))
        -- Idem
        :*: M1 S NoSelector (K1 R (UserTree a))
  ))

A bit more complicated, but essentially the same. Datatypes like Data_UserTree are empty datatypes used only for providing meta-information in the representation; you don't have to worry much about them for now. Also, GHC generates these representations for you automatically, so you should never have to define them yourself! All of this is explained in much more detail in Section 2.1. of the original paper describing the new generic deriving mechanism.

A generic function

Since GHC can represent user types using only those primitive types, all you have to do is to tell GHC how to serialize each of the individual primitive types. The best way to do that is to create a new type class:

class GSerialize f where
  gput :: f a -> [Bit]

This class looks very much like the original Serialize class, just that the type argument is of kind * -> *, since our generic representation types have this p parameter lying around. Now we need to give instances for each of the basic types. For units there's nothing to serialize:

instance GSerialize U1 where
  gput U1 = []

The serialization of multiple arguments is simply the concatenation of each of the individual serializations:

instance (GSerialize a, GSerialize b) => GSerialize (a :*: b) where
  gput (a :*: b) = gput a ++ gput b

The case for sums is the most interesting, as we have to record which alternative we are in. We will use a 0 for left injections and a 1 for right injections:

instance (GSerialize a, GSerialize b) => GSerialize (a :+: b) where
  gput (L1 x) = O : gput x
  gput (R1 x) = I : gput x

We don't need to encode the meta-information, so we just go over it recursively :

instance (GSerialize a) => GSerialize (M1 i c a) where
  gput (M1 x) = gput x

Finally, we're only left with the arguments. For these we will just use our first class, Serialize, again:

instance (Serialize a) => GSerialize (K1 i a) where
  gput (K1 x) = put x

So, if a user datatype has a parameter which is instantiated to Int, at this stage we will use the library instance for Serialize Int.

Default implementations

We've seen how to represent user types generically, and how to define functions on representation types. However, we still have to tie these two together, explaining how to convert user types to their representation and then applying the generic function.

The representation RepUserTree we have seen earlier is only one component of the representation; we also need functions to convert to and from the user datatype into the representation. For that we use another type class:

class Generic a where
  -- Encode the representation of a user datatype
  type Rep a :: * -> *
  -- Convert from the datatype to its representation
  from  :: a -> (Rep a) x
  -- Convert from the representation to the datatype
  to    :: (Rep a) x -> a

So, for the UserTree datatype shown before, GHC generates the following instance:

instance Generic (UserTree a) where
  type Rep (UserTree a) = RepUserTree a

  from Leaf         = L1 U1
  from (Node a l r) = R1 (a :*: l :*: r)

  to (L1 U1)              = Leaf
  to (R1 (a :*: l :*: r)) = Node a l r

(Note that we are using the simpler representation RepUserTree instead of the real representation RealRepUserTree, just for simplicity.)

Equipped with a Generic instance, we are ready to tell the compiler how it can serialize any representable type:

putDefault :: (Generic a, GSerialize (Rep a)) => a -> [Bit]
putDefault a = gput (from a)

The type of putDefault says that we can serialize any a into a list of bits, as long as that a is Generic, and its representation Rep a has a GSerialize instance. The implementation is very simple: first convert the value to its representation using from, and then call gput on that representation.

However, we still have to write a Serialize instance for the user dataype:

instance (Serialize a) => Serialize (UserTree a) where
  put = putDefault

Using GHC's new features

What we have seen so far could all already be done, at the cost of writing a lot of boilerplate code yourself (or spending hours writing Template Haskell code to do it for you). Now we'll see how the new features of GHC can help you.

Deriving representations

The Generic class, and all the representation types, come with GHC in the GHC.Generics module. GHC can also derive Generic for user types, so all the user has to do is:

{-# LANGUAGE DeriveGeneric #-}

data UserTree a = Node a (UserTree a) (UserTree a) | Leaf
  deriving Generic

(Standalone deriving also works fine, and you can use it for types you have not defined yourself, but are imported from somewhere else.) You will need the new DeriveGeneric language pragma.

More general default methods

We don't want the user to have to write the instance Serialize (UserTree a) himself, since most of the times it will just be putDefault. However, we cannot make putDefault the default implementation of the put method, because that would require adding the (Generic a, GSerialize (Rep a)) constraint to the class head. This would restrict the ability to give ad-hoc instances for types that are not representable, for instance.

We solved this by allowing the user to give a different signature for default methods:

{-# LANGUAGE DefaultSignatures #-}

class Serialize a where
  put :: a -> [Bit]

  default put :: (Generic a, GSerialize (Rep a)) => a -> [Bit]
  put a = gput (from a)

With the new language pragma DefaultSignatures, GHC allows you to put the keyword default before a (new) type signature for a method inside a class declaration. If you give such a default type signature, then you have to provide a default method implementation, which will be type-checked using the default signature, and not the original one.

Now the user can simply write:

instance (Serialize a) => Serialize (UserTree a)

GHC fills out the implementation for put using the default method. It will type-check correctly because we have a Generic instance for UserTree, and GSerialize instances for all the representation types.

Complete working example

{-# LANGUAGE DefaultSignatures, DeriveGeneric, TypeOperators, FlexibleContexts #-}

import GHC.Generics
import Data.Bits


data Bit = O | I deriving Show

class Serialize a where
  put :: a -> [Bit]

  default put :: (Generic a, GSerialize (Rep a)) => a -> [Bit]
  put a = gput (from a)

  get :: [Bit] -> (a, [Bit])

  default get :: (Generic a, GSerialize (Rep a)) => [Bit] -> (a, [Bit])
  get xs = (to x, xs')
    where (x, xs') = gget xs

class GSerialize f where
  gput :: f a -> [Bit]
  gget :: [Bit] -> (f a, [Bit])

-- | Unit: used for constructors without arguments
instance GSerialize U1 where
  gput U1 = []
  gget xs = (U1, xs)

-- | Products: encode multiple arguments to constructors
instance (GSerialize a, GSerialize b) => GSerialize (a :*: b) where
  gput (a :*: b) = gput a ++ gput b
  gget xs = (a :*: b, xs'')
    where (a, xs') = gget xs
          (b, xs'') = gget xs'

-- | Sums: encode choice between constructors
instance (GSerialize a, GSerialize b) => GSerialize (a :+: b) where
  gput (L1 x) = O : gput x
  gput (R1 x) = I : gput x
  gget (O:xs) = (L1 x, xs')
    where (x, xs') = gget xs
  gget (I:xs) = (R1 x, xs')
    where (x, xs') = gget xs

-- | Meta-information (constructor names, etc.)
instance (GSerialize a) => GSerialize (M1 i c a) where
  gput (M1 x) = gput x
  gget xs = (M1 x, xs')
    where (x, xs') = gget xs

-- | Constants, additional parameters and recursion of kind *
instance (Serialize a) => GSerialize (K1 i a) where
  gput (K1 x) = put x
  gget xs = (K1 x, xs')
    where (x, xs') = get xs

instance Serialize Bool where
  put True = [I]
  put False = [O]
  get (I:xs) = (True, xs)
  get (O:xs) = (False, xs)

--
-- Try it out. (Normally this would be in a separate module.)
--

data UserTree a = Node a (UserTree a) (UserTree a) | Leaf
  deriving (Generic, Show)

instance (Serialize a) => Serialize (UserTree a)

main = do
  let xs = put True
  print (fst . get $ xs :: Bool)
  let ys = put (Leaf :: UserTree Bool)
  print (fst . get $ ys :: UserTree Bool)
  let zs = put (Node False Leaf Leaf :: UserTree Bool)
  print (fst . get $ zs :: UserTree Bool)

Different perspectives

We outline the changes introduced in 7.2 regarding support for generic programming from the perspective of three different types of users: the end-user, the generic programmer, and the GHC hacker.

The end-user

If you know nothing about generic programming and would like to keep it that way, then you will be pleased to know that using generics in GHC 7.2 is easier than ever. As soon as you encounter a class with a default signature (like Serialize above), you will be able to give empty instances for your datatypes, like this:

instance (Serialize a) => Serialize (UserTree a)

You will need to add a deriving Generic clause to each datatype that you want to have generic implementations for. You might have datatypes that use other datatypes, and you might need Generic instances for those too. In that case, you can import the module where the datatype is defined and give a standalone deriving Generic instance. In either case, you will need the -XDeriveGeneric flag.

The generic programmer

If you are a library author and are eager to make your classes easy to instantiate by your users, then you should invest some time in defining instances for each of the representation types of GHC.Generics and defining a generic default method. See the example for Serialize above, and the original paper for many other examples (but make sure to check the changes from the paper).

The GHC hacker

If you are working on the GHC source code, you might find it useful to know what kind of changes were made. There is a Trac wiki page with a lower-level overview of things and also keeping track of what still needs to be done.

Changes from the paper

In the paper we describe the implementation in UHC. The implementation in GHC is slightly different:

  • Representable0 and Representable1 have become Generic and Generic1, respectively. from0, to0, and Rep0 also lost the 0 at the end of their names.
  • We are using type families, so the Generic and Generic1 type classes have only one type argument. So, in GHC the classes look like what we describe in the "Avoiding extensions" part of Section 2.3 of the paper. This change affects only a generic function writer, and not a generic function user.
  • Default definitions (Section 3.3) work differently. In GHC we don't use a DERIVABLE pragma; instead, a type class can declare a generic default method, which is akin to a standard default method, but includes a default type signature. This removes the need for a separate default definition and a pragma. For example, the Encode class of Section 3.1 is now:
class Encode a where
  encode :: a -> [Bit]
  default encode :: (Generic a, Encode1 (Rep a)) => a -> [Bit]
  encode = encode1 . from
  • To derive generic functionality to a user type, the user no longer uses deriving instance (Section 4.6.1). Instead, the user gives an instance without defining the method; GHC then uses the generic default. For instance:
instance Encode [a] -- works if there is an instance Generic [a]

Limitations

We cannot derive Generic instances for:

  • Datatypes with a context;
  • Existentially-quantified datatypes;
  • GADTs.