Difference between revisions of "GHC.Generics"
Line 171: | Line 171: | ||
</haskell> |
</haskell> |
||
− | (Note that we are using the simpler representation <hask>RepUserTree</hask> instead of the real representation <hask>RealRepUserTree</hask>, |
+ | (Note that we are using the simpler representation <hask>RepUserTree</hask> instead of the real representation <hask>RealRepUserTree</hask>, just for simplicity.) |
+ | |||
+ | Equipped with a <hask>Representable0</hask> instance, we are ready to tell the compiler how it can serialize any representable type: |
||
+ | |||
+ | <haskell> |
||
+ | putDefault :: (Representable0 a, GSerialize (Rep0 a)) => a -> [Bit] |
||
+ | putDefault a = gput (from0 a) |
||
+ | </haskell> |
||
+ | |||
+ | The type of <hask>putDefault</hask> says that we can serialize any <tt>a</tt> into a list of bits, as long as that <tt>a</tt> is <hask>Representable0</hask>, and its representation <hask>Rep0 a</hask> has a <hask>GSerialize</hask> instance. The implementation is very simple: first convert the value to its representation using <hask>from</hask>, and then call <hask>gput</hask> on that representation. |
||
+ | |||
+ | However, we still have to write a <hask>Serialize</hask> instance for the user dataype: |
||
+ | |||
+ | <haskell> |
||
+ | instance (Serialize a) => Serialize (UserTree a) where |
||
+ | put = putDefault |
||
+ | </haskell> |
||
+ | |||
+ | === Using GHC's new features === |
||
+ | |||
+ | What we have seen so far could all already be done, at the cost of writing a lot of boilerplate code yourself (or spending hours writing [Template Haskell] code to do it for you). |
Revision as of 12:45, 4 May 2011
GHC 7.2 includes a new generic deriving mechanism. This means you can have more classes that you can derive
, like Show
or Functor
. This is accomplished through two new features, enabled with two new flags: DeriveRepresentable and DefaultSignatures. We'll show how this all works in a detailed example.
Serialization
Suppose you are writing a class for serialization of data. You have a type Bit
representing bits, and a class Serialize
:
data Bit = O | I
class Serialize a where
put :: a -> [Bit]
You might have written some instances already:
instance Serialize Int where
put i = serializeInt i
instance Serialize a => Serialize [a] where
put [] = []
put (h:t) = put h ++ put t
A user of your library, however, will have their his datatypes, like:
data UserTree a = Node a (UserTree a) (UserTree a) | Leaf
He will have to specify an instance Serialize (UserTree a) where ...
himself. This, however, is tedious, especially because most instances will probably be rather trivial, and should be derived automatically.
It is here that generic programming can help you. If you are familiar with SYB you could use it at this stage, but now we'll see how to do this with the new generic deriving mechanism.
Generic serialization
First you have to tell the compiler how to serialize any datatype, in general. Since Haskell datatypes have a regular structure, this means you can just explain how to serialize a few basic datatypes.
Representation types
We can represent most Haskell datatypes using only the following primitive types:
-- | Unit: used for constructors without arguments
data U1 p = U1
-- | Constants, additional parameters and recursion of kind *
newtype K1 i c p = K1 { unK1 :: c }
-- | Meta-information (constructor names, etc.)
newtype M1 i c f p = M1 { unM1 :: f p }
-- | Sums: encode choice between constructors
infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)
-- | Products: encode multiple arguments to constructors
infixr 6 :*:
data (:*:) f g p = f p :*: g p
For starters, try to ignore the p parameter in all types; it's there just for future compatibility. The easiest way to understand how you can use these types to represent others is to see an example. Let's represent the UserTree
type shown before:
type RepUserTree a =
-- A UserTree is either a Leaf, which has no arguments
U1
-- ... or it is a Node, which has three arguments that we put in a product
:+: a :*: UserTree a :*: UserTree a
Simple, right? Different constructors become alternatives of a sum, and multiple arguments become products. In fact, we want to have some more information in the representation, like datatype and constructor names, and to know if a product argument is a parameter or a type. We use the other primitives for this, and the representation looks more like:
type RealRepUserTree a =
-- Information about the datatype
M1 D Data_UserTree (
-- Leaf, with information about the constructor
M1 C Con_Leaf U1
-- Node, with information about the constructor
:+: M1 C Con_Node (
-- Constructor argument, which could have information
-- about a record selector label
M1 S NoSelector (
-- Argument, tagged with P because it is a parameter
K1 P a)
-- Another argument, tagged with R because it is
-- a recursive occurrence of a type
:*: M1 S NoSelector (K1 R (UserTree a))
-- Idem
:*: M1 S NoSelector (K1 R (UserTree a))
))
A bit more complicated, but essentially the same. Datatypes like Data_UserTree
are empty datatypes used only for providing meta-information in the representation; you don't have to worry much about them for now. Also, GHC generates these representations for you automatically, so you should never have to define them yourself! All of this is explained in much more detail in Section 2.1. of the original paper describing the new generic deriving mechanism.
A generic function
Since GHC can represent user types using only those primitive types, all you have to do is to tell GHC how to serialize each of the individual primitive types. The best way to do that is to create a new type class:
class GSerialize f where
gput :: f a -> [Bin]
This class looks very much like the original Serialize
class, just that the type argument is of kind * -> *
, since our generic representation types have this p parameter lying around. Now we need to give instances for each of the basic types. For units there's nothing to serialize:
instance GSerialize U1 where
gput U1 = []
The serializing multiple arguments is simply the concatenation of each of the individual serializations:
instance (GSerialize a, GSerialize b) => GSerialize (a :*: b) where
gput (a :*: b) = gput a ++ gput b
The case for sums is the most interesting, as we have to record which alternative we are in. We will use a 0 for left injections and a 1 for right injections:
instance (GSerialize a, GSerialize b) => GSerialize (a :+: b) where
gput (L1 x) = O : gput x
gput (R1 x) = I : gput x
We don't need to encode the meta-information, so we just go over it recursively :
instance (GSerialize a) => GSerialize (M1 i c a) where
gput (M1 x) = gput x
Finally, we're only left with the arguments. For these we will just use our first class, Serialize
, again:
instance (Serialize a) => GSerialize (K1 i c a) where
gput (K1 x) = put x
So, if a user datatype as a parameter which is instantiated to Int
, at this stage we will use the library instance for Serialize Int
.
Default implementations
We've seen how to represent user types generically, and how to define functions on representation types. However, we still have to tie these two together, explaining how to convert user types to their representation and then applying the generic function.
The representation RepUserTree
we have seen earlier is only one component of the representation; we also need functions to convert to and from the user datatype into the representation. For that we use another type class:
class Representable0 a where
-- Encode the representation of a user datatype
type Rep0 a :: * -> *
-- Convert from the datatype to its representation
from0 :: a -> (Rep0 a) x
-- Convert from the representation to the datatype
to0 :: (Rep0 a) x -> a
So, for the UserTree
datatype shown before, GHC generates the following instance:
instance Representable0 (UserTree a) where
type Rep0 (UserTree a) = RepUserTree a
from0 Leaf = L1 U1
from0 (Node a l r) = R1 (a :*: l :*: r)
to0 (L1 U1) = Leaf
to0 (R1 (a :*: l :*: r)) = Node a l r
(Note that we are using the simpler representation RepUserTree
instead of the real representation RealRepUserTree
, just for simplicity.)
Equipped with a Representable0
instance, we are ready to tell the compiler how it can serialize any representable type:
putDefault :: (Representable0 a, GSerialize (Rep0 a)) => a -> [Bit]
putDefault a = gput (from0 a)
The type of putDefault
says that we can serialize any a into a list of bits, as long as that a is Representable0
, and its representation Rep0 a
has a GSerialize
instance. The implementation is very simple: first convert the value to its representation using from
, and then call gput
on that representation.
However, we still have to write a Serialize
instance for the user dataype:
instance (Serialize a) => Serialize (UserTree a) where
put = putDefault
Using GHC's new features
What we have seen so far could all already be done, at the cost of writing a lot of boilerplate code yourself (or spending hours writing [Template Haskell] code to do it for you).