=== Generic serialization ===
First you have to tell the compiler how to serialize any datatype, in general. Since Haskell datatypes have a regular structure, this means you can just explain how to serialize a few basic datatypes.

==== Representation types ====
We can represent most Haskell datatypes using only the following primitive types:

<haskell>
-- | Unit: used for constructors without arguments
data U1 p = U1

-- | Constants, additional parameters and recursion of kind *
newtype K1 i c p = K1 { unK1 :: c }

-- | Meta-information (constructor names, etc.)
newtype M1 i c f p = M1 { unM1 :: f p }

-- | Sums: encode choice between constructors
infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)

-- | Products: encode multiple arguments to constructors
infixr 6 :*:
data (:*:) f g p = f p :*: g p
</haskell>

For starters, try to ignore the <tt>p</tt> parameter in all types; it's there just for future compatibility. The easiest way to understand how you can use these types to represent others is to see an example. Let's represent the <hask>UserTree</hask> type shown before:

<haskell>
type RepUserTree a =
  -- A UserTree is either a Leaf, which has no arguments
      U1
  -- ... or it is a Node, which has three arguments that we put in a product
  :+: a :*: UserTree a :*: UserTree a
</haskell>

Simple, right? Different constructors become alternatives of a sum, and multiple arguments become products. In fact, we want to have some more information in the representation, like datatype and constructor names, and to know whether a product argument is a parameter or a recursive occurrence of a type.
We use the other primitives for this, and the representation looks more like:

<haskell>
type RealRepUserTree a =
  -- Information about the datatype
  M1 D Data_UserTree (
      -- Leaf, with information about the constructor
      M1 C Con_Leaf U1
      -- Node, with information about the constructor
  :+: M1 C Con_Node (
          -- Constructor argument, which could have information
          -- about a record selector label
          M1 S NoSelector (
            -- Argument, tagged with P because it is a parameter
            K1 P a)
          -- Another argument, tagged with R because it is
          -- a recursive occurrence of a type
      :*: M1 S NoSelector (K1 R (UserTree a))
          -- Idem
      :*: M1 S NoSelector (K1 R (UserTree a))
    ))
</haskell>

This is a bit more complicated, but essentially the same. Datatypes like <hask>Data_UserTree</hask> are empty datatypes used only for providing meta-information in the representation; you don't have to worry much about them for now. Also, GHC generates these representations for you automatically, so you should never have to define them yourself! All of this is explained in much more detail in Section 2.1 of [http://dreixel.net/research/pdf/gdmh.pdf the original paper describing the new generic deriving mechanism].

==== A generic function ====
Since GHC can represent user types using only those primitive types, all you have to do is to tell GHC how to serialize each of the individual primitive types. The best way to do that is to create a new type class:

<haskell>
class GSerialize f where
  gput :: f a -> [Bit]
</haskell>

This class looks very much like the original <hask>Serialize</hask> class, except that the type argument is of kind <hask>* -> *</hask>, since our generic representation types have this <tt>p</tt> parameter lying around. Now we need to give instances for each of the basic types.
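Before writing those instances, it may help to look at a real representation. The following sketch assumes GHC's <hask>GHC.Generics</hask> module and a <hask>UserTree</hask> definition matching the one shown earlier on this page (restated so the program compiles); <hask>describe</hask> is a hypothetical helper. It calls <hask>from</hask>, a method of the <hask>Generic</hask> class described in the section on default implementations, and takes the result apart by hand. The patterns match only on <hask>M1</hask>, <hask>L1</hask>, <hask>R1</hask>, <hask>U1</hask>, <hask>K1</hask> and <hask>:*:</hask>, so they do not depend on the exact tags and metadata types GHC chooses.

<haskell>
{-# LANGUAGE DeriveGeneric #-}
module Main where

import GHC.Generics

-- Assumed to match the UserTree type defined earlier on this page.
data UserTree a = Leaf | Node a (UserTree a) (UserTree a)
  deriving (Show, Generic)

-- Hypothetical helper: peel the wrappers off GHC's representation.
-- The outer M1 carries datatype information, L1/R1 pick the constructor,
-- the next M1 carries constructor information, and each field sits
-- inside a selector M1 and a K1, nested to the right as in RealRepUserTree.
describe :: Show a => UserTree a -> String
describe t = case from t of
  M1 (L1 (M1 U1)) -> "Leaf"
  M1 (R1 (M1 (M1 (K1 x) :*: (M1 (K1 _) :*: M1 (K1 _))))) ->
    "Node holding " ++ show x

main :: IO ()
main = do
  putStrLn (describe (Leaf :: UserTree Int))
  putStrLn (describe (Node (1 :: Int) Leaf Leaf))
</haskell>

Running it should print <tt>Leaf</tt> followed by <tt>Node holding 1</tt>. The <hask>GSerialize</hask> instances take these same wrappers apart, one representation type at a time.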
For units there's nothing to serialize:

<haskell>
instance GSerialize U1 where
  gput U1 = []
</haskell>

The serialization of multiple arguments is simply the concatenation of each of the individual serializations:

<haskell>
instance (GSerialize a, GSerialize b) => GSerialize (a :*: b) where
  gput (a :*: b) = gput a ++ gput b
</haskell>

The case for sums is the most interesting, as we have to record which alternative we are in. We will use a 0 (<hask>O</hask>) for left injections and a 1 (<hask>I</hask>) for right injections:

<haskell>
instance (GSerialize a, GSerialize b) => GSerialize (a :+: b) where
  gput (L1 x) = O : gput x
  gput (R1 x) = I : gput x
</haskell>

We don't need to encode the meta-information, so we just recurse into it:

<haskell>
instance (GSerialize a) => GSerialize (M1 i c a) where
  gput (M1 x) = gput x
</haskell>

Finally, we're only left with the arguments. For these we will just use our first class, <hask>Serialize</hask>, again:

<haskell>
instance (Serialize a) => GSerialize (K1 i a) where
  gput (K1 x) = put x
</haskell>

So, if a user datatype has a parameter which is instantiated to <hask>Int</hask>, at this stage we will use the library instance for <hask>Serialize Int</hask>.

==== Default implementations ====
We've seen how to represent user types generically, and how to define functions on representation types. However, we still have to tie these two together, explaining how to convert user types to their representation and then apply the generic function.

The representation <hask>RepUserTree</hask> we have seen earlier is only one component of the representation; we also need functions to convert between the user datatype and its representation.
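Even before any conversion exists, the generic instances can be exercised by applying <hask>gput</hask> to a hand-built representation value. This is a minimal sketch assuming <hask>GHC.Generics</hask>; the <hask>Bit</hask> type and <hask>Serialize</hask> class are restated as <hask>data Bit = O | I</hask> and <hask>class Serialize a where put :: a -> [Bit]</hask>, an assumption about their earlier definitions. The one-bit <hask>Serialize Bool</hask> instance and the value <hask>example</hask> are hypothetical, for illustration only.

<haskell>
{-# LANGUAGE TypeOperators #-}
module Main where

import GHC.Generics

-- Assumed from earlier on this page: bits and the Serialize class.
data Bit = O | I deriving (Show, Eq)

class Serialize a where
  put :: a -> [Bit]

-- Hypothetical base instance, for illustration: a Bool is one bit.
instance Serialize Bool where
  put False = [O]
  put True  = [I]

-- The generic class and its instances, as given above.
class GSerialize f where
  gput :: f a -> [Bit]

instance GSerialize U1 where
  gput U1 = []

instance (GSerialize a, GSerialize b) => GSerialize (a :*: b) where
  gput (a :*: b) = gput a ++ gput b

instance (GSerialize a, GSerialize b) => GSerialize (a :+: b) where
  gput (L1 x) = O : gput x
  gput (R1 x) = I : gput x

instance GSerialize a => GSerialize (M1 i c a) where
  gput (M1 x) = gput x

instance Serialize a => GSerialize (K1 i a) where
  gput (K1 x) = put x

-- Hypothetical hand-built value: the second alternative of a sum,
-- carrying two Bool fields (no M1 meta-information, for brevity).
example :: (U1 :+: (K1 R Bool :*: K1 R Bool)) p
example = R1 (K1 True :*: K1 False)

main :: IO ()
main = print (gput example)
</haskell>

Here <hask>R1</hask> contributes the leading <hask>I</hask>, and the two fields contribute <hask>I</hask> and <hask>O</hask>, so the program should print <tt>[I,I,O]</tt>. Writing such values by hand is tedious, though; what is still missing is a way to obtain them from user datatypes automatically.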
For that we use another type class:

<haskell>
class Generic a where
  -- Encode the representation of a user datatype
  type Rep a :: * -> *
  -- Convert from the datatype to its representation
  from :: a -> (Rep a) x
  -- Convert from the representation to the datatype
  to   :: (Rep a) x -> a
</haskell>

So, for the <hask>UserTree</hask> datatype shown before, GHC generates the following instance:

<haskell>
instance Generic (UserTree a) where
  type Rep (UserTree a) = RepUserTree a

  from Leaf         = L1 U1
  from (Node a l r) = R1 (a :*: l :*: r)

  to (L1 U1)              = Leaf
  to (R1 (a :*: l :*: r)) = Node a l r
</haskell>

(Note that we are using the simpler representation <hask>RepUserTree</hask> instead of the real representation <hask>RealRepUserTree</hask>, just for simplicity. Because the simpler representation leaves out the <hask>K1</hask> wrappers around the fields, this instance illustrates the idea rather than being code GHC would accept as written.)

Equipped with a <hask>Generic</hask> instance, we are ready to tell the compiler how it can serialize any representable type:

<haskell>
putDefault :: (Generic a, GSerialize (Rep a)) => a -> [Bit]
putDefault a = gput (from a)
</haskell>

The type of <hask>putDefault</hask> says that we can serialize any <tt>a</tt> into a list of bits, as long as that <tt>a</tt> is <hask>Generic</hask>, and its representation <hask>Rep a</hask> has a <hask>GSerialize</hask> instance. The implementation is very simple: first convert the value to its representation using <hask>from</hask>, and then call <hask>gput</hask> on that representation.

However, we still have to write a <hask>Serialize</hask> instance for the user datatype:

<haskell>
instance (Serialize a) => Serialize (UserTree a) where
  put = putDefault
</haskell>
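Putting the pieces together, here is a self-contained sketch against the real <hask>GHC.Generics</hask> module, where <hask>deriving Generic</hask> (with the <tt>DeriveGeneric</tt> extension) makes GHC write the <hask>Generic</hask> instance for <hask>UserTree</hask>. As before, the <hask>Bit</hask>, <hask>Serialize</hask> and <hask>UserTree</hask> definitions are assumptions matching their use earlier on this page, and the one-bit <hask>Serialize Bool</hask> instance is hypothetical.

<haskell>
{-# LANGUAGE DeriveGeneric, TypeOperators, FlexibleContexts #-}
module Main where

import GHC.Generics

-- Assumed from earlier on this page: bits and the Serialize class.
data Bit = O | I deriving (Show, Eq)

class Serialize a where
  put :: a -> [Bit]

-- Hypothetical base instance, for illustration: a Bool is one bit.
instance Serialize Bool where
  put False = [O]
  put True  = [I]

-- Generic serialization over the representation types.
class GSerialize f where
  gput :: f a -> [Bit]

instance GSerialize U1 where
  gput U1 = []

instance (GSerialize a, GSerialize b) => GSerialize (a :*: b) where
  gput (a :*: b) = gput a ++ gput b

instance (GSerialize a, GSerialize b) => GSerialize (a :+: b) where
  gput (L1 x) = O : gput x
  gput (R1 x) = I : gput x

instance GSerialize a => GSerialize (M1 i c a) where
  gput (M1 x) = gput x

instance Serialize a => GSerialize (K1 i a) where
  gput (K1 x) = put x

-- Assumed UserTree definition; GHC derives Rep, from and to for it.
data UserTree a = Leaf | Node a (UserTree a) (UserTree a)
  deriving (Show, Generic)

-- Convert to the representation, then serialize generically.
putDefault :: (Generic a, GSerialize (Rep a)) => a -> [Bit]
putDefault a = gput (from a)

instance Serialize a => Serialize (UserTree a) where
  put = putDefault

main :: IO ()
main = do
  print (put (Leaf :: UserTree Bool))
  print (put (Node True Leaf Leaf))
</haskell>

A <hask>Leaf</hask> is the first constructor, so it serializes to <tt>[O]</tt>. For <hask>Node True Leaf Leaf</hask>, the sum contributes <hask>I</hask>, the <hask>True</hask> field contributes <hask>I</hask>, and each <hask>Leaf</hask> contributes <hask>O</hask>, so the second line should be <tt>[I,I,O,O]</tt>. A common variation is to use the <tt>DefaultSignatures</tt> extension to make <hask>putDefault</hask> the default method of <hask>Serialize</hask>, so that an empty instance declaration suffices.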