Revision as of 09:52, 5 December 2013

Introduction

This page describes the design, usage and motivation for CTRex.

CTRex is a library for Haskell which implements extensible records using closed type families and type literals. It does not use overlapping instances.

Features:

Row-polymorphism

Support for scoped labels (i.e. duplicate labels) and non-scoped labels (i.e. the lacks predicate on rows).

The value level interface and the type level interface correspond to each other.

The order of labels (except for duplicate labels) does not matter. I.e. {x = 0, y = 0} and {y = 0, x = 0} have the same type.

Syntactic sugar on the value level as well as type level.

If all values in a record satisfy a constraint such as Show, then we are able to do operations on all fields in a record, if that operation only requires that the constraint is satisfied. In this way we can create instances such as Forall r Show => Show (Rec r). This is available to the application programmer as well.

Fast extend, lookup and restriction (all O(log n)) using HashMaps.

The haddock documentation is available here.

What the hell are extensible records?

Basic extensible records

Records are values that contain other values, which are indexed by name. Examples of records are structs in c. In Haskell, we can currently declare a record types as follows:

 data HRec = HRec { x :: Int, y :: Bool, z :: String }

 data HRec2 = HRec2 { p :: Bool, q :: Char }

Extensible records are records where we can add values (with corresponding label) to existing records. Suppose we have a record x = { x = 0, y = 0 }. If we have an extensible record system we can then add a value to this record:

 extend z "Bla" x

Which gives {x = 0, y = 0 , z = "Bla"}

A type-level record, i.e. the mapping of labels to types, such as {x = Int, y = Bool, z = String }, is called a row.

Such extension is not possible with the Rec type above, the fields of the record are fixed. In the non-extensible record system in Haskell currently, the records are typed nominally, which means that we see if two record types are the same by checking the names of the records. For example to check if the type HRec is the same as HRec2 we check if their names (HRec and HRec2) are equal, not if they have the same fields with the same types.

In an extensible record system the record type are 'structural: two records have the same type if they carry the same fields with the same types, i.e. if they have the same row. This also means we do not have to declare the type of the record before using it. For example (in CTRex).

 x := 0 .| y := False .| empty

Constructs { x = 0, y = 0 } with type :

 Rec ("x" ::= Int :| y ::= Bool .| Empty)

which means {x = Int, y = Bool }

This structural typing has the advantage that we can go from {x = Int, y = Bool } to {x = Int, y = Bool, z = String } by simply adding the field, we do not have to write a specific function to convert the two types (which would have been necessary with nominal record typing).

Because the associated type for a label is in the row of the record, labels can be used in different records for different types. This is currently not possible with standard records :

 
data X = X { x :: Int, y :: Bool } 
data Y = Y { x :: Bool }

Will give a Multiple declarations of `x' error.

To summarize, extensible records have the following advantages:

Labels can be used in different records for different types
Records do not have to be declared before use.
Structural typing eliminates the need for explicit conversion functions.

Row polymorphism

Some extensible record systems, such as CTRex, support row polymorphism. This concept is best explained by an example, consider the following function (in CTRex):

 
f :: Rec ("x" ::= Double .| "y" ::= Double .| Empty) -> Rec ("x" ::= Double .| "y" ::= Double .| "dist" ::= Double .| Empty)
f r = norm := dist ((r.!x * r.!x) + (r.!y * r.!y)) .| r

Where (.!) is the function to get a value from a record. This function takes a record with row { x = Double, y = Double } and add returns the same record with a label dist added, which carries the distance of the point (x,y) from (0,0).

Now suppose we want to call this function with record { name = "PointA", x = 10, y = 10 }. This is not possible, because the row { name = String, x = Double, y = Double } and { x = Double, y = Double } are not the same.

For this we need row polymorphism: we want to make the function f polymorphic in the rest of the row as follows:

 
f :: Rec ("x" ::= Double .| "y" ::= Double .| r) -> Rec ("x" ::= Double .| "y" ::= Double .| "dist" ::= Double .| r)
f r = norm := dist ((r.!x * r.!x) + (r.!y * r.!y)) .| r

Notice that in CTRex, the only difference with the previous version is that we replaced Empty with r in the type. Now the type f reads as follows: Given some record where x and y have type Double and there are some other fields with types described by row r, return a record which x,y and dist have type Double and there are also some other fields described by row r.

Of course, since f cannot know anything about the the rest of the row, it can do nothing else than just leave it alone or replace it with undefined.

Difference between Heterogenous maps and extensible records

A question that may arise after the previous section is: What is the difference between extensible records and heterogenous map (a map that can store values of different types, see for example the HMap package (blatant plug, my package )).

In hetrogenous maps the type associated with a key is present in the type of the key. For example, in HMap a key has type Key x a where a is the type of the things we can store at this key, for example Int.

In extensible records, the type associated with a key (now called label) is stored in the type of the record, i.e. in its row. The row states for example that x is associated with Int the label itself does not hold any information on the associated type. In fact, the associated type may differ between records.

This means that if a record has x ::= Int in its row, then we are sure that this record has a value of type Int for x. In a hetrogenous map, we can never be sure if a key is present in a map, i.e. lookup x m may return Maybe.

Programmer interface

Labels

Labels (such as x,y and z) in CTRex are type level symbols (i.e. type level strings). We can point to a label by using the label type:

data Label (s :: Symbol) = Label

For example, we can declare shorthands for pointing at the type level symbol "x", "y" and "z" as follows.

x = Label :: Label "x" 
y = Label :: Label "y" 
z = Label :: Label "z"

Of course it would be much nicer to just write `x instead of Label :: Label "x" but this is currently not available. This may change in the future.

Rows and records

A record has the following type: Rec (r :: Row *), where r is the row.

The constructors of Rec as well as the constructors of the datakind Row are not exported.

Hence we can only manipulate records and rows by the value and type level operations given in the CTRex module.

Operations

For all operations available on records, the value level interface and the type level interface correspond to each other.

For example, the value level operation for extending a record (adding a field) has type

 extend :: KnownSymbol l => Label l -> a -> Rec r -> Rec (Extend l a r)

whereas the type level operation for adding a field has type

 Extend :: Symbol -> * -> Row * -> Row *

Here we use regular type syntax to denote the kinds of the closed type family Extend

In this way each value level operation (that changes the type) has a corresponding type level operation with a similar name. If the value level operation is not infix the type level operation is named the same, but starting with a capital. And if the value level operation is an operator, is starts with a '.' and the type level operation starts with a ':'.

The following operations are available:

Extension:
- Value level: extend :: KnownSymbol l => Label l -> a -> Rec r -> Rec (Extend l a r)
- Type level: Extend :: Symbol -> * -> Row * -> Row *
Selection:
- Value level: (.!) :: KnownSymbol l => Rec r -> Label l -> r :! l
- Type level: (:!) :: Row * -> Symbol -> *
Restriction:
- Value level: (.-) :: KnownSymbol l => Rec r -> Label l -> Rec (r :- l)
- Type level: (:-) Row * -> Symbol -> Row *
Record merge :
- Value level: (.++) :: Rec l -> Rec r -> Rec (l :++ r)
- Type level: (:++) :: Row * -> Row * -> Row *

Rename (This operation can also be expressed using restriction, selection and selection, but this looks nicer):
- Value level: rename :: (KnownSymbol l, KnownSymbol l') => Label l -> Label l' -> Rec r -> Rec (Rename l l' r)
- Type level: Rename :: Symbol -> Symbol -> Row * -> Row *

Syntactic Sugar

We provide some handy declarations which allow us to chain operations with nicer syntax. For example we can write:

 
p :<-| z .| y :<- 'b' .| z :!= False .| x := 2 .| y := 'a' .| empty

instead of

 
 rename z p $ update y 'b' $ extendUnique z False $ extend x 2 $ extend y 'a' empty

For this we have a GADT datatype RecOp which takes two arguments:

c, the type of the constaint that should hold on the input row.
rop, the row operation (see below). with the following constructors:

This datatype has the following constructors, all of which are sugar for record operations.

(:<-) :: Label -> a -> RecOp (HasType l a) RUp Record update. Sugar for update.
(:=) :: KnownSymbol l => Label l -> a -> RecOp NoConstr (l ::= a) Record extension. Sugar for extend.
(:!=) :: KnownSymbol l => Label l -> a -> RecOp (Lacks l) (l ::= a) Record extension, without shadowing. Sugar for extendUnique. See the section on duplicate labels.
(:<-|) :: (KnownSymbol l, KnownSymbol l') => Label l' -> Label l -> RecOp NoConstr (l' ::<-| l) Record label renaming. Sugar for rename.
(:<-!) :: (KnownSymbol l, KnownSymbol l', r :\ l') => Label l' -> Label l -> RecOp (Lacks l') (l' ::<-| l) Record label renaming. Sugar for renameUnique. See the section on duplicate labels.

On the type level the same pattern again arises, we have a datakind (RowOp *) with the following constructors:

RUp :: RowOp * Row operation for(:<-). Identitity row operation.
(::=) :: Symbol -> * -> RowOp * Row extension operation. Sugar for Extend. Type level operation for (:=) and (:!=)
::<-| Row renaming. Sugar for Rename. Type level operation for (:<-|) and (:<-!)

We then have a type level operation to perform a row operation:

 (:|) :: RowOp * -> Row * -> Row *

And a value level operation to perform a record operation:

 (.|) :: c r => RecOp c ro -> Rec r -> Rec (ro :| r)

Notice that the constraint from the record operation is placed on the input row.

Also notice that this means that this sugar is also available when writing types:

 Rec ("p" ::<-| "z" :| RUp :| "z" ::= Bool :| "x" ::= Double :| "y" ::= Char :| Empty)

is the type exactly corresponding to:

 
p :<-| z .| y :<- 'b' .| z :!= False .| x := 2 .| y := 'a' .| empty

and equivalent to

  Rename "p" "z" (Extend "z" Bool (Extend x Double (Extend "x" Int Empty)))

and of course equivalent to:

 "p" ::= Bool :| "x" ::= Double :| "y" ::= Int :|  Empty

Duplicate labels, and lacks

Rows and records can contain duplicate labels as described in the paper Extensible records with scoped labels by Daan Leijen.

Hence we can write:

 z =  x := 10 .| x := "bla" .| Empty :: Rec ("x" ::= Int :| "x" ::= String :| Empty)

We can recover the information on the second instance of x by removing x:

 z .- x :: Rec ("x" ::= String :| Empty)

The motivation for this is as follows: Suppose we have a function

 f :: Rec ("x" ::= Int :| r) -> (Rec ("x" ::= Bool .| r)

and we want to write the following function:

g :: Rec r -> Rec ("p" ::= String .| r)
g r = let r' = f (x := 10 .| r)
          (c,r'') = decomp r x
          v = if c then "Yes" else "Nope"
      in p := v .| r''

If it was not possible for records and rows to contain duplicate label the type of g would be:

 g :: r :\ "x" => Rec r -> Rec ("p" ::= String .| r)

The constraint r :\ "x" reads as r lacks "x". The constraint leaks the implementation of g. We only use "x" internally in g , there is no reason for this constraint to hold in the rest of the program.

However, in other situations, duplicate labels may be undesired, for instance because we want to be sure that we do not hide previous information. For this reason we also provide the already introduced `lacks` constraint.

We also provide a handy record extension function that has this constraint, so that you do not have to add types yourself:

 extendUnique :: (KnownSymbol l, l :\ r) => Label l -> a -> Rec r -> Rec (Extend l a r)

The same thing for renaming:

 extendUnique :: (KnownSymbol l, r :\ l) => Label l -> a -> Rec r -> Rec (Extend l a r)

We also provide a constraint to test that two Rows are disjoint. Corresponding to this we also provide a function to merge with this constraint:

 .+ :: Disjoint l r => Rec l -> Rec r -> Rec (l :+ r)

Notice that .+ is commutative, while .++ is not.

@@ Line 67: / Line 67: @@
 * Records do not have to be declared before use.
 * Structural typing eliminates the need for explicit conversion functions.
+=== Row polymorphism ===
+Some extensible record systems, such as CTRex, support '''row polymorphism'''. This concept is best explained by an example, consider the following function (in CTRex):
+<haskell>
+f :: Rec ("x" ::= Double .| "y" ::= Double .| Empty) -> Rec ("x" ::= Double .| "y" ::= Double .| "dist" ::= Double .| Empty)
+f r = norm := dist ((r.!x * r.!x) + (r.!y * r.!y)) .| r
+</haskell>
+Where <hask>(.!)</hask> is the function to get a value from a record. This function takes a record with row <hask>{ x = Double, y = Double }</hask> and add returns the same record with a label dist added, which carries the distance of the point <hask>(x,y)</hask> from <hask>(0,0)</hask>.
+Now suppose we want to call this function with record <hask>{ name = "PointA", x = 10, y = 10  }</hask>. This is not possible, because the row <hask>{ name = String, x = Double, y = Double  }</hask> and  <hask>{ x = Double, y = Double  }</hask> are not the same.
+For this we need row polymorphism: we want to make the function <hask>f</hask> '''polymorphic''' in the rest of the row as follows:
+<haskell>
+f :: Rec ("x" ::= Double .| "y" ::= Double .| r) -> Rec ("x" ::= Double .| "y" ::= Double .| "dist" ::= Double .| r)
+f r = norm := dist ((r.!x * r.!x) + (r.!y * r.!y)) .| r
+</haskell>
+Notice that in CTRex, the only difference with the previous version is that we replaced <hask>Empty</hask> with <hask>r</hask> in the type. Now the type <hask> f </hask> reads as follows: Given some record where x and y have type Double and there are some other fields with types described by row <hask>r</hask>, return a record which x,y and dist have type Double and there are also some other fields described by row <hask>r</hask>.
+Of course, since <hask>f</hask> cannot know anything about the the rest of the row, it can do nothing else than just leave it alone or replace it with <hask>undefined</hask>.
 === Difference between Heterogenous maps and extensible records ===
@@ Line 77: / Line 99: @@
 This means that if a record has <hask>x ::= Int </hask> in its row, then we are '''sure''' that this record has a value of type <hask>Int</hask> for <hask>x</hask>. In a hetrogenous map, we can never be sure if a key is present in a map, i.e. <hask>lookup x m</hask> may return <hask>Maybe</hask>.
-=== Row polymorphism ===
 = Programmer interface =

Difference between revisions of "CTRex"