Difference between revisions of "Smart constructors"

From HaskellWiki
Jump to navigation Jump to search
(Add section on runtime optimisation)
 
(20 intermediate revisions by 9 users not shown)
Line 16: Line 16:
 
* ensure only resistors with the right number of bands are constructed.
 
* ensure only resistors with the right number of bands are constructed.
   
= Runtime checking : smart constructors =
+
== Runtime checking : smart constructors ==
   
== A first attempt ==
+
=== A first attempt ===
   
Code up the a typical [[Type|data type]] describing a resistor value:
+
Code up a typical [[Type|data type]] describing a resistor value:
   
  +
<haskell>
data Resistor = Metal Bands
 
| Ceramic Bands
+
data Resistor = Metal Bands
  +
| Ceramic Bands
deriving Show
 
  +
deriving Show
   
type Bands = Int
+
type Bands = Int
  +
</haskell>
   
This has a problem however, that the constructors of type ''Resistor'' are
+
This has a problem however, that the [[constructor]]s of type ''Resistor'' are
 
unable to check that only bands of size 4 to 8 are built. It is quite
 
unable to check that only bands of size 4 to 8 are built. It is quite
 
legal to say:
 
legal to say:
Line 37: Line 39:
 
for example.
 
for example.
   
== Smart(er) constructors ==
+
=== Smart(er) constructors ===
   
 
Smart constructors are just functions that build values of the required
 
Smart constructors are just functions that build values of the required
Line 43: Line 45:
 
so:
 
so:
   
  +
<haskell>
metalResistor :: Bands -> Resistor
 
metalResistor n | n < 4 || n > 8 = error "Invalid number of resistor bands"
+
metalResistor :: Bands -> Resistor
  +
metalResistor n | n < 4 || n > 8 = error "Invalid number of resistor bands"
| otherwise = Metal n
 
  +
| otherwise = Metal n
  +
</haskell>
   
 
This function behaves like the constructor ''Metal'', but also performs
 
This function behaves like the constructor ''Metal'', but also performs
Line 52: Line 56:
   
 
Running this code:
 
Running this code:
*Main> metalResistor 4
+
> metalResistor 4
Metal 4
+
Metal 4
   
*Main> metalResistor 7
+
> metalResistor 7
Metal 7
+
Metal 7
 
 
*Main> metalResistor 9
+
> metalResistor 9
*** Exception: Invalid number of resistor bands
+
*** Exception: Invalid number of resistor bands
   
*Main> metalResistor 0
+
> metalResistor 0
*** Exception: Invalid number of resistor bands
+
*** Exception: Invalid number of resistor bands
   
 
One extra step has to be made though, to make the interface safe. When
 
One extra step has to be made though, to make the interface safe. When
Line 69: Line 73:
 
reckless user could bypass the smart constructor:
 
reckless user could bypass the smart constructor:
   
  +
<haskell>
module Resistor (
 
  +
module Resistor (
Resistor, -- abstract, hiding constructors
 
metalResistor, -- only way to build a metal resistor
+
Resistor, -- abstract, hiding constructors
  +
metalResistor, -- only way to build a metal resistor
) where
 
  +
) where
   
 
...
 
...
  +
</haskell>
   
== Using assertions ==
+
=== Using assertions ===
   
 
Hand-coding error messages can be tedious when used often. Instead we
 
Hand-coding error messages can be tedious when used often. Instead we
can use the ''assert'' function, provided (at least with GHC). We
+
can use the ''assert'' function, provided (from Control.Exception). We
 
rewrite the smart constructor as:
 
rewrite the smart constructor as:
   
  +
<haskell>
metalResistor :: Bands -> Resistor
 
metalResistor n = assert (n >= 4 && n <= 8) $ Metal n
+
metalResistor :: Bands -> Resistor
  +
metalResistor n = assert (n >= 4 && n <= 8) $ Metal n
  +
</haskell>
   
 
And now obtain more detailed error messages, automatically generated for us:
 
And now obtain more detailed error messages, automatically generated for us:
   
*Main> metalResistor 0
+
> metalResistor 0
*** Exception: A.hs:4:18-23: Assertion failed
+
*** Exception: A.hs:4:18-23: Assertion failed
   
 
We at least now are given the line and column in which the error occured.
 
We at least now are given the line and column in which the error occured.
   
= Compile-time checking : the type system =
+
== Compile-time checking : the type system ==
   
== Enforcing the constraint statically ==
+
=== Enforcing the constraint statically ===
   
 
There are other ways to obtain numerical checks like this. The most
 
There are other ways to obtain numerical checks like this. The most
Line 101: Line 109:
 
by lifting the band count into the type level.
 
by lifting the band count into the type level.
   
  +
In the following example, instead of checking the band count at runtime,
''Todo: example''
 
  +
we instead lift the resistor band count into the type level, and have
  +
the typecheck perform the check statically, using [[Phantom type|phantom types]]
  +
and [[Peano numbers]].
   
  +
We thus remove the need for a runtime check, meaning faster code. A consequence
== Extensions ==
 
  +
of this decision is that since the band count is now represented in the type,
  +
it is no longer necessary to carry it around at runtime, meaning less data has
  +
to be allocated.
  +
  +
Firstly, define some [[Peano numbers]] to represent the number of bands as types:
  +
  +
<haskell>
  +
data Z = Z
  +
data S a = S a
  +
</haskell>
  +
  +
Now specify a class for cardinal numbers.
  +
  +
<haskell>
  +
class Card c where
  +
  +
instance Card Z where
  +
instance (Card c) => Card (S c) where
  +
</haskell>
  +
  +
Ok, now we're set. So encode a type-level version of the bounds check.
  +
Only resistors with bands >= 4 and <= 8 are valid:
  +
  +
<haskell>
  +
class Card size => InBounds size where
  +
  +
instance InBounds (S (S (S (S Z)))) where -- four
  +
instance InBounds (S (S (S (S (S Z))))) where -- five
  +
instance InBounds (S (S (S (S (S (S Z)))))) where -- six
  +
instance InBounds (S (S (S (S (S (S (S Z))))))) where -- seven
  +
instance InBounds (S (S (S (S (S (S (S (S Z)))))))) where -- eight
  +
</haskell>
  +
  +
Now define a new resistor type. Note that since the bounds is represented in the
  +
type, ''we no longer need to store the bounds in the resistor value''.
  +
  +
<haskell>
  +
data Resistor size = Resistor deriving Show
  +
</haskell>
  +
  +
And, finally, a convenience constructor for us to use, encoding the bounds
  +
check in the type:
  +
  +
<haskell>
  +
resistor :: InBounds size => size -> Resistor size
  +
resistor _ = Resistor
  +
</haskell>
  +
  +
=== Examples ===
  +
  +
First, define some convenience values:
  +
  +
<haskell>
  +
d0 = undefined :: Z
  +
d3 = undefined :: S (S (S Z))
  +
d4 = undefined :: S (S (S (S Z)))
  +
d6 = undefined :: S (S (S (S (S (S Z)))))
  +
d8 = undefined :: S (S (S (S (S (S (S (S Z)))))))
  +
d10 = undefined :: S (S (S (S (S (S (S (S (S (S Z)))))))))
  +
</haskell>
  +
  +
Now try to construct some resistors:
  +
  +
> resistor d0
  +
No instance for (InBounds Z)
  +
  +
So the value 0 isn't in bounds, as we want. And it is a ''compile-time error''
  +
to try to create such a resistor.
  +
  +
> resistor d3
  +
No instance for (InBounds (S (S (S Z))))
  +
  +
Ok, how about a valid resistor?
  +
  +
> resistor d4
  +
Resistor
  +
  +
Great!
  +
  +
> :t resistor d4
  +
resistor d4 :: Resistor (S (S (S (S Z))))
  +
  +
And its type encodes the number of bands.
  +
  +
> resistor d6
  +
Resistor
  +
> resistor d8
  +
Resistor
  +
  +
> :t resistor d8
  +
resistor d8 :: Resistor (S (S (S (S (S (S (S (S Z))))))))
  +
  +
Similar result for other valid resistors.
  +
  +
> resistor d10
  +
No instance for (InBounds (S (S (S (S (S (S (S (S (S (S Z)))))))))))
  +
  +
And 10 is too big.
  +
  +
=== Summary ===
  +
  +
By using a standard encoding of numeric values on the type level we are able to
  +
encode a bounds check in the type of a value, thus removing a runtime check,
  +
and removing the need to store the numeric value at runtime. The code is safer,
  +
as it is impossible to compile the program unless all resistors have the
  +
correct number of bands.
  +
  +
An extension would be to use a decimal encoding for the integers (at the
  +
expense of longer code).
  +
  +
=== Extensions ===
   
 
Further checks can be obtained by separating the metal and ceramic
 
Further checks can be obtained by separating the metal and ceramic
Line 111: Line 233:
 
A ''newtype'' is useful for this:
 
A ''newtype'' is useful for this:
   
  +
<haskell>
newtype MetalResistor = Metal Bands
 
newtype CeramicResistor = Ceramic Bands
+
newtype MetalResistor = Metal Bands
  +
newtype CeramicResistor = Ceramic Bands
  +
</haskell>
   
 
now, a function of resistors must have either a ''MetalResistor'' type, or a
 
now, a function of resistors must have either a ''MetalResistor'' type, or a
 
''CeramicResistor'' type:
 
''CeramicResistor'' type:
   
  +
<haskell>
foo :: MetalResistor -> Int
 
  +
foo :: MetalResistor -> Int
foo (MetalResistor n) = n
 
  +
foo (MetalResistor n) = n
  +
</haskell>
   
 
You can't write a function over both resistor types (other than a purely
 
You can't write a function over both resistor types (other than a purely
 
polymorphic function).
 
polymorphic function).
   
== Related work ==
+
=== Related work ===
   
  +
* These ideas are also discussed in [[Dimensionalized numbers]] and on the old wiki [http://web.archive.org/web/20050227183721/http://www.haskell.org/hawiki/NonTrivialTypeSynonyms here] (for compile-time unit analysis error catching at the type level).
These ideas are also discussed on the old wiki
 
  +
[http://haskell.org/hawiki/NonTrivialTypeSynonyms here] and
 
  +
* There is also [https://wiki.haskell.org/Liquid_Haskell Liquid Haskell], which allows you to annotate your functions with invariants ("the list that this function produces has to be sorted", etc) that are run through a SMT solver at compile time.
[http://haskell.org/hawiki/DimensionalizedNumbers also here] (for
 
  +
compile-time unit analysis error catching at the type level).
 
  +
* [http://nikita-volkov.github.io/refined/ "refined" library], which abstracts over smart constructors and provides compile-time checking, essentially implementing refinement types.
More [http://haskell.org/hawiki/WrapperTypes here] too.
 
  +
  +
* Recently migrated are the pages on [[worker wrapper]] and [[factory function]].
   
 
In general, the more information you place on the type level, the more
 
In general, the more information you place on the type level, the more
 
static checks you get -- and thus less chance for bugs.
 
static checks you get -- and thus less chance for bugs.
   
= Runtime Optimisation : smart constructors =
+
== Runtime Optimisation : smart constructors ==
   
 
Another use for smart constructors is to perform basic optimisations, often to obtain a normal form for constructed data. For example, consider a data structure representing addition and multiplication of variables.
 
Another use for smart constructors is to perform basic optimisations, often to obtain a normal form for constructed data. For example, consider a data structure representing addition and multiplication of variables.
   
  +
<haskell>
data Expression = Variable String
 
  +
data Expression = Variable String
| Add [Expression]
 
| Multiply [Expression]
+
| Add [Expression]
  +
| Multiply [Expression]
  +
</haskell>
   
In this data structure, it is possible to represent a value such as <tt>Add [Variable "a", Add [Variable "b", Variable "c"]]</tt> more compactly as <tt>Add [Variable "a", Variable "b", Variable "c"]].
+
In this data structure, it is possible to represent a value such as <tt>Add [Variable "a", Add [Variable "b", Variable "c"]]</tt> more compactly as <tt>Add [Variable "a", Variable "b", Variable "c"]</tt>.
   
 
This can be done automatically with smart constructors such as:
 
This can be done automatically with smart constructors such as:
   
  +
<haskell>
add :: [Expression] -> Expression
 
  +
add :: [Expression] -> Expression
add xs = Add (concatMap fromAdd xs)
 
  +
add xs = Add (concatMap fromAdd xs)
 
multiply :: [Expression] -> Expression
+
multiply :: [Expression] -> Expression
multiply xs = Multiply (concatMap fromMultiply xs)
+
multiply xs = Multiply (concatMap fromMultiply xs)
 
fromAdd (Add xs) = xs
 
fromAdd x = [x]
 
   
fromMultiply (Multiply xs) = xs
+
fromAdd (Add xs) = xs
fromMultiply x = [x]
+
fromAdd x = [x]
  +
fromMultiply (Multiply xs) = xs
  +
fromMultiply x = [x]
  +
</haskell>
   
 
[[Category:Idioms]]
 
[[Category:Idioms]]
  +
[[Category:Glossary]]

Latest revision as of 18:21, 6 June 2020

Smart constructors

This is an introduction to a programming idiom for placing extra constraints on the construction of values by using smart constructors.

Sometimes you need guarantees about the values in your program beyond what can be accomplished with the usual type system checks. Smart constructors can be used for this purpose.

Consider the following problem: we want to be able to specify a data type for electronic resistors. The resistors come in two forms, metal and ceramic. Resistors are labelled with a number of bands, from 4 to 8.

We'd like to be able to

  • ensure only resistors with the right number of bands are constructed.

Runtime checking : smart constructors

A first attempt

Code up a typical data type describing a resistor value:

data Resistor = Metal   Bands
              | Ceramic Bands 
                deriving Show

type Bands = Int

This has a problem however, that the constructors of type Resistor are unable to check that only bands of size 4 to 8 are built. It is quite legal to say:

*Main> :t Metal 23
Metal 23 :: Resistor

for example.

Smart(er) constructors

Smart constructors are just functions that build values of the required type, but perform some extra checks when the value is constructed, like so:

metalResistor :: Bands -> Resistor
metalResistor n | n < 4 || n > 8 = error "Invalid number of resistor bands" 
                | otherwise      = Metal n

This function behaves like the constructor Metal, but also performs a check. This check will be carried out at runtime, once, when the value is built.

Running this code:

> metalResistor 4
  Metal 4
> metalResistor 7
  Metal 7

> metalResistor 9
  *** Exception: Invalid number of resistor bands
> metalResistor 0
  *** Exception: Invalid number of resistor bands

One extra step has to be made though, to make the interface safe. When exporting the type Resistor we need to hide the (unsafe) constructors, and only export the smart constructors, otherwise a reckless user could bypass the smart constructor:

module Resistor (
         Resistor,       -- abstract, hiding constructors
         metalResistor,  -- only way to build a metal resistor
       ) where

 ...

Using assertions

Hand-coding error messages can be tedious when used often. Instead we can use the assert function, provided (from Control.Exception). We rewrite the smart constructor as:

metalResistor :: Bands -> Resistor
metalResistor n = assert (n >= 4 && n <= 8) $ Metal n

And now obtain more detailed error messages, automatically generated for us:

> metalResistor 0
  *** Exception: A.hs:4:18-23: Assertion failed

We at least now are given the line and column in which the error occured.

Compile-time checking : the type system

Enforcing the constraint statically

There are other ways to obtain numerical checks like this. The most interesting are probably the static checks that can be done with Type arithmetic, that enforce the number of bands at compile time, rather than runtime, by lifting the band count into the type level.

In the following example, instead of checking the band count at runtime, we instead lift the resistor band count into the type level, and have the typecheck perform the check statically, using phantom types and Peano numbers.

We thus remove the need for a runtime check, meaning faster code. A consequence of this decision is that since the band count is now represented in the type, it is no longer necessary to carry it around at runtime, meaning less data has to be allocated.

Firstly, define some Peano numbers to represent the number of bands as types:

 
data Z   = Z
data S a = S a

Now specify a class for cardinal numbers.

class Card c where
 
instance Card Z where
instance (Card c) => Card (S c) where

Ok, now we're set. So encode a type-level version of the bounds check. Only resistors with bands >= 4 and <= 8 are valid:

class Card size => InBounds size where
 
instance InBounds (S (S (S (S Z)))) where                 -- four
instance InBounds (S (S (S (S (S Z))))) where             -- five
instance InBounds (S (S (S (S (S (S Z)))))) where         -- six
instance InBounds (S (S (S (S (S (S (S Z))))))) where     -- seven
instance InBounds (S (S (S (S (S (S (S (S Z)))))))) where -- eight

Now define a new resistor type. Note that since the bounds is represented in the type, we no longer need to store the bounds in the resistor value.

data Resistor size = Resistor deriving Show

And, finally, a convenience constructor for us to use, encoding the bounds check in the type:

 
resistor :: InBounds size => size -> Resistor size
resistor _ = Resistor

Examples

First, define some convenience values:

d0  = undefined :: Z
d3  = undefined :: S (S (S Z))
d4  = undefined :: S (S (S (S Z)))
d6  = undefined :: S (S (S (S (S (S Z)))))
d8  = undefined :: S (S (S (S (S (S (S (S Z)))))))
d10 = undefined :: S (S (S (S (S (S (S (S (S (S Z)))))))))

Now try to construct some resistors:

> resistor d0
   No instance for (InBounds Z)

So the value 0 isn't in bounds, as we want. And it is a compile-time error to try to create such a resistor.

> resistor d3
   No instance for (InBounds (S (S (S Z))))

Ok, how about a valid resistor?

> resistor d4
Resistor

Great!

> :t resistor d4
resistor d4 :: Resistor (S (S (S (S Z))))

And its type encodes the number of bands.

> resistor d6
Resistor
> resistor d8
Resistor
> :t resistor d8
resistor d8 :: Resistor (S (S (S (S (S (S (S (S Z))))))))

Similar result for other valid resistors.

> resistor d10
   No instance for (InBounds (S (S (S (S (S (S (S (S (S (S Z)))))))))))

And 10 is too big.

Summary

By using a standard encoding of numeric values on the type level we are able to encode a bounds check in the type of a value, thus removing a runtime check, and removing the need to store the numeric value at runtime. The code is safer, as it is impossible to compile the program unless all resistors have the correct number of bands.

An extension would be to use a decimal encoding for the integers (at the expense of longer code).

Extensions

Further checks can be obtained by separating the metal and ceramic values on the type level, so no function that takes a metal resistor can be accidentally passed a ceramic one.

A newtype is useful for this:

newtype MetalResistor   = Metal   Bands
newtype CeramicResistor = Ceramic Bands

now, a function of resistors must have either a MetalResistor type, or a CeramicResistor type:

foo :: MetalResistor -> Int
foo (MetalResistor n) = n

You can't write a function over both resistor types (other than a purely polymorphic function).

Related work

  • These ideas are also discussed in Dimensionalized numbers and on the old wiki here (for compile-time unit analysis error catching at the type level).
  • There is also Liquid Haskell, which allows you to annotate your functions with invariants ("the list that this function produces has to be sorted", etc) that are run through a SMT solver at compile time.
  • "refined" library, which abstracts over smart constructors and provides compile-time checking, essentially implementing refinement types.

In general, the more information you place on the type level, the more static checks you get -- and thus less chance for bugs.

Runtime Optimisation : smart constructors

Another use for smart constructors is to perform basic optimisations, often to obtain a normal form for constructed data. For example, consider a data structure representing addition and multiplication of variables.

data Expression = Variable String
                | Add [Expression]
                | Multiply [Expression]

In this data structure, it is possible to represent a value such as Add [Variable "a", Add [Variable "b", Variable "c"]] more compactly as Add [Variable "a", Variable "b", Variable "c"].

This can be done automatically with smart constructors such as:

add :: [Expression] -> Expression
add xs = Add (concatMap fromAdd xs)
multiply :: [Expression] -> Expression
multiply xs = Multiply (concatMap fromMultiply xs)

fromAdd (Add xs) = xs
fromAdd x = [x]
fromMultiply (Multiply xs) = xs
fromMultiply x = [x]