# Difference between revisions of "Newtype"

DonStewart (talk | contribs) (file under language) |
(→Examples) |
||

(10 intermediate revisions by 6 users not shown) | |||

Line 1: | Line 1: | ||

− | One frequent question is what is the difference between data and newtype? The answer has to do with the level of undefinedness that occurs in the values. The following sample code shows how three different data declarations behave with undefined present. This shows the difference in behavior. |
||

+ | A <hask>newtype</hask> declaration creates a new type in much the same way as <hask>data</hask>. The syntax and usage of newtypes is virtually identical to that of data declarations - in fact, you can replace the <hask>newtype</hask> keyword with <hask>data</hask> and it'll still compile, indeed there's even a good chance your program will still work. The converse is not true, however - <hask>data</hask> can only be replaced with <hask>newtype</hask> if the type has ''exactly one constructor'' with ''exactly one field'' inside it. |
||

− | Another difference is that newtypes can be compiled to have only the overhead of the wrapped type, probably making them more efficient than data types. And at least GHC has [http://www.haskell.org/ghc/dist/current/docs/users_guide/type-extensions.html#newtype-deriving extended the deriving syntax] to make usage of newtypes easier. |
||

+ | Some examples: |
||

− | Newtypes can be used transparently in [http://www.cse.unsw.edu.au/~chak/haskell/ffi/ FFI wrappers], including [http://www.haskell.org/ghc/dist/current/docs/users_guide/ffi.html#id3177212 IO] when using GHC. |
||

+ | <haskell> |
||

+ | newtype Fd = Fd CInt |
||

+ | -- data Fd = Fd CInt would also be valid |
||

⚫ | |||

+ | -- newtypes can have deriving clauses just like normal types |
||

+ | newtype Identity a = Identity a |
||

+ | deriving (Eq, Ord, Read, Show) |
||

+ | |||

+ | -- record syntax is still allowed, but only for one field |
||

+ | newtype State s a = State { runState :: s -> (s, a) } |
||

+ | |||

+ | -- this is *not* allowed: |
||

+ | -- newtype Pair a b = Pair { pairFst :: a, pairSnd :: b } |
||

+ | -- but this is: |
||

+ | data Pair a b = Pair { pairFst :: a, pairSnd :: b } |
||

+ | -- and so is this: |
||

+ | newtype NPair a b = NPair (a, b) |
||

+ | </haskell> |
||

+ | |||

+ | Sounds pretty limited! So why does anyone use <hask>newtype</hask>? |
||

+ | |||

+ | == The short version == |
||

+ | |||

+ | The restriction to one constructor with one field means that the new type and the type of the field are in direct correspondence: |
||

+ | |||

+ | <haskell> |
||

+ | State :: (s -> (s, a)) -> State s a |
||

+ | runState :: State s a -> (s -> (s, a)) |
||

+ | </haskell> |
||

+ | |||

+ | or in mathematical terms they are ''isomorphic''. This means that after the type is checked at compile time, at run time the two types can be treated essentially the same, without the overhead or indirection normally associated with a data constructor. So if you want to declare different type class instances for a particular type, or want to make a type abstract, you can wrap it in a <hask>newtype</hask> and it'll be considered distinct to the type-checker, but identical at runtime. You can then use all sorts of deep trickery like phantom or recursive types without worrying about GHC shuffling buckets of bytes for no reason. |
||

+ | |||

+ | == The messy bits == |
||

+ | |||

+ | Why doesn't everyone just use <hask>newtype</hask> whenever they can, then? Well, quite often they do. But there is a subtle yet semantically significant difference. When we create a data type supposedly isomorphic to <hask>Bool</hask> like so: |
||

+ | |||

+ | <haskell>data Any = Any { getAny :: Bool }</haskell> |
||

+ | |||

+ | we actually find that the isomorphism isn't exact: |
||

+ | |||

+ | <haskell> |
||

+ | Any . getAny $ Any True = Any True -- okay, fine |
||

+ | Any . getAny $ Any False = Any False -- also fine |
||

+ | Any . getAny $ Any ⊥ = Any ⊥ -- so far so good |
||

+ | Any . getAny $ ⊥ = Any ⊥ -- wait a second... |
||

+ | </haskell> |
||

+ | ([[Bottom|what's that upside-down T thing?]]) |
||

+ | |||

+ | The problem is that types declared with the <hask>data</hask> keyword are ''lifted'' - that is, they contain their own ⊥ value that is distinct from all the others. In this example, we have <hask>⊥ :: Any</hask> distinct from <hask>Any ⊥ :: Any</hask>. What this means is that the following pattern match: |
||

+ | |||

+ | <haskell> |
||

+ | case x of |
||

+ | Any _ -> () |
||

+ | </haskell> |
||

+ | |||

+ | must evaluate its argument, even though it seems like the pattern match can't fail: we must check whether <hask>x</hask> is <hask>⊥</hask> or <hask>Any y</hask> for some <hask>y</hask>. |
||

+ | |||

+ | This is intrinsic to Haskell's lazy, non-total semantics. The problem is that this means tracking whether a value is wrapped in a constructor or not, which means keeping track of those extra constructors at runtime even when all they do is distinguish an extra bottom value we don't even want. So in order to be consistent, but also allow the exact isomorphism to be preserved, Haskell provides the <hask>newtype</hask> keyword, for the construction of unlifted types. Pattern-matching on a newtype constructor doesn't do any work, because there is no separate ⊥ so every value in the type is wrapped in the constructor. |
||

+ | |||

+ | == What about strict types? == |
||

+ | |||

+ | You may notice that a type like |
||

+ | |||

+ | <haskell>data Identity' a = Identity' !a</haskell> |
||

+ | |||

+ | has <hask>Identity' ⊥ = ⊥</hask> and so you might think you have your coveted isomorphism. But all the strictness annotation means is that <hask>Identity' ⊥</hask> really means <hask>Identity' $! ⊥</hask> - the semantics of the type are fundamentally the same, and in particular the case expression still forces the value. |
||

+ | |||

+ | == Examples == |
||

<haskell> |
<haskell> |
||

Line 12: | Line 77: | ||

data Foo1 = Foo1 Int -- Defines Foo1 constructor that lazily refers to an Int |
data Foo1 = Foo1 Int -- Defines Foo1 constructor that lazily refers to an Int |
||

data Foo2 = Foo2 !Int -- Defines Foo2 constructor that strictly refers to an Int |
data Foo2 = Foo2 !Int -- Defines Foo2 constructor that strictly refers to an Int |
||

− | newtype Foo3 = Foo3 Int -- Defines Foo3 constructor that |
+ | newtype Foo3 = Foo3 Int -- Defines Foo3 constructor that has the same representation as Int |

-- Argument is lazy and ignored, so |
-- Argument is lazy and ignored, so |
||

Line 50: | Line 115: | ||

</haskell> |
</haskell> |
||

+ | Here are some tabulated examples which may be easier to read: |
||

+ | |||

+ | {| class="wikitable" |
||

+ | |- |
||

+ | ! |
||

+ | ! <haskell>newtype Foo = Foo ()</haskell> |
||

+ | ! <haskell>data Foo = Foo ()</haskell> |
||

+ | ! <haskell>data Foo = Foo !()</haskell> |
||

+ | |- |
||

+ | | <haskell>case Foo undefined of Foo _ -> ()</haskell> |
||

+ | | <haskell>()</haskell> |
||

+ | | <haskell>()</haskell> |
||

+ | | <haskell>undefined</haskell> |
||

+ | |- |
||

+ | | <haskell>case undefined of Foo _ -> ()</haskell> |
||

+ | | <haskell>()</haskell> |
||

+ | | <haskell>undefined</haskell> |
||

+ | | <haskell>undefined</haskell> |
||

+ | |- |
||

+ | | <haskell> Foo undefined `seq` ()</haskell> |
||

+ | | <haskell>undefined</haskell> |
||

+ | | <haskell>()</haskell> |
||

+ | | <haskell>undefined</haskell> |
||

+ | |- |
||

+ | | <haskell> (undefined :: Foo) `seq` ()</haskell> |
||

+ | | <haskell>undefined</haskell> |
||

+ | | <haskell>undefined</haskell> |
||

+ | | <haskell>undefined</haskell> |
||

+ | |} |
||

+ | |||

+ | == See also == |
||

+ | |||

⚫ | |||

+ | |||

+ | [[Category:FAQ]] |
||

[[Category:Language]] |
[[Category:Language]] |

## Latest revision as of 15:57, 22 May 2016

A `newtype`

declaration creates a new type in much the same way as `data`

. The syntax and usage of newtypes is virtually identical to that of data declarations - in fact, you can replace the `newtype`

keyword with `data`

and it'll still compile, indeed there's even a good chance your program will still work. The converse is not true, however - `data`

can only be replaced with `newtype`

if the type has *exactly one constructor* with *exactly one field* inside it.

Some examples:

```
newtype Fd = Fd CInt
-- data Fd = Fd CInt would also be valid
-- newtypes can have deriving clauses just like normal types
newtype Identity a = Identity a
deriving (Eq, Ord, Read, Show)
-- record syntax is still allowed, but only for one field
newtype State s a = State { runState :: s -> (s, a) }
-- this is *not* allowed:
-- newtype Pair a b = Pair { pairFst :: a, pairSnd :: b }
-- but this is:
data Pair a b = Pair { pairFst :: a, pairSnd :: b }
-- and so is this:
newtype NPair a b = NPair (a, b)
```

Sounds pretty limited! So why does anyone use `newtype`

?

## The short version

The restriction to one constructor with one field means that the new type and the type of the field are in direct correspondence:

```
State :: (s -> (s, a)) -> State s a
runState :: State s a -> (s -> (s, a))
```

or in mathematical terms they are *isomorphic*. This means that after the type is checked at compile time, at run time the two types can be treated essentially the same, without the overhead or indirection normally associated with a data constructor. So if you want to declare different type class instances for a particular type, or want to make a type abstract, you can wrap it in a `newtype`

and it'll be considered distinct to the type-checker, but identical at runtime. You can then use all sorts of deep trickery like phantom or recursive types without worrying about GHC shuffling buckets of bytes for no reason.

## The messy bits

Why doesn't everyone just use `newtype`

whenever they can, then? Well, quite often they do. But there is a subtle yet semantically significant difference. When we create a data type supposedly isomorphic to `Bool`

like so:

```
data Any = Any { getAny :: Bool }
```

we actually find that the isomorphism isn't exact:

```
Any . getAny $ Any True = Any True -- okay, fine
Any . getAny $ Any False = Any False -- also fine
Any . getAny $ Any ⊥ = Any ⊥ -- so far so good
Any . getAny $ ⊥ = Any ⊥ -- wait a second...
```

(what's that upside-down T thing?)

The problem is that types declared with the `data`

keyword are *lifted* - that is, they contain their own ⊥ value that is distinct from all the others. In this example, we have `⊥ :: Any`

distinct from `Any ⊥ :: Any`

. What this means is that the following pattern match:

```
case x of
Any _ -> ()
```

must evaluate its argument, even though it seems like the pattern match can't fail: we must check whether `x`

is `⊥`

or `Any y`

for some `y`

.

This is intrinsic to Haskell's lazy, non-total semantics. The problem is that this means tracking whether a value is wrapped in a constructor or not, which means keeping track of those extra constructors at runtime even when all they do is distinguish an extra bottom value we don't even want. So in order to be consistent, but also allow the exact isomorphism to be preserved, Haskell provides the `newtype`

keyword, for the construction of unlifted types. Pattern-matching on a newtype constructor doesn't do any work, because there is no separate ⊥ so every value in the type is wrapped in the constructor.

## What about strict types?

You may notice that a type like

```
data Identity' a = Identity' !a
```

has `Identity' ⊥ = ⊥`

and so you might think you have your coveted isomorphism. But all the strictness annotation means is that `Identity' ⊥`

really means `Identity' $! ⊥`

- the semantics of the type are fundamentally the same, and in particular the case expression still forces the value.

## Examples

```
module Foo where
data Foo1 = Foo1 Int -- Defines Foo1 constructor that lazily refers to an Int
data Foo2 = Foo2 !Int -- Defines Foo2 constructor that strictly refers to an Int
newtype Foo3 = Foo3 Int -- Defines Foo3 constructor that has the same representation as Int
-- Argument is lazy and ignored, so
-- undefined does not cause failure since
-- the contructor pattern match succeeds.
x1 = case Foo1 undefined of
Foo1 _ -> 1 -- 1
-- Argument is strict (because of !), so
-- undefined does cause failure.
x2 = case Foo2 undefined of
Foo2 _ -> 1 -- undefined
-- The newtype behaves like Int, see yInt below
x3 = case Foo3 undefined of
Foo3 _ -> 1 -- 1
-- Constructor pattern match fails
y1 = case undefined of
Foo1 _ -> 1 -- undefined
-- Constructor pattern match fails
y2 = case undefined of
Foo2 _ -> 1 -- undefined
-- The newtype behaves like Int, there is no
-- constructor at runtime.
y3 = case undefined of
Foo3 _ -> 1 -- 1
-- Demonstration of Int behavior
int :: Int
int = undefined
yInt = case int of
_ -> 1 -- 1
```

Here are some tabulated examples which may be easier to read:

```
newtype Foo = Foo ()
``` |
```
data Foo = Foo ()
``` |
```
data Foo = Foo !()
``` | |
---|---|---|---|

```
case Foo undefined of Foo _ -> ()
``` |
```
()
``` |
```
()
``` |
```
undefined
``` |

```
case undefined of Foo _ -> ()
``` |
```
()
``` |
```
undefined
``` |
```
undefined
``` |

```
Foo undefined `seq` ()
``` |
```
undefined
``` |
```
()
``` |
```
undefined
``` |

```
(undefined :: Foo) `seq` ()
``` |
```
undefined
``` |
```
undefined
``` |
```
undefined
``` |

## See also

The Haskell 98 Report defines newtypes in section 4.2.3.