Difference between revisions of "Hask"

From HaskellWiki
Jump to navigation Jump to search
(Corrected title, since this was not about cartesian closeness)
 
(27 intermediate revisions by 5 users not shown)
Line 1: Line 1:
'''Hask''' is the name usually given to the [[Category theory|category]] having Haskell types as objects and Haskell functions between them as morphisms.
+
'''Hask''' is the [[Category theory|category]] of Haskell types and functions.
   
  +
Informally, the objects of '''Hask''' are Haskell types, and the morphisms from objects <hask>A</hask> to <hask>B</hask> are Haskell functions of type <hask>A -> B</hask>. The identity morphism for object <hask>A</hask> is <hask>id :: A -> A</hask>, and the composition of morphisms <hask>f</hask> and <hask>g</hask> is <hask>f . g = \x -> f (g x)</hask>.
A type-constructor that is an instance of the Functor class is an endofunctor on Hask.
 
   
  +
However, subtleties arise from questions such as the following.
* [http://www.cs.gunma-u.ac.jp/~hamana/Papers/cpo.pdf Makoto Hamana: ''What is the category for Haskell?'']
 
   
  +
* <b>When are two morphisms equal?</b> People often like to reason using some axioms or rewrite theory, such as [[beta reduction]] and [[eta conversion]], but these are subtle with non-termination. A more basic notion of equality comes from contextual equivalence, or observational equivalence, which requires us to pick a basic notion of observation (e.g. termination versus non-termination).
A solution approach to the issue of partiality making many of the identities required by categorical constructions not literally true in Haskell:
 
   
  +
* <b>Do we consider the entire Haskell language, or just a fragment?</b> Possible language fragment choices are: do we focus on terminating programs? do we consider [[seq]]? do we consider [[errors]] and [[exceptions]]? what about [[unsafe functions]]? and do we only consider the [[kind]] <hask>*</hask>, or also other kinds? The choice of fragment that is used affects the contextual equivalence, the notions of observation, and the universal properties within the category.
* [http://www.cs.nott.ac.uk/~nad/publications/danielsson-popl2006-tr.pdf Nils A. Danielsson, John Hughes, Patrik Jansson, and Jeremy Gibbons. ''Fast and loose reasoning is morally correct.'']
 
   
  +
One <I>can</I> easily make a syntactic category for a typed [[lambda calculus]], and there are many books about this. Perhaps the first place this becomes delicate in Haskell is the fact that Haskell types are not [[unboxed]] by default. This means we have an <hask>undefined</hask> expression at every type. Even the function type <hask>A -> B</hask>, which we would like to use as the type of morphisms, is boxed.
   
  +
== Is '''Hask''' even a category? ==
   
  +
Consider:
== The seq problem ==
 
   
  +
<haskell>
The right identity law fails in '''Hask''' if we distinguish values which can be distinguished by <hask>seq</hask>, since:
 
  +
undef1 = undefined :: a -> b
  +
undef2 = \_ -> undefined
  +
</haskell>
   
  +
Note that these are not the same value:
<hask>id . undefined = \x -> id (undefined x) = \x -> undefined x</hask>
 
   
  +
<haskell>
should be equal to <hask>undefined</hask>, but can be distinguished from it using <hask>seq</hask>:
 
  +
seq undef1 () = undefined
  +
seq undef2 () = ()
  +
</haskell>
   
  +
This might be a problem, because <hask>undef1 . id = undef2</hask>. In order to make '''Hask''' a category, we define two functions <hask>f</hask> and <hask>g</hask> as the same morphism if <hask>f x = g x</hask> for all <hask>x</hask>. Thus <hask>undef1</hask> and <hask>undef2</hask> are different ''values'', but the same ''morphism'' in '''Hask'''.
ghci> <hask>(undefined :: Int -> Int) `seq` ()</hask>
 
  +
* Exception: Prelude.undefined
 
  +
A way to resolve this would be to say that a morphism is <I>not</i> an arbitrary expression of type <hask>a -> b</hask>, but rather an expression of the form <hask>\x -> e</hask>, or an expression <hask>e</hask> of type <hask>b</hask> with a free variable <hask>x:a</hask>. Indeed the latter is the usual way to build a syntactic category.
ghci> <hask>(id . undefined :: Int -> Int) `seq` ()</hask>
 
  +
()
 
  +
== Does '''Hask''' have categorical coproducts and products? ==
  +
  +
Actual '''Hask''' does not have sums, products, or an initial object, and <hask>()</hask> is not a terminal object. The Monad identities fail for almost all instances of the Monad class.
  +
  +
{| class="wikitable"
  +
|+ Why '''Hask''' isn't as nice as you'd thought.
  +
! scope="col" |
  +
! scope="col" | Initial Object
  +
! scope="col" | Terminal Object
  +
! scope="col" | Sum
  +
! scope="col" | Product
  +
! scope="col" | Product
  +
|-
  +
! scope="row" | Type
  +
| <hask>data Empty</hask>
  +
| <hask>data () = ()</hask>
  +
| <hask>data Either a b
  +
= Left a | Right b</hask>
  +
| <hask>data (a,b) =
  +
(,) { fst :: a, snd :: b}</hask>
  +
| <hask>data P a b =
  +
P {fstP :: !a, sndP :: !b}</hask>
  +
|-
  +
! scope="row" | Requirement
  +
| There is a unique function
  +
<br /><hask>u :: Empty -> r</hask>
  +
| There is a unique function
  +
<br /><hask>u :: r -> ()</hask>
  +
| For any functions
  +
<br /><hask>f :: a -> r</hask>
  +
<br /><hask>g :: b -> r</hask>
  +
  +
there is a unique function
  +
<hask>u :: Either a b -> r</hask>
  +
  +
such that:
  +
<hask>u . Left = f</hask>
  +
<br /><hask>u . Right = g</hask>
  +
| For any functions
  +
<br /><hask>f :: r -> a</hask>
  +
<br /><hask>g :: r -> b</hask>
  +
  +
there is a unique function
  +
<hask>u :: r -> (a,b)</hask>
  +
  +
such that:
  +
<hask>fst . u = f</hask>
  +
<br /><hask>snd . u = g</hask>
  +
| For any functions
  +
<br /><hask>f :: r -> a</hask>
  +
<br /><hask>g :: r -> b</hask>
  +
  +
there is a unique function
  +
<hask>u :: r -> P a b</hask>
  +
  +
such that:
  +
<hask>fstP . u = f</hask>
  +
<br /><hask>sndP . u = g</hask>
  +
|-
  +
! scope="row" | Candidate
  +
| <hask>u1 r = case r of {}</hask>
  +
| <hask>u1 _ = ()</hask>
  +
| <hask>u1 (Left a) = f a</hask>
  +
<br /><hask>u1 (Right b) = g b</hask>
  +
| <hask>u1 r = (f r,g r)</hask>
  +
| <hask>u1 r = P (f r) (g r)</hask>
  +
|-
  +
! scope="row" | Example failure condition
  +
| <hask>r ~ ()</hask>
  +
| <hask>r ~ ()</hask>
  +
| <hask>r ~ ()</hask>
  +
<br /><hask>f _ = ()</hask>
  +
<br /><hask>g _ = ()</hask>
  +
| <hask>r ~ ()</hask>
  +
<br /><hask>f _ = undefined</hask>
  +
<br /><hask>g _ = undefined</hask>
  +
| <hask>r ~ ()</hask>
  +
<br /><hask>f _ = ()</hask>
  +
<br /><hask>g _ = undefined</hask>
  +
|-
  +
! scope="row" | Alternative u
  +
| <hask>u2 _ = ()</hask>
  +
| <hask>u2 _ = undefined</hask>
  +
| <hask>u2 _ = ()</hask>
  +
| <hask>u2 _ = undefined</hask>
  +
|
  +
|-
  +
! scope="row" | Difference
  +
| <hask>u1 undefined = undefined</hask>
  +
<br /><hask>u2 undefined = ()</hask>
  +
| <hask>u1 _ = ()</hask>
  +
<br /><hask>u2 _ = undefined</hask>
  +
| <hask>u1 undefined = undefined</hask>
  +
<br /><hask>u2 undefined = ()</hask>
  +
| <hask>u1 _ = (undefined,undefined)</hask>
  +
<br /><hask>u2 _ = undefined</hask>
  +
| <hask>f _ = ()</hask>
  +
<br /><hask>(fstP . u1) _ = undefined</hask>
  +
|- style="background: red;"
  +
! scope="row" | Result
  +
! scope="col" | FAIL
  +
! scope="col" | FAIL
  +
! scope="col" | FAIL
  +
! scope="col" | FAIL
  +
! scope="col" | FAIL
  +
|}
  +
  +
Using [[unboxed]] types might alleviate some of these problems. For example, revisiting the first column, but using the UnliftedDatatypes extension, we can define
  +
<hask>data UEmpty :: UnliftedType where</hask>,
  +
but now if we instist that <hask>u1</hask> and <hask>u2</hask> have type <hask>UEmpty -> ()</hask>, then <hask>u2 undefined = undefined = u1 undefined</hask>.
  +
  +
== "Platonic" '''Hask''' ==
  +
  +
Because of these difficulties, Haskell developers tend to think in some subset of Haskell where types do not have bottom values. This means that it only includes functions that terminate, and typically only finite values. The corresponding category has the expected initial and terminal objects, sums and products, and instances of Functor and Monad really are endofunctors and monads.
  +
  +
== Links ==
  +
  +
* [http://www.cs.gunma-u.ac.jp/~hamana/Papers/cpo.pdf Makoto Hamana: ''What is the category for Haskell?'']
  +
* [http://www.cs.nott.ac.uk/~nad/publications/danielsson-popl2006-tr.pdf Nils A. Danielsson, John Hughes, Patrik Jansson, and Jeremy Gibbons. ''Fast and loose reasoning is morally correct.'']
  +
* [https://ncatlab.org/nlab/show/syntactic+category NLab: Syntactic category]
   
{{stub}}
 
 
[[Category:Mathematics]]
 
[[Category:Mathematics]]
  +
[[Category:Theoretical foundations]]

Latest revision as of 10:33, 9 March 2023

Hask is the category of Haskell types and functions.

Informally, the objects of Hask are Haskell types, and the morphisms from objects A to B are Haskell functions of type A -> B. The identity morphism for object A is id :: A -> A, and the composition of morphisms f and g is f . g = \x -> f (g x).

However, subtleties arise from questions such as the following.

  • When are two morphisms equal? People often like to reason using some axioms or rewrite theory, such as beta reduction and eta conversion, but these are subtle with non-termination. A more basic notion of equality comes from contextual equivalence, or observational equivalence, which requires us to pick a basic notion of observation (e.g. termination versus non-termination).
  • Do we consider the entire Haskell language, or just a fragment? Possible language fragment choices are: do we focus on terminating programs? do we consider seq? do we consider errors and exceptions? what about unsafe functions? and do we only consider the kind *, or also other kinds? The choice of fragment that is used affects the contextual equivalence, the notions of observation, and the universal properties within the category.

One can easily make a syntactic category for a typed lambda calculus, and there are many books about this. Perhaps the first place this becomes delicate in Haskell is the fact that Haskell types are not unboxed by default. This means we have an undefined expression at every type. Even the function type A -> B, which we would like to use as the type of morphisms, is boxed.

Is Hask even a category?

Consider:

undef1 = undefined :: a -> b
undef2 = \_ -> undefined

Note that these are not the same value:

seq undef1 () = undefined
seq undef2 () = ()

This might be a problem, because undef1 . id = undef2. In order to make Hask a category, we define two functions f and g as the same morphism if f x = g x for all x. Thus undef1 and undef2 are different values, but the same morphism in Hask.

A way to resolve this would be to say that a morphism is not an arbitrary expression of type a -> b, but rather an expression of the form \x -> e, or an expression e of type b with a free variable x:a. Indeed the latter is the usual way to build a syntactic category.

Does Hask have categorical coproducts and products?

Actual Hask does not have sums, products, or an initial object, and () is not a terminal object. The Monad identities fail for almost all instances of the Monad class.

Why Hask isn't as nice as you'd thought.
Initial Object Terminal Object Sum Product Product
Type data Empty data () = () data Either a b = Left a | Right b data (a,b) = (,) { fst :: a, snd :: b} data P a b = P {fstP :: !a, sndP :: !b}
Requirement There is a unique function


u :: Empty -> r

There is a unique function


u :: r -> ()

For any functions


f :: a -> r
g :: b -> r

there is a unique function u :: Either a b -> r

such that: u . Left = f
u . Right = g

For any functions


f :: r -> a
g :: r -> b

there is a unique function u :: r -> (a,b)

such that: fst . u = f
snd . u = g

For any functions


f :: r -> a
g :: r -> b

there is a unique function u :: r -> P a b

such that: fstP . u = f
sndP . u = g

Candidate u1 r = case r of {} u1 _ = () u1 (Left a) = f a


u1 (Right b) = g b

u1 r = (f r,g r) u1 r = P (f r) (g r)
Example failure condition r ~ () r ~ () r ~ ()


f _ = ()
g _ = ()

r ~ ()


f _ = undefined
g _ = undefined

r ~ ()


f _ = ()
g _ = undefined

Alternative u u2 _ = () u2 _ = undefined u2 _ = () u2 _ = undefined
Difference u1 undefined = undefined


u2 undefined = ()

u1 _ = ()


u2 _ = undefined

u1 undefined = undefined


u2 undefined = ()

u1 _ = (undefined,undefined)


u2 _ = undefined

f _ = ()


(fstP . u1) _ = undefined

Result FAIL FAIL FAIL FAIL FAIL

Using unboxed types might alleviate some of these problems. For example, revisiting the first column, but using the UnliftedDatatypes extension, we can define data UEmpty :: UnliftedType where, but now if we instist that u1 and u2 have type UEmpty -> (), then u2 undefined = undefined = u1 undefined.

"Platonic" Hask

Because of these difficulties, Haskell developers tend to think in some subset of Haskell where types do not have bottom values. This means that it only includes functions that terminate, and typically only finite values. The corresponding category has the expected initial and terminal objects, sums and products, and instances of Functor and Monad really are endofunctors and monads.

Links