# Difference between revisions of "Correctness of short cut fusion"

(wrote new page) |
(commit additions) |
||

Line 23: | Line 23: | ||

<haskell> |
<haskell> |
||

− | foldr c n (build g) |
+ | foldr c n (build g) <nowiki>→</nowiki> g c n |

</haskell> |
</haskell> |
||

− | |||

===<hask>destroy</hask>/<hask>unfoldr</hask>=== |
===<hask>destroy</hask>/<hask>unfoldr</hask>=== |
||

Line 49: | Line 48: | ||

<haskell> |
<haskell> |
||

− | destroy g (unfoldr p e) |
+ | destroy g (unfoldr p e) <nowiki>→</nowiki> g p e |

</haskell> |
</haskell> |
||

Line 112: | Line 111: | ||

destroy g (unfoldr p e) <math>\sqsubseteq</math> g p e |
destroy g (unfoldr p e) <math>\sqsubseteq</math> g p e |
||

</haskell> |
</haskell> |
||

+ | |||

+ | What ''is'' known is that semantic equivalence can be recovered here by putting moderate restrictions on p. |
||

+ | More precisely, if <hask>g</hask> does not use <hask>seq</hask> and <hask>p</hask> is a strict function that never returns <hask>Just <math>\bot</math></hask> (where <math>\bot</math> denotes any kind of failure or nontermination), then indeed: |
||

+ | |||

+ | <haskell> |
||

+ | destroy g (unfoldr p e) = g p e |
||

+ | </haskell> |
||

+ | |||

+ | ===In presence of <hask>seq</hask>=== |
||

+ | |||

+ | This is the more interesting setting, given that in Haskell there is no way to retrict the use of <hask>seq</hask>, so in any given program we must be prepared for the possibility that the <hask>g</hask> appearing in the <hask>foldr</hask>/<hask>build</hask>- or the <hask>destroy</hask>/<hask>unfoldr</hask>-rule is defined using <hask>seq</hask>. |

## Revision as of 14:41, 6 July 2006

## Contents

## Short cut fusion

*Short cut fusion* allows elimination of intermediate data structures using rewrite rules that can also be performed automatically during compilation.

The two most popular instances are the `foldr`

/`build`

- and the `destroy`

/`unfoldr`

-rule for Haskell lists.

`foldr`

/`build`

The `foldr`

/`build`

-rule eliminates intermediate lists produced by `build`

and consumed by `foldr`

, where these functions are defined as follows:

```
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr c n [] = n
foldr c n (x:xs) = c x (foldr c n xs)
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
```

Note the *rank-2 polymorphic* type of `build`

.

The `foldr`

/`build`

-rule now means the following replacement for appropriately typed `g`

, `c`

, and `n`

:

```
foldr c n (build g) <nowiki>→</nowiki> g c n
```

`destroy`

/`unfoldr`

The `destroy`

/`unfoldr`

-rule eliminates intermediate lists produced by `unfoldr`

and consumed by `destroy`

, where these functions are defined as follows:

```
destroy :: (forall b. (b -> Maybe (a,b)) -> b -> c) -> [a] -> c
destroy g = g listpsi
listpsi :: [a] -> Maybe (a,[a])
listpsi [] = Nothing
listpsi (x:xs) = Just (x,xs)
unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
unfoldr p e = case p e of Nothing -> []
Just (x,e') -> x:unfoldr p e'
```

Note the *rank-2 polymorphic* type of `destroy`

.

The `destroy`

/`unfoldr`

-rule now means the following replacement for appropriately typed `g`

, `p`

, and `e`

:

```
destroy g (unfoldr p e) <nowiki>→</nowiki> g p e
```

## Correctness

If the `foldr`

/`build`

- and the `destroy`

/`unfoldr`

-rule are to be automatically performed during compilation, as is possible using GHC's *RULES pragmas*, we clearly want them to be equivalences.
That is, the left- and right-hand sides should be semantically the same for each instance of either rule.
Unfortunately, this is not so in Haskell.

We can distinguish two situations, depending on whether `g`

is defined using `seq`

or not.

### In absence of `seq`

If `g`

does not use `seq`

, then the `foldr`

/`build`

-rule really is a semantic equivalence, that is, it holds that

```
foldr c n (build g) = g c n
```

The two sides are semantically interchangeable.

The `destroy`

/`unfoldr`

-rule, however, is not a semantic equivalence.
To see this, consider the following instance:

```
g = \x y -> case x y of Just z -> 0
p = \x -> if x==0 then Just undefined else Nothing
e = 0
```

These values have appropriate types for being used in the `destroy`

/`unfoldr`

-rule. But with them, that rule's left-hand side "evaluates" as follows:

```
destroy g (unfoldr p e) = g listpsi (unfoldr p e)
= case listpsi (unfoldr p e) of Just z -> 0
= case listpsi (case p e of Nothing -> []
Just (x,e') -> x:unfoldr p e') of Just z -> 0
= case listpsi (case Just undefined of Nothing -> []
Just (x,e') -> x:unfoldr p e') of Just z -> 0
= undefined
```

while its right-hand side "evaluates" as follows:

```
g p e = case p e of Just z -> 0
= case Just undefined of Just z -> 0
= 0
```

Thus, by applying the `destroy`

/`unfoldr`

-rule, a nonterminating (or otherwise failing) program can be transformed into a safely terminating one.
The obvious questions now are:

- Can the converse also happen, that is, can a safely terminating program be transformed into a failing one?
- Can a safely terminating program be transformed into another safely terminating one that gives a different value as result?

There is no formal proof yet, but strong evidence supporting the conjecture that the answer to both questions is "**No!**".

The conjecture goes that if `g`

does not use `seq`

, then the `destroy`

/`unfoldr`

-rule is a semantic approximation from left to right, that is, it holds that

```
destroy g (unfoldr p e) <math>\sqsubseteq</math> g p e
```

What *is* known is that semantic equivalence can be recovered here by putting moderate restrictions on p.
More precisely, if `g`

does not use `seq`

and `p`

is a strict function that never returns `Just <math>\bot</math>`

(where denotes any kind of failure or nontermination), then indeed:

```
destroy g (unfoldr p e) = g p e
```

### In presence of `seq`

This is the more interesting setting, given that in Haskell there is no way to retrict the use of `seq`

, so in any given program we must be prepared for the possibility that the `g`

appearing in the `foldr`

/`build`

- or the `destroy`

/`unfoldr`

-rule is defined using `seq`

.