# Correctness of short cut fusion

## Contents

## Short cut fusion

*Short cut fusion* allows elimination of intermediate data structures using rewrite rules that can also be performed automatically during compilation.

The two most popular instances are the `foldr`

/`build`

- and the `destroy`

/`unfoldr`

-rule for Haskell lists.

`foldr`

/`build`

The `foldr`

/`build`

-rule eliminates intermediate lists produced by `build`

and consumed by `foldr`

, where these functions are defined as follows:

```
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr c n [] = n
foldr c n (x:xs) = c x (foldr c n xs)
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
```

Note the *rank-2 polymorphic* type of `build`

.

The `foldr`

/`build`

-rule now means the following replacement for appropriately typed `g`

, `c`

, and `n`

:

```
foldr c n (build g) <nowiki>→</nowiki> g c n
```

`destroy`

/`unfoldr`

The `destroy`

/`unfoldr`

-rule eliminates intermediate lists produced by `unfoldr`

and consumed by `destroy`

, where these functions are defined as follows:

```
destroy :: (forall b. (b -> Maybe (a,b)) -> b -> c) -> [a] -> c
destroy g = g listpsi
listpsi :: [a] -> Maybe (a,[a])
listpsi [] = Nothing
listpsi (x:xs) = Just (x,xs)
unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
unfoldr p e = case p e of Nothing -> []
Just (x,e') -> x:unfoldr p e'
```

Note the *rank-2 polymorphic* type of `destroy`

.

The `destroy`

/`unfoldr`

-rule now means the following replacement for appropriately typed `g`

, `p`

, and `e`

:

```
destroy g (unfoldr p e) <nowiki>→</nowiki> g p e
```

## Correctness

If the `foldr`

/`build`

- and the `destroy`

/`unfoldr`

-rule are to be automatically performed during compilation, as is possible using GHC's *RULES pragmas*, we clearly want them to be equivalences.
That is, the left- and right-hand sides should be semantically the same for each instance of either rule.
Unfortunately, this is not so in Haskell.

We can distinguish two situations, depending on whether `g`

is defined using `seq`

or not.

### In absence of `seq`

If `g`

does not use `seq`

, then the `foldr`

/`build`

-rule really is a semantic equivalence, that is, it holds that

```
foldr c n (build g) = g c n
```

The two sides are semantically interchangeable.

The `destroy`

/`unfoldr`

-rule, however, is not a semantic equivalence.
To see this, consider the following instance:

```
g = \x y -> case x y of Just z -> 0
p = \x -> if x==0 then Just undefined else Nothing
e = 0
```

These values have appropriate types for being used in the `destroy`

/`unfoldr`

-rule. But with them, that rule's left-hand side "evaluates" as follows:

```
destroy g (unfoldr p e) = g listpsi (unfoldr p e)
= case listpsi (unfoldr p e) of Just z -> 0
= case listpsi (case p e of Nothing -> []
Just (x,e') -> x:unfoldr p e') of Just z -> 0
= case listpsi (case Just undefined of Nothing -> []
Just (x,e') -> x:unfoldr p e') of Just z -> 0
= undefined
```

while its right-hand side "evaluates" as follows:

```
g p e = case p e of Just z -> 0
= case Just undefined of Just z -> 0
= 0
```

Thus, by applying the `destroy`

/`unfoldr`

-rule, a nonterminating (or otherwise failing) program can be transformed into a safely terminating one.
The obvious questions now are:

- Can the converse also happen, that is, can a safely terminating program be transformed into a failing one?
- Can a safely terminating program be transformed into another safely terminating one that gives a different value as result?

There is no formal proof yet, but strong evidence supporting the conjecture that the answer to both questions is "**No!**".

The conjecture goes that if `g`

does not use `seq`

, then the `destroy`

/`unfoldr`

-rule is a semantic approximation from left to right, that is, it holds that

```
destroy g (unfoldr p e) <math>\sqsubseteq</math> g p e
```

What *is* known is that semantic equivalence can be recovered here by putting moderate restrictions on p.
More precisely, if `g`

does not use `seq`

and `p`

is a strict function that never returns `Just <math>\bot</math>`

(where denotes any kind of failure or nontermination), then indeed:

```
destroy g (unfoldr p e) = g p e
```

### In presence of `seq`

This is the more interesting setting, given that in Haskell there is no way to retrict the use of `seq`

, so in any given program we must be prepared for the possibility that the `g`

appearing in the `foldr`

/`build`

- or the `destroy`

/`unfoldr`

-rule is defined using `seq`

.