# Correctness of short cut fusion

### From HaskellWiki

(added paper) |
m (just a small formatting edit) |
||

(9 intermediate revisions by 3 users not shown) | |||

Line 23: | Line 23: | ||

<haskell> | <haskell> | ||

− | foldr c n (build g) | + | foldr c n (build g) → g c n |

</haskell> | </haskell> | ||

Line 48: | Line 48: | ||

<haskell> | <haskell> | ||

− | destroy g (unfoldr p e) | + | destroy g (unfoldr p e) → g p e |

</haskell> | </haskell> | ||

Line 113: | Line 113: | ||

<haskell> | <haskell> | ||

− | destroy g (unfoldr p e) | + | destroy g (unfoldr p e) ⊑ g p e |

</haskell> | </haskell> | ||

What ''is'' known is that semantic equivalence can be recovered here by putting moderate restrictions on p. | What ''is'' known is that semantic equivalence can be recovered here by putting moderate restrictions on p. | ||

− | More precisely, if <hask>g</hask> does not use <hask>seq</hask> and <hask>p</hask> is a strict function that never returns <hask>Just | + | More precisely, if <hask>g</hask> does not use <hask>seq</hask> and <hask>p</hask> is a strict function that never returns <hask>Just ⊥</hask> (where ⊥ denotes any kind of failure or nontermination), then indeed: |

<haskell> | <haskell> | ||

Line 144: | Line 144: | ||

<haskell> | <haskell> | ||

− | foldr c n (build g) | + | foldr c n (build g) ⊒ g c n |

</haskell> | </haskell> | ||

Moreover, semantic equivalence can again be recovered by putting restrictions on the involved functions. | Moreover, semantic equivalence can again be recovered by putting restrictions on the involved functions. | ||

− | More precisely, if <hask>(c | + | More precisely, if <hask>(c ⊥ ⊥) ≠ ⊥</hask> and <hask>n ≠ ⊥</hask>, then even in the presence of <hask>seq</hask>: |

<haskell> | <haskell> | ||

Line 169: | Line 169: | ||

Conditions for semantic approximation in either direction can be given as follows. | Conditions for semantic approximation in either direction can be given as follows. | ||

− | If <hask>p | + | If <hask>p ≠ ⊥</hask> and <hask>(p ⊥)</hask> ∈ {<hask>⊥</hask>, <hask>Just ⊥</hask>}, then: |

<haskell> | <haskell> | ||

− | destroy g (unfoldr p e) | + | destroy g (unfoldr p e) ⊑ g p e |

</haskell> | </haskell> | ||

− | If <hask>p</hask> is strict and total and never returns <hask>Just | + | If <hask>p</hask> is strict and total and never returns <hask>Just ⊥</hask>, then: |

<haskell> | <haskell> | ||

− | destroy g (unfoldr p e) | + | destroy g (unfoldr p e) ⊒ g p e |

</haskell> | </haskell> | ||

Line 192: | Line 192: | ||

As pointed out above, everything is fine with <hask>foldr</hask>/<hask>build</hask> in the absence of <hask>seq</hask>. | As pointed out above, everything is fine with <hask>foldr</hask>/<hask>build</hask> in the absence of <hask>seq</hask>. | ||

− | If the producer (<hask>build g</hask>) of the intermediate list may be defined using <hask>seq</hask>, then the conditions <hask>(c | + | If the producer (<hask>build g</hask>) of the intermediate list may be defined using <hask>seq</hask>, then the conditions <hask>(c ⊥ ⊥) ≠ ⊥</hask> and <hask>n ≠ ⊥</hask> better be satisified, lest the compiler transform a perfectly fine program into a failing one. |

− | The mentioned conditions are equivalent to requiring that the consumer (<hask>foldr c n</hask>) is a total function, that is, maps non- | + | The mentioned conditions are equivalent to requiring that the consumer (<hask>foldr c n</hask>) is a total function, that is, maps non-⊥ lists to a non-⊥ value. |

It is thus relatively easy to identify whether a list consumer defined in terms of <hask>foldr</hask> is eligible for <hask>foldr</hask>/<hask>build</hask>-fusion in the presence of <hask>seq</hask> or not. | It is thus relatively easy to identify whether a list consumer defined in terms of <hask>foldr</hask> is eligible for <hask>foldr</hask>/<hask>build</hask>-fusion in the presence of <hask>seq</hask> or not. | ||

For example, the Prelude functions <hask>head</hask> and <hask>sum</hask> are generally not, while <hask>map</hask> is. | For example, the Prelude functions <hask>head</hask> and <hask>sum</hask> are generally not, while <hask>map</hask> is. | ||

Line 204: | Line 204: | ||

As above, the compiler cannot figure out automatically whether (and how) a given instance of <hask>destroy</hask>/<hask>unfoldr</hask>-fusion will change the semantics of a program. | As above, the compiler cannot figure out automatically whether (and how) a given instance of <hask>destroy</hask>/<hask>unfoldr</hask>-fusion will change the semantics of a program. | ||

− | An easy way to get rid of the condition regarding <hask>p</hask> never returning <hask>Just | + | An easy way to get rid of the condition regarding <hask>p</hask> never returning <hask>Just ⊥</hask> is to slightly change the definitions of the functions involved: |

<haskell> | <haskell> | ||

Line 221: | Line 221: | ||

</haskell> | </haskell> | ||

− | The new type <hask>Step a b</hask> is almost isomorphic to <hask>Maybe (a,b)</hask>, but avoids the "junk value" <hask>Just | + | The new type <hask>Step a b</hask> is almost isomorphic to <hask>Maybe (a,b)</hask>, but avoids the "junk value" <hask>Just ⊥</hask>. This change does not affect the expressiveness of <hask>unfoldr</hask> or <hask>unfoldr'</hask> with respect to list producers. |

But it allows some of the laws above to be simplified a bit. | But it allows some of the laws above to be simplified a bit. | ||

Line 227: | Line 227: | ||

<haskell> | <haskell> | ||

− | destroy g' (unfoldr' p e) | + | destroy g' (unfoldr' p e) ⊑ g p e |

</haskell> | </haskell> | ||

Line 236: | Line 236: | ||

</haskell> | </haskell> | ||

− | In the potential presence of <hask>seq</hask>, if <hask>p | + | In the potential presence of <hask>seq</hask>, if <hask>p ≠ ⊥</hask> and <hask>p</hask> is strict, then: |

<haskell> | <haskell> | ||

− | destroy' g (unfoldr' p e) | + | destroy' g (unfoldr' p e) ⊑ g p e |

</haskell> | </haskell> | ||

Line 245: | Line 245: | ||

<haskell> | <haskell> | ||

− | destroy' g (unfoldr' p e) | + | destroy' g (unfoldr' p e) ⊒ g p e |

</haskell> | </haskell> | ||

Line 267: | Line 267: | ||

<haskell> | <haskell> | ||

− | destroy g (unfoldr p e) | + | destroy g (unfoldr p e) ⊑ g p e |

</haskell> | </haskell> | ||

Line 291: | Line 291: | ||

* J. Voigtländer. [http://doi.acm.org/10.1145/1328408.1328412 Proving Correctness via Free Theorems: The Case of the destroy/build-Rule]. Partial Evaluation and Semantics-Based Program Manipulation, Proceedings, pages 13-20, ACM Press, 2008. | * J. Voigtländer. [http://doi.acm.org/10.1145/1328408.1328412 Proving Correctness via Free Theorems: The Case of the destroy/build-Rule]. Partial Evaluation and Semantics-Based Program Manipulation, Proceedings, pages 13-20, ACM Press, 2008. | ||

* J. Voigtländer. [http://dx.doi.org/10.1007/978-3-540-78969-7_13 Semantics and Pragmatics of New Shortcut Fusion Rules]. Functional and Logic Programming, Proceedings, LNCS 4989:163-179, Springer-Verlag, 2008. | * J. Voigtländer. [http://dx.doi.org/10.1007/978-3-540-78969-7_13 Semantics and Pragmatics of New Shortcut Fusion Rules]. Functional and Logic Programming, Proceedings, LNCS 4989:163-179, Springer-Verlag, 2008. | ||

− | * P. Johann and J. Voigtländer. [http://wwwtcs.inf.tu-dresden.de/~voigt/iandc.pdf A | + | * P. Johann and J. Voigtländer. [http://wwwtcs.inf.tu-dresden.de/~voigt/iandc.pdf A family of syntactic logical relations for the semantics of Haskell-like languages]. Information and Computation, 207:341-368, 2009. |

+ | |||

+ | |||

[[Category:Tutorials]] | [[Category:Tutorials]] | ||

[[Category:Program transformation]] | [[Category:Program transformation]] |

## Revision as of 04:11, 3 July 2011

## Contents |

## 1 Short cut fusion

Short cut fusion allows elimination of intermediate data structures using rewrite rules that can also be performed automatically during compilation.

The two most popular instances are the### 1.1 foldr/build

The foldr :: (a -> b -> b) -> b -> [a] -> b foldr c n [] = n foldr c n (x:xs) = c x (foldr c n xs) build :: (forall b. (a -> b -> b) -> b -> b) -> [a] build g = g (:) []

*rank-2 polymorphic*type of

foldr c n (build g) → g c n

### 1.2 destroy/unfoldr

The destroy :: (forall b. (b -> Maybe (a,b)) -> b -> c) -> [a] -> c destroy g = g step step :: [a] -> Maybe (a,[a]) step [] = Nothing step (x:xs) = Just (x,xs) unfoldr :: (b -> Maybe (a,b)) -> b -> [a] unfoldr p e = case p e of Nothing -> [] Just (x,e') -> x:unfoldr p e'

*rank-2 polymorphic*type of

destroy g (unfoldr p e) → g p e

## 2 Correctness

If the*RULES pragmas*, we clearly want them to be equivalences.

That is, the left- and right-hand sides should be semantically the same for each instance of either rule. Unfortunately, this is not so in Haskell.

We can distinguish two situations, depending on whether### 2.1 In the absence of seq

#### 2.1.1 foldr/build

If foldr c n (build g) = g c n

The two sides are interchangeable in any program without affecting semantics.

#### 2.1.2 destroy/unfoldr

The To see this, consider the following instance:

g = \x y -> case x y of Just z -> 0 p = \x -> if x==0 then Just undefined else Nothing e = 0

destroy g (unfoldr p e) = g step (unfoldr p e) = case step (unfoldr p e) of Just z -> 0 = case step (case p e of Nothing -> [] Just (x,e') -> x:unfoldr p e') of Just z -> 0 = case step (case Just undefined of Nothing -> [] Just (x,e') -> x:unfoldr p e') of Just z -> 0 = undefined

while its right-hand side "evaluates" as follows:

g p e = case p e of Just z -> 0 = case Just undefined of Just z -> 0 = 0

The obvious questions now are:

- Can the converse also happen, that is, can a safely terminating program be transformed into a failing one?
- Can a safely terminating program be transformed into another safely terminating one that gives a different value as result?

There is no formal proof yet, but strong evidence supporting the conjecture that the answer to both questions is "**No!**".

destroy g (unfoldr p e) ⊑ g p e

What *is* known is that semantic equivalence can be recovered here by putting moderate restrictions on p.

destroy g (unfoldr p e) = g p e

### 2.2 In the presence of seq

This is the more interesting setting, given that in Haskell there is no way to restrict the use of Unsurprisingly, it is also the setting in which more can go wrong than above.

#### 2.2.1 foldr/build

In the presence of The instance

g = seq c = undefined n = 0

The converse cannot happen, because the following always holds:

foldr c n (build g) ⊒ g c n

Moreover, semantic equivalence can again be recovered by putting restrictions on the involved functions.

More precisely, iffoldr c n (build g) = g c n

#### 2.2.2 destroy/unfoldr

Contrary to the situation without This is witnessed by the following instance:

g = \x y -> seq x 0 p = undefined e = 0

Conditions for semantic approximation in either direction can be given as follows.

Ifdestroy g (unfoldr p e) ⊑ g p e

destroy g (unfoldr p e) ⊒ g p e

Of course, conditions for semantic equivalence can be obtained by combining the two laws above.

## 3 Discussion

Correctness of short cut fusion is not just an academic issue.

All recent versions of GHC (at least 6.0 - 6.6) automatically perform transformations likeThere has been at least one occasion where, as a result, a safely terminating program was turned into a failing one "in the wild", with a less artificial example than the ones given above.

### 3.1 foldr/build

As pointed out above, everything is fine with ### 3.2 destroy/unfoldr

As above, the compiler cannot figure out automatically whether (and how) a given instance of data Step a b = Done | Yield a b destroy' :: (forall b. (b -> Step a b) -> b -> c) -> [a] -> c destroy' g = g step' step' :: [a] -> Step a [a] step' [] = Done step' (x:xs) = Yield x xs unfoldr' :: (b -> Step a b) -> b -> [a] unfoldr' p e = case p e of Done -> [] Yield x e' -> x:unfoldr' p e'

But it allows some of the laws above to be simplified a bit.

We would still have that ifdestroy g' (unfoldr' p e) ⊑ g p e

destroy' g (unfoldr' p e) = g p e

destroy' g (unfoldr' p e) ⊑ g p e

destroy' g (unfoldr' p e) ⊒ g p e

The worst change in program behavior from a complier user's point of view is when, through application of "optimization" rules, a safely terminating program is transformed into a failing one or one delivering a different result.

This can happen in the presence ofrepeat x = unfoldr (\y -> Just (x,y)) undefined

or

repeat x = unfoldr' (\y -> Yield x y) undefined

A left-to-right approximation as in

destroy g (unfoldr p e) ⊑ g p e

under suitable preconditions might be acceptable in practice. After all, it only means that the transformed program may be "more terminating" than the original one, but not less so.

If one insists on semantic equivalence rather than approximation, then the conditions imposed on the producer of the intermediate list become quite severe, in particular in the potential presence ofFor example, the following producer has to be outlawed then:

enumFromTo n m = unfoldr (\i -> if i>m then Nothing else Just (i,i+1)) n

## 4 Literature

Various parts of the above story, and elaborations thereof, are also told in the following papers:

- A. Gill, J. Launchbury, and S.L. Peyton Jones. A short cut to deforestation. Functional Programming Languages and Computer Architecture, Proceedings, pages 223-232, ACM Press, 1993.
- J. Svenningsson. Shortcut fusion for accumulating parameters & zip-like functions. International Conference on Functional Programming, Proceedings, pages 124-132, ACM Press, 2002.
- P. Johann. On proving the correctness of program transformations based on free theorems for higher-order polymorphic calculi. Mathematical Structures in Computer Science, 15:201-229, 2005.
- P. Johann and J. Voigtländer. The impact of seq on free theorems-based program transformations. Fundamenta Informaticae, 69:63-102, 2006.
- J. Voigtländer and P. Johann. Selective strictness and parametricity in structural operational semantics, inequationally. Theoretical Computer Science, 388:290-318, 2007.
- J. Voigtländer. Proving Correctness via Free Theorems: The Case of the destroy/build-Rule. Partial Evaluation and Semantics-Based Program Manipulation, Proceedings, pages 13-20, ACM Press, 2008.
- J. Voigtländer. Semantics and Pragmatics of New Shortcut Fusion Rules. Functional and Logic Programming, Proceedings, LNCS 4989:163-179, Springer-Verlag, 2008.
- P. Johann and J. Voigtländer. A family of syntactic logical relations for the semantics of Haskell-like languages. Information and Computation, 207:341-368, 2009.