Correctness of short cut fusion
Short cut fusion
Short cut fusion allows elimination of intermediate data structures using rewrite rules that can also be performed automatically during compilation.
The two most popular instances are the foldr/build- and the destroy/unfoldr-rule for Haskell lists.
foldr/build
The foldr/build-rule eliminates intermediate lists produced by build and consumed by foldr, where these functions are defined as follows:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr c n [] = n
foldr c n (x:xs) = c x (foldr c n xs)
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
Note the rank-2 polymorphic type of build.
The foldr/build-rule now means the following replacement for appropriately typed g, c, and n:
foldr c n (build g) → g c n
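To illustrate the rule (a small sketch, not part of the original exposition; the producer upTo, the consumer total, and the fused totalUpTo are names made up for this example), here is a producer written in build form, a foldr consumer, and the result of applying the replacement above:

{-# LANGUAGE RankNTypes #-}

-- build repeated from above so the example is self-contained.
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

-- A producer in build form: the list [m..n], with (:) and []
-- abstracted out as the parameters c and nil.
upTo :: Int -> Int -> [Int]
upTo m n = build (\c nil ->
  let go i | i > n     = nil
           | otherwise = c i (go (i + 1))
  in  go m)

-- A consumer written with foldr.
total :: [Int] -> Int
total = foldr (+) 0

-- Before fusion:  total (upTo 1 5) = foldr (+) 0 (build g)
-- After the rule: g (+) 0, so the intermediate list is never built.
totalUpTo :: Int -> Int -> Int
totalUpTo m n =
  (\c nil ->
    let go i | i > n     = nil
             | otherwise = c i (go (i + 1))
    in  go m)
    (+) 0

main :: IO ()
main = print (total (upTo 1 5), totalUpTo 1 5)  -- both components are 15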
destroy/unfoldr
The destroy/unfoldr-rule eliminates intermediate lists produced by unfoldr and consumed by destroy, where these functions are defined as follows:
destroy :: (forall b. (b -> Maybe (a,b)) -> b -> c) -> [a] -> c
destroy g = g step
step :: [a] -> Maybe (a,[a])
step [] = Nothing
step (x:xs) = Just (x,xs)
unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
unfoldr p e = case p e of Nothing     -> []
                          Just (x,e') -> x:unfoldr p e'
Note the rank-2 polymorphic type of destroy.
The destroy/unfoldr-rule now means the following replacement for appropriately typed g, p, and e:
destroy g (unfoldr p e) → g p e
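Analogously (again only a sketch; the consumer sumD, the producer countdown, and the fused sumCountdown are invented names), a consumer in destroy form and a producer in unfoldr form fuse as follows:

{-# LANGUAGE RankNTypes #-}

-- destroy and unfoldr repeated from above so the example is self-contained.
destroy :: (forall b. (b -> Maybe (a, b)) -> b -> c) -> [a] -> c
destroy g = g step
  where
    step []     = Nothing
    step (x:xs) = Just (x, xs)

unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr p e = case p e of
  Nothing      -> []
  Just (x, e') -> x : unfoldr p e'

-- A consumer in destroy form: sums a list by repeatedly asking the
-- step function for the next element.
sumD :: [Int] -> Int
sumD = destroy (\next ->
  let go b = case next b of
               Nothing      -> 0
               Just (x, b') -> x + go b'
  in  go)

-- A producer in unfoldr form: counts down from n to 1.
countdown :: Int -> [Int]
countdown = unfoldr (\i -> if i <= 0 then Nothing else Just (i, i - 1))

-- Before fusion:  sumD (countdown 5) = destroy g (unfoldr p 5)
-- After the rule: g p 5, with no intermediate list.
sumCountdown :: Int -> Int
sumCountdown n0 =
  (\next ->
    let go b = case next b of
                 Nothing      -> 0
                 Just (x, b') -> x + go b'
    in  go)
    (\i -> if i <= 0 then Nothing else Just (i, i - 1))
    n0

main :: IO ()
main = print (sumD (countdown 5), sumCountdown 5)  -- both components are 15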
Correctness
If the foldr/build- and the destroy/unfoldr-rule are to be automatically performed during compilation, as is possible using GHC's RULES pragmas, we clearly want them to be equivalences.
That is, the left- and right-hand sides should be semantically the same for each instance of either rule.
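For concreteness, the foldr/build rule can be stated as a RULES pragma roughly as follows (a sketch; the rule GHC ships for its own foldr and build is phrased essentially like this, but details may vary between versions):

{-# LANGUAGE RankNTypes #-}
module ShortCutFusion where  -- the module name is arbitrary

-- build repeated from above; GHC's own version lives in GHC.Exts.
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

-- The replacement, expressed as a rewrite rule for the optimizer.
{-# RULES
"foldr/build" forall c n (g :: forall b. (a -> b -> b) -> b -> b).
              foldr c n (build g) = g c n
  #-}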
Unfortunately, this is not so in Haskell.
We can distinguish two situations, depending on whether g is defined using seq or not.
In the absence of seq
foldr/build
If g does not use seq, then the foldr/build-rule really is a semantic equivalence, that is, it holds that
foldr c n (build g) = g c n
The two sides are interchangeable in any program without affecting semantics.
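For instance (an illustrative check, not from the original text), with a seq-free g that produces the list [1,2], both sides compute the same value; GHC.Exts exports the build function defined above:

import GHC.Exts (build)  -- GHC's own build, defined exactly as above

-- A producer argument that does not use seq.
g :: (Int -> b -> b) -> b -> b
g c n = c 1 (c 2 n)

lhs, rhs :: Int
lhs = foldr (+) 0 (build g)  -- left-hand side:  foldr (+) 0 [1,2] = 3
rhs = g (+) 0                -- right-hand side: 1 + (2 + 0)       = 3

main :: IO ()
main = print (lhs == rhs)    -- True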
destroy/unfoldr
The destroy/unfoldr-rule, however, is not a semantic equivalence.
To see this, consider the following instance:
g = \x y -> case x y of Just z -> 0
p = \x -> if x==0 then Just undefined else Nothing
e = 0
These values have appropriate types for being used in the destroy/unfoldr-rule.
But with them, that rule's left-hand side "evaluates" as follows:
destroy g (unfoldr p e) = g step (unfoldr p e)
                        = case step (unfoldr p e) of Just z -> 0
                        = case step (case p e of Nothing     -> []
                                                 Just (x,e') -> x:unfoldr p e') of Just z -> 0
                        = case step (case Just undefined of Nothing     -> []
                                                            Just (x,e') -> x:unfoldr p e') of Just z -> 0
                        = undefined
while its right-hand side "evaluates" as follows:
g p e = case p e of Just z -> 0
      = case Just undefined of Just z -> 0
      = 0
Thus, by applying the destroy/unfoldr-rule, a nonterminating (or otherwise failing) program can be transformed into a safely terminating one.
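The counterexample can also be tried out directly. The following sketch repeats the definitions from above and prints the right-hand side before attempting the failing left-hand side:

{-# LANGUAGE RankNTypes #-}

-- destroy and unfoldr repeated from above so the example is self-contained.
destroy :: (forall b. (b -> Maybe (a, b)) -> b -> c) -> [a] -> c
destroy g = g step
  where
    step []     = Nothing
    step (x:xs) = Just (x, xs)

unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr p e = case p e of
  Nothing      -> []
  Just (x, e') -> x : unfoldr p e'

-- The instance from the text.
g :: (b -> Maybe (Int, b)) -> b -> Int
g = \x y -> case x y of Just z -> 0

p :: Int -> Maybe (Int, Int)
p = \x -> if x == 0 then Just undefined else Nothing

e :: Int
e = 0

main :: IO ()
main = do
  print (g p e)                    -- right-hand side: prints 0
  print (destroy g (unfoldr p e))  -- left-hand side: raises undefined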
The obvious questions now are:
- Can the converse also happen, that is, can a safely terminating program be transformed into a failing one?
- Can a safely terminating program be transformed into another safely terminating one that gives a different value as its result?
There is no formal proof yet, but there is strong evidence supporting the conjecture that the answer to both questions is "No!".
The conjecture goes that if g does not use seq, then the destroy/unfoldr-rule is a semantic approximation from left to right, that is, it holds that
destroy g (unfoldr p e) ⊑ g p e
What is known is that semantic equivalence can be recovered here by putting moderate restrictions on p.
More precisely, if g does not use seq and p is a strict function that never returns Just ⊥ (where ⊥ denotes any kind of failure or nontermination), then indeed:
destroy g (unfoldr p e) = g p e
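As an illustration (not from the original text), the following step function is strict and never returns Just ⊥, so fusing unfoldr p with any seq-free destroy-consumer is safe:

-- p is strict, because it inspects its argument, and it never returns
-- Just undefined, because the pair it returns is always an explicit
-- constructor application.
p :: Int -> Maybe (Int, Int)
p i | i <= 0    = Nothing
    | otherwise = Just (i, i - 1)

-- unfoldr p 5 = [5,4,3,2,1]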
In the presence of seq
This is the more interesting setting, given that in Haskell there is no way to restrict the use of seq, so in any given program we must be prepared for the possibility that the g appearing in the foldr/build- or the destroy/unfoldr-rule is defined using seq.
Unsurprisingly, it is also the setting in which more can go wrong than above.
foldr/build
In the presence of seq, the foldr/build-rule is no longer a semantic equivalence.
The instance
g = seq
c = undefined
n = 0
shows, via similar "evaluations" as above, that the right-hand side (g c n) can be strictly less defined than the left-hand side (foldr c n (build g)).
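This instance, too, can be tried out directly (a sketch; GHC.Exts exports the build function defined above):

import GHC.Exts (build)  -- GHC's own build, defined exactly as above

-- The instance from the text: g is seq itself.
g :: (Int -> b -> b) -> b -> b
g = seq

c :: Int -> Int -> Int
c = undefined

n :: Int
n = 0

main :: IO ()
main = do
  print (foldr c n (build g))  -- left-hand side: prints 0
  print (g c n)                -- right-hand side: raises undefined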
The converse cannot happen, because the following always holds:
foldr c n (build g) ⊒ g c n
Moreover, semantic equivalence can again be recovered by putting restrictions on the involved functions.
More precisely, if (c ⊥ ⊥) ≠ ⊥ and n ≠ ⊥, then even in the presence of seq:
foldr c n (build g) = g c n
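For example (an illustration, not from the original text), c = (:) and n = [] satisfy both conditions, so even the seq-using producer from above fuses safely:

import GHC.Exts (build)  -- GHC's own build, defined exactly as above

-- c is a constructor, so c undefined undefined is not undefined,
-- and n = [] is certainly not undefined.
c :: Int -> [Int] -> [Int]
c = (:)

n :: [Int]
n = []

lhs, rhs :: [Int]
lhs = foldr c n (build seq)  -- = foldr (:) [] [] = []
rhs = seq c n                -- = []

main :: IO ()
main = print (lhs == rhs)    -- True, although the producer uses seq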
destroy/unfoldr
Contrary to the situation without seq, now also the destroy/unfoldr-rule may decrease the definedness of a program.
This is witnessed by the following instance:
g = \x y -> seq x 0
p = undefined
e = 0
Here the left-hand side of the rule (destroy g (unfoldr p e)) yields 0, while the right-hand side (g p e) yields undefined.
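Again, the instance can be tried out directly (a sketch repeating the definitions from above):

{-# LANGUAGE RankNTypes #-}

-- destroy and unfoldr repeated from above so the example is self-contained.
destroy :: (forall b. (b -> Maybe (a, b)) -> b -> c) -> [a] -> c
destroy g = g step
  where
    step []     = Nothing
    step (x:xs) = Just (x, xs)

unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr p e = case p e of
  Nothing      -> []
  Just (x, e') -> x : unfoldr p e'

-- The instance from the text: g forces its first argument with seq.
g :: (b -> Maybe (Int, b)) -> b -> Int
g = \x y -> seq x 0

p :: Int -> Maybe (Int, Int)
p = undefined

e :: Int
e = 0

main :: IO ()
main = do
  print (destroy g (unfoldr p e))  -- left-hand side: prints 0
  print (g p e)                    -- right-hand side: raises undefined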
Conditions for semantic approximation in either direction can be given as follows.
If p ≠ ⊥ and (p ⊥) ∈ {⊥, Just ⊥}, then:
destroy g (unfoldr p e) ⊑ g p e
If p is strict and total and never returns Just ⊥, then:
destroy g (unfoldr p e) ⊒ g p e
Of course, conditions for semantic equivalence can be obtained by combining the two laws above.
Discussion
Correctness of short cut fusion is not just an academic issue.
There are versions of GHC (which ones? any officially released ones?) that automatically perform transformations like foldr/build during their optimization pass (also in the disguise of more specialized rules such as head/build).
And there has been at least one occasion where, as a result, a safely terminating program was turned into a failing one "in the wild", with a less artificial example than the ones given above.
foldr/build
As pointed out above, everything is fine with foldr/build in the absence of seq.
If the producer (build g) of the intermediate list may be defined using seq, then the conditions (c ⊥ ⊥) ≠ ⊥ and n ≠ ⊥ had better be satisfied, lest the compiler transform a perfectly fine program into a failing one.
The mentioned conditions are equivalent to requiring that the consumer (foldr c n) is a total function, that is, maps non-⊥ lists to a non-⊥ value.
It is thus relatively easy to identify whether a list consumer defined in terms of foldr is eligible for foldr/build-fusion in the presence of seq or not.
For example, the Prelude functions head and sum are generally not, while map is.
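To see why (an illustration, not from the original text; headF, sumF, and mapF are invented names), consider foldr forms of these functions and how they fare against the two conditions above:

-- head as a foldr: the "nil" argument is undefined, so n = ⊥ and the
-- condition n ≠ ⊥ fails.
headF :: [a] -> a
headF = foldr (\x _ -> x) (error "headF: empty list")

-- sum as a foldr: (+) on Int is strict in both arguments, so c ⊥ ⊥ = ⊥
-- and the condition (c ⊥ ⊥) ≠ ⊥ fails as well.
sumF :: [Int] -> Int
sumF = foldr (+) 0

-- map f as a foldr: the combining function returns a constructor
-- application, so c ⊥ ⊥ ≠ ⊥, and n = [] ≠ ⊥; both conditions hold.
mapF :: (a -> b) -> [a] -> [b]
mapF f = foldr (\x ys -> f x : ys) []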
destroy/unfoldr
Literature
Various parts of the above story, and elaborations thereof, are also told in the following papers:
- A. Gill, J. Launchbury, and S.L. Peyton Jones. A short cut to deforestation. Functional Programming Languages and Computer Architecture, Proceedings, pages 223-232, ACM Press, 1993.
- J. Svenningsson. Shortcut fusion for accumulating parameters & zip-like functions. International Conference on Functional Programming, Proceedings, pages 124-132, ACM Press, 2002.
- P. Johann. On proving the correctness of program transformations based on free theorems for higher-order polymorphic calculi. Mathematical Structures in Computer Science, 15:201-229, 2005.
- P. Johann and J. Voigtländer. The impact of seq on free theorems-based program transformations. Fundamenta Informaticae, 69:63-102, 2006.
- J. Voigtländer and P. Johann. Selective strictness and parametricity in structural operational semantics. Technical Report TUD-FI06-02, Technische Universität Dresden, 2006.