# Difference between revisions of "GHC/Using rules"

(mention -dverbose-core2core flag) |
DonStewart (talk | contribs) (expand rule with more examples, and discussions) |
||

Line 63: | Line 63: | ||

Which rule will fire? "f->g" or "h->g"? (Each rule disables the other.) |
Which rule will fire? "f->g" or "h->g"? (Each rule disables the other.) |
||

− | Answer: rules are matched against the AST for expressions basically ''bottom-up'' rather than top-down. In this example, "h->g" is the rule that fires. But due to the nature of inlining and so on, there are absolutely no guarantees about this kind of behaviour. If you really need to control the order of matching, phase control is the only reliable mechanism. |
||

+ | Answer: rules are matched against the AST for expressions basically |
||

+ | ''bottom-up'' rather than top-down. In this example, "h->g" is the rule |
||

+ | that fires. But due to the nature of inlining and so on, there are |
||

+ | absolutely no guarantees about this kind of behaviour. If you really |
||

+ | need to control the order of matching, phase control is the only |
||

+ | reliable mechanism. |
||

+ | |||

+ | === Pair rules === |
||

+ | |||

+ | It is often useful to provide two implementations of a function, and |
||

+ | have rewrite rules pick which version to use depending on context. In |
||

+ | both GHC's foldr/build fusion, and more extensively in Data.ByteString's |
||

+ | stream fusion system, pair rules are used to allow the compiler to |
||

+ | choose between two implementations of a function. |
||

+ | |||

+ | Consider the rules: |
||

+ | |||

+ | <haskell> |
||

+ | "FPS length -> fused" [~1] |
||

+ | length = F.strConsumerBi F.lengthS |
||

+ | |||

+ | "FPS length -> unfused" [1] |
||

+ | F.strConsumerBi F.lengthS = length |
||

+ | </haskell> |
||

+ | |||

+ | This rule pair tells the compiler to rewrite occurences of "length" to a |
||

+ | stream-fusible form in early simplifications phases, hoping for fusion |
||

+ | to happen. However, if by phase 1 (remember that phases count down from |
||

+ | 4), the fusible form remains unfused, it is better to rewrite it back to |
||

+ | the unfused-but-fast implementation of length. A similar trick is used |
||

+ | for "map" in the baes libraries. |
||

+ | |||

+ | As we want to match "length" in the rules, we need to ensure that it |
||

+ | isn't inlined too soon: |
||

+ | |||

+ | <haskell> |
||

+ | length :: ByteString -> Int |
||

+ | length (PS _ _ l) = assert (l >= 0) $ l |
||

+ | {-# INLINE [1] length #-} |
||

+ | </haskell> |
||

+ | |||

+ | and we need "strConsumerBi" to stick around for even longer: |
||

+ | |||

+ | <haskell> |
||

+ | strConsumerBi :: (Stream -> a) -> (ByteString -> a) |
||

+ | strConsumerBi f = f . readStrUp |
||

+ | {-# INLINE [0] strConsumerBi #-} |
||

+ | |||

+ | lengthS :: Stream -> Int |
||

+ | lengthS ... |
||

+ | {-# INLINE [0] lengthS #-} |
||

+ | </haskell> |
||

+ | |||

+ | Pair rules thus provide a useful mechanism to allow a library to provide |
||

+ | multiple implementations of a function, picking the best one to use |
||

+ | based on context. |
||

+ | |||

+ | === Custom specialisation rules === |
||

+ | |||

+ | As of ghc 6.6, the rule LHS syntax has been relaxed, allowing for |
||

+ | sections and lambda abstractions to appear. Previously, only |
||

+ | applications of the following form were valid: |
||

+ | |||

+ | <haskell> |
||

+ | "FPS specialise break (x==)" forall x. |
||

+ | break ((==) x) = breakByte x |
||

+ | </haskell> |
||

+ | |||

+ | That is, replace occurences of: break (x==) with the optimised breakByte |
||

+ | function. |
||

+ | |||

+ | This code illustrates how higher order functions can be rewritten to |
||

+ | optimised first order equivalents, for special cases like (==). In the |
||

+ | case of Data.ByteString, functions using (==) or (/=) are much faster |
||

+ | when implemented with memchr(3), and we can use rules to do this, as |
||

+ | long as it is possible to match sections. In ghc 6.6 we can now write: |
||

+ | |||

+ | <haskell> |
||

+ | "FPS specialise break (==x)" forall x. |
||

+ | break (==x) = breakByte x |
||

+ | |||

+ | "FPS specialise break (x==)" forall x. |
||

+ | break (x==) = breakByte x |
||

+ | </haskell> |
||

+ | |||

+ | Some fragility remains in this rule though, as described below. |
||

+ | |||

+ | === Literals, dictionaries and sections === |
||

+ | |||

+ | Consider: |
||

+ | <haskell> |
||

+ | break (== 10) |
||

+ | </haskell> |
||

+ | |||

+ | Hopefully, this can be rewritten to a "breakByte 10" call, however, the |
||

+ | combination of sections, literals and dictionaries for Eq makes this |
||

+ | ratherr fragile. |
||

+ | |||

+ | The rule for break ends up translated by GHC as; |
||

+ | |||

+ | <haskell> |
||

+ | forall ($dEq :: base:GHC.Base.Eq base:GHC.Word.Word8) |
||

+ | (x :: base:GHC.Word.Word8) |
||

+ | |||

+ | break (base:GHC.Base.== @ base:GHC.Word.Word8 $dEq x) = |
||

+ | breakByte x |
||

+ | </haskell> |
||

+ | |||

+ | Notice the LHS: an application of the selector to a (suitably-typed) Eq |
||

+ | dictionary. GHC does very little simplification on LHSs, because if it |
||

+ | does too much, the LHS doesn't look like you thought it did. Here it |
||

+ | might perhaps be better to simplify to GHC.Word.Word8.==, by selecting |
||

+ | from the dictionary, but GHC does not do that. |
||

+ | |||

+ | When this rules works, GHC generates exactly that pattern; we get |
||

+ | |||

+ | <haskell> |
||

+ | eq = (==) deq |
||

+ | main = ... break (\x. eq x y) ... |
||

+ | </haskell> |
||

+ | |||

+ | GHC is anxious about substituting eq inside the lambda, but it does it |
||

+ | because (==) is just a record selector, and hence is very cheap. |
||

+ | |||

+ | But when we put a literal inline, we get an (Eq a) constraint and a (Num |
||

+ | a) constraint (from the literal). Ultimately, 'a' turns out to be Int, |
||

+ | by defaulting, but we don't know that yet. So GHC picks the Eq |
||

+ | dictionary from the Num dictionary: |
||

+ | |||

+ | <haskell> |
||

+ | eq = (==) ($p1 dnum) |
||

+ | main = ... break (\x. eq x y) ... |
||

+ | </haskell> |
||

+ | |||

+ | Now the 'eq' doesn't look quite so cheap, and it isn't inlined, so the |
||

+ | rule does not fire. However, GHC 6.6 has been modified to believe that |
||

+ | nested selection is also cheap, so that makes the rule fire. |
||

+ | |||

+ | The underlying lesson is this: the only robust way to make rules fire is |
||

+ | if the LHS is a normal form. Otherwise GHC may miss the fleeting moment |
||

+ | at which (an instance of) the rule LHS appears in the program. The way |
||

+ | you ensure this is with inline phases: don't inline LHS stuff until |
||

+ | later, so that the LHS stuff appears in the program more than |
||

+ | fleetingly. |
||

+ | |||

+ | But in this case you have (==) on the LHS, and you have no phase control |
||

+ | there. So it gets inlined right away, so the rule doesn't match any |
||

+ | more. The only way the rule "works" is because GHC catches the pattern |
||

+ | right away, before (==) is inlined. Not very robust. |
||

+ | |||

+ | To make this robust, you'd have to say something like |
||

+ | |||

+ | <haskell> |
||

+ | instance Eq Word 8 where |
||

+ | (==) = eqWord8 |
||

+ | |||

+ | eqWord8 = .. |
||

+ | {-# NOINLINE [1] eqWord8 #-} |
||

+ | |||

+ | {-# RULES |
||

+ | "FPS specialise break (x==)" forall x. |
||

+ | break (x`eqWord8`) = breakByte x |
||

+ | #-} |
||

+ | </haskell> |

## Revision as of 03:42, 7 September 2006

## Contents

## Using Rules in GHC

GHC's rewrite rules (invoked by the RULES pragma) offer a powerful way to optimise your program. This page is a place for people who use rewrite rules to collect thoughts about how to use them.

If you aren't already familiar with RULES, read this stuff first:

- The relevant section of the GHC user manual
- Playing by the rules: rewriting as a practical optimisation technique in GHC. This paper, from the 2001 Haskell workshop, describes the idea of rewrite rules.

### Advice about using rewrite rules

- Remember to use the flag
`-fglasgow-exts`and the optimisation flag`-O` - Use the flag
`-ddump-simpl-stats`to see how many rules actually fired. - For even more detail use
`-ddump-simpl-stats -ddump-simpl-iterations`to see the core code at each iteration of the simplifer. Note that this produces**lots**of output so you'll want to direct the output to a file or pipe it to`less`. Looking at the output of this can help you figure out why rules are not firing when you expect them to do so. - Another tip for discovering why rules do not fire, is to use the flag
`-dverbose-core2core`, which (amongst other things) produces the AST after every rule is fired. This can help you to examine whether one rule is creating an expression that thereby prevents another rule from firing, for example. - You need to be careful that your identifiers aren't inlined before your RULES have a chance to fire. To control this we add an
`INLINE [1]`pragma to identifiers we want to match in rules, to ensure they haven't disappeared by the time the rule matching comes around.

### Example: `map`

(This example code is taken from GHC's `base/GHC/Base.lhs` module.)

map :: (a -> b) -> [a] -> [b] map _ [] = [] map f (x:xs) = f x : map f xs

mapFB :: (elt -> lst -> lst) -> (a -> elt) -> a -> lst -> lst {-# INLINE [0] mapFB #-} mapFB c f x ys = c (f x) ys

The rules for map work like this.

Up to (but not including) phase 1, we use the `"map"` rule to
rewrite all saturated applications of map with its build/fold
form, hoping for fusion to happen.
In phase 1 and 0, we switch off that rule, inline build, and
switch on the `"mapList"` rule, which rewrites the foldr/mapFB
thing back into plain map.

It's important that these two rules aren't both active at once (along with build's unfolding) else we'd get an infinite loop in the rules. Hence the activation control below.

The `"mapFB"` rule optimises compositions of map.

This same pattern is followed by many other functions: e.g. append, filter, iterate, repeat, etc.

{-# RULES "map" [~1] forall f xs. map f xs = build (\c n -> foldr (mapFB c f) n xs) "mapList" [1] forall f. foldr (mapFB (:) f) [] = map f "mapFB" forall c f g. mapFB (mapFB c f) g = mapFB c (f.g) #-}

## Questions

### Order of rule-matching

For example, let's say we have two rules

"f->g" forall x y . f x (h y) = g x y "h->g" forall x . h x = g 0 x

and a fragment of the AST corresponding to

f a (h b)

Which rule will fire? "f->g" or "h->g"? (Each rule disables the other.)

Answer: rules are matched against the AST for expressions basically
*bottom-up* rather than top-down. In this example, "h->g" is the rule
that fires. But due to the nature of inlining and so on, there are
absolutely no guarantees about this kind of behaviour. If you really
need to control the order of matching, phase control is the only
reliable mechanism.

### Pair rules

It is often useful to provide two implementations of a function, and have rewrite rules pick which version to use depending on context. In both GHC's foldr/build fusion, and more extensively in Data.ByteString's stream fusion system, pair rules are used to allow the compiler to choose between two implementations of a function.

Consider the rules:

```
"FPS length -> fused" [~1]
length = F.strConsumerBi F.lengthS
"FPS length -> unfused" [1]
F.strConsumerBi F.lengthS = length
```

This rule pair tells the compiler to rewrite occurences of "length" to a stream-fusible form in early simplifications phases, hoping for fusion to happen. However, if by phase 1 (remember that phases count down from 4), the fusible form remains unfused, it is better to rewrite it back to the unfused-but-fast implementation of length. A similar trick is used for "map" in the baes libraries.

As we want to match "length" in the rules, we need to ensure that it isn't inlined too soon:

```
length :: ByteString -> Int
length (PS _ _ l) = assert (l >= 0) $ l
{-# INLINE [1] length #-}
```

and we need "strConsumerBi" to stick around for even longer:

```
strConsumerBi :: (Stream -> a) -> (ByteString -> a)
strConsumerBi f = f . readStrUp
{-# INLINE [0] strConsumerBi #-}
lengthS :: Stream -> Int
lengthS ...
{-# INLINE [0] lengthS #-}
```

Pair rules thus provide a useful mechanism to allow a library to provide multiple implementations of a function, picking the best one to use based on context.

### Custom specialisation rules

As of ghc 6.6, the rule LHS syntax has been relaxed, allowing for sections and lambda abstractions to appear. Previously, only applications of the following form were valid:

```
"FPS specialise break (x==)" forall x.
break ((==) x) = breakByte x
```

That is, replace occurences of: break (x==) with the optimised breakByte function.

This code illustrates how higher order functions can be rewritten to optimised first order equivalents, for special cases like (==). In the case of Data.ByteString, functions using (==) or (/=) are much faster when implemented with memchr(3), and we can use rules to do this, as long as it is possible to match sections. In ghc 6.6 we can now write:

```
"FPS specialise break (==x)" forall x.
break (==x) = breakByte x
"FPS specialise break (x==)" forall x.
break (x==) = breakByte x
```

Some fragility remains in this rule though, as described below.

### Literals, dictionaries and sections

Consider:

```
break (== 10)
```

Hopefully, this can be rewritten to a "breakByte 10" call, however, the combination of sections, literals and dictionaries for Eq makes this ratherr fragile.

The rule for break ends up translated by GHC as;

```
forall ($dEq :: base:GHC.Base.Eq base:GHC.Word.Word8)
(x :: base:GHC.Word.Word8)
break (base:GHC.Base.== @ base:GHC.Word.Word8 $dEq x) =
breakByte x
```

Notice the LHS: an application of the selector to a (suitably-typed) Eq dictionary. GHC does very little simplification on LHSs, because if it does too much, the LHS doesn't look like you thought it did. Here it might perhaps be better to simplify to GHC.Word.Word8.==, by selecting from the dictionary, but GHC does not do that.

When this rules works, GHC generates exactly that pattern; we get

```
eq = (==) deq
main = ... break (\x. eq x y) ...
```

GHC is anxious about substituting eq inside the lambda, but it does it because (==) is just a record selector, and hence is very cheap.

But when we put a literal inline, we get an (Eq a) constraint and a (Num a) constraint (from the literal). Ultimately, 'a' turns out to be Int, by defaulting, but we don't know that yet. So GHC picks the Eq dictionary from the Num dictionary:

```
eq = (==) ($p1 dnum)
main = ... break (\x. eq x y) ...
```

Now the 'eq' doesn't look quite so cheap, and it isn't inlined, so the rule does not fire. However, GHC 6.6 has been modified to believe that nested selection is also cheap, so that makes the rule fire.

The underlying lesson is this: the only robust way to make rules fire is if the LHS is a normal form. Otherwise GHC may miss the fleeting moment at which (an instance of) the rule LHS appears in the program. The way you ensure this is with inline phases: don't inline LHS stuff until later, so that the LHS stuff appears in the program more than fleetingly.

But in this case you have (==) on the LHS, and you have no phase control there. So it gets inlined right away, so the rule doesn't match any more. The only way the rule "works" is because GHC catches the pattern right away, before (==) is inlined. Not very robust.

To make this robust, you'd have to say something like

```
instance Eq Word 8 where
(==) = eqWord8
eqWord8 = ..
{-# NOINLINE [1] eqWord8 #-}
{-# RULES
"FPS specialise break (x==)" forall x.
break (x`eqWord8`) = breakByte x
#-}
```