# Difference between revisions of "Foldr Foldl Foldl'"

(Fix a few typos) |
(Added an example) |
||

Line 177: | Line 177: | ||

You can clearly see that the inner redex is repeatedly reduced |
You can clearly see that the inner redex is repeatedly reduced |
||

first. |
first. |
||

+ | |||

+ | |||

+ | Usually the choice is between <hask>foldr</hask> and <hask>foldl'</hask>, since <hask>foldl</hask> and <hask>foldl'</hask> are the same except for their strictness properties, so if both return a result, it must be the same. <hask>foldl'</hask> is the more efficient way to arrive at that result because it doesn't build a huge thunk. However, if the combining function is lazy in its ''first'' argument, <hask>foldl</hask> may happily return a result where <hask>foldl'</hask> hits an exception: |
||

+ | |||

+ | <haskell> |
||

+ | > (?) :: Int -> Int -> Int |
||

+ | > _ ? 0 = 0 |
||

+ | > x ? y = x*y |
||

+ | > |
||

+ | > list :: [Int] |
||

+ | > list = [2,3,undefined,5,0] |
||

+ | > |
||

+ | > okey = foldl (?) 1 list |
||

+ | > |
||

+ | > boom = foldl' (?) 1 list |
||

+ | </haskell> |
||

+ | |||

+ | Let's see what happens: |
||

+ | |||

+ | <haskell> |
||

+ | okey --> |
||

+ | foldl (?) 1 [2,3,undefined,5,0] --> |
||

+ | foldl (?) (1 ? 2) [3,undefined,5,0] --> |
||

+ | foldl (?) ((1 ? 2) ? 3) [undefined,5,0] --> |
||

+ | foldl (?) (((1 ? 2) ? 3) ? undefined) [5,0] --> |
||

+ | foldl (?) ((((1 ? 2) ? 3) ? undefined) ? 5) [0] --> |
||

+ | foldl (?) (((((1 ? 2) ? 3) ? undefined) ? 5) ? 0) [] --> |
||

+ | ((((1 ? 2) ? 3) ? undefined) ? 5) ? 0 --> |
||

+ | 0 |
||

+ | |||

+ | boom --> |
||

+ | foldl' (?) 1 [2,3,undefined,5,0] --> |
||

+ | 1 ? 2 --> 2 |
||

+ | foldl' (?) 2 [3,undefined,5,0] --> |
||

+ | 2 ? 3 --> 6 |
||

+ | foldl' (?) 6 [undefined,5,0] --> |
||

+ | 6 ? undefined --> |
||

+ | <nowiki>*** Exception: Prelude.undefined</nowiki> |
||

+ | </haskell> |
||

For another explanation about folds see the [http://haskell.org/haskellwiki/Fold Fold] article. |
For another explanation about folds see the [http://haskell.org/haskellwiki/Fold Fold] article. |

## Revision as of 19:28, 8 January 2009

To *foldr*, *foldl* or *foldl'* that's the question! This article demonstrates the differences between these different folds by a simple example.

If you want you can copy/paste this article into your favorite editor and run it.

We are going to define our own folds so we hide the ones from the Prelude:

```
> import Prelude hiding (foldr, foldl)
```

Say we want to calculate the sum of a very big list:

```
> veryBigList = [1..1000000]
```

Lets start with the following:

```
> foldr f z [] = z
> foldr f z (x:xs) = x `f` foldr f z xs
> sum1 = foldr (+) 0
> try1 = sum1 veryBigList
```

If we evaluate *try1* we get:

`*** Exception: stack overflow`

Too bad... So what happened:

```
try1 -->
foldr (+) 0 [1..1000000] -->
1 + (foldr (+) 0 [2..1000000]) -->
1 + (2 + (foldr (+) 0 [3..1000000])) -->
1 + (2 + (3 + (foldr (+) 0 [4..1000000]))) -->
1 + (2 + (3 + (4 + (foldr (+) 0 [5..1000000])))) -->
-- ...
-- ... My stack overflows when there's a chain of around 500000 (+)'s !!!
-- ... But the following would happen if you got a large enough stack:
-- ...
1 + (2 + (3 + (4 + (... + (999999 + (foldr (+) 0 [1000000]))...)))) -->
1 + (2 + (3 + (4 + (... + (999999 + (1000000 + ((foldr (+) 0 []))))...)))) -->
1 + (2 + (3 + (4 + (... + (999999 + (1000000 + 0))...)))) -->
1 + (2 + (3 + (4 + (... + (999999 + 1000000)...)))) -->
1 + (2 + (3 + (4 + (... + 1999999 ...)))) -->
1 + (2 + (3 + (4 + 500000499990))) -->
1 + (2 + (3 + 500000499994)) -->
1 + (2 + 500000499997) -->
1 + 500000499999 -->
500000500000
```

The problem, as you can see, is that a large chain of (+)'s is created which eventually won't fit in your stack anymore. This will then trigger a stack overflow exception.

For a nice interactive animation of the above behaviour see: http://foldr.com

Lets think about how to solve it...

One problem with the chain of (+)'s is that we can't make it smaller (reduce it) until at the very last moment when it's already too late.

The reason we can't reduce it, is that the chain doesn't contain an
expression which can be reduced (a so called "redex" for **red**ucible
**ex**pression.) If it did we could reduce that expression before going
to the next element.

Well, we can introduce a redex by forming the chain in another way. If
instead of the chain `1 + (2 + (3 + (...)))` we could form the chain
`(((0 + 1) + 2) + 3) + ...` then there would always be a redex.

We can form the latter chain by using a function called *foldl*:

```
> foldl f z [] = z
> foldl f z (x:xs) = foldl f (z `f` x) xs
> sum2 = foldl (+) 0
> try2 = sum2 veryBigList
```

Lets evaluate *try2*:

`*** Exception: stack overflow`

Good Lord! Again a stack overflow! Lets see what happens:

```
try2 -->
foldl (+) 0 [1..1000000] -->
foldl (+) (0 + 1) [2..1000000] -->
foldl (+) ((0 + 1) + 2) [3..1000000] -->
foldl (+) (((0 + 1) + 2) + 3) [4..1000000] -->
foldl (+) ((((0 + 1) + 2) + 3) + 4) [5..1000000] -->
-- ...
-- ... My stack overflows when there's a chain of around 500000 (+)'s !!!
-- ... But the following would happen if you got a large enough stack:
-- ...
foldl (+) ((((((0 + 1) + 2) + 3) + 4) + ...) + 999999) [1000000] -->
foldl (+) (((((((0 + 1) + 2) + 3) + 4) + ...) + 999999) + 1000000) [] -->
((((((0 + 1) + 2) + 3) + 4) + ...) + 999999) + 1000000 -->
(((((1 + 2) + 3) + 4) + ...) + 999999) + 1000000 -->
((((3 + 3) + 4) + ...) + 999999) + 1000000 -->
(((6 + 4) + ...) + 999999) + 1000000 -->
((10 + ...) + 999999) + 1000000 -->
(499998500001 + 999999) + 1000000 -->
499999500000 + 1000000
500000500000 -->
```

For a nice interactive animation of the above behaviour see: http://foldl.com (actually this animation is not quite the same :-( )

Well, you clearly see that the redexen `0 + 1`, `(0 + 1) + 2`,
etc. are created. So why doesn't the chain reduce sooner than
before?

The answer is that GHC uses a lazy reduction strategy. This means that GHC only reduces an expression when its value is actually needed.

The reduction strategy works by reducing the outer-left-most redex
first. In this case it are the outer `foldl (+) ... [1..10000]`
redexen which are repeatedly reduced.
So the inner `(((0 + 1) + 2) + 3) + 4` redexen only get reduced when
the foldl is completely gone.

We somehow have to tell the system that the inner redex should be
reduced before the outer. Fortunately this is possible with the
*seq* function:

```
seq :: a -> b -> b
```

*seq* is a primitive system function that when applied to *x* and
*y* will first reduce *x*, then reduce *y* and return the result of
the latter. The idea is that *y* references *x* so that when *y* is
reduced *x* will not be a big unreduced chain anymore.

Now lets fill in the pieces:

```
> foldl' f z [] = z
> foldl' f z (x:xs) = let z' = z `f` x
> in seq z' $ foldl' f z' xs
> sum3 = foldl' (+) 0
> try3 = sum3 veryBigList
```

If we now evaluate *try3* we get the correct answer and we get it very quickly:

```
500000500000
```

Lets see what happens:

```
try3 -->
foldl' (+) 0 [1..1000000] -->
foldl' (+) 1 [2..1000000] -->
foldl' (+) 3 [3..1000000] -->
foldl' (+) 6 [4..1000000] -->
foldl' (+) 10 [5..1000000] -->
-- ...
-- ... You see that the stack doesn't overflow
-- ...
foldl' (+) 499999500000 [1000000] -->
foldl' (+) 500000500000 [] -->
500000500000
```

You can clearly see that the inner redex is repeatedly reduced first.

Usually the choice is between `foldr`

and `foldl'`

, since `foldl`

and `foldl'`

are the same except for their strictness properties, so if both return a result, it must be the same. `foldl'`

is the more efficient way to arrive at that result because it doesn't build a huge thunk. However, if the combining function is lazy in its *first* argument, `foldl`

may happily return a result where `foldl'`

hits an exception:

```
> (?) :: Int -> Int -> Int
> _ ? 0 = 0
> x ? y = x*y
>
> list :: [Int]
> list = [2,3,undefined,5,0]
>
> okey = foldl (?) 1 list
>
> boom = foldl' (?) 1 list
```

Let's see what happens:

```
okey -->
foldl (?) 1 [2,3,undefined,5,0] -->
foldl (?) (1 ? 2) [3,undefined,5,0] -->
foldl (?) ((1 ? 2) ? 3) [undefined,5,0] -->
foldl (?) (((1 ? 2) ? 3) ? undefined) [5,0] -->
foldl (?) ((((1 ? 2) ? 3) ? undefined) ? 5) [0] -->
foldl (?) (((((1 ? 2) ? 3) ? undefined) ? 5) ? 0) [] -->
((((1 ? 2) ? 3) ? undefined) ? 5) ? 0 -->
0
boom -->
foldl' (?) 1 [2,3,undefined,5,0] -->
1 ? 2 --> 2
foldl' (?) 2 [3,undefined,5,0] -->
2 ? 3 --> 6
foldl' (?) 6 [undefined,5,0] -->
6 ? undefined -->
<nowiki>*** Exception: Prelude.undefined</nowiki>
```

For another explanation about folds see the Fold article.