2019-03-28T18:51:46Z

WillNess: /* Tree-like folds */ mention data-ordlist's equivalents

In [[functional programming]], ''fold'' (or ''reduce'') is a family of [[higher order function]]s that process a [[data structure]] in some order and build a return value. This is as opposed to the family of ''unfold'' functions, which take a starting value and repeatedly apply a function to it to generate a data structure.

==Overview==

Typically, a fold deals with two things: a combining [[Function|function]], and a [[data structure]], typically a [[List (computing)|list]] of elements. The fold then proceeds to combine elements of the data structure using the function in some systematic way. For instance, we might want to use a hypothetical function <code>fold</code> to write

 fold (+) [1,2,3,4,5]

which would result in 1 + 2 + 3 + 4 + 5, which is 15. In this instance, + is an [[associative operation]] so how one parenthesizes the addition is irrelevant to what the final result value will be, although the operational details will differ as to ''how'' exactly it will be calculated. To a rough approximation, you can think of the fold as replacing the commas in the list with the + operation.

However, in the general case, functions of two parameters are not associative, so the order in which one carries out the combination of the elements matters. On lists, there are two obvious ways to carry this out: either by recursively combining the first element with the results of combining the rest (called a ''right fold'') or by recursively combining the results of combining all but the last element with the last one, (called a ''left fold''). Also, in practice, it is convenient and natural to have an initial value which in the case of a right fold, is used when one reaches the end of the list, and in the case of a left fold, is what is initially combined with the first element of the list. This is perhaps clearer to see in the equations defining <code>foldr</code> and <code>foldl</code> in Haskell. Note that in Haskell, <code>[]</code> represents the empty list, and <code>(x:xs)</code> represents the list starting with x and where the rest of the list is xs.
<haskell>
-- if the list is empty, the result is the initial value z; else
-- apply f to the first element and the result of folding the rest
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)

-- if the list is empty, the result is the initial value; else
-- we recurse immediately, making the new initial value the result
-- of combining the old initial value with the first element.
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
</haskell>
One important thing to note in the presence of [[Lazy evaluation | lazy]], or [[Normal-order evaluation | normal-order]], evaluation is that <code>foldr</code> will immediately return the application of <code>f</code> to the first element and the still unevaluated fold over the rest of the list. Thus, if <code>f</code> is able to produce some part of its result without reference to the recursive case, and the rest of the result is never demanded, then the recursion will stop. This allows right folds to operate on infinite lists. By contrast, <code>foldl</code> will immediately call itself with new parameters until it reaches the end of the list. This [[tail recursion]] can be efficiently compiled as a loop, but it cannot deal with infinite lists at all: it will recurse forever in an [[infinite loop]]. Another technical point to be aware of in the case of left folds in a normal-order evaluation language is that the new initial parameter is not evaluated before the recursive call is made. This can lead to stack overflows when one reaches the end of the list and tries to evaluate the resulting gigantic expression. For this reason, such languages often provide a stricter variant of left folding which forces the evaluation of the initial parameter before making the recursive call. In Haskell, this is the <code>foldl'</code> (note the apostrophe) function in the <code>Data.List</code> library. Combined with the speed of tail recursion, such folds are very efficient when lazy evaluation of the final result is impossible or undesirable.
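The two behaviours described above can be sketched as follows. This is only an illustration using <code>foldr</code> from the Prelude and <code>foldl'</code> from <code>Data.List</code>; the names <code>firstBig</code> and <code>total</code> are made up for this example.

<haskell>
import Data.List (foldl')

-- A right fold whose combining function can ignore its second argument:
-- the recursion stops at the first element greater than 10,
-- so this also works on an infinite list.
firstBig :: [Integer] -> Maybe Integer
firstBig = foldr (\x rest -> if x > 10 then Just x else rest) Nothing

-- A strict left fold: the accumulator is forced at every step,
-- so no long chain of suspended additions builds up.
total :: [Integer] -> Integer
total = foldl' (+) 0
</haskell>

Here <code>firstBig [1..]</code> returns <code>Just 11</code> even though the list is infinite, whereas <code>total</code> is only meaningful on finite lists.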

==Special folds for nonempty lists==

One often wants to choose the [[identity element]] of the operation ''f'' as the initial value ''z''. When no initial value seems appropriate, for example, when one wants to fold the function which computes the maximum of its two parameters over a list in order to get the maximum element of the list, there are variants of foldr and foldl which use the last and first element of the list respectively as the initial value. In Haskell and several other languages, these are called foldr1 and foldl1, the 1 making reference to the automatic provision of an initial element, and the fact that the lists they are applied to must have at least one element.

These folds use a type-symmetrical binary operation: the types of both its arguments, and of its result, must be the same. Richard Bird, in his 2010 book "Pearls of Functional Algorithm Design" (Cambridge University Press 2010, ISBN 978-0-521-51338-8, p. 42), proposes "a general fold function on non-empty lists" <code>foldrn</code>, which transforms the last element of the list into a value of the result type by applying an additional argument function to it before starting the folding itself. It is thus able to use a type-asymmetrical binary operation, like the regular <code>foldr</code>, to produce a result whose type differs from the type of the list's elements.
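As a rough sketch written from the description above (not a quotation from the book), such a function could look like the following. The helper names <code>largest</code> and <code>render</code> are hypothetical and serve only to contrast it with <code>foldr1</code>.

<haskell>
-- Sketch of a general fold on non-empty lists: g converts the last
-- element to the result type, f combines each earlier element with it.
foldrn :: (a -> b -> b) -> (a -> b) -> [a] -> b
foldrn _ g [x]    = g x
foldrn f g (x:xs) = f x (foldrn f g xs)
foldrn _ _ []     = error "foldrn: empty list"

-- foldr1 needs no initial value, but element and result types coincide.
largest :: [Int] -> Int
largest = foldr1 max

-- With foldrn the result type may differ from the element type:
-- here a non-empty list of Ints is rendered as a String.
render :: [Int] -> String
render = foldrn (\x acc -> show x ++ ", " ++ acc) show
</haskell>

For instance, <code>render [1,2,3]</code> gives <code>"1, 2, 3"</code>: the last element is converted with <code>show</code>, and every earlier element is combined with the string built so far.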

==Tree-like folds==

The use of an initial value is ''mandatory'' when the combining function is ''asymmetrical'' in its types, i.e. when the type of its result is different from the type of the list's elements. Then an initial value must be used, with the same type as that of the function's result, for a ''linear'' chain of applications to be possible, whether ''left-'' or ''right-''oriented.

When the function is ''symmetrical'' in its types, the parentheses may be placed in arbitrary fashion, thus creating a ''tree'' of nested sub-expressions, e.g. ((1 + 2) + (3 + 4)) + 5. If the binary operation is also ''associative'', this value will be well-defined, i.e. the same for any parenthesization, although the operational details of ''how'' it is calculated will differ.

Both finite and indefinitely defined lists can be folded over in a tree-like fashion (except that the <hask>foldt</hask> below cannot work with infinite lists, since it returns nothing until the whole list has been reduced to a single element):

<haskell>
foldt :: (a -> a -> a) -> a -> [a] -> a
foldt f z [] = z
foldt f z [x] = x -- aka foldt' of data-ordlist
foldt f z xs = foldt f z (pairs f xs)

foldi :: (a -> a -> a) -> a -> [a] -> a
foldi f z [] = z
foldi f z (x:xs) = f x (foldi f z (pairs f xs)) -- aka foldt of data-ordlist

pairs :: (a -> a -> a) -> [a] -> [a]
pairs f (x:y:t) = f x y : pairs f t
pairs f t = t
</haskell>

In the case of the <code>foldi</code> function, to avoid runaway evaluation on ''indefinitely'' defined lists, the function <code>f</code> must ''not always'' demand its second argument's value, at least not all of it, and/or not immediately (example [[Fold#Examples|below]]).

==Folds in other languages==

In Scheme, right and left fold can be written as:

 (define (foldr f z xs)
   (if (null? xs)
       z
       (f (car xs) (foldr f z (cdr xs)))))

 (define (foldl f z xs)
   (if (null? xs)
       z
       (foldl f (f z (car xs)) (cdr xs))))

The C++ Standard Template Library implements left fold as the function "accumulate" (in the header <numeric>).

==List folds as structural transformations==
One way in which it is perhaps natural to view folds is as a mechanism for replacing the structural components of a data structure with other functions and values in some regular way. In many languages, lists are built up from two primitives: either the list is the empty list, commonly called ''nil'', or it is a list ''cons''tructed by prepending an element to some other list, which we call a ''cons''. In Haskell, the cons operation is written as a colon (:), and in Scheme and other Lisps, it is called <code>cons</code>. One can view a right fold as ''replacing'' the nil at the end of the list with a specific value, and each cons with a specific other function. Hence, one gets a diagram which looks something like this:

[[Image:right-fold-transformation.png]]

In the case of a left fold, the structural transformation being performed is somewhat less natural, but is still quite regular:

[[Image:left-fold-transformation.png]]

These pictures do a rather nice job of motivating the names ''left'' and ''right'' fold visually. They also make obvious the fact that <code>foldr (:) []</code> is the identity function on lists, as replacing cons with cons and nil with nil will not change anything. The left fold diagram suggests an easy way to reverse a list, <hask>foldl (flip (:)) []</hask>. Note that the parameters to cons must be flipped, because the element to add is now the right hand parameter of the combining function. Another easy result to see from this vantage-point is how to write the higher-order [[w:Map (higher-order function) | map function]] in terms of foldr, by composing the function to act on the elements with cons, as:

<haskell>
map f = foldr ((:) . f) []
</haskell>
where the period (.) is an operator denoting [[function composition]].

This way of looking at things provides a simple route to designing fold-like functions on other [[Algebraic data type|algebraic data structures]], like various sorts of trees. One writes a function which recursively replaces the constructors of the datatype with provided functions, and any constant values of the type with provided values. Such functions are generally referred to as [[Catamorphisms]].
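To make this concrete, here is a minimal sketch of such a fold for a simple binary tree. The type <code>Tree</code> and the functions <code>foldTree</code>, <code>sumTree</code>, <code>depth</code> and <code>toList</code> are illustrative names chosen for this example, not part of any standard library.

<haskell>
-- A binary tree with values at the leaves.
data Tree a = Leaf a | Node (Tree a) (Tree a)

-- The fold replaces each Node constructor with the function node
-- and each Leaf constructor with the function leaf.
foldTree :: (b -> b -> b) -> (a -> b) -> Tree a -> b
foldTree _    leaf (Leaf x)   = leaf x
foldTree node leaf (Node l r) = node (foldTree node leaf l) (foldTree node leaf r)

-- Sum of all leaf values.
sumTree :: Tree Int -> Int
sumTree = foldTree (+) id

-- Number of Node levels above the deepest leaf.
depth :: Tree a -> Int
depth = foldTree (\l r -> 1 + max l r) (const 0)

-- Leaves in left-to-right order.
toList :: Tree a -> [a]
toList = foldTree (++) (: [])
</haskell>

Just as <code>foldr (:) []</code> is the identity on lists, <code>foldTree Node Leaf</code> rebuilds the tree unchanged.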

==Examples==

Using a Haskell interpreter, we can show the structural transformation which fold functions perform by constructing a string as follows:

<source lang="haskell">
Prelude> foldr (\x y -> concat ["(",x,"+",y,")"]) "0" (map show [1..13])
"(1+(2+(3+(4+(5+(6+(7+(8+(9+(10+(11+(12+(13+0)))))))))))))"

Prelude> foldl (\x y -> concat ["(",x,"+",y,")"]) "0" (map show [1..13])
"(((((((((((((0+1)+2)+3)+4)+5)+6)+7)+8)+9)+10)+11)+12)+13)"

Prelude> foldt (\x y -> concat ["(",x,"+",y,")"]) "0" (map show [1..13])
"((((1+2)+(3+4))+((5+6)+(7+8)))+(((9+10)+(11+12))+13))"

Prelude> foldi (\x y -> concat ["(",x,"+",y,")"]) "0" (map show [1..13])
"(1+((2+3)+(((4+5)+(6+7))+((((8+9)+(10+11))+(12+13))+0))))"
</source>

Infinite tree-like folding is demonstrated e.g. in primes production by unbounded [[Prime_numbers#Tree_merging|sieve of Eratosthenes]]:
<source lang="haskell">
primes :: (Integral a) => [a]
primes = 2 : 3 : ([5,7..] `minus`
                     foldi (\(x:xs) -> (x:) . union xs) []
                          [[p*p, p*p+2*p..] | p <- tail primes])
</source>
where the function <code>union</code> operates on ordered lists in a local manner to efficiently produce their union, and <code>minus</code> their set difference. Both are defined in the [http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data-List-Ordered.html#v:minus <code>Data.List.Ordered</code>] module, or here on the [[Prime numbers#Initial_definition|Prime numbers]] page.

For finite lists, e.g. merge sort could be easily defined using tree-like folding as
<source lang="haskell">
mergesort :: (Ord a) => [a] -> [a]
mergesort xs = foldt merge [] [[x] | x <- xs]
</source>
with the function <code>merge</code> being a simpler variant of <code>union</code> that does not remove duplicates.
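A self-contained version of this merge sort might look as follows. The definition of <code>merge</code> is an assumption made for this sketch (a plain ordered merge that keeps duplicates); <code>foldt</code> and <code>pairs</code> are repeated from the section on tree-like folds so that the block can be loaded on its own.

<haskell>
-- Tree-like fold and its helper, repeated from the section above.
foldt :: (a -> a -> a) -> a -> [a] -> a
foldt _ z []  = z
foldt _ _ [x] = x
foldt f z xs  = foldt f z (pairs f xs)

pairs :: (a -> a -> a) -> [a] -> [a]
pairs f (x:y:t) = f x y : pairs f t
pairs _ t       = t

-- Merge two ordered lists, keeping duplicates.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

mergesort :: Ord a => [a] -> [a]
mergesort xs = foldt merge [] [[x] | x <- xs]
</haskell>

Because <code>foldt</code> combines neighbouring lists pairwise, the merged lists roughly double in length at each level, which gives the usual balanced merge tree.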

Functions <code>head</code> and <code>last</code> could have been defined through folding as
<source lang="haskell">
head = foldr (\a b->a) undefined
last = foldl (\a b->b) undefined
</source>

== See also ==
* [[Foldr Foldl Foldl']]
* [[Foldl as foldr]]
* [[Catamorphisms]]
* [http://en.wikipedia.org/wiki/Fold_%28higher-order_function%29 Wikipedia article on folds]

==External links==
*[http://www.cse.unsw.edu.au/~en1000/haskell/hof.html "Lists, Map, Fold and Tail Recursion"]
*[http://www.cantab.net/users/antoni.diller/haskell/units/unit06.html "Unit 6: The Higher-order fold Functions"]

[[Category:Glossary]]

Foldr Foldl Foldl'

2019-02-25T13:37:44Z

WillNess: c/e

__NOTOC__

To ''foldr'', ''foldl'' or ''foldl''', that is the question!

This article demonstrates the differences between these folds with a simple example.

If you want you can copy/paste this article into your favorite editor and run it.

We are going to define our own folds so we hide the ones from the Prelude:

<haskell>> import Prelude hiding (foldr, foldl)</haskell>

==Foldr==

Say we want to calculate the sum of a very big list:

<haskell>> veryBigList = [1..1000000]</haskell>

Let's start with the following:

<haskell>
> foldr f z [] = z
> foldr f z (x:xs) = x `f` foldr f z xs

> sum1 = foldr (+) 0

> try1 = sum1 veryBigList
</haskell>

If we evaluate ''try1'' we get:

<tt><nowiki>*** Exception: stack overflow</nowiki></tt>

Too bad... So what happened?
<haskell>
try1 -->
sum1 veryBigList -->
foldr (+) 0 veryBigList -->

foldr (+) 0 [1..1000000] -->
1 + (foldr (+) 0 [2..1000000]) -->
1 + (2 + (foldr (+) 0 [3..1000000])) -->
1 + (2 + (3 + (foldr (+) 0 [4..1000000]))) -->
1 + (2 + (3 + (4 + (foldr (+) 0 [5..1000000])))) -->
-- ...
-- ... My stack overflows when there's a chain of around 500000 (+)'s !!!
-- ... But the following would happen if you got a large enough stack:
-- ...
1 + (2 + (3 + (4 + (... + (999999 + (foldr (+) 0 [1000000]))...)))) -->
1 + (2 + (3 + (4 + (... + (999999 + (1000000 + ((foldr (+) 0 []))))...)))) -->

1 + (2 + (3 + (4 + (... + (999999 + (1000000 + 0))...)))) -->
1 + (2 + (3 + (4 + (... + (999999 + 1000000)...)))) -->
1 + (2 + (3 + (4 + (... + 1999999 ...)))) -->

1 + (2 + (3 + (4 + 500000499990))) -->
1 + (2 + (3 + 500000499994)) -->
1 + (2 + 500000499997) -->
1 + 500000499999 -->
500000500000
</haskell>

The problem is that (+) is strict in both of its arguments. This means that both arguments must be fully evaluated before (+) can return a result. So to evaluate:

<haskell>1 + (2 + (3 + (4 + (...))))</haskell>

<tt>1</tt> is pushed on the stack. Then:

<haskell>2 + (3 + (4 + (...)))</haskell>

is evaluated. So <tt>2</tt> is pushed on the stack. Then:

<haskell>3 + (4 + (...))</haskell>

is evaluated. So <tt>3</tt> is pushed on the stack. Then:

<haskell>4 + (...)</haskell>

is evaluated. So <tt>4</tt> is pushed on the stack. Then: ...

... your limited stack will eventually fill up when you evaluate a large enough chain of (+)s. This then triggers the stack overflow exception.

Let's think about how to solve it...

==Foldl==

One problem with the chain of (+)'s is that it can't be made smaller (reduced) until the very last moment, when it's already too late.

The reason we can't reduce it is that the chain doesn't contain [[reducible expression|an
expression which can be reduced]] (a ''[[reducible expression|redex]]'', for '''red'''ucible
'''ex'''pression.) If it did we could reduce that expression before going
to the next element.

We can introduce a redex by forming the chain in another way. If
instead of the chain <tt>1 + (2 + (3 + (...)))</tt> we could form the chain
<tt>(((0 + 1) + 2) + 3) + ...</tt>, then there would always be a redex.

We can form such a chain by using a function called ''foldl'':

<haskell>
> foldl f z []     = z
> foldl f z (x:xs) = let z' = z `f` x
>                    in foldl f z' xs

> sum2 = foldl (+) 0

> try2 = sum2 veryBigList
</haskell>

Let's evaluate ''try2'':

<tt><nowiki>*** Exception: stack overflow</nowiki></tt>

Good Lord! Again a stack overflow! Let's see what happens:

<haskell>
try2 -->
sum2 veryBigList -->
foldl (+) 0 veryBigList -->

foldl (+) 0 [1..1000000] -->

let z1 = 0 + 1
in foldl (+) z1 [2..1000000] -->

let z1 = 0 + 1
z2 = z1 + 2
in foldl (+) z2 [3..1000000] -->

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
in foldl (+) z3 [4..1000000] -->

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
in foldl (+) z4 [5..1000000] -->

-- ... after many foldl steps ...

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
...
z999997 = z999996 + 999997
in foldl (+) z999997 [999998..1000000] -->

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
...
z999997 = z999996 + 999997
z999998 = z999997 + 999998
in foldl (+) z999998 [999999..1000000] -->

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
...
z999997 = z999996 + 999997
z999998 = z999997 + 999998
z999999 = z999998 + 999999
in foldl (+) z999999 [1000000] -->

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
...
z999997 = z999996 + 999997
z999998 = z999997 + 999998
z999999 = z999998 + 999999
z1000000 = z999999 + 1000000
in foldl (+) z1000000 [] -->

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
...
z999997 = z999996 + 999997
z999998 = z999997 + 999998
z999999 = z999998 + 999999
z1000000 = z999999 + 1000000
in z1000000 -->

-- Now a large chain of +'s will be created:

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
...
z999997 = z999996 + 999997
z999998 = z999997 + 999998
z999999 = z999998 + 999999
in z999999 + 1000000 -->

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
...
z999997 = z999996 + 999997
z999998 = z999997 + 999998
in (z999998 + 999999) + 1000000 -->

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
...
z999997 = z999996 + 999997
in ((z999997 + 999998) + 999999) + 1000000 -->

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
...
in (((z999996 + 999997) + 999998) + 999999) + 1000000 -->

-- ...
-- ... My stack overflows when there's a chain of around 500000 (+)'s !!!
-- ... But the following would happen if you got a large enough stack:
-- ...

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
in (((((z4 + 5) + ...) + 999997) + 999998) + 999999) + 1000000 -->

let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
in ((((((z3 + 4) + 5) + ...) + 999997) + 999998) + 999999) + 1000000 -->

let z1 = 0 + 1
z2 = z1 + 2
in (((((((z2 + 3) + 4) + 5) + ...) + 999997) + 999998) + 999999) + 1000000 -->

let z1 = 0 + 1
in ((((((((z1 + 2) + 3) + 4) + 5) + ...) + 999997) + 999998) + 999999) + 1000000 -->

(((((((((0 + 1) + 2) + 3) + 4) + 5) + ...) + 999997) + 999998) + 999999) + 1000000 -->

-- Now we can actually start reducing:

((((((((1 + 2) + 3) + 4) + 5) + ...) + 999997) + 999998) + 999999) + 1000000 -->

(((((((3 + 3) + 4) + 5) + ...) + 999997) + 999998) + 999999) + 1000000 -->

((((((6 + 4) + 5) + ...) + 999997) + 999998) + 999999) + 1000000 -->

(((((10 + 5) + ...) + 999997) + 999998) + 999999) + 1000000 -->

((((15 + ...) + 999997) + 999998) + 999999) + 1000000 -->

(((499996500006 + 999997) + 999998) + 999999) + 1000000 -->

((499997500003 + 999998) + 999999) + 1000000 -->

(499998500001 + 999999) + 1000000 -->

499999500000 + 1000000 -->

500000500000
</haskell>

Well, you clearly see that the redexes are created. But instead of being directly reduced, they are allocated on the heap:

<haskell>
let z1 = 0 + 1
z2 = z1 + 2
z3 = z2 + 3
z4 = z3 + 4
...
z999997 = z999996 + 999997
z999998 = z999997 + 999998
z999999 = z999998 + 999999
z1000000 = z999999 + 1000000
in z1000000
</haskell>

Note that your heap is only limited by the amount of memory in your system (RAM and swap). So the only thing this does is fill up a large part of your memory.

The problem starts when we finally evaluate z1000000:

We must evaluate <tt>z1000000 = z999999 + 1000000</tt>, so <tt>1000000</tt> is pushed on the stack. Then <tt>z999999</tt> is evaluated; <tt>z999999 = z999998 + 999999</tt>, so <tt>999999</tt> is pushed on the stack. Then <tt>z999998</tt> is evaluated; <tt>z999998 = z999997 + 999998</tt>, so <tt>999998</tt> is pushed on the stack. Then <tt>z999997</tt> is evaluated...

...your stack will eventually fill when you evaluate a large enough chain of (+)'s. This then triggers the stack overflow exception.

But this is exactly the problem we had in the foldr case — only now the chain of (+)'s is going to the left instead of the right.

So why doesn't the chain reduce sooner than
before?

It's because of GHC's lazy reduction strategy: expressions are reduced only when they are actually needed, and the outer-left-most redexes are reduced first. In this case it's the outer <tt>foldl (+) ... [1..1000000]</tt>
redexes which are repeatedly reduced. So the inner <tt>z1, z2, z3, ...</tt> redexes only get reduced when the foldl is completely gone.

==Foldl'==

We somehow have to tell the system that the inner redex should be
reduced before the outer. Fortunately this is possible with the
''seq'' function:

<haskell>seq :: a -> b -> b</haskell>

''seq'' is a primitive system function that when applied to ''x'' and
''y'' will first reduce ''x'' then return ''y''. The idea is that ''y'' references ''x'' so that when ''y'' is reduced ''x'' will not be a big unreduced chain anymore.
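A small sketch of what ''seq'' does and does not force may help here. The names <code>forcedNumber</code> and <code>pairIsEnough</code> are invented for this illustration; the point is that ''seq'' reduces its first argument only as far as its outermost constructor.

<haskell>
-- seq forces its first argument only as far as the outermost
-- constructor (weak head normal form).
forcedNumber :: Int
forcedNumber = let x = 1 + 2 :: Int in x `seq` x * 2   -- x is reduced to 3 first

-- A pair constructor is already in weak head normal form, so the
-- undefined component is never touched and the result is 0.
pairIsEnough :: Int
pairIsEnough = (undefined :: Int, 5 :: Int) `seq` 0
</haskell>

For a plain number, as in the sums in this article, reducing to the outermost constructor is the same as computing the number, which is why ''seq'' is enough for our purposes.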

Now let's fill in the pieces:

<haskell>
> foldl' f z []     = z
> foldl' f z (x:xs) = let z' = z `f` x
>                     in seq z' $ foldl' f z' xs

> sum3 = foldl' (+) 0

> try3 = sum3 veryBigList
</haskell>

If we now evaluate ''try3'' we get the correct answer and we get it very quickly:

<haskell>500000500000</haskell>

Let's see what happens:

<haskell>
try3 -->
sum3 veryBigList -->
foldl' (+) 0 veryBigList -->

foldl' (+) 0 [1..1000000] -->
foldl' (+) 1 [2..1000000] -->
foldl' (+) 3 [3..1000000] -->
foldl' (+) 6 [4..1000000] -->
foldl' (+) 10 [5..1000000] -->
-- ...
-- ... You see that the stack doesn't overflow
-- ...
foldl' (+) 499999500000 [1000000] -->
foldl' (+) 500000500000 [] -->
500000500000
</haskell>

You can clearly see that the inner redex is repeatedly reduced
first.

==Conclusion==

Usually the choice is between <hask>foldr</hask> and <hask>foldl'</hask>, since <hask>foldl</hask> and <hask>foldl'</hask> are the same except for their strictness properties, so if both return a result, it must be the same. <hask>foldl'</hask> is the more efficient way to arrive at that result because it doesn't build a huge thunk. However, if the combining function is lazy in its ''first'' argument, <hask>foldl</hask> may happily return a result where <hask>foldl'</hask> hits an exception:

<haskell>
> (?) :: Int -> Int -> Int
> _ ? 0 = 0
> x ? y = x*y
>
> list :: [Int]
> list = [2,3,undefined,5,0]
>
> okey = foldl (?) 1 list
>
> boom = foldl' (?) 1 list
</haskell>

Let's see what happens:

<haskell>
okey -->
foldl (?) 1 [2,3,undefined,5,0] -->
foldl (?) (1 ? 2) [3,undefined,5,0] -->
foldl (?) ((1 ? 2) ? 3) [undefined,5,0] -->
foldl (?) (((1 ? 2) ? 3) ? undefined) [5,0] -->
foldl (?) ((((1 ? 2) ? 3) ? undefined) ? 5) [0] -->
foldl (?) (((((1 ? 2) ? 3) ? undefined) ? 5) ? 0) [] -->
((((1 ? 2) ? 3) ? undefined) ? 5) ? 0 -->
0

boom -->
foldl' (?) 1 [2,3,undefined,5,0] -->
1 ? 2 --> 2
foldl' (?) 2 [3,undefined,5,0] -->
2 ? 3 --> 6
foldl' (?) 6 [undefined,5,0] -->
6 ? undefined -->
*** Exception: Prelude.undefined
</haskell>

Note that even <hask>foldl'</hask> may not do what you expect.
The <hask>seq</hask> function involved evaluates its argument only as far as the ''top-most constructor''.

If the accumulator is a more complex object, then <hask>foldl'</hask> will still build up unevaluated thunks. You can introduce a function or a strict data type which forces the values as far as you need. Failing that, the "brute force" solution is to use {{HackagePackage|id=deepseq}}. For a worked example of this issue, see [http://book.realworldhaskell.org/read/profiling-and-optimization.html#id678431 ''Real World Haskell'' chapter 25].
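As an illustration of this point, consider computing a mean with a pair as the accumulator. This is only a sketch: the names <code>meanLazy</code> and <code>meanStrict</code> are hypothetical, it uses <code>foldl'</code> from <code>Data.List</code> together with GHC's <code>BangPatterns</code> extension, and an optimising compiler may happen to remove the thunks in the first version on its own.

<haskell>
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- foldl' forces only the pair constructor at each step; the two
-- components can remain growing chains of suspended additions.
meanLazy :: [Double] -> Double
meanLazy xs = s / fromIntegral n
  where
    (s, n) = foldl' (\(a, k) x -> (a + x, k + 1 :: Int)) (0, 0) xs

-- The bang patterns force both components of the previous accumulator
-- whenever the next pair is built, so the chains stay one step deep.
meanStrict :: [Double] -> Double
meanStrict xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (!a, !k) x = (a + x, k + 1)
</haskell>

Both functions compute the same value; they differ only in how much unevaluated structure is kept alive during the traversal.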

==Rules of Thumb for Folds==

Folds are among the most useful and common functions in Haskell. They are an often-superior replacement for what in other languages would be loops, but can do much more. Here are a few rules of thumb on which folds to use when.

<hask>foldr</hask> is not only the right fold, it is also most commonly the ''right'' fold to use, in particular when transforming lists (or other foldables) into lists with related elements in the same order. Notably, <hask>foldr</hask> will be effective for transforming even infinite lists into other infinite lists. For such purposes, it should be your first and most natural choice. For example, note that <hask>foldr (:) []==id</hask>.

Note that the initial element is irrelevant when <hask>foldr</hask> is applied to an infinite list. For that reason, it may be good practice when writing a function which should only be applied to infinite lists to replace <hask>foldr f []</hask> with <hask>foldr f undefined</hask>. This both documents that the function should only be applied to infinite lists and will result in an error when you try to apply it to a finite list.

The other very useful fold is <hask>foldl'</hask>. It can be thought of as a foldr with these differences:

* <hask>foldl'</hask> conceptually reverses the order of the list. One consequence is that a <hask>foldl'</hask> (unlike <hask>foldr</hask>) applied to an infinite list will be bottom; it will not produce any usable results, just as an explicit <hask>reverse</hask> would not. Note that <hask>foldl' (flip (:)) []==reverse</hask>.
* <hask>foldl'</hask> often has much better time and space performance than a <hask>foldr</hask> would for the reasons explained in the previous sections.
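The implicit reversal mentioned in the first point can be seen in a short sketch. The names <code>reverse'</code> and <code>reverseDoubled</code> are made up for this example.

<haskell>
import Data.List (foldl')

-- Consing onto the accumulator while walking the list left to right
-- leaves the elements in reverse order.
reverse' :: [a] -> [a]
reverse' = foldl' (flip (:)) []

-- The same traversal with a transformation applied on the way.
reverseDoubled :: [Int] -> [Int]
reverseDoubled = foldl' (\acc x -> 2 * x : acc) []
</haskell>

Here <code>reverse' [1,2,3]</code> is <code>[3,2,1]</code> and <code>reverseDoubled [1,2,3]</code> is <code>[6,4,2]</code>.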

You should pick <hask>foldl'</hask> principally in two cases:

* When the list to which it is applied is large, but definitely finite, you do not care about the implicit reversal (for example, because your combining function is commutative like <hask>(+)</hask>, <hask>(*)</hask>, or <hask>Set.union</hask>), and you seek to improve the performance of your code.
* When you actually do want to reverse the order of the list, in addition to possibly performing some other transformation to the elements. In particular, if you find that you precede or follow your fold with a reverse, it is quite likely that you could improve your code by using the other fold and taking advantage of the implicit reverse.

<hask>foldl</hask> is rarely the right choice. It gives you the implicit reverse of a left fold, but without the performance gains of <hask>foldl'</hask>. Only in rare, or specially constructed cases like in the previous section, will it yield better results than <hask>foldl'</hask>.

Another reason that <hask>foldr</hask> is often the better choice is that the folding function can ''short-circuit'', that is, terminate early by yielding a result which does not depend on the value of the accumulating parameter. When such possibilities arise with some frequency in your problem, short-circuiting can greatly improve your program's performance. Left folds can never short-circuit.

To illustrate this consider writing a fold that computes the product of the last digits of a list of integers. One might think that <hask>foldl'</hask> is the superior fold in this situation as the result does not depend on the order of the list and is generally not computable on infinite lists anyway.

On my workstation running GHC 7.10.2, <hask>foldl' (\a e -> (mod e 10)*a) 1 [1..10^7]</hask> has a compiled run-time of 422 ms (all of it calculation) and allocates over 400 MBytes on the heap. Conversely, <hask>foldr (\e a -> (mod e 10)*a) 1 [1..10^7]</hask> has a compiled run-time of 2203 ms (the same 422 ms calculation time and 1781 ms of garbage collection) and allocates more than 550 MBytes on the heap. That is what you should expect from reading the previous sections; <hask>foldr</hask> will build up a huge stack of nested expressions before evaluating the result and that stack needs to be garbage collected.

But think about the problem a little more. Once the fold hits a number with last digit 0, there is no need to evaluate any further. The ultimate result will always be 0, so you can short-circuit to that answer immediately. Indeed <hask>foldr (\e a -> if mod e 10==0 then 0 else (mod e 10)*a) 1 [1..10^7]</hask> has a measured run-time of 0ms and allocates less than 50 kBytes on the heap. This fold will run just as fast on the range <hask>[1..10^100]</hask> or even <hask>[1..]</hask>.

The left fold cannot short-circuit and is condemned to evaluate the entire input list. Running <hask>foldl' (\a e -> if mod e 10==0 then 0 else (mod e 10)*a) 1 [1..10^7]</hask> takes 781 ms and allocates over 500 MBytes of heap space; it is inferior to even the original left fold, not to mention the short-circuiting right fold.
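The folds discussed in this comparison can be collected into a small module for experimenting. This is a sketch only: the names <code>lastDigitProductL</code> and <code>lastDigitProductR</code> are invented here, and no timings are claimed for it.

<haskell>
import Data.List (foldl')

-- Strict left fold: always walks the whole list.
lastDigitProductL :: [Integer] -> Integer
lastDigitProductL = foldl' (\a e -> (e `mod` 10) * a) 1

-- Right fold that never looks at the rest of the list once a last
-- digit of 0 has been seen, so it also terminates on [1..].
lastDigitProductR :: [Integer] -> Integer
lastDigitProductR = foldr step 1
  where
    step e a
      | d == 0    = 0
      | otherwise = d * a
      where d = e `mod` 10
</haskell>

On <code>[1..9]</code> both functions agree, while only <code>lastDigitProductR</code> returns a result on the infinite list <code>[1..]</code>.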

== See also ==

* [[Fold]]
* [[Foldl as foldr]]

[[Category:FAQ]]
[[Category:Idioms]]

Prime numbers

2019-02-19T14:36:53Z

WillNess: /* Map-based */ unbrittle the code by adding explicit separators

In mathematics, ''amongst the natural numbers greater than 1'', a ''prime number'' (or a ''prime'') is one that has no divisors other than itself (and 1).

== Prime Number Resources ==

* At Wikipedia:
**[http://en.wikipedia.org/wiki/Prime_numbers Prime Numbers]
**[http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes Sieve of Eratosthenes]

* HackageDB packages:
** [http://hackage.haskell.org/package/arithmoi arithmoi]: Various basic number theoretic functions; efficient array-based sieves, Montgomery curve factorization ...
** [http://hackage.haskell.org/package/Numbers Numbers]: An assortment of number theoretic functions.
** [http://hackage.haskell.org/package/NumberSieves NumberSieves]: Number Theoretic Sieves: primes, factorization, and Euler's Totient.
** [http://hackage.haskell.org/package/primes primes]: Efficient, purely functional generation of prime numbers.

* Papers:
** O'Neill, Melissa E., [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf "The Genuine Sieve of Eratosthenes"], Journal of Functional Programming, Published online by Cambridge University Press 9 October 2008 doi:10.1017/S0956796808007004.

== Definition ==

In mathematics, ''amongst the natural numbers greater than 1'', a ''prime number'' (or a ''prime'') is one that has no divisors other than itself (and 1). The smallest prime is thus 2. Non-prime numbers are known as ''composite'', i.e. those representable as a product of two natural numbers greater than 1.

To find out a prime's multiples we can either '''a.''' test each new candidate number for divisibility by that prime, giving rise to a kind of ''trial division'' algorithm; or '''b.''' we can directly generate the multiples of a prime ''p'' by counting up from it in increments of ''p'', resulting in a variant of the ''sieve of Eratosthenes''.
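The two approaches can be contrasted in a few lines. The names <code>isMultipleOf</code> and <code>multiplesOf</code> are hypothetical and used only for this sketch.

<haskell>
-- (a) Trial division: test a candidate against a prime by remainder.
isMultipleOf :: Int -> Int -> Bool
n `isMultipleOf` p = n `rem` p == 0

-- (b) Sieve style: generate the multiples of p directly,
-- counting up from p in increments of p.
multiplesOf :: Int -> [Int]
multiplesOf p = [p, p + p ..]
</haskell>

The first needs one division per candidate and prime, while the second produces each multiple by addition alone.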

The set of prime numbers is thus

:    '''P'''  = { ''n'' ∈ '''N'''<sub>2</sub> ''':''' (∀ ''m'' ∈ '''N'''<sub>2</sub>) ( (''m'' | ''n'') ⇒ ''m'' = ''n'') }

:: = { ''n'' ∈ '''N'''<sub>2</sub> ''':''' (∀ ''m'' ∈ '''N'''<sub>2</sub>) ( ''m'' ⋅ ''m'' ≤ ''n'' ⇒ ¬(''m'' | ''n'')) }

:: = '''N'''<sub>2</sub> \ { ''n'' ⋅ ''m'' ''':''' ''n'',''m'' ∈ '''N'''<sub>2</sub> }

:: = '''N'''<sub>2</sub> \ '''⋃''' { { ''n'' ⋅ ''m'' ''':''' ''m'' ∈ '''N'''<sub>''n''</sub> } ''':''' ''n'' ∈ '''N'''<sub>2</sub> }

:: = '''N'''<sub>2</sub> \ '''⋃''' { { ''n'' ⋅ ''n'', ''n'' ⋅ ''n''+''n'', ''n'' ⋅ ''n''+''n''+''n'', ... } ''':''' ''n'' ∈ '''N'''<sub>2</sub> }

:: = '''N'''<sub>2</sub> \ '''⋃''' { { ''p'' ⋅ ''p'', ''p'' ⋅ ''p''+''p'', ''p'' ⋅ ''p''+''p''+''p'', ... } ''':''' ''p'' ∈ '''P''' }
::::   where     '''N'''<sub>''k''</sub> = { ''n'' ∈ '''N''' ''':''' ''n'' ≥ ''k'' }

Thus starting with 2, for each newly found prime we can ''eliminate'' from the rest of the numbers ''all the multiples'' of this prime, giving us the next available number as next prime. This is known as ''sieving'' the natural numbers, so that in the end all the composites are eliminated and what we are left with are just primes. (This is what the last formula is describing, though seemingly [http://en.wikipedia.org/wiki/Impredicativity impredicative], because it is self-referential. But because '''N'''<sub>2</sub> is well-ordered with the order preserved under addition, the formula is well-defined.)

Prototypically, it is

<haskell>
import qualified Data.List.Ordered   -- from the data-ordlist package

primes = map head $ scanl (\\) [2..] [[p, p+p..] | p <- primes]
  where
    (\\) = Data.List.Ordered.minus

Having (a chain of) direct-access mutable arrays indeed enables easy marking of these multiples as is usually done in imperative languages; but to get an [[#Tree merging with Wheel|efficient ''list''-based code]] we have to be smart about combining those streams of multiples of each prime - which gives us also the memory efficiency in generating the results incrementally, one by one.

A short exposition is [[Prime numbers miscellaneous#A Tale of Sieves|here]].

== Sieve of Eratosthenes ==
Simplest, bounded, ''very'' inefficient formulation:
<haskell>
import Data.List ((\\))   -- (\\) is set-difference for unordered lists

primesTo m = sieve [2..m]
  where
    sieve (x:xs) = x : sieve (xs \\ [x,x+x..m])
    sieve []     = []

-- or, equivalently:
-- primesTo m = ps
--   where
--     ps = map head $ takeWhile (not.null)
--                   $ scanl (\\) [2..m] [[p, p+p..m] | p <- ps]
</haskell>

The (unbounded) sieve of Eratosthenes calculates primes as ''integers above 1 that are not multiples of primes'', i.e. ''not composite'', whereas composites are found as an enumeration of the multiples of each prime, generated by counting up from the prime's square in constant increments equal to that prime (or twice that much, for odd primes). This is much more efficient and runs at about ''n''<sup>1.2</sup> empirical orders of growth (corresponding to ''n'' (log ''n'')<sup>2</sup> log log ''n'' complexity, more or less, in ''n'' primes produced):

<haskell>
import Data.List.Ordered (minus, union, unionAll)

primes = 2 : 3 : minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- tail primes])

{- Using `under n = takeWhile (<= n)`, with ordered increasing lists,
   `minus`, `union` and `unionAll` satisfy, for any `n` and `m`:

   under n (minus a b)         == nub . sort $ under n a \\ under n b
   under n (union a b)         == nub . sort $ under n a ++ under n b
   under n . unionAll . take m == under n . foldl union [] . take m
   under n . unionAll          == nub . sort . concat
                                      . takeWhile (not.null) . map (under n) -}
</haskell>

The definition is primed with 2 and 3 as initial primes, to avoid the vicious circle.

The ''"big union"'' <code>unionAll</code> function could be defined as the folding of <code>(\(x:xs) -> (x:) . union xs)</code>; or it could use a <code>Bool</code> array as a sorting and duplicates-removing device. The processing naturally divides up into the segments between successive squares of primes.

Stepwise development follows (the fully developed version is [[#Tree merging with Wheel|here]]).

=== Initial definition ===

First of all, working with ''ordered'' increasing lists, the sieve of Eratosthenes can be genuinely represented by
<haskell>
-- genuine yet wasteful sieve of Eratosthenes
-- primes = eratos [2.. ] -- unbounded
primesTo m = eratos [2..m]              -- bounded, up to m
  where
    eratos []     = []
    eratos (p:xs) = p : eratos (xs `minus` [p, p+p..])
 -- eratos (p:xs) = p : eratos (xs `minus` map (p*) [1..])
 -- eulers (p:xs) = p : eulers (xs `minus` map (p*) (p:xs))
 -- turner (p:xs) = p : turner [x | x <- xs, rem x p /= 0]

-- fix ( map head . scanl minus [2..] . map (\p-> [p, p+p..]) )
</haskell>

This should be regarded more as a ''specification'' than as working code. It runs at [https://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth empirical orders of growth] worse than quadratic in the number of primes produced. But it has the core defining features of the classical formulation of the sieve of Eratosthenes: '''''a.''''' being bounded, i.e. having a top limit value, and '''''b.''''' finding the multiples of a prime directly, by counting up from it in constant increments equal to that prime.
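
The empirical order of growth quoted throughout this page is the exponent ''b'' in ''t ~ n<sup>b</sup>'', estimated from two measurements as ''log(t<sub>2</sub>/t<sub>1</sub>) / log(n<sub>2</sub>/n<sub>1</sub>)''. A helper for computing it might look like the following sketch; the name <code>orderOfGrowth</code> is an assumption made here for illustration, not something defined elsewhere on this page.

<haskell>
-- Estimate b in  t ~ n^b  from two (problem size, run time) measurements.
orderOfGrowth :: (Double, Double) -> (Double, Double) -> Double
orderOfGrowth (n1, t1) (n2, t2) = logBase (n2 / n1) (t2 / t1)
</haskell>

With made-up numbers (not measurements): if doubling the number of primes produced quadruples the run time, <code>orderOfGrowth (1000, 1.0) (2000, 4.0)</code> comes out as 2 up to floating-point rounding, i.e. quadratic behaviour.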

The canonical list-difference <code>minus</code> and duplicates-removing <code>union</code> functions (cf. [http://hackage.haskell.org/packages/archive/data-ordlist/latest/doc/html/Data-List-Ordered.html Data.List.Ordered]) are:
<haskell>
-- ordered lists, difference and union
minus (x:xs) (y:ys) = case (compare x y) of
           LT -> x : minus  xs  (y:ys)
           EQ ->     minus  xs     ys
           GT ->     minus (x:xs)  ys
minus  xs     _     = xs
union (x:xs) (y:ys) = case (compare x y) of
           LT -> x : union  xs  (y:ys)
           EQ -> x : union  xs     ys
           GT -> y : union (x:xs)  ys
union  xs     []    = xs
union  []     ys    = ys
</haskell>

The name ''merge'' ought to be reserved for duplicates-preserving merging operation of the merge sort.
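
To make the distinction concrete, here is a small self-contained sketch. The definitions of <code>minus</code> and <code>union</code> are repeated from above only so that the example runs on its own; the <code>merge</code> shown is an illustrative definition written for this comparison, and the example names are arbitrary.

<haskell>
minus, union, merge :: Ord a => [a] -> [a] -> [a]
minus (x:xs) (y:ys) = case compare x y of
    LT -> x : minus xs (y:ys)
    EQ ->     minus xs ys
    GT ->     minus (x:xs) ys
minus xs _ = xs

union (x:xs) (y:ys) = case compare x y of
    LT -> x : union xs (y:ys)
    EQ -> x : union xs ys
    GT -> y : union (x:xs) ys
union xs [] = xs
union [] ys = ys

-- duplicates-preserving merge, as in merge sort
merge (x:xs) (y:ys)
    | y < x     = y : merge (x:xs) ys
    | otherwise = x : merge xs (y:ys)
merge xs [] = xs
merge [] ys = ys

noEvens, unioned, merged :: [Int]
noEvens = take 5 (minus [2..] [4,6..])      -- [2,3,5,7,9]
unioned = union [4,6,8,10,12] [9,12,15]     -- [4,6,8,9,10,12,15]
merged  = merge [4,6,8,10,12] [9,12,15]     -- [4,6,8,9,10,12,12,15]
</haskell>

The number 12, a multiple of both 2 and 3, appears once in the <code>union</code> and twice in the <code>merge</code>.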

=== Analysis ===

So for each newly found prime ''p'' the sieve discards its multiples, enumerating them by counting up in steps of ''p''. There are thus <math>O(m/p)</math> multiples generated and eliminated for each prime, and <math>O(m \log \log(m))</math> multiples in total, with duplicates, by virtue of the [http://en.wikipedia.org/wiki/Prime_harmonic_series prime harmonic series].
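
This count can be tallied directly for a small limit. The following is a minimal sketch; <code>smallPrimesTo</code>, <code>multiplesGenerated</code> and <code>estimate</code> are hypothetical helpers introduced only for this illustration, and the primes are supplied by naive trial division rather than by any of the sieves discussed here.

<haskell>
-- Primes up to m by naive trial division; adequate for small m.
smallPrimesTo :: Int -> [Int]
smallPrimesTo m =
    [n | n <- [2..m]
       , all (\d -> n `rem` d /= 0) (takeWhile (\d -> d*d <= n) [2..])]

-- Number of multiples (with duplicates) enumerated when each prime p <= m
-- counts up from p in steps of p: the sum of floor(m/p) over those primes.
multiplesGenerated :: Int -> Int
multiplesGenerated m = sum [m `div` p | p <- smallPrimesTo m]

-- The m * log (log m) growth term, for comparison.
estimate :: Int -> Double
estimate m = x * log (log x)
  where x = fromIntegral m
</haskell>

For ''m'' = 100, <code>multiplesGenerated 100</code> is 171, while <code>estimate 100</code> is roughly 153; the two agree only up to the constant factor and lower-order terms hidden by the ''O'' notation.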

If each multiple is dealt with in <math>O(1)</math> time, this will translate into <math>O(m \log \log(m))</math> RAM machine operations (since we consider addition as an atomic operation). Indeed mutable random-access arrays allow for that. But lists in Haskell are sequential-access, and complexity of <code>minus(a,b)</code> for lists is <math>\textstyle O(|a \cup b|)</math> instead of <math>\textstyle O(|b|)</math> of the direct access destructive array update. The lower the complexity of each ''minus'' step, the better the overall complexity.

So on the ''k''-th step the argument list <code>(p:xs)</code> that the <code>eratos</code> function gets starts with the ''(k+1)''-th prime, and consists of all the numbers ≤ ''m'' coprime with all the primes ≤ ''p''. According to M. O'Neill's article (p.10) there are <math>\textstyle\Phi(m,p) \in \Theta(m/\log p)</math> such numbers.

It looks like <math>\textstyle\sum_{i=1}^{n}{1/\log(p_i)} = O(n/\log n)</math> for our intents and purposes. Since the number of primes below ''m'' is <math>n = \pi(m) = O(m/\log(m))</math> by the prime number theorem (where <math>\pi(m)</math> is the prime-counting function), there will be ''n'' multiples-removing steps in the algorithm; this means a total complexity of at least <math>O(m n/\log(n)) = O(m^2/(\log(m))^2)</math>, or <math>O(n^2)</math> in ''n'' primes produced, much worse than the optimal <math>O(n \log(n) \log\log(n))</math>.

=== From Squares ===

But we can start each elimination step at a prime's square, as its smaller multiples will have been already produced and discarded on previous steps, as multiples of smaller primes. This means we can stop early now, when the prime's square reaches the top value ''m'', and thus cut the total number of steps to around <math>\textstyle n = \pi(m^{0.5}) = \Theta(2m^{0.5}/\log m)</math>. This does not in fact change the complexity of random-access code, but for lists it makes it <math>O(m^{1.5}/(\log m)^2)</math>, or <math>O(n^{1.5}/(\log n)^{0.5})</math> in ''n'' primes produced, a dramatic speedup:
<haskell>
primesToQ m = eratos [2..m]
  where
    eratos []     = []
    eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..m])
 -- eratos (p:xs) = p : eratos (xs `minus` map (p*) [p..div m p])
 -- eulers (p:xs) = p : eulers (xs `minus` map (p*) (under (div m p) (p:xs)))
 -- turner (p:xs) = p : turner [x | x<-xs, x<p*p || rem x p /= 0]
</haskell>

Its empirical complexity is around <math>O(n^{1.45})</math>. This simple optimization works here because this formulation is bounded (by an upper limit). To start late on a bounded sequence is to stop early (starting past the end makes an empty sequence – ''see warning below''<sup>1</sup>), thus preventing the creation of all the superfluous multiples streams which start above the upper bound anyway (note that Turner's sieve is unaffected by this). This is acceptably slow now, striking a good balance between clarity, succinctness and efficiency.

<sup>1</sup>''Warning'': this is predicated on the subtle point of the <code>minus xs [] = xs</code> definition being used, as it indeed should be. If the definition <code>minus (x:xs) [] = x:minus xs []</code> is used instead, the problem is back and the complexity is bad again.

=== Guarded ===
This ought to be ''explicated'' (improving on clarity, though not on time complexity) as in the following, for which it is indeed a minor optimization whether to start from ''p'' or ''p*p'' - because it explicitly ''stops as soon as possible'':
<haskell>
primesToG m = 2 : sieve [3,5..m]
  where
    sieve (p:xs)
       | p*p > m   = p : xs
       | otherwise = p : sieve (xs `minus` [p*p, p*p+2*p..])
                  -- p : sieve (xs `minus` map (p*) [p,p+2..])
                  -- p : eulers (xs `minus` map (p*) (p:xs))
</haskell>
(Here we also flatly ignore all evens above 2 a priori.) It is now clear that it ''can't'' be made unbounded just by abolishing the upper bound ''m'', because the guard cannot simply be omitted without changing the complexity back for the worse.

=== Accumulating Array ===

So while <code>minus(a,b)</code> takes <math>O(|b|)</math> operations for random-access imperative arrays and about <math>O(|a|)</math> operations here for ordered increasing lists of numbers, using Haskell's immutable array for ''a'' one ''could'' expect the array update time to be nevertheless closer to <math>O(|b|)</math> if destructive update were used implicitly by compiler for an array being passed along as an accumulating parameter:
<haskell>
{-# OPTIONS_GHC -O2 #-}
import Data.Array.Unboxed

primesToA m = sieve 3 (array (3,m) [(i,odd i) | i<-[3..m]]
                        :: UArray Int Bool)
  where
    sieve p a
      | p*p > m   = 2 : [i | (i,True) <- assocs a]
      | a!p       = sieve (p+2) $ a // [(i,False) | i <- [p*p, p*p+2*p..m]]
      | otherwise = sieve (p+2) a
</haskell>

Indeed for unboxed arrays (suggested by Daniel Fischer; with regular, boxed arrays it is ''very'' slow), the above code runs pretty fast, but with empirical complexity of ''O(n<sup>1.15..1.45</sup>)'' in ''n'' primes produced (for producing from a few hundred thousand to a few million primes, with memory usage also slowly growing). If the update for each index were an ''O(1)'' operation, the empirical complexity would be seen as diminishing, as ''O(n<sup>1.15..1.05</sup>)'', reflecting the true, linearithmic complexity.

We could use explicitly mutable monadic arrays ([[#Using Mutable Arrays|''see below'']]) to remedy this, but we can also think about it a little bit more on the functional side of things still.

=== Postponed ===
Going back to ''guarded'' Eratosthenes, first we notice that though it works with a minimal number of prime multiples streams, it still starts working with each of them prematurely. Fixing this with explicit synchronization won't change the complexity but will speed it up some more:
<haskell>
primesPE1 = 2 : sieve [3..] primesPE1
  where
    sieve xs (p:pt) | q <- p*p , (h,t) <- span (< q) xs =
                   h ++ sieve (t `minus` [q, q+p..]) pt
                -- h ++ turner [x | x<-t, rem x p>0] pt
</haskell>

Inlining and fusing <code>span</code> and <code>(++)</code> we get:

<haskell>
primesPE = 2 : sieve [3..] [[p*p, p*p+p..] | p <- primesPE]
  where
    sieve (x:xs) t@((q:cs):r)
       | x < q     = x : sieve xs t
       | otherwise =     sieve (minus xs cs) r
</haskell>
Since the removal of a prime's multiples here starts at the right moment, and not just from the right place, the code could now finally be made unbounded. Because no multiples-removal process is started ''prematurely'', there are no ''extraneous'' multiples streams, which were the reason for the original formulation's extreme inefficiency.

=== Segmented ===
With work done segment-wise between the successive squares of primes it becomes

<haskell>
primesSE = 2 : ops
  where
    ops = sieve 3 9 ops []                                -- odd primes
    sieve x q ~(p:pt) fs =
        foldr (flip minus) [x,x+2..q-2]                   -- chain of subtractions
                           [[y+s, y+2*s..q] | (s,y) <- fs]          -- OR,
        -- [x,x+2..q-2] `minus` foldl union []            -- subtraction of merged
        --                 [[y+s, y+2*s..q] | (s,y) <- fs]          --   lists
        ++ sieve (q+2) (head pt^2) pt
                ((2*p,q):[(s,q-rem (q-y) s) | (s,y) <- fs])
</haskell>

This "marks" the odd composites in a given range by generating them - just as a person performing the original sieve of Eratosthenes would do, counting ''one by one'' the multiples of the relevant primes. These composites are independently generated so some will be generated multiple times.

The advantage of working in spans explicitly is that this code is easily amenable to using arrays for the composites marking and removal on each ''finite'' span; and memory usage can be kept in check by using fixed-size segments.

====Segmented Tree-merging====
Rearranging the chain of subtractions into a subtraction of merged streams ''([[#Linear merging|see below]])'' and using a [[#Tree merging|tree-like folding]] structure further [http://ideone.com/pfREP speeds up the code] and ''significantly'' improves its asymptotic time behavior (down to about <math>O(n^{1.28})</math> empirically; space is leaking, though):

<haskell>
primesSTE = 2 : ops
  where
    ops = sieve 3 9 ops []                                -- odd primes
    sieve x q ~(p:pt) fs =
        ([x,x+2..q-2] `minus` joinST [[y+s, y+2*s..q] | (s,y) <- fs])
        ++ sieve (q+2) (head pt^2) pt
                   ((++ [(2*p,q)]) [(s,q-rem (q-y) s) | (s,y) <- fs])

joinST (xs:t) = (union xs . joinST . pairs) t
  where
    pairs (xs:ys:t) = union xs ys : pairs t
    pairs t         = t
joinST []     = []
</haskell>

====Segmented merging via an array====

The removal of composites is easy with arrays. Starting points can be calculated directly:

<haskell>
import Data.List (inits, tails)
import Data.Array.Unboxed

primesSAE = 2 : sieve 2 4 (tail primesSAE) (inits primesSAE)
         -- (2:) . (sieve 2 4 . tail <*> inits) $ primesSAE
  where
    sieve r q ps (fs:ft) = [n | (n,True) <- assocs (
             accumArray (\ _ _ -> False) True (r+1,q-1)
                        [(m,()) | p <- fs, let s = p * div (r+p) p,
                                  m <- [s,s+p..q-1]] :: UArray Int Bool )]
          ++ sieve q (head ps^2) (tail ps) ft
</haskell>

The pattern of iterated calls to <code>tail</code> is captured by a higher-order function <code>tails</code>, which explicitly generates the stream of tails of a stream, making for a bit more readable (even if possibly a bit less efficient) code:
<haskell>
psSAGE = 2 : [n | (r:q:_, fs) <- (zip . tails . (2:) . map (^2) <*> inits) psSAGE,
                  (n,True)    <- assocs (
                      accumArray (\_ _ -> False) True (r+1, q-1)
                          [(m,()) | p <- fs, let s = (r+p)`div`p*p,
                                    m <- [s,s+p..q-1]] :: UArray Int Bool )]
</haskell>

=== Linear merging ===
But segmentation doesn't add anything substantially, and each multiples stream starts at its prime's square anyway. What does the [[#Postponed|Postponed]] code do, operationally? With each prime's square passed by, there emerges a nested linear ''left-deepening'' structure, '''''(...((xs-a)-b)-...)''''', where '''''xs''''' is the original odds-producing ''[3,5..]'' list, so that each odd it produces must go through more and more <code>minus</code> nodes on its way up - and those odd numbers that eventually emerge on top are prime. Thinking a bit about it, wouldn't another, ''right-deepening'' structure, '''''(xs-(a+(b+...)))''''', be better? This idea is due to Richard Bird, seen in his code presented in M. O'Neill's article, equivalent to:
<haskell>
primesB = 2 : minus [3..] (foldr (\p r-> p*p : union [p*p+p, p*p+2*p..] r)
                                 [] primesB)
</haskell>
or,

<haskell>
primesLME1 = 2 : prs
  where
    prs = 3 : minus [5,7..] (joinL [[p*p, p*p+2*p..] | p <- prs])

joinL ((x:xs):t) = x : union xs (joinL t)
</haskell>

Here, ''xs'' stays near the top, and ''more frequently'' odds-producing streams of multiples of smaller primes are ''above'' those of the bigger primes, that produce ''less frequently'' their multiples which have to pass through ''more'' <code>union</code> nodes on their way up. Plus, no explicit synchronization is necessary anymore because the produced multiples of a prime start at its square anyway - just some care has to be taken to avoid a runaway access to the indefinitely-defined structure, defining <code>joinL</code> (or <code>foldr</code>'s combining function) to produce part of its result ''before'' accessing the rest of its input (thus making it ''productive'').

Melissa O'Neill [http://hackage.haskell.org/packages/archive/NumberSieves/0.0/doc/html/src/NumberTheory-Sieve-ONeill.html introduced double primes feed] to prevent unneeded memoization (a memory leak). We can even do multistage. Here's the code, faster still and with radically reduced memory consumption, with empirical orders of growth of around ~ <math>n^{1.40}</math> (initially better, yet worsening for bigger ranges):

<haskell>
primesLME = 2 : _Y ((3:) . minus [5,7..] . joinL . map (\p-> [p*p, p*p+2*p..]))

_Y :: (t -> t) -> t
_Y g = g (_Y g) -- multistage, non-sharing, g (g (g (g ...)))
-- g (let x = g x in x) -- two g stages, sharing
</haskell>

<code>_Y</code> is a non-sharing fixpoint combinator, here arranging for a recursive ''"telescoping"'' multistage primes production (a ''tower'' of producers).

This allows the <code>primesLME</code> stream to be discarded immediately as it is being consumed by its consumer. For <code>prs</code> from <code>primesLME1</code> definition above it is impossible, as each produced element of <code>prs</code> is needed later as input to the same <code>prs</code> corecursive stream definition. So the <code>prs</code> stream feeds in a loop into itself and is thus retained in memory, being consumed by self much slower than it is produced. With multistage production, each stage feeds into its consumer above it at the square of its current element which can be immediately discarded after it's been consumed. <code>(3:)</code> jump-starts the whole thing.
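
The difference between <code>_Y</code> and a sharing fixpoint is purely operational: both define the same list. A toy sketch follows (the names <code>powersOfTwo</code> and <code>powersOfTwoShared</code> are illustrative assumptions, and <code>_Y</code> is repeated so that the example runs on its own), with <code>(1:)</code> playing the jump-starting role that <code>(3:)</code> plays above:

<haskell>
import Data.Function (fix)

_Y :: (t -> t) -> t
_Y g = g (_Y g)          -- non-sharing: each stage is a separate producer

-- A tower of producers, each doubling the elements of the one below it.
powersOfTwo :: [Integer]
powersOfTwo = _Y ((1:) . map (*2))

-- The same list defined through a sharing fixpoint: one list feeding
-- back into itself, hence retained while it is being consumed.
powersOfTwoShared :: [Integer]
powersOfTwoShared = fix ((1:) . map (*2))
</haskell>

Both <code>take 6 powersOfTwo</code> and <code>take 6 powersOfTwoShared</code> give <code>[1,2,4,8,16,32]</code>; only the memory retention behaviour differs.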

=== Tree merging ===
Moreover, it can be changed into a '''''tree''''' structure. This idea [http://www.haskell.org/pipermail/haskell-cafe/2007-July/029391.html is due to Dave Bayer] and [[Prime_numbers_miscellaneous#Implicit_Heap|Heinrich Apfelmus]]:

<haskell>
primesTME = 2 : _Y ((3:) . gaps 5 . joinT . map (\p-> [p*p, p*p+2*p..]))

-- joinL ((x:xs):t) = x : union xs (joinL t)
joinT ((x:xs):t) = x : union xs (joinT (pairs t))    -- set union, ~=
  where  pairs (xs:ys:t) = union xs ys : pairs t     --    nub.sort.concat

gaps k s@(x:xs) | k < x = k:gaps (k+2) s       -- ~= [k,k+2..]\\s,
                | True  =   gaps (k+2) xs      --   when null(s\\[k,k+2..])
</haskell>

This code is [http://ideone.com/p0e81 pretty fast], running at speeds and empirical complexities comparable with the code from Melissa O'Neill's article (about <math>O(n^{1.2})</math> in number of primes ''n'' produced).

As an aside, <code>joinT</code> is equivalent to [[Fold#Tree-like_folds|infinite tree-like folding]] <code>foldi (\(x:xs) ys-> x:union xs ys) []</code>:

[[Image:Tree-like_folding.gif|frameless|center|458px|tree-like folding]]

[https://hackage.haskell.org/package/data-ordlist-0.4.7.0/docs/Data-List-Ordered.html#v:foldt <code>Data.List.Ordered.foldt</code>] of the data-ordlist package builds the same structure, but in a lazier fashion, consuming its input at the slowest pace possible. Here this sophistication is not needed (evidently).
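
The shape of this folding can be seen on a finite list by building a string instead of a union. The sketch below uses a finite-list <code>foldi</code> written here for illustration (with a base case for the empty list, which the infinite <code>joinT</code> does not need) and a hypothetical helper <code>shape</code>:

<haskell>
-- Tree-like fold on a finite list: combine the head with the fold of
-- the pairwise-combined rest.
foldi :: (a -> a -> a) -> a -> [a] -> a
foldi _ z []     = z
foldi f z (x:xs) = f x (foldi f z (pairs xs))
  where
    pairs (a:b:t) = f a b : pairs t
    pairs t       = t

-- Render the nesting symbolically instead of computing a union.
shape :: [String] -> String
shape = foldi (\a b -> "(" ++ a ++ "+" ++ b ++ ")") "0"
</haskell>

Here <code>shape ["a","b","c","d","e"]</code> yields <code>"(a+((b+c)+((d+e)+0)))"</code>: the first stream stays on top, and the later ones are paired up progressively deeper in the structure.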

=== Tree merging with Wheel ===
Wheel factorization optimization can be further applied, and another tree structure can be used which is better adjusted for the primes multiples production (effecting about 5%-10% at the top of a total ''2.5x speedup'' w.r.t. the above tree merging on odds only, for first few million primes):

<haskell>
primesTMWE = [2,3,5,7] ++ _Y ((11:) . tail  . gapsW 11 wheel
                                    . joinT . hitsW 11 wheel)

gapsW k (d:w) s@(c:cs) | k < c     = k : gapsW (k+d) w s    -- set difference
                       | otherwise =     gapsW (k+d) w cs   --   k==c
hitsW k (d:w) s@(p:ps) | k < p     =     hitsW (k+d) w s    -- roll the wheel on
                       | otherwise = scanl (\c d->c+p*d) (p*p) (d:w)
                                       : hitsW (k+d) w ps   --   k==p

wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
        4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
  -- cycle $ zipWith (-) =<< tail $ [i | i <- [11..221], gcd i 210 == 1]
</haskell>

The <code>hitsW</code> function is there to find the starting point for rolling the wheel for each prime, but this can be found directly:

<haskell>
primesW = [2,3,5,7] ++ _Y ( (11:) . tail . gapsW 11 wheel . joinT .
                            map (\p ->
                              map (p*) . dropWhile (< p) $
                                scanl (+) (p - rem (p-11) 210) wheel) )
</haskell>

Seems to run about 1.4x faster, too.

====Above Limit - Offset Sieve====
Another task is to produce primes above a given value:
<haskell>
{-# OPTIONS_GHC -O2 -fno-cse #-}
primesFromTMWE primes m = dropWhile (< m) [2,3,5,7,11]
                          ++ gapsW a wh2 (compositesFrom a)
  where
    (a,wh2) = rollFrom (snapUp (max 3 m) 3 2)
    (h,p2:t) = span (< z) $ drop 4 primes           -- p < z => p*p<=a
    z = ceiling $ sqrt $ fromIntegral a + 1         -- p2>=z => p2*p2>a
    compositesFrom a = joinT (joinST [multsOf p  a    | p <- h ++ [p2]]
                                   : [multsOf p (p*p) | p <- t] )

snapUp v o step = v + (mod (o-v) step)              -- full steps from o
multsOf p from = scanl (\c d->c+p*d) (p*x) wh       -- map (p*) $
  where                                             --   scanl (+) x wh
    (x,wh) = rollFrom (snapUp from p (2*p) `div` p) --   , if p < from

wheelNums  = scanl (+) 0 wheel
rollFrom n = go wheelNums wheel
  where
    m = (n-11) `mod` 210
    go (x:xs) ws@(w:ws2) | x < m = go xs ws2
                         | True  = (n+x-m, ws)      -- (x >= m)
</haskell>

A certain preprocessing delay makes it worthwhile only when producing more than just a few primes; otherwise it degenerates into simple [[#Optimal trial division|trial division]], which then ought to be used directly:

<haskell>
primesFrom m = filter isPrime [m..]
</haskell>

=== Map-based ===
Runs ~1.7x slower than [[#Tree_merging|TME version]], but with the same empirical time complexity, ~<math>n^{1.2}</math> (in ''n'' primes produced) and same very low (near constant) memory consumption:

<haskell>
import Data.List -- based on
import qualified Data.Map as M -- http://stackoverflow.com/a/1140100

primesMPE :: [Integer]
primesMPE = 2 : mkPrimes 3 M.empty prs 9   -- postponed sieve enlargement
  where                                    -- by decoupled primes feed loop
    prs = 3 : mkPrimes 5 M.empty prs 9
    mkPrimes n m ps@ ~(p:pt) q = case (M.null m, M.findMin m) of
        { (False, (n2, skips)) | n == n2 ->
               mkPrimes (n+2) (addSkips n (M.deleteMin m) skips) ps q
        ; _ -> if n < q
               then n : mkPrimes (n+2)  m                  ps q
               else     mkPrimes (n+2) (addSkip n m (2*p)) pt (head pt^2)
        }
    addSkip n m s = M.alter (Just . maybe [s] (s:)) (n+s) m
    addSkips = foldl' . addSkip
</haskell>

== Turner's sieve - Trial division ==

David Turner's ''(SASL Language Manual, 1983)'' formulation replaces non-standard <code>minus</code> in the sieve of Eratosthenes by stock list comprehension with <code>rem</code> filtering, turning it into a trial division algorithm, for clarity and simplicity:

<haskell>
-- unbounded sieve, premature filters
primesT = sieve [2..]
  where
    sieve (p:xs) = p : sieve [x | x <- xs, rem x p > 0]

-- map head
-- $ iterate (\(p:xs) -> [x | x <- xs, rem x p > 0]) [2..]
</haskell>

This creates many superfluous implicit filters, because they are created prematurely. To be admitted as prime, ''each number'' will be ''tested for divisibility'' here by all its preceding primes, while just those not greater than its square root would suffice. To find e.g. the '''1001'''st prime (<code>7927</code>), '''1000''' filters are used, when in fact just the first '''24''' are needed (up to <code>89</code>'s filter only). Operational overhead here is huge.
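
The figures in this example can be reproduced with a few lines. This is only a sketch for checking the arithmetic; <code>refPrimes</code>, <code>target</code> and <code>filtersNeeded</code> are names made up for the purpose, and the primes are generated by trial division up to the square root rather than by Turner's sieve itself.

<haskell>
-- Reference primes: trial division by primes not above the square root.
refPrimes :: [Int]
refPrimes = 2 : filter isPrime' [3,5..]
  where
    isPrime' n = all (\p -> n `rem` p /= 0)
                     (takeWhile (\p -> p*p <= n) refPrimes)

-- The 1001st prime (lists are indexed from 0).
target :: Int
target = refPrimes !! 1000

-- How many primes are not above the square root of the target.
filtersNeeded :: Int
filtersNeeded = length (takeWhile (\p -> p*p <= target) refPrimes)
</haskell>

Here <code>target</code> is 7927 and <code>filtersNeeded</code> is 24, the 24th prime being 89 (89<sup>2</sup> = 7921 ≤ 7927).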

=== Guarded Filters ===
But this really ought to be changed into the ''abortive'' variant, [[#From Squares|again achieving]] the ''"miraculous"'' complexity improvement from above quadratic to about <math>O(n^{1.45})</math> empirically (in ''n'' primes produced) by stopping the sieving as soon as possible:

<haskell>
primesToGT m = sieve [2..m]
  where
    sieve (p:xs)
       | p*p > m = p : xs
       | True    = p : sieve [x | x <- xs, rem x p > 0]

-- (\(a,b:_) -> map head a ++ b) . span ((< m).(^2).head) $
-- iterate (\(p:xs) -> [x | x <- xs, rem x p > 0]) [2..m]
</haskell>

=== Postponed Filters ===
Or it can remain unbounded, just filters creation must be ''postponed'' until the right moment:
<haskell>
primesPT1 = 2 : sieve primesPT1 [3..]
  where
    sieve (p:pt) xs = let (h,t) = span (< p*p) xs
                      in h ++ sieve pt [x | x <- t, rem x p > 0]

-- fix $ concatMap (fst . fst)
-- . iterate (\((_,xs), p:pt) -> let (h,t) = span (< p*p) xs in
-- ((h, [x | x <- t, rem x p > 0]), pt))
-- . (,) ([2],[3..])
</haskell>
It can be re-written with <code>span</code> and <code>(++)</code> inlined and fused into the <code>sieve</code>:
<haskell>
primesPT = 2 : oddprimes
  where
    oddprimes = sieve [3,5..] 9 oddprimes
    sieve (x:xs) q ps@ ~(p:pt)
       | x < q = x : sieve xs q ps
       | True  =     sieve [x | x <- xs, rem x p /= 0] (head pt^2) pt
</haskell>
creating here [[#Linear merging |as well]] the linear filtering nested structure at run-time, <code>(...(([3,5..] >>= filterBy [3]) >>= filterBy [5])...)</code>, but unlike the non-postponed code each filter being created at its proper moment, not sooner than the prime's square is seen.
<haskell>
filterBy ds n = [n | noDivs n ds] -- `ds` assumed to be non-decreasing

noDivs n ds = foldr (\d r -> d*d > n || (rem n d > 0 && r)) True ds
</haskell>

=== Optimal trial division ===

The above is algorithmically equivalent to the traditional formulation of trial division,
<haskell>
ps = 2 : [i | i <- [3..],
              and [rem i p > 0 | p <- takeWhile (\p -> p^2 <= i) ps]]
</haskell>
or,
<haskell>
-- primes = filter (`noDivs`[2..]) [2..]
-- = 2 : filter (`noDivs`[3,5..]) [3,5..]
primesTD = 2 : 3 : filter (`noDivs` tail primesTD) [5,7..]

isPrime n = n > 1 && noDivs n primesTD
</haskell>
except that this code is rechecking for each candidate number which primes to use, whereas for every candidate number in each segment between the successive squares of primes these will just be the same prefix of the primes list being built.

Trial division is used as a simple [[Testing primality#Primality Test and Integer Factorization|primality test and prime factorization algorithm]].
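
As an illustration of the factorization use, here is a minimal self-contained sketch. The name <code>primeFactors</code> is introduced here and is not taken from the linked page; to stay independent of the definitions above it tries 2 and then all odd numbers as divisors instead of the primes list, which gives the same factors because an odd composite divisor can never divide once its own prime factors have been divided out.

<haskell>
-- Prime factors of n in non-decreasing order, with multiplicity.
primeFactors :: Integer -> [Integer]
primeFactors n
  | n < 2     = []
  | otherwise = go n (2 : [3,5..])
  where
    go m ds@(d:dt)
      | d*d > m        = [m | m > 1]          -- what remains is prime
      | m `rem` d == 0 = d : go (m `quot` d) ds
      | otherwise      = go m dt
</haskell>

For example, <code>primeFactors 360</code> is <code>[2,2,2,3,3,5]</code>, and a prime such as 7927 is returned as the singleton <code>[7927]</code>.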

=== Segmented Generate and Test ===
Next we turn [[#Postponed Filters |the list of filters]] into one filter by an ''explicit'' list, each one in a progression of prefixes of the primes list. This seems to eliminate most recalculations, explicitly filtering composites out from batches of odds between the consecutive squares of primes.
<haskell>
import Data.List

primesST = 2 : ops
  where
    ops = sieve 3 9 ops (inits ops)           -- odd primes
       -- (sieve 3 9 <*> inits) ops           -- inits: [],[3],[3,5],...
    sieve x q ~(_:pt) (fs:ft) =
        filter ((`all` fs) . ((> 0).) . rem) [x,x+2..q-2]
        ++ sieve (q+2) (head pt^2) pt ft
</haskell>
This can also be coded as, arguably more readable,
<haskell>
primesSGT = 2 : ops
  where
    ops = 3 : [n | (r:q:_, px) <- (zip . tails . (3:) . map (^2)) ops (inits ops),
                   n <- [r+2,r+4..q-2],  all ((> 0) . rem n) px]

             --    n <- foldl (>>=) [r+2,r+4..q-2]          -- chain of filters
             --                     [filterBy [p] | p <- px]]       -- OR,

             --    n <- [r+2,r+4..q-2] >>= filterBy px]     -- a filter by a list
</haskell>

==== Generate and Test Above Limit ====

The following will start the segmented Turner sieve at the right place, using any primes list it's supplied with (e.g. [[#Tree_merging_with_Wheel | TMWE]] etc.) or itself, as shown, demand computing it just up to the square root of any prime it'll produce:

<haskell>
primesFromST m | m <= 2 = 2 : primesFromST 3
primesFromST m | m > 2 =
        sieve (m`div`2*2+1) (head ps^2) (tail ps) (inits ps)
  where
    (h,ps) = span (<= (floor.sqrt $ fromIntegral m+1)) ops
    sieve x q ps (fs:ft) =
        filter ((`all` (h ++ fs)) . ((> 0) .) . rem) [x,x+2..q-2]
        ++ sieve (q+2) (head ps^2) (tail ps) ft
    ops = 3 : primesFromST 5                    -- odd primes

-- ~> take 3 $ primesFromST 100000001234
-- [100000001237,100000001239,100000001249]
</haskell>

This is usually faster than testing candidate numbers for divisibility [[#Optimal trial division|one by one]], which has to re-fetch anew the needed prime factors to test by, for each candidate. Faster is the [[99_questions/Solutions/39#Solution_4.|offset sieve of Eratosthenes on odds]], and yet faster the one [[#Above_Limit_-_Offset_Sieve|with wheel optimization]], on this page.

=== Conclusions ===
All these variants being variations of trial division, finding out primes by direct divisibility testing of every candidate number by sequential primes below its square root (instead of just by ''its factors'', which is what ''direct generation of multiples'' is doing, essentially), are thus principally of worse complexity than that of Sieve of Eratosthenes.

The initial code is just a one-liner that ought to have been regarded as ''executable specification'' in the first place. It can easily be improved quite significantly with a simple use of bounded, guarded formulation to limit the number of filters it creates, or by postponement of filter creation.

== Euler's Sieve ==
=== Unbounded Euler's sieve ===
With each found prime Euler's sieve removes all its multiples ''in advance'' so that at each step the list to process is guaranteed to have ''no multiples'' of any of the preceding primes in it (consists only of numbers ''coprime'' with all the preceding primes) and thus starts with the next prime:

<haskell>
primesEU = 2 : eulers [3,5..]
  where
    eulers (p:xs) = p : eulers (xs `minus` map (p*) (p:xs))
 -- eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+2*p..])
</haskell>

This code is extremely inefficient, running above <math>O({n^{2}})</math> empirical complexity (and worsening rapidly), and should be regarded as a ''specification'' only. Its memory usage is very high, with empirical space complexity just below <math>O({n^{2}})</math>, in ''n'' primes produced.

In the stream-based sieve of Eratosthenes we are able to ''skip'' along the input stream <code>xs</code> directly to the prime's square, consuming the whole prefix at once, thus achieving the results equivalent to the postponement technique, because the generation of the prime's multiples is independent of the rest of the stream.

But here in the Euler's sieve it ''is'' dependent on all <code>xs</code> and we're unable ''in principle'' to skip along it to the prime's square - because all <code>xs</code> are needed for each prime's multiples generation. Thus efficient unbounded stream-based implementation seems to be impossible in principle, under the simple scheme of producing the multiples by multiplication.

=== Wheeled list representation ===

The situation can be somewhat improved using a different list representation, for generating lists not from a last element and an increment, but rather a last span and an increment, which entails a set of helpful equivalences:
<haskell>
{- fromElt (x,i) = x : fromElt (x + i,i)
=== iterate (+ i) x
[n..] === fromElt (n,1)
=== fromSpan ([n],1)
[n,n+2..] === fromElt (n,2)
=== fromSpan ([n,n+2],4) -}

fromSpan (xs,i) = xs ++ fromSpan (map (+ i) xs,i)

{- === concat $ iterate (map (+ i)) xs
fromSpan (p:xt,i) === p : fromSpan (xt ++ [p + i], i)
fromSpan (xs,i) `minus` fromSpan (ys,i)
=== fromSpan (xs `minus` ys, i)
map (p*) (fromSpan (xs,i))
=== fromSpan (map (p*) xs, p*i)
fromSpan (xs,i) === forall (p > 0).
fromSpan (concat $ take p $ iterate (map (+ i)) xs, p*i) -}

spanSpecs = iterate eulerStep ([2],1)
eulerStep (xs@(p:_), i) =
    ( (tail . concat . take p . iterate (map (+ i))) xs
        `minus` map (p*) xs, p*i )

{- > mapM_ print $ take 4 spanSpecs
([2],1)
([3],2)
([5,7],6)
([7,11,13,17,19,23,29,31],30) -}
</haskell>

Generating a list from a span specification is like rolling a ''[[#Prime_Wheels|wheel]]'' as its pattern gets repeated over and over again. For each span specification <code>w@((p:_),_)</code> produced by <code>eulerStep</code>, the numbers in <code>(fromSpan w)</code> up to <math>{p^2}</math> are all primes too, so that

<haskell>
eulerPrimesTo m = if m > 1 then go ([2],1) else []
  where
    go w@((p:_), _)
       | m < p*p = takeWhile (<= m) (fromSpan w)
       | True    = p : go (eulerStep w)
</haskell>

This runs at about <math>O(n^{1.5..1.8})</math> complexity, for <code>n</code> primes produced, and also suffers from a severe space leak problem (in other words, its memory usage is also very high).
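
To make the wheel-rolling concrete, the span specification <code>([5,7],6)</code> from the trace above can be unrolled directly. A small sketch, with <code>fromSpan</code> repeated so that it runs on its own, and <code>rolled</code> and <code>belowSquare</code> being names chosen just for this example:

<haskell>
fromSpan :: ([Int], Int) -> [Int]
fromSpan (xs, i) = xs ++ fromSpan (map (+ i) xs, i)

-- Rolling the ([5,7],6) wheel enumerates the numbers coprime with 2 and 3.
rolled :: [Int]
rolled = take 10 (fromSpan ([5,7], 6))      -- [5,7,11,13,17,19,23,25,29,31]

-- Everything below 5*5 in that enumeration is prime.
belowSquare :: [Int]
belowSquare = takeWhile (< 5*5) (fromSpan ([5,7], 6))   -- [5,7,11,13,17,19,23]
</haskell>

The first composite to appear, 25, is exactly the square of the span's first element, which is why <code>eulerPrimesTo</code> can stop refining the wheel as soon as ''m'' < ''p''<sup>2</sup>.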

== Using Immutable Arrays ==

=== Generating Segments of Primes ===

The sieve of Eratosthenes' [[#Segmented|removal of multiples on each segment of odds]] can be done by actually marking them in an array, instead of manipulating ordered lists, and can be further sped up more than twice by working with odds only:

<haskell>
import Data.Array.Unboxed

primesSA :: [Int]
primesSA = 2 : oddprimes ()
  where
    oddprimes = (3 :) . sieve 3 [] . oddprimes
    sieve x fs (p:ps) = [i*2 + x | (i,True) <- assocs a]
                        ++ sieve (p*p) ((p,0) :
                             [(s, rem (y-q) s) | (s,y) <- fs]) ps
      where
        q = (p*p-x)`div`2
        a :: UArray Int Bool
        a = accumArray (\ b c -> False) True (1,q-1)
                       [(i,()) | (s,y) <- fs, i <- [y+s, y+s+s..q]]
</haskell>

Runs significantly faster than [[#Tree merging with Wheel|TMWE]] and with better empirical complexity, of about <math>O(n^{1.10..1.05})</math> in producing the first few million primes, with constant memory footprint.

=== Calculating Primes Upto a Given Value ===

Equivalent to [[#Accumulating Array|Accumulating Array]] above, running somewhat faster (compiled by GHC with optimizations turned on):

<haskell>
{-# OPTIONS_GHC -O2 #-}
import Data.Array.Unboxed

primesToNA n = 2: [i | i <- [3,5..n], ar ! i]
  where
    ar = f 5 $ accumArray (\ a b -> False) True (3,n)
                          [(i,()) | i <- [9,15..n]]
    f p a | q > n = a
          | True  = if null x then a2 else f (head x) a2
      where q = p*p
            a2 :: UArray Int Bool
            a2 = a // [(i,False) | i <- [q, q+2*p..n]]
            x  = [i | i <- [p+2,p+4..n], a2 ! i]
</haskell>
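
The array-based versions above all rely on the same <code>accumArray</code> idiom: start with every entry set to <code>True</code>, then feed in a list of indices whose entries the accumulating function flips to <code>False</code>. The following minimal sketch isolates that idiom for the first step of <code>primesToNA</code>, crossing off the odd multiples of 3. The name <code>notMultiplesOf3</code> is hypothetical and is introduced here only for illustration.

<haskell>
import Data.Array.Unboxed

-- Hypothetical helper, for illustration only:
-- every index in (3,n) starts as True; the odd multiples of 3
-- from 9 upwards are then set to False by the accumulating function.
notMultiplesOf3 :: Int -> UArray Int Bool
notMultiplesOf3 n = accumArray (\_ _ -> False) True (3, n)
                               [(i, ()) | i <- [9, 15 .. n]]

-- Only odd indices are ever consulted, so the even entries
-- (which stay True) do not matter.
survivors :: Int -> [Int]
survivors n = [i | i <- [3, 5 .. n], notMultiplesOf3 n ! i]
</haskell>

For example, <code>survivors 25</code> yields <code>[3,5,7,11,13,17,19,23,25]</code>; the remaining composite, 25, would be removed by the next pass, for the prime 5.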

=== Calculating Primes in a Given Range ===

<haskell>
import Data.Array.Unboxed

primesFromToA :: Int -> Int -> [Int]
primesFromToA a b = (if a<3 then [2] else [])
                      ++ [i | i <- [o,o+2..b], ar ! i]
  where
    o  = max (if even a then a+1 else a) 3   -- first odd in the segment
    r  = floor . sqrt $ fromIntegral b + 1
    ar :: UArray Int Bool                    -- fixes the otherwise ambiguous array type
    ar = accumArray (\_ _ -> False) True (o,b)  -- initially all True,
          [(i,()) | p <- [3,5..r]
                    , let q  = p*p      -- flip every multiple of an odd
                          s  = 2*p                         -- to False
                          (n,x) = quotRem (o - q) s
                          q2 = if  o <= q  then q
                               else  q + (n + signum x)*s
                    , i <- [q2,q2+s..b] ]
</haskell>

Although it sieves by all odds instead of by primes only, the array generation is so fast that this approach is quite feasible, and even preferable, for quickly generating short spans of relatively big primes.
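
The least obvious step in <code>primesFromToA</code> is the computation of <code>q2</code>, the first odd multiple of <code>p</code> that is both no smaller than <code>p*p</code> and no smaller than the segment start <code>o</code>. The sketch below extracts that arithmetic into a standalone function so it can be checked in isolation; the name <code>firstOddMultiple</code> is hypothetical and does not appear in the code above.

<haskell>
-- Hypothetical helper, for illustration only.
-- For an odd p and an odd segment start o, find the smallest number
-- of the form p*p + k*2*p (k >= 0) that is >= o.
firstOddMultiple :: Int -> Int -> Int
firstOddMultiple o p
  | o <= q    = q                       -- the square is already inside the segment
  | otherwise = q + (n + signum x) * s  -- round (o - q) up to a multiple of s
  where
    q      = p * p
    s      = 2 * p                      -- step between successive odd multiples
    (n, x) = quotRem (o - q) s
</haskell>

With a segment starting at 101, <code>firstOddMultiple 101 3</code>, <code>firstOddMultiple 101 5</code> and <code>firstOddMultiple 101 7</code> all give 105, while <code>firstOddMultiple 101 11</code> gives 121, the square itself.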

== Using Mutable Arrays ==

Using mutable arrays is the fastest, but not the most memory-efficient, way to calculate prime numbers in Haskell.

=== Using ST Array ===

This method implements the Sieve of Eratosthenes much as you might in C, modified to work on odds only. It is fast, but its memory consumption is about linear: it allocates one sieve array (though apparently a packed one) for the whole range to be processed.

<haskell>
import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed

sieveUA :: Int -> UArray Int Bool
sieveUA top = runSTUArray $ do
    let m = (top-1) `div` 2
        r = floor . sqrt $ fromIntegral top + 1
    sieve <- newArray (1,m) True      -- :: ST s (STUArray s Int Bool)
    forM_ [1..r `div` 2] $ \i -> do
      isPrime <- readArray sieve i
      when isPrime $ do               -- ((2*i+1)^2-1)`div`2 == 2*i*(i+1)
        forM_ [2*i*(i+1), 2*i*(i+2)+1..m] $ \j -> do
          writeArray sieve j False
    return sieve

primesToUA :: Int -> [Int]
primesToUA top = 2 : [i*2+1 | (i,True) <- assocs $ sieveUA top]
</haskell>
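
In <code>sieveUA</code>, array index <code>i</code> stands for the odd number <code>2*i+1</code>, so the square of that number sits at index <code>2*i*(i+1)</code> and successive odd multiples are <code>2*i+1</code> indices apart. The following sketch spells out that mapping so the inner loop's enumeration can be checked by hand. The names <code>toIndex</code>, <code>fromIndex</code> and <code>multiplesIndices</code> are hypothetical, chosen only for this illustration.

<haskell>
-- Hypothetical helpers, for illustration only.

-- Odd number -> index in the odds-only sieve array
toIndex :: Int -> Int
toIndex n = (n - 1) `div` 2

-- Index in the odds-only sieve array -> odd number
fromIndex :: Int -> Int
fromIndex i = 2 * i + 1

-- Indices crossed off for the odd number at index i, up to index m:
-- starts at the index of its square and steps by 2*i+1.
multiplesIndices :: Int -> Int -> [Int]
multiplesIndices m i = [2*i*(i+1), 2*i*(i+2)+1 .. m]
</haskell>

For instance, <code>multiplesIndices 15 1</code> is <code>[4,7,10,13]</code>, and mapping <code>fromIndex</code> over it gives <code>[9,15,21,27]</code>, the odd multiples of 3 starting from its square.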

The [http://ideone.com/KwZNc empirical time complexity] of <code>primesToUA</code> improves with ''n'' (the number of primes produced), from above <math>O(n^{1.20})</math> towards <math>O(n^{1.16})</math>. The reference [http://ideone.com/FaPOB C++ vector-based implementation] shows the same trend, from <math>O(n^{1.5})</math> gradually towards <math>O(n^{1.12})</math>, where tested ''(which might be interpreted as evidence towards the expected [http://en.wikipedia.org/wiki/Computation_time#Linearithmic.2Fquasilinear_time quasilinearithmic] <math>O(n \log(n)\log(\log n))</math> time complexity)''.
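
Exponents such as those quoted above can be estimated from just two measurements: if the run time behaves like <math>t \sim n^b</math>, then <math>b = \log(t_2/t_1) / \log(n_2/n_1)</math>. The sketch below implements that formula; the function name <code>empiricalOrder</code> is hypothetical, and the timings passed to it are assumed to have been measured separately.

<haskell>
-- Hypothetical helper, for illustration only.
-- Estimate the exponent b in t ~ n^b from two (problem size, run time) pairs.
empiricalOrder :: (Double, Double) -> (Double, Double) -> Double
empiricalOrder (n1, t1) (n2, t2) = logBase (n2 / n1) (t2 / t1)
</haskell>

For example, if doubling the number of primes produced multiplies the run time by <math>2^{1.2}</math>, the estimated order is about 1.2.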

=== Bitwise prime sieve with Template Haskell ===

Counts the number of primes below a given <code>n</code> (despite its name, <code>nthPrime</code> returns this count, not the ''n''-th prime). Shows fast bitwise arrays, and an example of [[Template Haskell]] to defeat your enemies.

<haskell>
{-# OPTIONS_GHC -O2 #-}
{-# LANGUAGE BangPatterns #-}
module Primes (nthPrime) where

import Control.Monad.ST
import Data.Array.ST
import Data.Array.Base
import Control.Monad
import Data.Bits

nthPrime :: Int -> Int
nthPrime n = runST (sieve n)

sieve :: Int -> ST s Int
sieve n = do
  -- bounds start at 0 because unsafeRead/unsafeWrite take raw 0-based offsets
  a <- newArray (0,n) True :: ST s (STUArray s Int Bool)
  let cutoff = truncate (sqrt $ fromIntegral n) + 1
  go a n cutoff 3 1

go :: STUArray s Int Bool -> Int -> Int -> Int -> Int -> ST s Int
go !a !m cutoff !n !c
  | n >= m    = return c
  | otherwise = do
      e <- unsafeRead a n
      if e then
          if n < cutoff then
              let loop !j
                    | j < m     = do
                        x <- unsafeRead a j
                        when x $ unsafeWrite a j False
                        loop (j+n)
                    | otherwise = go a m cutoff (n+2) (c+1)
              in loop ( if n < 46340 then n * n else n `shiftL` 1)
          else go a m cutoff (n+2) (c+1)
        else go a m cutoff (n+2) c
</haskell>

Then call it from a <code>Main</code> module, where the Template Haskell splice evaluates the count at compile time:

<haskell>
{-# LANGUAGE TemplateHaskell #-}
import Primes

main = print $( let x = nthPrime 10000000 in [| x |] )
</haskell>

Run as:

<haskell>
$ ghc --make -o primes Main.hs
$ time ./primes
664579
./primes 0.00s user 0.01s system 228% cpu 0.003 total
</haskell>
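
The near-zero run time above comes from the splice: the value is computed while compiling and embedded in the program as a literal. A single-module sketch of the same mechanism, assuming only Prelude functions inside the splice (so that no separate module is required), might look as follows. The name <code>sumTo100</code> is hypothetical.

<haskell>
{-# LANGUAGE TemplateHaskell #-}

-- Hypothetical example, for illustration only.
-- The sum is evaluated by the compiler; the local x is lifted
-- into the quotation, so the compiled code contains just the result.
sumTo100 :: Int
sumTo100 = $( let x = sum [1..100] :: Int in [| x |] )
</haskell>

A function defined at the top level of the same module cannot be called inside such a splice, which is why <code>nthPrime</code> lives in its own <code>Primes</code> module.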

== Implicit Heap ==

See [[Prime_numbers_miscellaneous#Implicit_Heap | Implicit Heap]].

== Prime Wheels ==

See [[Prime_numbers_miscellaneous#Prime_Wheels | Prime Wheels]].

== Using IntSet for a traditional sieve ==

See [[Prime_numbers_miscellaneous#Using_IntSet_for_a_traditional_sieve | Using IntSet for a traditional sieve]].

== Testing Primality, and Integer Factorization ==

See [[Testing_primality | Testing primality]]:

* [[Testing_primality#Primality_Test_and_Integer_Factorization | Primality Test and Integer Factorization ]]
* [[Testing_primality#Miller-Rabin_Primality_Test | Miller-Rabin Primality Test]]

== One-liners ==

See [[Prime_numbers_miscellaneous#One-liners | primes one-liners]].

== External links ==
* http://www.cs.hmc.edu/~oneill/code/haskell-primes.zip
: A collection of prime generators; the file "ONeillPrimes.hs" contains one of the fastest pure-Haskell prime generators; code by Melissa O'Neill.
: WARNING: Don't use the priority queue from ''older versions'' of that file for your projects: it's broken and works for primes only by a lucky chance. The ''revised'' version of the file fixes the bug, as pointed out by Eugene Kirpichov on February 24, 2009 on the [http://www.mail-archive.com/haskell-cafe@haskell.org/msg54618.html haskell-cafe] mailing list, and fixed by Bertram Felgenhauer.

* [http://ideone.com/willness/primes test entries] for (some of) the above code variants.

* Speed/memory [http://ideone.com/p0e81 comparison table] for sieve of Eratosthenes variants.

* [http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth Empirical orders of growth] on Wikipedia.

[[Category:Code]]
[[Category:Mathematics]]