Difference between revisions of "Prime numbers"

From HaskellWiki
Jump to navigation Jump to search
(→‎Using ST Array: correction: it's n(log n)(log (log n)), in n primes produced)
(→‎Finding Primes: general re-write, starting with simplest and genuine Eratosthenes definition, not with Turner's)
Line 11: Line 11:
 
** [http://hackage.haskell.org/package/NumberSieves NumberSieves]: Number Theoretic Sieves: primes, factorization, and Euler's Totient.
 
** [http://hackage.haskell.org/package/NumberSieves NumberSieves]: Number Theoretic Sieves: primes, factorization, and Euler's Totient.
 
** [http://hackage.haskell.org/package/primes primes]: Efficient, purely functional generation of prime numbers.
 
** [http://hackage.haskell.org/package/primes primes]: Efficient, purely functional generation of prime numbers.
 
== Finding Primes ==
 
   
 
Any natural number is representable as a product of powers of its prime factors, and so a prime has no prime divisors other than itself. That means that starting with 2, ''for each'' newly found ''prime'' we can ''eliminate'' from the rest of the numbers ''all such numbers'' that have this prime as their divisor, giving us the ''next available'' number as next prime. This is known as ''sieving'' the natural numbers, so that in the end what we are left with are just primes.
 
Any natural number is representable as a product of powers of its prime factors, and so a prime has no prime divisors other than itself. That means that starting with 2, ''for each'' newly found ''prime'' we can ''eliminate'' from the rest of the numbers ''all such numbers'' that have this prime as their divisor, giving us the ''next available'' number as next prime. This is known as ''sieving'' the natural numbers, so that in the end what we are left with are just primes.
Line 20: Line 18:
 
Having a direct-access mutable arrays indeed enables easy marking off of these multiples on pre-allocated array as it is usually done in imperative languages; but to get an efficient ''list''-based code we have to be smart about combining those streams of multiples of each prime - which gives us also the memory efficiency in generating the results one by one.
 
Having a direct-access mutable arrays indeed enables easy marking off of these multiples on pre-allocated array as it is usually done in imperative languages; but to get an efficient ''list''-based code we have to be smart about combining those streams of multiples of each prime - which gives us also the memory efficiency in generating the results one by one.
   
  +
== Sieve of Eratosthenes ==
=== Filtering To Keep the Prime Numbers In ===
 
  +
Sieve of Eratosthenes is ''genuinely'' represented by
  +
<haskell>
  +
-- genuine yet wasteful Eratosthenes
  +
primesTo m = 2 : sieve [3,5..m] where
  +
sieve [] = []
  +
sieve (p:xs) = p : sieve (xs `minus` [p,p+2*p..m])
  +
</haskell>
  +
  +
This should be regarded more like a ''specification'', not a code. It is extremely slow, running at empirical time complexities worse than quadratic in number of primes produced. But it has the core defining features of S. of E. as '''''a)''''' being bounded, i.e. having a top limit value, and '''''b)''''' finding out the multiples of a prime by counting up from it. Yes, this is exactly how Eratosthenes defined it ([http://www.archive.org/stream/nicomachigerasen00nicouoft#page/29/mode/1up Nicomachus, ''Introduction to Arithmetic'', I, pp. 13, 31]).
  +
  +
The canonical list-difference <code>minus</code> and duplicates-removing list-union <code>union</code> functions dealing with ordered increasing lists - infinite as well as finite - are simple enough to define (using <code>compare</code> has an effect of comparing the values only once, unlike when using (<) etc):
  +
<haskell>
  +
-- ordered lists, difference and union
  +
minus (x:xs) (y:ys) = case (compare x y) of
  +
LT -> x : minus xs (y:ys)
  +
EQ -> minus xs ys
  +
GT -> minus (x:xs) ys
  +
minus xs _ = xs
  +
union (x:xs) (y:ys) = case (compare x y) of
  +
LT -> x : union xs (y:ys)
  +
EQ -> x : union xs ys
  +
GT -> y : union (x:xs) ys
  +
union xs ys = xs ++ ys
  +
</haskell>
  +
  +
(the name ''merge'' ought to be reserved for duplicates-preserving merging as used by ''mergesort'' - that's why we use ''union'' here, following Leon P.Smith's [http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data-List-Ordered.html Data.List.Ordered] package).
  +
  +
=== Analysis ===
  +
  +
So what does it do, this sieve code? For each found prime it removes its ''odd'' multiples from further consideration. It finds them by counting up in steps of ''2p''. There are thus '''<math>O(m/p)</math>''' multiples for each prime, and '''<math>O(m \log(\log m))</math>''' multiples total, with duplicates, by virtue of ''prime harmonic series'', where <math>\sum_{p_i<m}{1/p_i} = O(\log(\log m))</math>.
  +
  +
If each multiple is dealt with in '''<math>O(1)</math>''' time, this will translate into '''<math>O(m \log(\log m))</math>''' RAM-mashine operations (since we consider addition as an atomic operation). Indeed, mutable random-access arrays allow for that. But lists in Haskell don't, and complexity of <code>minus(a,b)</code> is about '''<math>O(|union(a,b)|)</math>''', i.e. here it is '''<math>O(|a|)</math>''' which is '''<math>O(m/\log p)</math>''' according to the notice about the '''Φ'''-function in Melissa O'Neill's article.
   
  +
It looks like <math>\sum_{i=1}^{k}{1/log(p_i)} = O(k/\log k)</math>. Since the number of primes below ''m'' is '''<math>k = \pi(m) = O(m/\log m)</math>''' by the ''prime number theorem'' (where <math>\pi(m)</math> is a prime counting function), there will be ''k'' multiples-removing steps in the algorithm; it means total complexity of '''<math>O(m k/\log k) = O(m^2/(\log m)^2)</math>''', or '''<math>O(n^2)</math>''' in ''n'' primes produced - much much worse than the optimal '''<math>O(n (\log n) \log(\log n))</math>'''.
==== The Classic Turner's Sieve ====
 
   
  +
=== Squared ===
The following is attributed to David Turner ''(SASL Language Manual, 1975)'', generating a list of all prime numbers:
 
   
  +
But we can start each step at the prime's ''square'', as its smaller multiples will be already processed on previous steps. This means we can ''stop early'', when the prime's square reaches the top value ''m'', and thus cut the total number of steps to around '''<math>k = \pi(\sqrt{m}) = O(\sqrt{m}/\log m)</math>'''. This does not in fact change the complexity of random-access code, but for lists it makes it '''<math>O(m^{1.5}/(\log m)^2)</math>''', or '''<math>O(n^{1.5}/\sqrt{\log n})</math>''' in ''n'' primes produced, showing an ''enormous'' speedup in practice:
 
<haskell>
 
<haskell>
primes = sieve [2..]
+
primesToQ m = 2 : sieve [3,5..m] where
  +
sieve [] = []
where
 
sieve (p:xs) = p : sieve [x | x <- xs, rem x p /= 0]
+
sieve (p:xs) = p : sieve (xs `minus` [p*p,p*p+2*p..m])
-- or: filter ((/=0).(`rem`p)) xs
 
 
</haskell>
 
</haskell>
   
  +
Its empirical complexity is about <math>O(n^{1.45})</math>. This simple optimization of starting from a prime's square has dramatic effect here because our formulation is ''bounded'', in accordance with the original algorithm. This has the desired effect of ''stopping early'', turning the unacceptably slow initial ''specification'' into a ''code'', yet-far-from-optimal and slow, but acceptably so.
This should be considered only a ''specification'', not a ''code''. When run as is, it is ''extremely inefficient'' because it starts up the filters ''prematurely'', immediately after each prime, instead of only after the prime's square has been reached. To be admitted as prime, ''each number'' will be ''tested for divisibility'' here by all its preceding primes, while just those not greater than its square root would suffice. This means that e.g. to find the '''1001'''st prime (<code>7927</code>), '''1000''' filters are used, when in fact just the first '''24''' are needed (up to <code>89</code>'s filter only).
 
   
  +
=== Guarded ===
So this in effect creates a cascade of nested filters in front of the infinite numbers supply, and in extremely premature fashion at that. One way of fixing that would be to ''postpone'' the creation of filters until the right moment, by ''decoupling'' the primes supply from the numbers supply.
 
  +
This ought to be explicated ''(improving on clarity though not on time complexity)'' as in the following, for which it is indeed a minor optimization whether to start from ''p'' or ''p*p'' - but ''only after'' we've went to the necessary trouble of explicitly stopping early on:
  +
<haskell>
  +
primesToG m = 2 : sieve [3,5..m] where
  +
sieve (p:xs)
  +
| p*p > m = p : xs
  +
| True = p : sieve (xs `minus` [p*p,p*p+2*p..m])
  +
</haskell>
   
  +
Did Eratosthenes himself achieve the optimal complexity? It rather seems doubtful, as he probably counted boxes in the table by 1 to go from one number to the next, as in '''3''',''5,7'','''9''',''11,13'','''15''', ''...'' for he had no access even to Hindu numerals, using Greek alphabet for writing down numbers instead. Was '''''he''''' performing a ''genuine'' sieve of Eratosthenes then? We'll leave that as an open question.
==== Postponed Filters Sieve ====
 
   
  +
=== Accumulating Array ===
Here each filter's creation is postponed until the very moment it's needed.
 
  +
  +
So while <code>minus(a,b)</code> takes <math>O(|b|)</math> operations for random-access imperative arrays and about <math>O(|a|)</math> operations for lists here, using Haskell's immutable array for ''a'' one ''could'' expect the array update time to be indeed <math>O(|b|)</math> but it looks like it's not so:
 
<haskell>
 
<haskell>
primes = 2: 3: sieve (tail primes) [5,7..]
+
primesToA m = sieve 3 (array (3,m) [(i,True) | i<-[3..m]] //
  +
[(i,False) | i<-[4,6..m]])
where
 
  +
where
sieve (p:ps) xs = h ++ sieve ps [x | x <- t, rem x p /= 0]
 
  +
sieve p a
-- or: filter ((/=0).(`rem`p)) t
 
where (h,t) = span (< p*p) xs
+
| p*p > m = 2 : [i | (i,True) <- assocs a]
  +
| a!p = sieve (p+2) $ a // [(i,False) | i <- [p*p,p*p+2*p..m]]
  +
| otherwise = sieve (p+2) a
 
</haskell>
 
</haskell>
   
  +
It's much slower than the above, though it ''should'' be running at optimal complexity on implementations which are able to properly use the destructive update here, for an array being passed along as an accumulating parameter, it seems.
This can be seen as the essential framework for all the code to come. It only tests ''odd'' numbers, and only by the primes that are needed, for each ''numbers span'' between successive squares of primes. To find the '''1001'''st prime, the divisibility test is performed by only '''24''' nested filters corresponding to the first '''24''' odd primes.
 
   
  +
How this implementation deficiency is to be overcome? One way is to use explicitly mutable monadic arrays (''see below''), but we can also think about it a little bit more on the functional side of things.
Whereas Turner's sieve exhibits near O(<math>{n^2}</math>) behavior (''n'' referring to the number of primes produced), this one exhibits near O(<math>{n^{1.5}}</math>) behavior, with an orders-of-magnitude speedup.
 
   
  +
=== Postponed ===
===== Odd numbers, by Trial Division =====
 
  +
Going back to ''guarded'' Eratosthenes, first we notice that though it works with minimal number of prime multiples streams, it still starts working with each a bit prematurely. Fixing this with explicit synchronization won't change complexity but will speed it up some more:
This is good for generating few 100,000s primes (when compiled):
 
  +
<haskell>
  +
primesPE = 2 : primes'
  +
where
  +
primes' = sieve [3,5..] primes' 9
  +
sieve (p:xs) ps@ ~(_:t) q
  +
| p < q = p : sieve xs ps q
  +
| True = sieve (xs `minus` [q,q+2*p..]) t (head t^2)
  +
</haskell>
  +
Since the removal of a prime's multiples here starts at the right moment, and not just from the right place, the code could '''''now''''' finally be made unbounded. Because no multiples-removal process is started ''prematurely'', there are no ''extraneous'' mutiples streams, which were the reason for the extreme wastefulness and thus inefficiency of the original formulation.
  +
  +
=== Segmented ===
  +
With work done segment-wise between the successive primes squares it becomes
   
 
<haskell>
 
<haskell>
primes = 2: 3: filter isPrime [5,7..]
+
primes = 2: 3: sieve (tail primes) 3 []
 
where
 
where
isPrime n = all (notDivs n)
+
sieve (p:ps) x fs = let q=p*p in
  +
foldr (flip minus) [x+2,x+4..q-2]
$ takeWhile (\p-> p*p <= n) (tail primes)
 
  +
[[y+s,y+2*s..q] | (s,y) <- fs]
notDivs n p = n `mod` p /= 0
 
  +
++ sieve ps q
  +
((2*p,q):[(s,q-rem (q-y) s) | (s,y) <- fs])
  +
</haskell>
  +
  +
This "marks" the odd composites in a given range by generating them - just as a person performing the original sieve of Eratosthenes would do, counting ''one by one'' the multiples of the relevant primes. These composites are independently generated so some will be generated multiple times. Rearranging the chain of subtractions into a subtraction of merged streams and using [http://ideone.com/pfREP tree-like folding] further speeds up the code and improves its asympthotic behaviour.
  +
  +
The advantage in working with spans explicitly is that this code is easily amendable to using arrays for the composites marking and removal on each ''finite'' span.
  +
  +
=== Linear merging ===
  +
But what does the Postponed code do, operationally? For each prime's square passed over, it builds up a nested linear ''left-deepening'' structure, '''''(...((xs-a)-b)-...)''''', where '''''xs''''' is the original odds-producing ''[3,5..]'' list, so that each odd it produces must go through <code>`minus`</code> nodes on its way up - and those odd numbers that eventually emerge on top are prime. Thinking a bit about it, wouldn't another, ''right-deepening'' structure, '''''(xs-(a+(b+...)))''''', be better?
  +
  +
Here, ''xs'' would stay near the top, and more frequently odds-producing streams of multiples of ''smaller'' primes would be above those of the ''bigger'' primes, which produce ''less'' frequently their candidates which have to pass through ''more'' <code>`minus`</code> nodes on their way up. Plus, no explicit synchronization is necessary anymore because the produced multiples of a prime start at its square anyway - just some care has to be taken to avoid a runaway access to the infinitely-defined structure (specifically, if each <code>(+)</code> operation passes along unconditionally its left child's head value ''before'' polling the right child's head value, then we are safe).
  +
  +
Here's the code, faster yet but still with about same time complexity of <math>O(n^{1.4})</math>, which is essentially equivalent to the Richard Bird's code from Melissa O'Neill's article:
  +
  +
<haskell>
  +
{-# OPTIONS_GHC -O2 -fno-cse #-}
  +
primesLME = 2 : ([3,5..] `minus` fold [[p*p,p*p+2*p..] | p <- primes'])
  +
where
  +
primes' = 3 : ([5,7..] `minus` fold [[p*p,p*p+2*p..] | p <- primes'])
  +
fold ((x:xs):t) = x : union xs (fold t)
 
</haskell>
 
</haskell>
   
  +
The double primes feed is introduced here to prevent unneeded memoization and thus save memory, as per Melissa O'Neill's code, and is dependent on no expression sharing being performed by a compiler - space complexity here is probably <math>O(\pi(\sqrt m)) = O(2\sqrt{n/\log n})</math> but in practice it's negligible compared to run-time system memory itself and thus the total memory is very nearly <math>O(1)</math>.
Instead of relying on nested filters, it tests each odd number by an explicit list of all the needed prime factors. But for each number tested it re-fetches this list ''anew'' which will be ''the same'' for the increasing spans of numbers between the successive squares of primes.
 
   
  +
=== Tree merging ===
===== Generated Spans, by Nested Filters =====
 
  +
Moreover, it can be changed into a '''''tree''''' structure:
The other way to go instead of concentrating on the numbers supply, is to directly work on successive spans between the primes squares.
 
   
  +
<haskell>
This version is a bit faster still, creating ''158,000'' primes (GHC-compiled) in the same time as the postponed filters does ''100,000'' primes:
 
  +
{-# OPTIONS_GHC -O2 -fno-cse #-}
  +
primesTME = 2 : ([3,5..] `minus` foldt [[p*p,p*p+2*p..] | p <- primes'])
  +
where
  +
primes' = 3 : ([5,7..] `minus` foldt [[p*p,p*p+2*p..] | p <- primes'])
  +
foldt ((x:xs):t) = x : union xs (foldt (pairs t))
  +
pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t
  +
</haskell>
  +
  +
It is [http://ideone.com/p0e81 very fast], running at speeds and empirical complexities comparable with the code from Melissa O'Neill's article (about <math>O(n^{1.2})</math> in number of primes ''n'' produced).
  +
  +
=== Tree merging with Wheel ===
  +
Furthermore, wheel factorization optimization can be applied to it, and another tree structure can be used, better adjusted for the primes multiples production (effecting an additional 5%~10% speedup):
   
 
<haskell>
 
<haskell>
  +
{-# OPTIONS_GHC -O2 -fno-cse #-}
primes = 2: 3: sieve (tail primes) 3 []
 
  +
primesTMWE = 2:3:5:7: gaps 11 wheel (fold3t $ roll 11 wheel primes')
where
 
  +
where
notDivsBy d n = n `mod` d /= 0
 
  +
primes' = 11: gaps 13 (tail wheel) (fold3t $ roll 11 wheel primes')
sieve (p:ps) x ds = foldr (filter . notDivsBy)
 
  +
fold3t ((x:xs): ~(ys:zs:t)) = x : union xs (union ys zs)
[x+2,x+4..p*p-2] ds
 
++ sieve ps (p*p) (p:ds)
+
`union` fold3t (pairs t)
  +
pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t
  +
wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
  +
4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
  +
gaps k ws@(w:t) cs@(c:u) | k==c = gaps (k+w) t u
  +
| True = k : gaps (k+w) t cs
  +
roll k ws@(w:t) ps@(p:u) | k==p = scanl (\c d->c+p*d) (p*p) ws
  +
: roll (k+w) t u
  +
| True = roll (k+w) t ps
 
</haskell>
 
</haskell>
   
  +
''The original idea of the right-deepening unbounded linear merging of multiples is due to Richard Bird (presented in Melissa O'Neill's article), and the original idea of using a tree-like folding structure is due to Dave Bayer, on haskell-cafe mailing list.''
It explicitly maintains the list of primes needed for testing each odds span between successive primes squares, which it explicitly generates. But it tests with nested <code>filter</code>s, which it repeatedly recreates.
 
   
  +
== Turner's sieve - Trial division ==
===== Generated Spans, by List of Primes =====
 
   
  +
David Turner's original 1975 formulation ''(SASL Language Manual, 1975)'' replaces non-standard ''`minus`'' with stock list comprehension with ''rem'' filtering in the sieve of Eratosthenes, turning it into a kind of trial division algorithm:
So the [[Prime_numbers#Postponed Filters Sieve | Postponed Filters]] code, while running, creates a linear structure of nested filters, testing divisibility by successive primes. This implicit list of primes, each held in its filter's call frame (in Lisp terms) at run-time, can be made ''explicit'', and used explicitly.
 
   
  +
<haskell>
It is actually just the prefix of the primes list ''itself'', of growing length. Now having been ''explicated'', it can be used in one-test ''"flat"'' reformulation of the nested chain of tests. ''Algorithmically'' it is exactly the same - it still tests each candidate number for divisibility by all the primes not greater than its square root upto its lowest factor - but does so in a ''computationally'' different way, '''''turning a list of''''' (<code>filter</code>) '''''functions into a function''''' (<code>and</code>) '''''on a list''''' and inlining the <code>span</code> code, avoiding as much recalculations as possible:
 
  +
-- unbounded sieve, premature filters
  +
primesT = 2 : sieve [3,5..] where
  +
sieve (p:xs) = p : sieve [x | x<-xs, rem x p /= 0]
  +
-- filter ((/=0).(`rem`p)) xs
  +
</haskell>
  +
  +
This creates an immense number of superfluous implicit filters in extremely premature fashion. To be admitted as prime, ''each number'' will be ''tested for divisibility'' here by all its preceding primes, while just those not greater than its square root would suffice. To find e.g. the '''1001'''st prime (<code>7927</code>), '''1000''' filters are used, when in fact just the first '''24''' are needed (up to <code>89</code>'s filter only). Operational overhead is enormous here.
  +
  +
=== Guarded ===
  +
When we force ourselves away from the ''Quest for a Mythical One-Liner'' it really ought to be written ''at least'' as bounded and guarded variety (if not abandoned right away in favor of algorithmically superior sieve of Eratosthenes), yet again achieving the ''miraculous'' complexity improvement from above quadratic to about <math>O(n^{1.45})</math> empirically (in ''n'' primes produced):
   
 
<haskell>
 
<haskell>
  +
primesToGT m = 2 : sieve [3,5..m] where
primes = 2: oddprimes
 
oddprimes = 3: sieve oddprimes 3 0
+
sieve (p:xs)
  +
| p*p > m = p : xs
sieve (p:ps) x k
 
= [n | n <- [x+2,x+4..p*p-2]
+
| True = p : sieve [x | x<-xs, rem x p /= 0]
, and [rem n p/=0 | p <- take k oddprimes]]
+
-- filter ((/=0).(`rem`p)) xs
++ sieve ps (p*p) (k+1)
 
 
</haskell>
 
</haskell>
   
  +
=== Postponed ===
It produces about ''222,000 primes'' in the same amount of time, and is good for creating about a million first primes, compiled.
 
  +
or, ''better'', as a postponed, unbounded variety:
  +
  +
<haskell>
  +
primesPT = 2 : primes'
  +
where
  +
primes' = sieve [3,5..] primes' 9
  +
sieve (p:xs) ps@ ~(_:t) q
  +
| p < q = p : sieve xs ps q
  +
| True = sieve [x | x<-xs, rem x p /= 0] t (head t^2)
  +
-- filter ((/=0).(`rem`p)) xs
  +
</haskell>
  +
  +
=== Segmented ===
  +
A certain further speedup (though not in complexity which stays the same) can be achieved by ''explicating'' the nested filters structure created here as well at run-time, ''turning'' a list of filters into a filter by an explicit list of primes (which is just a ''prefix'' of the primes list itself that is being built):
  +
<haskell>
  +
primesST = 2 : primes'
  +
where
  +
primes' = 3 : sieve primes' 3 0
  +
sieve (p:ps) x k =
  +
let pfx = take k primes'
  +
in [n | n <- [x+2,x+4..p*p-2], and [rem n f /= 0 | f <- pfx]]
  +
-- filter (\n-> all ((/=0).(n`rem`)) pfx) [x+2,x+4..p*p-2]
  +
++ sieve ps (p*p) (k+1)
  +
</haskell>
  +
This seems to be closest to optimal ''trial division'', with most recalculations eliminated, explicitly filtering composites out from batches of odds between the consequent squares of primes. All these variants of course being variations of trial division, finding out primes by direct divisibility testing of ''every odd number'' by ''all'' primes below its square root (instead of just its factors, which is what ''eliminating the multiples'' is doing, essentially), are thus principally of ''worse'' complexities than the sieve of Eratosthenes; but one far worse than another, yet fixed ''easily'' from a wasteful monstrosity to almost acceptible performance (at least for the first few hunderd thousand primes, when compiled) just by following the ''proper definition'' of the sieve as being bounded, simply with the guarded formulation.
   
  +
Another task is to produce primes above a given number (not knowing their ordinal numbers this time), which will demand compute just the primes below its square root:
The reason to have <code>sieve</code> function available separately too is that it can also be used to produce primes above a given number, as in
 
   
 
<haskell>
 
<haskell>
 
primesFrom m = sieve ps (m`div`2*2+1) (length h)
 
primesFrom m = sieve ps (m`div`2*2+1) (length h)
 
where
 
where
(h,(_:ps)) = span (<= (floor.sqrt.fromIntegral) m) primes
+
(h,(_:ps)) = span (<= (floor.sqrt$fromIntegral m+1)) primesST
 
</haskell>
 
</haskell>
   
  +
=== Conclusion ===
It can thus produce a few primes e.g. above <code>239812076741689</code>, which is a square of the millionth odd prime, without having to compute all the preceding primes (which would number in trillions).
 
  +
So it really pays off to analyse the code properly instead of just labeling it ''"naive"''. BTW were this divisibility testing ''somehow turned'' into an '''O(1)''' operation, e.g. by some kind of massive parallelization, the overall complexity would drop to '''O(n)'''. It's the sequantiality of testing that's the culprit. Though of course the proper multiples-removing S. of E. is a better candidate for parallelization.
   
  +
So the initial Turner code is just a ''one-liner'' that ought to have been regarded as ''specification'' only, in the first place, not a code to be executed as such. The reason it was taught that way is probably so that it could provoke this discussion among the students. ''To regard it as code is what's been naive all along''.
=== Getting the Composite Numbers Out ===
 
   
The divisibility testing too could be considered a specification (as in ''no multiples of p''), and not a code per se, because although testing ''composites'' is cheap (as most of them will have small factors, so the test is soon aborted), testing ''prime'' numbers is costly (as ''all'' of the divisibility tests will get performed, and failed, for each new prime), and is to be avoided if possible.
 
   
==== Euler's Sieve ====
+
== Euler's Sieve ==
With each found prime Euler's sieve [http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data-List-Ordered.html#v:minus removes] all its multiples ''in advance'' so that at each step the list to process is guaranteed to have ''no multiples'' of any of the preceding primes in it (consists only of numbers ''coprime'' with all the preceding primes) and thus starts with the next prime:
+
With each found prime Euler's sieve removes all its multiples ''in advance'' so that at each step the list to process is guaranteed to have ''no multiples'' of any of the preceding primes in it (consists only of numbers ''coprime'' with all the preceding primes) and thus starts with the next prime:
   
 
<haskell>
 
<haskell>
  +
primesEU = euler [2..] where
import Data.List.Ordered (minus)
 
  +
euler (p:xs) = p : euler (xs `minus` map (p*) (p:xs))
 
euler xs@(p:xt) = p : euler (xt `minus` map (p*) xs)
 
primes = euler [2..]
 
 
</haskell>
 
</haskell>
   
This code is extremely inefficient, running at about O(<math>{n^{2.4}}</math>) complexity, and should be regarded a ''specification'' only. It works very hard trying to avoid double hits on multiples, which are relatively few and cheap to deal with in the first place (''[[Prime_numbers#The Classic Turner.27s Sieve |Turner's sieve]] could actually be seen as its rendition'', substituting built-in &nbsp;<code>filter</code>&nbsp; for &nbsp;<code>minus</code> - producing the same results and thus mathematically equivalent yet algorithmically different, turning a sieve into a trial division, essentially).
+
This code is extremely inefficient, running at about O(<math>{n^{2.4}}</math>) complexity, and should be regarded a ''specification'' only. It works very hard trying to avoid double hits on multiples, which are relatively few and cheap to deal with in the first place.
  +
  +
=== Wheeled list representation ===
   
 
The situation can be improved using a different list representation, for generating lists not from a last element and an increment, but rather a last span and an increment, which entails a set of helpful equivalences:
 
The situation can be improved using a different list representation, for generating lists not from a last element and an increment, but rather a last span and an increment, which entails a set of helpful equivalences:
Line 167: Line 295:
 
This runs at about O(<math>{n^{1.5}}</math>) complexity, for <code>n</code> primes produced.
 
This runs at about O(<math>{n^{1.5}}</math>) complexity, for <code>n</code> primes produced.
   
==== Postponed Multiples Removal ====
 
It is a small modification to [[Prime_numbers#Postponed Filters Sieve | Postponed Filters sieve]] to make it generate and ''remove the multiples'' instead of ''testing each'' candidate:
 
   
  +
== Using Immutable Arrays ==
<haskell>
 
primes = 2: 3: sieve (tail primes) [5,7..]
 
where
 
sieve (p:ps) xs = h ++ sieve ps (t `minus` [q,q+2*p..])
 
where q = p*p
 
(h,t) = span (< q) xs
 
</haskell>
 
   
  +
=== Generating Segments of Primes ===
It is similar to Euler's sieve in that it removes multiples of each prime in advance from a numbers supply at each step before it is fed into the next step corresponding to the next prime; but will generate, and try to remove, some of the multiples more than once, like e.g. <code>3*15</code> and <code>5*9</code>, unlike the true Euler's sieve which wouldn't generate the latter. So it is actually a kind of Sieve of Eratosthenes, removing primes' multiples while generating (counting) them one by one - just not the most computationally efficient one.
 
   
  +
The removal of multiples on each segment of odds can be done by actually marking them in an array instead of manipulating the ordered lists, and can be further sped up more than twice by working with odds only, represented as their offsets in segment arrays:
===== Multiples Removal on Generated Spans, or Sieve of Eratosthenes =====
 
 
[[Prime_numbers#Postponed multiples removal | Postponed multiples removal]], with work done segment-wise between the successive primes squares in similarity to the [[Prime_numbers#Generated Spans, by List of Primes|Generated Spans]], becomes
 
 
<haskell>
 
primes = 2: 3: sieve (tail primes) 3 []
 
where
 
sieve (p:ps) x fs = let q=p*p in
 
foldr (flip minus) [x+2,x+4..q-2]
 
[[y+s,y+2*s..q] | (s,y) <- fs]
 
++ sieve ps q
 
((2*p,q):[(s,q-rem (q-y) s) | (s,y) <- fs])
 
</haskell>
 
 
This "marks" the odd composites in a given range by generating them - just as a person performing the original sieve of Eratosthenes would do, counting ''one by one'' the multiples of the relevant primes. These composites are independently generated so some will be generated multiple times. Rearranging the chain of subtractions into a subtraction of merged streams and using [http://ideone.com/pfREP tree-like folding] further speeds up the code and improves its asympthotic behaviour.
 
 
The advantage in working with spans explicitly is that this code is easily amendable to using arrays for the composites marking and removal on each <i>finite</i> span.
 
 
==== Merged Multiples Removal Sieve ====
 
The left-associating <code>(...((xs-a)-b)-...)</code> [[Prime_numbers#Postponed Multiples Removal | above]] buries the ''initial numbers supply'' <code>xs</code> (here, <code>[5,7..]</code> list) deeper and deeper on the left, with time. But it is the same as <code>(xs-(a+b+...))</code>, and so we can just remove from it the infinite primes multiples lists (all [http://hackage.haskell.org/packages/archive/data-ordlist/latest/doc/html/Data-List-Ordered.html#v%3Aunion united] into one without duplicates), each starting at its seeding prime's square (arriving at what is essentially equivalent to the Richard Bird's code from Melissa O'Neill's article). This way we needn't explicitly jump to a prime's square because it's where its multiples start anyway:
 
<haskell>
 
import Data.List.Ordered (minus,union)
 
 
primes = 2: 3: [5,7..] `minus` foldr union' []
 
[ [p*p,p*p+2*p..] | p<- tail primes ]
 
where
 
union' (q:qs) xs = q : union qs xs
 
</haskell>
 
 
<code>union'</code> is discriminating on its first argument's head element to stop <code>foldr</code> from looping.
 
 
This code is yet faster. Its main deficiency still is that it creates a ''linear'' nested merging structure instead of a ''tree-like'' structure. Each multiple produced by a prime has to percolate to the top eventually, so it's better to make its path shorter. It'll have to go through fewer merge nodes this way.
 
 
==== Treefold Merged Multiples Removal ====
 
The new folding functions <code>foldt</code> and <code>pairs</code> too just have to be made discriminating on their first agrument to prevent looping, as <code>union'</code> above:
 
 
<haskell>
 
primes = 2: 3: 5: [7,9..] `minus` foldt
 
[ [p*p,p*p+2*p..] | p <- tail primes ]
 
where
 
foldt ((x:xs):t) = x : xs `union` foldt (pairs t)
 
pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t
 
</haskell>
 
 
The simple fold structure of <code>1+(2+(4+(8+...)))</code> can be further changed into <code>(2+4)+((4+8)+( ... ))</code> structure better adjusted for primes multiples production, giving it about 5% additional speedup.
 
 
It can be also further improved with the wheel optimization (as seen in [[Prime_numbers#Euler.27s_Sieve | Euler's Sieve]] above).
 
 
==== Treefold Merged Multiples, with Wheel ====
 
The odds-only feed <code>3: 5: [7,9..]</code> above is exactly what <code>fromSpan([3],2)</code> produces. Using a larger wheel means all the multiples of the starting few primes, and not just of <code>2</code>, will be automatically excluded in advance. This finally leads to:
 
 
<haskell>
 
{-# OPTIONS_GHC -O2 -fno-cse #-}
 
primes = 2:3:5:7: gaps 11 wheel (join $ roll 11 wheel primes')
 
where
 
primes' = 11: gaps 13 (tail wheel) (join $ roll 11 wheel primes')
 
 
gaps k ws@(w:t) cs@(c:u) | k==c = gaps (k+w) t u
 
| True = k : gaps (k+w) t cs
 
roll k ws@(w:t) ps@(p:u) | k==p = scanl (\c d->c+p*d) (p*p) ws
 
: roll (k+w) t u
 
| True = roll (k+w) t ps
 
join ((x:xs): ~(ys:zs:t)) = x : union xs (union ys zs)
 
`union` join (pairs t)
 
pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t
 
 
wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
 
4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
 
</haskell>
 
 
The double primes feed is introduced here to save memory as per Melissa O'Neill's code, and is dependent on no expression sharing being performed by a compiler. Running at comparable speed on ''[http://ideone.com/willness/primes Ideone.com]'' as [[Prime_numbers#Generating Segments of Primes | immutable arrays]] version, yet unlike it near constant and very low in its memory usage (low MBs for first few million primes), it could be considered a best performing "functional" (list-based) code version here.
 
 
Compared with Melissa O’Neill’s considerably more complex priority queue-based code, it runs (compiled by ghc 6.10.1 with -O2 switch) about 1.6x slower at producing 1 million primes, and about 1.7x slower at 10 to 20 million, with empirical time complexity of O(n^1.22) vs. O(n^1.20), both versions having same low and near-constant memory consumption.
 
 
=== Using Immutable Arrays ===
 
 
==== Generating Segments of Primes ====
 
 
The removal of multiples on each segment of odds in the [[Prime_numbers#Multiples Removal on Generated Spans, or Sieve of Eratosthenes | Sieve of Eratosthenes]] can be done by actually marking them in an array instead of manipulating the ordered lists, and can be further sped up more than twice by working with odds only, represented as their offsets in segment arrays:
 
   
 
<haskell>
 
<haskell>
Line 272: Line 314:
 
</haskell>
 
</haskell>
   
Apparently, arrays are ''fast''. The above is the fastest code of all presented so far. When run on ''[http://ideone.com/willness/primes Ideone.com]'' it is somewhat faster than [[Prime_numbers#Treefold Merged Multiples.2C with Wheel | wheeled treefold]] in producing first few million primes, but is very much unacceptable in its memory consumption which grows faster than O(<math>{n}</math>), quickly getting into tens and hundreds of MBs.
+
Apparently, arrays are ''fast''. The above is the fastest code of all presented so far. When run on ''[http://ideone.com/willness/primes Ideone.com]'' it is somewhat faster than [[Prime_numbers#Tree Merging With Wheel]] in producing first few million primes, but is very much unacceptable in its memory consumption which grows faster than O(<math>{n}</math>), quickly getting into tens and hundreds of MBs.
   
==== Calculating Primes Upto a Given Value ====
+
=== Calculating Primes Upto a Given Value ===
   
 
<haskell>
 
<haskell>
Line 288: Line 330:
 
</haskell>
 
</haskell>
   
==== Calculating Primes in a Given Range ====
+
=== Calculating Primes in a Given Range ===
   
 
<haskell>
 
<haskell>
Line 308: Line 350:
 
Although using odds instead of primes, the array generation is so fast that it is very much feasible and even preferable for quick generation of some short spans of relatively big primes.
 
Although using odds instead of primes, the array generation is so fast that it is very much feasible and even preferable for quick generation of some short spans of relatively big primes.
   
=== Using Mutable Arrays ===
+
== Using Mutable Arrays ==
   
 
Using mutable arrays is the fastest but not the most memory efficient way to calculate prime numbers in Haskell.
 
Using mutable arrays is the fastest but not the most memory efficient way to calculate prime numbers in Haskell.
   
==== Using ST Array ====
+
=== Using ST Array ===
   
 
This method implements the Sieve of Eratosthenes, similar to how you might do it
 
This method implements the Sieve of Eratosthenes, similar to how you might do it
Line 344: Line 386:
 
Its empirical time complexity is improving with n (number of primes produced) from O(n^1.25) through O(n^1.20) towards O(n^1.16). The reference [http://ideone.com/FaPOB C++ vector-based implementation] exhibits this improvement in empirical time complexity too, from O(n^1.5) gradually towards O(n^1.12), where tested ''(which might be interpreted as evidence towards the expected <math>O(n*\log n*\log(\log n))</math> complexity)''.
 
Its empirical time complexity is improving with n (number of primes produced) from O(n^1.25) through O(n^1.20) towards O(n^1.16). The reference [http://ideone.com/FaPOB C++ vector-based implementation] exhibits this improvement in empirical time complexity too, from O(n^1.5) gradually towards O(n^1.12), where tested ''(which might be interpreted as evidence towards the expected <math>O(n*\log n*\log(\log n))</math> complexity)''.
   
==== Bitwise prime sieve with Template Haskell ====
+
=== Bitwise prime sieve with Template Haskell ===
   
 
Count the number of prime below a given 'n'. Shows fast bitwise arrays,
 
Count the number of prime below a given 'n'. Shows fast bitwise arrays,
Line 403: Line 445:
 
</haskell>
 
</haskell>
   
=== Implicit Heap ===
+
== Implicit Heap ==
   
 
The following is an original implicit heap implementation for the sieve of
 
The following is an original implicit heap implementation for the sieve of
Eratosthenes. The [[Prime_numbers#Treefold_Merged_Multiples.2C_with_Wheel | treefold merge]] section above simplifies it, removing the <code>People a</code> structure altogether, and improves upon it by using a folding tree structure better adjusted for primes processing, and a [[Prime_numbers#Euler.27s_Sieve | wheel]] optimization.
+
Eratosthenes, kept here for historical record. The [[Prime_numbers#Tree Merging with Wheel]] section above simplifies it, removing the <code>People a</code> structure altogether, and improves upon it by using a folding tree structure better adjusted for primes processing, and a [[Prime_numbers#Euler.27s_Sieve | wheel]] optimization.
   
 
See also the message threads [http://thread.gmane.org/gmane.comp.lang.haskell.cafe/25270/focus=25312 Re: "no-coding" functional data structures via lazyness] for more about how merging ordered lists amounts to creating an implicit heap and [http://thread.gmane.org/gmane.comp.lang.haskell.cafe/26426/focus=26493 Re: Code and Perf. Data for Prime Finders] for an explanation of the <code>People a</code> structure that makes it work.
 
See also the message threads [http://thread.gmane.org/gmane.comp.lang.haskell.cafe/25270/focus=25312 Re: "no-coding" functional data structures via lazyness] for more about how merging ordered lists amounts to creating an implicit heap and [http://thread.gmane.org/gmane.comp.lang.haskell.cafe/26426/focus=26493 Re: Code and Perf. Data for Prime Finders] for an explanation of the <code>People a</code> structure that makes it work.
Line 449: Line 491:
 
<code>nonprimes</code> effectively implements a heap, exploiting lazy evaluation.
 
<code>nonprimes</code> effectively implements a heap, exploiting lazy evaluation.
   
=== Prime Wheels ===
+
== Prime Wheels ==
   
 
The idea of only testing odd numbers can be extended further. For instance, it is a useful fact that every prime number other than 2 and 3 must be of the form <math>6k+1</math> or <math>6k+5</math>. Thus, we only need to test these numbers:
 
The idea of only testing odd numbers can be extended further. For instance, it is a useful fact that every prime number other than 2 and 3 must be of the form <math>6k+1</math> or <math>6k+5</math>. Thus, we only need to test these numbers:
Line 508: Line 550:
   
   
=== Using IntSet for a traditional sieve ===
+
== Using IntSet for a traditional sieve ==
 
<haskell>
 
<haskell>
   
Line 531: Line 573:
 
| otherwise = worker (findNext (x+1) is) (mark x is)
 
| otherwise = worker (findNext (x+1) is) (mark x is)
 
</haskell>
 
</haskell>
  +
  +
''(doesn't look like it runs very efficiently)''.
   
 
== Testing Primality ==
 
== Testing Primality ==

Revision as of 13:07, 28 May 2011

In mathematics, a prime number (or a prime) is a natural number which has exactly two distinct natural number divisors: 1 and itself. The smallest prime is thus 2.

Prime Number Resources

  • HackageDB packages:
    • Numbers: An assortment of number theoretic functions.
    • NumberSieves: Number Theoretic Sieves: primes, factorization, and Euler's Totient.
    • primes: Efficient, purely functional generation of prime numbers.

Any natural number is representable as a product of powers of its prime factors, and so a prime has no prime divisors other than itself. That means that starting with 2, for each newly found prime we can eliminate from the rest of the numbers all such numbers that have this prime as their divisor, giving us the next available number as next prime. This is known as sieving the natural numbers, so that in the end what we are left with are just primes.

To eliminate a prime's multiples from the result we can either a. plainly test each new candidate number for divisibility by that prime with a direct test, giving rise to a kind of "trial division" algorithm; or b. we can find out multiples of a prime p by counting up from it by p numbers at a time, resulting in a variant of a "genuine sieve" as it was reportedly originally conceived by Eratosthenes in ancient Greece.

Having a direct-access mutable arrays indeed enables easy marking off of these multiples on pre-allocated array as it is usually done in imperative languages; but to get an efficient list-based code we have to be smart about combining those streams of multiples of each prime - which gives us also the memory efficiency in generating the results one by one.

Sieve of Eratosthenes

Sieve of Eratosthenes is genuinely represented by

 -- genuine yet wasteful Eratosthenes
 primesTo m = 2 : sieve [3,5..m]  where
    sieve []     = []
    sieve (p:xs) = p : sieve (xs `minus` [p,p+2*p..m])

This should be regarded more like a specification, not a code. It is extremely slow, running at empirical time complexities worse than quadratic in number of primes produced. But it has the core defining features of S. of E. as a) being bounded, i.e. having a top limit value, and b) finding out the multiples of a prime by counting up from it. Yes, this is exactly how Eratosthenes defined it (Nicomachus, Introduction to Arithmetic, I, pp. 13, 31).

The canonical list-difference minus and duplicates-removing list-union union functions dealing with ordered increasing lists - infinite as well as finite - are simple enough to define (using compare has an effect of comparing the values only once, unlike when using (<) etc):

 -- ordered lists, difference and union
 minus (x:xs) (y:ys) = case (compare x y) of 
           LT -> x : minus  xs  (y:ys)
           EQ ->     minus  xs     ys 
           GT ->     minus (x:xs)  ys
 minus  xs     _     = xs
 union (x:xs) (y:ys) = case (compare x y) of 
           LT -> x : union  xs  (y:ys)
           EQ -> x : union  xs     ys 
           GT -> y : union (x:xs)  ys
 union  xs     ys    = xs ++ ys

(the name merge ought to be reserved for duplicates-preserving merging as used by mergesort - that's why we use union here, following Leon P.Smith's Data.List.Ordered package).

Analysis

So what does it do, this sieve code? For each found prime it removes its odd multiples from further consideration. It finds them by counting up in steps of 2p. There are thus multiples for each prime, and multiples total, with duplicates, by virtue of prime harmonic series, where .

If each multiple is dealt with in time, this will translate into RAM-mashine operations (since we consider addition as an atomic operation). Indeed, mutable random-access arrays allow for that. But lists in Haskell don't, and complexity of minus(a,b) is about , i.e. here it is which is according to the notice about the Φ-function in Melissa O'Neill's article.

It looks like . Since the number of primes below m is by the prime number theorem (where is a prime counting function), there will be k multiples-removing steps in the algorithm; it means total complexity of , or in n primes produced - much much worse than the optimal .

Squared

But we can start each step at the prime's square, as its smaller multiples will be already processed on previous steps. This means we can stop early, when the prime's square reaches the top value m, and thus cut the total number of steps to around . This does not in fact change the complexity of random-access code, but for lists it makes it , or in n primes produced, showing an enormous speedup in practice:

 primesToQ m = 2 : sieve [3,5..m]  where
    sieve []     = []
    sieve (p:xs) = p : sieve (xs `minus` [p*p,p*p+2*p..m])

Its empirical complexity is about . This simple optimization of starting from a prime's square has dramatic effect here because our formulation is bounded, in accordance with the original algorithm. This has the desired effect of stopping early, turning the unacceptably slow initial specification into a code, yet-far-from-optimal and slow, but acceptably so.

Guarded

This ought to be explicated (improving on clarity though not on time complexity) as in the following, for which it is indeed a minor optimization whether to start from p or p*p - but only after we've went to the necessary trouble of explicitly stopping early on:

 primesToG m = 2 : sieve [3,5..m]  where
    sieve (p:xs) 
       | p*p > m = p : xs
       | True    = p : sieve (xs `minus` [p*p,p*p+2*p..m])

Did Eratosthenes himself achieve the optimal complexity? It rather seems doubtful, as he probably counted boxes in the table by 1 to go from one number to the next, as in 3,5,7,9,11,13,15, ... for he had no access even to Hindu numerals, using Greek alphabet for writing down numbers instead. Was he performing a genuine sieve of Eratosthenes then? We'll leave that as an open question.

Accumulating Array

So while minus(a,b) takes operations for random-access imperative arrays and about operations for lists here, using Haskell's immutable array for a one could expect the array update time to be indeed but it looks like it's not so:

 primesToA m = sieve 3 (array (3,m) [(i,True) | i<-[3..m]] //
                                    [(i,False) | i<-[4,6..m]])
  where
   sieve p a 
    | p*p > m   = 2 : [i | (i,True) <- assocs a]
    | a!p       = sieve (p+2) $ a // [(i,False) | i <- [p*p,p*p+2*p..m]]
    | otherwise = sieve (p+2) a

It's much slower than the above, though it should be running at optimal complexity on implementations which are able to properly use the destructive update here, for an array being passed along as an accumulating parameter, it seems.

How this implementation deficiency is to be overcome? One way is to use explicitly mutable monadic arrays (see below), but we can also think about it a little bit more on the functional side of things.

Postponed

Going back to guarded Eratosthenes, first we notice that though it works with minimal number of prime multiples streams, it still starts working with each a bit prematurely. Fixing this with explicit synchronization won't change complexity but will speed it up some more:

 primesPE = 2 : primes'
  where 
    primes' = sieve [3,5..] primes' 9
    sieve (p:xs) ps@ ~(_:t) q
       | p < q  = p : sieve xs ps q
       | True   =     sieve (xs `minus` [q,q+2*p..]) t (head t^2)

Since the removal of a prime's multiples here starts at the right moment, and not just from the right place, the code could now finally be made unbounded. Because no multiples-removal process is started prematurely, there are no extraneous mutiples streams, which were the reason for the extreme wastefulness and thus inefficiency of the original formulation.

Segmented

With work done segment-wise between the successive primes squares it becomes

  primes = 2: 3: sieve (tail primes) 3 []
   where
    sieve (p:ps) x fs = let q=p*p in
        foldr (flip minus) [x+2,x+4..q-2] 
                           [[y+s,y+2*s..q] | (s,y) <- fs]
        ++ sieve ps q 
               ((2*p,q):[(s,q-rem (q-y) s) | (s,y) <- fs])

This "marks" the odd composites in a given range by generating them - just as a person performing the original sieve of Eratosthenes would do, counting one by one the multiples of the relevant primes. These composites are independently generated so some will be generated multiple times. Rearranging the chain of subtractions into a subtraction of merged streams and using tree-like folding further speeds up the code and improves its asympthotic behaviour.

The advantage in working with spans explicitly is that this code is easily amendable to using arrays for the composites marking and removal on each finite span.

Linear merging

But what does the Postponed code do, operationally? For each prime's square passed over, it builds up a nested linear left-deepening structure, (...((xs-a)-b)-...), where xs is the original odds-producing [3,5..] list, so that each odd it produces must go through `minus` nodes on its way up - and those odd numbers that eventually emerge on top are prime. Thinking a bit about it, wouldn't another, right-deepening structure, (xs-(a+(b+...))), be better?

Here, xs would stay near the top, and more frequently odds-producing streams of multiples of smaller primes would be above those of the bigger primes, which produce less frequently their candidates which have to pass through more `minus` nodes on their way up. Plus, no explicit synchronization is necessary anymore because the produced multiples of a prime start at its square anyway - just some care has to be taken to avoid a runaway access to the infinitely-defined structure (specifically, if each (+) operation passes along unconditionally its left child's head value before polling the right child's head value, then we are safe).

Here's the code, faster yet but still with about same time complexity of , which is essentially equivalent to the Richard Bird's code from Melissa O'Neill's article:

 {-# OPTIONS_GHC -O2 -fno-cse #-}
 primesLME = 2 : ([3,5..] `minus` fold [[p*p,p*p+2*p..] | p <- primes']) 
  where
   primes' = 3 : ([5,7..] `minus` fold [[p*p,p*p+2*p..] | p <- primes'])   
   fold ((x:xs):t) = x : union xs (fold t)

The double primes feed is introduced here to prevent unneeded memoization and thus save memory, as per Melissa O'Neill's code, and is dependent on no expression sharing being performed by a compiler - space complexity here is probably but in practice it's negligible compared to run-time system memory itself and thus the total memory is very nearly .

Tree merging

Moreover, it can be changed into a tree structure:

 {-# OPTIONS_GHC -O2 -fno-cse #-}
 primesTME = 2 : ([3,5..] `minus` foldt [[p*p,p*p+2*p..] | p <- primes']) 
  where
   primes' = 3 : ([5,7..] `minus` foldt [[p*p,p*p+2*p..] | p <- primes'])   
   foldt ((x:xs):t)    = x : union xs (foldt (pairs t))
   pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t

It is very fast, running at speeds and empirical complexities comparable with the code from Melissa O'Neill's article (about in number of primes n produced).

Tree merging with Wheel

Furthermore, wheel factorization optimization can be applied to it, and another tree structure can be used, better adjusted for the primes multiples production (effecting an additional 5%~10% speedup):

 {-# OPTIONS_GHC -O2 -fno-cse #-}
 primesTMWE = 2:3:5:7: gaps 11 wheel (fold3t $ roll 11 wheel primes')
  where
   primes' = 11: gaps 13 (tail wheel) (fold3t $ roll 11 wheel primes')
   fold3t ((x:xs): ~(ys:zs:t)) = x : union xs (union ys zs)    
                                      `union` fold3t (pairs t)  
   pairs ((x:xs):ys:t)         = (x : union xs ys) : pairs t
   wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
           4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
   gaps k ws@(w:t) cs@(c:u) | k==c  = gaps (k+w) t u 
                            | True  = k : gaps (k+w) t cs  
   roll k ws@(w:t) ps@(p:u) | k==p  = scanl (\c d->c+p*d) (p*p) ws 
                                        : roll (k+w) t u 
                            | True  = roll (k+w) t ps

The original idea of the right-deepening unbounded linear merging of multiples is due to Richard Bird (presented in Melissa O'Neill's article), and the original idea of using a tree-like folding structure is due to Dave Bayer, on haskell-cafe mailing list.

Turner's sieve - Trial division

David Turner's original 1975 formulation (SASL Language Manual, 1975) replaces non-standard `minus` with stock list comprehension with rem filtering in the sieve of Eratosthenes, turning it into a kind of trial division algorithm:

 -- unbounded sieve, premature filters
 primesT = 2 : sieve [3,5..]  where
    sieve (p:xs) = p : sieve [x | x<-xs, rem x p /= 0]
                          -- filter ((/=0).(`rem`p)) xs

This creates an immense number of superfluous implicit filters in extremely premature fashion. To be admitted as prime, each number will be tested for divisibility here by all its preceding primes, while just those not greater than its square root would suffice. To find e.g. the 1001st prime (7927), 1000 filters are used, when in fact just the first 24 are needed (up to 89's filter only). Operational overhead is enormous here.

Guarded

When we force ourselves away from the Quest for a Mythical One-Liner it really ought to be written at least as bounded and guarded variety (if not abandoned right away in favor of algorithmically superior sieve of Eratosthenes), yet again achieving the miraculous complexity improvement from above quadratic to about empirically (in n primes produced):

 primesToGT m = 2 : sieve [3,5..m]  where
    sieve (p:xs) 
       | p*p > m = p : xs
       | True    = p : sieve [x | x<-xs, rem x p /= 0]
                          -- filter ((/=0).(`rem`p)) xs

Postponed

or, better, as a postponed, unbounded variety:

 primesPT = 2 : primes'
  where 
    primes' = sieve [3,5..] primes' 9
    sieve (p:xs) ps@ ~(_:t) q
       | p < q   = p : sieve xs ps q
       | True    =     sieve [x | x<-xs, rem x p /= 0] t (head t^2)
                          -- filter ((/=0).(`rem`p)) xs

Segmented

A certain further speedup (though not in complexity which stays the same) can be achieved by explicating the nested filters structure created here as well at run-time, turning a list of filters into a filter by an explicit list of primes (which is just a prefix of the primes list itself that is being built):

 primesST = 2 : primes' 
  where
    primes' = 3 : sieve primes' 3 0
    sieve (p:ps) x k = 
       let pfx = take k primes'
       in  [n | n <- [x+2,x+4..p*p-2], and [rem n f /= 0 | f <- pfx]]
           -- filter (\n-> all ((/=0).(n`rem`)) pfx) [x+2,x+4..p*p-2]
           ++ sieve ps (p*p) (k+1)

This seems to be closest to optimal trial division, with most recalculations eliminated, explicitly filtering composites out from batches of odds between the consequent squares of primes. All these variants of course being variations of trial division, finding out primes by direct divisibility testing of every odd number by all primes below its square root (instead of just its factors, which is what eliminating the multiples is doing, essentially), are thus principally of worse complexities than the sieve of Eratosthenes; but one far worse than another, yet fixed easily from a wasteful monstrosity to almost acceptible performance (at least for the first few hunderd thousand primes, when compiled) just by following the proper definition of the sieve as being bounded, simply with the guarded formulation.

Another task is to produce primes above a given number (not knowing their ordinal numbers this time), which will demand compute just the primes below its square root:

  primesFrom m   = sieve ps (m`div`2*2+1) (length h)
    where 
      (h,(_:ps)) = span (<= (floor.sqrt$fromIntegral m+1)) primesST

Conclusion

So it really pays off to analyse the code properly instead of just labeling it "naive". BTW were this divisibility testing somehow turned into an O(1) operation, e.g. by some kind of massive parallelization, the overall complexity would drop to O(n). It's the sequantiality of testing that's the culprit. Though of course the proper multiples-removing S. of E. is a better candidate for parallelization.

So the initial Turner code is just a one-liner that ought to have been regarded as specification only, in the first place, not a code to be executed as such. The reason it was taught that way is probably so that it could provoke this discussion among the students. To regard it as code is what's been naive all along.


Euler's Sieve

With each found prime Euler's sieve removes all its multiples in advance so that at each step the list to process is guaranteed to have no multiples of any of the preceding primes in it (consists only of numbers coprime with all the preceding primes) and thus starts with the next prime:

  primesEU = euler [2..] where
    euler (p:xs) = p : euler (xs `minus` map (p*) (p:xs))

This code is extremely inefficient, running at about O() complexity, and should be regarded a specification only. It works very hard trying to avoid double hits on multiples, which are relatively few and cheap to deal with in the first place.

Wheeled list representation

The situation can be improved using a different list representation, for generating lists not from a last element and an increment, but rather a last span and an increment, which entails a set of helpful equivalences:

  {- fromElt (x,i) = x : fromElt (x + i,i)
                       === iterate (+ i) x
     [n..]             === fromElt (n,1)     -}

  fromSpan (xs,i)  = xs ++ fromSpan (map (+ i) xs,i)

  {-                   === concat $ iterate (map (+ i)) xs
     [n..]             === fromSpan ([n],1)
     fromSpan (p:xt,i) === p : fromSpan (xt ++ [p + i], i)  
     fromSpan (xs,i) `minus` fromSpan (ys,i) 
                       === fromSpan (xs `minus` ys, i)  
     map (p*) (fromSpan (xs,i)) 
                       === fromSpan (map (p*) xs, p*i)
     fromSpan (xs,i)   === forall (p > 0).
       fromSpan (concat $ take p $ iterate (map (+ i)) xs, p*i) -}

  eulerS () = iterate eulerStep ([2],1)
  eulerStep (xs@(p:_), i) = 
       ( (tail . concat . take p . iterate (map (+ i))) xs
          `minus` map (p*) xs, p*i )

  {- > mapM_ print $ take 4 $ eulerS ()
     ([2],1)
     ([3],2)
     ([5,7],6)
     ([7,11,13,17,19,23,29,31],30)  -}

This is where the notion of wheel comes from naturally, all by itself. For each wheel specification w@((p:_),_) produced by eulerStep, the numbers in (fromSpan w) up to are all primes too, so that

  eulerPrimesTo m = if m > 1 then go ([2],1) else []
    where
      go w@((p:_), _) 
         | m < p*p = takeWhile (<= m) (fromSpan w)
         | True    = p : go (eulerStep w)

This runs at about O() complexity, for n primes produced.


Using Immutable Arrays

Generating Segments of Primes

The removal of multiples on each segment of odds can be done by actually marking them in an array instead of manipulating the ordered lists, and can be further sped up more than twice by working with odds only, represented as their offsets in segment arrays:

  primes = 2: 3: sieve (tail primes) 3 []
   where 
    sieve (p:ps) x fs = [i*2 + x | (i,e) <- assocs a, e] 
                        ++ sieve ps (p*p) fs'
     where
      q     = (p*p-x)`div`2                  
      fs'   = (p,0) : [(s, rem (y-q) s) | (s,y) <- fs]
      a     = accumArray (\ b c -> False) True (1,q-1)
                         [(i,()) | (s,y) <- fs, i <- [y+s,y+s+s..q]]

Apparently, arrays are fast. The above is the fastest code of all presented so far. When run on Ideone.com it is somewhat faster than Prime_numbers#Tree Merging With Wheel in producing first few million primes, but is very much unacceptable in its memory consumption which grows faster than O(), quickly getting into tens and hundreds of MBs.

Calculating Primes Upto a Given Value

  primesToN n = 2: [i | i <- [3,5..n], ar ! i]
   where
    ar = f 5 $ accumArray (\ a b -> False) True (3,n) 
                        [(i,()) | i <- [9,15..n]]
    f p a | q > n = a
          | True  = if null x then a' else f (head x) a'
      where q = p*p
            a'= a // [(i,False) | i <- [q,q+2*p..n]]
            x = [i | i <- [p+2,p+4..n], a' ! i]

Calculating Primes in a Given Range

  primesFromTo a b = (if a<3 then [2] else []) 
                     ++ [i | i <- [o,o+2..b], ar ! i]
   where 
    o  = max (if even a then a+1 else a) 3
    r  = floor.sqrt.fromInteger $ b+1
    ar = accumArray (\a b-> False) True (o,b) 
          [(i,()) | p <- [3,5..r]
                    , let q  = p*p 
                          s  = 2*p 
                          (n,x) = quotRem (o - q) s 
                          q' = if  o <= q  then q
                               else  q + (n + signum x)*s
                    , i <- [q',q'+s..b] ]

Although using odds instead of primes, the array generation is so fast that it is very much feasible and even preferable for quick generation of some short spans of relatively big primes.

Using Mutable Arrays

Using mutable arrays is the fastest but not the most memory efficient way to calculate prime numbers in Haskell.

Using ST Array

This method implements the Sieve of Eratosthenes, similar to how you might do it in C, modified to work on odds only. It is fast, but about linear in memory consumption, allocating one (though apparently bit-packed) array for the whole sequence produced.

import Control.Monad
import Control.Monad.ST
import Data.Array.IArray
import Data.Array.MArray
import Data.Array.ST
import Data.Array.Unboxed

primesToNA :: Int -> UArray Int Bool
primesToNA n = runSTUArray sieve  where
  sieve = do
    let m = (n-1)`div`2
    a <- newArray (1,m) True :: ST s (STUArray s Int Bool)
    let sr = floor . (sqrt::Double->Double) . fromIntegral $ n+1
    forM_ [1..sr`div`2] $ \i -> do
      let ii = 2*i*(i+1)     -- == ((2*i+1)^2-1)`div`2
      si <- readArray a i
      when si $
        forM_ [ii,ii+i+i+1..m] $ \j -> writeArray a j False
    return a
 
primesToN :: Int -> [Int]
primesToN n = 2:[i*2+1 | (i,e) <- assocs . primesToNA $n, e]

Its empirical time complexity is improving with n (number of primes produced) from O(n^1.25) through O(n^1.20) towards O(n^1.16). The reference C++ vector-based implementation exhibits this improvement in empirical time complexity too, from O(n^1.5) gradually towards O(n^1.12), where tested (which might be interpreted as evidence towards the expected complexity).

Bitwise prime sieve with Template Haskell

Count the number of prime below a given 'n'. Shows fast bitwise arrays, and an example of Template Haskell to defeat your enemies.

{-# OPTIONS -O2 -optc-O -XBangPatterns #-}
module Primes (nthPrime) where

import Control.Monad.ST
import Data.Array.ST
import Data.Array.Base
import System
import Control.Monad
import Data.Bits

nthPrime :: Int -> Int
nthPrime n = runST (sieve n)

sieve n = do
    a <- newArray (3,n) True :: ST s (STUArray s Int Bool)
    let cutoff = truncate (sqrt $ fromIntegral n) + 1
    go a n cutoff 3 1

go !a !m cutoff !n !c
    | n >= m    = return c
    | otherwise = do
        e <- unsafeRead a n
        if e then
          if n < cutoff then
            let loop !j
                 | j < m     = do
                        x <- unsafeRead a j
                        when x $ unsafeWrite a j False
                        loop (j+n)
                 | otherwise = go a m cutoff (n+2) (c+1)
            in loop ( if n < 46340 then n * n else n `shiftL` 1)
           else go a m cutoff (n+2) (c+1)
         else go a m cutoff (n+2) c

And place in a module:

{-# OPTIONS -fth #-}
import Primes

main = print $( let x = nthPrime 10000000 in [| x |] )

Run as:

$ ghc --make -o primes Main.hs
$ time ./primes
664579
./primes  0.00s user 0.01s system 228% cpu 0.003 total

Implicit Heap

The following is an original implicit heap implementation for the sieve of Eratosthenes, kept here for historical record. The Prime_numbers#Tree Merging with Wheel section above simplifies it, removing the People a structure altogether, and improves upon it by using a folding tree structure better adjusted for primes processing, and a wheel optimization.

See also the message threads Re: "no-coding" functional data structures via lazyness for more about how merging ordered lists amounts to creating an implicit heap and Re: Code and Perf. Data for Prime Finders for an explanation of the People a structure that makes it work.

data People a = VIP a (People a) | Crowd [a]

mergeP :: Ord a => People a -> People a -> People a
mergeP (VIP x xt) ys                    = VIP x $ mergeP xt ys
mergeP (Crowd xs) (Crowd ys)            = Crowd $ merge  xs ys
mergeP xs@(Crowd (x:xt)) ys@(VIP y yt)  = case compare x y of
    LT -> VIP x $ mergeP (Crowd xt) ys
    EQ -> VIP x $ mergeP (Crowd xt) yt
    GT -> VIP y $ mergeP xs yt

merge :: Ord a => [a] -> [a] -> [a]
merge xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : merge xt ys
    EQ -> x : merge xt yt
    GT -> y : merge xs yt

diff xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : diff xt ys
    EQ ->     diff xt yt
    GT ->     diff xs yt

foldTree :: (a -> a -> a) -> [a] -> a
foldTree f ~(x:xs) = x `f` foldTree f (pairs xs)
    where pairs ~(x: ~(y:ys)) = f x y : pairs ys

primes, nonprimes :: [Integer]
primes    = 2:3:diff [5,7..] nonprimes
nonprimes = serve . foldTree mergeP . map multiples $ tail primes
    where
    multiples p = vip [p*p,p*p+2*p..]

    vip (x:xs)       = VIP x $ Crowd xs
    serve (VIP x xs) = x:serve xs
    serve (Crowd xs) = xs

nonprimes effectively implements a heap, exploiting lazy evaluation.

Prime Wheels

The idea of only testing odd numbers can be extended further. For instance, it is a useful fact that every prime number other than 2 and 3 must be of the form or . Thus, we only need to test these numbers:

primes :: [Integer]
primes = 2:3:primes'
  where
    1:p:candidates = [6*k+r | k <- [0..], r <- [1,5]]
    primes'        = p : filter isPrime candidates
    isPrime n      = all (not . divides n)
                       $ takeWhile (\p -> p*p <= n) primes'
    divides n p    = n `mod` p == 0

Here, primes' is the list of primes greater than 3 and isPrime does not test for divisibility by 2 or 3 because the candidates by construction don't have these numbers as factors. We also need to exclude 1 from the candidates and mark the next one as prime to start the recursion.

Such a scheme to generate candidate numbers first that avoid a given set of primes as divisors is called a prime wheel. Imagine that you had a wheel of circumference 6 to be rolled along the number line. With spikes positioned 1 and 5 units around the circumference, rolling the wheel will prick holes exactly in those positions on the line whose numbers are not divisible by 2 and 3.

A wheel can be represented by its circumference and the spiked positions.

data Wheel = Wheel Integer [Integer]

We prick out numbers by rolling the wheel.

roll (Wheel n rs) = [n*k+r | k <- [0..], r <- rs]

The smallest wheel is the unit wheel with one spike, it will prick out every number.

w0 = Wheel 1 [1]

We can create a larger wheel by rolling a smaller wheel of circumference n along a rim of circumference p*n while excluding spike positions at multiples of p.

nextSize (Wheel n rs) p =
  Wheel (p*n) [r' | k <- [0..(p-1)], r <- rs,
                      let r' = n*k+r, r' `mod` p /= 0]

Combining both, we can make wheels that prick out numbers that avoid a given list ds of divisors.

mkWheel ds = foldl nextSize w0 ds

Now, we can generate prime numbers with a wheel that for instance avoids all multiples of 2, 3, 5 and 7.

primes :: [Integer]
primes = small ++ large
    where
    1:p:candidates = roll $ mkWheel small
    small          = [2,3,5,7]
    large          = p : filter isPrime candidates
    isPrime n      = all (not . divides n) 
                       $ takeWhile (\p -> p*p <= n) large
    divides n p    = n `mod` p == 0

It's a pretty big wheel with a circumference of 210 and allows us to calculate the first 10000 primes in convenient time.

A fixed size wheel is fine, but how about adapting the wheel size while generating prime numbers? See Euler's Sieve above, or the functional pearl titled Lazy wheel sieves and spirals of primes for more.


Using IntSet for a traditional sieve

module Sieve where
import qualified Data.IntSet as I

-- findNext - finds the next member of an IntSet.
findNext c is | I.member c is = c
              | c > I.findMax is = error "Ooops. No next number in set."
              | otherwise = findNext (c+1) is

-- mark - delete all multiples of n from n*n to the end of the set
mark n is = is I.\\ (I.fromAscList (takeWhile (<=end) (map (n*) [n..])))
                where
                    end = I.findMax is

-- primes - gives all primes up to n 
primes n = worker 2 (I.fromAscList [2..n])
                where
                    worker x is 
                     | (x*x) > n = is
                     | otherwise = worker (findNext (x+1) is) (mark x is)

(doesn't look like it runs very efficiently).

Testing Primality

See Testing primality.

External links

A collection of prime generators; the file "ONeillPrimes.hs" contains one of the fastest pure-Haskell prime generators; code by Melissa O'Neill.
WARNING: Don't use the priority queue from that file for your projects: it's broken and works for primes only by a lucky chance.
test entries for (some of) the above code variants