Difference between revisions of "Prime numbers"

From HaskellWiki
Jump to navigation Jump to search
(→‎Segmented: synch w/ segmented gen&test)
Line 114: Line 114:
 
primesSE () = 2 : primes'
 
primesSE () = 2 : primes'
 
where
 
where
primes' = 3 : sieve primes' 3 9 []
+
primes' = 3 : sieve primes' 5 9 []
 
sieve (p:ps) x q fs =
 
sieve (p:ps) x q fs =
foldr (flip minus) [x+2,x+4..q-2]
+
foldr (flip minus) [x,x+2..q-2]
 
[[y+s,y+2*s..q] | (s,y) <- fs]
 
[[y+s,y+2*s..q] | (s,y) <- fs]
++ sieve ps q (head ps^2)
+
++ sieve ps (q+2) (head ps^2)
 
((2*p,q):[(s,q-rem (q-y) s) | (s,y) <- fs])
 
((2*p,q):[(s,q-rem (q-y) s) | (s,y) <- fs])
 
</haskell>
 
</haskell>
Line 124: Line 124:
 
This "marks" the odd composites in a given range by generating them - just as a person performing the original sieve of Eratosthenes would do, counting ''one by one'' the multiples of the relevant primes. These composites are independently generated so some will be generated multiple times. Rearranging the chain of subtractions into a subtraction of merged streams and using [http://ideone.com/pfREP tree-like folding] further speeds up the code and improves its asympthotic behaviour.
 
This "marks" the odd composites in a given range by generating them - just as a person performing the original sieve of Eratosthenes would do, counting ''one by one'' the multiples of the relevant primes. These composites are independently generated so some will be generated multiple times. Rearranging the chain of subtractions into a subtraction of merged streams and using [http://ideone.com/pfREP tree-like folding] further speeds up the code and improves its asympthotic behaviour.
   
The advantage in working with spans explicitly is that this code is easily amendable to using arrays for the composites marking and removal on each ''finite'' span.
+
The advantage in working with spans explicitly is that this code is easily amendable to using arrays for the composites marking and removal on each ''finite'' span; and memory usage can be kept in check by using fixed sized segments.
   
 
=== Linear merging ===
 
=== Linear merging ===

Revision as of 18:55, 10 October 2011

In mathematics, a prime number (or a prime) is a natural number which has exactly two distinct natural number divisors: 1 and itself. The smallest prime is thus 2.

Any natural number is representable as a product of powers of its prime factors, and so a prime has no prime divisors other than itself. That means that starting with 2, for each newly found prime we can eliminate from the rest of the numbers all such numbers that have this prime as their divisor, giving us the next available number as next prime. This is known as sieving the natural numbers, so that in the end what we are left with are just primes.

To eliminate a prime's multiples from the result we can either a. plainly test each new candidate number for divisibility by that prime with a direct test, giving rise to a kind of "trial division" algorithm; or b. we can find out multiples of a prime p by counting up from it by p numbers at a time, resulting in a variant of a "genuine sieve" as it was reportedly originally conceived by Eratosthenes in ancient Greece.

Having a direct-access mutable arrays indeed enables easy marking off of these multiples on pre-allocated array as it is usually done in imperative languages; but to get an efficient list-based code we have to be smart about combining those streams of multiples of each prime - which gives us also the memory efficiency in generating the results one by one.

Prime Number Resources

  • HackageDB packages:
    • Numbers: An assortment of number theoretic functions.
    • NumberSieves: Number Theoretic Sieves: primes, factorization, and Euler's Totient.
    • primes: Efficient, purely functional generation of prime numbers.
  • Papers:
    • O'Neill, Melissa E., "The Genuine Sieve of Eratosthenes", Journal of Functional Programming, Published online by Cambridge University Press 9 October 2008 doi:10.1017/S0956796808007004.

Sieve of Eratosthenes

Sieve of Eratosthenes is genuinely represented by

 -- genuine yet wasteful sieve of Eratosthenes
 primesTo m = 2 : eratos [3,5..m]  where
    eratos []     = []
    eratos (p:xs) = p : eratos (xs `minus` [p,p+2*p..m])
 -- eulers (p:xs) = p : eulers (xs `minus` map (p*)(p:xs))
 -- turner (p:xs) = p : turner [x | x<-xs, rem x p /= 0]

This should be regarded more like a specification, not a code. It is extremely slow, running at empirical time complexities worse than quadratic in number of primes produced. But it has the core defining features of S. of E. as a. being bounded, i.e. having a top limit value, and b. finding out the multiples of a prime by counting up from it. Yes, this is exactly how Eratosthenes defined it (Nicomachus, Introduction to Arithmetic, I, pp. 13, 31).

The canonical list-difference minus and duplicates-removing list-union union functions dealing with ordered increasing lists - infinite as well as finite - are simple enough to define (using compare has an effect of comparing the values only once, unlike when using (<) etc):

 -- ordered lists, difference and union
 minus (x:xs) (y:ys) = case (compare x y) of 
           LT -> x : minus  xs  (y:ys)
           EQ ->     minus  xs     ys 
           GT ->     minus (x:xs)  ys
 minus  xs     _     = xs
 union (x:xs) (y:ys) = case (compare x y) of 
           LT -> x : union  xs  (y:ys)
           EQ -> x : union  xs     ys 
           GT -> y : union (x:xs)  ys
 union  xs     ys    = xs ++ ys

(the name merge ought to be reserved for duplicates-preserving merging as used by mergesort - that's why we use union here, following Leon P.Smith's Data.List.Ordered package).

Analysis

So what does it do, this sieve code? For each found prime it removes its odd multiples from further consideration. It finds them by counting up in steps of 2p. There are thus multiples for each prime, and multiples total, with duplicates, by virtues of prime harmonic series, where .

If each multiple is dealt with in time, this will translate into RAM-mashine operations (since we consider addition as an atomic operation). Indeed, mutable random-access arrays allow for that. Melissa O'Neill's article's stated goal was to show that so does efficient Priority Queue implementation in Haskell as well. But lists in Haskell are sequential, not random-access, and complexity of minus(a,b) is about , i.e. here it is which is according to the remark about the Φ-function in Melissa O'Neill's article.

It looks like . Since the number of primes below m is by the prime number theorem (where is a prime counting function), there will be k = n multiples-removing steps in the algorithm; it means total complexity of , or in n primes produced - much much worse than the optimal .

From Squares

But we can start each step at the prime's square, as its smaller multiples will be already processed on previous steps. This means we can stop early, when the prime's square reaches the top value m, and thus cut the total number of steps to around . This does not in fact change the complexity of random-access code, but for lists it makes it , or in n primes produced, showing an enormous speedup in practice:

 primesToQ m = 2 : sieve [3,5..m]  where
    sieve []     = []
    sieve (p:xs) = p : sieve (xs `minus` [p*p,p*p+2*p..m])

Its empirical complexity is about . This simple optimization of starting from a prime's square has dramatic effect here because our formulation is bounded, in accordance with the original algorithm. This has the desired effect of stopping early and thus preventing the creation of all the extraneous multiples streams which start beyond the upper bound anyway, turning the unacceptably slow initial specification into a code yet-far-from-optimal and slow, but acceptably so, striking a good balance between clarity, succinctness and efficiency.

Guarded

This ought to be explicated (improving on clarity though not on time complexity) as in the following, for which it is indeed a minor optimization whether to start from p or p*p - but only after we've went to the necessary trouble of explicitly stopping as soon as possible:

 primesToG m = 2 : sieve [3,5..m]  where
    sieve (p:xs) 
       | p*p > m = p : xs
       | True    = p : sieve (xs `minus` [p*p,p*p+2*p..m])

It is now clear that it can't be made unbounded just by abolishing the upper bound m, because the guard can not be simply omitted without changing the complexity back for the worst.

Accumulating Array

So while minus(a,b) takes operations for random-access imperative arrays and about operations for lists here, using Haskell's immutable array for a one could expect the array update time to be indeed but it looks like it's not so:

 primesToA m = sieve 3 $ array (3,m) [(i,odd i) | i<-[3..m]] 
  where
   sieve p a 
    | p*p>m = 2 : [i | (i,True) <- assocs a]
    | a!p   = sieve (p+2) $ a//[(i,False) | i <- [p*p,p*p+2*p..m]]
    | True  = sieve (p+2) a

It's much slower than the above, though it should be running at optimal complexity on implementations which are able to properly use the destructive update here, for an array being passed along as an accumulating parameter, it seems.

How this implementation deficiency is to be overcome? One way is to use explicitly mutable monadic arrays (see below), but we can also think about it a little bit more on the functional side of things.

Postponed

Going back to guarded Eratosthenes, first we notice that though it works with minimal number of prime multiples streams, it still starts working with each a bit prematurely. Fixing this with explicit synchronization won't change complexity but will speed it up some more:

 primesPE () = 2 : primes'
  where 
    primes' = sieve [3,5..] primes' 9
    sieve (p:xs) ps@ ~(_:t) q
       | p < q  = p : sieve xs ps q
       | True   =     sieve (xs `minus` [q,q+2*p..]) t (head t^2)

Since the removal of a prime's multiples here starts at the right moment, and not just from the right place, the code could now finally be made unbounded. Because no multiples-removal process is started prematurely, there are no extraneous mutiples streams, which were the reason for the extreme wastefulness and thus inefficiency of the original formulation.

Segmented

With work done segment-wise between the successive squares of primes it becomes

  primesSE () = 2 : primes'
   where
    primes' = 3 : sieve primes' 5 9 []
    sieve (p:ps) x q fs = 
        foldr (flip minus) [x,x+2..q-2] 
                           [[y+s,y+2*s..q] | (s,y) <- fs]
        ++ sieve ps (q+2) (head ps^2)
               ((2*p,q):[(s,q-rem (q-y) s) | (s,y) <- fs])

This "marks" the odd composites in a given range by generating them - just as a person performing the original sieve of Eratosthenes would do, counting one by one the multiples of the relevant primes. These composites are independently generated so some will be generated multiple times. Rearranging the chain of subtractions into a subtraction of merged streams and using tree-like folding further speeds up the code and improves its asympthotic behaviour.

The advantage in working with spans explicitly is that this code is easily amendable to using arrays for the composites marking and removal on each finite span; and memory usage can be kept in check by using fixed sized segments.

Linear merging

But segmentation doesn't add anything substantially, and each multiples stream starts at its prime's square anyway. What does the Postponed code do, operationally? For each prime's square passed over, it builds up a nested linear left-deepening structure, (...((xs-a)-b)-...), where xs is the original odds-producing [3,5..] list, so that each odd it produces must go through `minus` nodes on its way up - and those odd numbers that eventually emerge on top are prime. Thinking a bit about it, wouldn't another, right-deepening structure, (xs-(a+(b+...))), be better?

Here, xs would stay near the top, and more frequently odds-producing streams of multiples of smaller primes would be above those of the bigger primes, that produce less frequently their candidates which have to pass through more `union` nodes on their way up. Plus, no explicit synchronization is necessary anymore because the produced multiples of a prime start at its square anyway - just some care has to be taken to avoid a runaway access to the infinitely-defined structure (specifically, if each (+) operation passes along unconditionally its left child's head value before polling the right child's head value, then we are safe).

Here's the code, faster yet but still with about same time complexity of , which is essentially equivalent to the Richard Bird's code from Melissa O'Neill's article:

 {-# OPTIONS_GHC -O2 -fno-cse #-}
 primesLME () = 2 : ([3,5..] `minus` 
                             join [[p*p,p*p+2*p..] | p <- primes']) 
  where
    primes' = 3 : ([5,7..] `minus` 
                             join [[p*p,p*p+2*p..] | p <- primes'])   
    join ((x:xs):t) = x : union xs (join t)

The double primes feed is introduced here to prevent unneeded memoization and thus save memory, as per Melissa O'Neill's code, and is dependent on no expression sharing being performed by a compiler.

Tree merging

Moreover, it can be changed into a tree structure:

 {-# OPTIONS_GHC -O2 -fno-cse #-}
 primesTME () = 2 : ([3,5..] `minus` 
                           join [[p*p,p*p+2*p..] | p <- primes']) 
  where
    primes' = 3 : ([5,7..] `minus` 
                           join [[p*p,p*p+2*p..] | p <- primes'])   
    join  ((x:xs):t)    = x : union xs (join (pairs t))
    pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t

It is very fast, running at speeds and empirical complexities comparable with the code from Melissa O'Neill's article (about in number of primes n produced).

For esthetic purposes the above can be rewritten as follows, using explicated infinite tree-like folding:

primes = 2 : g (fix g)
  where
    g xs = 3 : gaps 5 (unionAll [[x*x, x*x+2*x..] | x <- xs])

fix g = xs where xs = g xs 
gaps k s@(x:xs) | k<x  = k:gaps (k+2) s      -- equiv to
                | True =   gaps (k+2) xs     -- [k,k+2..]`minus`s, k<=x
unionAll = foldi (\(x:xs) ys -> x:union xs ys) []

Tree merging with Wheel

Furthermore, wheel factorization optimization can be applied to it, and another tree structure can be used which is better adjusted for the primes multiples production (effecting about 5%~10% at the top of a total 2.5x speedup w.r.t. the above tree merging on odds only - though complexity stays the same):

{-# OPTIONS_GHC -O2 -fno-cse #-}
primesTMWE () = 2:3:5:7: gaps 11 wheel (join $ roll 11 wheel primes')
  where
    primes' = 11: gaps 13 (tail wheel) (join $ roll 11 wheel primes')
    join ((x:xs): ~(ys:zs:t))  = x : union xs (union ys zs)    
                                       `union` join (pairs t)  
    pairs ((x:xs):ys:t)        = (x : union xs ys) : pairs t
    gaps k ws@(w:t) cs@(c:u) | k==c  = gaps (k+w) t u 
                             | True  = k : gaps (k+w) t cs  
    roll k ws@(w:t) ps@(p:u) | k==p  = scanl (\c d->c+p*d) (p*p) ws 
                                         : roll (k+w) t u 
                             | True  = roll (k+w) t ps    
    wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
            4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel

The original idea of the right-deepening unbounded linear merging of multiples is due to Richard Bird (presented in Melissa O'Neill's article), and the original idea of using a tree-like folding structure is due to Dave Bayer on haskell-cafe mailing list (though in more complex formulation).

Above Limit

Another task is to produce primes above a given number (not having to find out their ordinal numbers).

{-# OPTIONS_GHC -O2 -fno-cse #-}
primesFromTMWE a0 = (if a0 <= 2 then [2] else []) 
                     ++ (gaps a wh' $ compositesFrom a)
  where
    (a,wh') = rollFrom (snap (max 3 a0) 3 2)
    (h,p':t) = span (< z) primes'                -- p < z => p*p<=a
    z = ceiling $ sqrt $ fromIntegral a + 1      -- p'>=z => p'*p'>a

    compositesFrom a = 
      foldi union' [] (foldi union [] [multsOf p a | p <- h++[p']]
                       : [multsOf p (p*p) | p <- t])
    primes'  = gaps 11 wheel (foldi union' []
                                   [multsOf p (p*p) | p <- primes'']) 
    primes'' = 11: gaps 13 (tail wheel) (foldi union' []
                                   [multsOf p (p*p) | p <- primes''])
    union' (x:xs) ys = x : union xs ys
    multsOf p from = scanl (\c d->c+p*d) (p*x) wh -- (p*)<$>
      where                                       --   scanl (+) x wh
        (x,wh) = rollFrom (snap from p (2*p) `div` p)

gaps k ws@(w:t) cs@(c:u) | k==c  = gaps (k+w) t u 
                         | True  = k : gaps (k+w) t cs  
snap v origin step = if r==0 then v else v+(step-r)
                       where r = mod (v-origin) step 
wheelNums = scanl (+) 0 wheel
wheel     = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
            4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
rollFrom n = go wheelNums wheel 
  where m  = (n-11) `mod` 210  
        go (x:xs) ws@(w:ws') | x < m = go xs ws'
                             | True  = (n+x-m, ws)     -- (x >= m)

This uses the infinite-tree folding foldi with plain union where heads are unordered, and back with union' above that. A certain preprocessing delay makes it worthwhile when producing more than just a few primes.

Turner's sieve - Trial division

David Turner's original 1975 formulation (SASL Language Manual, 1975) replaces non-standard `minus` in the sieve of Eratosthenes by stock list comprehension with rem filtering, turning it into a kind of trial division algorithm:

 -- unbounded sieve, premature filters
 primesT () = 2 : sieve [3,5..]  where
     sieve (p:xs) = p : sieve [x | x<-xs, rem x p /= 0]
                         -- filter ((/=0).(`rem`p)) xs

This creates an immense number of superfluous implicit filters in extremely premature fashion. To be admitted as prime, each number will be tested for divisibility here by all its preceding primes potentially, while just those not greater than its square root would suffice. To find e.g. the 1001st prime (7927), 1000 filters are used, when in fact just the first 24 are needed (up to 89's filter only). Operational overhead is enormous here.

Guarded Filters

When we force ourselves away from the Quest for a Mythical One-Liner it really ought to be written at least as bounded and guarded variety (if not abandoned right away in favor of algorithmically superior sieve of Eratosthenes), yet again achieving the miraculous complexity improvement from above quadratic to about empirically (in n primes produced):

 primesToGT m = 2 : sieve [3,5..m]  where
    sieve (p:xs) 
       | p*p > m = p : xs
       | True    = p : sieve [x | x<-xs, rem x p /= 0]
                          -- filter ((/=0).(`rem`p)) xs

Postponed Filters

or better yet as unbounded, postponed variety:

 primesPT () = 2 : primes'
  where 
    primes' = sieve [3,5..] primes' 9
    sieve (p:xs) ps@ ~(_:t) q
       | p < q   = p : sieve xs ps q
       | True    =     sieve [x | x<-xs, rem x p /= 0] t (head t^2)
                          -- filter ((/=0).(`rem`p)) xs

creating here as well the linear nested filters structure at run-time, (...(([3,5..] |> filterBy 3) |> filterBy 5)...) (with |> defined as x |> f = f x), each filter started at its proper moment.

Optimal trial divison

The above is equivalent to the traditional formulation of trial division,

isPrime primes n = foldr (\p r -> p*p > n || (rem n p /= 0 && r))
                         True primes
primes = 2 : filter (isPrime primes) [3..]

except that this one is rechecking for each candidate which primes to use, which will be the same prefix of the primes list being built, for all the candidate numbers in the ever increasing spans between the successive primes squares.

Segmented Generate and Test

This primes prefix's length can be explicitly maintained, achieving a certain further speedup (though not in complexity which stays the same) by turning a list of filters into one filter by an explicit list of primes:

 primesST () = 2 : primes' 
  where
    primes' = 3 : sieve 0 5 9 (tail primes')
    sieve k x q ps = let fs = take k primes' in
      [n | n <- [x,x+2..q-2], and [rem n f/=0 | f <- fs]]
         -- filter (\n-> all ((/=0).(rem n)) fs) [x,x+2..q-2]
      ++ sieve (k+1) (q+2) (head ps^2) (tail ps)

This seems to eliminate most recalculations, explicitly filtering composites out from batches of odds between the consecutive squares of primes.

All these variants of course being variations of trial division – finding out primes by direct divisibility testing of every odd number by all primes below its square root potentially (instead of just by its factors, which is what direct generation of multiples is doing, essentially) – are thus of principally worse complexities than that of Sieve of Eratosthenes; but one far worse than another yet easily fixable from a wasteful monstrosity to almost acceptable performance (at least for the first few hunderd thousand primes, when compiled) just by following the proper definition of the sieve as being bounded, simply with guarded formulation, instead of "heading for the hills" of using brittle implementations of complex data structures with unclear performance guarantees.

Generate and Test Above Limit

The following will start the segmented Turner sieve at the right place, using any primes list it's supplied with (e.g. TMWE etc.), demand computing it up to the square root of any prime it'll produce:

 primesSTFrom primes m 
   | m>2 = sieve (length h-1) (m`div`2*2-1) (head ps^2) (tail ps)
  where 
    (h,ps) = span (<= (floor.sqrt$fromIntegral m+1)) primes
    sieve k x q ps = let fs = take k $ tail primes in
      [n | n <- [x+2,x+4..q-2], and [rem n f /= 0 | f <- fs]]
      ++ sieve (k+1) q (head ps^2) (tail ps)

This is usually faster than testing candidate numbers for divisibility one by one which has to refetch anew the needed prime factors to test by, for each candidate. Faster is the offset sieve of Eratosthenes on odds, and yet faster the above one, w/ wheel optimization.

Conclusions

So it really pays off to analyse the code properly instead of just labeling it "naive". BTW were this divisibility testing somehow turned into an O(1) operation, e.g. by some kind of massive parallelization, the overall complexity would drop to O(n). It's the sequantiality of testing that's the culprit. Though of course the proper multiples-removing S. of E. is a better candidate for parallelization.

Did Eratosthenes himself achieve the optimal complexity? It rather seems doubtful, as he probably counted boxes in the table by 1 to go from one number to the next, as in 3,5,7,9,11,13,15, ... for he had no access even to Hindu numerals, using Greek alphabet for writing down numbers instead. Was he performing a genuine sieve of Eratosthenes then? Should faithfulness of an algorithm's implementation be judged by its performance? We'll leave that as an open question.

So the initial Turner code is just a one-liner that ought to have been regarded as specification only, in the first place, not a code to be executed as such. The reason it was taught that way is probably so that it could provoke this discussion among the students. To regard it as plain executable code is what's been naive all along.

Euler's Sieve

Unbounded Euler's sieve

With each found prime Euler's sieve removes all its multiples in advance so that at each step the list to process is guaranteed to have no multiples of any of the preceding primes in it (consists only of numbers coprime with all the preceding primes) and thus starts with the next prime:

 primesEU () = 2 : euler [3,5..] where
    euler (p:xs) = p : euler (xs `minus` map (p*) (p:xs))

This code is extremely inefficient, running above complexity (and worsening rapidly), and should be regarded a specification only. Its memory usage is very high, with space complexity just below , in n primes produced.

Wheeled list representation

The situation can be somewhat improved using a different list representation, for generating lists not from a last element and an increment, but rather a last span and an increment, which entails a set of helpful equivalences:

  {- fromElt (x,i) = x : fromElt (x + i,i)
                       === iterate  (+ i) x
     [n..]             === fromElt  (n,1) 
                       === fromSpan ([n],1) 
     [n,n+2..]         === fromElt  (n,2)    
                       === fromSpan ([n,n+2],4)     -}

  fromSpan (xs,i)  = xs ++ fromSpan (map (+ i) xs,i)

  {-                   === concat $ iterate (map (+ i)) xs
     fromSpan (p:xt,i) === p : fromSpan (xt ++ [p + i], i)  
     fromSpan (xs,i) `minus` fromSpan (ys,i) 
                       === fromSpan (xs `minus` ys, i)  
     map (p*) (fromSpan (xs,i)) 
                       === fromSpan (map (p*) xs, p*i)
     fromSpan (xs,i)   === forall (p > 0).
       fromSpan (concat $ take p $ iterate (map (+ i)) xs, p*i) -}

  spanSpecs = iterate eulerStep ([2],1)
  eulerStep (xs@(p:_), i) = 
       ( (tail . concat . take p . iterate (map (+ i))) xs
          `minus` map (p*) xs, p*i )

  {- > mapM_ print $ take 4 spanSpecs 
     ([2],1)
     ([3],2)
     ([5,7],6)
     ([7,11,13,17,19,23,29,31],30)  -}

Generating a list from a span specification is like rolling a wheel as its pattern gets repeated over and over again. For each span specification w@((p:_),_) produced by eulerStep, the numbers in (fromSpan w) up to are all primes too, so that

  eulerPrimesTo m = if m > 1 then go ([2],1) else []
    where
      go w@((p:_), _) 
         | m < p*p = takeWhile (<= m) (fromSpan w)
         | True    = p : go (eulerStep w)

This runs at about complexity, for n primes produced, and also suffers from a severe space leak problem (IOW its memory usage is also very high).

Using Immutable Arrays

Generating Segments of Primes

The removal of multiples on each segment of odds can be done by actually marking them in an array instead of manipulating the ordered lists, and can be further sped up more than twice by working with odds only, represented as their offsets in segment arrays:

  primesSA () = 2: 3: sieve (tail primes) 3 []
   where 
    sieve (p:ps) x fs = [i*2 + x | (i,e) <- assocs a, e] 
                        ++ sieve ps (p*p) fs'
     where
      q     = (p*p-x)`div`2                  
      fs'   = (p,0) : [(s, rem (y-q) s) | (s,y) <- fs]
      a     = accumArray (\ b c -> False) True (1,q-1)
                         [(i,()) | (s,y) <- fs, i <- [y+s,y+s+s..q]]

Apparently, arrays are fast. The above is the fastest code of all presented so far. When run on Ideone.com it is somewhat faster than Tree Merging With Wheel in producing first few million primes, but is very much unacceptable in its memory consumption which grows faster than O(), quickly getting into tens and hundreds of MBs.

Calculating Primes Upto a Given Value

  primesToNA n = 2: [i | i <- [3,5..n], ar ! i]
   where
    ar = f 5 $ accumArray (\ a b -> False) True (3,n) 
                        [(i,()) | i <- [9,15..n]]
    f p a | q > n = a
          | True  = if null x then a' else f (head x) a'
      where q = p*p
            a'= a // [(i,False) | i <- [q,q+2*p..n]]
            x = [i | i <- [p+2,p+4..n], a' ! i]

Calculating Primes in a Given Range

  primesFromToA a b = (if a<3 then [2] else []) 
                      ++ [i | i <- [o,o+2..b], ar ! i]
   where 
    o  = max (if even a then a+1 else a) 3
    r  = floor.sqrt.fromInteger $ b+1
    ar = accumArray (\a b-> False) True (o,b) 
          [(i,()) | p <- [3,5..r]
                    , let q  = p*p 
                          s  = 2*p 
                          (n,x) = quotRem (o - q) s 
                          q' = if  o <= q  then q
                               else  q + (n + signum x)*s
                    , i <- [q',q'+s..b] ]

Although using odds instead of primes, the array generation is so fast that it is very much feasible and even preferable for quick generation of some short spans of relatively big primes.

Using Mutable Arrays

Using mutable arrays is the fastest but not the most memory efficient way to calculate prime numbers in Haskell.

Using ST Array

This method implements the Sieve of Eratosthenes, similar to how you might do it in C, modified to work on odds only. It is fast, but about linear in memory consumption, allocating one (though apparently bit-packed) array for the whole sequence produced.

import Control.Monad
import Control.Monad.ST
import Data.Array.IArray
import Data.Array.MArray
import Data.Array.ST
import Data.Array.Unboxed

primesToNA :: Int -> UArray Int Bool
primesToNA n = runSTUArray sieve  where
  sieve = do
    let m = (n-1)`div`2
    a <- newArray (1,m) True :: ST s (STUArray s Int Bool)
    let sr = floor . (sqrt::Double->Double) $ fromIntegral n+1
    forM_ [1..sr`div`2] $ \i -> do
      let ii = 2*i*(i+1)     -- == ((2*i+1)^2-1)`div`2
      si <- readArray a i
      when si $
        forM_ [ii,ii+i+i+1..m] $ \j -> writeArray a j False
    return a
 
primesToN :: Int -> [Int]
primesToN n = 2:[i*2+1 | (i,True) <- assocs . primesToNA $n]

Its empirical time complexity is improving with n (number of primes produced) from through towards . The reference C++ vector-based implementation exhibits this improvement in empirical time complexity too, from gradually towards , where tested (which might be interpreted as evidence towards the expected complexity).

Bitwise prime sieve with Template Haskell

Count the number of prime below a given 'n'. Shows fast bitwise arrays, and an example of Template Haskell to defeat your enemies.

{-# OPTIONS -O2 -optc-O -XBangPatterns #-}
module Primes (nthPrime) where

import Control.Monad.ST
import Data.Array.ST
import Data.Array.Base
import System
import Control.Monad
import Data.Bits

nthPrime :: Int -> Int
nthPrime n = runST (sieve n)

sieve n = do
    a <- newArray (3,n) True :: ST s (STUArray s Int Bool)
    let cutoff = truncate (sqrt $ fromIntegral n) + 1
    go a n cutoff 3 1

go !a !m cutoff !n !c
    | n >= m    = return c
    | otherwise = do
        e <- unsafeRead a n
        if e then
          if n < cutoff then
            let loop !j
                 | j < m     = do
                        x <- unsafeRead a j
                        when x $ unsafeWrite a j False
                        loop (j+n)
                 | otherwise = go a m cutoff (n+2) (c+1)
            in loop ( if n < 46340 then n * n else n `shiftL` 1)
           else go a m cutoff (n+2) (c+1)
         else go a m cutoff (n+2) c

And place in a module:

{-# OPTIONS -fth #-}
import Primes

main = print $( let x = nthPrime 10000000 in [| x |] )

Run as:

$ ghc --make -o primes Main.hs
$ time ./primes
664579
./primes  0.00s user 0.01s system 228% cpu 0.003 total

Implicit Heap

See Implicit Heap.

Prime Wheels

See Prime Wheels.

Using IntSet for a traditional sieve

See Using IntSet for a traditional sieve.

Testing Primality

See Testing primality:

External links

A collection of prime generators; the file "ONeillPrimes.hs" contains one of the fastest pure-Haskell prime generators; code by Melissa O'Neill.
WARNING: Don't use the priority queue from that file for your projects: it's broken and works for primes only by a lucky chance.