Difference between revisions of "Prime numbers"

Revision as of 07:11, 5 June 2016

In mathematics, amongst the natural numbers greater than 1, a prime number (or a prime) is such that has no divisors other than itself (and 1).

Prime Number Resources

At Wikipedia:
- Prime Numbers
- Sieve of Eratosthenes

HackageDB packages:
- arithmoi: Various basic number theoretic functions; efficient array-based sieves, Montgomery curve factorization ...
- Numbers: An assortment of number theoretic functions.
- NumberSieves: Number Theoretic Sieves: primes, factorization, and Euler's Totient.
- primes: Efficient, purely functional generation of prime numbers.

Papers:
- O'Neill, Melissa E., "The Genuine Sieve of Eratosthenes", Journal of Functional Programming, Published online by Cambridge University Press 9 October 2008 doi:10.1017/S0956796808007004.

Definition

In mathematics, amongst the natural numbers greater than 1, a prime number (or a prime) is such that has no divisors other than itself (and 1). The smallest prime is thus 2. Non-prime numbers are known as composite, i.e. those representable as product of two natural numbers greater than 1.

To find out a prime's multiples we can either a. test each new candidate number for divisibility by that prime, giving rise to a kind of trial division algorithm; or b. we can directly generate the multiples of a prime p by counting up from it in increments of p, resulting in a variant of the sieve of Eratosthenes.

The set of prime numbers is thus

P = {n ∈ N₂ : (∀ m ∈ N₂) ((m | n) ⇒ m = n)}

= {n ∈ N₂ : (∀ m ∈ N₂) (m×m ≤ n ⇒ ¬(m | n))}

= N₂ \ {n×m : n,m ∈ N₂}

= N₂ \ ⋃ { {n×m : m ∈ N_n} : n ∈ N₂ }

= N₂ \ ⋃ { {n×n, n×n+n, n×n+n+n, ...} : n ∈ N₂ }

= N₂ \ ⋃ { {p×p, p×p+p, p×p+p+p, ...} : p ∈ P }

where N_k = {n ∈ N : n ≥ k}

Thus starting with 2, for each newly found prime we can eliminate from the rest of the numbers all the multiples of this prime, giving us the next available number as next prime. This is known as sieving the natural numbers, so that in the end all the composites are eliminated and what we are left with are just primes. (This is what the last formula is describing, though seemingly impredicative, because it is self-referential. But because N₂ is well-ordered (with the order being preserved under addition), the formula is well-defined.)

In pseudocode, this can be written as

        primes = [2..] \ [[p*p, p*p+p..] | p <- primes]

Having a direct-access mutable arrays indeed enables easy marking off of these multiples on pre-allocated array as it is usually done in imperative languages; but to get an efficient list-based code we have to be smart about combining those streams of multiples of each prime - which gives us also the memory efficiency in generating the results incrementally, one by one.

Sieve of Eratosthenes

Simplest, bounded, very inefficient formulation:

primesTo m = sieve [2..m]       {- (\\) is set-difference for unordered lists -}
             where 
             sieve (x:xs) = x : sieve (xs Data.List.\\ [x,x+x..m])
             sieve [] = []

The (unbounded) sieve of Eratosthenes calculates primes as integers above 1 that are not multiples of primes, i.e. not composite — whereas composites are found as enumeration of multiples of each prime, generated by counting up from prime's square in constant increments equal to that prime (or twice that much, for odd primes):

primes = 2 : 3 : ([5,7..] `minus` _U [[p*p, p*p+2*p..] | p <- tail primes])

{- Using `under n = takeWhile (<= n)`, with ordered increasing lists,
   `minus`, `union` and `_U` satisfy, for any `n`:
       under n (minus a b) == under n a \\ under n b
       under n (union a b) == nub . sort $ under n a ++ under n b
       under n . _U == nub . sort . concat 
                                  . takeWhile (not.null) . map (under n) -}

The definition is primed with 2 and 3 as initial primes, to avoid the vicious circle.

The "big union" _U function can be defined as the folding of (\(x:xs) -> (x:) . union xs); or it could use a Bool array as a sorting and duplicates-removing device. The processing naturally divides up into the segments between successive squares of primes.

Stepwise development follows (the fully developed version is here).

Initial definition

First of all, working with ordered increasing lists, the sieve of Eratosthenes can be genuinely represented by

 -- genuine yet wasteful sieve of Eratosthenes
 -- primes = eratos [2.. ]               -- unbounded
primesTo m = eratos [2..m]               -- bounded, up to m
  where
    eratos []     = []
    eratos (p:xs) = p : eratos (xs `minus` [p, p+p..])
 -- eratos (p:xs) = p : eratos (xs `minus` map (p*) [1..])
 -- eulers (p:xs) = p : eulers (xs `minus` map (p*) (p:xs))    
 -- turner (p:xs) = p : turner [x | x <- xs, rem x p /= 0]

This should be regarded more like a specification, not a code. It runs at empirical orders of growth worse than quadratic in number of primes produced. But it has the core defining features of the classical formulation of S. of E. as a. being bounded, i.e. having a top limit value, and b. finding out the multiples of a prime directly, by counting up from it in constant increments, equal to that prime.

The canonical list-difference minus and duplicates-removing union functions (cf. Data.List.Ordered package) are:

 -- ordered lists, difference and union
minus (x:xs) (y:ys) = case (compare x y) of 
           LT -> x : minus  xs  (y:ys)
           EQ ->     minus  xs     ys 
           GT ->     minus (x:xs)  ys
minus  xs     _     = xs
union (x:xs) (y:ys) = case (compare x y) of 
           LT -> x : union  xs  (y:ys)
           EQ -> x : union  xs     ys 
           GT -> y : union (x:xs)  ys
union  xs     []    = xs
union  []     ys    = ys

The name merge ought to be reserved for duplicates-preserving merging operation of the merge sort.

Analysis

So for each newly found prime p the sieve discards its multiples, enumerating them by counting up in steps of p. There are thus $O(m/p)$ multiples generated and eliminated for each prime, and $O(m\log \log(m))$ multiples in total, with duplicates, by virtues of prime harmonic series.

If each multiple is dealt with in $O(1)$ time, this will translate into $O(m\log \log(m))$ RAM machine operations (since we consider addition as an atomic operation). Indeed mutable random-access arrays allow for that. But lists in Haskell are sequential-access, and complexity of minus(a,b) for lists is $\textstyle O(|a\cup b|)$ instead of $\textstyle O(|b|)$ of the direct access destructive array update. The lower the complexity of each minus step, the better the overall complexity.

So on k-th step the argument list (p:xs) that the eratos function gets, starts with the (k+1)-th prime, and consists of all the numbers ≤ m coprime with all the primes ≤ p. According to the M. O'Neill's article (p.10) there are $\textstyle \Phi (m,p)\in \Theta (m/\log p)$ such numbers.

It looks like $\textstyle \sum _{i=1}^{n}{1/log(p_{i})}=O(n/\log n)$ for our intents and purposes. Since the number of primes below m is $n=\pi (m)=O(m/\log(m))$ by the prime number theorem (where $\pi (m)$ is a prime counting function), there will be n multiples-removing steps in the algorithm; it means total complexity of at least $O(mn/\log(n))=O(m^{2}/(\log(m))^{2})$ , or $O(n^{2})$ in n primes produced - much much worse than the optimal $O(n\log(n)\log \log(n))$ .

From Squares

But we can start each elimination step at a prime's square, as its smaller multiples will have been already produced and discarded on previous steps, as multiples of smaller primes. This means we can stop early now, when the prime's square reaches the top value m, and thus cut the total number of steps to around $\textstyle n=\pi (m^{0.5})=\Theta (2m^{0.5}/\log m)$ . This does not in fact change the complexity of random-access code, but for lists it makes it $O(m^{1.5}/(\log m)^{2})$ , or $O(n^{1.5}/(\log n)^{0.5})$ in n primes produced, a dramatic speedup:

primesToQ m = eratos [2..m] 
  where
    eratos []     = []
    eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..m])
 -- eratos (p:xs) = p : eratos (xs `minus` map (p*) [p..div m p])
 -- eulers (p:xs) = p : eulers (xs `minus` map (p*) (under (div m p) (p:xs)))
 -- turner (p:xs) = p : turner [x | x<-xs, x<p*p || rem x p /= 0]

Its empirical complexity is around $O(n^{1.45})$ . This simple optimization works here because this formulation is bounded (by an upper limit). To start late on a bounded sequence is to stop early (starting past end makes an empty sequence – see warning below^₁), thus preventing the creation of all the superfluous multiples streams which start above the upper bound anyway (note that Turner's sieve is unaffected by this). This is acceptably slow now, striking a good balance between clarity, succinctness and efficiency.

^₁Warning: this is predicated on a subtle point of minus xs [] = xs definition being used, as it indeed should be. If the definition minus (x:xs) [] = x:minus xs [] is used, the problem is back and the complexity is bad again.

Guarded

This ought to be explicated (improving on clarity, though not on time complexity) as in the following, for which it is indeed a minor optimization whether to start from p or p*p - because it explicitly stops as soon as possible:

primesToG m = 2 : sieve [3,5..m]
  where
    sieve (p:xs) 
       | p*p > m   = p : xs
       | otherwise = p : sieve (xs `minus` [p*p, p*p+2*p..])
                  -- p : sieve (xs `minus` map (p*) [p,p+2..])
                  -- p : eulers (xs `minus` map (p*) (p:xs))

(here we also flatly ignore all evens above 2 a priori.) It is now clear that it can't be made unbounded just by abolishing the upper bound m, because the guard can not be simply omitted without changing the complexity back for the worst.

Accumulating Array

So while minus(a,b) takes $O(|b|)$ operations for random-access imperative arrays and about $O(|a|)$ operations here for ordered increasing lists of numbers, using Haskell's immutable array for a one could expect the array update time to be nevertheless closer to $O(|b|)$ if destructive update were used implicitly by compiler for an array being passed along as an accumulating parameter:

{-# OPTIONS_GHC -O2 #-}
import Data.Array.Unboxed

primesToA m = sieve 3 (array (3,m) [(i,odd i) | i<-[3..m]]
                        :: UArray Int Bool)
  where
    sieve p a 
      | p*p > m   = 2 : [i | (i,True) <- assocs a]
      | a!p       = sieve (p+2) $ a//[(i,False) | i <- [p*p, p*p+2*p..m]]
      | otherwise = sieve (p+2) a

Indeed for unboxed arrays, with the type signature added explicitly (suggested by Daniel Fischer), the above code runs pretty fast, with empirical complexity of about $O(n^{1.15..1.45})$ in n primes produced (for producing from few hundred thousands to few millions primes, memory usage also slowly growing). But it relies on specific compiler optimizations, and indeed if we remove the explicit type signature, the code above turns very slow.

How can we write code that we'd be certain about? One way is to use explicitly mutable monadic arrays (see below), but we can also think about it a little bit more on the functional side of things still.

Postponed

Going back to guarded Eratosthenes, first we notice that though it works with minimal number of prime multiples streams, it still starts working with each prematurely. Fixing this with explicit synchronization won't change complexity but will speed it up some more:

primesPE1 = 2 : sieve [3..] primesPE1 
  where 
    sieve xs (p:ps) | q <- p*p , (h,t) <- span (< q) xs =
                   h ++ sieve (t `minus` [q, q+p..]) ps
                -- h ++ turner [x|x<-t, rem x p /= 0] ps

Inlining and fusing span and (++) we get:

primesPE = 2 : oddprimes
  where 
    oddprimes = sieve [3,5..] 9 oddprimes
    sieve (x:xs) q ps@ ~(p:t)
      | x < q     = x : sieve xs q ps
      | otherwise =     sieve (xs `minus` [q, q+2*p..]) (head t^2) t

Since the removal of a prime's multiples here starts at the right moment, and not just from the right place, the code could now finally be made unbounded. Because no multiples-removal process is started prematurely, there are no extraneous multiples streams, which were the reason for the original formulation's extreme inefficiency.

Segmented

With work done segment-wise between the successive squares of primes it becomes

primesSE = 2 : oddprimes
  where
    oddprimes = sieve 3 9 oddprimes []
    sieve x q ~(p:t) fs = 
        foldr (flip minus) [x,x+2..q-2]                   -- chain of subtractions
                           [[y+s, y+2*s..q] | (s,y) <- fs]     -- OR,
        -- [x,x+2..q-2] `minus` foldl union []            -- subtraction of merged
        --                    [[y+s, y+2*s..q] | (s,y) <- fs]            --  lists
        ++ sieve (q+2) (head t^2) t
                ((2*p,q):[(s,q-rem (q-y) s) | (s,y) <- fs])

This "marks" the odd composites in a given range by generating them - just as a person performing the original sieve of Eratosthenes would do, counting one by one the multiples of the relevant primes. These composites are independently generated so some will be generated multiple times.

The advantage to working in spans explicitly is that this code is easily amendable to using arrays for the composites marking and removal on each finite span; and memory usage can be kept in check by using fixed sized segments.

Segmented Tree-merging

Rearranging the chain of subtractions into a subtraction of merged streams (see below) and using tree-like folding structure, further speeds up the code and significantly improves its asymptotic time behavior (down to about $O(n^{1.28}empirically)$ , space is leaking though):

primesSTE = 2 : oddprimes
  where                  
    oddprimes = sieve 3 9 oddprimes []
    sieve x q ~(p:t) fs = 
        ([x,x+2..q-2] `minus` joinST [[y+s, y+2*s..q] | (s,y) <- fs])
        ++ sieve (q+2) (head t^2) t
                   ((++ [(2*p,q)]) [(s,q-rem (q-y) s) | (s,y) <- fs])
    
joinST (xs:t) = (union xs . joinST . pairs) t
  where
    pairs (xs:ys:t) = union xs ys : pairs t
    pairs t         = t
joinST []     = []

Segmented merging via an array

The removal of composites is easy with arrays. Starting points can be calculated directly:

import Data.List (inits)
import Data.Array.Unboxed
 
primesSAE = 2 : sieve 3 4 (tail primesSAE) (inits primesSAE)
  where
  sieve x q ps (fs:ft) = [i | (i,True) <- assocs (
         accumArray (\ _ _ -> False)
                    True  (x,q-1)
                    [(i,()) | p <- fs, let c = p * div (x+p-1) p,
                              i <- [c, c+p..q-1]] :: UArray Int Bool )]
      ++ sieve q (head ps^2) (tail ps) ft

Linear merging

But segmentation doesn't add anything substantially, and each multiples stream starts at its prime's square anyway. What does the Postponed code do, operationally? With each prime's square passed by, there emerges a nested linear left-deepening structure, (...((xs-a)-b)-...), where xs is the original odds-producing [3,5..] list, so that each odd it produces must go through more and more minus nodes on its way up - and those odd numbers that eventually emerge on top are prime. Thinking a bit about it, wouldn't another, right-deepening structure, (xs-(a+(b+...))), be better? This idea is due to Richard Bird, seen in his code presented in M. O'Neill's article, equivalent to:

primesB = 2 : minus [3..] (foldr (\p r-> p*p : union [p*p+p, p*p+2*p..] r) 
                                 [] primesB)

or,

primesLME1 = 2 : prs
  where
    prs = 3 : minus [5,7..] (joinL [[p*p, p*p+2*p..] | p <- prs])
    
joinL ((x:xs):t) = x : union xs (joinL t)

Here, xs stays near the top, and more frequently odds-producing streams of multiples of smaller primes are above those of the bigger primes, that produce less frequently their multiples which have to pass through more union nodes on their way up. Plus, no explicit synchronization is necessary anymore because the produced multiples of a prime start at its square anyway - just some care has to be taken to avoid a runaway access to the indefinitely-defined structure, defining joinL (or foldr's combining function) to produce part of its result before accessing the rest of its input (making it productive).

Melissa O'Neill introduced double primes feed to prevent unneeded memoization (a memory leak). We can even do multistage. Here's the code, faster still and with radically reduced memory consumption, with empirical orders of growth of around ~ $n^{1.40}$ (initially better, yet worsening for bigger ranges):

primesLME = 2 : _Y ((3:) . minus [5,7..] . joinL . map (\p-> [p*p, p*p+2*p..]))

_Y   :: (t -> t) -> t
_Y g = g (_Y g)                -- multistage, non-sharing, recursive
       -- g x where x = g x    -- two stages, sharing, corecursive

_Y is a non-sharing fixpoint combinator, here arranging for a recursive "telescoping" multistage primes production (a tower of producers).

This allows the primesLME stream to be discarded immediately as it is being consumed by its consumer. With primes' from primesLME1 definition above it is impossible, as each produced element of primes' is needed later as input to the same primes' corecursive stream definition. So the primes' stream feeds in a loop into itself, and thus is retained in memory. With multistage production, each stage feeds into its consumer above it at the square of its current element which can be immediately discarded after it's been consumed. (3:) jump-starts the whole thing.

Tree merging

Moreover, it can be changed into a tree structure. This idea is due to Dave Bayer and Heinrich Apfelmus:

primesTME = 2 : _Y ((3:) . gaps 5 . joinT . map (\p-> [p*p, p*p+2*p..]))

-- joinL ((x:xs):t) = x : union xs (joinL t) 
joinT ((x:xs):t) = x : union xs (joinT (pairs t))    -- set union, ~=
  where  pairs (xs:ys:t) = union xs ys : pairs t     --    nub.sort.concat

gaps k s@(x:xs) | k < x = k:gaps (k+2) s       -- ~= [k,k+2..]\\s, 
                | True  =   gaps (k+2) xs      --   when null(s\\[k,k+2..])

This code is pretty fast, running at speeds and empirical complexities comparable with the code from Melissa O'Neill's article (about $O(n^{1.2})$ in number of primes n produced).

As an aside, joinT is equivalent to infinite tree-like folding foldi (\(x:xs) ys-> x:union xs ys) []:

Tree merging with Wheel

Wheel factorization optimization can be further applied, and another tree structure can be used which is better adjusted for the primes multiples production (effecting about 5%-10% at the top of a total 2.5x speedup w.r.t. the above tree merging on odds only, for first few million primes):

primesTMWE = [2,3,5,7] ++ _Y ((11:) . tail  . gapsW 11 wheel 
                                    . joinT . hitsW 11 wheel)
    
gapsW k (d:w) s@(c:cs) | k < c     = k : gapsW (k+d) w s    -- set difference
                       | otherwise =     gapsW (k+d) w cs   --   k==c
hitsW k (d:w) s@(p:ps) | k < p     =     hitsW (k+d) w s    -- intersection
                       | otherwise = scanl (\c d->c+p*d) (p*p) (d:w) 
                                       : hitsW (k+d) w ps   --   k==p 
wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
        4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel

The hitsW is there to find the start point for rolling the wheel for each prime, but it can be found directly:

primesW = [2,3,5,7] ++ _Y ( (11:) . tail  . gapsW 11 wheel . joinT .
                            map (\p->  
                              map (p*) . dropWhile (< p) $
                                scanl (+) (p - rem (p-11) 210) wheel) )

Seems to run about 1.4x faster, too.

Above Limit - Offset Sieve

Another task is to produce primes above a given value:

{-# OPTIONS_GHC -O2 -fno-cse #-}
primesFromTMWE primes m = dropWhile (< m) [2,3,5,7,11] 
                          ++ gapsW a wh2 (compositesFrom a)
  where
    (a,wh2) = rollFrom (snapUp (max 3 m) 3 2)
    (h,p2:t) = span (< z) $ drop 4 primes           -- p < z => p*p<=a
    z = ceiling $ sqrt $ fromIntegral a + 1         -- p2>=z => p2*p2>a
    compositesFrom a = joinT (joinST [multsOf p  a    | p <- h ++ [p2]]
                                   : [multsOf p (p*p) | p <- t] )

snapUp v o step = v + (mod (o-v) step)              -- full steps from o
multsOf p from = scanl (\c d->c+p*d) (p*x) wh       -- map (p*) $
  where                                             --   scanl (+) x wh
    (x,wh) = rollFrom (snapUp from p (2*p) `div` p) --   , if p < from 
wheelNums = scanl (+) 0 wheel
rollFrom n = go wheelNums wheel 
  where m = (n-11) `mod` 210  
        go (x:xs) ws@(w:ws2) | x < m = go xs ws2
                             | True  = (n+x-m, ws)  -- (x >= m)

A certain preprocessing delay makes it worthwhile when producing more than just a few primes, otherwise it degenerates into simple trial division, which is then ought to be used directly:

primesFrom m = filter isPrime [m..]

Map-based

Runs ~1.7x slower than TME version, but with the same empirical time complexity, ~ $n^{1.2}$ (in n primes produced) and same very low (near constant) memory consumption:

import Data.List               -- based on http://stackoverflow.com/a/1140100
import qualified Data.Map as M

primesMPE :: [Integer]
primesMPE = 2:mkPrimes 3 M.empty prs 9   -- postponed addition of primes into map;
  where                                  -- decoupled primes loop feed 
    prs = 3:mkPrimes 5 M.empty prs 9
    mkPrimes n m ps@ ~(p:t) q = case (M.null m, M.findMin m) of
        (False, (n2, skips)) | n == n2 ->
             mkPrimes (n+2) (addSkips n (M.deleteMin m) skips) ps q
        _ -> if n<q
             then    n : mkPrimes (n+2)  m                  ps q
             else        mkPrimes (n+2) (addSkip n m (2*p)) t (head t^2)

    addSkip n m s = M.alter (Just . maybe [s] (s:)) (n+s) m
    addSkips = foldl' . addSkip

Turner's sieve - Trial division

David Turner's (SASL Language Manual, 1983) formulation replaces non-standard minus in the sieve of Eratosthenes by stock list comprehension with rem filtering, turning it into a trial division algorithm:

  -- unbounded sieve, premature filters
primesT = sieve [2..]
  where
    sieve (p:xs) = p : sieve [x | x <- xs, rem x p > 0]
  -- map head
  --   $ iterate (\(p:xs) -> [x | x <- xs, rem x p > 0]) [2..]

This creates many superfluous implicit filters, because they are created prematurely. To be admitted as prime, each number will be tested for divisibility here by all its preceding primes, while just those not greater than its square root would suffice. To find e.g. the 1001st prime (7927), 1000 filters are used, when in fact just the first 24 are needed (up to 89's filter only). Operational overhead here is huge.

Guarded Filters

But this really ought to be changed into bounded and guarded variant, again achieving the "miraculous" complexity improvement from above quadratic to about $O(n^{1.45})$ empirically (in n primes produced):

primesToGT m = sieve [2..m]
  where
    sieve (p:xs) 
      | p*p > m = p : xs
      | True    = p : sieve [x | x <- xs, rem x p > 0]
  -- (\(a,b:_) -> map head a ++ b) . span ((< m).(^2).head) 
  --   $ iterate (\(p:xs) -> [x | x <- xs, rem x p > 0]) [2..m]

Postponed Filters

Or it can remain unbounded, just filters creation must be postponed until the right moment:

primesPT1 = 2 : sieve primesPT1 [3..] 
  where 
    sieve (p:ps) xs = let (h,t) = span (< p*p) xs 
                      in h ++ sieve ps [x | x <- t, rem x p > 0]
  -- fix $ concatMap (fst . fst)
  --     . iterate (\((_,xs), p:ps) -> let (h,t) = span (< p*p) xs in
  --                              ((h, [x | x <- t, rem x p > 0]), ps)) 
  --     . (,) ([2],[3..])

This is better re-written with span and (++) inlined and fused into the sieve:

primesPT = 2 : oddprimes
  where 
    oddprimes = sieve [3,5..] 9 oddprimes
    sieve (x:xs) q ps@ ~(p:t)
      | x < q = x : sieve xs q ps
      | True  =     sieve [x | x <- xs, rem x p /= 0] (head t^2) t

creating here as well the linear nested structure at run-time, (...(([3,5..] >>= filterBy [3]) >>= filterBy [5])...), where filterBy ds n = [n | noDivs n ds] (see noDivs definition below) – but unlike the original code, each filter being created at its proper moment, not sooner than the prime's square is seen.

Optimal trial division

The above is equivalent to the traditional formulation of trial division,

ps = 2 : [i | i <- [3..],  
              and [rem i p > 0 | p <- takeWhile (\p -> p^2 <= i) ps]]

or,

noDivs n fs = foldr (\f r -> f*f > n || (rem n f > 0 && r)) True fs
-- primes = filter (`noDivs`[2..]) [2..]
--        = 2 : filter (`noDivs`[3,5..]) [3,5..]
primesTD = 2 : 3 : filter (`noDivs` tail primesTD) [5,7..]
isPrime n = n > 1 && noDivs n primesTD

except that this code is rechecking for each candidate number which primes to use, whereas for every candidate number in each segment between the successive squares of primes these will just be the same prefix of the primes list being built.

Trial division is used as a simple primality test and prime factorization algorithm.

Segmented Generate and Test

Next we turn the list of filters into one filter by an explicit list, each one in a progression of prefixes of the primes list. This seems to eliminate most recalculations, explicitly filtering composites out from batches of odds between the consecutive squares of primes.

import Data.List

primesST = 2 : oddprimes
  where
    oddprimes = sieve 3 9 oddprimes (inits oddprimes)  -- [],[3],[3,5],...
    sieve x q ~(_:t) (fs:ft) =
      filter ((`all` fs) . ((/=0).) . rem) [x,x+2..q-2]
      ++ sieve (q+2) (head t^2) t ft

Generate and Test Above Limit

The following will start the segmented Turner sieve at the right place, using any primes list it's supplied with (e.g. TMWE etc.) or itself, as shown, demand computing it just up to the square root of any prime it'll produce:

primesFromST m | m > 2 =
    sieve (m`div`2*2+1) (head ps^2) (tail ps) (inits ps)
  where 
    (h,ps) = span (<= (floor.sqrt $ fromIntegral m+1)) oddprimes
    sieve x q ps (fs:ft) =
      filter ((`all` (h ++ fs)) . ((/=0).) . rem) [x,x+2..q-2]
      ++ sieve (q+2) (head ps^2) (tail ps) ft
    oddprimes = 3 : primesFromST 5     -- odd primes

This is usually faster than testing candidate numbers for divisibility one by one which has to re-fetch anew the needed prime factors to test by, for each candidate. Faster is the offset sieve of Eratosthenes on odds, and yet faster the one w/ wheel optimization, on this page.

Conclusions

All these variants being variations of trial division, finding out primes by direct divisibility testing of every candidate number by sequential primes below its square root (instead of just by its factors, which is what direct generation of multiples is doing, essentially), are thus principally of worse complexity than that of Sieve of Eratosthenes.

The initial code is just a one-liner that ought to have been regarded as executable specification in the first place. It can easily be improved quite significantly with a simple use of bounded, guarded formulation to limit the number of filters it creates, or by postponement of filter creation.

Euler's Sieve

Unbounded Euler's sieve

With each found prime Euler's sieve removes all its multiples in advance so that at each step the list to process is guaranteed to have no multiples of any of the preceding primes in it (consists only of numbers coprime with all the preceding primes) and thus starts with the next prime:

primesEU = 2 : eulers [3,5..]
  where
    eulers (p:xs) = p : eulers (xs `minus` map (p*) (p:xs))
 -- eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+2*p..])

This code is extremely inefficient, running above $O({n^{2}})$ empirical complexity (and worsening rapidly), and should be regarded a specification only. Its memory usage is very high, with empirical space complexity just below $O({n^{2}})$ , in n primes produced.

In the stream-based sieve of Eratosthenes we are able to skip along the input stream xs directly to the prime's square, consuming the whole prefix at once, thus achieving the results equivalent to the postponement technique, because the generation of the prime's multiples is independent of the rest of the stream.

But here in the Euler's sieve it is dependent on all xs and we're unable in principle to skip along it to the prime's square - because all xs are needed for each prime's multiples generation. Thus efficient unbounded stream-based implementation seems to be impossible in principle, under the simple scheme of producing the multiples by multiplication.

Wheeled list representation

The situation can be somewhat improved using a different list representation, for generating lists not from a last element and an increment, but rather a last span and an increment, which entails a set of helpful equivalences:

{- fromElt (x,i) = x : fromElt (x + i,i)
                       === iterate  (+ i) x
     [n..]             === fromElt  (n,1) 
                       === fromSpan ([n],1) 
     [n,n+2..]         === fromElt  (n,2)    
                       === fromSpan ([n,n+2],4)     -}

fromSpan (xs,i)  = xs ++ fromSpan (map (+ i) xs,i)

{-                   === concat $ iterate (map (+ i)) xs
   fromSpan (p:xt,i) === p : fromSpan (xt ++ [p + i], i)  
   fromSpan (xs,i) `minus` fromSpan (ys,i) 
                     === fromSpan (xs `minus` ys, i)  
   map (p*) (fromSpan (xs,i)) 
                     === fromSpan (map (p*) xs, p*i)
   fromSpan (xs,i)   === forall (p > 0).
     fromSpan (concat $ take p $ iterate (map (+ i)) xs, p*i) -}

spanSpecs = iterate eulerStep ([2],1)
eulerStep (xs@(p:_), i) = 
       ( (tail . concat . take p . iterate (map (+ i))) xs
          `minus` map (p*) xs, p*i )

{- > mapM_ print $ take 4 spanSpecs 
     ([2],1)
     ([3],2)
     ([5,7],6)
     ([7,11,13,17,19,23,29,31],30)  -}

Generating a list from a span specification is like rolling a wheel as its pattern gets repeated over and over again. For each span specification w@((p:_),_) produced by eulerStep, the numbers in (fromSpan w) up to ${p^{2}}$ are all primes too, so that

eulerPrimesTo m = if m > 1 then go ([2],1) else []
    where
      go w@((p:_), _) 
         | m < p*p = takeWhile (<= m) (fromSpan w)
         | True    = p : go (eulerStep w)

This runs at about $O(n^{1.5..1.8})$ complexity, for n primes produced, and also suffers from a severe space leak problem (IOW its memory usage is also very high).

Using Immutable Arrays

Generating Segments of Primes

The sieve of Eratosthenes' removal of multiples on each segment of odds can be done by actually marking them in an array, instead of manipulating ordered lists, and can be further sped up more than twice by working with odds only:

import Data.Array.Unboxed
 
primesSA :: [Int]
primesSA = 2 : oddprimes ()
  where 
    oddprimes () = 3 : sieve (oddprimes ()) 3 []
    sieve (p:ps) x fs = [i*2 + x | (i,True) <- assocs a] 
                        ++ sieve ps (p*p) ((p,0) : 
                             [(s, rem (y-q) s) | (s,y) <- fs])
     where
      q = (p*p-x)`div`2
      a :: UArray Int Bool
      a = accumArray (\ b c -> False) True (1,q-1)
                     [(i,()) | (s,y) <- fs, i <- [y+s, y+s+s..q]]

Runs significantly faster than TMWE and with better empirical complexity, of about $O(n^{1.10..1.05})$ in producing first few millions of primes, with constant memory footprint.

Calculating Primes Upto a Given Value

Equivalent to Accumulating Array above, running somewhat faster (compiled by GHC with optimizations turned on):

{-# OPTIONS_GHC -O2 #-}
import Data.Array.Unboxed

primesToNA n = 2: [i | i <- [3,5..n], ar ! i]
  where
    ar = f 5 $ accumArray (\ a b -> False) True (3,n) 
                        [(i,()) | i <- [9,15..n]]
    f p a | q > n = a
          | True  = if null x then a2 else f (head x) a2
      where q = p*p
            a2  :: UArray Int Bool
            a2 = a // [(i,False) | i <- [q, q+2*p..n]]
            x  = [i | i <- [p+2,p+4..n], a2 ! i]

Calculating Primes in a Given Range

primesFromToA a b = (if a<3 then [2] else []) 
                      ++ [i | i <- [o,o+2..b], ar ! i]
  where 
    o  = max (if even a then a+1 else a) 3   -- first odd in the segment
    r  = floor . sqrt $ fromIntegral b + 1
    ar = accumArray (\_ _ -> False) True (o,b)  -- initially all True,
          [(i,()) | p <- [3,5..r]
                    , let q  = p*p      -- flip every multiple of an odd 
                          s  = 2*p                         -- to False
                          (n,x) = quotRem (o - q) s 
                          q2 = if  o <= q  then q
                               else  q + (n + signum x)*s
                    , i <- [q2,q2+s..b] ]

Although sieving by odds instead of by primes, the array generation is so fast that it is very much feasible and even preferable for quick generation of some short spans of relatively big primes.

Using Mutable Arrays

Using mutable arrays is the fastest but not the most memory efficient way to calculate prime numbers in Haskell.

Using ST Array

This method implements the Sieve of Eratosthenes, similar to how you might do it in C, modified to work on odds only. It is fast, but about linear in memory consumption, allocating one (though apparently packed) sieve array for the whole sequence to process.

import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed

sieveUA :: Int -> UArray Int Bool
sieveUA top = runSTUArray $ do
    let m = (top-1) `div` 2
        r = floor . sqrt $ fromIntegral top + 1
    sieve <- newArray (1,m) True      -- :: ST s (STUArray s Int Bool)
    forM_ [1..r `div` 2] $ \i -> do
      isPrime <- readArray sieve i
      when isPrime $ do               -- ((2*i+1)^2-1)`div`2 == 2*i*(i+1)
        forM_ [2*i*(i+1), 2*i*(i+2)+1..m] $ \j -> do
          writeArray sieve j False
    return sieve
 
primesToUA :: Int -> [Int]
primesToUA top = 2 : [i*2+1 | (i,True) <- assocs $ sieveUA top]

Its empirical time complexity is improving with n (number of primes produced) from above $O(n^{1.20})$ towards $O(n^{1.16})$ . The reference C++ vector-based implementation exhibits this improvement in empirical time complexity too, from $O(n^{1.5})$ gradually towards $O(n^{1.12})$ , where tested (which might be interpreted as evidence towards the expected quasilinearithmic $O(n\log(n)\log(\log n))$ time complexity).

Bitwise prime sieve with Template Haskell

Count the number of prime below a given 'n'. Shows fast bitwise arrays, and an example of Template Haskell to defeat your enemies.

{-# OPTIONS -O2 -optc-O -XBangPatterns #-}
module Primes (nthPrime) where

import Control.Monad.ST
import Data.Array.ST
import Data.Array.Base
import System
import Control.Monad
import Data.Bits

nthPrime :: Int -> Int
nthPrime n = runST (sieve n)

sieve n = do
    a <- newArray (3,n) True :: ST s (STUArray s Int Bool)
    let cutoff = truncate (sqrt $ fromIntegral n) + 1
    go a n cutoff 3 1

go !a !m cutoff !n !c
    | n >= m    = return c
    | otherwise = do
        e <- unsafeRead a n
        if e then
          if n < cutoff then
            let loop !j
                 | j < m     = do
                        x <- unsafeRead a j
                        when x $ unsafeWrite a j False
                        loop (j+n)
                 | otherwise = go a m cutoff (n+2) (c+1)
            in loop ( if n < 46340 then n * n else n `shiftL` 1)
           else go a m cutoff (n+2) (c+1)
         else go a m cutoff (n+2) c

And place in a module:

{-# OPTIONS -fth #-}
import Primes

main = print $( let x = nthPrime 10000000 in [| x |] )

Run as:

$ ghc --make -o primes Main.hs
$ time ./primes
664579
./primes  0.00s user 0.01s system 228% cpu 0.003 total

Implicit Heap

See Implicit Heap.

Prime Wheels

See Prime Wheels.

Using IntSet for a traditional sieve

See Using IntSet for a traditional sieve.

Testing Primality, and Integer Factorization

See Testing primality:

One-liners

See primes one-liners.

External links

http://www.cs.hmc.edu/~oneill/code/haskell-primes.zip

A collection of prime generators; the file "ONeillPrimes.hs" contains one of the fastest pure-Haskell prime generators; code by Melissa O'Neill.

WARNING: Don't use the priority queue from older versions of that file for your projects: it's broken and works for primes only by a lucky chance. The revised version of the file fixes the bug, as pointed out by Eugene Kirpichov on February 24, 2009 on the haskell-cafe mailing list, and fixed by Bertram Felgenhauer.

test entries for (some of) the above code variants.

Speed/memory comparison table for sieve of Eratosthenes variants.

Empirical orders of growth on Wikipedia.

@@ Line 1: / Line 1: @@
+In mathematics, ''amongst the natural numbers greater than 1'', a ''prime number'' (or a ''prime'') is such that has no divisors other than itself (and 1).
 == Prime Number Resources ==
-In mathematics, a prime number (or a prime) is a natural number which has exactly two distinct natural number divisors: 1 and itself. The smallest prime is thus 2.
+* At Wikipedia:
-[http://www.haskell.org/haskellwiki/Prime_numbers Prime Numbers] at Wikipedia.
+**[http://en.wikipedia.org/wiki/Prime_numbers Prime Numbers]
+**[http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes Sieve of Eratosthenes]
+* HackageDB packages:
-[http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes Sieve of Eratosthenes] at Wikipedia.
+** [http://hackage.haskell.org/package/arithmoi arithmoi]: Various basic number theoretic functions; efficient array-based sieves, Montgomery curve factorization ...
+** [http://hackage.haskell.org/package/Numbers Numbers]: An assortment of number theoretic functions.
+** [http://hackage.haskell.org/package/NumberSieves NumberSieves]: Number Theoretic Sieves: primes, factorization, and Euler's Totient.
+** [http://hackage.haskell.org/package/primes primes]: Efficient, purely functional generation of prime numbers.
+* Papers:
-HackageDB packages:
+** O'Neill, Melissa E., [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf "The Genuine Sieve of Eratosthenes"], Journal of Functional Programming, Published online by Cambridge University Press 9 October 2008 doi:10.1017/S0956796808007004.
+== Definition ==
-[http://hackage.haskell.org/package/Numbers Numbers]: An assortment of number theoretic functions.
+In mathematics, ''amongst the natural numbers greater than 1'', a ''prime number'' (or a ''prime'') is such that has no divisors other than itself (and 1). The smallest prime is thus 2. Non-prime numbers are known as ''composite'', i.e. those representable as product of two natural numbers greater than 1.
-[http://hackage.haskell.org/package/NumberSieves NumberSieves]: Number Theoretic Sieves: primes, factorization, and Euler's Totient.
+To find out a prime's multiples we can either '''a.''' test each new candidate number for divisibility by that prime, giving rise to a kind of ''trial division'' algorithm; or '''b.''' we can directly generate the multiples of a prime ''p'' by counting up from it in increments of ''p'', resulting in a variant of the ''sieve of Eratosthenes''.
-[http://hackage.haskell.org/package/primes primes]: Efficient, purely functional generation of prime numbers.
+The set of prime numbers is thus
-== Finding Primes ==
+: &nbsp;&nbsp; '''P''' &nbsp;= {''n'' &isin; '''N'''<sub>2</sub> ''':''' (&forall; ''m'' &isin; '''N'''<sub>2</sub>) ((''m'' | ''n'') &rArr; m = n)}
-Any natural number is representable as a product of powers of its prime factors, and so a prime has no prime divisors other then itself. That means that starting with 2, for each newly found prime we can eliminate from the rest of the numbers all such numbers that have this prime as their divisor. This is known as "sieving" the natural numbers, so that in the end what we are left with are just primes.
+::  = {''n'' &isin; '''N'''<sub>2</sub> ''':''' (&forall; ''m'' &isin; '''N'''<sub>2</sub>) (''m''&times;''m'' &le; ''n'' &rArr; &not;(''m'' | ''n''))}
-=== Simple Primes Sieve ===
-The following is a direct translation of this idea, generating a list of all the prime numbers in the universe:
+::  = '''N'''<sub>2</sub> \ {''n''&times;''m'' ''':''' ''n'',''m'' &isin; '''N'''<sub>2</sub>}
+:: = '''N'''<sub>2</sub> \ '''&#8899;''' { {''n''&times;''m'' ''':''' ''m'' &isin; '''N'''<sub>n</sub>} ''':''' ''n'' &isin; '''N'''<sub>2</sub> }
+:: = '''N'''<sub>2</sub> \ '''&#8899;''' { {''n''&times;''n'', ''n''&times;''n''+''n'', ''n''&times;''n''+''n''+''n'', ...} ''':''' ''n'' &isin; '''N'''<sub>2</sub> }
+:: = '''N'''<sub>2</sub> \ '''&#8899;''' { {''p''&times;''p'', ''p''&times;''p''+''p'', ''p''&times;''p''+''p''+''p'', ...} ''':''' ''p'' &isin; '''P''' }
+:::: &nbsp; where &nbsp; &nbsp; '''N'''<sub>k</sub> = {''n'' &isin; '''N''' ''':''' ''n'' &ge; k}
+Thus starting with 2, for each newly found prime we can ''eliminate'' from the rest of the numbers ''all the multiples'' of this prime, giving us the next available number as next prime. This is known as ''sieving'' the natural numbers, so that in the end all the composites are eliminated and what we are left with are just primes. <small>(This is what the last formula is describing, though seemingly [http://en.wikipedia.org/wiki/Impredicativity impredicative], because it is self-referential. But because '''N'''<sub>2</sub> is well-ordered (with the order being preserved under addition), the formula is well-defined.)</small>
+In ''pseudocode'', this can be written as
 <haskell>
-  primes :: [Integer]
+        primes = [2..] \ [[p*p, p*p+p..] | p <- primes]
-  primes = sieve [2..]
-   where
-    sieve (p:xs)
-         = p : sieve [x | x<-xs, x `mod` p /= 0]
-                     -- or:  filter ((/=0).(`mod`p)) xs
 </haskell>
+Having a direct-access mutable arrays indeed enables easy marking off of these multiples on pre-allocated array as it is usually done in imperative languages; but to get an [[#Tree merging with Wheel|efficient ''list''-based code]] we have to be smart about combining those streams of multiples of each prime - which gives us also the memory efficiency in generating the results incrementally, one by one.
-While appearing simple, this method is <i>extremely inefficient</i> and not recommended for more than a few thousand prime numbers, even when compiled. Here, every number is tested for divisibility by each prime up to its smallest factor, so any prime is checked against all its preceding primes, when in fact just those not greater than its square root would suffice. That means e.g. that to find the 1001st prime (<code>7927</code>) 1000 filters are used, when in fact just the first 24 are needed (upto <code>89</code>'s filter only).
+== Sieve of Eratosthenes ==
-So it in effect creates a cascade of filters in front of the infinite numbers supply, in an <i>extremely premature</i> fashion. One way of fixing that would be to <i>postpone</i> the creation of filters until the right moment, as in
+Simplest, bounded, ''very'' inefficient formulation:
+<haskell>
+primesTo m = sieve [2..m]       {- (\\) is set-difference for unordered lists -}
+             where
+             sieve (x:xs) = x : sieve (xs Data.List.\\ [x,x+x..m])
+             sieve [] = []
+</haskell>
+The (unbounded) sieve of Eratosthenes calculates primes as ''integers above 1 that are not multiples of primes'', i.e. ''not composite'' &mdash; whereas composites are found as enumeration of multiples of each prime, generated by counting up from prime's square in constant increments equal to that prime (or twice that much, for odd primes):
-==== Postponed Filters Prime Sieve ====
 <haskell>
+primes = 2 : 3 : ([5,7..] `minus` _U [[p*p, p*p+2*p..] | p <- tail primes])
-  primes :: [Integer]
-  primes = 2: 3: sieve (tail primes) [5,7..]
+{- Using `under n = takeWhile (<= n)`, with ordered increasing lists,
-   where
+   `minus`, `union` and `_U` satisfy, for any `n`:
-    sieve (p:ps) xs
-         = h ++ sieve ps [x | x<-t, x `rem` p /= 0]
+       under n (minus a b) == under n a \\ under n b
-                         -- or:  filter ((/=0).(`rem`p)) t
+       under n (union a b) == nub . sort $ under n a ++ under n b
-             where (h,~(_:t)) = span (< p*p) xs
+       under n . _U == nub . sort . concat
+                                  . takeWhile (not.null) . map (under n) -}
 </haskell>
+The definition is primed with 2 and 3 as initial primes, to avoid the vicious circle.
-This only tests odd numbers, and by the least amount of primes for each numbers span between successive squares of primes.
+The ''"big union"'' <code>_U</code> function can be defined as the folding of <code>(\(x:xs) -> (x:) . union xs)</code>; or it could use a <code>Bool</code> array as a sorting and duplicates-removing device. The processing naturally divides up into the segments between successive squares of primes.
-Whereas the first version exhibits near O(<math>{n^2}</math>) behavior, this one exhibits near O(<math>{n^{1.5}}</math>) behavior, and is good for creating a few 100,000s of primes (compiled). It takes as much time to generate <i>100,000 primes</i> with it as it takes for the first one to generate <i>5500</i>, GHC-compiled.
+Stepwise development follows (the fully developed version is [[#Tree merging with Wheel|here]]).
-==== Primality Test and Integer Factorization ====
+=== Initial definition ===
-Given an infinite list of prime numbers, we can implement primality tests and integer factorization:
+First of all, working with ''ordered'' increasing lists, the sieve of Eratosthenes can be genuinely represented by
 <haskell>
+ -- genuine yet wasteful sieve of Eratosthenes
-  isPrime n = n > 1 && n == head (primeFactors n)
+ -- primes = eratos [2.. ]               -- unbounded
+primesTo m = eratos [2..m]               -- bounded, up to m
+  where
+    eratos []     = []
+    eratos (p:xs) = p : eratos (xs `minus` [p, p+p..])
+ -- eratos (p:xs) = p : eratos (xs `minus` map (p*) [1..])
+ -- eulers (p:xs) = p : eulers (xs `minus` map (p*) (p:xs))
+ -- turner (p:xs) = p : turner [x | x <- xs, rem x p /= 0]
+</haskell>
+This should be regarded more like a ''specification'', not a code. It runs at [https://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth empirical orders of growth] worse than quadratic in number of primes produced. But it has the core defining features of the classical formulation of S. of E. as '''''a.''''' being bounded, i.e. having a top limit value, and '''''b.''''' finding out the multiples of a prime directly, by counting up from it in constant increments, equal to that prime.
-  primeFactors 1 = []
-  primeFactors n = go n primes
+The canonical list-difference <code>minus</code> and duplicates-removing <code>union</code> functions (cf. [http://hackage.haskell.org/packages/archive/data-ordlist/latest/doc/html/Data-List-Ordered.html Data.List.Ordered package]) are:
-     where
+<haskell>
-     go n ps@(p:pt)
+ -- ordered lists, difference and union
-        | p*p > n        = [n]
+minus (x:xs) (y:ys) = case (compare x y) of
-        | n `rem` p == 0 = p : go (n `quot` p) ps
-        | otherwise      = go n pt
+           LT -> x : minus  xs  (y:ys)
+           EQ ->     minus  xs     ys
+           GT ->     minus (x:xs)  ys
+minus  xs     _     = xs
+union (x:xs) (y:ys) = case (compare x y) of
+           LT -> x : union  xs  (y:ys)
+           EQ -> x : union  xs     ys
+           GT -> y : union (x:xs)  ys
+union  xs     []    = xs
+union  []     ys    = ys
 </haskell>
+The name ''merge'' ought to be reserved for duplicates-preserving merging operation of the merge sort.
-=== Simple Prime Sieve II ===
-The following method is slightly faster, creating <i>114,000 primes</i> in the same period of time. It works well for generating a few 100,000s of primes as well:
+=== Analysis ===
+So for each newly found prime ''p'' the sieve discards its multiples, enumerating them by counting up in steps of ''p''. There are thus <math>O(m/p)</math> multiples generated and eliminated for each prime, and <math>O(m \log \log(m))</math> multiples in total, with duplicates, by virtues of [http://en.wikipedia.org/wiki/Prime_harmonic_series prime harmonic series].
+If each multiple is dealt with in <math>O(1)</math> time, this will translate into <math>O(m \log \log(m))</math> RAM machine operations (since we consider addition as an atomic operation). Indeed mutable random-access arrays allow for that. But lists in Haskell are sequential-access, and complexity of <code>minus(a,b)</code> for lists is <math>\textstyle O(|a \cup b|)</math> instead of <math>\textstyle O(|b|)</math> of the direct access destructive array update. The lower the complexity of each ''minus'' step, the better the overall complexity.
+So on ''k''-th step the argument list <code>(p:xs)</code> that the <code>eratos</code> function gets, starts with the ''(k+1)''-th prime, and consists of all the numbers &le; ''m'' coprime with all the primes &le; ''p''. According to the M. O'Neill's article (p.10) there are <math>\textstyle\Phi(m,p) \in \Theta(m/\log p)</math> such numbers.
+It looks like <math>\textstyle\sum_{i=1}^{n}{1/log(p_i)} = O(n/\log n)</math> for our intents and purposes. Since the number of primes below ''m'' is <math>n = \pi(m) = O(m/\log(m))</math> by the prime number theorem (where <math>\pi(m)</math> is a prime counting function), there will be ''n'' multiples-removing steps in the algorithm; it means total complexity of at least <math>O(m n/\log(n)) = O(m^2/(\log(m))^2)</math>, or <math>O(n^2)</math> in ''n'' primes produced - much much worse than the optimal <math>O(n \log(n) \log\log(n))</math>.
+=== From Squares ===
+But we can start each elimination step at a prime's square, as its smaller multiples will have been already produced and discarded on previous steps, as multiples of smaller primes. This means we can stop early now, when the prime's square reaches the top value ''m'', and thus cut the total number of steps to around <math>\textstyle n = \pi(m^{0.5}) = \Theta(2m^{0.5}/\log m)</math>. This does not in fact change the complexity of random-access code, but for lists it makes it <math>O(m^{1.5}/(\log m)^2)</math>, or <math>O(n^{1.5}/(\log n)^{0.5})</math> in ''n'' primes produced, a dramatic speedup:
 <haskell>
+primesToQ m = eratos [2..m]
-  primes :: [Integer]
+  where
-  primes = 2:filter isPrime [3,5..]
-     where
+    eratos []     = []
-     isPrime n   = all (not . divides n) $ takeWhile (\p -> p*p <= n) primes
+    eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..m])
+ -- eratos (p:xs) = p : eratos (xs `minus` map (p*) [p..div m p])
-     divides n p = n `mod` p == 0
+ -- eulers (p:xs) = p : eulers (xs `minus` map (p*) (under (div m p) (p:xs)))
+ -- turner (p:xs) = p : turner [x | x<-xs, x<p*p || rem x p /= 0]
 </haskell>
+Its empirical complexity is around <math>O(n^{1.45})</math>. This simple optimization works here because this formulation is bounded (by an upper limit). To start late on a bounded sequence is to stop early (starting past end makes an empty sequence &ndash; ''see warning below''<sup><sub> 1</sub></sup>), thus preventing the creation of all the superfluous multiples streams which start above the upper bound anyway <small>(note that Turner's sieve is unaffected by this)</small>. This is acceptably slow now, striking a good balance between clarity, succinctness and efficiency.
-Instead of relying on nested filters, it tests each odd number by an explicit list of all the needed prime factors, but it recomputes this list, which will be the same for the increasing spans of numbers between the successive squares of primes.
+<sup><sub>1</sub></sup><small>''Warning'': this is predicated on a subtle point of  <code>minus xs [] = xs</code> definition being used, as it indeed should be. If the definition <code>minus (x:xs) [] = x:minus xs []</code> is used, the problem is back and the complexity is bad again.</small>
-=== Simple Prime Sieve III ===
+=== Guarded ===
-This is faster still, creating <i>158,000 primes</i> (again, GHC-compiled) in the same time span:
+This ought to be ''explicated'' (improving on clarity, though not on time complexity) as in the following, for which it is indeed a minor optimization whether to start from ''p'' or ''p*p'' - because it explicitly ''stops as soon as possible'':
+<haskell>
+primesToG m = 2 : sieve [3,5..m]
+  where
+    sieve (p:xs)
+       | p*p > m   = p : xs
+       | otherwise = p : sieve (xs `minus` [p*p, p*p+2*p..])
+                  -- p : sieve (xs `minus` map (p*) [p,p+2..])
+                  -- p : eulers (xs `minus` map (p*) (p:xs))
+</haskell>
+(here we also flatly ignore all evens above 2 a priori.) It is now clear that it ''can't'' be made unbounded just by abolishing the upper bound ''m'', because the guard can not be simply omitted without changing the complexity back for the worst.
+=== Accumulating Array ===
+So while <code>minus(a,b)</code> takes <math>O(|b|)</math> operations for random-access imperative arrays and about <math>O(|a|)</math> operations here for ordered increasing lists of numbers, using Haskell's immutable array for ''a'' one ''could'' expect the array update time to be nevertheless closer to <math>O(|b|)</math> if destructive update were used implicitly by compiler for an array being passed along as an accumulating parameter:
 <haskell>
+{-# OPTIONS_GHC -O2 #-}
-  primes :: [Integer]
+import Data.Array.Unboxed
-  primes = 2:3:go 5 [] (tail primes)
-    where
+primesToA m = sieve 3 (array (3,m) [(i,odd i) | i<-[3..m]]
-      divisibleBy d n = n `mod` d == 0
-      go start ds (p:ps) = let q = p*p in
+                        :: UArray Int Bool)
+  where
-          foldr (\d -> filter (not . divisibleBy d)) [start, start+2..q-2] ds
+    sieve p a
-          ++ go (q+2) (p:ds) ps
+      | p*p > m   = 2 : [i | (i,True) <- assocs a]
+      | a!p       = sieve (p+2) $ a//[(i,False) | i <- [p*p, p*p+2*p..m]]
+      | otherwise = sieve (p+2) a
 </haskell>
+Indeed for unboxed arrays, with the type signature added explicitly <small>(suggested by Daniel Fischer)</small>, the above code runs pretty fast, with empirical complexity of about ''<math>O(n^{1.15..1.45})</math>'' in ''n'' primes produced (for producing from few hundred thousands to few millions primes, memory usage also slowly growing). But it relies on specific compiler optimizations, and indeed if we remove the explicit type signature, the code above turns ''very'' slow.
-This one explicitly maintains the list of primes needed for testing each odds span between successive primes squares, which it also explicitly generates. But it tests with nested <code>filter</code>s, which it repeatedly recreates.
+How can we write code that we'd be certain about? One way is to use explicitly mutable monadic arrays ([[#Using Mutable Arrays|''see below'']]), but we can also think about it a little bit more on the functional side of things still.
-=== Simple Prime Sieve IV ===
+=== Postponed ===
-The list of primes needed to test each range of odds is actually just the prefix of the primes list itself, of known length, and need not be specifically generated at all. Combined with one-call testing by the explicit list of primes, and direct generation of odds between the successive primes squares, this leads to:
+Going back to ''guarded'' Eratosthenes, first we notice that though it works with minimal number of prime multiples streams, it still starts working with each prematurely. Fixing this with explicit synchronization won't change complexity but will speed it up some more:
+<haskell>
+primesPE1 = 2 : sieve [3..] primesPE1
+  where
+    sieve xs (p:ps) | q <- p*p , (h,t) <- span (< q) xs =
+                   h ++ sieve (t `minus` [q, q+p..]) ps
+                -- h ++ turner [x|x<-t, rem x p /= 0] ps
+</haskell>
+<!-- primesPE1 = 2 : sieve [3,5..] 9 (tail primesPE1)
+  where
+    sieve xs q ~(p:t) | (h,ys) <- span (< q) xs =
+                   h ++ sieve (ys `minus` [q, q+2*p..]) (head t^2) t
+                -- h ++ turner [x | x <- ys, rem x p /= 0] ...
+-->
+Inlining and fusing <code>span</code> and <code>(++)</code> we get:
+<haskell>
+primesPE = 2 : oddprimes
+  where
+    oddprimes = sieve [3,5..] 9 oddprimes
+    sieve (x:xs) q ps@ ~(p:t)
+      | x < q     = x : sieve xs q ps
+      | otherwise =     sieve (xs `minus` [q, q+2*p..]) (head t^2) t
+</haskell>
+Since the removal of a prime's multiples here starts at the right moment, and not just from the right place, the code could now finally be made unbounded. Because no multiples-removal process is started ''prematurely'', there are no ''extraneous'' multiples streams, which were the reason for the original formulation's extreme inefficiency.
+=== Segmented ===
+With work done segment-wise between the successive squares of primes it becomes
 <haskell>
+primesSE = 2 : oddprimes
-  primes :: [Integer]
+  where
-  primes   = 2: 3: sieve 0 (tail primes) 5
-  sieve k (p:ps) x
+    oddprimes = sieve 3 9 oddprimes []
-           = let fs = take k (tail primes) in
+    sieve x q ~(p:t) fs =
-             [x | x<-[x,x+2..p*p-2], and [x`rem`p/=0 | p<-fs]]
+        foldr (flip minus) [x,x+2..q-2]                   -- chain of subtractions
-                             -- or:  all ((>0).(x`rem`)) fs
+                           [[y+s, y+2*s..q] | (s,y) <- fs]     -- OR,
+        -- [x,x+2..q-2] `minus` foldl union []            -- subtraction of merged
-             ++ sieve (k+1) ps (p*p+2)
+        --                    [[y+s, y+2*s..q] | (s,y) <- fs]            --  lists
+        ++ sieve (q+2) (head t^2) t
+                ((2*p,q):[(s,q-rem (q-y) s) | (s,y) <- fs])
+</haskell>
+This "marks" the odd composites in a given range by generating them - just as a person performing the original sieve of Eratosthenes would do, counting ''one by one'' the multiples of the relevant primes. These composites are independently generated so some will be generated multiple times.
+The advantage to working in spans explicitly is that this code is easily amendable to using arrays for the composites marking and removal on each ''finite'' span; and memory usage can be kept in check by using fixed sized segments.
+====Segmented Tree-merging====
+Rearranging the chain of subtractions into a subtraction of merged streams ''([[#Linear merging|see below]])'' and using [[#Tree merging|tree-like folding]] structure, further [http://ideone.com/pfREP speeds up the code] and ''significantly'' improves its asymptotic time behavior (down to about <math>O(n^{1.28} empirically)</math>, space is leaking though):
+<haskell>
+primesSTE = 2 : oddprimes
+  where
+    oddprimes = sieve 3 9 oddprimes []
+    sieve x q ~(p:t) fs =
+        ([x,x+2..q-2] `minus` joinST [[y+s, y+2*s..q] | (s,y) <- fs])
+        ++ sieve (q+2) (head t^2) t
+                   ((++ [(2*p,q)]) [(s,q-rem (q-y) s) | (s,y) <- fs])
+joinST (xs:t) = (union xs . joinST . pairs) t
+  where
+    pairs (xs:ys:t) = union xs ys : pairs t
+    pairs t         = t
+joinST []     = []
 </haskell>
+====Segmented merging via an array====
-It produces about <i>222,000 primes</i> in the same time span, and is good for creating about a million first primes, compiled.
+The removal of composites is easy with arrays. Starting points can be calculated directly:
-The reason to have <code>sieve</code> function available separately too is that it can also be used to produce primes above a given number, as in
 <haskell>
+import Data.List (inits)
-  primesFrom m   = sieve (length h) ps $ m`div`2*2+1
+import Data.Array.Unboxed
-    where
-      (h,(_:ps)) = span (<= (floor.sqrt.fromIntegral) m) primes
+primesSAE = 2 : sieve 3 4 (tail primesSAE) (inits primesSAE)
+  where
+  sieve x q ps (fs:ft) = [i | (i,True) <- assocs (
+         accumArray (\ _ _ -> False)
+                    True  (x,q-1)
+                    [(i,()) | p <- fs, let c = p * div (x+p-1) p,
+                              i <- [c, c+p..q-1]] :: UArray Int Bool )]
+      ++ sieve q (head ps^2) (tail ps) ft
 </haskell>
+=== Linear merging ===
-It can thus produce a few primes e.g. above <code>239812076741689</code>, which is a square of the millionth odd prime, without having to compute all the preceding primes (which would number in trillions).
+But segmentation doesn't add anything substantially, and each multiples stream starts at its prime's square anyway. What does the [[#Postponed|Postponed]] code do, operationally? With each prime's square passed by, there emerges a  nested linear ''left-deepening'' structure, '''''(...((xs-a)-b)-...)''''', where '''''xs''''' is the original odds-producing ''[3,5..]'' list, so that each odd it produces must go through more and more <code>minus</code> nodes on its way up - and those odd numbers that eventually emerge on top are prime. Thinking a bit about it, wouldn't another, ''right-deepening'' structure, '''''(xs-(a+(b+...)))''''', be better? This idea is due to Richard Bird, seen in his code presented in M. O'Neill's article, equivalent to:
+<haskell>
+primesB = 2 : minus [3..] (foldr (\p r-> p*p : union [p*p+p, p*p+2*p..] r)
+                                 [] primesB)
+</haskell>
+or,
+<haskell>
-=== Sieve of Eratosthenes ===
+primesLME1 = 2 : prs
+  where
+    prs = 3 : minus [5,7..] (joinL [[p*p, p*p+2*p..] | p <- prs])
+joinL ((x:xs):t) = x : union xs (joinL t)
+</haskell>
+Here, ''xs'' stays near the top, and ''more frequently'' odds-producing streams of multiples of smaller primes are ''above'' those of the bigger primes, that produce ''less frequently'' their multiples which have to pass through ''more'' <code>union</code> nodes on their way up. Plus, no explicit synchronization is necessary anymore because the produced multiples of a prime start at its square anyway - just some care has to be taken to avoid a runaway access to the indefinitely-defined structure, defining <code>joinL</code> (or <code>foldr</code>'s combining function) to produce part of its result ''before'' accessing the rest of its input (making it ''productive'').
-All the versions thus far try to <i>keep the primes</i> among all numbers by testing <i>each number</i> in isolation. Testing composites is cheap (as most of them will have small factors, so the test is soon aborted), but testing prime numbers is costly. The sieve of Eratosthenes tries to <i>get rid of composites</i> by working on <i>spans</i> of numbers <i>at once</i> and thus gets its primes <i>"for free"</i>, as gaps between the generated/marked/crossed-off composites:
+Melissa O'Neill [http://hackage.haskell.org/packages/archive/NumberSieves/0.0/doc/html/src/NumberTheory-Sieve-ONeill.html introduced double primes feed] to prevent unneeded memoization (a memory leak). We can even do multistage. Here's the code, faster still and with radically reduced memory consumption, with empirical orders of growth of around ~ <math>n^{1.40}</math> (initially better, yet worsening for bigger ranges):
 <haskell>
+primesLME = 2 : _Y ((3:) . minus [5,7..] . joinL . map (\p-> [p*p, p*p+2*p..]))
-  primes :: [Integer]
-  primes = 2: 3: 5: sieve [] (tail primes) 7
-   where
-    sieve fs (p:ps) x = gaps x comps
-                        ++ sieve ((2*p,q+2*p):fs') ps (q+2)
-     where
-      gaps x (y:ys) | x==y = gaps (x+2) ys
-                    | True = x: gaps (x+2) (y:ys)
-      gaps x []            = if x==q then [] else [x]
-      q                    = p*p
-      comps                = foldl mrg [] mults
-      (mults,fs')          = unzip (map mark fs)
-      mark (s,x)           = (h,(s,x'))
-                             where (h,~(x':_)) = span(<q)[x,x+s..]
-      mrg u@(a:t) w@(b:r)
-                    | a<b  = a: t `mrg` w
-                    | b<a  = b: u `mrg` r
-                    | True = a: t `mrg` r
-      mrg u       w        = if null u then w else u
-</haskell>
+_Y   :: (t -> t) -> t
-This "marks" the odd composites in a given range by generating them - just as a person performing the original sieve would do, marking one by one the multiples of the relevant primes - and then combs through them looking for gaps.
+_Y g = g (_Y g)                -- multistage, non-sharing, recursive
+       -- g x where x = g x    -- two stages, sharing, corecursive
+</haskell>
+<code>_Y</code> is a non-sharing fixpoint combinator, here arranging for a recursive ''"telescoping"'' multistage primes production (a ''tower'' of producers).
-Compared with the previous version, if compiled with GHC, it is steadily 1.2x slower and takes 3.5x more memory. But interpreted under both GHCi and WinHugs, it runs <i>faster</i>, takes <i>less</i> memory, and has better asymptotic behavior.
+This allows the <code>primesLME</code> stream to be discarded immediately as it is being consumed by its consumer. With <code>primes'</code> from <code>primesLME1</code> definition above it is impossible, as each produced element of <code>primes'</code> is needed later as input to the same <code>primes'</code> corecursive stream definition. So the <code>primes'</code> stream feeds in a loop into itself, and thus is retained in memory. With multistage production, each stage feeds into its consumer above it at the square of its current element which can be immediately discarded after it's been consumed. <code>(3:)</code> jump-starts the whole thing.
-==== Euler's Sieve ====
+=== Tree merging ===
-This is the sieve of Eratosthenes variation which is directly generating, starting with 2, for each newly found prime all numbers that <i>would</i> have this prime as their divisor - i.e. its multiples - to be eliminated from the rest of the numbers (starting at its square of course), resulting in a straightforward modification of the postponed-filters sieve:
+Moreover, it can be changed into a '''''tree''''' structure. This idea [http://www.haskell.org/pipermail/haskell-cafe/2007-July/029391.html is due to Dave Bayer] and  [[Prime_numbers_miscellaneous#Implicit_Heap|Heinrich Apfelmus]]:
 <haskell>
+primesTME = 2 : _Y ((3:) . gaps 5 . joinT . map (\p-> [p*p, p*p+2*p..]))
-  primes :: [Integer]
-  primes = 2: 3: sieve (tail primes) [5,7..]
-   where
-    sieve (p:ps) xs
-             = h ++ sieve ps (t `minus` [q+2*p,q+4*p..])
-                 where (h,~(_:t)) = span (< q) xs
-                       q = p*p
-    xs@(x:xt) `minus` ys@(y:yt)
-      | x<y  = x:xt `minus` ys
-      | x>y  =   xs `minus` yt
-      | True =   xt `minus` yt
-</haskell>
+-- joinL ((x:xs):t) = x : union xs (joinL t)
-This is easily adaptable to wheel optimization.
+joinT ((x:xs):t) = x : union xs (joinT (pairs t))    -- set union, ~=
+  where  pairs (xs:ys:t) = union xs ys : pairs t     --    nub.sort.concat
+gaps k s@(x:xs) | k < x = k:gaps (k+2) s       -- ~= [k,k+2..]\\s,
-==== Merged Multiples Sieve ====
+                | True  =   gaps (k+2) xs      --   when null(s\\[k,k+2..])
+</haskell>
+This code is [http://ideone.com/p0e81 pretty fast], running at speeds and empirical complexities comparable with the code from Melissa O'Neill's article (about <math>O(n^{1.2})</math> in number of primes ''n'' produced).
+As an aside, <code>joinT</code> is equivalent to [[Fold#Tree-like_folds|infinite tree-like folding]] <code>foldi (\(x:xs) ys-> x:union xs ys) []</code>:
+[[Image:Tree-like_folding.gif|frameless|center|458px|tree-like folding]]
+=== Tree merging with Wheel ===
-The explicit partitioning into spans between the successive primes squares is actually superfluous and is implicitly achieved when looking for gaps in the directly merged inifinite primes multiples streams (as if turning <code>(...((s-a)-b)-...)</code> into <code>(s-(a+b+..))</code> in the Euler's Sieve definition):
+Wheel factorization optimization can be further applied, and another tree structure can be used which is better adjusted for the primes multiples production (effecting about 5%-10% at the top of a total ''2.5x speedup'' w.r.t. the above tree merging on odds only, for first few million primes):
 <haskell>
+primesTMWE = [2,3,5,7] ++ _Y ((11:) . tail  . gapsW 11 wheel
-  primes :: [Integer]
+                                    . joinT . hitsW 11 wheel)
-  primes = (2:) . (3:) . gaps 5 $
-           foldr (\p-> let q=p*p
-                       in (q:).merge [q+2*p,q+4*p..])
+gapsW k (d:w) s@(c:cs) | k < c     = k : gapsW (k+d) w s    -- set difference
-                 [] (tail primes)
+                       | otherwise =     gapsW (k+d) w cs   --   k==c
+hitsW k (d:w) s@(p:ps) | k < p     =     hitsW (k+d) w s    -- intersection
-   where
+                       | otherwise = scanl (\c d->c+p*d) (p*p) (d:w)
-    gaps x s@(y:ys)
-            | x==y    = gaps (x+2) ys
+                                       : hitsW (k+d) w ps   --   k==p
+wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
-            | True    = x: gaps (x+2) s
+:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
-    merge a@(x:xs) b@(y:ys)
-            | x > y   = y:merge a  ys
-            | x < y   = x:merge xs b
-            | True    = x:merge xs ys
 </haskell>
+The <code>hitsW</code> is there to find the start point for rolling the wheel for each prime, but it can be found directly:
-This code is the fastest of all presented so far. Here the specific clause order used in <code>merge</code> achieves additional 30% - 50% speedup.
+<haskell>
+primesW = [2,3,5,7] ++ _Y ( (11:) . tail  . gapsW 11 wheel . joinT .
+                            map (\p->
+                              map (p*) . dropWhile (< p) $
+                                scanl (+) (p - rem (p-11) 210) wheel) )
+</haskell>
+Seems to run about 1.4x faster, too.
-=== Prime Wheels ===
+====Above Limit - Offset Sieve====
-The idea of only testing odd numbers can be extended further. For instance, it is a useful fact that every prime number other than 2 and 3 must be of the form <math>6k+1</math> or <math>6k+5</math>. Thus, we only need to test these numbers:
+Another task is to produce primes above a given value:
+<haskell>
+{-# OPTIONS_GHC -O2 -fno-cse #-}
+primesFromTMWE primes m = dropWhile (< m) [2,3,5,7,11]
+                          ++ gapsW a wh2 (compositesFrom a)
+  where
+    (a,wh2) = rollFrom (snapUp (max 3 m) 3 2)
+    (h,p2:t) = span (< z) $ drop 4 primes           -- p < z => p*p<=a
+    z = ceiling $ sqrt $ fromIntegral a + 1         -- p2>=z => p2*p2>a
+    compositesFrom a = joinT (joinST [multsOf p  a    | p <- h ++ [p2]]
+                                   : [multsOf p (p*p) | p <- t] )
+snapUp v o step = v + (mod (o-v) step)              -- full steps from o
+multsOf p from = scanl (\c d->c+p*d) (p*x) wh       -- map (p*) $
+  where                                             --   scanl (+) x wh
+    (x,wh) = rollFrom (snapUp from p (2*p) `div` p) --   , if p < from
+wheelNums = scanl (+) 0 wheel
+rollFrom n = go wheelNums wheel
+  where m = (n-11) `mod` 210
+        go (x:xs) ws@(w:ws2) | x < m = go xs ws2
+                             | True  = (n+x-m, ws)  -- (x >= m)
+</haskell>
+A certain preprocessing delay makes it worthwhile when producing more than just a few primes, otherwise it degenerates into simple [[#Optimal trial division|trial division]], which is then ought to be used directly:
+<haskell>
+primesFrom m = filter isPrime [m..]
+</haskell>
+=== Map-based ===
+Runs ~1.7x slower than [[#Tree_merging|TME version]], but with the same empirical time complexity, ~<math>n^{1.2}</math> (in ''n'' primes produced) and  same very low (near constant) memory consumption:
+<haskell>
+import Data.List               -- based on http://stackoverflow.com/a/1140100
+import qualified Data.Map as M
+primesMPE :: [Integer]
+primesMPE = 2:mkPrimes 3 M.empty prs 9   -- postponed addition of primes into map;
+  where                                  -- decoupled primes loop feed
+    prs = 3:mkPrimes 5 M.empty prs 9
+    mkPrimes n m ps@ ~(p:t) q = case (M.null m, M.findMin m) of
+        (False, (n2, skips)) | n == n2 ->
+             mkPrimes (n+2) (addSkips n (M.deleteMin m) skips) ps q
+        _ -> if n<q
+             then    n : mkPrimes (n+2)  m                  ps q
+             else        mkPrimes (n+2) (addSkip n m (2*p)) t (head t^2)
+    addSkip n m s = M.alter (Just . maybe [s] (s:)) (n+s) m
+    addSkips = foldl' . addSkip
+</haskell>
+== Turner's sieve - Trial division ==
+David Turner's ''(SASL Language Manual, 1983)'' formulation replaces non-standard <code>minus</code> in the sieve of Eratosthenes by stock list comprehension with <code>rem</code> filtering, turning it into a trial division algorithm:
 <haskell>
+  -- unbounded sieve, premature filters
-primes :: [Integer]
+primesT = sieve [2..]
-primes = 2:3:primes'
   where
-:p:candidates = [6*k+r | k <- [0..], r <- [1,5]]
+    sieve (p:xs) = p : sieve [x | x <- xs, rem x p > 0]
+  -- map head
-    primes'        = p : filter isPrime candidates
-    isPrime n      = all (not . divides n) $ takeWhile (\p -> p*p <= n) primes'
+  --   $ iterate (\(p:xs) -> [x | x <- xs, rem x p > 0]) [2..]
-    divides n p    = n `mod` p == 0
 </haskell>
+This creates many superfluous implicit filters, because they are created prematurely. To be admitted as prime, ''each number'' will be ''tested for divisibility'' here by all its preceding primes, while just those not greater than its square root would suffice. To find e.g. the '''1001'''st prime (<code>7927</code>), '''1000''' filters are used, when in fact just the first '''24''' are needed (up to <code>89</code>'s filter only). Operational overhead here is huge.
-Here, <hask>primes'</hask> is the list of primes greater than 3 and <hask>isPrime</hask> does not test for divisibility by 2 or 3 because the <hask>candidates</hask> by construction don't have these numbers as factors. We also need to exclude 1 from the candidates and mark the next one as prime to start the recursion.
+=== Guarded Filters ===
-Such a scheme to generate candidate numbers first that avoid a given set of primes as divisors is called a '''prime wheel'''. Imagine that you had a wheel of circumference 6 to be rolled along the number line. With spikes positioned 1 and 5 units around the circumference, rolling the wheel will prick holes exactly in those positions on the line whose numbers are not divisible by 2 and 3.
+But this really ought to be changed into bounded and guarded variant, [[#From Squares|again achieving]] the ''"miraculous"'' complexity improvement from above quadratic to about <math>O(n^{1.45})</math> empirically (in ''n'' primes produced):
-A wheel can be represented by its circumference and the spiked positions.
 <haskell>
+primesToGT m = sieve [2..m]
-data Wheel = Wheel Integer [Integer]
+  where
+    sieve (p:xs)
+      | p*p > m = p : xs
+      | True    = p : sieve [x | x <- xs, rem x p > 0]
+  -- (\(a,b:_) -> map head a ++ b) . span ((< m).(^2).head)
+  --   $ iterate (\(p:xs) -> [x | x <- xs, rem x p > 0]) [2..m]
 </haskell>
-We prick out numbers by rolling the wheel.
+=== Postponed Filters ===
+Or it can remain unbounded, just filters creation must be ''postponed'' until the right moment:
 <haskell>
+primesPT1 = 2 : sieve primesPT1 [3..]
-roll (Wheel n rs) = [n*k+r | k <- [0..], r <- rs]
+  where
+    sieve (p:ps) xs = let (h,t) = span (< p*p) xs
+                      in h ++ sieve ps [x | x <- t, rem x p > 0]
+  -- fix $ concatMap (fst . fst)
+  --     . iterate (\((_,xs), p:ps) -> let (h,t) = span (< p*p) xs in
+  --                              ((h, [x | x <- t, rem x p > 0]), ps))
+  --     . (,) ([2],[3..])
 </haskell>
+This is better re-written with <code>span</code> and <code>(++)</code> inlined and fused into the <code>sieve</code>:
-The smallest wheel is the unit wheel with one spike, it will prick out every number.
 <haskell>
+primesPT = 2 : oddprimes
-w0 = Wheel 1 [1]
+  where
+    oddprimes = sieve [3,5..] 9 oddprimes
+    sieve (x:xs) q ps@ ~(p:t)
+      | x < q = x : sieve xs q ps
+      | True  =     sieve [x | x <- xs, rem x p /= 0] (head t^2) t
 </haskell>
+creating here [[#Linear merging |as well]] the linear nested structure at run-time, <code>(...(([3,5..] >>= filterBy [3]) >>= filterBy [5])...)</code>, where <code>filterBy ds n = [n | noDivs n ds]</code> (see <code>noDivs</code> definition below) &thinsp;&ndash;&thinsp; but unlike the original code, each filter being created at its proper moment, not sooner than the prime's square is seen.
-We can create a larger wheel by rolling a smaller wheel of circumference <hask>n</hask> along a rim of circumference <hask>p*n</hask> while excluding spike positions at multiples of <hask>p</hask>.
+=== Optimal trial division ===
+The above is equivalent to the traditional formulation of trial division,
 <haskell>
+ps = 2 : [i | i <- [3..],
-nextSize (Wheel n rs) p =
-  Wheel (p*n) [r' | k <- [0..(p-1)], r <- rs, let r' = n*k+r, r' `mod` p /= 0]
+              and [rem i p > 0 | p <- takeWhile (\p -> p^2 <= i) ps]]
 </haskell>
+or,
-Combining both, we can make wheels that prick out numbers that avoid a given list <hask>ds</hask> of divisors.
 <haskell>
+noDivs n fs = foldr (\f r -> f*f > n || (rem n f > 0 && r)) True fs
-mkWheel ds = foldl nextSize w0 ds
+-- primes = filter (`noDivs`[2..]) [2..]
+--        = 2 : filter (`noDivs`[3,5..]) [3,5..]
+primesTD = 2 : 3 : filter (`noDivs` tail primesTD) [5,7..]
+isPrime n = n > 1 && noDivs n primesTD
 </haskell>
+except that this code is rechecking for each candidate number which primes to use, whereas for every candidate number in each segment between the successive squares of primes these will just be the same prefix of the primes list being built.
+Trial division is used as a simple [[Testing primality#Primality Test and Integer Factorization|primality test and prime factorization algorithm]].
-Now, we can generate prime numbers with a wheel that for instance avoids all multiples of 2, 3, 5 and 7.
+=== Segmented Generate and Test ===
+Next we turn [[#Postponed Filters |the list of filters]] into one filter by an ''explicit'' list, each one in a progression of prefixes of the primes list. This seems to eliminate most recalculations, explicitly filtering composites out from batches of odds between the consecutive squares of primes.
 <haskell>
+import Data.List
-primes :: [Integer]
-primes = small ++ large
+primesST = 2 : oddprimes
-    where
+  where
-:p:candidates = roll $ mkWheel small
-    small          = [2,3,5,7]
+    oddprimes = sieve 3 9 oddprimes (inits oddprimes)  -- [],[3],[3,5],...
+    sieve x q ~(_:t) (fs:ft) =
-    large          = p : filter isPrime candidates
+      filter ((`all` fs) . ((/=0).) . rem) [x,x+2..q-2]
-    isPrime n      = all (not . divides n) $ takeWhile (\p -> p*p <= n) large
-    divides n p    = n `mod` p == 0
+      ++ sieve (q+2) (head t^2) t ft
 </haskell>
-It's a pretty big wheel with a circumference of 210 and allows us to calculate the first 10000 primes in convenient time.
+==== Generate and Test Above Limit ====
-A fixed size wheel is fine, but how about adapting the wheel size while generating prime numbers? See the [[Research papers/Functional pearls|functional pearl]] titled [http://citeseer.ist.psu.edu/runciman97lazy.html Lazy wheel sieves and spirals of primes] for more.
+The following will start the segmented Turner sieve at the right place, using any primes list it's supplied with (e.g. [[#Tree_merging_with_Wheel | TMWE]] etc.) or itself, as shown, demand computing it just up to the square root of any prime it'll produce:
-=== Implicit Heap ===
+<haskell>
-The following is a more efficient prime generator, implementing the sieve of
+primesFromST m | m > 2 =
-Eratosthenes.
+    sieve (m`div`2*2+1) (head ps^2) (tail ps) (inits ps)
+  where
+    (h,ps) = span (<= (floor.sqrt $ fromIntegral m+1)) oddprimes
+    sieve x q ps (fs:ft) =
+      filter ((`all` (h ++ fs)) . ((/=0).) . rem) [x,x+2..q-2]
+      ++ sieve (q+2) (head ps^2) (tail ps) ft
+    oddprimes = 3 : primesFromST 5     -- odd primes
+</haskell>
+This is usually faster than testing candidate numbers for divisibility [[#Optimal trial division|one by one]] which has to re-fetch anew the needed prime factors to test by, for each candidate. Faster is the [[99_questions/Solutions/39#Solution_4.|offset sieve of Eratosthenes on odds]], and yet faster the one [[#Above_Limit_-_Offset_Sieve|w/ wheel optimization]], on this page.
+=== Conclusions ===
+All these variants being variations of trial division, finding out primes by direct divisibility testing of every candidate number by sequential primes below its square root (instead of just by ''its factors'', which is what ''direct generation of multiples'' is doing, essentially), are thus principally of worse complexity than that of Sieve of Eratosthenes.
+The initial code is just a one-liner that ought to have been regarded as ''executable specification'' in the first place. It can easily be improved quite significantly with a simple use of bounded, guarded formulation to limit the number of filters it creates, or by postponement of filter creation.
+== Euler's Sieve ==
-See also the message threads [http://thread.gmane.org/gmane.comp.lang.haskell.cafe/25270/focus=25312 Re: "no-coding" functional data structures via lazyness] for more about how merging ordered lists amounts to creating an implicit heap and [http://thread.gmane.org/gmane.comp.lang.haskell.cafe/26426/focus=26493 Re: Code and Perf. Data for Prime Finders] for an explanation of the <hask>People a</hask> structure that makes it work when tying a knot.
+=== Unbounded Euler's sieve ===
+With each found prime Euler's sieve removes all its multiples ''in advance'' so that at each step the list to process is guaranteed to have ''no multiples'' of any of the preceding primes in it (consists only of numbers ''coprime'' with all the preceding primes) and thus starts with the next prime:
 <haskell>
+primesEU = 2 : eulers [3,5..]
-data People a = VIP a (People a) | Crowd [a]
+  where
+    eulers (p:xs) = p : eulers (xs `minus` map (p*) (p:xs))
+ -- eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+2*p..])
+</haskell>
+This code is extremely inefficient, running above <math>O({n^{2}})</math> empirical complexity (and worsening rapidly), and should be regarded a ''specification'' only. Its memory usage is very high, with empirical space complexity just below <math>O({n^{2}})</math>, in ''n'' primes produced.
-mergeP :: Ord a => People a -> People a -> People a
-mergeP (VIP x xt) ys                    = VIP x $ mergeP xt ys
-mergeP (Crowd xs) (Crowd ys)            = Crowd $ merge  xs ys
-mergeP xs@(Crowd ~(x:xt)) ys@(VIP y yt) = case compare x y of
-    LT -> VIP x $ mergeP (Crowd xt) ys
-    EQ -> VIP x $ mergeP (Crowd xt) yt
-    GT -> VIP y $ mergeP xs yt
+In the stream-based sieve of Eratosthenes we are able to ''skip'' along the input stream <code>xs</code> directly to the prime's square, consuming the whole prefix at once, thus achieving the results equivalent to the postponement technique, because the generation of the prime's multiples is independent of the rest of the stream.
+But here in the Euler's sieve it ''is'' dependent on all <code>xs</code> and we're unable ''in principle'' to skip along it to the prime's square - because all <code>xs</code> are needed for each prime's multiples generation. Thus efficient unbounded stream-based implementation seems to be impossible in principle, under the simple scheme of producing the multiples by multiplication.
-merge :: Ord a => [a] -> [a] -> [a]
-merge xs@(x:xt) ys@(y:yt) = case compare x y of
-    LT -> x : merge xt ys
-    EQ -> x : merge xt yt
-    GT -> y : merge xs yt
+=== Wheeled list representation ===
-diff xs@(x:xt) ys@(y:yt) = case compare x y of
-    LT -> x : diff xt ys
-    EQ ->     diff xt yt
-    GT ->     diff xs yt
+The situation can be somewhat improved using a different list representation, for generating lists not from a last element and an increment, but rather a last span and an increment, which entails a set of helpful equivalences:
-foldTree :: (a -> a -> a) -> [a] -> a
+<haskell>
-foldTree f ~(x:xs) = f x . foldTree f . pairs $ xs
+{- fromElt (x,i) = x : fromElt (x + i,i)
-    where pairs ~(x: ~(y:ys)) = f x y : pairs ys
+                       === iterate  (+ i) x
+     [n..]             === fromElt  (n,1)
+                       === fromSpan ([n],1)
+     [n,n+2..]         === fromElt  (n,2)
+                       === fromSpan ([n,n+2],4)     -}
+fromSpan (xs,i)  = xs ++ fromSpan (map (+ i) xs,i)
-primes, nonprimes :: [Integer]
-primes    = 2:3:diff [5,7..] nonprimes
+{-                   === concat $ iterate (map (+ i)) xs
-nonprimes = serve . foldTree mergeP . map multiples $ tail primes
+   fromSpan (p:xt,i) === p : fromSpan (xt ++ [p + i], i)
+   fromSpan (xs,i) `minus` fromSpan (ys,i)
+                     === fromSpan (xs `minus` ys, i)
+   map (p*) (fromSpan (xs,i))
+                     === fromSpan (map (p*) xs, p*i)
+   fromSpan (xs,i)   === forall (p > 0).
+     fromSpan (concat $ take p $ iterate (map (+ i)) xs, p*i) -}
+spanSpecs = iterate eulerStep ([2],1)
+eulerStep (xs@(p:_), i) =
+       ( (tail . concat . take p . iterate (map (+ i))) xs
+          `minus` map (p*) xs, p*i )
+{- > mapM_ print $ take 4 spanSpecs
+     ([2],1)
+     ([3],2)
+     ([5,7],6)
+     ([7,11,13,17,19,23,29,31],30)  -}
+</haskell>
+Generating a list from a span specification is like rolling a ''[[#Prime_Wheels|wheel]]'' as its pattern gets repeated over and over again. For each span specification <code>w@((p:_),_)</code> produced by <code>eulerStep</code>, the numbers in <code>(fromSpan w)</code> up to <math>{p^2}</math> are all primes too, so that
+<haskell>
+eulerPrimesTo m = if m > 1 then go ([2],1) else []
     where
-    multiples p = vip [p*k | k <- [p,p+2..]]
+      go w@((p:_), _)
+         | m < p*p = takeWhile (<= m) (fromSpan w)
+         | True    = p : go (eulerStep w)
+</haskell>
+This runs at about <math>O(n^{1.5..1.8})</math> complexity, for <code>n</code> primes produced, and also suffers from a severe space leak problem (IOW its memory usage is also very high).
-    vip (x:xs)       = VIP x $ Crowd xs
-    serve (VIP x xs) = x:serve xs
+== Using Immutable Arrays ==
-    serve (Crowd xs) = xs
+=== Generating Segments of Primes ===
+The sieve of Eratosthenes' [[#Segmented|removal of multiples on each segment of odds]] can be done by actually marking them in an array, instead of manipulating ordered lists, and can be further sped up more than twice by working with odds only:
+<haskell>
+import Data.Array.Unboxed
+primesSA :: [Int]
+primesSA = 2 : oddprimes ()
+  where
+    oddprimes () = 3 : sieve (oddprimes ()) 3 []
+    sieve (p:ps) x fs = [i*2 + x | (i,True) <- assocs a]
+                        ++ sieve ps (p*p) ((p,0) :
+                             [(s, rem (y-q) s) | (s,y) <- fs])
+     where
+      q = (p*p-x)`div`2
+      a :: UArray Int Bool
+      a = accumArray (\ b c -> False) True (1,q-1)
+                     [(i,()) | (s,y) <- fs, i <- [y+s, y+s+s..q]]
 </haskell>
+Runs significantly faster than [[#Tree merging with Wheel|TMWE]] and with better empirical complexity, of about <math>O(n^{1.10..1.05})</math> in producing first few millions of primes, with constant memory footprint.
-<hask>nonprimes</hask> effectively implements a heap, exploiting lazy evaluation.
-=== Bitwise prime sieve ===
+=== Calculating Primes Upto a Given Value ===
+Equivalent to [[#Accumulating Array|Accumulating Array]] above, running somewhat faster (compiled by GHC with optimizations turned on):
+<haskell>
+{-# OPTIONS_GHC -O2 #-}
+import Data.Array.Unboxed
+primesToNA n = 2: [i | i <- [3,5..n], ar ! i]
+  where
+    ar = f 5 $ accumArray (\ a b -> False) True (3,n)
+                        [(i,()) | i <- [9,15..n]]
+    f p a | q > n = a
+          | True  = if null x then a2 else f (head x) a2
+      where q = p*p
+            a2  :: UArray Int Bool
+            a2 = a // [(i,False) | i <- [q, q+2*p..n]]
+            x  = [i | i <- [p+2,p+4..n], a2 ! i]
+</haskell>
+=== Calculating Primes in a Given Range ===
+<haskell>
+primesFromToA a b = (if a<3 then [2] else [])
+                      ++ [i | i <- [o,o+2..b], ar ! i]
+  where
+    o  = max (if even a then a+1 else a) 3   -- first odd in the segment
+    r  = floor . sqrt $ fromIntegral b + 1
+    ar = accumArray (\_ _ -> False) True (o,b)  -- initially all True,
+          [(i,()) | p <- [3,5..r]
+                    , let q  = p*p      -- flip every multiple of an odd
+                          s  = 2*p                         -- to False
+                          (n,x) = quotRem (o - q) s
+                          q2 = if  o <= q  then q
+                               else  q + (n + signum x)*s
+                    , i <- [q2,q2+s..b] ]
+</haskell>
+Although sieving by odds instead of by primes, the array generation is so fast that it is very much feasible and even preferable for quick generation of some short spans of relatively big primes.
+== Using Mutable Arrays ==
+Using mutable arrays is the fastest but not the most memory efficient way to calculate prime numbers in Haskell.
+=== Using ST Array ===
+This method implements the Sieve of Eratosthenes, similar to how you might do it
+in C, modified to work on odds only. It is fast, but about linear in memory consumption, allocating one (though apparently packed) sieve array for the whole sequence to process.
+<haskell>
+import Control.Monad
+import Control.Monad.ST
+import Data.Array.ST
+import Data.Array.Unboxed
+sieveUA :: Int -> UArray Int Bool
+sieveUA top = runSTUArray $ do
+    let m = (top-1) `div` 2
+        r = floor . sqrt $ fromIntegral top + 1
+    sieve <- newArray (1,m) True      -- :: ST s (STUArray s Int Bool)
+    forM_ [1..r `div` 2] $ \i -> do
+      isPrime <- readArray sieve i
+      when isPrime $ do               -- ((2*i+1)^2-1)`div`2 == 2*i*(i+1)
+        forM_ [2*i*(i+1), 2*i*(i+2)+1..m] $ \j -> do
+          writeArray sieve j False
+    return sieve
+primesToUA :: Int -> [Int]
+primesToUA top = 2 : [i*2+1 | (i,True) <- assocs $ sieveUA top]
+</haskell>
+Its [http://ideone.com/KwZNc empirical time complexity] is improving with ''n'' (number of primes produced) from above <math>O(n^{1.20})</math> towards <math>O(n^{1.16})</math>. The reference [http://ideone.com/FaPOB C++ vector-based implementation] exhibits this improvement in empirical time complexity too, from <math>O(n^{1.5})</math> gradually towards <math>O(n^{1.12})</math>, where tested ''(which might be interpreted as evidence towards the expected [http://en.wikipedia.org/wiki/Computation_time#Linearithmic.2Fquasilinear_time quasilinearithmic] <math>O(n \log(n)\log(\log n))</math> time complexity)''.
+=== Bitwise prime sieve with Template Haskell ===
 Count the number of prime below a given 'n'. Shows fast bitwise arrays,
@@ Line 325: / Line 679: @@
         e <- unsafeRead a n
         if e then
-            if n < cutoff then
+          if n < cutoff then
-                let loop !j
+            let loop !j
-                        | j < m     = do
+                 | j < m     = do
-                            x <- unsafeRead a j
+                        x <- unsafeRead a j
-                            when x $ unsafeWrite a j False
+                        when x $ unsafeWrite a j False
-                            loop (j+n)
+                        loop (j+n)
-                        | otherwise = go a m cutoff (n+2) (c+1)
+                 | otherwise = go a m cutoff (n+2) (c+1)
-                in loop ( if n < 46340 then n * n else n `shiftL` 1)
+            in loop ( if n < 46340 then n * n else n `shiftL` 1)
-             else go a m cutoff (n+2) (c+1)
+           else go a m cutoff (n+2) (c+1)
          else go a m cutoff (n+2) c
 </haskell>
-And places in a module:
+And place in a module:
 <haskell>
@@ Line 355: / Line 709: @@
 </haskell>
+== Implicit Heap ==
-=== Using IntSet for a traditional sieve ===
-<haskell>
+See [[Prime_numbers_miscellaneous#Implicit_Heap | Implicit Heap]].
-module Sieve where
-import qualified Data.IntSet as I
+== Prime Wheels ==
--- findNext - finds the next member of an IntSet.
-findNext c is | I.member c is = c
-              | c > I.findMax is = error "Ooops. No next number in set."
-              | otherwise = findNext (c+1) is
+See [[Prime_numbers_miscellaneous#Prime_Wheels | Prime Wheels]].
--- mark - delete all multiples of n from n*n to the end of the set
-mark n is = is I.\\ (I.fromAscList (takeWhile (<=end) (map (n*) [n..])))
-                where
-                    end = I.findMax is
+== Using IntSet for a traditional sieve ==
--- primes - gives all primes up to n
-primes n = worker 2 (I.fromAscList [2..n])
-                where
-                    worker x is
-                     | (x*x) > n = is
-                     | otherwise = worker (findNext (x+1) is) (mark x is)
-</haskell>
+See [[Prime_numbers_miscellaneous#Using_IntSet_for_a_traditional_sieve | Using IntSet for a traditional sieve]].
-== Testing Primality ==
-=== Miller-Rabin Primality Test ===
+== Testing Primality, and Integer Factorization ==
-<haskell>
-find2km :: Integral a => a -> (a,a)
+See [[Testing_primality | Testing primality]]:
-find2km n = f 0 n
-    where
+* [[Testing_primality#Primality_Test_and_Integer_Factorization | Primality Test and Integer Factorization ]]
-        f k m
+* [[Testing_primality#Miller-Rabin_Primality_Test | Miller-Rabin Primality Test]]
-            | r == 1 = (k,m)
-            | otherwise = f (k+1) q
+== One-liners ==
-            where (q,r) = quotRem m 2
+See [[Prime_numbers_miscellaneous#One-liners | primes one-liners]].
-millerRabinPrimality :: Integer -> Integer -> Bool
-millerRabinPrimality n a
-    | a <= 1 || a >= n-1 =
-        error $ "millerRabinPrimality: a out of range ("
-              ++ show a ++ " for "++ show n ++ ")"
-    | n < 2 = False
-    | even n = False
-    | b0 == 1 || b0 == n' = True
-    | otherwise = iter (tail b)
-    where
-        n' = n-1
-        (k,m) = find2km n'
-        b0 = powMod n a m
-        b = take (fromIntegral k) $ iterate (squareMod n) b0
-        iter [] = False
-        iter (x:xs)
-            | x == 1 = False
-            | x == n' = True
-            | otherwise = iter xs
-pow' :: (Num a, Integral b) => (a -> a -> a) -> (a -> a) -> a -> b -> a
-pow' _ _ _ 0 = 1
-pow' mul sq x' n' = f x' n' 1
-    where
-        f x n y
-            | n == 1 = x `mul` y
-            | r == 0 = f x2 q y
-            | otherwise = f x2 q (x `mul` y)
-            where
-                (q,r) = quotRem n 2
-                x2 = sq x
-mulMod :: Integral a => a -> a -> a -> a
-mulMod a b c = (b * c) `mod` a
-squareMod :: Integral a => a -> a -> a
-squareMod a b = (b * b) `rem` a
-powMod :: Integral a => a -> a -> a -> a
-powMod m = pow' (mulMod m) (squareMod m)
-</haskell>
 == External links ==
 * http://www.cs.hmc.edu/~oneill/code/haskell-primes.zip
-: A collection of prime generators; the file "ONeillPrimes.hs" contains one of the fastest pure-Haskell prime generators. WARNING: Don't use the priority queue from that file for your projects: it's broken and works for primes only by a lucky chance.
+: A collection of prime generators; the file "ONeillPrimes.hs" contains one of the fastest pure-Haskell prime generators; code by Melissa O'Neill.
+:  WARNING: Don't use the priority queue from ''older versions'' of that file for your projects: it's broken and works for primes only by a lucky chance. The ''revised'' version of the file fixes the bug, as pointed out by Eugene Kirpichov on February 24, 2009 on the [http://www.mail-archive.com/haskell-cafe@haskell.org/msg54618.html haskell-cafe] mailing list, and fixed by Bertram Felgenhauer.
+* [http://ideone.com/willness/primes test entries] for (some of) the above code variants.
+* Speed/memory [http://ideone.com/p0e81 comparison table] for sieve of Eratosthenes variants.
+* [http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth Empirical orders of growth] on Wikipedia.
 [[Category:Code]]