# Prime numbers

In mathematics, amongst the natural numbers greater than 1, a prime number (or a prime) is such that has no divisors other than itself (and 1). The smallest prime is thus 2. Non-prime numbers are known as composite, i.e. those representable as product of two natural numbers greater than 1. The set of prime numbers is thus P = N2 \ {n×m : n,mN2} = N2 \ ∪ { {n×n,n×n+n,n×n+2×n, ...}: nN2 }, where Nk = {nN: n ≥ k}.

Thus starting with 2, for each newly found prime we can eliminate from the rest of the numbers all the multiples of this prime, giving us the next available number as next prime. This is known as sieving the natural numbers, so that in the end all the composites are eliminated and what we are left with are just primes.

To eliminate a prime's multiples we can either a. test each new candidate number for divisibility by that prime, giving rise to a kind of trial division algorithm; or b. we can directly generate the multiples of a prime p by counting up from it in increments of p, resulting in a variant of the sieve of Eratosthenes.

Having a direct-access mutable arrays indeed enables easy marking off of these multiples on pre-allocated array as it is usually done in imperative languages; but to get an efficient list-based code we have to be smart about combining those streams of multiples of each prime - which gives us also the memory efficiency in generating the results incrementally, one by one.

## 1 Prime Number Resources

• At Wikipedia:
• HackageDB packages:
• arithmoi: Various basic number theoretic functions; efficient array-based sieves, Montgomery curve factorization ...
• Numbers: An assortment of number theoretic functions.
• NumberSieves: Number Theoretic Sieves: primes, factorization, and Euler's Totient.
• primes: Efficient, purely functional generation of prime numbers.
• Papers:
• O'Neill, Melissa E., "The Genuine Sieve of Eratosthenes", Journal of Functional Programming, Published online by Cambridge University Press 9 October 2008 doi:10.1017/S0956796808007004.

## 2 Sieve of Eratosthenes

To start with, the sieve of Eratosthenes can be genuinely represented by

-- genuine yet wasteful sieve of Eratosthenes
primesTo m = 2 : eratos [3,5..m]  where
eratos []     = []
eratos (p:xs) = p : eratos (xs minus [p, p+2*p..m])
-- eulers (p:xs) = p : eulers (xs minus map (p*)(p:xs))
-- turner (p:xs) = p : turner [x | x <- xs, rem x p /= 0]

This should be regarded more like a specification, not a code (the fully developed version is here). It is extremely slow, running at empirical time complexities worse than quadratic in number of primes produced. But it has the core defining features of the classical formulation of S. of E. as a. being bounded, i.e. having a top limit value, and b. finding out the multiples of a prime directly, by counting up from it, in increments of the prime's value (or twice that, to count on odds only).

The canonical list-difference minus and duplicates-removing list-union union functions dealing with ordered increasing lists - infinite as well as finite - are simple enough to define (cf. Leon P.Smith's Data.List.Ordered package):

-- ordered lists, difference and union
minus (x:xs) (y:ys) = case (compare x y) of
LT -> x : minus  xs  (y:ys)
EQ ->     minus  xs     ys
GT ->     minus (x:xs)  ys
minus  xs     _     = xs
union (x:xs) (y:ys) = case (compare x y) of
LT -> x : union  xs  (y:ys)
EQ -> x : union  xs     ys
GT -> y : union (x:xs)  ys
union  xs     []    = xs
union  []     ys    = ys

The name merge ought to be reserved for duplicates-preserving merging operation of mergesort.

### 2.1 Analysis

So for each newly found prime the sieve removes its odd multiples from further consideration. It finds them by counting up in steps of 2p. There are thus O(m / p) multiples generated and eliminated for each prime, and O(mloglog(m)) multiples in total, with duplicates, by virtues of prime harmonic series.

If each multiple is dealt with in O(1) time, this will translate into O(mloglog(m)) RAM machine operations (since we consider addition as an atomic operation). Indeed mutable random-access arrays allow for that. But lists in Haskell are sequential-access, and complexity of minus(a,b) for lists is $\textstyle O(|a \cup b|)$ instead of $\textstyle O(|b|)$ of the direct access destructive array update. The lower the complexity of each minus step, the better the overall complexity.

So on k-th step the argument list (p:xs) that the eratos function gets starts with the (k+1)-th prime, and consists of all the numbers ≤ m coprime with all the primes ≤ p. According to the M. O'Neill's article (p.10) there are $\textstyle\Phi(m,p) \in \Theta(m/\log p)$ such numbers.

It looks like $\textstyle\sum_{i=1}^{n}{1/log(p_i)} = O(n/\log n)$ for our intents and purposes. Since the number of primes below m is n = π(m) = O(m / log(m)) by the prime number theorem (where π(m) is a prime counting function), there will be n multiples-removing steps in the algorithm; it means total complexity of at least O(mn / log(n)) = O(m2 / (log(m))2), or O(n2) in n primes produced - much much worse than the optimal O(nlog(n)loglog(n)).

### 2.2 From Squares

But we can start each step at a prime's square, as its smaller multiples will have been already produced on previous steps. This means we can stop early, when the prime's square reaches the top value m, and thus cut the total number of steps to around $\textstyle n = \pi(m^{0.5}) = \Theta(2m^{0.5}/\log m)$. This does not in fact change the complexity of random-access code, but for lists it makes it O(m1.5 / (logm)2), or O(n1.5 / (logn)0.5) in n primes produced, a dramatic speedup:

primesToQ m = 2 : sieve [3,5..m]  where
sieve []     = []
sieve (p:xs) = p : sieve (xs minus [p*p, p*p+2*p..m])

Its empirical complexity is about O(n1.45). This simple optimization works here because our formulation is bounded, as is the original algorithm. To start late on a bounded sequence is to stop early (if we start past its end we don't need to start at all – see warning below), thus preventing the creation of all the superfluous multiples streams which started above the upper bound anyway. This is acceptably slow now, striking a good balance between clarity, succinctness and efficiency.

Warning: this is predicated on a subtle point of minus xs [] = xs definition being used, as it indeed should be. If the definition minus (x:xs) [] = x:minus xs [] is used, the problem is back and the complexity is bad again.

### 2.3 Guarded

This ought to be explicated (improving on clarity, though not on time complexity) as in the following, for which it is indeed a minor optimization whether to start from p or p*p - because it explicitly stops as soon as possible:

primesToG m = 2 : sieve [3,5..m]  where
sieve (p:xs)
| p*p > m   = p : xs
| otherwise = p : sieve (xs minus [p*p, p*p+2*p..])

It is now clear that it can't be made unbounded just by abolishing the upper bound m, because the guard can not be simply omitted without changing the complexity back for the worst.

### 2.4 Accumulating Array

So while minus(a,b) takes O( | b | ) operations for random-access imperative arrays and about O( | a | ) operations for lists here, using Haskell's immutable array for a one could expect the array update time to be nevertheless closer to O( | b | ) if destructive update were used implicitly by compiler for an array being passed along as an accumulating parameter:

import Data.Array.Unboxed

primesToA m = sieve 3 (array (3,m) [(i,odd i) | i<-[3..m]]
:: UArray Int Bool)
where
sieve p a
| p*p > m   = 2 : [i | (i,True) <- assocs a]
where
primes' = 3 : (gaps 5 $joinT [[p*p, p*p+2*p..] | p <- primes']) joinT ((x:xs):t) = x : union xs (joinT (pairs t)) pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t gaps k s@(x:xs) | k<x = k:gaps (k+2) s -- equivalent to | True = gaps (k+2) xs -- [k,k+2..]minuss, k<=head s It is very fast, running at speeds and empirical complexities comparable with the code from Melissa O'Neill's article (about O(n1.2) in number of primes n produced). As an aside, joinT is equivalent to calling infinite tree-like folding foldi (\(x:xs) ys->x:union xs ys) []. The above can be rewritten as follows, using a standard (corecursive) definition for fix combinator from Data.Function to arrange for double primes feed automatically: primesTMEf = 2 : g (fix g) where g xs = 3 : gaps 5 (joinT [[x*x, x*x+2*x..] | x <- xs]) fix g = xs where xs = g xs -- global defn to avoid space leak ### 2.9 Tree merging with Wheel Wheel factorization optimization can be further applied, and another tree structure can be used which is better adjusted for the primes multiples production (effecting about 5%-10% at the top of a total 2.5x speedup w.r.t. the above tree merging on odds only - though complexity stays roughly the same): {-# OPTIONS_GHC -O2 -fno-cse #-} primesTMWE = 2:3:5:7: gapsW 11 wheel (joinT3$ rollW 11 wheel primes')
where
primes' = 11: gapsW 13 (tail wheel) (joinT3 $rollW 11 wheel primes') gapsW k ws@(w:t) cs@(c:u) | k==c = gapsW (k+w) t u | True = k : gapsW (k+w) t cs rollW k ws@(w:t) ps@(p:u) | k==p = scanl (\c d->c+p*d) (p*p) ws : rollW (k+w) t u | True = rollW (k+w) t ps joinT3 ((x:xs): ~(ys:zs:t)) = x : union xs (union ys zs) union joinT3 (pairs t) wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2: 4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel The compiler switch -fno-cse is used here to prevent space leak. #### 2.9.1 Above Limit - Offset Sieve Another task is to produce primes above a given value (i.e. not from their ordinal numbers). {-# OPTIONS_GHC -O2 -fno-cse #-} primesFromTMWE primes m = (if m <= 2 then [2] else []) ++ (gapsW a wh'$ compositesFrom a)
where
(a,wh') = rollFrom (snap (max 3 m) 3 2)
(h,p':t) = span (< z) $drop 4 primes -- p < z => p*p<=a z = ceiling$ sqrt $fromIntegral a + 1 -- p'>=z => p'*p'>a snap v origin step = if r==0 then v else v+(step-r) where r = mod (v-origin) step compositesFrom a = joinT (joinST [multsOf p a | p <- h++[p']] : [multsOf p (p*p) | p <- t]) multsOf p from = scanl (\c d->c+p*d) (p*x) wh -- map (p*)$
where                                         --   scanl (+) x wh
(x,wh) = rollFrom (snap from p (2*p) div p)

wheelNums = scanl (+) 0 wheel
rollFrom n = go wheelNums wheel
where m  = (n-11) mod 210
go (x:xs) ws@(w:ws') | x < m = go xs ws'
| True  = (n+x-m, ws)     -- (x >= m)

A certain preprocessing delay makes it worthwhile when producing more than just a few primes, otherwise it degenerates into simple trial division, which is then ought to be used directly:

primesFrom m = filter isPrime [m..]

## 3 Turner's sieve - Trial division

David Turner's original 1975 formulation (SASL Language Manual, 1975) replaces non-standard minus in the sieve of Eratosthenes by stock list comprehension with rem filtering, turning it into a kind of trial division algorithm:

-- unbounded sieve, premature filters
primesT = 2 : sieve [3,5..]  where
sieve (p:xs) = p : sieve [x | x <- xs, rem x p /= 0]
-- filter ((/=0).(remp)) xs

This creates an immense number of superfluous implicit filters in extremely premature fashion. To be admitted as prime, each number will be tested for divisibility here by all its preceding primes potentially, while just those not greater than its square root would suffice. To find e.g. the 1001st prime (7927), 1000 filters are used, when in fact just the first 24 are needed (up to 89's filter only). Operational overhead is enormous here.

### 3.1 Guarded Filters

But this really ought to be changed into bounded and guarded variant, again achieving the "miraculous" complexity improvement from above quadratic to about O(n1.45) empirically (in n primes produced):

primesToGT m = 2 : sieve [3,5..m]  where
sieve (p:xs)
| p*p > m = p : xs
| True    = p : sieve [x | x <- xs, rem x p /= 0]

### 3.2 Postponed Filters

or as unbounded, postponed filters-creation variant:

primesPT = 2 : primes'
where
primes' = sieve [3,5..] 9 primes'
sieve (x:xs) q ps@ ~(p:t)
| x < q   = x : sieve xs q ps
| True    =     sieve [x | x <- xs, rem x p /= 0] (head t^2) t

creating here as well the linear nested structure at run-time, (...(([3,5..] >>= filterBy [3]) >>= filterBy [5])...), where filterBy ds n = [n | noDivs ds n] (see noDivs definition below)  –  but unlike the original code, each filter being created at its proper moment, not sooner than the prime's square is seen.

### 3.3 Optimal trial division

The above is equivalent to the traditional formulation of trial division,

noDivs factors n = foldr (\f r -> f*f > n || (rem n f /= 0 && r))
True factors
-- primes = filter (noDivs [2..]) [2..]
sieve x q ps k = let fs = take k $tail primes in filter ((all fs) . ((/=0).) . rem) [x,x+2..q-2] ++ sieve (q+2) (head ps^2) (tail ps) (k+1) This is usually faster than testing candidate numbers for divisibility one by one which has to re-fetch anew the needed prime factors to test by, for each candidate. Faster is the offset sieve of Eratosthenes on odds, and yet faster the above one, w/ wheel optimization. ### 3.5 Conclusions All these variants of course being variations of trial division – finding out primes by direct divisibility testing of every odd number by sequential primes below its square root (instead of just by its factors, which is what direct generation of multiples is doing, essentially) – are thus of principally worse complexities than that of Sieve of Eratosthenes. The prototype code is very inefficient, yet improving significantly with just a simple use of bounded, guarded formulation. So, there's no harm in applying simple improvements to a code instead of just labeling it "naive". BTW were divisibility testing somehow turned into an O(1) operation, e.g. by some kind of massive parallelization, the overall complexity of trial division sieve would drop to O(n). It's the sequentiality of testing that's the culprit. Did Eratosthenes himself achieve the optimal complexity? It rather seems doubtful, as he probably counted boxes in the table by 1 to go from one number to the next, as in 3,5,7,9,11,13,15, ... for he had no access even to Hindu numerals, using Greek alphabet for writing down numbers instead. Was he performing a genuine sieve of Eratosthenes then? Should faithfulness of an algorithm's implementation be judged by its performance? So the initial Turner code is just a one-liner that ought to have been regarded as executable specification in the first place. The reason it was taught that way is probably so that it could provoke discussion among students, at the very least discovering the optimization technique of the postponement of filter-creation. ## 4 Euler's Sieve ### 4.1 Unbounded Euler's sieve With each found prime Euler's sieve removes all its multiples in advance so that at each step the list to process is guaranteed to have no multiples of any of the preceding primes in it (consists only of numbers coprime with all the preceding primes) and thus starts with the next prime: primesEU = 2 : euler [3,5..] where euler (p:xs) = p : euler (xs minus map (p*) (p:xs)) This code is extremely inefficient, running above O(n2) complexity (and worsening rapidly), and should be regarded a specification only. Its memory usage is very high, with space complexity just below O(n2), in n primes produced. ### 4.2 Wheeled list representation The situation can be somewhat improved using a different list representation, for generating lists not from a last element and an increment, but rather a last span and an increment, which entails a set of helpful equivalences: {- fromElt (x,i) = x : fromElt (x + i,i) === iterate (+ i) x [n..] === fromElt (n,1) === fromSpan ([n],1) [n,n+2..] === fromElt (n,2) === fromSpan ([n,n+2],4) -} fromSpan (xs,i) = xs ++ fromSpan (map (+ i) xs,i) {- === concat$ iterate (map (+ i)) xs
fromSpan (p:xt,i) === p : fromSpan (xt ++ [p + i], i)
fromSpan (xs,i) minus fromSpan (ys,i)
=== fromSpan (xs minus ys, i)
map (p*) (fromSpan (xs,i))
=== fromSpan (map (p*) xs, p*i)
fromSpan (xs,i)   === forall (p > 0).
fromSpan (concat $take p$ iterate (map (+ i)) xs, p*i) -}

spanSpecs = iterate eulerStep ([2],1)
eulerStep (xs@(p:_), i) =
( (tail . concat . take p . iterate (map (+ i))) xs
minus map (p*) xs, p*i )

{- > mapM_ print $take 4 spanSpecs ([2],1) ([3],2) ([5,7],6) ([7,11,13,17,19,23,29,31],30) -} Generating a list from a span specification is like rolling a wheel as its pattern gets repeated over and over again. For each span specification w@((p:_),_) produced by eulerStep, the numbers in (fromSpan w) up to p2 are all primes too, so that eulerPrimesTo m = if m > 1 then go ([2],1) else [] where go w@((p:_), _) | m < p*p = takeWhile (<= m) (fromSpan w) | True = p : go (eulerStep w) This runs at about O(n1.5..1.8) complexity, for n primes produced, and also suffers from a severe space leak problem (IOW its memory usage is also very high). ## 5 Using Immutable Arrays ### 5.1 Generating Segments of Primes The removal of multiples on each segment of odds can be done by actually marking them in an array instead of manipulating the ordered lists, and can be further sped up more than twice by working with odds only, represented as their offsets in segment arrays: import Data.Array -- import Data.Array.Unboxed primesSA = 2 : prs where prs = 3 : sieve prs 3 [] sieve (p:ps) x fs = [i*2 + x | (i,e) <- assocs a, e] ++ sieve ps (p*p) fs' where q = (p*p-x)div2 fs' = (p,0) : [(s, rem (y-q) s) | (s,y) <- fs] -- a :: UArray Int Bool a = accumArray (\ b c -> False) True (1,q-1) [(i,()) | (s,y) <- fs, i <- [y+s, y+s+s..q]] Run on Ideone.com it is somewhat faster than Tree Merging With Wheel in producing first million primes or two, but has worse time complexity and large memory footprint which quickly gets into hundreds of MBs. When the (commented out above) explicit type signature is added, making the same code work on unboxed arrays, the resulting GHC-compiled code runs more than twice faster and with better empirical complexity, of about O(n1.10..1.05) in producing first few millions of primes, with smaller though still growing memory footprint. Fixed sized segments are usually used in segmented sieves to limit the memory usage. ### 5.2 Calculating Primes Upto a Given Value primesToNA n = 2: [i | i <- [3,5..n], ar ! i] where ar = f 5$ accumArray (\ a b -> False) True (3,n)
[(i,()) | i <- [9,15..n]]
f p a | q > n = a
| True  = if null x then a' else f (head x) a'
where q = p*p
a'= a // [(i,False) | i <- [q, q+2*p..n]]
x = [i | i <- [p+2,p+4..n], a' ! i]

### 5.3 Calculating Primes in a Given Range

primesFromToA a b = (if a<3 then [2] else [])
++ [i | i <- [o,o+2..b], ar ! i]
where
o  = max (if even a then a+1 else a) 3
r  = floor . sqrt $fromInteger b + 1 ar = accumArray (\a b-> False) True (o,b) [(i,()) | p <- [3,5..r] , let q = p*p s = 2*p (n,x) = quotRem (o - q) s q' = if o <= q then q else q + (n + signum x)*s , i <- [q',q'+s..b] ] Although using odds instead of primes, the array generation is so fast that it is very much feasible and even preferable for quick generation of some short spans of relatively big primes. ## 6 Using Mutable Arrays Using mutable arrays is the fastest but not the most memory efficient way to calculate prime numbers in Haskell. ### 6.1 Using ST Array This method implements the Sieve of Eratosthenes, similar to how you might do it in C, modified to work on odds only. It is fast, but about linear in memory consumption, allocating one (though apparently packed) sieve array for the whole sequence to process. import Control.Monad import Control.Monad.ST import Data.Array.ST import Data.Array.Unboxed sieveUA :: Int -> UArray Int Bool sieveUA top = runSTUArray$ do
let m = (top-1) div 2
r = floor . sqrt $fromIntegral top + 1 sieve <- newArray (1,m) True -- :: ST s (STUArray s Int Bool) forM_ [1..r div 2]$ \i -> do
when isPrime $do -- ((2*i+1)^2-1)div2 == 2*i*(i+1) forM_ [2*i*(i+1), 2*i*(i+2)+1..m]$ \j -> do
writeArray sieve j False
return sieve

primesToUA :: Int -> [Int]
primesToUA top = 2 : [i*2+1 | (i,True) <- assocs $sieveUA top] Its empirical time complexity is improving with n (number of primes produced) from above O(n1.20) towards O(n1.16). The reference C++ vector-based implementation exhibits this improvement in empirical time complexity too, from O(n1.5) gradually towards O(n1.12), where tested (which might be interpreted as evidence towards the expected quasilinearithmic O(nlog(n)log(logn)) time complexity). ### 6.2 Bitwise prime sieve with Template Haskell Count the number of prime below a given 'n'. Shows fast bitwise arrays, and an example of Template Haskell to defeat your enemies. {-# OPTIONS -O2 -optc-O -XBangPatterns #-} module Primes (nthPrime) where import Control.Monad.ST import Data.Array.ST import Data.Array.Base import System import Control.Monad import Data.Bits nthPrime :: Int -> Int nthPrime n = runST (sieve n) sieve n = do a <- newArray (3,n) True :: ST s (STUArray s Int Bool) let cutoff = truncate (sqrt$ fromIntegral n) + 1
go a n cutoff 3 1

go !a !m cutoff !n !c
| n >= m    = return c
| otherwise = do
if e then
if n < cutoff then
let loop !j
| j < m     = do
when x $unsafeWrite a j False loop (j+n) | otherwise = go a m cutoff (n+2) (c+1) in loop ( if n < 46340 then n * n else n shiftL 1) else go a m cutoff (n+2) (c+1) else go a m cutoff (n+2) c And place in a module: {-# OPTIONS -fth #-} import Primes main = print$( let x = nthPrime 10000000 in [| x |] )

Run as:

$ghc --make -o primes Main.hs$ time ./primes
664579
./primes  0.00s user 0.01s system 228% cpu 0.003 total

## 7 Implicit Heap

See Implicit Heap.

## 8 Prime Wheels

See Prime Wheels.