Prime numbers

From HaskellWiki
Revision as of 18:20, 6 February 2011 by WillNess (talk | contribs) (Treefold Merged Multiples, with Wheel: speed compare to ONeillPrimesTest.hs with -O2 switch on ghc 6.10.1)
Jump to: navigation, search

In mathematics, a prime number (or a prime) is a natural number which has exactly two distinct natural number divisors: 1 and itself. The smallest prime is thus 2.

Prime Number Resources

  • HackageDB packages:
    • Numbers: An assortment of number theoretic functions.
    • NumberSieves: Number Theoretic Sieves: primes, factorization, and Euler's Totient.
    • primes: Efficient, purely functional generation of prime numbers.

Finding Primes

Any natural number is representable as a product of powers of its prime factors, and so a prime has no prime divisors other than itself. That means that starting with 2, for each newly found prime we can eliminate from the rest of the numbers all such numbers that have this prime as their divisor, giving us the next available number as next prime. This is known as sieving the natural numbers, so that in the end what we are left with are just primes.

To eliminate a prime's multiples from the result we can either a. plainly test each new candidate number for divisibility by that prime with a direct test, giving rise to a kind of "trial division" algorithm; or b. we can find out multiples of a prime p by counting up from it by p numbers at a time, resulting in a variant of a "genuine sieve" as it was reportedly originally conceived by Eratosthenes in ancient Greece.

Having a direct-access mutable arrays indeed enables easy marking off of these multiples on pre-allocated array as it is usually done in imperative languages; but to get an efficient list-based code we have to be smart about combining those streams of multiples of each prime - which gives us also the memory efficiency in generating the results one by one.

Filtering To Keep the Prime Numbers In

The Classic Turner's Sieve

The following is attributed to David Turner (SASL Language Manual, 1975), generating a list of all prime numbers:

  primes = sieve [2..]
    sieve (p:xs) = p : sieve [x | x <- xs, rem x p /= 0]  
                             -- or:  filter ((/=0).(`rem`p)) xs

This should be considered only a specification, not a code. When run as is, it is extremely inefficient because it starts up the filters prematurely, immediately after each prime, instead of only after the prime's square has been reached. To be admitted as prime, each number will be tested for divisibility here by all its preceding primes, while just those not greater than its square root would suffice. This means that e.g. to find the 1001st prime (7927), 1000 filters are used, when in fact just the first 24 are needed (up to 89's filter only).

So this in effect creates a cascade of nested filters in front of the infinite numbers supply, and in extremely premature fashion at that. One way of fixing that would be to postpone the creation of filters until the right moment, by decoupling the primes supply from the numbers supply.

Postponed Filters Sieve

Here each filter's creation is postponed until the very moment it's needed.

  primes = 2: 3: sieve (tail primes) [5,7..]
    sieve (p:ps) xs = h ++ sieve ps [x | x <- t, rem x p /= 0]  
                                    -- or:  filter ((/=0).(`rem`p)) t
                      where (h,t) = span (< p*p) xs

This can be seen as the essential framework for all the code to come. It only tests odd numbers, and only by the primes that are needed, for each numbers span between successive squares of primes. To find the 1001st prime, the divisibility test is performed by only 24 nested filters corresponding to the first 24 odd primes.

Whereas Turner's sieve exhibits near O({n^2}) behavior (n referring to the number of primes produced), this one exhibits near O({n^{1.5}}) behavior, with an orders-of-magnitude speedup.

Odd numbers, by Trial Division

This is good for generating few 100,000s primes (when compiled):

  primes = 2: 3: filter isPrime [5,7..]
    isPrime n   = all (notDivs n) 
                    $ takeWhile (\p-> p*p <= n) (tail primes)
    notDivs n p = n `mod` p /= 0

Instead of relying on nested filters, it tests each odd number by an explicit list of all the needed prime factors. But for each number tested it re-fetches this list anew which will be the same for the increasing spans of numbers between the successive squares of primes.

Generated Spans, by Nested Filters

The other way to go instead of concentrating on the numbers supply, is to directly work on successive spans between the primes squares.

This version is a bit faster still, creating 158,000 primes (GHC-compiled) in the same time as the postponed filters does 100,000 primes:

primes = 2: 3: sieve (tail primes) 3 []
    notDivsBy d n     = n `mod` d /= 0
    sieve (p:ps) x ds = foldr (filter . notDivsBy) 
                                [x+2,x+4..p*p-2] ds
                        ++ sieve ps (p*p) (p:ds)

It explicitly maintains the list of primes needed for testing each odds span between successive primes squares, which it explicitly generates. But it tests with nested filters, which it repeatedly recreates.

Generated Spans, by List of Primes

So the Postponed Filters code, while running, creates a linear structure of nested filters, testing divisibility by successive primes. This implicit list of primes, each held in its filter's call frame (in Lisp terms) at run-time, can be made explicit, and used explicitly.

It is actually just the prefix of the primes list itself, of growing length. Now having been explicated, it can be used in one-test "flat" reformulation of the nested chain of tests. Algorithmically it is exactly the same - it still tests each candidate number for divisibility by all the primes not greater than its square root upto its lowest factor - but does so in a computationally different way, turning a list of (filter) functions into a function (and) on a list and inlining the span code, avoiding as much recalculations as possible:

primes    = 2: oddprimes
oddprimes = 3: sieve oddprimes 3 0
sieve (p:ps) x k 
          = [n | n <- [x+2,x+4..p*p-2]
                 , and [rem n p/=0 | p <- take k oddprimes]]
            ++ sieve ps (p*p) (k+1)

It produces about 222,000 primes in the same amount of time, and is good for creating about a million first primes, compiled.

The reason to have sieve function available separately too is that it can also be used to produce primes above a given number, as in

  primesFrom m   = sieve (length h) ps $ m`div`2*2+1 
      (h,(_:ps)) = span (<= (floor.sqrt.fromIntegral) m) primes

It can thus produce a few primes e.g. above 239812076741689, which is a square of the millionth odd prime, without having to compute all the preceding primes (which would number in trillions).

Getting the Composite Numbers Out

The divisibility testing too could be considered a specification (as in no multiples of p), and not a code per se, because although testing composites is cheap (as most of them will have small factors, so the test is soon aborted), testing prime numbers is costly (as all of the divisibility tests will get performed, and failed, for each new prime), and is to be avoided if possible.

Euler's Sieve

With each found prime Euler's sieve removes all its multiples in advance so that at each step the list to process is guaranteed to have no multiples of any of the preceding primes in it (consists only of numbers coprime with all the preceding primes) and thus starts with the next prime:

  import Data.List.Ordered (minus)

  euler xs@(p:xt) = p : euler (xt `minus` map (p*) xs)
  primes = euler [2..]

This code is extremely inefficient, running at about O({n^{2.4}}) complexity, and should be regarded a specification only. It works very hard trying to avoid double hits on multiples, which are relatively few and cheap to deal with in the first place (Turner's sieve could actually be seen as its rendition, substituting built-in  filter  for  minus - producing the same results and thus mathematically equivalent yet algorithmically different, turning a sieve into a trial division, essentially).

The situation can be improved using a different list representation, for generating lists not from a last element and an increment, but rather a last span and an increment, which entails a set of helpful equivalences:

  {- fromElt (x,i) = x : fromElt (x + i,i)
                       === iterate (+ i) x
     [n..]             === fromElt (n,1)     -}

  fromSpan (xs,i)  = xs ++ fromSpan (map (+ i) xs,i)

  {-                   === concat $ iterate (map (+ i)) xs
     [n..]             === fromSpan ([n],1)
     fromSpan (p:xt,i) === p : fromSpan (xt ++ [p + i], i)  
     fromSpan (xs,i) `minus` fromSpan (ys,i) 
                       === fromSpan (xs `minus` ys, i)  
     map (p*) (fromSpan (xs,i)) 
                       === fromSpan (map (p*) xs, p*i)
     fromSpan (xs,i)   === forall (p > 0).
       fromSpan (concat $ take p $ iterate (map (+ i)) xs, p*i) -}

  eulerS () = iterate eulerStep ([2],1)
  eulerStep (xs@(p:_), i) = 
       ( (tail . concat . take p . iterate (map (+ i))) xs
          `minus` map (p*) xs, p*i )

  {- > mapM_ print $ take 4 $ eulerS ()
     ([7,11,13,17,19,23,29,31],30)  -}

This is where the notion of wheel comes from naturally, all by itself. For each wheel specification w@((p:_),_) produced by eulerStep, the numbers in (fromSpan w) up to {p^2} are all primes too, so that

  eulerPrimesTo m = if m > 1 then go ([2],1) else []
      go w@((p:_), _) 
         | m < p*p = takeWhile (<= m) (fromSpan w)
         | True    = p : go (eulerStep w)

This runs at about O({n^{1.5}}) complexity, for n primes produced.

Postponed Multiples Removal

It is a small modification to Postponed Filters sieve to make it generate and remove the multiples instead of testing each candidate:

  primes = 2: 3: sieve (tail primes) [5,7..]  
    sieve (p:ps) xs = h ++ sieve ps (t `minus` [q,q+2*p..])
                      where q     = p*p
                            (h,t) = span (< q) xs

It is similar to Euler's sieve in that it removes multiples of each prime in advance from a numbers supply at each step before it is fed into the next step corresponding to the next prime; but will generate, and try to remove, some of the multiples more than once, like e.g. 3*15 and 5*9, unlike the true Euler's sieve which wouldn't generate the latter. So it is actually a kind of Sieve of Eratosthenes, removing primes' multiples while generating (counting) them one by one - just not the most computationally efficient one.

Multiples Removal on Generated Spans, or Sieve of Eratosthenes

Postponed multiples removal, with work done segment-wise between the successive primes squares in similarity to the Generated Spans, becomes

  primes = 2: 3: sieve (tail primes) 3 []
    sieve (p:ps) x fs = let q=p*p in
        foldr (flip minus) [x+2,x+4..q-2] 
                           [[y+s,y+2*s..q] | (s,y) <- fs]
        ++ sieve ps q 
               ((2*p,q):[(s,q-rem (q-y) s) | (s,y) <- fs])

This "marks" the odd composites in a given range by generating them - just as a person performing the original sieve of Eratosthenes would do, counting one by one the multiples of the relevant primes. These composites are independently generated so some will be generated multiple times. Rearranging the chain of subtractions into a subtraction of merged streams and using tree-like folding further speeds up the code and improves its asympthotic behaviour.

The advantage in working with spans explicitly is that this code is easily amendable to using arrays for the composites marking and removal on each finite span.

Merged Multiples Removal Sieve

The left-associating (...((xs-a)-b)-...) above buries the initial numbers supply xs (here, [5,7..] list) deeper and deeper on the left, with time. But it is the same as (xs-(a+b+...)), and so we can just remove from it the infinite primes multiples lists (all united into one without duplicates), each starting at its seeding prime's square (arriving at what is essentially equivalent to the Richard Bird's code from Melissa O'Neill's article). This way we needn't explicitly jump to a prime's square because it's where its multiples start anyway:

  import Data.List.Ordered (minus,union)

  primes = 2: 3: [5,7..] `minus` foldr union' []
                   [ [p*p,p*p+2*p..] | p<- tail primes ]
    union' (q:qs) xs = q : union qs xs

union' is discriminating on its first argument's head element to stop foldr from looping.

This code is yet faster. Its main deficiency still is that it creates a linear nested merging structure instead of a tree-like structure. Each multiple produced by a prime has to percolate to the top eventually, so it's better to make its path shorter. It'll have to go through fewer merge nodes this way.

Treefold Merged Multiples Removal

The new folding functions foldt and pairs too just have to be made discriminating on their first agrument to prevent looping, as union' above:

primes = 2: 3: 5: [7,9..] `minus` foldt
                    [ [p*p,p*p+2*p..] | p <- tail primes ]
    foldt ((x:xs):t)    = x : xs `union` foldt (pairs t)
    pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t

The simple fold structure of 1+(2+(4+(8+...))) can be further changed into (2+4)+((4+8)+( ... )) structure better adjusted for primes multiples production, giving it about 5% additional speedup.

It can be also further improved with the wheel optimization (as seen in Euler's Sieve above).

Treefold Merged Multiples, with Wheel

The odds-only feed 3: 5: [7,9..] above is exactly what fromSpan([3],2) produces. Using a larger wheel means all the multiples of the starting few primes, and not just of 2, will be automatically excluded in advance. This finally leads to:

 {-# OPTIONS_GHC -O2 -fno-cse #-}
 primes = 2:3:5:7: gaps 11 wheel (join $ roll 11 wheel primes')
   primes' = 11: gaps 13 (tail wheel) 
                                 (join $ roll 11 wheel primes')
   gaps k ws@(w:t) cs@(c:u) | k==c  = gaps (k+w) t u 
                            | True  = k : gaps (k+w) t cs  
   roll k ws@(w:t) ps@(p:u) | k==p  = scanl (\c d->c+p*d) (p*p) ws 
                                        : roll (k+w) t u 
                            | True  = roll (k+w) t ps    
   join ((x:xs): ~(ys:zs:t))  = x : union xs (union ys zs)    
                                      `union` join (pairs t)  
   pairs ((x:xs):ys:t)        = (x : union xs ys) : pairs t
   wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:

The double primes feed is introduced here to save memory as per Melissa O'Neill's code, and is dependent on no expression sharing being performed by a compiler. Running at comparable speed on as immutable arrays version, yet unlike it near constant and very low in its memory usage (low MBs for first few million primes), it could be considered a best performing "functional" (list-based) code version here.

Compared with Melissa O’Neill’s considerably more complex priority queue-based code it runs (compiled by ghc 6.10.1 with -O2 switch) about 1.6x slower at producing 1 million primes, and about 1.7x slower at 10 to 20 million, both versions having same low and near-constant memory consumption.

Using Immutable Arrays

Generating Segments of Primes

The removal of multiples on each segment of odds in the Sieve of Eratosthenes can be done by actually marking them in an array instead of manipulating the ordered lists, and can be further sped up more than twice by working with odds only, represented as their offsets in segment arrays:

  primes = 2: 3: sieve (tail primes) 3 []
    sieve (p:ps) x fs = [i*2 + x | (i,e) <- assocs a, e] 
                        ++ sieve ps (p*p) fs'
      q     = (p*p-x)`div`2                  
      fs'   = (p,0) : [(s, rem (y-q) s) | (s,y) <- fs]
      a     = accumArray (\ b c -> False) True (1,q-1)
                         [(i,()) | (s,y) <- fs, i <- [y+s,y+s+s..q]]

Apparently, arrays are fast. The above is the fastest code of all presented so far. When run on it is somewhat faster than wheeled treefold in producing first few million primes, but is very much unacceptable in its memory consumption which grows faster than O({n}), quickly getting into tens and hundreds of MBs.

Calculating Primes Upto a Given Value

  primesToN n = 2: [i | i <- [3,5..n], ar ! i]
    ar = f 5 $ accumArray (\ a b -> False) True (3,n) 
                        [(i,()) | i <- [9,15..n]]
    f p a | q > n = a
          | True  = if null x then a' else f (head x) a'
      where q = p*p
            a'= a // [(i,False) | i <- [q,q+2*p..n]]
            x = [i | i <- [p+2,p+4..n], a' ! i]

Calculating Primes in a Given Range

  primesFromTo a b = (if a<3 then [2] else []) 
                     ++ [i | i <- [o,o+2..b], ar ! i]
    o  = max (if even a then a+1 else a) 3
    r  = floor.sqrt.fromInteger $ b+1
    ar = accumArray (\a b-> False) True (o,b) 
          [(i,()) | p <- [3,5..r]
                    , let q  = p*p 
                          s  = 2*p 
                          (n,x) = quotRem (o - q) s 
                          q' = if  o <= q  then q
                               else  q + (n + signum x)*s
                    , i <- [q',q'+s..b] ]

Although using odds instead of primes, the array generation is so fast that it is very much feasible and even preferable for quick generation of some short spans of relatively big primes.

Using Mutable Arrays

Using mutable arrays is the fastest but not the most memory efficient way to calculate prime numbers in Haskell.

Using ST Array

This method implements the Sieve of Eratosthenes, similar to how you might do it in C, modified to work on odds only. It is fast, but about linear in memory consumption, allocating one (though apparently bit-packed) array for the whole sequence produced.

import Control.Monad
import Control.Monad.ST
import Data.Array.IArray
import Data.Array.MArray
import Data.Array.ST
import Data.Array.Unboxed

primesToNA :: Int -> UArray Int Bool
primesToNA n = runSTUArray sieve  where
  sieve = do
    let m = (n-1)`div`2
    a <- newArray (1,m) True :: ST s (STUArray s Int Bool)
    let sr = floor . (sqrt::Double->Double) . fromIntegral $ n+1
    forM_ [`div`2] $ \i -> do
      let ii = 2*i*(i+1)     -- == ((2*i+1)^2-1)`div`2
      si <- readArray a i
      when si $
        forM_ [ii,ii+i+i+1..m] $ \j -> writeArray a j False
    return a
primesToN :: Int -> [Int]
primesToN n = 2:[i*2+1 | (i,e) <- assocs . primesToNA $n, e]

Bitwise prime sieve with Template Haskell

Count the number of prime below a given 'n'. Shows fast bitwise arrays, and an example of Template Haskell to defeat your enemies.

{-# OPTIONS -O2 -optc-O -XBangPatterns #-}
module Primes (nthPrime) where

import Control.Monad.ST
import Data.Array.ST
import Data.Array.Base
import System
import Control.Monad
import Data.Bits

nthPrime :: Int -> Int
nthPrime n = runST (sieve n)

sieve n = do
    a <- newArray (3,n) True :: ST s (STUArray s Int Bool)
    let cutoff = truncate (sqrt $ fromIntegral n) + 1
    go a n cutoff 3 1

go !a !m cutoff !n !c
    | n >= m    = return c
    | otherwise = do
        e <- unsafeRead a n
        if e then
          if n < cutoff then
            let loop !j
                 | j < m     = do
                        x <- unsafeRead a j
                        when x $ unsafeWrite a j False
                        loop (j+n)
                 | otherwise = go a m cutoff (n+2) (c+1)
            in loop ( if n < 46340 then n * n else n `shiftL` 1)
           else go a m cutoff (n+2) (c+1)
         else go a m cutoff (n+2) c

And place in a module:

{-# OPTIONS -fth #-}
import Primes

main = print $( let x = nthPrime 10000000 in [| x |] )

Run as:

$ ghc --make -o primes Main.hs
$ time ./primes
./primes  0.00s user 0.01s system 228% cpu 0.003 total

Implicit Heap

The following is a more efficient prime generator, implementing the sieve of Eratosthenes.

See also the message threads Re: "no-coding" functional data structures via lazyness for more about how merging ordered lists amounts to creating an implicit heap and Re: Code and Perf. Data for Prime Finders for an explanation of the People a structure that makes it work when tying a knot.

data People a = VIP a (People a) | Crowd [a]

mergeP :: Ord a => People a -> People a -> People a
mergeP (VIP x xt) ys                    = VIP x $ mergeP xt ys
mergeP (Crowd xs) (Crowd ys)            = Crowd $ merge  xs ys
mergeP xs@(Crowd (x:xt)) ys@(VIP y yt)  = case compare x y of
    LT -> VIP x $ mergeP (Crowd xt) ys
    EQ -> VIP x $ mergeP (Crowd xt) yt
    GT -> VIP y $ mergeP xs yt

merge :: Ord a => [a] -> [a] -> [a]
merge xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : merge xt ys
    EQ -> x : merge xt yt
    GT -> y : merge xs yt

diff xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : diff xt ys
    EQ ->     diff xt yt
    GT ->     diff xs yt

foldTree :: (a -> a -> a) -> [a] -> a
foldTree f ~(x:xs) = x `f` foldTree f (pairs xs)
    where pairs ~(x: ~(y:ys)) = f x y : pairs ys

primes, nonprimes :: [Integer]
primes    = 2:3:diff [5,7..] nonprimes
nonprimes = serve . foldTree mergeP . map multiples $ tail primes
    multiples p = vip [p*p,p*p+2*p..]

    vip (x:xs)       = VIP x $ Crowd xs
    serve (VIP x xs) = x:serve xs
    serve (Crowd xs) = xs

nonprimes effectively implements a heap, exploiting lazy evaluation.

Prime Wheels

The idea of only testing odd numbers can be extended further. For instance, it is a useful fact that every prime number other than 2 and 3 must be of the form 6k+1 or 6k+5. Thus, we only need to test these numbers:

primes :: [Integer]
primes = 2:3:primes'
    1:p:candidates = [6*k+r | k <- [0..], r <- [1,5]]
    primes'        = p : filter isPrime candidates
    isPrime n      = all (not . divides n)
                       $ takeWhile (\p -> p*p <= n) primes'
    divides n p    = n `mod` p == 0

Here, primes' is the list of primes greater than 3 and isPrime does not test for divisibility by 2 or 3 because the candidates by construction don't have these numbers as factors. We also need to exclude 1 from the candidates and mark the next one as prime to start the recursion.

Such a scheme to generate candidate numbers first that avoid a given set of primes as divisors is called a prime wheel. Imagine that you had a wheel of circumference 6 to be rolled along the number line. With spikes positioned 1 and 5 units around the circumference, rolling the wheel will prick holes exactly in those positions on the line whose numbers are not divisible by 2 and 3.

A wheel can be represented by its circumference and the spiked positions.

data Wheel = Wheel Integer [Integer]

We prick out numbers by rolling the wheel.

roll (Wheel n rs) = [n*k+r | k <- [0..], r <- rs]

The smallest wheel is the unit wheel with one spike, it will prick out every number.

w0 = Wheel 1 [1]

We can create a larger wheel by rolling a smaller wheel of circumference n along a rim of circumference p*n while excluding spike positions at multiples of p.

nextSize (Wheel n rs) p =
  Wheel (p*n) [r' | k <- [0..(p-1)], r <- rs,
                      let r' = n*k+r, r' `mod` p /= 0]

Combining both, we can make wheels that prick out numbers that avoid a given list ds of divisors.

mkWheel ds = foldl nextSize w0 ds

Now, we can generate prime numbers with a wheel that for instance avoids all multiples of 2, 3, 5 and 7.

primes :: [Integer]
primes = small ++ large
    1:p:candidates = roll $ mkWheel small
    small          = [2,3,5,7]
    large          = p : filter isPrime candidates
    isPrime n      = all (not . divides n) 
                       $ takeWhile (\p -> p*p <= n) large
    divides n p    = n `mod` p == 0

It's a pretty big wheel with a circumference of 210 and allows us to calculate the first 10000 primes in convenient time.

A fixed size wheel is fine, but how about adapting the wheel size while generating prime numbers? See Euler's Sieve above, or the functional pearl titled Lazy wheel sieves and spirals of primes for more.

Using IntSet for a traditional sieve

module Sieve where
import qualified Data.IntSet as I

-- findNext - finds the next member of an IntSet.
findNext c is | I.member c is = c
              | c > I.findMax is = error "Ooops. No next number in set."
              | otherwise = findNext (c+1) is

-- mark - delete all multiples of n from n*n to the end of the set
mark n is = is I.\\ (I.fromAscList (takeWhile (<=end) (map (n*) [n..])))
                    end = I.findMax is

-- primes - gives all primes up to n 
primes n = worker 2 (I.fromAscList [2..n])
                    worker x is 
                     | (x*x) > n = is
                     | otherwise = worker (findNext (x+1) is) (mark x is)

Testing Primality

See Testing primality.

External links

A collection of prime generators; the file "ONeillPrimes.hs" contains one of the fastest pure-Haskell prime generators; code by Melissa O'Neill.
WARNING: Don't use the priority queue from that file for your projects: it's broken and works for primes only by a lucky chance.
test entries for (some of) the above code variants