Difference between revisions of "Prime numbers"
m |
(restructuring the article a bit, adding new treefold merge section; arrays) |
||
Line 1: | Line 1: | ||
== Prime Number Resources == |
== Prime Number Resources == |
||
− | In mathematics, a prime number (or a prime) is a natural number which has exactly two distinct natural number divisors: 1 and itself. The smallest prime is thus 2. |
+ | In mathematics, a <i>prime number</i> (or a <i>prime</i>) is a natural number which has exactly two distinct natural number divisors: 1 and itself. The smallest prime is thus 2. |
[http://www.haskell.org/haskellwiki/Prime_numbers Prime Numbers] at Wikipedia. |
[http://www.haskell.org/haskellwiki/Prime_numbers Prime Numbers] at Wikipedia. |
||
Line 16: | Line 16: | ||
== Finding Primes == |
== Finding Primes == |
||
− | Any natural number is representable as a product of powers of its prime factors, and so a prime has no prime divisors other then itself. That means that starting with 2, <i>for each</i> newly found <i>prime</i> we can <i>eliminate</i> from the rest of the numbers <i>all such numbers</i> that have this prime as their divisor, giving us the <i>next available</i> number as next prime. This is known as <i>sieving</i> the natural numbers, so that in the end what we are left with are just primes. |
+ | Any natural number is representable as a product of powers of its prime factors, and so a prime has no prime divisors other then itself. That means that starting with 2, <i>for each</i> newly found <i>prime</i> we can <i>eliminate</i> from the rest of the numbers <i>all such numbers</i> that have this prime as their divisor, giving us the <i>next available</i> number as next prime. This is known as <i>sieving</i> the natural numbers, so that in the end what we are left with are just primes. |
=== The Classic Simple Primes Sieve === |
=== The Classic Simple Primes Sieve === |
||
− | Attributed to David Turner <i>(SASL Language Manual, 1975)</i>, the following is a direct translation of that idea, generating a list of all |
+ | Attributed to David Turner <i>(SASL Language Manual, 1975)</i>, the following is a direct translation of that idea, generating a list of all prime numbers: |
<haskell> |
<haskell> |
||
Line 30: | Line 30: | ||
</haskell> |
</haskell> |
||
− | This should only be considered a <i>specification</i>, not a code. When run as is, it is <i>extremely inefficient</i> because it starts up the filters prematurely, immediately after each prime, instead of only after the prime's square has been reached. To be admitted as prime, <i>each number</i> will be <i>tested for divisibility</i> here by all its preceding primes, while just those not greater than its square root would suffice. This means that e.g. to find the <b>1001</b>st prime (<code>7927</code>), <b>1000</b> filters are used, when in fact just the first <b>24</b> are needed (upto <code>89</code>'s filter only). |
+ | This should only be considered a <i>specification</i>, not a <i>code</i>. When run as is, it is <i>extremely inefficient</i> because it starts up the filters prematurely, immediately after each prime, instead of only after the prime's square has been reached. To be admitted as prime, <i>each number</i> will be <i>tested for divisibility</i> here by all its preceding primes, while just those not greater than its square root would suffice. This means that e.g. to find the <b>1001</b>st prime (<code>7927</code>), <b>1000</b> filters are used, when in fact just the first <b>24</b> are needed (upto <code>89</code>'s filter only). |
− | So this |
+ | So this in effect creates a cascade of nested filters in front of the infinite numbers supply, and in <i>extremely premature</i> fashion at that. One way of fixing that would be to <i>postpone</i> the creation of filters until the right moment, as in |
− | + | === Postponed Filters Sieve === |
|
<haskell> |
<haskell> |
||
primes :: [Integer] |
primes :: [Integer] |
||
Line 44: | Line 44: | ||
</haskell> |
</haskell> |
||
− | This only tests odd numbers, and |
+ | This can be seen as an essential framework for all the code to come. It only tests odd numbers, and only by the primes that are needed, for each <i>numbers span</i> between successive squares of primes. To find the <b>1001</b>st prime, the divisibility test is performed by only <b>24</b> nested filters corresponding to the first <b>24</b> odd primes. |
− | Whereas the first version exhibits near O(<math>{n^2}</math>) behavior, this one exhibits near O(<math>{n^{1.5}}</math>) behavior, |
+ | Whereas the first version exhibits near O(<math>{n^2}</math>) behavior, this one exhibits near O(<math>{n^{1.5}}</math>) behavior, with an orders-of-magnitude speedup. |
+ | |||
+ | There is another way for the composites to be found - by generating all the multiples of successive primes, in advance. Any number thus generated will of course be divisible by the corresponding prime. |
||
==== Postponed Multiples Removal i.e. Euler's Sieve ==== |
==== Postponed Multiples Removal i.e. Euler's Sieve ==== |
||
− | Instead of testing each number for divisibility by a prime we can just <i>remove</i> the prime's <i>multiples</i> in advance. We gain in |
+ | Instead of testing <i>each number</i> for divisibility by a prime we can just <i>remove</i> the prime's <i>multiples</i> in advance. We gain in speed because we now get the primes <i>for free</i>, after all the multiples are removed on a particular span, <i>without</i> performing any divisibility tests <i>at all</i>: |
<haskell> |
<haskell> |
||
Line 56: | Line 58: | ||
where |
where |
||
sieve (p:ps) xs = h ++ sieve ps (t `minus` [q+2*p,q+4*p..]) |
sieve (p:ps) xs = h ++ sieve ps (t `minus` [q+2*p,q+4*p..]) |
||
⚫ | |||
where (h,~(_:t)) = span (< q) xs |
where (h,~(_:t)) = span (< q) xs |
||
q = p*p |
q = p*p |
||
Line 69: | Line 70: | ||
This is, in fact, Euler's sieve. |
This is, in fact, Euler's sieve. |
||
+ | |||
+ | ==== Merged Multiples Removal Sieve ==== |
||
+ | <code>(...((s-a)-b)-...)</code> is the same as <code>(s-(a+b+...))</code>, and so we can just remove the merged infinite primes multiples, each starting at its prime's square, from the intial numbers supply. This way we won't even need to explicitly jump to the prime's square because it's where the multiples start anyway: |
||
+ | <haskell> |
||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
+ | </haskell> |
||
+ | |||
+ | This code is yet faster. Its main deficiency still is that it creates a linear nested merging structure, instead of a tree-like structure. Each multiple produced by a primes has to percolate to the top eventually, so it's better to make its path shorter. It'll have to go through fewer merge nodes this way. |
||
+ | |||
+ | The linearity is imposed by type asymmetry of our <code>(merge' :: a -> b -> b)</code> function, forcing us into the pattern 1+(1+(1+(1+...))), <code>+</code> standing for <code>merge'</code>. |
||
+ | |||
+ | We need to turn it into an associative operation of uniform type <code>(:: a -> a -> a)</code> to be able to freely rearrange the combinations into arbitrary tree-like patterns, as in ((1+1)+(2+2))+(...) etc. The type uniformity is what makes compositionality possible. |
||
+ | |||
⚫ | |||
+ | |||
+ | ==== Treefold Merged Multiples Removal ==== |
||
+ | <haskell> |
||
+ | primes :: [Integer] |
||
+ | primes = 2:[3,5] ++ drop 2 [3,5..] `minus` comps |
||
+ | where |
||
+ | mults = map (\p-> let q=p*p in ([q],tail [q,q+2*p..])) $ primes |
||
+ | comps = fst $ tfold mergeSP (pairwise mergeSP mults) |
||
+ | |||
+ | pairwise f (x:y:ys) = f x y:pairwise f ys |
||
+ | |||
+ | tfold f (a: ~(b: ~(c:xs))) |
||
+ | = (a `f` (b `f` c)) `f` tfold f (pairwise f xs) |
||
+ | |||
+ | mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c |
||
⚫ | |||
+ | where |
||
+ | spMerge :: (Ord a) => [a] -> [a] -> ([a],[a]) |
||
+ | spMerge x@(a:as) y@(b:bs) = case compare a b of |
||
+ | LT -> (a:c,d) where (c,d) = spMerge as y |
||
+ | EQ -> (a:c,d) where (c,d) = spMerge as bs |
||
+ | GT -> (b:c,d) where (c,d) = spMerge x bs |
||
+ | spMerge a [] = ([] ,a) |
||
+ | spMerge [] b = ([] ,b) |
||
+ | </haskell> |
||
+ | The fold used here creates a <code>(2+(2+2))+( (4+(4+4)) + ( (8+(8+8)) + ... ))</code> structure, better adjusted for primes multiples production than <code>1+(2+(4+(8+...)))</code>, used by the"Implicit Heap" code, giving it additional 10% speedup. |
||
+ | |||
+ | This code can be further improved by using the wheel (as described below). |
||
=== More Filtering Sieves === |
=== More Filtering Sieves === |
||
Line 88: | Line 135: | ||
==== Generated Spans, by Nested Filters ==== |
==== Generated Spans, by Nested Filters ==== |
||
+ | The other way to go instead of concentrating on the numbers supply, is to directly work on the successive spans between the primes squares. |
||
+ | |||
This version is a bit faster still, creating <i>158,000 primes</i> (again, GHC-compiled) in the same time as the postponed filters does 100,000 primes: |
This version is a bit faster still, creating <i>158,000 primes</i> (again, GHC-compiled) in the same time as the postponed filters does 100,000 primes: |
||
Line 125: | Line 174: | ||
It can thus produce a few primes e.g. above <code>239812076741689</code>, which is a square of the millionth odd prime, without having to compute all the preceding primes (which would number in trillions). |
It can thus produce a few primes e.g. above <code>239812076741689</code>, which is a square of the millionth odd prime, without having to compute all the preceding primes (which would number in trillions). |
||
− | === Sieve of Eratosthenes === |
+ | ==== Multiples Removal on Generated Spans, or Sieve of Eratosthenes ==== |
The divisibility testing should also be considered a specification (as in "no multiples of p"), and not a code per se, because although testing composites is cheap (as most of them will have small factors, so the test is soon aborted), testing prime numbers is costly, and is to be avoided. |
The divisibility testing should also be considered a specification (as in "no multiples of p"), and not a code per se, because although testing composites is cheap (as most of them will have small factors, so the test is soon aborted), testing prime numbers is costly, and is to be avoided. |
||
Line 156: | Line 205: | ||
Compared with the previous version, if compiled with GHC, it is steadily 1.2x slower and takes 3.5x more memory. But interpreted under both GHCi and WinHugs, it runs <i>faster</i>, takes <i>less</i> memory, and has better asymptotic behavior. |
Compared with the previous version, if compiled with GHC, it is steadily 1.2x slower and takes 3.5x more memory. But interpreted under both GHCi and WinHugs, it runs <i>faster</i>, takes <i>less</i> memory, and has better asymptotic behavior. |
||
− | + | === Using Arrays === |
|
+ | ==== Multiples Removal on Generated Spans, With Arrays ==== |
||
The removal of multiples on each segment of odds can be done with arrays, instead of "minus" and "merge": |
The removal of multiples on each segment of odds can be done with arrays, instead of "minus" and "merge": |
||
<haskell> |
<haskell> |
||
Line 175: | Line 225: | ||
Apparently, arrays are <i>very</i> fast. |
Apparently, arrays are <i>very</i> fast. |
||
− | ==== |
+ | ==== Calculating Primes Upto a Given Value ==== |
− | The explicit partitioning into spans between the successive primes squares in both the Sieve of Eratosthenes and Euler's Sieve versions above is actually superfluous and is implicitly achieved when looking for gaps in the directly merged infinite primes multiples streams (as if turning <code>(...((s-a)-b)-...)</code> into <code>(s-(a+b+...))</code> in the Euler's Sieve definition): |
||
<haskell> |
<haskell> |
||
+ | primesToN n = ps |
||
⚫ | |||
+ | where |
||
⚫ | |||
+ | ar = f 5 $ accumArray (\a b->False) True (3,n) [(i,())|i<-[9,15..n]] |
||
⚫ | |||
+ | f p a | q>=n = a |
||
⚫ | |||
+ | | True = if null x then a' else f (head x) a' |
||
− | comps = foldr merge' [] mults |
||
+ | where q=p*p |
||
⚫ | |||
+ | a'=a//[(i,False)|i<-[q,q+p..n]] |
||
+ | x=[i|i<-[p+2,p+4..n], a' !i] |
||
+ | ps = 2:[i|i<-[3,5..n], ar!i] |
||
</haskell> |
</haskell> |
||
+ | ==== Calculating Primes in a Given Range ==== |
||
− | This code is the fastest of all the simple sieves presented so far. Its main deficiency still is that it creates a linear nested merging structure, instead of a tree-like structure. |
||
+ | <haskell> |
||
− | This is because we create the multiples of each prime with one first element separated out, so that we're able to prevent the run-ahead when the multiples lists are being merged. We do that by treating them in an unsymmetrical manner, pulling the first element on the left, first. But that is precisely what prevents us from being able to rearrange them in arbitrary tree-like patterns. It's like we're stuck in the 1+(1+(1+(1+...))) pattern, unable to use e.g. ((1+1)+(2+2))+(...) etc. The type of the (+) in the first one is (a->b->b), and the type in the second one is (a->a->a). The type uniformity is what makes compositionality possible, and we need that to arrage them in a tree-like structure. |
||
+ | primesFromTo a b = ps |
||
− | |||
+ | where |
||
⚫ | |||
+ | a' = max (if even a then a+1 else a) 3 |
||
+ | n = (1+).floor.sqrt.fromInteger $ b |
||
+ | ar = accumArray (\a b->False) True (a',b) |
||
+ | [(i,()) | p<-[3,5..n], i<- dropWhile (<a') [p*p,p*p+2*p..b] ] |
||
+ | ps = (if a<3 then [2] else []) |
||
+ | ++ [i|i<-[a',a'+2..b], ar!i] |
||
+ | </haskell> |
||
=== Implicit Heap === |
=== Implicit Heap === |
Revision as of 14:31, 25 December 2009
Prime Number Resources
In mathematics, a prime number (or a prime) is a natural number which has exactly two distinct natural number divisors: 1 and itself. The smallest prime is thus 2.
Prime Numbers at Wikipedia.
Sieve of Eratosthenes at Wikipedia.
HackageDB packages:
Numbers: An assortment of number theoretic functions.
NumberSieves: Number Theoretic Sieves: primes, factorization, and Euler's Totient.
primes: Efficient, purely functional generation of prime numbers.
Finding Primes
Any natural number is representable as a product of powers of its prime factors, and so a prime has no prime divisors other then itself. That means that starting with 2, for each newly found prime we can eliminate from the rest of the numbers all such numbers that have this prime as their divisor, giving us the next available number as next prime. This is known as sieving the natural numbers, so that in the end what we are left with are just primes.
The Classic Simple Primes Sieve
Attributed to David Turner (SASL Language Manual, 1975), the following is a direct translation of that idea, generating a list of all prime numbers:
primes :: [Integer]
primes = sieve [2..]
where
sieve (p:xs) = p : sieve [x | x<-xs, x `mod` p /= 0]
-- or: filter ((/=0).(`mod`p)) xs
This should only be considered a specification, not a code. When run as is, it is extremely inefficient because it starts up the filters prematurely, immediately after each prime, instead of only after the prime's square has been reached. To be admitted as prime, each number will be tested for divisibility here by all its preceding primes, while just those not greater than its square root would suffice. This means that e.g. to find the 1001st prime (7927
), 1000 filters are used, when in fact just the first 24 are needed (upto 89
's filter only).
So this in effect creates a cascade of nested filters in front of the infinite numbers supply, and in extremely premature fashion at that. One way of fixing that would be to postpone the creation of filters until the right moment, as in
Postponed Filters Sieve
primes :: [Integer]
primes = 2: 3: sieve (tail primes) [5,7..]
where
sieve (p:ps) xs = h ++ sieve ps [x | x<-t, x `rem` p /= 0]
-- or: filter ((/=0).(`rem`p)) t
where (h,~(_:t)) = span (< p*p) xs
This can be seen as an essential framework for all the code to come. It only tests odd numbers, and only by the primes that are needed, for each numbers span between successive squares of primes. To find the 1001st prime, the divisibility test is performed by only 24 nested filters corresponding to the first 24 odd primes.
Whereas the first version exhibits near O() behavior, this one exhibits near O() behavior, with an orders-of-magnitude speedup.
There is another way for the composites to be found - by generating all the multiples of successive primes, in advance. Any number thus generated will of course be divisible by the corresponding prime.
Postponed Multiples Removal i.e. Euler's Sieve
Instead of testing each number for divisibility by a prime we can just remove the prime's multiples in advance. We gain in speed because we now get the primes for free, after all the multiples are removed on a particular span, without performing any divisibility tests at all:
primes :: [Integer]
primes = 2: 3: sieve (tail primes) [5,7..]
where
sieve (p:ps) xs = h ++ sieve ps (t `minus` [q+2*p,q+4*p..])
where (h,~(_:t)) = span (< q) xs
q = p*p
minus :: (Ord a) => [a] -> [a] -> [a]
minus a@(x:xs) b@(y:ys) = case compare x y of
LT -> x: xs `minus` b
EQ -> xs `minus` ys
GT -> a `minus` ys
This is, in fact, Euler's sieve.
Merged Multiples Removal Sieve
(...((s-a)-b)-...)
is the same as (s-(a+b+...))
, and so we can just remove the merged infinite primes multiples, each starting at its prime's square, from the intial numbers supply. This way we won't even need to explicitly jump to the prime's square because it's where the multiples start anyway:
primes :: [Integer]
primes = [2,3] ++ [5,7..] `minus` foldr merge' [] mults
where
mults = map (\p->let q=p*p in (q,tail[q,q+2*p..])) $ tail primes
merge' (q,qs) xs = q : merge qs xs
This code is yet faster. Its main deficiency still is that it creates a linear nested merging structure, instead of a tree-like structure. Each multiple produced by a primes has to percolate to the top eventually, so it's better to make its path shorter. It'll have to go through fewer merge nodes this way.
The linearity is imposed by type asymmetry of our (merge' :: a -> b -> b)
function, forcing us into the pattern 1+(1+(1+(1+...))), +
standing for merge'
.
We need to turn it into an associative operation of uniform type (:: a -> a -> a)
to be able to freely rearrange the combinations into arbitrary tree-like patterns, as in ((1+1)+(2+2))+(...) etc. The type uniformity is what makes compositionality possible.
The code in the "Implicit Heap" section below improves on that, and is essentially equivalent to using a treefold instead of a standard linear foldr:
Treefold Merged Multiples Removal
primes :: [Integer]
primes = 2:[3,5] ++ drop 2 [3,5..] `minus` comps
where
mults = map (\p-> let q=p*p in ([q],tail [q,q+2*p..])) $ primes
comps = fst $ tfold mergeSP (pairwise mergeSP mults)
pairwise f (x:y:ys) = f x y:pairwise f ys
tfold f (a: ~(b: ~(c:xs)))
= (a `f` (b `f` c)) `f` tfold f (pairwise f xs)
mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c
in (a ++ bc, merge b' d)
where
spMerge :: (Ord a) => [a] -> [a] -> ([a],[a])
spMerge x@(a:as) y@(b:bs) = case compare a b of
LT -> (a:c,d) where (c,d) = spMerge as y
EQ -> (a:c,d) where (c,d) = spMerge as bs
GT -> (b:c,d) where (c,d) = spMerge x bs
spMerge a [] = ([] ,a)
spMerge [] b = ([] ,b)
The fold used here creates a (2+(2+2))+( (4+(4+4)) + ( (8+(8+8)) + ... ))
structure, better adjusted for primes multiples production than 1+(2+(4+(8+...)))
, used by the"Implicit Heap" code, giving it additional 10% speedup.
This code can be further improved by using the wheel (as described below).
More Filtering Sieves
The primes list creation with divisibility testing can be reformulated in a few more ways, using the list of primes as it is being built (a la "circular programming").
Odd numbers, by Trial Division
This is also good for generating a few 100,000s primes (when GHC-compiled as well):
primes :: [Integer]
primes = 2: 3: filter isPrime [5,7..]
where
isPrime n = all (notDivs n) $ takeWhile (\p-> p*p <= n) (tail primes)
notDivs n p = n `mod` p /= 0
Instead of relying on nested filters, it tests each odd number by an explicit list of all the needed prime factors. But for each number tested it re-fetches this list anew which will be the same for the increasing spans of numbers between the successive squares of primes.
Generated Spans, by Nested Filters
The other way to go instead of concentrating on the numbers supply, is to directly work on the successive spans between the primes squares.
This version is a bit faster still, creating 158,000 primes (again, GHC-compiled) in the same time as the postponed filters does 100,000 primes:
primes :: [Integer]
primes = 2: 3: sieve [] (tail primes) 5
where
notDivsBy d n = n `mod` d /= 0
sieve ds (p:ps) x = foldr (filter . notDivsBy) [x, x+2..p*p-2] ds
++ sieve (p:ds) ps (p*p+2)
This one explicitly maintains the list of primes needed for testing each odds span between successive primes squares, which it also explicitly generates. But it tests with nested filter
s, which it repeatedly recreates.
Generated Spans, by List of Primes
The list of primes needed to test each range of odds is actually just the prefix of the primes list itself, of known length, and need not be specifically generated at all. Combined with one-call testing by the explicit list of primes, and direct generation of odds between the successive primes squares, this leads to:
primes :: [Integer]
primes = 2: 3: sieve 0 (tail primes) 5
sieve k (p:ps) x = [x | x<-[x,x+2..p*p-2], and [x`rem`p/=0 | p<-fs]]
-- or: all ((>0).(x`rem`)) fs
++ sieve (k+1) ps (p*p+2)
where fs = take k (tail primes)
It produces about 222,000 primes in the same amount of time, and is good for creating about a million first primes, compiled.
The reason to have sieve
function available separately too is that it can also be used to produce primes above a given number, as in
primesFrom m = sieve (length h) ps $ m`div`2*2+1
where
(h,(_:ps)) = span (<= (floor.sqrt.fromIntegral) m) primes
It can thus produce a few primes e.g. above 239812076741689
, which is a square of the millionth odd prime, without having to compute all the preceding primes (which would number in trillions).
Multiples Removal on Generated Spans, or Sieve of Eratosthenes
The divisibility testing should also be considered a specification (as in "no multiples of p"), and not a code per se, because although testing composites is cheap (as most of them will have small factors, so the test is soon aborted), testing prime numbers is costly, and is to be avoided.
All the filtering versions thus far try to keep the primes among all numbers by testing each number in isolation. Instead, all the relevant primes' multiples can be removed from the corresponding segments of odds, and what's left after that will be just primes:
primes :: [Integer]
primes = 2: 3: sieve [] (tail primes) 5
where
sieve fs (p:ps) x = [x,x+2..q-2] `minus` foldl merge [] mults
++ sieve ((2*p,q+2*p):fs') ps (q+2)
where
q = p*p
(mults,fs') = un_zip $ map mark fs
mark (s,y) = (h,(s,y'))
where (h,~(y':_)) = span(<q)[y,y+s..]
merge :: (Ord a) => [a] -> [a] -> [a]
merge a@(x:xs) b@(y:ys) = case compare x y of
LT -> x: merge xs b
EQ -> x: merge xs ys
GT -> y: merge a ys
merge a b = if null a then b else a
This modifies the preceding sieve to "mark" the odd composites in a given range (instead of testing their divisibility) by generating them - just as a person performing the original sieve of Eratosthenes would do, marking one by one the multiples of the relevant primes.
Compared with the previous version, if compiled with GHC, it is steadily 1.2x slower and takes 3.5x more memory. But interpreted under both GHCi and WinHugs, it runs faster, takes less memory, and has better asymptotic behavior.
Using Arrays
Multiples Removal on Generated Spans, With Arrays
The removal of multiples on each segment of odds can be done with arrays, instead of "minus" and "merge":
primes :: [Integer]
primes = 2: 3: sieve [] (tail primes) 5
where
sieve fs (p:ps) x = [i | i<-[x,x+2..q-2], a!i]
++ sieve ((2*p,q+2*p):fs') ps (q+2)
where
q = p*p
(mults,fs') = un_zip (map mark fs)
mark (s,x) = (h,(s,x'))
where (h,~(x':_)) = span (<q) [x,x+s..]
a = accumArray (\a b->False) True (x,q-2)
[(i,()) | m<-mults, i<-m]
Apparently, arrays are very fast.
Calculating Primes Upto a Given Value
primesToN n = ps
where
ar = f 5 $ accumArray (\a b->False) True (3,n) [(i,())|i<-[9,15..n]]
f p a | q>=n = a
| True = if null x then a' else f (head x) a'
where q=p*p
a'=a//[(i,False)|i<-[q,q+p..n]]
x=[i|i<-[p+2,p+4..n], a' !i]
ps = 2:[i|i<-[3,5..n], ar!i]
Calculating Primes in a Given Range
primesFromTo a b = ps
where
a' = max (if even a then a+1 else a) 3
n = (1+).floor.sqrt.fromInteger $ b
ar = accumArray (\a b->False) True (a',b)
[(i,()) | p<-[3,5..n], i<- dropWhile (<a') [p*p,p*p+2*p..b] ]
ps = (if a<3 then [2] else [])
++ [i|i<-[a',a'+2..b], ar!i]
Implicit Heap
The following is a more efficient prime generator, implementing the sieve of Eratosthenes.
See also the message threads Re: "no-coding" functional data structures via lazyness for more about how merging ordered lists amounts to creating an implicit heap and Re: Code and Perf. Data for Prime Finders for an explanation of the People a
structure that makes it work when tying a knot.
data People a = VIP a (People a) | Crowd [a]
mergeP :: Ord a => People a -> People a -> People a
mergeP (VIP x xt) ys = VIP x $ mergeP xt ys
mergeP (Crowd xs) (Crowd ys) = Crowd $ merge xs ys
mergeP xs@(Crowd (x:xt)) ys@(VIP y yt) = case compare x y of
LT -> VIP x $ mergeP (Crowd xt) ys
EQ -> VIP x $ mergeP (Crowd xt) yt
GT -> VIP y $ mergeP xs yt
merge :: Ord a => [a] -> [a] -> [a]
merge xs@(x:xt) ys@(y:yt) = case compare x y of
LT -> x : merge xt ys
EQ -> x : merge xt yt
GT -> y : merge xs yt
diff xs@(x:xt) ys@(y:yt) = case compare x y of
LT -> x : diff xt ys
EQ -> diff xt yt
GT -> diff xs yt
foldTree :: (a -> a -> a) -> [a] -> a
foldTree f ~(x:xs) = f x $ foldTree f $ pairs xs
where pairs ~(x: ~(y:ys)) = f x y : pairs ys
primes, nonprimes :: [Integer]
primes = 2:3:diff [5,7..] nonprimes
nonprimes = serve . foldTree mergeP . map multiples $ tail primes
where
multiples p = vip [p*p,p*p+2*p..]
vip (x:xs) = VIP x $ Crowd xs
serve (VIP x xs) = x:serve xs
serve (Crowd xs) = xs
nonprimes
effectively implements a heap, exploiting lazy evaluation.
Prime Wheels
The idea of only testing odd numbers can be extended further. For instance, it is a useful fact that every prime number other than 2 and 3 must be of the form or . Thus, we only need to test these numbers:
primes :: [Integer]
primes = 2:3:primes'
where
1:p:candidates = [6*k+r | k <- [0..], r <- [1,5]]
primes' = p : filter isPrime candidates
isPrime n = all (not . divides n) $ takeWhile (\p -> p*p <= n) primes'
divides n p = n `mod` p == 0
Here, primes'
is the list of primes greater than 3 and isPrime
does not test for divisibility by 2 or 3 because the candidates
by construction don't have these numbers as factors. We also need to exclude 1 from the candidates and mark the next one as prime to start the recursion.
Such a scheme to generate candidate numbers first that avoid a given set of primes as divisors is called a prime wheel. Imagine that you had a wheel of circumference 6 to be rolled along the number line. With spikes positioned 1 and 5 units around the circumference, rolling the wheel will prick holes exactly in those positions on the line whose numbers are not divisible by 2 and 3.
A wheel can be represented by its circumference and the spiked positions.
data Wheel = Wheel Integer [Integer]
We prick out numbers by rolling the wheel.
roll (Wheel n rs) = [n*k+r | k <- [0..], r <- rs]
The smallest wheel is the unit wheel with one spike, it will prick out every number.
w0 = Wheel 1 [1]
We can create a larger wheel by rolling a smaller wheel of circumference n
along a rim of circumference p*n
while excluding spike positions at multiples of p
.
nextSize (Wheel n rs) p =
Wheel (p*n) [r' | k <- [0..(p-1)], r <- rs, let r' = n*k+r, r' `mod` p /= 0]
Combining both, we can make wheels that prick out numbers that avoid a given list ds
of divisors.
mkWheel ds = foldl nextSize w0 ds
Now, we can generate prime numbers with a wheel that for instance avoids all multiples of 2, 3, 5 and 7.
primes :: [Integer]
primes = small ++ large
where
1:p:candidates = roll $ mkWheel small
small = [2,3,5,7]
large = p : filter isPrime candidates
isPrime n = all (not . divides n) $ takeWhile (\p -> p*p <= n) large
divides n p = n `mod` p == 0
It's a pretty big wheel with a circumference of 210 and allows us to calculate the first 10000 primes in convenient time.
A fixed size wheel is fine, but how about adapting the wheel size while generating prime numbers? See the functional pearl titled Lazy wheel sieves and spirals of primes for more.
Bitwise prime sieve
Count the number of prime below a given 'n'. Shows fast bitwise arrays, and an example of Template Haskell to defeat your enemies.
{-# OPTIONS -O2 -optc-O -XBangPatterns #-}
module Primes (nthPrime) where
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Base
import System
import Control.Monad
import Data.Bits
nthPrime :: Int -> Int
nthPrime n = runST (sieve n)
sieve n = do
a <- newArray (3,n) True :: ST s (STUArray s Int Bool)
let cutoff = truncate (sqrt $ fromIntegral n) + 1
go a n cutoff 3 1
go !a !m cutoff !n !c
| n >= m = return c
| otherwise = do
e <- unsafeRead a n
if e then
if n < cutoff then
let loop !j
| j < m = do
x <- unsafeRead a j
when x $ unsafeWrite a j False
loop (j+n)
| otherwise = go a m cutoff (n+2) (c+1)
in loop ( if n < 46340 then n * n else n `shiftL` 1)
else go a m cutoff (n+2) (c+1)
else go a m cutoff (n+2) c
And places in a module:
{-# OPTIONS -fth #-}
import Primes
main = print $( let x = nthPrime 10000000 in [| x |] )
Run as:
$ ghc --make -o primes Main.hs
$ time ./primes
664579
./primes 0.00s user 0.01s system 228% cpu 0.003 total
Using IntSet for a traditional sieve
module Sieve where
import qualified Data.IntSet as I
-- findNext - finds the next member of an IntSet.
findNext c is | I.member c is = c
| c > I.findMax is = error "Ooops. No next number in set."
| otherwise = findNext (c+1) is
-- mark - delete all multiples of n from n*n to the end of the set
mark n is = is I.\\ (I.fromAscList (takeWhile (<=end) (map (n*) [n..])))
where
end = I.findMax is
-- primes - gives all primes up to n
primes n = worker 2 (I.fromAscList [2..n])
where
worker x is
| (x*x) > n = is
| otherwise = worker (findNext (x+1) is) (mark x is)
Testing Primality
Primality Test and Integer Factorization
Given an infinite list of prime numbers, we can implement primality tests and integer factorization:
isPrime n = n > 1 && n == head (primeFactors n)
primeFactors 1 = []
primeFactors n = go n primes
where
go n ps@(p:pt)
| p*p > n = [n]
| n `rem` p == 0 = p : go (n `quot` p) ps
| otherwise = go n pt
Miller-Rabin Primality Test
find2km :: Integral a => a -> (a,a)
find2km n = f 0 n
where
f k m
| r == 1 = (k,m)
| otherwise = f (k+1) q
where (q,r) = quotRem m 2
millerRabinPrimality :: Integer -> Integer -> Bool
millerRabinPrimality n a
| a <= 1 || a >= n-1 =
error $ "millerRabinPrimality: a out of range ("
++ show a ++ " for "++ show n ++ ")"
| n < 2 = False
| even n = False
| b0 == 1 || b0 == n' = True
| otherwise = iter (tail b)
where
n' = n-1
(k,m) = find2km n'
b0 = powMod n a m
b = take (fromIntegral k) $ iterate (squareMod n) b0
iter [] = False
iter (x:xs)
| x == 1 = False
| x == n' = True
| otherwise = iter xs
pow' :: (Num a, Integral b) => (a -> a -> a) -> (a -> a) -> a -> b -> a
pow' _ _ _ 0 = 1
pow' mul sq x' n' = f x' n' 1
where
f x n y
| n == 1 = x `mul` y
| r == 0 = f x2 q y
| otherwise = f x2 q (x `mul` y)
where
(q,r) = quotRem n 2
x2 = sq x
mulMod :: Integral a => a -> a -> a -> a
mulMod a b c = (b * c) `mod` a
squareMod :: Integral a => a -> a -> a
squareMod a b = (b * b) `rem` a
powMod :: Integral a => a -> a -> a -> a
powMod m = pow' (mulMod m) (squareMod m)
External links
- A collection of prime generators; the file "ONeillPrimes.hs" contains one of the fastest pure-Haskell prime generators. WARNING: Don't use the priority queue from that file for your projects: it's broken and works for primes only by a lucky chance.