# Difference between revisions of "Talk:Prime numbers"

(An interesting thought) |
(move comment to the correct year (d'oh).) |
||

(56 intermediate revisions by 6 users not shown) | |||

Line 1: | Line 1: | ||

+ | == 2007 == |
||

+ | |||

Here's an interesting question: will the program go faster if we replace all those <hask>(n >)</hask> expressions with <hask>(\x -> floor (sqrt n) > x)</hask>? |
Here's an interesting question: will the program go faster if we replace all those <hask>(n >)</hask> expressions with <hask>(\x -> floor (sqrt n) > x)</hask>? |
||

Line 8: | Line 10: | ||

[[User:MathematicalOrchid|MathematicalOrchid]] 16:41, 5 February 2007 (UTC) |
[[User:MathematicalOrchid|MathematicalOrchid]] 16:41, 5 February 2007 (UTC) |
||

+ | |||

+ | a composite can indeed have factors greater than its square root, and indeed most do. what you mean is that a composite will definitely have at least one factor smaller-equal than its square root. |
||

+ | |||

+ | why not use <hask>(\x -> n > x*x)</hask> --[[User:JohannesAhlmann|Johannes Ahlmann]] 21:18, 5 February 2007 (UTC) |
||

+ | |||

+ | LOL! That is ''indeed'' what I meant. |
||

+ | |||

+ | It turns out my comment above is correct - the <hask>takeWhile</hask> filtering in <hask>factors</hask> is in fact unecessary. The function works just fine without it. (Notice I have made some edits to correct the multiple bugs in the <hask>primes</hask> function. Oops!) |
||

+ | |||

+ | Now the only use of <hask>takeWhile</hask> is in the <hask>is_prime</hask> function, which could be changed to 'give up' the search a lot faster and hence confirm large primes with much less CPU time and RAM usage. Maybe I'll wrap my brain around that later. |
||

+ | |||

+ | [[User:MathematicalOrchid|MathematicalOrchid]] 10:17, 6 February 2007 (UTC) |
||

+ | |||

+ | == 2009 == |
||

+ | |||

+ | The section [http://www.haskell.org/haskellwiki/index.php?title=Prime_numbers&oldid=24537#Simple_Prime_Sieve_II Simple Prime Sieve II] is not a sieve in the same sense that the first one is. It really implements a primality test as a filter. |
||

+ | |||

+ | A more "sieve-like" version of the simple sieve which exploits the fact that we need not check for primes larger than the square root would be |
||

+ | |||

+ | <hask> |
||

+ | primes :: [Integer] |
||

+ | primes = sieve [2..] |
||

+ | where sieve (p:xs) = p : sieve [x | x<-xs, (x< p*p) || (x `mod` p /= 0)] |
||

+ | </hask> |
||

+ | |||

+ | However, this runs even slower than the original! |
||

+ | |||

+ | [[User:Kapil|Kapil Hari Paranjape]] 06:51, 4 February 2009 (UTC) |
||

+ | |||

+ | ---- |
||

+ | I want to thank Leon P. Smith for showing me the idea of producing the spans of odds directly, for version [http://www.haskell.org/haskellwiki/?title=Prime_numbers&oldid=31589#Simple_Prime_Sieve_IV IV]. I had a combination of span and infinite odds list, as in span (< p*p) [3,5..] etc. That sped it up some 20% more, when GHC-compiled. |
||

+ | |||

+ | The mark-and-comb version that I put under [http://www.haskell.org/haskellwiki/?title=Prime_numbers&oldid=31589#Simple_Sieve_of_Eratosthenes Simple Sieve of Eratosthenes] seems to me very "faithful" to the original (IYKWIM). Strangely it shows exactly same asymptotic behavior when GHC-compiled (tested inside GHCi) as IV. Does this prove that priority queue-based code is <i>better</i> than the original? <i>:)</i> |
||

+ | |||

+ | BTW "unzip" is somehow screwed up inside "haskell" block, I don't know how to fix that. |
||

+ | |||

+ | : not anymore [[User:WillNess|WillNess]] 13:39, 10 February 2011 (UTC) |
||

+ | |||

+ | I've also added the postponed-filters version to the first sieve code to show that the <i>squares</i> optimization <i>does</i> matter and gives <i>huge</i> efficiency advantage just by itself. The <i>odds only</i> trick gives it a dozen or two percent improvement, but it's nothing compared to this 20x massive speedup! |
||

+ | |||

+ | Written in list-comprehension style, it's |
||

+ | |||

+ | <hask> |
||

+ | primes :: [Integer] |
||

+ | primes = 2: 3: sieve (tail primes) [5,7..] where |
||

+ | sieve (p:ps) xs |
||

+ | = h ++ sieve ps [x|x<-t, x `rem` p /= 0] |
||

+ | where (h,(_:t))=span (< p*p) xs |
||

+ | </hask> |
||

+ | |||

+ | Which BTW is <i>faster</i> than the IV version itself, when interpreted in GHCi. So <i>what</i> are we comparing here, code versions or Haskell implementations?? |
||

+ | |||

+ | [[User:WillNess|WillNess]] 10:46, 15 November 2009 (UTC) |
||

+ | |||

+ | I've added the code for Euler's sieve which is just the postponed filters with minimal modification, substituting <code>(t `minus` multiples p)</code> for <code>(filter (nodivs p) t)</code>. |
||

+ | |||

+ | :as it later turned out it was not a Euler sieve, but rather an approximation. [[User:WillNess|WillNess]] 13:27, 10 February 2011 (UTC) |
||

+ | |||

+ | Now it is obvious that <code>(...(((s - a) - b) - c) - ...)</code> is the same as <code>(s - (a + b + c + ...))</code> and this is the next code, the "merged multiples" variation of Euler's sieve. |
||

+ | |||

+ | It is very much like the streamlined and further optimized famous ''Richard Bird's code'' (appearing in [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf Melissa O'Neill's] [http://journals.cambridge.org/download.php?file=%2FJFP%2FJFP19_01%2FS0956796808007004a.pdf&code=1c69f7d6fafad8f6454f2789ffa4ab4d JFP article]), which copyright status is unknown to me, so I couldn't reference it in the main article body. The code as written in the article has the wrong clause order in <code>merge</code>. |
||

+ | |||

+ | :when using <code>compare</code>, clause order doesn't matter. [[User:WillNess|WillNess]] 15:32, 26 January 2010 (UTC) |
||

+ | |||

+ | I've also changed the span pattern-binding to the more correct, lazy pattern, <code>(h,~(_:t))</code>. |
||

+ | |||

+ | [[User:WillNess|WillNess]] 17:10, 5 December 2009 (UTC) |
||

+ | |||

+ | |||

+ | New [http://haskell.org/haskellwiki/index.php?title=Prime_numbers&oldid=32749#Treefold_Merged_Multiples_Removal treefolding merge] is inspired by apfelmus's VIP code from [http://www.haskell.org/haskellwiki/index.php?title=Prime_numbers&oldid=21251#Implicit_Heap Implicit Heap]; but it uses a different structure, better at primes multiples generation: instead of his 1+(2+(4+(8+...))) it's (2+4)+( (4+8) + ( (8+16) + ...)). The reason I put my version here is to show the natural progression of development from the postponed filters to Euler's sieve to merged multiples to treefold-merged multiples. I.e. it's not some ad-hoc invention; it's ''logical''. It is also step-by-step. |
||

+ | |||

+ | I estimate the total cost of producing primes multiples as Sum (1/p)*d, where d is the leaf's depth, i.e. the amount of merge nodes its produced prime must pass on its way up to the top. The values for cost function correspond very well with the actual time performance of the respective algorithms: it's better by 10%-12% and the performance boost is 10%-12% too. |
||

+ | |||

+ | I will also add this code further improved with the Wheel optimization here. That one beats the PQ-based code from Melissa ONeill's ZIP file by a constant margin of around 20%, its asymptotic behaviour *exactly* the same. |
||

+ | |||

+ | :that was with the [http://haskell.org/haskellwiki/index.php?title=Prime_numbers&diff=prev&oldid=32778#Treefold_Merged_Multiples.2C_with_Wheel incomplete code] which only rolled the wheel on numbers supply, and not on multiples. It had e.g. [11*11,11*13,11*15,11*17...] but of course 11*15 could've been eliminated in advance too (and 11*25, 11*35, 11*45, 11*49, etc...). [http://haskell.org/haskellwiki/index.php?title=Prime_numbers&diff=next&oldid=32778#Treefold_Merged_Multiples.2C_with_Wheel Fixing that] made it run ''twice'' faster than before. [[User:WillNess|WillNess]] 08:33, 29 December 2009 (UTC) |
||

+ | |||

+ | ::these tests were most probably wrong, either on GHCi or without using the -O2 switch [[User:WillNess|WillNess]] 13:27, 10 February 2011 (UTC) |
||

+ | |||

+ | I measure local asymptotics by taking a logBase of run time ratio in base of a problem size ratio. I've settled on testing code performance as interpreted, inside GHCi. Running a compiled code feels more like testing a compiler itself. Too many times I saw two operationally equivalent expressions running in wildly different times. It can't be anything else other than the compiler's quirks, and we're not interested in those, here. :) |
||

+ | |||

+ | Apparently, <i>arrays</i> are <i><b>very</b></i> fast. :) (using accumArray as seen in [http://www.haskell.org/pipermail/haskell-cafe/2007-March/023095.html Thorkil Naur's Haskell cafe post], but still without the Wheel). |
||

+ | |||

+ | [[User:WillNess|WillNess]] 14:47, 25 December 2009 (UTC) |
||

+ | |||

+ | == 2010 == |
||

+ | NEW! A *major* improvement to [http://haskell.org/haskellwiki/index.php?title=Prime_numbers&oldid=32868#Treefold_Merged_Multiples_Removal treefold merge] by [http://www.haskell.org/pipermail/haskell-cafe/2010-January/071807.html Daniel Fischer]'s reimplementing spMerge into a properly tail-recursive thingy, fixing a space leak there. :) [[User:WillNess|WillNess]] 06:56, 9 January 2010 (UTC) |
||

+ | |||

+ | :AND his other idea: making `tfold' ''strict'' - which ''really'' brings down the memory consumption. The only caveat: use at least 6 primes to bootstrap the tree-folding. At each tree expansion it needs additional 3*2^n, n=1,... primes, but is producing PI( (PRIMES !! SUM(i=1..n)(3*2^i)) ^ 2) which is way more than that. [[User:WillNess|WillNess]] 10:02, 25 January 2010 (UTC) |
||

+ | |||

+ | :I've also inlined spMerge completely into mergeSP itself (now called unionSP) along the lines of Leon P. Smith's Data.OrdList.mergeAll implementation. Looks yet simpler that way. Haven't tested it though. [[User:WillNess|WillNess]] 23:21, 28 February 2010 (UTC) |
||

+ | |||

+ | :changed Data.OrdList to Data.List.Ordered as per the new version of [http://hackage.haskell.org/package/data-ordlist data-ordlist] package. [[User:WillNess|WillNess]] 07:45, 16 March 2010 (UTC) |
||

+ | |||

+ | [http://www.haskell.org/haskellwiki/index.php?title=Prime_numbers&oldid=33126#Euler.27s_Sieve Euler's Sieve] initial one-liner code is due to [http://www.haskell.org/pipermail/haskell-cafe/2010-January/071615.html Daniel Fischer]. |
||

+ | [[User:WillNess|WillNess]] 16:35, 19 January 2010 (UTC) |
||

+ | |||

+ | Here's new streamlined code for immutable arrays: |
||

+ | <haskell> |
||

+ | primes = 2: 3: sieve [] (tail primes) 3 |
||

+ | where |
||

+ | sieve fs (p:ps) x = [i*2 + x | (i,e) <- assocs a, e] |
||

+ | ++ sieve fs' ps (p*p) |
||

+ | where |
||

+ | q = (p*p-x)`div`2 |
||

+ | fs' = (p,0) : [(s, rem (y-q) s) | (s,y) <- fs] |
||

+ | a = accumArray (\ b c -> False) True (1,q-1) |
||

+ | [(i,()) | (s,y) <- fs, i <- [y+s,y+s+s..q]] |
||

+ | </haskell> |
||

+ | It is twice faster, but more obscure; so I thought I'd keep the previous version on the main page for didactic reasons. [[User:WillNess|WillNess]] 08:43, 18 July 2010 (UTC) |
||

+ | |||

+ | : I've added it now to the main page and restyled the treefold code a bit. '''The test entries on Ideone.com are [http://ideone.com/willness/primes here]'''. [[User:WillNess|WillNess]] 11:00, 6 August 2010 (UTC) |
||

+ | :ST Array code also becomes much faster (3x for 1 mln-th prime in fact) when translated into working on odds only, like the immutable array version - but its memory footprint is also large. [[User:WillNess|WillNess]] 10:34, 13 August 2010 (UTC) |
||

+ | |||

+ | == 2011 == |
||

+ | |||

+ | Augmenting the latest [http://haskell.org/haskellwiki/index.php?title=Prime_numbers&oldid=38513#Treefold_Merged_Multiples.2C_with_Wheel Treefold Merged Multiples, with Wheel] version to work with [http://haskell.org/haskellwiki/index.php?title=Prime_numbers&oldid=38513#Implicit_Heap VIPs] does nothing except slow it down [http://ideone.com/r3tbr/ a little bit]. Lazy pattern in join/tfold also starts causing space leak then, so primes' becomes necessary to be more defined upfront to prevent a loop when the tilde is removed. [[User:WillNess|WillNess]] 18:31, 6 February 2011 (UTC) |
||

+ | |||

+ | Compared with Melissa O’Neill’s considerably more complex priority queue-based code it runs (compiled by ghc 6.10.1 with -O2 switch) about 1.6x slower at producing 1 million primes, and about 1.7x slower at 10 to 20 million, with empirical complexity of n^1.23 vs ONeill's n^1.20; both versions having same low and near-constant memory consumption. Ideone.com uses (ghc-6.8.2) and limits run time to 15s; [https://ideone.com/H8Y5V there] the ratio is 1.16x .. 1.25x for generating 1 .. 6 million primes, and the empirical complexities are n^1.24 vs ONeill's n^1.20. [[User:WillNess|WillNess]] 07:55, 13 February 2011 (UTC) |
||

+ | |||

+ | ---- |
||

+ | The <code>primesFrom</code> function in Section 2.1.2.3 does not compile. Specifically, the <code>sieve</code> function is defined to take in a list as its first argument, but is passed a number instead : <code>(length h)</code> . |
||

+ | --[[User:Gphilip|Gphilip]] 15:43, 16 February 2011 (UTC) |
||

+ | |||

+ | : thanks for catching this -- I've tweaked the main function too many times and forgot to change the call appropriately. The arguments' order is now in sync. [[User:WillNess|WillNess]] 00:53, 7 March 2011 (UTC) |
||

+ | |||

+ | == 2012 == |
||

+ | |||

+ | |||

+ | == 2013 == |
||

+ | |||

+ | Ideone.com now apparently works with ghc-7.4.1. Test entry http://ideone.com/n3XzZY shows TMWE works OK, within nearly constant memory. |

## Latest revision as of 10:21, 2 February 2013

## 2007

Here's an interesting question: will the program go faster if we replace all those `(n >)`

expressions with `(\x -> floor (sqrt n) > x)`

?

On one hand, a composite integer cannot possess a factor greater than its square root.

On the other hand, since the list we're looking through contains all possible prime numbers, we are guaranteed to find a factor or an exact match eventually, so do we need the `takeWhile`

at all?

Throwing this over to somebody with a bigger brain than me...

MathematicalOrchid 16:41, 5 February 2007 (UTC)

a composite can indeed have factors greater than its square root, and indeed most do. what you mean is that a composite will definitely have at least one factor smaller-equal than its square root.

why not use `(\x -> n > x*x)`

--Johannes Ahlmann 21:18, 5 February 2007 (UTC)

LOL! That is *indeed* what I meant.

It turns out my comment above is correct - the `takeWhile`

filtering in `factors`

is in fact unecessary. The function works just fine without it. (Notice I have made some edits to correct the multiple bugs in the `primes`

function. Oops!)

Now the only use of `takeWhile`

is in the `is_prime`

function, which could be changed to 'give up' the search a lot faster and hence confirm large primes with much less CPU time and RAM usage. Maybe I'll wrap my brain around that later.

MathematicalOrchid 10:17, 6 February 2007 (UTC)

## 2009

The section Simple Prime Sieve II is not a sieve in the same sense that the first one is. It really implements a primality test as a filter.

A more "sieve-like" version of the simple sieve which exploits the fact that we need not check for primes larger than the square root would be

`primes :: [Integer] primes = sieve [2..] where sieve (p:xs) = p : sieve [x | x<-xs, (x< p*p) || (x `mod` p /= 0)]`

However, this runs even slower than the original!

Kapil Hari Paranjape 06:51, 4 February 2009 (UTC)

I want to thank Leon P. Smith for showing me the idea of producing the spans of odds directly, for version IV. I had a combination of span and infinite odds list, as in span (< p*p) [3,5..] etc. That sped it up some 20% more, when GHC-compiled.

The mark-and-comb version that I put under Simple Sieve of Eratosthenes seems to me very "faithful" to the original (IYKWIM). Strangely it shows exactly same asymptotic behavior when GHC-compiled (tested inside GHCi) as IV. Does this prove that priority queue-based code is *better* than the original? *:)*

BTW "unzip" is somehow screwed up inside "haskell" block, I don't know how to fix that.

- not anymore WillNess 13:39, 10 February 2011 (UTC)

I've also added the postponed-filters version to the first sieve code to show that the *squares* optimization *does* matter and gives *huge* efficiency advantage just by itself. The *odds only* trick gives it a dozen or two percent improvement, but it's nothing compared to this 20x massive speedup!

Written in list-comprehension style, it's

`primes :: [Integer] primes = 2: 3: sieve (tail primes) [5,7..] where sieve (p:ps) xs = h ++ sieve ps [x|x<-t, x `rem` p /= 0] where (h,(_:t))=span (< p*p) xs`

Which BTW is *faster* than the IV version itself, when interpreted in GHCi. So *what* are we comparing here, code versions or Haskell implementations??

WillNess 10:46, 15 November 2009 (UTC)

I've added the code for Euler's sieve which is just the postponed filters with minimal modification, substituting `(t `minus` multiples p)`

for `(filter (nodivs p) t)`

.

- as it later turned out it was not a Euler sieve, but rather an approximation. WillNess 13:27, 10 February 2011 (UTC)

Now it is obvious that `(...(((s - a) - b) - c) - ...)`

is the same as `(s - (a + b + c + ...))`

and this is the next code, the "merged multiples" variation of Euler's sieve.

It is very much like the streamlined and further optimized famous *Richard Bird's code* (appearing in Melissa O'Neill's JFP article), which copyright status is unknown to me, so I couldn't reference it in the main article body. The code as written in the article has the wrong clause order in `merge`

.

- when using
`compare`

, clause order doesn't matter. WillNess 15:32, 26 January 2010 (UTC)

I've also changed the span pattern-binding to the more correct, lazy pattern, `(h,~(_:t))`

.

WillNess 17:10, 5 December 2009 (UTC)

New treefolding merge is inspired by apfelmus's VIP code from Implicit Heap; but it uses a different structure, better at primes multiples generation: instead of his 1+(2+(4+(8+...))) it's (2+4)+( (4+8) + ( (8+16) + ...)). The reason I put my version here is to show the natural progression of development from the postponed filters to Euler's sieve to merged multiples to treefold-merged multiples. I.e. it's not some ad-hoc invention; it's *logical*. It is also step-by-step.

I estimate the total cost of producing primes multiples as Sum (1/p)*d, where d is the leaf's depth, i.e. the amount of merge nodes its produced prime must pass on its way up to the top. The values for cost function correspond very well with the actual time performance of the respective algorithms: it's better by 10%-12% and the performance boost is 10%-12% too.

I will also add this code further improved with the Wheel optimization here. That one beats the PQ-based code from Melissa ONeill's ZIP file by a constant margin of around 20%, its asymptotic behaviour *exactly* the same.

- that was with the incomplete code which only rolled the wheel on numbers supply, and not on multiples. It had e.g. [11*11,11*13,11*15,11*17...] but of course 11*15 could've been eliminated in advance too (and 11*25, 11*35, 11*45, 11*49, etc...). Fixing that made it run
*twice*faster than before. WillNess 08:33, 29 December 2009 (UTC)

- these tests were most probably wrong, either on GHCi or without using the -O2 switch WillNess 13:27, 10 February 2011 (UTC)

I measure local asymptotics by taking a logBase of run time ratio in base of a problem size ratio. I've settled on testing code performance as interpreted, inside GHCi. Running a compiled code feels more like testing a compiler itself. Too many times I saw two operationally equivalent expressions running in wildly different times. It can't be anything else other than the compiler's quirks, and we're not interested in those, here. :)

Apparently, *arrays* are * very* fast. :) (using accumArray as seen in Thorkil Naur's Haskell cafe post, but still without the Wheel).

WillNess 14:47, 25 December 2009 (UTC)

## 2010

NEW! A *major* improvement to treefold merge by Daniel Fischer's reimplementing spMerge into a properly tail-recursive thingy, fixing a space leak there. :) WillNess 06:56, 9 January 2010 (UTC)

- AND his other idea: making `tfold'
*strict*- which*really*brings down the memory consumption. The only caveat: use at least 6 primes to bootstrap the tree-folding. At each tree expansion it needs additional 3*2^n, n=1,... primes, but is producing PI( (PRIMES !! SUM(i=1..n)(3*2^i)) ^ 2) which is way more than that. WillNess 10:02, 25 January 2010 (UTC)

- I've also inlined spMerge completely into mergeSP itself (now called unionSP) along the lines of Leon P. Smith's Data.OrdList.mergeAll implementation. Looks yet simpler that way. Haven't tested it though. WillNess 23:21, 28 February 2010 (UTC)

- changed Data.OrdList to Data.List.Ordered as per the new version of data-ordlist package. WillNess 07:45, 16 March 2010 (UTC)

Euler's Sieve initial one-liner code is due to Daniel Fischer. WillNess 16:35, 19 January 2010 (UTC)

Here's new streamlined code for immutable arrays:

```
primes = 2: 3: sieve [] (tail primes) 3
where
sieve fs (p:ps) x = [i*2 + x | (i,e) <- assocs a, e]
++ sieve fs' ps (p*p)
where
q = (p*p-x)`div`2
fs' = (p,0) : [(s, rem (y-q) s) | (s,y) <- fs]
a = accumArray (\ b c -> False) True (1,q-1)
[(i,()) | (s,y) <- fs, i <- [y+s,y+s+s..q]]
```

It is twice faster, but more obscure; so I thought I'd keep the previous version on the main page for didactic reasons. WillNess 08:43, 18 July 2010 (UTC)

- I've added it now to the main page and restyled the treefold code a bit.
**The test entries on Ideone.com are here**. WillNess 11:00, 6 August 2010 (UTC) - ST Array code also becomes much faster (3x for 1 mln-th prime in fact) when translated into working on odds only, like the immutable array version - but its memory footprint is also large. WillNess 10:34, 13 August 2010 (UTC)

## 2011

Augmenting the latest Treefold Merged Multiples, with Wheel version to work with VIPs does nothing except slow it down a little bit. Lazy pattern in join/tfold also starts causing space leak then, so primes' becomes necessary to be more defined upfront to prevent a loop when the tilde is removed. WillNess 18:31, 6 February 2011 (UTC)

Compared with Melissa O’Neill’s considerably more complex priority queue-based code it runs (compiled by ghc 6.10.1 with -O2 switch) about 1.6x slower at producing 1 million primes, and about 1.7x slower at 10 to 20 million, with empirical complexity of n^1.23 vs ONeill's n^1.20; both versions having same low and near-constant memory consumption. Ideone.com uses (ghc-6.8.2) and limits run time to 15s; there the ratio is 1.16x .. 1.25x for generating 1 .. 6 million primes, and the empirical complexities are n^1.24 vs ONeill's n^1.20. WillNess 07:55, 13 February 2011 (UTC)

The `primesFrom`

function in Section 2.1.2.3 does not compile. Specifically, the `sieve`

function is defined to take in a list as its first argument, but is passed a number instead : `(length h)`

.
--Gphilip 15:43, 16 February 2011 (UTC)

- thanks for catching this -- I've tweaked the main function too many times and forgot to change the call appropriately. The arguments' order is now in sync. WillNess 00:53, 7 March 2011 (UTC)

## 2012

## 2013

Ideone.com now apparently works with ghc-7.4.1. Test entry http://ideone.com/n3XzZY shows TMWE works OK, within nearly constant memory.