Here's an interesting question: will the program go faster if we replace all those (n >) expressions with (\x -> floor (sqrt n) > x)?
On one hand, a composite integer cannot possess a factor greater than its square root.
On the other hand, since the list we're looking through contains all possible prime numbers, we are guaranteed to find a factor or an exact match eventually, so do we need the takeWhile at all?
Throwing this over to somebody with a bigger brain than me...
MathematicalOrchid 16:41, 5 February 2007 (UTC)
A composite can indeed have factors greater than its square root, and indeed most do. What you mean is that a composite will definitely have at least one factor less than or equal to its square root.
Why not use (\x -> n > x*x)? --Johannes Ahlmann 21:18, 5 February 2007 (UTC)
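The square-root bound discussed above can be sketched as follows. This is a minimal illustration and not the page's actual code: the names isPrimeTD and isPrimeStrict are hypothetical, and the candidate divisors are simply [2..]. The second function uses the strict comparison only to expose an edge case; whether that edge case matters in the real code depends on how the surrounding function treats an exact match.

```haskell
-- Trial division up to and including the square root
-- (hypothetical helper, for illustration only).
isPrimeTD :: Integer -> Bool
isPrimeTD n =
  n > 1 && all (\d -> n `rem` d /= 0) (takeWhile (\d -> d*d <= n) [2..])

-- The same test with the strict bound n > d*d. The divisor d with
-- d*d == n is never tried, so squares of primes pass the test.
isPrimeStrict :: Integer -> Bool
isPrimeStrict n =
  n > 1 && all (\d -> n `rem` d /= 0) (takeWhile (\d -> n > d*d) [2..])
```

With these definitions, isPrimeTD 9 is False while isPrimeStrict 9 is True, because 3 is never tried as a divisor of 9 under the strict bound.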
LOL! That is indeed what I meant.
It turns out my comment above is correct: the takeWhile filtering in factors is in fact unnecessary. The function works just fine without it. (Notice I have made some edits to correct the multiple bugs in the primes function. Oops!)
Now the only use of takeWhile is in the is_prime function, which could be changed to 'give up' the search a lot faster and hence confirm large primes with much less CPU time and RAM usage. Maybe I'll wrap my brain around that later.
MathematicalOrchid 10:17, 6 February 2007 (UTC)
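Why the search terminates without takeWhile can be seen in a small sketch. The names primesTD and factorsTD are hypothetical stand-ins, and their bodies are assumptions rather than the definitions from the main page.

```haskell
-- Assumed stand-in for the page's primes list.
primesTD :: [Integer]
primesTD = 2 : filter isPrime [3,5..]
  where
    isPrime n = all (\p -> n `rem` p /= 0)
                    (takeWhile (\p -> p*p <= n) primesTD)

-- Prime factorization with no takeWhile: the walk over the infinite
-- list stops by itself, because every m > 1 is divisible by some
-- prime (by m itself, at the latest).
factorsTD :: Integer -> [Integer]
factorsTD n
  | n < 2     = []
  | otherwise = go n primesTD
  where
    go 1 _ = []
    go m ps@(p:rest)
      | m `rem` p == 0 = p : go (m `div` p) ps  -- keep p, it may divide again
      | otherwise      = go m rest
    go _ [] = []  -- unreachable: primesTD is infinite
```

For a large prime m this version walks the list all the way up to m. Stopping as soon as p*p > m and emitting m itself is the kind of early exit the comment above has in mind for is_prime.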
The section Simple Prime Sieve II is not a sieve in the same sense that the first one is. It really implements a primality test as a filter.
A more "sieve-like" version of the simple sieve which exploits the fact that we need not check for primes larger than the square root would be
primes :: [Integer]
primes = sieve [2..]
  where sieve (p:xs) = p : sieve [x | x <- xs, (x < p*p) || (x `mod` p /= 0)]
However, this runs even slower than the original!
Kapil Hari Paranjape 06:51, 4 February 2009 (UTC)
I want to thank Leon P. Smith for showing me the idea of producing the spans of odds directly, for version IV. I had a combination of span and infinite odds list, as in span (< p*p) [3,5..] etc. That sped it up some 20% more, when GHC-compiled.
The mark-and-comb version that I put under Simple Sieve of Eratosthenes seems to me very "faithful" to the original (IYKWIM). Strangely, it shows exactly the same asymptotic behavior when GHC-compiled (tested inside GHCi) as IV. Does this prove that priority queue-based code is better than the original? :)
BTW "unzip" is somehow screwed up inside "haskell" block, I don't know how to fix that.
- not anymore WillNess 13:39, 10 February 2011 (UTC)
I've also added the postponed-filters version to the first sieve code to show that the squares optimization does matter and gives a huge efficiency advantage just by itself. The odds-only trick gives it a dozen or two percent improvement, but that's nothing compared to this massive 20x speedup!
Written in list-comprehension style, it's
primes :: [Integer]
primes = 2: 3: sieve (tail primes) [5,7..]
  where
    sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
      where (h,(_:t)) = span (< p*p) xs
Which BTW is faster than the IV version itself, when interpreted in GHCi. So what are we comparing here, code versions or Haskell implementations??
WillNess 10:46, 15 November 2009 (UTC)
I've added the code for Euler's sieve which is just the postponed filters with minimal modification, substituting (t `minus` multiples p) for (filter (nodivs p) t).
- as it later turned out it was not a Euler sieve, but rather an approximation. WillNess 13:27, 10 February 2011 (UTC)
Now it is obvious that (...(((s - a) - b) - c) - ...) is the same as (s - (a + b + c + ...)) and this is the next code, the "merged multiples" variation of Euler's sieve.
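The identity above can be checked on finite ordered lists. The sketch below assumes the usual ordered-list meaning of the two operations: minus is set difference and union is a duplicate-removing merge, both on increasing lists. The definitions are written out here for illustration and are not copied from the main page.

```haskell
-- Difference of two increasing lists.
minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : minus xt ys
  EQ ->     minus xt yt
  GT ->     minus xs yt
minus xs _ = xs

-- Union of two increasing lists, keeping one copy of common elements.
union :: Ord a => [a] -> [a] -> [a]
union xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : union xt ys
  EQ -> x : union xt yt
  GT -> y : union xs yt
union xs [] = xs
union [] ys = ys

-- Left-nested subtraction: ((s - a) - b) - c
nested :: [Integer]
nested = (([2..30] `minus` [4,6..30]) `minus` [9,12..30]) `minus` [25,30]

-- One subtraction of the merged multiples: s - (a + b + c)
merged :: [Integer]
merged = [2..30] `minus` ([4,6..30] `union` ([9,12..30] `union` [25,30]))
```

Both nested and merged evaluate to [2,3,5,7,11,13,17,19,23,29]. Because each case branch is selected by the result of compare, the order in which the three branches are written does not affect the result.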
It is very much like a streamlined and further optimized version of Richard Bird's famous code (appearing in Melissa O'Neill's JFP article), whose copyright status is unknown to me, so I couldn't reference it in the main article body. The code as written in the article has the wrong clause order in
- when using compare, clause order doesn't matter. WillNess 15:32, 26 January 2010 (UTC)
I've also changed the span pattern-binding to the more correct, lazy pattern,
WillNess 17:10, 5 December 2009 (UTC)
New treefolding merge is inspired by apfelmus's VIP code from Implicit Heap; but it uses a different structure, better at primes multiples generation: instead of his 1+(2+(4+(8+...))) it's (2+4)+( (4+8) + ( (8+16) + ...)). The reason I put my version here is to show the natural progression of development from the postponed filters to Euler's sieve to merged multiples to treefold-merged multiples. I.e. it's not some ad-hoc invention; it's logical. It is also step-by-step.
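To make the idea of tree-shaped merging concrete, here is a minimal sketch. It uses the simpler 1+(2+(4+(8+...))) shape mentioned above, not the (2+4)+((4+8)+...) structure the comment describes, and it is not the code from the main page. All names (primesTM, feed, joinT, pairs) are chosen for illustration, and minus and union are the assumed ordered-list helpers.

```haskell
minus, union :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : minus xt ys
  EQ ->     minus xt yt
  GT ->     minus xs yt
minus xs _ = xs

union xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : union xt ys
  EQ -> x : union xt yt
  GT -> y : union xs yt
union xs [] = xs
union [] ys = ys

-- Odd numbers minus the tree-merged odd multiples of the odd primes.
primesTM :: [Integer]
primesTM = 2 : ([3,5..] `minus` joinT [[p*p, p*p+2*p ..] | p <- feed])
  where
    -- a separate, self-referential supply of odd primes
    feed = 3 : ([5,7..] `minus` joinT [[p*p, p*p+2*p ..] | p <- feed])
    -- the head of the first list is the overall minimum, so it can be
    -- emitted before any other list is inspected
    joinT ((x:xs):t)    = x : union xs (joinT (pairs t))
    -- merge neighbouring lists two at a time, doubling the group size
    -- at each level of the fold
    pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t
```

Emitting the head of each group before merging its tail is what keeps the definition productive on an infinite list of infinite lists.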
I estimate the total cost of producing primes multiples as Sum (1/p)*d, where d is the leaf's depth, i.e. the number of merge nodes that a multiple produced at that leaf must pass on its way up to the top. The values of the cost function correspond very well with the actual time performance of the respective algorithms: the cost is better by 10%-12% and the performance boost is 10%-12% too.
I will also add this code further improved with the Wheel optimization here. That one beats the PQ-based code from Melissa O'Neill's ZIP file by a constant margin of around 20%, its asymptotic behaviour *exactly* the same.
- that was with the incomplete code which only rolled the wheel on numbers supply, and not on multiples. It had e.g. [11*11,11*13,11*15,11*17...] but of course 11*15 could've been eliminated in advance too (and 11*25, 11*35, 11*45, 11*49, etc...). Fixing that made it run twice as fast as before. WillNess 08:33, 29 December 2009 (UTC)
- these tests were most probably wrong, either on GHCi or without using the -O2 switch WillNess 13:27, 10 February 2011 (UTC)
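The difference between rolling the wheel only on the numbers supply and also on the multiples can be illustrated as follows. This is a sketch under assumptions: the wheel is taken to be built on 2, 3, 5 and 7 (consistent with 11*49 being listed as removable above), and the cofactors are filtered by divisibility rather than generated from a precomputed gap list as a real wheel would do. All names are hypothetical.

```haskell
-- Primes assumed to make up the wheel.
wheelPrimes :: [Integer]
wheelPrimes = [2,3,5,7]

-- True when k is coprime to every wheel prime.
onWheel :: Integer -> Bool
onWheel k = all (\q -> k `rem` q /= 0) wheelPrimes

-- Multiples of p over all odd cofactors, starting from p*p.
oddMultiples :: Integer -> [Integer]
oddMultiples p = [p*k | k <- [p, p+2 ..]]

-- Multiples of p whose cofactor also lies on the wheel.
wheelMultiples :: Integer -> [Integer]
wheelMultiples p = [p*k | k <- [p, p+2 ..], onWheel k]
```

Here take 4 (oddMultiples 11) is [121,143,165,187], which still contains 11*15, while take 5 (wheelMultiples 11) is [121,143,187,209,253].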
I measure local asymptotics by taking the logarithm of the run-time ratio to the base of the problem-size ratio (logBase in Haskell). I've settled on testing code performance as interpreted, inside GHCi. Running compiled code feels more like testing the compiler itself. Too many times I saw two operationally equivalent expressions running in wildly different times. It can't be anything other than the compiler's quirks, and we're not interested in those, here. :)
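The measurement described above amounts to a one-line formula. The function name orderOfGrowth and the pair representation (problem size, run time) are assumptions made for this sketch.

```haskell
-- Empirical local order of growth between two measurements
-- (n1, t1) and (n2, t2): the exponent k such that t ~ n^k locally.
orderOfGrowth :: (Double, Double) -> (Double, Double) -> Double
orderOfGrowth (n1, t1) (n2, t2) = logBase (n2 / n1) (t2 / t1)
```

For example, if doubling the problem size quadruples the run time, the estimate is 2; if the run time merely doubles, the estimate is 1.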
Apparently, arrays are very fast. :) (using accumArray as seen in Thorkil Naur's Haskell cafe post, but still without the Wheel).
WillNess 14:47, 25 December 2009 (UTC)
- AND his other idea: making `tfold' strict - which really brings down the memory consumption. The only caveat: use at least 6 primes to bootstrap the tree-folding. At each tree expansion it needs additional 3*2^n, n=1,... primes, but is producing PI( (PRIMES !! SUM(i=1..n)(3*2^i)) ^ 2) which is way more than that. WillNess 10:02, 25 January 2010 (UTC)
- I've also inlined spMerge completely into mergeSP itself (now called unionSP) along the lines of Leon P. Smith's Data.OrdList.mergeAll implementation. Looks yet simpler that way. Haven't tested it though. WillNess 23:21, 28 February 2010 (UTC)
- changed Data.OrdList to Data.List.Ordered as per the new version of data-ordlist package. WillNess 07:45, 16 March 2010 (UTC)
Here's new streamlined code for immutable arrays:
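import Data.Array (accumArray, assocs)

primes :: [Integer]
primes = 2: 3: sieve [] (tail primes) 3
  where
    -- fs: (prime, offset) pairs found so far; x: start of the current segment
    sieve fs (p:ps) x = [i*2 + x | (i,e) <- assocs a, e] ++ sieve fs' ps (p*p)
      where
        q   = (p*p-x) `div` 2          -- segment length, counted in odds
        fs' = (p,0) : [(s, rem (y-q) s) | (s,y) <- fs]
        -- mark the multiples of each known prime inside the segment
        a   = accumArray (\ b c -> False) True (1,q-1)
                         [(i,()) | (s,y) <- fs, i <- [y+s,y+s+s..q]]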
It is twice as fast, but more obscure; so I thought I'd keep the previous version on the main page for didactic reasons. WillNess 08:43, 18 July 2010 (UTC)
- I've added it now to the main page and restyled the treefold code a bit. The test entries on Ideone.com are here. WillNess 11:00, 6 August 2010 (UTC)
- ST Array code also becomes much faster (3x for 1 mln-th prime in fact) when translated into working on odds only, like the immutable array version - but its memory footprint is also large. WillNess 10:34, 13 August 2010 (UTC)
Augmenting the latest Treefold Merged Multiples, with Wheel version to work with VIPs does nothing except slow it down a little bit. The lazy pattern in join/tfold then also starts causing a space leak, and when the tilde is removed, primes' has to be more defined upfront to prevent a loop. WillNess 18:31, 6 February 2011 (UTC)
Compared with Melissa O'Neill's considerably more complex priority queue-based code it runs (compiled by ghc 6.10.1 with -O2 switch) about 1.6x slower at producing 1 million primes, and about 1.7x slower at 10 to 20 million, with empirical complexity of n^1.23 vs O'Neill's n^1.20; both versions have the same low and near-constant memory consumption. Ideone.com uses ghc-6.8.2 and limits run time to 15s; there the ratio is 1.16x .. 1.25x for generating 1 .. 6 million primes, and the empirical complexities are n^1.24 vs O'Neill's n^1.20. WillNess 07:55, 13 February 2011 (UTC)
The primesFrom function in Section 188.8.131.52 does not compile. Specifically, the sieve function is defined to take a list as its first argument, but is passed a number instead: (length h).
--Gphilip 15:43, 16 February 2011 (UTC)
- thanks for catching this -- I've tweaked the main function too many times and forgot to change the call appropriately. The arguments' order is now in sync. WillNess 00:53, 7 March 2011 (UTC)
Ideone.com now apparently works with ghc-7.4.1. Test entry http://ideone.com/n3XzZY shows TMWE works OK, within nearly constant memory.