On one hand, a composite integer cannot possess a factor greater than its square root.On the other hand, since the list we're looking through contains all possible prime numbers, we are guaranteed to find a factor or an exact match eventually, so do we need the
Throwing this over to somebody with a bigger brain than me...
MathematicalOrchid 16:41, 5 February 2007 (UTC)
a composite can indeed have factors greater than its square root, and indeed most do. what you mean is that a composite will definitely have at least one factor smaller-equal than its square root.why not use
LOL! That is indeed what I meant.It turns out my comment above is correct - the
MathematicalOrchid 10:17, 6 February 2007 (UTC)
The section Simple Prime Sieve II is not a sieve in the same sense that the first one is. It really implements a primality test as a filter.
A more "sieve-like" version of the simple sieve which exploits the fact that we need not check for primes larger than the square root would be
primes = sieve [2..]
where sieve (p:xs) = p : sieve [x | x<-xs, (x< p*p) || (x `mod` p /= 0)]
However, this runs even slower than the original!
Kapil Hari Paranjape 06:51, 4 February 2009 (UTC)
I want to thank Leon P. Smith for showing me the idea of producing the spans of odds directly, for version IV. I had a combination of span and infinite odds list, as in span (< p*p) [3,5..] etc. That sped it up some 20% more, when GHC-compiled.
The mark-and-comb version that I put under Simple Sieve of Eratosthenes seems to me very "faithful" to the original (IYKWIM). Strangely it shows exactly same asymptotic behavior when GHC-compiled (tested inside GHCi) as IV. Does this prove that priority queue-based code is better than the original? :)
BTW "unzip" is somehow screwed up inside "haskell" block, I don't know how to fix that.
I've also added the postponed-filters version to the first sieve code to show that the squares optimization does matter and gives huge efficiency advantage just by itself. The odds only trick gives it a dozen or two percent improvement, but it's nothing compared to this 20x massive speedup!
Written in list-comprehension style, it's
primes = 2: 3: sieve (tail primes) [5,7..] where
sieve (p:ps) xs
= h ++ sieve ps [x|x<-t, x `rem` p /= 0]
where (h,(_:t))=span (< p*p) xs
Which BTW is faster than the IV version itself, when interpreted in GHCi. So what are we comparing here, code versions or Haskell implementations??
WillNess 10:46, 15 November 2009 (UTC)
I've added the code for Euler's sieve which is just the postponed filters with minimal modification, substituting
(t `minus` multiples p) for
(filter (nodivs p) t).
Now it is obvious that
(...(((s - a) - b) - c) - ...) is the same as
(s - (a + b + c + ...)) and this is the next code, the "merged multiples" variation of Euler's sieve.
It is very much like the streamlined and further optimized famous Richard Bird's code (appearing in Melissa O'Neill's JFP article), which copyright status is unknown to me, so I couldn't reference it in the main article body. The code as written in the article has the wrong clause order in
merge, and uses
minus instead of more efficient
I've also changed the span pattern-binding to the more correct lazy pattern,
WillNess 17:10, 5 December 2009 (UTC)
New treefolding merge is inspired by apfelmus's VIP code from Implicit Heap; but it uses a different structure, better at primes multiples generation: instead of his 1+(2+(4+(8+...))) it's (2+4)+( (4+8) + ( (8+16) + ...)). The reason I put my version here is to show the natural progression of developement from the postponed filters to Euler's sieve to merged multiples to treefold-merged multiples. I.e. it's not some ad-hoc invention; it's logical. It is also step-by-step.
I estimate the total cost of producing primes multiples as Sum (1/p)*d, where d is the leaf's depth, i.e. the amount of merge nodes its produced prime must pass on its way up to the top. The values for cost function correspond very well with the actual time performance of the respective algorithms: it's better by 10%-12% and the performance boost is 10%-12% too.
I will also add this code further improved with the Wheel optimization here. That one beats the PQ-based code from Melissa ONeill's ZIP file by a constant margin of around 20%, its asympotic behaviour *exactly* the same.
- that was with the incomplete code which only rolled the wheel on numbers supply, and not on multiples. It had e.g. [11*11,11*13,11*15,11*17...] but of course 11*15 could've been eliminated in advance too (and 11*25, 11*35, 11*45, 11*49, etc...). Fixing that made it run twice faster than before. WillNess 08:33, 29 December 2009 (UTC)
I measure local asymptotics by taking a logBase of run time ratio in base of a problem size ratio. I've settled on testing code performance as interpreted, inside GHCi. Running a compiled code feels more like testing a compiler itself. Too many times I saw two operationally equivalent expressions running in wildly different times. It can't be anything else other than the compiler's quirks, and we're not interested in those, here. :)
Apparently, arrays are very fast. :) (using accumArray as seen in Thorkil Naur's Haskell cafe post, but still without the Wheel).
WillNess 14:47, 25 December 2009 (UTC)
- AND his other idea: making `tfold' strict - which really brings down the memory consumption. The only caveat: use at least 6 primes to bootstrap the tree-folding. At each tree expansion it needs additional 3*2^n, n=1,... primes, but is producing PI( (PRIMES !! SUM(i=1..n)(3*2^i)) ^ 2) which is way more than is needed. WillNess 10:02, 25 January 2010 (UTC)