Improve prime sieve performance #12025
Nice addition of the mask with low and high limits. I'm curious about the time/memory performance, to compare it with the Atkin versions or the Eratosthenes one that @ironman353 posted in #11594. |
This is great! @pabloferz no apologies needed, I'm still learning and this is what I wanted to see in order to learn more, thanks! |
Here are some timings. On my machine (the best out of ten):
For large numbers this won't be faster than a well-implemented segmented sieve, for instance @ironman353's |
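The "best out of ten" measurement mentioned above can be reproduced with a small helper along these lines. This is only a sketch: `best_of` and `count_primes_naive` are hypothetical names introduced for illustration, and the trial-division workload is a stand-in for whichever sieve is being measured.

```julia
# Hypothetical helper: call f(n) several times and keep the fastest wall-clock time.
function best_of(f, n; runs = 10)
    f(n)  # warm-up call so that JIT compilation is not counted
    return minimum(@elapsed(f(n)) for _ in 1:runs)
end

# Stand-in workload (trial division); swap in the sieve under test.
function count_primes_naive(n)
    total = 0
    for m in 2:n
        is_prime = true
        d = 2
        while d * d <= m
            if m % d == 0
                is_prime = false
                break
            end
            d += 1
        end
        total += is_prime
    end
    return total
end

t = best_of(count_primes_naive, 10_000)
println("best of 10: ", t, " s")
```

Taking the minimum rather than the mean tends to filter out noise from garbage collection and other processes, which is presumably why the best run is the one reported.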
@pabloferz, for N <= 10^6 you should use Eratosthenes_1 in my code. If N > 10^6 then full_sieve().

    function Atkin_3(MAX::Int)
        pi::Int64 = 0;
        SQRT_MAX::Int = floor(sqrt(MAX)) + 1
        isPrime::Array{Bool} = falses(MAX)
        index::Int = 0
        k1::Int64 = 0
        k::Int64 = 0
        # First quadratic form: k = 4x^2 + y^2
        xUpper::Float64 = sqrt(MAX / 4) + 1;
        x::Int64 = 1
        y::Int64 = 0
        while x < xUpper
            index = 0
            k1 = 4 * x * x
            y = 1
            if x % 3 == 0
                # Step y by 4, 2, 4, 2, ... to skip multiples of 3
                while true
                    k = k1 + y * y
                    if k >= MAX
                        break
                    end
                    isPrime[k] = !isPrime[k]
                    index += 1
                    if index % 2 == 1
                        y += 4
                    else
                        y += 2
                    end
                end
            else
                while true
                    k = k1 + y * y
                    if k >= MAX
                        break
                    end
                    isPrime[k] = !isPrime[k]
                    y += 2
                end
            end
            x += 1
        end
        # Second quadratic form: k = 3x^2 + y^2
        xUpper = sqrt(MAX / 3) + 1
        x = 1
        y = 0
        while x < xUpper
            index = 1
            k1 = 3 * x * x
            y = 2
            while true
                k = k1 + y * y
                if k >= MAX
                    break
                end
                isPrime[k] = !isPrime[k]
                index += 1
                if index % 2 == 1
                    y += 4
                else
                    y += 2
                end
            end
            x += 2
        end
        # Third quadratic form: k = 3x^2 - y^2 with y < x
        xUpper = sqrt(MAX) + 1
        x = 1
        y = 0
        while x < xUpper
            k1 = 3 * x * x
            if x % 2 == 0
                y = 1
                index = 0
            else
                y = 2
                index = 1
            end
            while y < x
                k = k1 - y * y
                if k < MAX
                    isPrime[k] = !isPrime[k]
                end
                index += 1
                if index % 2 == 1
                    y += 4
                else
                    y += 2
                end
            end
            x += 1
        end
        isPrime[2] = true
        isPrime[3] = true
        # Clear multiples of the square of each remaining candidate
        n2::Int = 0
        for n::Int = 5 : SQRT_MAX
            if isPrime[n]
                n2 = n * n
                for j::Int = n2 : n2 : MAX - 1
                    isPrime[j] = false
                end
            end
        end
        # Count the primes below MAX
        for i::Int = 2 : MAX - 1
            if isPrime[i]
                pi = pi + 1
            end
        end
        return pi
    end
    @time Atkin_3(1000000000);
    # include("E:/Julia/Atkin_3.jl")

@ironman353 I'm trying to compare non-segmented sieves. Of course a properly segmented version is going to be faster, but I'm not trying to provide that in this PR (at least not yet). |
There is an O(n) sieve of Eratosthenes, but it requires extra memory, so reaching 10^9 is difficult. My Java implementation takes just 1.712 seconds to generate all primes below 10^8 (single-threaded), which is way better than the O(n log log n) sieve of Eratosthenes or different versions of the sieve of Atkin.

    import java.io.*;
    import java.util.*;
    import java.math.*;

    public class linear_sieve {
        public static int MAX = 100000000;
        public static int SQRT_MAX = (int)Math.floor(Math.sqrt((double)MAX));
        // x[i] holds the smallest prime factor of i (0 means not yet visited)
        public static int[] x = new int[MAX];
        public static ArrayList<Integer> primes = new ArrayList<Integer>();

        public static void main(String[] args) {
            double start = System.currentTimeMillis();
            long pi = 0;
            for (int i = 2; i < MAX; i++) {
                if (x[i] == 0) {
                    x[i] = i;
                    primes.add(i);
                }
                // Mark i * p only for primes p up to the smallest prime factor of i
                for (int j = 0; j < primes.size() && primes.get(j) <= x[i] && i * primes.get(j) < MAX; j++) {
                    x[i * primes.get(j)] = primes.get(j);
                }
            }
            System.out.println("C(" + MAX + ") = " + primes.size());
            double end = System.currentTimeMillis();
            System.out.println("Time elapsed : " + (end - start) / 1000d + " seconds");
        }
    }

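For comparison with the Julia code in this thread, the same linear-sieve idea can be sketched in Julia as follows. This is an illustrative port, not code from the PR: the name `linear_sieve` and the choice of `Int32` storage are assumptions, and unlike the Java program the bound `n` is inclusive here.

```julia
# Linear sieve sketch: spf[m] is the smallest prime factor of m (0 = not yet set).
# Assumes 0 <= n < 2^31 so that every value fits in an Int32.
function linear_sieve(n::Int)
    spf = zeros(Int32, n)
    primes = Int[]
    for i in 2:n
        if spf[i] == 0          # no smaller factor was recorded, so i is prime
            spf[i] = i
            push!(primes, i)
        end
        for p in primes
            # Stop once p exceeds the smallest prime factor of i,
            # or once i * p falls outside the table.
            (p > spf[i] || i * p > n) && break
            spf[i * p] = p      # each composite is written exactly once
        end
    end
    return primes
end

println(length(linear_sieve(100)))
```

Every composite is marked once, by its smallest prime factor, which is where the O(n) bound comes from. The price is one stored integer per number instead of one bit, which matches the memory concern raised in the comment.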
@ironman353 The Julia |
@ironman353 That optimized sieve of Atkin is really nice and is faster than what I had. Did you come up with it or did you get the algorithm from somewhere else? If it's yours, would you mind if I rewrite the PR based on your code? If it's not, do you know if we can release it under the MIT license? |
@pabloferz, I found the algorithm at http://compoasso.free.fr/primelistweb/page/prime/atkin_en.php |
How can I find what conflicts this PR entails? Would a rebase fix that? |
Rebase on master should fix yes. |
    @@ -407,6 +407,7 @@ export
         prevpow2,
         prevprod,
         primes,
    +    primesmask,
I'm not sure we want to add this function, but if we do it needs documentation.
I was just following the comments in this discussion #11594 (comment)
(Also, I'm aware this would need documentation, I just haven't added it yet)
I have a segmented version of the function If the proposed changes seem good, I can add the corresponding documentation so this can be merged. |
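The segmented version mentioned in this comment is not shown in the thread. As a rough illustration of what segmenting a sieve means, here is a minimal sketch. It is not the author's implementation: `segmented_count` and its `segment_size` keyword are hypothetical, and the default segment size is arbitrary rather than tuned to any cache.

```julia
# Hypothetical sketch: count the primes <= n by sieving one fixed-size segment at a time.
function segmented_count(n::Int; segment_size::Int = 32_768)
    n < 2 && return 0
    limit = isqrt(n)
    # Small ordinary sieve for the base primes up to sqrt(n).
    small = trues(limit)
    small[1] = false
    for p in 2:isqrt(limit)
        if small[p]
            for q in (p * p):p:limit
                small[q] = false
            end
        end
    end
    base_primes = findall(small)

    total = 0
    lo = 2
    while lo <= n
        hi = min(lo + segment_size - 1, n)
        segment = trues(hi - lo + 1)        # segment[i] represents lo + i - 1
        for p in base_primes
            p * p > hi && break
            # First multiple of p inside [lo, hi], but never below p^2.
            start = max(p * p, cld(lo, p) * p)
            for q in start:p:hi
                segment[q - lo + 1] = false
            end
        end
        total += count(segment)
        lo = hi + 1
    end
    return total
end

println(segmented_count(1_000_000))
```

Only the base primes and one segment are live at any time, so the working set stays small regardless of `n`. Choosing the segment size to fit a cache level is the usual motivation for this layout.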
It would be quite nice to provide an API for finding out things like cache sizes. |
👍 to such an API On the subject of using a
|
I was looking at this for a good 30 seconds trying to figure out what the %#!@ was going on before I noticed the different units. I really hate that we don't print times in a standard unit anymore. |
Amen to that. I keep getting annoyed having to mentally parse the units. Before you just compared two numbers. |
There's still some known performance snags for BitArrays involving GC frames, pointer lookup hoists, and user-defined In this case, since BitArrays are within 15%, I'd try rewriting your |
With
This seems like a good compromise between speed and space. If there are plans on improve |
Best to squash out any intermediate commits that would have failed tests. |
4e76ff6 to c82aff1
I believe this PR is ready, so I leave it for you to review it. Just as a remark, the use of a |
Also, this would close #11594, but a lot of the discussion there would have to be moved into at least another issue. |
2a10792 to 844e32f
If I got #11943 right, this should now be using the new doc system and, hopefully, be ready. |
I think so. You can run |
This PR, in my opinion, is ready to be merged, unless we need to wait for #12435 to land first. |
Is |
@tkelman No, it doesn't show up in the RST docs. Where do I put the signature? |
Where in the manual do you want this function to be documented? |
Ok, I tested it and it now generates proper RST docs. |
@StefanKarpinski I'm going to assume you're cool with merging this? |
Honestly, I've been hesitating because I'm not really qualified to vet this code. But it's faster and seems to still produce correct results, so it's all good. |
@pabloferz great, this is very educational! ✨ |
Neither am I, but it was a well-done PR and I'm inclined to trust @pabloferz here. I'm not sure this needs to be in base forever, but that can be revisited down the road as part of #5155 and similar. |
I can try to explain the algorithm and prove its correctness somewhere if you feel that would help. |
That would be great! It doesn't need to be a formal proof, just a convincing explanation (which is really what a proof is, but you know what I mean). |
This PR changes the algorithm used in primesmask from a Sieve of Atkin to a Sieve of Eratosthenes, based on the redesign proposed by @VicDrastik in #11594.

The main ideas used are:
- Most integers are composite, so it is better to set all flags to false initially, then invert for possible primes.
- Except for 2 and 3, all primes are of the form 6k ± 1.
- 30k + [7, 11, 13, 17, 19, 23, 29, 31] (all primes greater than 5 are of this form).
- An internal function _primesmask (not exported) is used to construct primesmask and to find primes.

Additionally, the function primesmask is now exported and both this function and primes are extended in order to generate all the primes between two integers.

As a follow-up, it would be worth trying to write a general segmented version (perhaps a mix between this and the ideas of @ironman353 in #11594 or this http://primesieve.org/segmented_sieve.html).
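To make the first two ideas above concrete, here is a simplified sketch. It is not the PR's _primesmask: `wheel6_mask` and `primes_between` are hypothetical names, and only the 6k ± 1 wheel is used because it is shorter to show than a wheel modulo 30.

```julia
# Hypothetical helper: mask[m] is true iff m is prime, for 1 <= m <= n.
function wheel6_mask(n::Int)
    mask = falses(max(n, 0))            # start with every flag false
    n >= 2 && (mask[2] = true)
    n >= 3 && (mask[3] = true)
    # Beyond 3, only numbers of the form 6k - 1 and 6k + 1 can be prime.
    for m in 5:6:n
        mask[m] = true                          # 6k - 1
        m + 2 <= n && (mask[m + 2] = true)      # 6k + 1
    end
    # Ordinary Eratosthenes pass over the surviving candidates.
    p = 5
    while p * p <= n
        if mask[p]
            for q in (p * p):(2p):n     # odd multiples only; even ones were never set
                mask[q] = false
            end
        end
        p += 2
    end
    return mask
end

# Primes in the closed interval [lo, hi], read off the mask.
function primes_between(lo::Int, hi::Int)
    mask = wheel6_mask(hi)
    return [m for m in max(lo, 1):hi if mask[m]]
end

println(primes_between(10, 50))
```

Here the range query still sieves everything up to `hi`. A version that sieves only the interval between `lo` and `hi` would need just the base primes up to the square root of `hi`, which is the direction the segmented follow-up points to.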
I apologize to @Ismael-VC who was also working on this in #11927 (his approach was to fix the current implementation of the Sieve of Atkin, but this should be even faster as suggested in #11594).
Finally, this PR addresses the initial issue in #11594, but that issue turned into a discussion on how to improve the whole support for primes. Maybe that one should be separated into smaller issues and grouped into a master issue.
CC @danaj @aiorla @StefanKarpinski