NHacker Next
  • new
  • past
  • show
  • ask
  • show
  • jobs
  • submit
Generating All 32-Bit Primes (Part I) (hnlyman.github.io)
susam 1 days ago [-]
I have a little tool called Prime Grid Explorer at https://susam.net/primegrid.html that I wrote for my own amusement. It can display all primes below 3317044064679887385961981 (an 82-bit integer).

The largest three primes it can show are

  3317044064679887385961783
  3317044064679887385961801
  3317044064679887385961813
Visit https://susam.net/primegrid.html#3317044064679887385961781-2... to see them plotted. Click the buttons labelled '·' and 't' to enable the grid and tooltips, then hover over each circle to see its value.

So essentially it can test all 81-bit integers and some 82-bit integers for primality. It does so using the Miller-Rabin primality test with prime bases derived from https://oeis.org/A014233 (OEIS A014233). The algorithm is implemented in about 80 lines of plain JavaScript. If you view the source, look for the function isPrimeByMR.

The Miller-Rabin test is inherently probabilistic. It tests whether a number is a probable prime by checking whether certain number theoretic congruence relations hold for a given base a. The test can yield false positives, that is, a composite number may pass the test. But it cannot have false negatives, so a number that fails the test is definitely composite. The more bases for which the test holds, the more likely it is that the tested number is prime. It has been computationally verified that there are no false positives below 3317044064679887385961981 when tested with prime bases 2, 3, 5, ..., 41. So although the algorithm is probabilistic, it functions as a deterministic test for all numbers below this bound when tested with these 13 bases.

flancian 1 days ago [-]
Very cool, thank you! Both the visualization tool and the description of Miller-Rabin.

I didn't know an algorithm with these properties existed!

Furthermore, your tool gave me a more intuitive feel of the rate at which primes "thin out" than every treatment of the topic I read previously.

susam 18 hours ago [-]
Thank you for writing this. I'm glad my comment was useful.
senfiaj 1 days ago [-]
There is also the segmented Sieve of Eratosthenes. It has a simlar performance but uses much less memory: the number of prime numbers from 2 to sqrt(n). For example, for n = 1000000, the RAM has to store only 168 additional numbers.

I use this algorithm here https://surenenfiajyan.github.io/prime-explorer/

dahart 1 days ago [-]
The Pseudosquares Sieve will drop the memory requirements much further from sqrt(n) to log^2(n). https://link.springer.com/chapter/10.1007/11792086_15
adgjlsfhk1 20 hours ago [-]
IMO that algorithm is barely a sieve. It is technically pareto optimal, but it seems like in practice it would just end up worse than either an optimized segmented sieve (it is ~log(n)^2 slower) or an O(1) memory method (probable prime test followed by a deterministic prime test) depending on how large and how wide the numbers you're testing are.
hnlyman 1 days ago [-]
Yep, this is the natural way to go, especially considering the possibility of parallel computing and the importance of cache locality, etc.
esterna 21 hours ago [-]
Very nice! One small note: In an Eratosthenes sieve, you can start crossing out numbers from p^2, since i*p for i<p will have already been crossed out in step i at the latest. You can simply replace

  for (size_t i=2; i <= 0xFFFFFFFF / p; i++) {
by

  for (size_t i=p; i <= 0xFFFFFFFF / p; i++) {
adgjlsfhk1 20 hours ago [-]
and you can get another 2x improvement by incrementing i by 2

for (size_t i=p; i <= 0xFFFFFFFF / p; i+=2) {

since every other one will be a multiple of 2

csense 1 days ago [-]
I'm pretty sure you can get rid of the 0xFFFFFFFF / p and get some more speedup by manually implementing the bitarray ops. You can get another boost by using BSF instruction [1] to quickly scan for the next set bit. And you really only need to store odd numbers; storing the even numbers is just wasteful.

You can get even more speedup by taking into account cache effects. When you cross out all the multiples of 3 you use 512MB of bandwidth. Then when you cross out all multiples of 5 you use 512MB more. Then 512MB again when you cross out all multiples of 7. The fundamental problem is that you have many partially generated cache-sized chunks and you cycle through them in order with each prime. I'm pretty sure it's faster if you instead fully generate each chunk and then never access it again. So e.g. if your cache is 128k you create a 128k chunk and cross out multiples of 3, 5, 7, etc. for that 128k chunk. Then you do the next 128k chunk again crossing out multiples of 3, 5, 7, etc. That way you only use ~512MB of memory bandwidth in total instead of 512MB per prime number. (Actually it's only really that high for small primes, it starts becoming less once your primes get bigger than the number of bits in a cache line.)

[1] https://en.wikipedia.org/wiki/Find_first_set

mark-r 1 days ago [-]
You can combine the Sieve and Wheel techniques to reduce the memory requirements dramatically. There's no need to use a bit for numbers that you already know can't be prime. You can find a Python implementation at https://stackoverflow.com/a/62919243/5987
tromp 1 days ago [-]
Or a C implementation at https://tromp.github.io/pearls.html#sieve which runs in well under 10s.
hnlyman 1 days ago [-]
I'd be interested in seeing an explanation of the code, since it looks pretty incomprehensible to me. Per the arbitrary rules I set for myself, I'm not allowed to precompute/hardcode the wheel (looks like this implementation uses a hardcoded wheel of size 2x3x5=30). I wonder if/by how much the performance would suffer by computing and storing the coprime remainders in memory instead of handing them directly to the compiler.
tromp 24 hours ago [-]
I wrote this in a semi obfuscated style to make it fit on one screen. It's indeed a hardcoded 2x3x5 wheel; but I suspect computing all those constants would have made the program significantly longer.
forinti 1 days ago [-]
If you take all 53 8 bit primes, you can use modular arithmetic with a residue base to work with numbers up to

64266330917908644872330635228106713310880186591609208114244758680898150367880703152525200743234420230

This would require 334 bits.

ojciecczas 1 days ago [-]
Do you know the https://en.wikipedia.org/wiki/Sieve_of_Atkin? It's mind-blowing.
1 days ago [-]
dahart 1 days ago [-]
This got me through many of the first 100 problems on Project Euler:

    n = 1000000 # must be even
    sieve = [True] * (n/2)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i/2]: sieve[i*i/2::i] = [False] * ((n-i*i-1)/(2*i)+1)
    …
    # x is prime if x%2 and sieve[x/2]
Edit: I guess I irked someone. :/ Yes this is a memory hog, but to me beautiful because it’s so tiny and simple. I never tried very hard, but I wonder if it could be made a real one-liner.
Cold_Miserable 18 hours ago [-]
Heh. 1.Create fast modulus quad M for dword D for the first 2000? 200000? (xM)D 2.Eliminate 0b,101b 3.Divide using vrcp14ss/vdivss with correction. Use fast square root too using rsqrt14.
1 days ago [-]
davispeck 1 days ago [-]
I always like seeing implementations that start from trial division and gradually introduce optimizations like wheel factorization.

It makes the trade-offs much clearer than jumping straight to a complex sieve.

reader9274 1 days ago [-]
Very well written
flysand7 20 hours ago [-]
I enjoyed it too! Enough technical details to be useful, but not too boring.
hnlyman 15 hours ago [-]
Thank you both! I'm glad my writing style appeals to people. This is the first time a substantial number of people have read something I wrote, so it's pretty big for me.
marxisttemp 1 days ago [-]
Why include writing the primes to a file instead of, say, standard output? That increases the optimization space drastically and the IO will eclipse all the careful bitwise math

Does having the primes in a file even allow faster is-prime lookup of a number?

hnlyman 1 days ago [-]
No real reason. It's just an arbitrary task I made for myself. I might have to adjust the goal if writing to the file becomes the lion's share of the runtime, but I'll be pretty happy with myself if that's the project's biggest problem.
logicallee 1 days ago [-]
there are also very fast primality tests that work statistically. It's called Miller-Rabin, I tested in the browser here[1] and it can do them all in about three minutes on my phone.

[1] https://claude.ai/public/artifacts/baa198ed-5a17-4d04-8cef-7...

amelius 1 days ago [-]
What are the false positive/negative rates?
logicallee 1 days ago [-]
for the way this one was done, this witness set has been proven to produce no false positives or negatives for n < 2³⁷.
_alternator_ 1 days ago [-]
Nice. Notably with Miller-Rabin, you can also iterate the test cheaply and get exponentially low false positive/negative rates. I believe that this is how prime factors for RSA keys are usually chosen; choose an error rate below 2^-1000 and sleep extremely soundly knowing that the universe is more likely to evaporate in the next second than that you’ve got a false positive prime.
ZyanWu 1 days ago [-]
> There is a long way to go from here. Kim Walisch's primesieve can generate all 32-bit primes in 0.061s (though this is without writing them to a file)

Oh, come on, just use a bash indirection and be done with it. It takes 1 minute and you had another result for comparison

stainlu 17 hours ago [-]
[flagged]
shablulman 1 days ago [-]
[dead]
useftmly 1 days ago [-]
[dead]
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
Rendered at 20:15:24 GMT+0000 (Coordinated Universal Time) with Vercel.