xoroshiro128+ seeding problems, was: The first call to theft_random_choice(t, 4)
returns 2 most of the time.
#39
Comments
Thanks! This is a very helpful bug report. I expect to fix this for the v0.5.0 release (which has been taking quite a while, due to the multicore stuff in #16). I have a pretty good idea what the root cause is.
Further investigation suggests that it is specific to the new PRNG, when the limit is a power of 2. The PRNG theft used until a recent change on develop (mt64) doesn't exhibit this problem. What I don't understand yet is why my existing test for bias in the distribution didn't catch the issue. Thanks for reporting this.
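To illustrate why a power-of-two limit is special: if a bounded draw is reduced with a plain modulo, a power-of-two limit keeps only the lowest bits of the raw value, so any weakness in those bits shows up directly in the result. The sketch below uses hypothetical helpers (`is_power_of_two`, `choice_from_raw` and `low_bits_mask` are illustrative names); it is not theft's actual `theft_random_choice` implementation, which may consume bits differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when limit is a power of two (1, 2, 4, 8, ...). */
static bool is_power_of_two(uint64_t limit) {
    return limit != 0 && (limit & (limit - 1)) == 0;
}

/* Hypothetical helper, NOT theft's implementation: reduce a raw
 * 64-bit draw to the range [0, limit) with a plain modulo. */
static uint64_t choice_from_raw(uint64_t raw, uint64_t limit) {
    assert(limit > 0);
    return raw % limit;
}

/* For a power-of-two limit, `raw % limit` equals `raw & (limit - 1)`:
 * only the low log2(limit) bits of the draw influence the result. */
static uint64_t low_bits_mask(uint64_t limit) {
    assert(is_power_of_two(limit));
    return limit - 1;
}
```

With a limit of 4, two raw values that differ only in their upper 62 bits give the same choice; with a limit of 5, the upper bits also contribute, which may be part of why non-power-of-two limits looked healthy in the report.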
Further testing shows that the problem only manifests for the first few values. If I discard the first 30 calls to I suspect that the PRNG bit pool is not reset properly between runs. From what I've seen of your tests, this wouldn't show up because you only test for consistency within a single run, and the first values would be statistically insignificant against the whole run.
I suspect the reason your test caught this issue and my existing one didn't is that yours was using a freshly seeded PRNG each time, whereas mine was seeding once and continuing to draw -- I think the xoroshiro128+ PRNG added in #29 has less variety in the lower bits, particularly immediately after seeding. This is less of a problem when continuing to draw, because theft will buffer the fully generated (Sorry: I came to the same conclusion, but forgot to send the above comment because of an appointment.) The way theft's shrinking is implemented, it needs to re-seed the PRNG for each new trial, so having too much bias in the starting context is a problem. Discarding the first 32 bits when re-seeding xoroshiro128+ appears to work, but that's kind of a hack, and I'm going to think about whether I want to depend on it. Either way -- thanks for bringing this to my attention, I'll get it resolved before 0.5.0.
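The difference between "seed once and keep drawing" and "re-seed for every trial" can be sketched as follows. The step function uses the 2016 xoroshiro128+ constants (55, 14, 36). Everything else is an assumption made for illustration: the seeding scheme, the `discard` parameter (counted in whole 64-bit outputs) and the `first_draw_histogram` helper are hypothetical and are not taken from theft's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xoroshiro128plus {
    uint64_t s[2];
};

static uint64_t rotl64(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

/* One xoroshiro128+ step (2016 constants: 55, 14, 36). */
static uint64_t xoroshiro128plus_next(struct xoroshiro128plus *g) {
    const uint64_t s0 = g->s[0];
    uint64_t s1 = g->s[1];
    const uint64_t result = s0 + s1;
    s1 ^= s0;
    g->s[0] = rotl64(s0, 55) ^ s1 ^ (s1 << 14);
    g->s[1] = rotl64(s1, 36);
    return result;
}

/* ASSUMED naive seeding (not theft's): the seed goes into one state
 * word, a fixed nonzero constant into the other, so the state is never
 * all zero. `discard` outputs are then drawn and thrown away. */
static void xoroshiro128plus_seed(struct xoroshiro128plus *g,
                                  uint64_t seed, unsigned discard) {
    assert(g != NULL);
    g->s[0] = seed;
    g->s[1] = UINT64_C(0x9E3779B97F4A7C15);
    for (unsigned i = 0; i < discard; i++) {
        (void)xoroshiro128plus_next(g);
    }
}

/* Re-seed for every trial, as a per-trial caller would, and tally the
 * low two bits of the first output kept after the discard. */
static void first_draw_histogram(uint64_t first_seed, size_t trials,
                                 unsigned discard, size_t counts[4]) {
    for (size_t i = 0; i < 4; i++) {
        counts[i] = 0;
    }
    for (size_t t = 0; t < trials; t++) {
        struct xoroshiro128plus g;
        xoroshiro128plus_seed(&g, first_seed + t, discard);
        counts[xoroshiro128plus_next(&g) & 3]++;
    }
}
```

Comparing the histogram for `discard = 0` against a larger discard count, with seeds drawn the way the real caller draws them, is one way to check whether early outputs are skewed for a given seeding scheme.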
Based on this article by Daniel Lemire and some local testing, I think his stateless splitmix64 may be a better choice for use in theft. It appears to be roughly on par with xoroshiro128+ in speed (both take roughly half as long to finish theft's prng test suite as MT64), but it does significantly better with its tests for statistical bias in early draws, and it's also in the public domain / CC0 1.0.
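For reference, a minimal sketch of splitmix64 in both forms: a "stateless" function that mixes a caller-supplied index, and a thin wrapper that advances an externally owned state. The constants are the standard splitmix64 ones; the function names are illustrative, and this is not necessarily how theft would integrate it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPLITMIX64_GAMMA UINT64_C(0x9E3779B97F4A7C15)

/* "Stateless" form: the output depends only on the index passed in. */
static uint64_t splitmix64_stateless(uint64_t index) {
    uint64_t z = index + SPLITMIX64_GAMMA;
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Stateful form: the caller owns a 64-bit state, which advances by a
 * fixed increment on every call. */
static uint64_t splitmix64_next(uint64_t *state) {
    assert(state != NULL);
    const uint64_t out = splitmix64_stateless(*state);
    *state += SPLITMIX64_GAMMA;
    return out;
}
```

Because the whole state is a single counter that the caller passes in, re-seeding for each trial amounts to choosing a new starting index.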
Note that when seeding a PRNG with N bits of state, it is usually considered good practice to discard at least the first N bits of output in order to avoid bias introduced by the seeding procedure. I would probably avoid stateless generators; I suspect that they have much poorer statistical qualities (in particular in terms of generating independent sequences by using different seeds). "Stateless" is a poor choice of term to describe them btw: they're just generators with a small state that is externally managed.
Right, it's only "stateless" in that it is using a passed-in state rather than mutating a global or static state. I haven't decided whether I should revert to MT64 (its only downside for my purposes is that it's a bit slower) or switch to splitmix64. I'll continue testing both. I don't expect the next release to be done for a bit still because of a massive restructuring for the multi-core support.
I've run some tests using the dieharder test suite on 4 random number generators: Mersenne Twister, Xoroshiro128+, Splitmix64 and Combined Tausworthe. I've run the following tests:
In conclusion, I'd say avoid Splitmix64. If the slower speed is acceptable, Mersenne Twister is a possible choice. Otherwise, Combined Tausworthe is good. Xoroshiro128+ is also acceptable provided that you draw and discard the first few values after reseeding.
Thanks again for this analysis. I'm still working on the changes for multicore support, but this will factor into the next release.
I've read quite a bit more about PRNGs, and feel like I'm almost approaching an informed decision now. Blackman and Vigna have deprecated The entire PRNG setup will be somewhat different in
theft_random_choice(t, 4) returns 2 most of the time.
How to reproduce
Run the following code:
Expected results
1/4 of the tests should fail and 3/4 should succeed. Some might be reported as duplicates (I don't know how theft determines duplicates exactly).
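One way to turn "1/4 of the draws should land on each value" into a checkable number is Pearson's chi-square statistic against a uniform expectation. The sketch below is a generic helper (`chi_square_uniform` is an illustrative name, not part of theft); the threshold used to decide that a distribution is biased is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Pearson's chi-square statistic for observed bucket counts against a
 * uniform expectation. 0 means a perfectly even spread; larger values
 * mean the counts deviate more from uniform. */
static double chi_square_uniform(const size_t *counts, size_t buckets) {
    assert(counts != NULL && buckets > 0);
    size_t total = 0;
    for (size_t i = 0; i < buckets; i++) {
        total += counts[i];
    }
    assert(total > 0);
    const double expected = (double)total / (double)buckets;
    double chi = 0.0;
    for (size_t i = 0; i < buckets; i++) {
        const double diff = (double)counts[i] - expected;
        chi += diff * diff / expected;
    }
    return chi;
}
```

For 1000 draws over 4 values, an even 250/250/250/250 split gives 0, while all 1000 draws landing on a single value gives 3000.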
Actual results
Most of the time, theft reports 1 failure and 999 duplicates. Sometimes (rarely) there is 1 success. Example run:
Notes
I get the expected result for theft_random_choice (t, 5):
(skip several failure reports)
A quick check with values from 2 to 20 shows funny results for all powers of two (see attached source):
theft-test.zip
Version information
Running on Linux with gcc version 4.8.2:
Current theft develop head: 0619c2a