-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gen_range long-running loop #951
Comments
self.0 = self.0.wrapping_add(u32::max_value() / 3 * 2); You call this an RNG? From your output:
Our sampling algorithms assume that RNGs are reasonably random, and this particular one uses rejection sampling. In case you want to know why this happens, take a look at the implementation: let v: $u_large = rng.gen();
let (hi, lo) = v.wmul(range);
if lo <= zone {
return low.wrapping_add(hi as $ty);
} Yep, we're now multiplying by 3 modulo |
I created this specific RNG for testing with deterministic values. However, it appears I triggered a case that converges very slowly. But even with a true RNG, it seems the sampling algorithm could take an indefinite amount of time to return. Where does this algorithm come from? |
It could take a large amount of time, but the probability for that is extremely small.
|
But your algorithm was created to test what? If distributions can still provide unbiased results given an extremely biased RNG? Not going to work. That distributions can still return results? What does this prove? |
@dhardy Sorry, I think there's some crossed wires here. I didn't create this RNG to test the I created it for a test of my own code. I wanted it to be deterministic to allow me to check the results in my code tests. Initially, the test RNG just returned 1, but that translated to always selecting However, that made the test hang and then timeout. If I waited long enough without a timeout, I expect it would eventually reach a value in the zone. Thanks to your comments, I now have a test RNG that finishes quickly. My residual concern is that it would be valid for a truly random RNG to return values out of the zone for many iterations, so this could be a source of intermittent and irreproducible delays in It does raise the question of whether this optimization is actually faster on average or just in the best case. My initial off-the-cuff calculation is that for my code example there should only be 4 values returned by Maybe |
This requirement is perfectly fine — not only do we allow seeding most PRNGs with a fixed key, we also document portability and changes across versions. But maybe you mean something other than merely deterministic here? It was assumed in the design of these algorithms that the PRNG is apparently random. And, as I'm sure you've found by now, it doesn't take very much complexity to avoid issues of this kind. But we're actually have much higher standards for PRNGs we provide, for example the TestU01 and PractRand test batteries. If you can show that this code has real performance issues with good PRNGs, then we should address that. But if your argument is merely there can be blocking issues with severely biased PRNGs, I don't really see this as an issue, because (a) one must expect that these algorithms may not deliver useful results with such PRNGs, and (b) it is easy to construct a PRNG which exhibits severe blocking behaviour, even if the rejection-zone were optimally small. |
The problem, even when using a good PRNG, is the variability between best-case and worst-case execution time. If generating a random value takes time rand/src/distributions/uniform.rs Lines 445 to 447 in e2112c4
2T . For a range of 3, it takes longer than 2T only 1 time in 4294967296.
The alternative heuristic at rand/src/distributions/uniform.rs Line 451 in e2112c4
$u_large being u32 gives 3 << 30 = 3221225472 . Since gen() has 4294967296 possible values, 25% of values will be out of the zone and will cause the value generation to loop. Overall the heuristic takes time 1.33T on average, but with a large variance. 1.5% of gen_range calls will take 4T or longer.
If the range is 4, then Whether it is a problem in practice is debatable since, for multiple samples, the docs recommend a different approach. But it perhaps bears more investigation, which I see you've considered previously: #661 (comment) Have there been any benchmarks created for this code since then? I couldn't see any, but maybe I am missing a call chain. Would you be happy to look at a PR to benchmark:
|
There are multiple algorithms for generating random integers in a range from an RNG, some of which are exact ("unbiased"), and some of which have better performance profiles than others. Here is a survey of them:
See also Fast Loaded Dice Roller Experiments, which compares several sampling algorithms (but these algorithms are weighted, rather than unweighted which is the focus here). |
@jongiddy yes, please go ahead and make a PR with a comparator. Apparently I underestimated how interested you are in this topic! Regarding benchmarks, it's long enough ago that I can't remember what precisely was used. @pitdicker wrote a lot of the uniform sampling code but has since moved on from this project. I suspect you will need to write some benchmarks yourself. @peteroupc thanks for the sources. I mostly haven't looked further into these on the basis that "we are doing ok already" (#570). |
I believe this was fixed with #985. Please feel free to reopen if it was not. |
Thanks @peteroupc for the mention! I missed it the first time around. |
#985 added benchmarks to better represent the variety of run times for different range sizes. I then looked at some fixes to improve the times. There was no absolute improvement. My changes tended to rebalance the times, somewhat more evenly and consistently, but with some times getting worse. This is reasonable, since, for example, the full range case is ultra-optimized in the existing code, and improving other cases at a cost to it might be a sensible trade-off. Since the current code can't be improved without trade-offs, I agree that this issue can be closed. I will look back over my changes and see if there is a combination that I can submit as a PR for consideration. |
The following code to create a predictable RNG times-out, after repeatedly re-fetching values.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d7da5d2567c957af40788c1af4902b08
Not sure if this has to do with rejection of out-of-range values but for a range of 3 values, I would expect that there is at most 2 values that need to be rejected, so any RNG that returns different values should work reasonably quickly.
The text was updated successfully, but these errors were encountered: