gen_range long-running loop #951

jongiddy · 2020-03-16T14:21:26Z

The following code to create a predictable RNG times-out, after repeatedly re-fetching values.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d7da5d2567c957af40788c1af4902b08

Not sure if this has to do with rejection of out-of-range values but for a range of 3 values, I would expect that there is at most 2 values that need to be rejected, so any RNG that returns different values should work reasonably quickly.

dhardy · 2020-03-16T14:49:21Z

self.0 = self.0.wrapping_add(u32::max_value() / 3 * 2);

You call this an RNG? From your output:

[src/main.rs:9] self.0 = 2863311530
[src/main.rs:9] self.0 = 1431655764
[src/main.rs:9] self.0 = 4294967294
[src/main.rs:9] self.0 = 2863311528
[src/main.rs:9] self.0 = 1431655762
[src/main.rs:9] self.0 = 4294967292
[src/main.rs:9] self.0 = 2863311526
[src/main.rs:9] self.0 = 1431655760

Our sampling algorithms assume that RNGs are reasonably random, and this particular one uses rejection sampling.

In case you want to know why this happens, take a look at the implementation:

                    let v: $u_large = rng.gen();
                    let (hi, lo) = v.wmul(range);
                    if lo <= zone {
                        return low.wrapping_add(hi as $ty);
                    }

Yep, we're now multiplying by 3 modulo u32::MAX. Your algorithm adds approximately 2/3 modulo u32::MAX. Thus the lo part here barely changes.

jongiddy · 2020-03-16T21:51:08Z

I created this specific RNG for testing with deterministic values. However, it appears I triggered a case that converges very slowly. But even with a true RNG, it seems the sampling algorithm could take an indefinite amount of time to return. Where does this algorithm come from?

vks · 2020-03-17T01:29:17Z

But even with a true RNG, it seems the sampling algorithm could take an indefinite amount of time to return.

It could take a large amount of time, but the probability for that is extremely small.

Where does this algorithm come from?

https://en.wikipedia.org/wiki/Rejection_sampling

dhardy · 2020-03-17T09:06:27Z

But your algorithm was created to test what? If distributions can still provide unbiased results given an extremely biased RNG? Not going to work. That distributions can still return results? What does this prove?

jongiddy · 2020-03-17T11:23:54Z

@dhardy Sorry, I think there's some crossed wires here. I didn't create this RNG to test the rand library or how it handles bias.

I created it for a test of my own code. I wanted it to be deterministic to allow me to check the results in my code tests. Initially, the test RNG just returned 1, but that translated to always selecting low, so multiplying by 1/3 of the word size was an attempt to get variety while still being deterministic.

However, that made the test hang and then timeout. If I waited long enough without a timeout, I expect it would eventually reach a value in the zone. Thanks to your comments, I now have a test RNG that finishes quickly.

My residual concern is that it would be valid for a truly random RNG to return values out of the zone for many iterations, so this could be a source of intermittent and irreproducible delays in rand-using code. It's apparent that a trade-off has been made here, favoring speed for the best (and common) case over the rare case that has a significant delay.

It does raise the question of whether this optimization is actually faster on average or just in the best case. My initial off-the-cuff calculation is that for my code example there should only be 4 values returned by gen() in the reject list. Instead, the approximation appears to make that list much larger.

Maybe gen_range should have a comment in the docs along the lines of "The rejection sampling may lead to delays. There are possible but rare sequences that may cause significant delays."

dhardy · 2020-03-17T16:35:32Z

I wanted it to be deterministic to allow me to check the results in my code tests.

This requirement is perfectly fine — not only do we allow seeding most PRNGs with a fixed key, we also document portability and changes across versions. But maybe you mean something other than merely deterministic here?

It was assumed in the design of these algorithms that the PRNG is apparently random. And, as I'm sure you've found by now, it doesn't take very much complexity to avoid issues of this kind. But we're actually have much higher standards for PRNGs we provide, for example the TestU01 and PractRand test batteries.

If you can show that this code has real performance issues with good PRNGs, then we should address that. But if your argument is merely there can be blocking issues with severely biased PRNGs, I don't really see this as an issue, because (a) one must expect that these algorithms may not deliver useful results with such PRNGs, and (b) it is easy to construct a PRNG which exhibits severe blocking behaviour, even if the rejection-zone were optimally small.

jongiddy · 2020-03-20T06:51:18Z

The problem, even when using a good PRNG, is the variability between best-case and worst-case execution time.

If generating a random value takes time T and, for simplicity, if the calculation u32::MAX - ((u32::MAX - range + 1) % range) also takes time T, then using

rand/src/distributions/uniform.rs

Lines 445 to 447 in e2112c4

    
           let unsigned_max: $u_large = ::core::$u_large::MAX; 
        
           let ints_to_reject = (unsigned_max - range + 1) % range; 
        
           unsigned_max - ints_to_reject

almost always takes time 2T. For a range of 3, it takes longer than 2T only 1 time in 4294967296.

The alternative heuristic at

rand/src/distributions/uniform.rs

Line 451 in e2112c4

(range << range.leading_zeros()).wrapping_sub(1)

with $u_large being u32 gives 3 << 30 = 3221225472. Since gen() has 4294967296 possible values, 25% of values will be out of the zone and will cause the value generation to loop. Overall the heuristic takes time 1.33T on average, but with a large variance. 1.5% of gen_range calls will take 4T or longer.

If the range is 4, then 4 << 29 = 2147483648 and 50% of the generated values are rejected, which seems particularly bad when you realise that this case could have zero rejections. This takes time 2T on average, but 12.5% of calls will take 4T or longer.

Whether it is a problem in practice is debatable since, for multiple samples, the docs recommend a different approach. But it perhaps bears more investigation, which I see you've considered previously: #661 (comment)

Have there been any benchmarks created for this code since then? I couldn't see any, but maybe I am missing a call chain.

Would you be happy to look at a PR to benchmark:

10..01 - almost 50% rejection rate, no opportunity to optimize
(any power of two) - currently 50% rejection rate, could be 0%
0..011 - currently 25% rejection rate, could be almost 0%

peteroupc · 2020-03-20T07:18:39Z

There are multiple algorithms for generating random integers in a range from an RNG, some of which are exact ("unbiased"), and some of which have better performance profiles than others. Here is a survey of them:

O'Neill, M.E., "Efficiently Generating a Number in a Range", Jul. 22, 2018.
Lumbroso, J. (@jlumbroso), "Optimal Discrete Uniform Generation from Coin Flips, and Applications", arXiv:1304.1916 [cs.DS], 2013.
Saad et al., "The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions", 2020 (reference code).

See also Fast Loaded Dice Roller Experiments, which compares several sampling algorithms (but these algorithms are weighted, rather than unweighted which is the focus here).

dhardy · 2020-03-20T08:56:49Z

@jongiddy yes, please go ahead and make a PR with a comparator. Apparently I underestimated how interested you are in this topic!

Regarding benchmarks, it's long enough ago that I can't remember what precisely was used. @pitdicker wrote a lot of the uniform sampling code but has since moved on from this project. I suspect you will need to write some benchmarks yourself.

@peteroupc thanks for the sources. I mostly haven't looked further into these on the basis that "we are doing ok already" (#570).

vks · 2021-05-07T21:05:31Z

I believe this was fixed with #985. Please feel free to reopen if it was not.

jlumbroso · 2021-05-07T22:39:14Z

Thanks @peteroupc for the mention! I missed it the first time around.

jongiddy · 2021-05-08T06:19:57Z

#985 added benchmarks to better represent the variety of run times for different range sizes. I then looked at some fixes to improve the times. There was no absolute improvement. My changes tended to rebalance the times, somewhat more evenly and consistently, but with some times getting worse. This is reasonable, since, for example, the full range case is ultra-optimized in the existing code, and improving other cases at a cost to it might be a sensible trade-off.

Since the current code can't be improved without trade-offs, I agree that this issue can be closed. I will look back over my changes and see if there is a combination that I can submit as a PR for consideration.

dhardy closed this as completed Mar 16, 2020

dhardy reopened this Mar 20, 2020

jongiddy mentioned this issue Jun 9, 2020

Add a benchmark for 32-bit UniformInt sample_single #985

Merged

vks closed this as completed May 7, 2021

dhardy mentioned this issue Jul 16, 2021

Rng::gen_range() is slowest with power-of-2 input sizes? #1145

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gen_range long-running loop #951

gen_range long-running loop #951

jongiddy commented Mar 16, 2020

dhardy commented Mar 16, 2020

jongiddy commented Mar 16, 2020

vks commented Mar 17, 2020

dhardy commented Mar 17, 2020

jongiddy commented Mar 17, 2020

dhardy commented Mar 17, 2020

jongiddy commented Mar 20, 2020

peteroupc commented Mar 20, 2020

dhardy commented Mar 20, 2020

vks commented May 7, 2021

jlumbroso commented May 7, 2021

jongiddy commented May 8, 2021

gen_range long-running loop #951

gen_range long-running loop #951

Comments

jongiddy commented Mar 16, 2020

dhardy commented Mar 16, 2020

jongiddy commented Mar 16, 2020

vks commented Mar 17, 2020

dhardy commented Mar 17, 2020

jongiddy commented Mar 17, 2020

dhardy commented Mar 17, 2020

jongiddy commented Mar 20, 2020

peteroupc commented Mar 20, 2020

dhardy commented Mar 20, 2020

vks commented May 7, 2021

jlumbroso commented May 7, 2021

jongiddy commented May 8, 2021