
Improve Random (performance, APIs, ...) #47085

Merged — 5 commits merged into dotnet:master from betterrandom on Jan 21, 2021

Conversation

Member

@stephentoub commented Jan 17, 2021

  • Changes the algorithm used by new Random() to be one that's smaller and faster
  • Refactors the implementation to make it internally pluggable, and lets us more easily iterate on the algorithm used by new Random() in the future
  • Moves the existing implementation to be used when it's necessary for back compat
  • Adds NextInt64 and NextSingle methods

Fixes #26741
Fixes #23198
Fixes #41457
cc: @tannergooding, @danmosemsft, @colgreen

Based on looking at lots of Random usage, new Random() is by far the most common way Random is used in real apps. A seed is generally only specified when either a) it's test code and a repeatable sequence is desired, or b) someone is trying to work around the poor default seed quality we had in .NET Framework, where the seed was based on Environment.TickCount and thus lots of Randoms created in the same quantum would end up with the same seed. There's also a smattering of derived types, often added for thread safety, with each override locking around a call to the base or some such thing. As such, I focused on optimizing new Random() (where no guarantees are made about the sequence or what methods may or may not delegate to other methods) and moved the existing implementation to be used only if a seed is provided or if a derived type is being instantiated.
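
To make that split concrete, here is a minimal sketch of what such an internally pluggable design can look like. All names here (PluggableRandom, FastImpl, LegacyImpl) are illustrative, not the PR's actual internal types:

public class PluggableRandom
{
    private readonly ImplBase _impl;

    // Unseeded, non-derived instances get the new, faster algorithm; derived
    // types keep the legacy sequence (and its delegation behavior) for back compat.
    public PluggableRandom() =>
        _impl = GetType() == typeof(PluggableRandom)
            ? new FastImpl()
            : (ImplBase)new LegacyImpl();

    // A user-supplied seed also routes to the legacy, repeatable sequence.
    public PluggableRandom(int seed) => _impl = new LegacyImpl(seed);

    public virtual int Next() => _impl.Next();
    // ... NextInt64, NextSingle, NextDouble, NextBytes, etc. forward the same way.

    private abstract class ImplBase
    {
        public abstract int Next();
    }

    private sealed class FastImpl : ImplBase // e.g. the new algorithm
    {
        public override int Next() => 0; // placeholder
    }

    private sealed class LegacyImpl : ImplBase // the existing generator
    {
        public LegacyImpl() { }
        public LegacyImpl(int seed) { /* seed the legacy state */ }
        public override int Next() => 0; // placeholder
    }
}

The new NextInt64 and NextSingle APIs then slot into the same forwarding pattern, so each implementation can supply the most efficient version available to it.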

| Method | Toolchain | Mean | Error | StdDev | Ratio | Allocated |
|---------------|-----------|-------------:|-----------:|-----------:|------:|----------:|
| Ctor | \master | 1,475.427 ns | 13.3713 ns | 25.7619 ns | 1.00 | 280 B |
| Ctor | \pr | 141.151 ns | 1.5279 ns | 1.4292 ns | 0.09 | 72 B |
| Next | \master | 7.604 ns | 0.0361 ns | 0.0302 ns | 1.00 | - |
| Next | \pr | 3.252 ns | 0.0405 ns | 0.0339 ns | 0.43 | - |
| NextInt | \master | 9.643 ns | 0.0452 ns | 0.0377 ns | 1.00 | - |
| NextInt | \pr | 10.746 ns | 0.0511 ns | 0.0478 ns | 1.11 | - |
| NextIntInt | \master | 9.910 ns | 0.1180 ns | 0.1046 ns | 1.00 | - |
| NextIntInt | \pr | 10.298 ns | 0.2229 ns | 0.2085 ns | 1.04 | - |
| NextBytes | \master | 7,212.814 ns | 17.1527 ns | 15.2054 ns | 1.00 | - |
| NextBytes | \pr | 292.819 ns | 0.9567 ns | 0.7989 ns | 0.04 | - |
| NextBytesSpan | \master | 7,784.761 ns | 28.5982 ns | 25.3515 ns | 1.00 | - |
| NextBytesSpan | \pr | 296.508 ns | 2.8035 ns | 2.4852 ns | 0.04 | - |
| NextDouble | \master | 8.260 ns | 0.0251 ns | 0.0235 ns | 1.00 | - |
| NextDouble | \pr | 3.487 ns | 0.0244 ns | 0.0216 ns | 0.42 | - |

A few open issues:

  • I want to add more tests; I added some, but our coverage of Random today isn't great.
  • I need to run perf tests on Linux (the above is Windows) and on 32-bit (the above is 64-bit).
  • There's probably a better way to implement NextInt64 in the legacy generator, but I don't know what: suggestions welcome.
  • @colgreen, after working on this I looked at your repo and see you've invested a lot of time and thought into a lot of this stuff. I'd welcome either a) your thoughts on improvements to make in this PR, or b) a follow-up PR from you adding improvements on top of this. Also, I ended up using a similar strategy to the one you used with Log2, but I expect that's contributing to the slight regression in those overloads; it'd be good to think of possible alternatives, though given the wins elsewhere, I'm not terribly concerned.

Related to #46890
cc: @tarekgh, @noahfalk

@dotnet-issue-labeler

Note regarding the new-api-needs-documentation label:

This serves as a reminder that when your PR modifies a ref *.cs file and adds/modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.

@danmoseley
Member

Would it be clearer to break out the two impls into their own files?

@danmoseley
Member

With respect to the algorithm, it looks like we didn't really do a "survey" of the PRNG options (as we did, e.g., for hashing algorithms a while ago). Since we aren't implementing a seeded API here, we can always change the algorithm later, so perhaps that doesn't matter very much, as long as we're reasonably confident it's not "worse"; folks seem to mostly want more performance and better randomness.

Looking around at other stacks and utility libraries like Boost, there doesn't seem to be convergence on anything, and where there is, it's not clear whether it's particularly fast. (Or their limitations are accepted; e.g., the java.util.Random.nextLong documentation says "this algorithm will not return all possible long values".) I was a bit perturbed by the link in the original issue to a criticism of xoshiro256** by another PRNG purveyor, but on her site the discussion of it seems to conclude its flaws are unlikely to be easily discoverable. So I guess that algorithm seems good enough for me.

@colgreen
Contributor

With respect to algorithm, it looks like we didn't really do a "survey" of the PRNG options (as we did eg for hashing algorithms a while ago).

There are some standard PRNG test frameworks that run their own 'battery of tests' against lots of PRNGs, and which publish results, e.g.

https://github.com/lemire/testingRNG
https://rurban.github.io/dieharder/QUALITY.html

and possibly (related to hash functions, but a PRNG is essentially serial application of a hash function):

https://github.com/rurban/smhasher

I've been suggesting folks use xoshiro256** for a while now, but based on those results I do wonder if wyrand would be a better choice. I don't think there's a problem with using xoshiro256**, to be honest (the tests are quite exhaustive!), but since wyrand is available, appears to be faster, and passes more of those tests, maybe it's worth a little effort to look into it further.

@danmoseley
Member

xoshiro256** implementation is super simple so unless we discover a significant flaw I think we should go ahead and use it and lock in the sweet perf gain. Then we can always change later.

@colgreen
Contributor

xoshiro256** implementation is super simple so unless we discover a significant flaw I think we should go ahead and use it and lock in the sweet perf gain. Then we can always change later.

I am cool with this.
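
For reference, the xoshiro256** core really is tiny. Here's a C# transliteration of the public-domain reference algorithm by Blackman and Vigna, shown purely as an illustration (the PR's exact code may differ):

using System.Numerics;

public sealed class Xoshiro256StarStar
{
    // Four 64-bit words of state; must be seeded to anything but all zeros.
    private ulong _s0, _s1, _s2, _s3;

    public Xoshiro256StarStar(ulong s0, ulong s1, ulong s2, ulong s3)
        => (_s0, _s1, _s2, _s3) = (s0, s1, s2, s3);

    public ulong NextUInt64()
    {
        ulong result = BitOperations.RotateLeft(_s1 * 5, 7) * 9;
        ulong t = _s1 << 17;

        _s2 ^= _s0;
        _s3 ^= _s1;
        _s1 ^= _s2;
        _s0 ^= _s3;
        _s2 ^= t;
        _s3 = BitOperations.RotateLeft(_s3, 45);

        return result;
    }
}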

@stephentoub force-pushed the betterrandom branch 2 times, most recently from 76d70ae to 5b69e72, on January 18, 2021 03:51
@adamsitnik added the tenet-performance (Performance related issue) label Jan 18, 2021
@colgreen
Contributor

colgreen commented Jan 18, 2021

I believe the rejection sampling loops aren't quite right. Or rather, they can be tightened up to run fewer iterations.

The issue relates to the calculation of bits:

int bits = BitOperations.Log2((uint)maxValue) + 1;

As an example, let's take maxValue = 8. We want to sample from the set {0, 1, 2, 3, 4, 5, 6, 7}, noting that maxValue itself is an exclusive bound, so we have maxValue possible discrete samples.

As 8 is a power of two (2^3 == 8), we ought to be generating 3 bits and just returning without having to loop a second time. However, the bits expression above gives a result of 4. I.e. Log2(8) = 3, and then we add one.

I believe the correct/better way is to use a method such as:

/// <summary>
/// Evaluate the binary logarithm of a non-zero UInt32, with rounding up of fractional results.
/// I.e. returns the exponent of the smallest power of two that is greater than or equal to the specified number.
/// </summary>
/// <param name="x">The input value.</param>
/// <returns>The exponent of the smallest integral power of two that is greater than or equal to x.</returns>
public static int Log2Ceiling(uint x)
{
    // Log2(x) gives the required power of two, however this is integer Log2() therefore any fractional
    // part in the result is truncated, i.e., the result may be 1 too low. To compensate we add 1 if x
    // is not an exact power of two.
    int exp = BitOperations.Log2(x);

    // Return (exp + 1) if x is non-zero, and not an exact power of two.
    return BitOperations.PopCount(x) > 1 ? exp + 1 : exp;
}

(from https://github.com/colgreen/Redzen/blob/master/Redzen/MathUtils.cs ... and recently fixed, so just be aware if you have an out-of-date local copy)

Now you can do:

int bits = Log2Ceiling((uint)maxValue);

which more than halves the average number of rejection-sampling iterations when maxValue is a power of two.
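
Putting the two pieces together, here's a hypothetical sketch of the tightened rejection-sampling loop; nextUInt32 is a stand-in delegate for the generator's raw 32-bit output, not an actual API:

using System;

static uint SampleBounded(Func<uint> nextUInt32, uint maxValue) // maxValue > 0, exclusive
{
    int bits = Log2Ceiling(maxValue);
    uint mask = bits == 32 ? uint.MaxValue : (1u << bits) - 1;

    while (true)
    {
        // Uniform over [0, 2^bits); when maxValue is a power of two the
        // candidate is always in range, so the loop runs exactly once.
        uint candidate = nextUInt32() & mask;
        if (candidate < maxValue)
            return candidate;
    }
}

With the old bits = Log2(maxValue) + 1, a power-of-two maxValue accepted only half of the candidates, for an expected two iterations per sample.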

@stephentoub
Member Author

I believe the rejection sampling loops aren't quite right. Or rather, they can be tightened up to run fewer iterations.

Specifically for powers of two, yes, the bound could be tighter.

@GrabYourPitchforks
Member

@stephentoub @tannergooding How does this PR interact with #23198 and #41457? Does it resolve those issues?

@stephentoub
Member Author

From my perspective it resolves both. (With the caveat that those issues still exist for a type derived from Random or constructing a Random with a specific seed.)

@tarekgh mentioned this pull request Jan 19, 2021
@stephentoub
Member Author

Tweaked things a bit further and added a 32-bit implementation. It’s now looking pretty reasonable on Windows. Still need to measure on Linux.

64-bit:

| Method | Toolchain | Mean | Ratio | Allocated |
|---------------|---------------------|-------------:|------:|----------:|
| Ctor | master\corerun.exe | 1,454.062 ns | 1.00 | 280 B |
| Ctor | pr\corerun.exe | 112.203 ns | 0.08 | 72 B |
| Next | master\corerun.exe | 7.554 ns | 1.00 | - |
| Next | pr\corerun.exe | 2.336 ns | 0.31 | - |
| NextInt | master\corerun.exe | 9.503 ns | 1.00 | - |
| NextInt | pr\corerun.exe | 8.895 ns | 0.94 | - |
| NextIntInt | master\corerun.exe | 9.924 ns | 1.00 | - |
| NextIntInt | pr\corerun.exe | 9.197 ns | 0.93 | - |
| NextBytes | master\corerun.exe | 7,181.847 ns | 1.00 | - |
| NextBytes | pr\corerun.exe | 259.274 ns | 0.04 | - |
| NextBytesSpan | master\corerun.exe | 7,792.237 ns | 1.00 | - |
| NextBytesSpan | pr\corerun.exe | 257.490 ns | 0.03 | - |
| NextDouble | master\corerun.exe | 8.372 ns | 1.00 | - |
| NextDouble | pr\corerun.exe | 2.664 ns | 0.32 | - |

32-bit:

| Method | Toolchain | Mean | Ratio | Allocated |
|---------------|-------------------------|-------------:|------:|----------:|
| Ctor | master_x86\corerun.exe | 1,680.583 ns | 1.00 | 256 B |
| Ctor | pr_x86\corerun.exe | 259.524 ns | 0.15 | 36 B |
| Next | master_x86\corerun.exe | 8.609 ns | 1.00 | - |
| Next | pr_x86\corerun.exe | 4.692 ns | 0.55 | - |
| NextInt | master_x86\corerun.exe | 11.472 ns | 1.00 | - |
| NextInt | pr_x86\corerun.exe | 10.833 ns | 0.94 | - |
| NextIntInt | master_x86\corerun.exe | 14.647 ns | 1.00 | - |
| NextIntInt | pr_x86\corerun.exe | 13.646 ns | 0.93 | - |
| NextBytes | master_x86\corerun.exe | 7,952.158 ns | 1.00 | - |
| NextBytes | pr_x86\corerun.exe | 739.291 ns | 0.09 | - |
| NextBytesSpan | master_x86\corerun.exe | 8,522.058 ns | 1.00 | - |
| NextBytesSpan | pr_x86\corerun.exe | 731.178 ns | 0.09 | - |
| NextDouble | master_x86\corerun.exe | 9.083 ns | 1.00 | - |
| NextDouble | pr_x86\corerun.exe | 7.972 ns | 0.88 | - |
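
For the 32-bit case, a natural counterpart is xoshiro128**, which mirrors the 64-bit core with 32-bit state words. (That the PR's 32-bit implementation is xoshiro128**-based is my assumption; the sketch below is the public-domain reference algorithm, not necessarily the PR's code.)

using System.Numerics;

public sealed class Xoshiro128StarStar
{
    // Four 32-bit words of state; must be seeded to anything but all zeros.
    private uint _s0, _s1, _s2, _s3;

    public Xoshiro128StarStar(uint s0, uint s1, uint s2, uint s3)
        => (_s0, _s1, _s2, _s3) = (s0, s1, s2, s3);

    public uint NextUInt32()
    {
        uint result = BitOperations.RotateLeft(_s1 * 5, 7) * 9;
        uint t = _s1 << 9;

        _s2 ^= _s0;
        _s3 ^= _s1;
        _s1 ^= _s2;
        _s0 ^= _s3;
        _s2 ^= t;
        _s3 = BitOperations.RotateLeft(_s3, 11);

        return result;
    }
}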

- Changes the algorithm used by `new Random()` to be one that's smaller and faster and produces better results
- Refactors the implementation to make it internally pluggable
- Moves the existing implementation to be used when it's necessary for back compat
- Adds NextInt64 and NextSingle methods
@stephentoub
Member Author

Linux 64-bit:

| Method | Toolchain | Mean | Ratio | Allocated |
|---------------|------------------|-------------:|------:|----------:|
| Ctor | /master/corerun | 1,510.009 ns | 1.00 | 280 B |
| Ctor | /pr/corerun | 783.730 ns | 0.52 | 72 B |
| Next | /master/corerun | 7.142 ns | 1.00 | - |
| Next | /pr/corerun | 2.912 ns | 0.41 | - |
| NextInt | /master/corerun | 9.890 ns | 1.00 | - |
| NextInt | /pr/corerun | 9.859 ns | 1.00 | - |
| NextIntInt | /master/corerun | 10.476 ns | 1.00 | - |
| NextIntInt | /pr/corerun | 9.471 ns | 0.90 | - |
| NextBytes | /master/corerun | 7,045.031 ns | 1.00 | - |
| NextBytes | /pr/corerun | 260.487 ns | 0.04 | - |
| NextBytesSpan | /master/corerun | 7,288.855 ns | 1.00 | - |
| NextBytesSpan | /pr/corerun | 263.415 ns | 0.04 | - |
| NextDouble | /master/corerun | 7.806 ns | 1.00 | - |
| NextDouble | /pr/corerun | 2.791 ns | 0.36 | - |

@danmoseley
Member

Just curious: why are the array-filler overloads so much faster on 32-bit vs. the old impl, when the Next()-like methods are slower? I don't see it.

| Method | Toolchain | Mean | Ratio | Allocated |
|---------------|-------------------------|-------------:|------:|----------:|
| Next | master_x86\corerun.exe | 8.975 ns | 1.00 | - |
| Next | pr_x86\corerun.exe | 13.833 ns | 1.54 | - |
| NextBytes | master_x86\corerun.exe | 7,942.055 ns | 1.00 | - |
| NextBytes | pr_x86\corerun.exe | 1,599.183 ns | 0.20 | - |
| NextBytesSpan | master_x86\corerun.exe | 8,523.159 ns | 1.00 | - |
| NextBytesSpan | pr_x86\corerun.exe | 1,604.615 ns | 0.19 | - |

@stephentoub
Member Author

Just curious: why are the array-filler overloads so much faster on 32-bit vs. the old impl, when the Next()-like methods are slower? I don't see it.

Because they previously were calling Next per byte and now aren't.
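
To illustrate the difference, here's a hedged sketch (with a stand-in nextUInt64 delegate, not the PR's actual code) of filling eight bytes per 64-bit sample rather than paying a virtual Next() call per byte:

using System;
using System.Buffers.Binary;

static void FillBytes(Func<ulong> nextUInt64, Span<byte> buffer)
{
    // Write 8 random bytes per sample while a full ulong still fits.
    while (buffer.Length >= sizeof(ulong))
    {
        BinaryPrimitives.WriteUInt64LittleEndian(buffer, nextUInt64());
        buffer = buffer.Slice(sizeof(ulong));
    }

    // Spread one final sample across the remaining 1-7 bytes, if any.
    if (!buffer.IsEmpty)
    {
        ulong remaining = nextUInt64();
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = (byte)remaining;
            remaining >>= 8;
        }
    }
}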

@danmoseley
Member

Because they previously were calling Next per byte and now aren't.

https://github.com/dotnet/runtime/pull/47085/files#diff-92b19116b63bd5daf93516a0908297653d54d08263b754c8a1bbb2930e14db1bR167
I was looking at this - I must be spacing out.

@danmoseley
Member

I have no more feedback; LGTM, but I haven't gone through everything so I can't sign off at this moment.

@stephentoub merged commit 5ee25aa into dotnet:master on Jan 21, 2021
@stephentoub deleted the betterrandom branch on January 21, 2021 01:27
@GSPP

GSPP commented Jan 26, 2021

An XorShift-style algorithm for Random. Awesome!

For some reason, it is not widely known that really good algorithms for random numbers exist. This family of algorithms is of high quality and super fast. Great that this is finding its way into .NET.

As far as I understand, things like the Mersenne twister and other popular algorithms are strongly inferior.

For reference, I'm linking my thoughts on this: #6203 (comment)

@colgreen
Contributor

colgreen commented Jan 26, 2021

As far as I understand, things like the Mersenne twister and other popular algorithms are strongly inferior.

I think so, yes. Also, the original xor-shift PRNG as devised by George Marsaglia was a good fast PRNG for its time (early 2000s?), but these new 'xoshiro' derivatives from it pass more statistical tests while maintaining good performance.

There might be an argument in the future for switching again, to a wyrand PRNG (or a derivative, if one emerges) or something else, but I think using xoshiro is the right decision today; it gives a nice performance boost and much better quality of randomness, i.e., it fixes all of the raised issues with the old System.Random, to the best of my knowledge.

Ultimately, if you want really good random noise you flip to the slower crypto random generators, of course, so there's a kind of spectrum of PRNGs and trade-offs, and this work sits at a sweet spot, I think, between performance and RNG quality.
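
For completeness, the strong-but-slower end of that spectrum already exists in the BCL; for example:

using System;
using System.Security.Cryptography;

// Cryptographically strong randomness, for when quality trumps speed.
Span<byte> key = stackalloc byte[32];
RandomNumberGenerator.Fill(key);

// A uniform integer in [1, 7), i.e. a fair die roll.
int roll = RandomNumberGenerator.GetInt32(1, 7);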

@ghost ghost locked as resolved and limited conversation to collaborators Feb 25, 2021