[stdlib] Lemire’s Nearly Divisionless Random Integer Generation #25286
Conversation
Implementation of Daniel Lemire's "Fast Random Integer Generation in an Interval". See https://arxiv.org/pdf/1805.10941.pdf
I have attempted a Swift port of @lemire's algorithm:

```c
uint64_t nearlydivisionless(uint64_t s, uint64_t (*random64)(void)) {
    uint64_t x = random64();
    __uint128_t m = (__uint128_t)x * (__uint128_t)s;
    uint64_t l = (uint64_t)m;
    if (l < s) {
        uint64_t t = -s % s;  // 2^64 mod s, via unsigned wrap-around
        while (l < t) {
            x = random64();
            m = (__uint128_t)x * (__uint128_t)s;
            l = (uint64_t)m;
        }
    }
    return m >> 64;
}
```

I'm not sure if I have correctly interpreted the negation of an unsigned integer; I had to rely on https://stackoverflow.com/questions/8026694/c-unary-minus-operator-behavior-with-unsigned-operands

```swift
@inlinable
public mutating func next<T: FixedWidthInteger & UnsignedInteger>(
  upperBound: T
) -> T {
  _precondition(upperBound != 0, "upperBound cannot be zero.")
  var random: T = next()
  var m = random.multipliedFullWidth(by: upperBound)
  if m.low < upperBound {
    // (T.max - upperBound + 1) % upperBound == 2^bitWidth mod upperBound
    let t = (T.max - upperBound + 1) % upperBound
    while m.low < t {
      random = next()
      m = random.multipliedFullWidth(by: upperBound)
    }
  }
  return m.high
}
```

Is there a better way to express this in Swift than …? In my local test with …

Open Questions
@swift-ci please benchmark

I'm expecting to see improvement in …
We are able to change the implementation of …
cc: @stephentyrone |
Out of instinct, I would replace … by … or …, depending on how clever you feel. It might be useful to point to the reference, as otherwise this function might look like mysterious magic to some: Fast Random Integer Generation in an Interval, ACM Transactions on Modeling and Computer Simulation 29 (1), 2019
I expect to see some performance wins with the floating point random API as well. |
Something's messed up. I need to investigate locally why the benchmark has crashed… but it's late over here, so I'll come back to this tomorrow.
@lemire I have, of course, linked to your paper in the PR's description right at the top 😉 I liked the …
@palimondo Sorry, I missed the first comment. |
Hmm… reading through Melissa O'Neill's post, I see there is still a "Bitmask with Rejection (Unbiased) — Apple's Method" that performed even slightly better…
@palimondo |
I thought that Melissa's conclusion was that "The fastest (unbiased) method is Lemire's with an extra tweak from me [Melissa]." Let us ping her: @imneme
I'm sure you're right, I did not finish reading before posting here… 😇 All this tweaking and performance fine-tuning still hinges on the important question of:
…and I'm guessing that when considering ARM performance, Swift would decide based on the specific performance of these algorithms on the A family of processors, not the reference implementations.
@lemire, that is what I said. Not sure I can add much beyond that… Benchmarks are always something of a rabbit hole, with more data the deeper you go, but typically additional data reveals complexity rather than simple clarity. I really like @lemire's method in general. Frankly, I'd like it if the mod instruction itself were smart enough to realize when it has no work to do and short-circuit, rather than needing any tricks. As it is (at least on x86), mod is really just a side effect of div, and so it needs to do the full amount of work, which is why my optimization is helpful. (Possibly there is a general compiler optimization for mod, perhaps triggered by profiling!)
New blog post where I include ARM benchmarks: Nearly Divisionless Random Integer Generation On Various Systems. I include benchmarks on an ARM box. (I had the numbers inverted in an early version of the blog post: short story is that avoiding division is always good in my tests, using floats is slower.) |
@swift-ci please benchmark |
[swift-ci benchmark report: performance (-O, -Osize, -Onone) and code size tables, plus hardware overview. How to read the data: the tables contain differences in performance larger than 8% and differences in code size larger than 1%. Noise: the performance results (not code size!) sometimes contain false positives.]
@swift-ci please benchmark |
The optimized results seem quite nice. It even looks like the … I'm re-running benchmarks with @imneme's modulo optimization, to see how it fares in Swift.
[swift-ci benchmark report for the re-run: performance (-O, -Osize, -Onone) and code size tables, in the same format as the previous report.]
All results were identical, with the exception of a regressed … Looking at the generated assembly for the modulo optimization on Compiler Explorer, it looks like the Swift version is not as tight as the C version for some reason… Thoughts?
@palimondo One difference for sure is that Swift is careful regarding the case where the bound is zero (which leads to a division by zero) whereas the C version assumes the programmer checked. |
Swift, compared to C, seems unable to generate tightly fused instructions here for some reason (probably the division-by-zero check?)… so I'm removing it.
I realize that I have forgotten to use the masking subtraction assignment operator (&-=). So I'm removing the modulo optimization until someone more clever than me figures out how to make it work in Swift.
@swift-ci benchmark |
@swift-ci test |
Please create two Swift bugs if you haven't already:
stephentyrone left a comment:
Two things:
a. This method works well for small (sub-word) integer types, but is bad for larger types--we want to switch to rejection sampling (what Melissa calls "Apple's method", though it's so blindingly obvious that Ian and I can't have invented it) for types bigger than a word (this takes us from O(M(n)) down to O(n) complexity). That improvement doesn't need to be part of this PR, but is worth a bug report.
b. multipliedFullWidth currently fatalErrors for Int64 on 32b systems (https://github.com/apple/swift/blob/master/stdlib/public/core/IntegerTypes.swift.gyb#L1521). That's a bug, but it either needs to be fixed before this can go in (I can do that sometime this week), or you need to work around it here for now, probably by conditionally using the old version on 32b systems.
@stephentyrone I'm confused by what you mean in point a). Isn't the word size 64 bits on a 64-bit system, and Swift does not yet support any bigger types? I'm also unsure about the bug report for the missed optimization. If the reason why O'Neill's optimization didn't work in Swift is the division-by-zero check, as Lemire hypothesized, how would this be expressible without compromising safety? Introduction of a new operator? Anyway… I've filed SR-10909 Missed optimization for modulo and will add the architecture fork in the next commit to keep the old implementation around for 32-bit platforms.
In the standard library today, yes. In third party code or in the standard library in the future, there may be arbitrary types conforming to FixedWidthInteger. Now that we have binary and module stability, future third-party types may even be back-deployed against this version of the standard library, so it is good not to let clear inefficiencies be introduced. I'm fine with this one in the short-term, because it's no worse than the current situation, but we should resolve it when we can.
Given … Even without the precondition, …
Thanks! |
@stephentyrone I hope you don't mind that I've created a reminder for you: Support multipliedFullWidth on 32-bit systems (SR-10910)… 😇
…systems until https://bugs.swift.org/browse/SR-10910 is resolved.
Remove the old OpenBSD generation method, once 32-bit systems support multipliedFullWidth on UInt64.
@swift-ci please test |
Do we have a way to benchmark this on 32-bit systems once SR-10910 gets resolved? |
I'm going to review this once more tomorrow morning for numerical correctness specifically and then merge it assuming it looks good. @airspeedswift and I are discussing cherry-picking for 5.1 as well. |
No, because we'll want to have a rejection sampling path in the long run anyway for bigger-than-word types, which would also cover this case, and the existing modulus-based system is already pretty painful for 64b int on 32b. |
Ended up having some time tonight. This is fine to merge. |
n.b. @lemire @imneme thinking about this, I believe that we can actually do somewhat better still by using straightforward rejection sampling for large ranges and this algorithm for small ranges. This would be an extra branch, but it will be perfectly predicted in most uses (and could be hoisted or specialized for algorithms that produce values in bulk). I'll write up a demo sometime soon.
@stephentyrone Didn't @imneme try that here already? That version was not in the charts of the corresponding article for some reason... |
I would expect the perf tradeoff to be somewhat workload- (and hence benchmark-) dependent. It may be that it wasn't a substantial win for the use case that Melissa was testing, or there may be some issue we still need to understand, but for the top half of upperBound values, the actual selection looks the same with both approaches, and a comparison is certainly cheaper than a full-width multiply followed by a comparison. It likely doesn't matter for int64 due to the fact that the rejection branch is necessarily "unpredictable" with a good RNG, but for custom types where multiply can be much more costly than a branch mispredict (as this algorithm supports in Swift), it will matter. |
I meant it more like… "Is this the algorithm you had in mind? Maybe I could also transcribe that one…" 😜 …but I have overlooked my clue:
FWIW, I suspect I have misinterpreted the situation with the modulo optimization. The Swift vs. C assembly comparison wasn't fair, as I took too narrow a code snippet that I suspected to be problematic and left out code that included important hints for the Swift compiler to perform effective optimization. Furthermore, the original code was C++! I re-did the Godbolt Compiler Explorer comparison: I believe the generated instructions are largely equivalent. Looking again at the original article by @imneme, the modulo optimization provides a visible improvement in the Large-Shuffle benchmark, but in the Small-Shuffle benchmark the version with the modulo optimization …

Our benchmarks in SBS most closely correspond to the small shuffle, which would explain why I saw no improvement from the modulo optimization in our benchmarks. @stephentyrone I guess I should come up with a synthetic benchmark to mimic O'Neill's Large-Shuffle, and we could re-introduce the modulo optimization together with your rejection sampling (when you get around to it).
Here's a little critique, and a suggestion at the end. I'm not actively involved in Swift development, not even using it, so this is the best place I could find to add comments.

I am worried that a mistake was made when designing this class, and sadly it's too late now to correct it. The mistake is that the docs missed out on explaining how to create a custom next() function, especially if one wants to create a PRNG with reproducible output (using a seed value).

The problem is that the original implementation of upperBound: was apparently using the lower bits of the UInt64, whereas the new one uses the higher bits. That leads to the ugly effect that when someone implemented a PRNG that only delivered a smaller range, e.g. in the lower 32 bits, it used to work before this change as long as they only pulled small ranges from it. But with this change, the PRNG now fails, always returning zero for randomized subranges. The issue is shown in a code example with sample output here: https://gist.github.com/nicklockwood/cf1fde3eb24602883c17f94e63e490bc

This potential problem should have been known about - it's clearly documented in any theory of PRNGs. For instance, the Wikipedia article on PRNGs conveniently lists examples and shows that many of them do not return good randomized values in their lower bits. As one comment above says, the API docs only warned that PRNGs might return different values after an implementation change, but they did not consider that it might break entirely - yet this change did. Agreed?

To prevent this, there could have been an extra function that returns the range of significantly randomized values. This could be done by returning a divisor and a max value, which in the gist example above would have been divisor=1 and max=233280, whereas for the system RNG it's possibly 2^32 and 2^32 (assuming only the upper 32 bits contain good random values). With that, any subrange function, like the upperBound: one, would use this information to first reduce the value by this range before applying the upperBound reduction. In fact, adding such a function might still be valuable, as it would (a) make future implementations of custom PRNGs via a custom next() function easier, and (b) help other code that uses the raw next() result to create valid values from it.
The method …
Yes, that would be better than the current affair. Still, people might struggle with coming up with 64-bit random generators, and the need for reproducible RNGs is not uncommon, so please provide some assistance with that if you can. Also, has it been proven (and postulated) that every bit of next()'s result properly appears to be random and not repetitive? That is, does even the lowest bit have wide randomness, not a repeating pattern with a short cycle? I'm thinking of random generators that often keep giving something like 0, 1, 0, 1, … (cycle length 2) or 0, 0, 1, 1, … (cycle length 4) in the lowest bit. Because the fact that next(upperBound:) has been rewritten to prefer the upper bits suggests that the lower bits are, in fact, not as well randomized. Otherwise, I wonder whether that code couldn't be made faster by staying with the lower bits. It might save a few cycles because you could save the down-shift, right?
I agree with @xwu's answer and I think it is the correct one, though I would point out that the documentation already states: "Returns a value from a uniform, independent distribution of binary data." If you have non-trivial cycles or you are offering a limited amount of randomness, you are clearly violating the "uniform + independent" ideal.

It is true that, historically, people (and standard libraries) have relied on random number generators that would produce, for example, 32-bit words with only 31 bits of randomness (or less). In fact, the history of pseudo-random number generators is filled with really bad generators. However, today we have many really fast pseudo-random generators that produce high-quality random words: it is quite easy to have a really fast generator that will produce 64-bit words that pass hard statistical tests (e.g., consider the PCG family). If you really need it, you can rather easily go for a cryptographically strong generator. The sky is the limit! Twenty or thirty years ago, these were difficult to find and difficult to write, but it is now much easier. There is no good reason to default to whatever bad generator your standard C library offers if you care about quality.

I think that trying to accommodate obviously flawed pseudo-random number generators, like the ones you describe, is unnecessary, because we now have (in 2020) a large collection of high-quality generators. It seems that you are asking for support for legacy generators.
That's a fair and probably correct point. Some of us have offered some implementations... I think that the standard library could be extended with a few options. I, for one, would be willing to provide a couple if the core team called for it. |
Thanks for your clarifications. I must admit I'm a bit behind on this topic, having worked on and with PRNGs mostly in the '80s. Also, thanks for taking this seriously. I only came into this because I had been interested in why someone's attempt at subclassing Swift's RNG worked in older Xcode versions but not in recent ones. I tried to figure this out by reading the docs and found that they didn't provide answers. So I think adding some clarifying docs is the right way to resolve this for others in the future.