
[RFC] changing the Random PRNG to a splittable algorithm from PRINGO #28

Closed
wants to merge 2 commits into from


gasche
Member

@gasche gasche commented Sep 5, 2021

[RFC text copied below]


RFC: Change the stdlib Random implementation to a splittable PRNG from PRINGO (SplitMix or ChaCha)

This RFC proposes to replace the current implementation of the standard library Random module by one of the "splittable" PRNGs provided by @xavierleroy's pringo library. The motivation is the move to Multicore, where splittable generators let us provide better behavior for the current global Random interface.

Note: This RFC is in a Draft state: it needs input from other people (in particular @xavierleroy, js-of-OCaml and Mirage people) before it can be considered a final proposal.

Motivation: Multicore

Random and Multicore

Random provides a "global" interface whose functions are not explicitly parametrized over the generator state (for example Random.bool : unit -> bool); the Random.State module provides the parametrized version. This "global" interface creates correctness or performance problems in a Multicore world:

  • If we keep a single global mutable generator state, it needs to be protected by a lock, which makes the PRNG a concurrency bottleneck.

  • If we give an independent random generator to each domain, it is unclear how to initialize the state of a new domain. Reusing the state of the parent domain is wrong (the two domains would generate the same random numbers), and inflexible "seeding" policies will be incompatible with some user choices of seed.

Other approaches tend to have dubious randomness properties, and often require tight coupling between the Random library and the multicore runtime, which would increase implementation complexity.

Splitting saves the day

Some PRNG algorithms provide a "split" operation that returns two random-generator states, designed to generate independent-looking sequences.

With such "Splittable PRNGs", creating a new domain simply requires splitting the generator state of the parent domain. This approach is efficient (no synchronization of the generator state), has good randomness properties, and it is simple to implement inside the Random module (using runtime-provided thread-local-storage primitives). The decisions of the user in terms of seeding (OS-random or deterministic) are naturally respected, etc. In short: you want a splittable generator for Multicore.
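A minimal sketch of the per-domain splitting pattern, assuming OCaml 5.0 or later (where Random.State.split and the Domain module exist). The names sum_draws and parallel_sums are hypothetical. All splits happen sequentially in the parent before any domain is spawned, so each child's stream depends only on the seed and its position, not on scheduling.

```ocaml
(* Sketch only; requires OCaml >= 5.0 for Random.State.split and Domain. *)

(* Hypothetical workload: sum [n] draws from a domain-private state. *)
let sum_draws (st : Random.State.t) (n : int) : int =
  let acc = ref 0 in
  for _ = 1 to n do
    acc := !acc + Random.State.int st 100
  done;
  !acc

(* Split the parent state once per child, in the parent, then give each
   child domain its own state: no lock is needed when drawing. *)
let parallel_sums ~(seed : int) ~(domains : int) ~(draws : int) : int list =
  let parent = Random.State.make [| seed |] in
  (* List.init evaluates left to right, so the splits happen in a fixed order. *)
  let children = List.init domains (fun _ -> Random.State.split parent) in
  let workers =
    List.map (fun st -> Domain.spawn (fun () -> sum_draws st draws)) children
  in
  List.map Domain.join workers
```

Running parallel_sums twice with the same seed gives the same list, whatever order the domains actually run in.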

(For a previous discussion of this problem where the suggestion to move to splittable PRNGs emerged, see ocaml-multicore/ocaml-multicore#582 (review) )

PRINGO generators

The pringo library provides two splittable PRNG implementations.

| Name | numeric type | speed wrt. Random | pure OCaml | secure | when to reseed? |
|---|---|---|---|---|---|
| Random | int | = | yes | no | ? |
| SplitMix | int64 | slightly faster | no (C stubs) | no | after 2^32 draws |
| ChaCha | int32 | slightly slower | no (C stubs) | yes (weak crypto) | after 2^64 draws |

To give a sense of the performance difference, on one machine, with a micro-benchmark drawing numbers in a tight loop (make benchmarks in the pringo directory):

  • drawing int has the same speed for all generators,
  • drawing int64 takes 55% of the Random time with SplitMix and 93% of the Random time with ChaCha, and
  • drawing float values in [0;1] takes 66% of the Random time with SplitMix, and 120% of the Random time with ChaCha.
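A rough sketch of a tight-loop timing harness in the spirit of these measurements. It is not pringo's benchmark suite, and time_it is a hypothetical helper; it only times the current Random.State functions. Comparing generators would mean plugging their draw functions in the same way and dividing the resulting times.

```ocaml
(* Rough tight-loop timing; CPU time via Sys.time, not a rigorous benchmark. *)
let time_it (name : string) (iterations : int) (draw : unit -> 'a) : float =
  let t0 = Sys.time () in
  for _ = 1 to iterations do
    (* opaque_identity keeps the compiler from discarding the draw. *)
    ignore (Sys.opaque_identity (draw ()))
  done;
  let dt = Sys.time () -. t0 in
  Printf.printf "%-8s %.3f s for %d draws\n" name dt iterations;
  dt

let () =
  let st = Random.State.make [| 1 |] in
  let n = 1_000_000 in
  ignore (time_it "int" n (fun () -> Random.State.bits st));
  ignore (time_it "int64" n (fun () -> Random.State.int64 st Int64.max_int));
  ignore (time_it "float" n (fun () -> Random.State.float st 1.0))
```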

Those differences are unlikely to be noticeable, as drawing random numbers is a negligible part of most programs. Some very specific PRNG-bound algorithms (Monte-Carlo simulation, etc.) probably use custom PRNGs anyway.

Requirements for a standard-library PRNG

Here is the list of potential requirements we considered when deciding which PRNG implementation to include in the standard library.

Non-concern: 32-bit hosts

SplitMix, using 64-bit integers internally, will have lower performance on 32-bit systems. We have more or less decided that we care less about the performance of 32-bit CPU architectures these days (we specifically discuss Javascript below), so we propose not to consider this in the choice process.

Concern: good randomness

We want PRNGs that pass statistical tests for good randomness. All algorithms discussed here perform well under standard PRNG-quality tests (diehard, etc.).

Possible concern: lesser portability of C stubs

Random currently has a pure-OCaml implementation (except for the system-specific auto-seeding logic). Moving to C stubs may be an issue, at least for Mirage users, and would require extra work from alternative implementations (js_of_ocaml, BuckleScript, etc.).

SplitMix is a very simple algorithm; it is easy to provide a pure-OCaml version that should perform well. Porting Chacha, which is more elaborate, is more work, and it is also harder to predict the performance of a pure-OCaml version.
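To illustrate how small a pure-OCaml SplitMix can be, here is a sketch of SplitMix64 (Steele, Lea and Flood) on boxed int64 values. The mixing constants follow the widely published SplitMix64 reference and Java's SplittableRandom; pringo's own code may differ in details, and the names below (create, next_int64, split) are hypothetical.

```ocaml
(* Sketch of SplitMix64 in pure OCaml; the state is a counter plus an odd gamma. *)
type t = { mutable seed : int64; gamma : int64 }

let golden_gamma = 0x9e3779b97f4a7c15L

(* Output mixing function (a MurmurHash3-style finalizer variant). *)
let mix64 z =
  let open Int64 in
  let z = mul (logxor z (shift_right_logical z 30)) 0xbf58476d1ce4e5b9L in
  let z = mul (logxor z (shift_right_logical z 27)) 0x94d049bb133111ebL in
  logxor z (shift_right_logical z 31)

(* Number of set bits, clearing the lowest set bit at each step. *)
let popcount64 z =
  let rec go z acc =
    if z = 0L then acc else go (Int64.logand z (Int64.sub z 1L)) (acc + 1)
  in
  go z 0

(* Derive a new gamma: force it odd and ensure enough 0/1 transitions. *)
let mix_gamma z =
  let open Int64 in
  let z = mul (logxor z (shift_right_logical z 33)) 0xff51afd7ed558ccdL in
  let z = mul (logxor z (shift_right_logical z 33)) 0xc4ceb9fe1a85ec53L in
  let z = logor (logxor z (shift_right_logical z 33)) 1L in
  if popcount64 (logxor z (shift_right_logical z 1)) < 24
  then logxor z 0xaaaaaaaaaaaaaaaaL
  else z

let create seed = { seed; gamma = golden_gamma }

(* Advance the counter by gamma and return the new value. *)
let next_seed g =
  g.seed <- Int64.add g.seed g.gamma;
  g.seed

let next_int64 g = mix64 (next_seed g)

(* Split: the child gets a seed and a gamma both derived from the parent. *)
let split g =
  let seed = next_int64 g in
  let gamma = mix_gamma (next_seed g) in
  { seed; gamma }
```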

Possible concern: Javascript performance

js_of_ocaml and Bucklescript/ReScript are important for our users, and it would be nice to ensure that the performance is not disappointing on these alternative implementations.

js_of_ocaml implements int64 with an emulation, suggesting that SplitMix could take a performance hit. (There is no native int64 type in Javascript anyway; "mainstream" approaches such as long.js emulate it with two numbers.) It may be that JS engines optimize the emulation layer efficiently, so we should evaluate each choice.

(Using C stubs in fact gives more flexibility for alternative implementations to provide their own native implementation of these functions.)

Summary

What are the requirements for the Random module?

The requirements discussed in the current version of the RFC are:

  • implementation being pure OCaml
  • performance
  • "security" of the PRNG
  • reseeding recommendations

Here is a summary of what we understand to be the consensus so far (I'll edit this part of the RFC as discussion progresses):

  • pure-OCaml: yes.

    A pure-OCaml implementation makes people's lives simpler and would be strongly preferable; we should port SplitMix and Chacha and re-run benchmarks.

  • performance: not important, but check that js_of_ocaml does okay.

    Small performance hits are not very important for the standard Random PRNG, so a reasonable slowdown (for everyone or just for js_of_ocaml) would be perfectly acceptable.

    We should still benchmark under js_of_ocaml to have an idea of the performance (once we have pure-OCaml versions). It's bad if some parts of OCaml programs suddenly become much slower there.

  • Having a crypto-secure PRNG is not part of our requirement -- Random is not secure in that sense.

  • Having to reseed SplitMix every 2^32 draws may be a problem in practice: loops running 2^32 iterations are perfectly reasonable these days. All other things being equal, ChaCha should be preferred.


@xavierleroy xavierleroy left a comment


I don't know if that's the proper way to comment on an RFC, but here we go.

Comment on lines +14 to +15
- If we keep a single global mutable generator state, it needs to be protected by a lock, which makes the PRNG a concurrency bottleneck.


Lock-free implementations are possible for some PRNGs. For instance, SplitMix needs only a 64-bit fetch-and-add, because the mutable internal state is just a 64-bit counter.
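A sketch of that lock-free idea, assuming OCaml 4.12 or later for Atomic.fetch_and_add. Because the SplitMix state advances by a fixed gamma, the k-th output can be computed directly from a draw index, so each caller only needs one atomic increment. The names (shared, make_shared, draw) are hypothetical, and the index is a native 63-bit int rather than the 64-bit counter Xavier describes, so this covers only the first 2^62 draws.

```ocaml
(* Sketch: a shared SplitMix-style stream without a lock. *)
let gamma = 0x9e3779b97f4a7c15L

let mix64 z =
  let open Int64 in
  let z = mul (logxor z (shift_right_logical z 30)) 0xbf58476d1ce4e5b9L in
  let z = mul (logxor z (shift_right_logical z 27)) 0x94d049bb133111ebL in
  logxor z (shift_right_logical z 31)

type shared = { base : int64; counter : int Atomic.t }

let make_shared base = { base; counter = Atomic.make 0 }

(* Reserve a unique index k atomically, then compute the k-th output,
   mix64 (base + (k + 1) * gamma), with purely local arithmetic. *)
let draw (g : shared) : int64 =
  let k = Atomic.fetch_and_add g.counter 1 in
  mix64 (Int64.add g.base (Int64.mul (Int64.of_int (k + 1)) gamma))
```

The atomic operation is still a shared-memory round trip on every draw, which is the cost gasche raises in his reply.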

Member Author

@gasche gasche Sep 7, 2021


This is a good point, but (1) it constrains the algorithm design, and (2) atomic operations would still be measurably slower than purely local computations on a PRNG-bound program.

| SplitMix | int64 | slightly faster | no (C stubs) | no | after 2^30 draws |
| ChaCha | int32 | slightly slower | no (C stubs) | yes (weak crypto) | after 2^64 draws |

To give a sense of the performance difference, on my machine, on a micro-benchmark drawing numbers in a tight loop (`make benchmarks` from the pringo directory):


The relative performance varies a lot depending on (1) whether the processor is 64-bit or 32-bit, and (2) the size of the numbers drawn (schematically: 64-bit integers benefit SplitMix, but bytes benefit ChaCha).

- drawing `int64` takes 55% of the Random time with SplitMix and 93% of the Random time with ChaCha, and
- drawing `float` values in [0;1] takes 66% of the Random time with SplitMix, and 120% of the Random time with ChaCha.

Those differences are unlikely to be noticeable, as drawing random numbers is a negligible part of most programs. Some very specific PRNG-bound algorithms (Monte-Carlo simulation, etc.) probably use custom PRNGs anyway.


I wouldn't be so sure. QuickCheck-style random property testing can use a lot of pseudo-random numbers. I'd like to hear from Monte-Carlo specialists re: custom PRNGs.

Member Author


Excellent point! I can try to benchmark QCheck with an alternative PRNG. My guess is that the testing time is dominated by the actual check in user code (running the test), so this would not be noticeable in practice.

I don't know if we have people resembling Monte-Carlo specialists, but my bet would be on @Octachron, with @jhjourdan as the informed hobbyist.


I can try to benchmark QCheck with an alternative PRNG

Or just profile to know how much time is spent in the stdlib PRNG. Then we can extrapolate from there.



As one data point, Stan (whose one job is Hamiltonian MCMC[0]) uses boost::ecuyer1988 ([1], also known as CombLec88 in e.g. [2]), relying on its ability to skip ahead quickly to generate several independent streams ([3] explains why they chose this approach). Unfortunately I have no measure of how much of a typical Stan run the RNG takes (though I suspect little, as Stan spends a lot of time computing likelihoods and derivatives).

[0] sorry if this is not the kind of Monte-Carlo you're thinking of
[1] https://www.boost.org/doc/libs/1_77_0/doc/html/boost_random/reference.html#boost_random.reference.generators
[2] http://videos.rennes.inria.fr/seminaire_Irisa/lecuyer/rngx.pdf
[3] stan-dev/stan#3028 (comment)

Member

@Octachron Octachron Sep 13, 2021


I would expect Markov Chain Monte-Carlo simulations to tune their PRNG, since they need to generate a lot of samples from a simple distribution to achieve thermalization. I will try to dig up some of my sampling experiments from my PhD to do more performance analysis (but from memory, using efficient algorithms for sampling non-uniform distributions had a larger impact than the PRNG itself).


Random currently has a pure-OCaml implementation (except for the system-specific auto-seeding logic). Moving to C stubs may be an issue, at least for Mirage users, and require extra work from alternative implementations (js_of_ocaml, BuckleScript, etc.).

SplitMix is a very simple algorithm, it is easy to provide a pure-OCaml version that should perform well. Porting Chacha, which is more elaborate, is more work, but it's also harder to predict performance for the pure-OCaml version.


ChaCha reimplemented in OCaml will be awful. The C implementation of parts of SplitMix is motivated by having decent performance with ocamlopt on 32-bit machines and with ocamlc (bytecode interpretation). I agree that an OCaml implementation of SplitMix should be quite fast with ocamlopt on a 64-bit machine.

Member Author


I thought that you didn't care about 32-bit CPUs anymore? (If we keep x86-32 for cross-compilation we don't care about PRNG performance; riscv32 remains, and in general embedded targets.)

We could consider trying to have both a pure-OCaml and a C-stubs implementation, and choose between the two at configure-time. (Or by matching on Sys constants, but that would be less comfortable to Mirage people and other pure-OCaml constraints.)

Member Author


@xavierleroy as a non-expert, I'm curious as to how you can tell that pure-OCaml ChaCha performance would be bad. The current implementation uses 15 mutable (stack-allocated) uint32_t variables; using Int32.t ref would introduce a lot of boxing.

But couldn't we use another approach, such as:

  • unrolling the loop and using only (immutable) temporaries, and letting the register allocator sort it out,
  • or using an int32 bigarray stored in the generator state, which hopefully would let us read/write without boxing?
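A sketch of the first option: the ChaCha quarter round (as specified in RFC 8439, section 2.1) written with only immutable Int32 temporaries. Whether ocamlopt actually avoids boxing these temporaries depends on the compiler version and flags, and the returned tuple still allocates. Since ocamlopt has no rotate primitive, rotl32 (a hypothetical helper) costs two shifts and an OR, which is one of the overheads Xavier points out.

```ocaml
(* Rotate a 32-bit value left by n bits (0 < n < 32). *)
let rotl32 (x : int32) (n : int) : int32 =
  Int32.logor (Int32.shift_left x n) (Int32.shift_right_logical x (32 - n))

(* One ChaCha quarter round, using fresh bindings instead of mutable state. *)
let quarter_round (a, b, c, d) =
  let a = Int32.add a b in
  let d = rotl32 (Int32.logxor d a) 16 in
  let c = Int32.add c d in
  let b = rotl32 (Int32.logxor b c) 12 in
  let a = Int32.add a b in
  let d = rotl32 (Int32.logxor d a) 8 in
  let c = Int32.add c d in
  let b = rotl32 (Int32.logxor b c) 7 in
  (a, b, c, d)
```

A full ChaCha block applies this to the 16-word state 80 times (20 rounds of 4 quarter rounds), which is where boxing and missing rotates would add up.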


I think an OCaml implementation of ChaCha would look pretty much like the OCaml implementation of MD5 that we use as a test (test/lib-digest/md5.ml) and that I used as a benchmark back in the days, and performance was not great compared with a pure C implementation. The boxing of int32's in the state doesn't help. The fact that ocamlopt doesn't recognize "rotate" instructions doesn't either.


- Having a crypto-secure PRNG is not part of our requirement -- Random is not secure in that sense.

- Having to reseed SplitMix every 2^30 draws may be a problem in practice -- loops running 2^30 iterations are perfectly reasonable these days. All other things being equal, ChaCha should be preferred.


Again it's 2^32. "Having to reseed" may be too strong: it is recommended to reseed after 2^32 draws. Basically, with an N-bit counter as internal state, some statistical deviations are to be expected after sqrt(2^N) numbers have been generated, because of the birthday paradox and all that. But again I don't have a reference handy.
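A worked version of that birthday estimate, under the usual idealization that a truly random generator produces independent uniform draws from 2^N values. Such a generator would likely show a repeated output by about 2^{N/2} draws; a generator whose outputs are a bijection of an N-bit counter never repeats within its period, so that absence of repeats becomes statistically visible around the same point. For N = 64 this gives the 2^32 figure.

```latex
% k independent uniform draws from 2^N values:
P(\text{at least one repeat}) \approx 1 - \exp\!\left(-\frac{k(k-1)}{2 \cdot 2^{N}}\right),
\qquad
k = 2^{N/2} \;\Rightarrow\; P \approx 1 - e^{-1/2} \approx 0.39 .
```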

(spotted by Xavier Leroy)
@gasche
Member Author

gasche commented Sep 7, 2021

Thanks @xavierleroy! I wonder if you have an "overall opinion" of the RFC, independently of the details: is it worth trying to change Random in this way?

@dbuenzli
Contributor

dbuenzli commented Sep 7, 2021

This is slightly tangential to the RFC but I'd just like to mention this in case it could also influence a final design.

I'm just wondering whether it is really a good idea to "save" the global Random interface as it exists now.

The current situation is not that great and I think it would be better to make people move away from it. Especially for libraries where you might want to control their randomness globally and make sure these don't Random.self_init under your feet or eschew using the global random state you want them to use.

In general when I need random numbers in the function of an API I use:

val f : ?r:Random.State.t -> ...

and default r to Random.State.make_self_init () for those who don't want to be bothered with carrying around a random state value. But if you call f repeatedly and it only uses r for one or two numbers, creating a fresh self-initialized state on every call is not that great.
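A sketch of that optional-state convention, using a hypothetical shuffle function (Fisher-Yates). When ~r is omitted, each call creates a fresh state with Random.State.make_self_init, which seeds from system entropy; that per-call cost is the drawback of this convention for functions called repeatedly.

```ocaml
(* Shuffle [a] in place; callers may supply a state with ~r. *)
let shuffle ?r (a : 'a array) : unit =
  let r = match r with Some r -> r | None -> Random.State.make_self_init () in
  (* Fisher-Yates: swap each position with a uniformly chosen index <= i. *)
  for i = Array.length a - 1 downto 1 do
    let j = Random.State.int r (i + 1) in
    let tmp = a.(i) in
    a.(i) <- a.(j);
    a.(j) <- tmp
  done
```

Passing ~r:(Random.State.make [| seed |]) makes the result reproducible, which is what a test suite usually wants.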

There are two problems here:

  1. The State.t value of the global interface is not exposed.
  2. There is no way to determine if the user did take care to seed it in some way (i.e. that it is not just relying on the default state).

Basically what I would like here is something like:

 Random.State.default : ensure_seeded:bool -> Random.State.t

This would return a global state value (on Multicore, it could return the domain's split state), and ensure_seeded is a flag: if the default state was not actually seeded by the user, it simply seeds it from the system, as Random.self_init () does.
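A minimal sketch of what such an API could look like. Everything here is hypothetical (default_cell, seed_default and default do not exist in the standard library), and it is not domain-safe: a Multicore version would keep one cell per domain.

```ocaml
(* Hypothetical sketch; not a stdlib API. *)
type default_cell = { mutable st : Random.State.t; mutable seeded : bool }

(* Fixed placeholder seed until someone seeds explicitly. *)
let global = { st = Random.State.make [| 0 |]; seeded = false }

(* Explicit seeding by the application. *)
let seed_default (seed : int array) : unit =
  global.st <- Random.State.make seed;
  global.seeded <- true

(* Return the default state; if nobody seeded it and [ensure_seeded]
   is set, seed it from system entropy first. *)
let default ~(ensure_seeded : bool) : Random.State.t =
  if ensure_seeded && not global.seeded then begin
    global.st <- Random.State.make_self_init ();
    global.seeded <- true
  end;
  global.st
```

A library would call default ~ensure_seeded:true, leaving the application free to call seed_default beforehand for reproducible runs.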

@xavierleroy

xavierleroy commented Sep 7, 2021

Thanks @xavierleroy! I wonder if you have an "overall opinion" of the RFC, independently of the details: is it worth trying to change Random in this way?

Yes, but I don't know when.

The main reason I developed the Pringo library was to pave the way to a reimplementation of the Random module in the standard library. Splittable PRNGs are good for concurrency but also for generating random functions and random streams for Quickcheck-style property testing, if I remember correctly.

Now, I would like to read more about the state of the art in PRNGs, and perhaps to experiment with a third algorithm
(e.g. from the xoshiro family) before choosing an algorithm to replace Random's lagged Fibonacci.

@xavierleroy

xavierleroy commented Sep 7, 2021

Actually, I did test xoshiro in Pringo: https://github.com/xavierleroy/pringo/tree/xoshiro

It has a very convincing approach to splitting, and a large state space making reseeding unnecessary. On the negative side, it's a bit slower than both SplitMix and ChaCha for drawing integers, and much slower for splitting.

@gasche
Member Author

gasche commented Sep 13, 2021

I had a discussion with @xavierleroy today and he proposed the following three questions to settle:

  1. What should be the state size? Is 2^64 too small?
    (Smaller states mean that one should reseed more often, or observe small deviations due to the birthday paradox. Larger states need less reseeding, but are more costly to copy during splitting or when providing a pure interface.)
  2. Do we insist on a fast splitting function, or is something "moderately slow" okay?
    (For the Multicore use-case, something slightly expensive is fine as it is only done at domain creation. For QuickCheck function generation, we split much more often.)
  3. SplitMix may be weaker statistically, waiting for a clever source of bias to be found. How much do we care about this?

@gasche
Member Author

gasche commented Sep 21, 2021

Here is some data about the PRNG profile with QCheck, the main Quickcheck-style library for OCaml. I wrote a dumb micro-benchmark:

let test =
  QCheck.Test.make ~count:100_000 ~name:"trivial test"
   QCheck.(list small_nat)
   (fun n -> true)

let () =
  ignore (QCheck_runner.run_tests [test]);;

which generates lists of integers (small_nat is the content of the list, not its size; some lists are large) and checks nothing about them.

Then running perf record --call-graph=dwarf ./tiny_test.native and perf report, then filtering on random, gives the following result:

  Children      Self  Command          Shared Object     Symbol
+   38.54%     9.20%  tiny_test.nativ  tiny_test.native  [.] Stdlib.random.rawfloat_438
+   16.27%    10.60%  tiny_test.nativ  tiny_test.native  [.] Stdlib.random.intaux_279
+   15.73%    15.73%  tiny_test.nativ  tiny_test.native  [.] Stdlib.random.bits_273
     0.34%     0.34%  tiny_test.nativ  tiny_test.native  [.] Stdlib.random.int_284
     0.00%     0.00%  tiny_test.nativ  tiny_test.native  [.] Stdlib.random.full_init_142

My understanding is that, summing the "Self" column, this means that about 35% of the program runtime is spent in the PRNG.

Note: this result is obviously sensitive to the complexity of the test (with our trivial test, we are measuring the worst case) but also to the complexity of the random generator: some generators may draw many more random inputs per test, and therefore spend much less time in the QCheck scaffolding. I have some "real-world" generators lying around, and may give them a try.

@gasche
Member Author

gasche commented Sep 21, 2021

In my real-world use-case of QCheck, the time spent in the PRNG is negligible in normal operation (0.03% according to perf). If I gut the check out and only keep the generator (a generator of well-typed programs in a small fragment of OCaml), I get this:

  Children      Self  Command      Shared Obje  Symbol
+    3.07%     1.94%  effmain.exe  effmain.exe  [.] Stdlib.random.intaux_279
+    2.61%     2.61%  effmain.exe  effmain.exe  [.] Stdlib.random.bits_273
+    1.94%     0.50%  effmain.exe  effmain.exe  [.] Stdlib.random.rawfloat_438
     0.15%     0.15%  effmain.exe  effmain.exe  [.] Stdlib.random.int_284

So about 5% in total (summing the Self column). My understanding is that a generator for a more complex data structure spends more time checking validity of the structure, and thus less time in the PRNG.

@xavierleroy

My understanding is that, summing the "Self" column, this means that about 35% of the program runtime is spent in the PRNG.

For such a trivial test, this is a reasonable figure. I note that QCheck uses Random.float quite a lot. This use case is favorable to Splitmix, which naturally works 64 bits at a time. Had Random.bool been used a lot, this would favor Chacha or other byte-oriented PRNGs.

@xavierleroy

And by the way: nice demangling of ocamlopt-generated symbols in the output of perf...

@gasche
Member Author

gasche commented Sep 22, 2021

QCheck uses float as weights to mix "easy cases" with "edge/hard cases" in its generators:

  (* natural number generator *)
  let nat st =
    let p = RS.float st 1. in
    if p < 0.5 then RS.int st 10
    else if p < 0.75 then RS.int st 100
    else if p < 0.95 then RS.int st 1_000
    else RS.int st 10_000

(This could easily be rewritten without floats if performance mattered.)
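A sketch of such a float-free rewrite, assuming RS is an alias for Random.State as the snippet suggests; nat_no_float is a hypothetical name. Drawing an integer percentage k uniformly in 0..99 reproduces the thresholds 0.5, 0.75 and 0.95 exactly (the float version matches them up to the rounding of 0.95).

```ocaml
module RS = Random.State

(* Same weights as [nat]: p < 0.5 corresponds to k < 50, and so on. *)
let nat_no_float (st : RS.t) : int =
  let k = RS.int st 100 in
  if k < 50 then RS.int st 10
  else if k < 75 then RS.int st 100
  else if k < 95 then RS.int st 1_000
  else RS.int st 10_000
```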

@xavierleroy

(This could easily be rewritten without floats if performance mattered.)

Oh, no! It's a perfectly legitimate use of a PRNG (to draw probabilities between 0 and 1). The PRNG should fit its uses, not the other way around. Splitmix and the Xoshiro family have an advantage here, since their primitive operation is to draw random 64-bit integers, while Chacha and other PRNGs based on cryptographic ciphers are oriented towards drawing random bytes.

With a bit of exaggeration, PRNG uses tend to gravitate towards one of two uses: (1) draw probabilities between 0 and 1, and (2) fill N bytes with noise. Since use (2) often requires cryptographic quality (think nonces and session keys), it makes sense to favor (1).

@dbuenzli
Contributor

Oh, no! It's a perfectly legitimate use of a PRNG (to draw probabilities between 0 and 1).

That just occurred to me. Can you observe the effect of the non-uniform spacing of floating-point numbers when you do this?

@gasche
Member Author

gasche commented Sep 23, 2021

The short answer is "no, things were done carefully".

Pringo's float generator (on 64 bits) draws a random 53-bit integer, converts it to floating-point and then multiplies it by 2^{-53} to get a float in [0;1). (53 bits is the precision of a double's significand, so the conversion is lossless.) You really get the (double) floating-point approximation of a uniform real in [0;1).

@xavierleroy
Copy link

xavierleroy commented Sep 25, 2021

Actually, I'm not sure the Pringo method is the best possible. It gives you 2^53 possible results, evenly spaced between 0.0 and 1.0, and evenly distributed. But many FP values between 0.0 and 1.0 are never returned, such as 1.0 itself and a bunch of values close to 0.0.

If we take a 64-bit random integer, convert it to FP and multiply by 2^{-64}, we would get some of these other FP values (but still not all), including 1.0 thanks to rounding, but with a bit of a bias because of rounding. (1.0 would occur half as often as its FP predecessor, if I'm not mistaken.)

Off the top of my head, I don't know how to write a PRNG that samples ALL the FP values between 0.0 and 1.0 with the correct frequencies. But I'm not sure we care.

I'll do a literature search soon, but pointers to the literature are welcome.
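A sketch contrasting the two conversions discussed above, taking a raw 64-bit draw as an int64. Both function names are hypothetical. The second uses 63 bits instead of 64 because Int64.to_float treats its argument as signed; it still shows how rounding in the conversion can produce exactly 1.0.

```ocaml
(* Pringo-style: keep the top 53 bits and scale by 2^-53.
   Results are evenly spaced multiples of 2^-53 in [0, 1). *)
let unit_float_53 (bits : int64) : float =
  Int64.to_float (Int64.shift_right_logical bits 11) *. 0x1p-53

(* For comparison: convert more bits than a double can hold, then scale.
   The int-to-float conversion rounds, so the result can be exactly 1.0. *)
let unit_float_63 (bits : int64) : float =
  Int64.to_float (Int64.shift_right_logical bits 1) *. 0x1p-63
```

With all bits set, the first returns 1 - 2^-53 while the second returns 1.0, because 2^63 - 1 rounds up to 2^63 when converted.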

@gasche
Member Author

gasche commented Sep 25, 2021

If you open a more specific issue on Pringo, we can ping the usual float experts. I'm sure at least half of them also have a knack for PRNGs, somehow these vices are correlated.

@xavierleroy

As promised, here are some references on generating pseudo-random FP numbers:

The bottom line is that 1) the algorithm used in PRINGO is not bad and is used in many standard libraries, 2) it might be worth exposing the core function that returns a pseudo-random FP number between 0.0 and 1.0 and document how it works and that it never returns 1.0.

@jhjourdan

FTR, when I use a PRNG to sample values between 0 and 1, I would expect it to never return 0 or 1. The reason is that we usually transform these sampled values to get other distributions, often over ]-oo, +oo[ or ]0, +oo[, and 0 and 1 usually correspond to infinite values in the transformed space. For example, the typical method to get an exponentially distributed real is to take the negative of the logarithm of a uniform [0,1] variable. If such a uniform variable is allowed to be 0, then the exponentially distributed value can be +oo, which can lead to problems. This is for example the case in statmemprof.
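A sketch of an open-interval variant and the exponential transform, taking a raw 64-bit draw as an int64; both names are hypothetical. It keeps 52 bits and adds one half, giving (k + 0.5) * 2^-52 for k in [0, 2^52). That sum fits exactly in a double, so the result is strictly between 0 and 1 and -log u is always finite.

```ocaml
(* Uniform in the open interval (0, 1): smallest 2^-53, largest 1 - 2^-53. *)
let unit_float_open (bits : int64) : float =
  (Int64.to_float (Int64.shift_right_logical bits 12) +. 0.5) *. 0x1p-52

(* Inverse-transform sampling: -log u follows Exp(1) for u uniform in (0, 1). *)
let exponential (bits : int64) : float = -. log (unit_float_open bits)
```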

@jhjourdan

The bias induced by forbidding 0 and 1 is on the order of 2^-53. Hence one would need about 2^100 samples to observe it. Seems ok to me.

@xavierleroy

After more thought and experiments, I suggest going with a Xoshiro-based PRNG. See the proposed implementation here: ocaml/ocaml#10701

@gasche
Member Author

gasche commented Oct 14, 2021

I'm happy to go with Xoshiro as long as you can build a consensus (with yourself :-) and we can move to the decision-making stage. I will try to update my RFC document with an explanation for the choice of Xoshiro. Here is what I understand for now:

  • the performance is good (I guess comparable to SplitMix in C?)
  • we ended up unconvinced that "being implemented in native OCaml" is an important design goal
  • the statistical properties of splitting are easier to reason about, it's less likely to turn out later to be flawed in an unforeseen way
  • it's a very modern design from a recognized expert in the field, so you/we feel good about it

@xavierleroy

The consensus is being challenged at ocaml/ocaml#10701, so don't hurry.

Note that Xoshiro can be implemented in OCaml without too much pain. (Less pain than Chacha but more pain than Splitmix.) But performance will suffer.

@gasche
Member Author

gasche commented Oct 27, 2021

The discussion at ocaml/ocaml#10701 points out that a jump-based generator makes it difficult to seed domains in a way that is independent from their scheduling, which would be solved by an ideal split function. Jump is easier to think about statistically, but it does not provide the same expressivity.
(A simple strategy to build split from jump is "sub-sequencing", but this strategy breaks down after log(P) dependent splits, where P is the period. This might be okay for domains, but it's not great.)


There is a new paper on splittable generators, LXM: better splittable pseudorandom number generators (and almost as fast) by Steele and Vigna, OOPSLA 2021. This is a combination-based generator using some of our new or old friends as building blocks (Xoshiro, Murmur3). The paper is dense, with dozens of combinations analyzed and a very thorough experimental analysis of the resulting sequences, which suggests that most combinations are statistically very robust. Basically I understand them as an "improved SplitMix", in particular with significantly larger periods (so lower chances of accidental overlap on splitting).

Note: the Related Work section mentions a "splittable generator survey" from Schaatun in 2015, which compares a large number of splittable-RNG constructions (but not SplitMix, which was not known to the author at the time), and concludes that the only robust approach is the "cryptographic" approach. My understanding is that Steele and Vigna agree that the "safest" approach to splitting (but not the fastest) is the cryptographic one.


I looked at how QuickCheck uses splitting for function generation. The way they implement Gen (a -> b), where Gen is the "random generation" monad, is by requiring an operation of type a -> Gen b -> Gen b: you have to be able to mix an input a into a generator b, to (deterministically) get an independent generator state. (There is a typeclass CoArbitrary a that expresses being able to do this for any b.) This is in turn defined in terms of a variant :: Int -> Gen b -> Gen b primitive, that "perturbs" the RNG state with an integer. This looks like a jump function, but "jump" is not good here because the generators obtained with fixed-sized jumps are not independent (in particular, jumping again on one of them may give exactly another of them). variant is implemented using an integerVariant helper function whose definition is there. It reads the binary representation of the integer and performs one split per bit, taking the left or right generator depending on the bit value.
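A simplified sketch of that per-bit splitting idea, assuming OCaml 5.0 or later for Random.State.split (which returns a new state and also advances its argument). The name variant is borrowed from QuickCheck for illustration only; QuickCheck's integerVariant uses a more careful, prefix-free encoding of the integer, which this version does not attempt.

```ocaml
(* Walk the bits of [n]; at each bit, split and keep either the child
   (bit = 1) or the advanced parent (bit = 0). *)
let variant (n : int) (st : Random.State.t) : Random.State.t =
  let rec go n st =
    if n = 0 then st
    else
      let child = Random.State.split st in
      go (n lsr 1) (if n land 1 = 1 then child else st)
  in
  (* Work on a copy so the caller's state is left untouched. *)
  go n (Random.State.copy st)
```

Because it never mutates its input, variant k st is a deterministic function of k and st, which is what generating a function graph requires.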

Taking a step back: basically the QuickCheck approach to generate a random function graph of type (a -> b) is to perform Cardinal(a) splits on the random generator, to get an independent generator for each possible a input, and then generate a b from each generator. My intuition tells me that it should be possible to do something similar using jump rather than split -- we know Cardinal(a) in advance, so even-spaced jumps should work as well as splits or better.

@xavierleroy

The discussion at ocaml/ocaml#10701 points out that a jump-based generator makes it difficult to seed domains in a way that is independent from their scheduling, which would be solved by an ideal split function

I'm not sure I understand the discussion in question. But there is no argument that split is strictly more useful than jump.

Will look at the LXM paper.

@xavierleroy

xavierleroy commented Oct 31, 2021

LXM is a very interesting design, indeed! It has everything we've been looking for: large state space, well-understood splitting, and efficient 64-bit implementation.

I implemented the L64X128 variant in Pringo and I confirm the claim of the paper: it is barely any slower than SplitMix, despite the increased state space and additional operations.
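A sketch of the L64X128 structure described in the LXM paper: a 64-bit LCG and a xoroshiro128 generator, added together and passed through a mixing function. The constants (LCG multiplier 0xd1342543de82ef95, mixer constant 0xdaba0b6eb09322e3, xoroshiro rotations 24 and 37 with shift 16) are given as we understand them from that design; treat this as illustrative, not as pringo's or the stdlib's code. Names are hypothetical, and splitting is omitted.

```ocaml
type lxm = {
  mutable s : int64;   (* LCG state *)
  a : int64;           (* LCG increment, must be odd *)
  mutable x0 : int64;  (* xoroshiro128 state, must not be all zero *)
  mutable x1 : int64;
}

let lcg_mult = 0xd1342543de82ef95L

let rotl (x : int64) (k : int) : int64 =
  Int64.logor (Int64.shift_left x k) (Int64.shift_right_logical x (64 - k))

(* Lea's 64-bit mixing function. *)
let mix (z : int64) : int64 =
  let open Int64 in
  let z = mul (logxor z (shift_right_logical z 32)) 0xdaba0b6eb09322e3L in
  let z = mul (logxor z (shift_right_logical z 32)) 0xdaba0b6eb09322e3L in
  logxor z (shift_right_logical z 32)

let make ~s ~a ~x0 ~x1 =
  if Int64.logand a 1L = 0L then invalid_arg "make: a must be odd";
  if x0 = 0L && x1 = 0L then invalid_arg "make: xoroshiro state is zero";
  { s; a; x0; x1 }

let next (g : lxm) : int64 =
  (* Combine the two sub-generators by addition, then mix. *)
  let z = mix (Int64.add g.s g.x0) in
  (* Advance the LCG. *)
  g.s <- Int64.add (Int64.mul g.s lcg_mult) g.a;
  (* Advance xoroshiro128. *)
  let q0 = g.x0 and q1 = Int64.logxor g.x1 g.x0 in
  g.x0 <- Int64.logxor (Int64.logxor (rotl q0 24) q1) (Int64.shift_left q1 16);
  g.x1 <- rotl q1 37;
  z
```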

One thing that worries me is that LXM might be covered by this rather general patent: Generating pseudorandom number sequences by nonlinear mixing of multiple subsidiary pseudorandom number generators.

@xavierleroy

One thing that worries me is that LXM might be covered by this rather general patent: Generating pseudorandom number sequences by nonlinear mixing of multiple subsidiary pseudorandom number generators .

Actually it's not that general because of the nonlinear mixing. LXM combines its L and X generators with a linear operation. The patent describes TwoLCG, an earlier design of Steele.

So, I don't think this patent applies to LXM, but I wouldn't be surprised if a patent covering LXM were in preparation.

@xavierleroy

xavierleroy commented Nov 1, 2021

New proposal based on an LXM generator: ocaml/ocaml#10742

@gasche
Member Author

gasche commented Jan 17, 2022

We did end up merging ocaml/ocaml#10742, so we have now replaced the Random algorithm by a splittable RNG, and this RFC can be closed. (Or merged? Doesn't matter, it served its purpose to facilitate and accelerate decision-making on the issue.)

@gasche gasche closed this Jan 17, 2022
6 participants