-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Rayon support to Rand #566
Comments
Considering that one of the reasons to choose an RNG is to have control over reproducibility, which cannot be guaranteed when using automatic migration of some samples to sub-RNGs, there is less need to have this flexibility.
If we switch streams and ensure each thread has a unique stream this isn't an issue. If we use freshly-seeded RNGs this is harder to reason about; with CSPRNGs there is no problem since these must ensure any output cannot be used to predict future output; with some simple PRNGs like Xorshift this is definitively a problem (without an extra permutation step); with other PRNGs I don't think statistical quality/independence is easy to reason about, though e.g. the PCG family are supposed to be difficult to predict. There is also a potential circular dependency problem. |
Wouldn't it be much simpler if the thread pool managed the RNGs? This should also be more performant, because then the RNG initialization is only done once per worker instead of once per stolen work. This could possibly implemented in /// For each thread, initialize an RNG from the thread ID.
pub fn with_rng<C>(self, rng_constructor: C) -> ThreadPoolBuilder where
C: Fn(usize) + Send + Sync + 'static -> impl rand_core::RngCore,
|
Isn't that equivalent to using |
Yes, you could choose and initialize the RNG, which is not possible with Interestingly enough, rayon already uses an internal per-thread RNG as an implementation detail for deciding which job to steal: pub struct WorkerThread {
/// the "worker" half of our local deque
worker: Worker<JobRef>,
index: usize,
/// are these workers configured to steal breadth-first or not?
breadth_first: bool,
/// A weak random number generator.
rng: UnsafeCell<rand::XorShiftRng>,
registry: Arc<Registry>,
}
|
We ran into a similar dilemma with But once you implement
If "normal" PRNGs are supposed to be deterministic, then I would say this is problematic, but may be something you could just paper over as a documented limitation. The simplest test is that
I don't know too much about PRNGs, but it seems to me that one which actually degrades on splitting is not really suitable here.
Suppose I tried to zip your random source with my own vector of 100 heavy work items. Your forced Generally, I think splitting isn't something you should have to worry much about. Rayon will start with approximately the number of threads, and only split further when work is actively stolen. This can lead to small splits at the tail end of a run, but it's usually not noticeable except in microbenchmarks. Plus your benchmark can always call
Is your split so expensive that it would be noticeable compared to practical use of this data? Your benchmarks are just collecting the random data, so the workload is basically just a memory write. @vks - I don't think we'd want to make |
Taking a step back - If you just had some RNG wrapper which resets/reseeds when cloned, then I think one could just use that as the value for (0..100_000).into_par_iter().map_with(RngCloner(r), |r, i| i + r.gen())... Would this meet your needs? (Are your needs even well defined?) |
Thank you for the reply, very helpful!
Yes, that is exactly my problem! In one way it looks a bit ugly (
Returning the right number of items seems easy, I now just use a counter.
In a way it is funny how much people can value that their "random" numers are deterministic 😄. But I understand why (reproducibility of results), so definitely something to document. But it does not seem like Rayon depends on
I have a little trouble imagining when somebody would do so, but it would not be good style then... Maybe it is best to just recommend users use
For some RNGs it could be basically free, for others I suspect one split can easily take as much time as 50 iterations of practical use (and for HC-128 something like a 1000 iterations, but I suppose that just disqualifies it here).
That is an interesting function! Sounds almost as something we could use. But how do you prevent a second clone from ending up the same as the first clone? Except by always reseeding the RNG with something like |
We don't prevent them from looking the same, nor does rayon even care. But with a custom wrapper, |
I'm not a specialist of rayon at all, but about parallel independent random generator, you should look at the Lecuyer random generator. It allows one to have parallel independent (reproducible) streams of random numbers. |
Does this matter if it is behind a feature flag?
I think it would be nice to make it reproducible given a seed, but I'm not sure whether that is feasible with the work-stealing approach. |
I don't see how the flag changes that consideration -- it's still a first-class part of the API for semver concerns. Even our former "unstable" feature caused problems when we made any changes, which is why we switched to the out-of-band
Note that any kind of thread-local solution also fails to be reproducible, since you can't control which thread each part will run on. Only an indexed |
Thanks for all the help so far! I am going with the wrapper approach. Wrote a lengthier comment a couple of days ago in this issue, but somehow it is lost... Using And if a user wants reproducible results I am thinking of adding a comment somewhere in our documentation to use both @cuviper For me this issue can be closed. But as there are some more ideas floating around in this issue, can I leave deciding on that to you? |
Maybe one more question: Do you maybe want to give some input on rust-random/rand#413? The idea is to require a lower rustc version for |
There's also #136 as an enhancement request for a parallel PRNG. I'll file a quick bug about the possibility of rolling our own PRNG for rayon-core. If there are any other individual ideas that should be tracked, please comment or raise them again in separate issues! |
I have made a PR to Rand to add support for Rayon. It is very much in the 'just exploring' phase. It would be great if someone who is more familiar with Rayon could weight in.
We have one basic problem: every PRNG needs to store its state somewhere and mutate it to produce the next random number. Just cloning the state would produce two identical streams of random numbers.
One solution that works right now is to use
thread_rng()
rust-random/rand#398. It does nothing special (from our perspective), it just sets up one RNG in thread-local storage for every thread in Rayons thread-pool. This may be a costly operation, but only has to be done once during the lifetime of the application. At least if the thread-pool also remains available all that time.What I am exploring is somehow splitting/reseeding an RNG every time Rayon splits a job. If this a good idea at all, or in no way better than using RNGs in thread-local storage, is something to explore. And so are the details around splitting RNGs and keeping things correct, but that seems like a problem for us in Rand. It would have the advantage that users can choose their favourite, or most appropriate, RNG.
Can someone review the approach, and maybe answer some questions?
Is implementing
IndexedParallelIterator
, and setting the number of values to generate up front the right choice? I tried working withUnindexedProducer
, but that expects to somehow finish by itself, which is something an RNG never does.Is my abuse of
DoubleEndedIterator
okay, to implement the necessary traits? It will just produce the next random number likeIterator
, it can't produce something like the previous random number. For normal PRNGs that is a computationally expensive operation (and not supported by Rand), and for cryptographic PRNGs impossible.It is hard to predict what will happen to the statistical quality of PRNGs when they are split off many times, and only very short runs are used. So it seems sensible to set a minimum for the number of items that should be used from the PRNG, to hopefully keep similar statistical properties as one PRNG used continuously. Is setting a hard limit with
min_len
the best approach? Which Rayon supports increasing by users?And from the final comment (currently): I wonder if Rayon is designed do deal with situations like ours, where every split of the job is not a basically free operation? So ideally it would split the job as little as possible, instead of splitting it into many manageable chunks. I don't know what drives Rayons decision here...
The text was updated successfully, but these errors were encountered: