Remove rng seeder #5787
Codecov Report
@@            Coverage Diff             @@
##             main    #5787      +/-   ##
==========================================
+ Coverage   89.36%   89.38%   +0.01%
==========================================
  Files          74       74
  Lines       13804    13764      -40
==========================================
- Hits        12336    12303      -33
+ Misses       1468     1461       -7
@ricardoV94 I can't invest the time for a thorough review right now: the diff is a little big, and seeding is a sensitive and tricky topic.
I have a bunch of questions @ricardoV94. The aesaraf functions look good, but I have some questions regarding sampling. How does the random seed go into the step methods at the moment? Passing different "seeds" (the `entropy` parameter) to `SeedSequence` isn't the best way to ensure independent streams. The best way is to spawn as many seeders as necessary. These will all get the same `entropy` value, but they'll have different `spawn_key` values. We should aim to use this instead of different entropies somehow.
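For reference, the spawning approach described above can be sketched with plain NumPy. This is a minimal illustration, not code from this PR; the seed value `1234` and the chain count of 4 are arbitrary assumptions.

```python
import numpy as np

# Hypothetical user-provided seed; any integer (or None) would work here.
root = np.random.SeedSequence(1234)

# Spawn one child SeedSequence per chain. Every child keeps the parent's
# entropy but receives a distinct spawn_key.
children = root.spawn(4)
spawn_keys = [child.spawn_key for child in children]

# Build one Generator per chain from its own child SeedSequence.
rngs = [np.random.default_rng(child) for child in children]
draws = [rng.normal() for rng in rngs]
```

Each child shares `entropy == 1234` and differs only in `spawn_key`, which is what makes the resulting streams independent without inventing separate entropy values by hand.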
In forward sampling, we always go via a single [...]. In MCMC sampling, all our step methods rely on global seeding. We have to create different integer seeds for each chain. Ideally, we would be passing Generators around instead, and we would probably spawn them (or their SeedSequences) before branching off. This, however, will require a larger refactoring. See #5093
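As a rough illustration of the "different integer seeds for each chain" workaround, one option is to derive them from a single `SeedSequence`. The helper name `chain_seeds` is hypothetical and is not the function used in PyMC.

```python
import numpy as np


def chain_seeds(random_seed, chains):
    """Derive one integer seed per chain from a single user seed.

    Hypothetical helper for illustration only.
    """
    seed_seq = np.random.SeedSequence(random_seed)
    # generate_state returns `chains` 32-bit unsigned words derived
    # deterministically from the entropy of the SeedSequence.
    return [int(word) for word in seed_seq.generate_state(chains)]


seeds = chain_seeds(42, chains=4)
```

Each integer could then be handed to the process that runs one chain, which is the only thing a globally seeded step method can consume. Passing spawned Generators instead would avoid the round trip through integers, at the cost of the larger refactoring mentioned above.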
…bal seeding.

Together, `test_sample_does_not_set_seed` and `test_parallel_sample_does_not_reuse_seed` covered two unspoken behaviors of `sample`:

1. When no seed is specified, PyMC shall not set the global seed state of NumPy in the main process.
2. When no seed is specified, sampling will depend on NumPy's global seeding state for reproducible behavior.

Point 1 is due to PyMC's legacy dependency on global seeding for step samplers. It tries to minimize "damage" by only setting global seeds when it absolutely needs to, in order to ensure deterministic sampling. Ideally, calls to `numpy.random.seed` would never be made.

Point 2 goes against NumPy's current best practice of using None when defining new Generators / SeedSequences (https://numpy.org/doc/stable/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence)

The refactored tests cover point 1 more directly, and assert the opposite of point 2.
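The two properties can be checked along the following lines. This is only a sketch: `draw_without_global_state` is a hypothetical stand-in for a sampling function, not `pymc.sample`, and the real tests in the PR may be structured differently.

```python
import numpy as np


def draw_without_global_state(n, random_seed=None):
    """Hypothetical stand-in for a sampling function using a local Generator."""
    rng = np.random.default_rng(random_seed)
    return rng.normal(size=n)


def global_state_unchanged(func, *args, **kwargs):
    """Return True if calling func leaves NumPy's global RandomState untouched."""
    before = np.random.get_state()
    func(*args, **kwargs)
    after = np.random.get_state()
    # The legacy state is a tuple: (name, key array, pos, has_gauss, cached_gaussian)
    return (
        before[0] == after[0]
        and np.array_equal(before[1], after[1])
        and before[2:] == after[2:]
    )


# Point 1: the function does not modify global seeding state.
point_1_holds = global_state_unchanged(draw_without_global_state, 5)

# Opposite of point 2: with no seed given, results do not follow the global seed.
np.random.seed(0)
first = draw_without_global_state(3)
np.random.seed(0)
second = draw_without_global_state(3)
ignores_global_seed = not np.array_equal(first, second)
```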
…s in compiled function

Sampling functions now also accept RandomState or Generators as input to `random_seed`, similarly to how `random_state` behaves in scipy distributions. For backwards compatibility, this argument was not renamed.
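One way such an argument could be normalized is sketched below. The helper `seed_to_int` and the `2**30` bound are assumptions made for illustration; they are not taken from the PyMC code base.

```python
import numpy as np


def seed_to_int(random_seed=None):
    """Reduce None, an int, a RandomState, or a Generator to one integer seed.

    Hypothetical helper for illustration only.
    """
    if isinstance(random_seed, np.random.Generator):
        # Draws from (and therefore advances) the caller's Generator.
        return int(random_seed.integers(2**30))
    if isinstance(random_seed, np.random.RandomState):
        # Draws from (and therefore advances) the caller's RandomState.
        return int(random_seed.randint(2**30))
    # None or int: let SeedSequence supply or mix the entropy.
    return int(np.random.SeedSequence(random_seed).generate_state(1)[0])


from_int = seed_to_int(123)
from_generator = seed_to_int(np.random.default_rng(123))
from_random_state = seed_to_int(np.random.RandomState(123))
from_none = seed_to_int()
```

Note that passing a Generator or RandomState consumes a draw from it, so repeated calls with the same object yield different seeds, much like repeated draws from a scipy distribution given the same `random_state` object.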
Looks very good @ricardoV94!
@ricardoV94, I’ll let you merge since I’m not sure if you want to squash merge or rebase and preserve the history.
Always rebase with me :)
Thanks for reviewing!
This PR removes the `Model(rng_seeder=seed)` API, as discussed in #5785, and instead makes use of the `random_seed` kwarg in all sampling functions. One big breaking change, made on purpose (but not strictly needed), is that these functions will no longer respect user-defined external global seeding. This is done so that default seeding quality can be "as good as possible", as suggested by NumPy best practice: https://numpy.org/doc/stable/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence
Is everyone okay with this change?
Also allowed passing RandomState or Generators to all sampling functions for user convenience.
Closes #5785
Closes #5733
Closes #5784
Closes #4301