Use SeedSequence in RandomStream #936

Closed
ricardoV94 opened this issue Apr 28, 2022 · 1 comment · Fixed by #939
Labels
enhancement (New feature or request), help wanted (Extra attention is needed), important, random variables (Involves random variables and/or sampling)

Comments

Contributor

ricardoV94 commented Apr 28, 2022

The NumPy docs advise using SeedSequence to spawn independent bit generators: https://numpy.org/doc/stable/reference/random/parallel.html

Their method should be considerably more robust to collisions than our naive random.integers(2**30) strategy, which draws each seed from only 2**30 possible values:

seed = int(self.gen_seedgen.integers(2**30))

import math

unique_seeds = 2**30
for n_seeds in (10, 100, 1_000, 10_000, 100_000):
    # Birthday-problem probability that at least two of n_seeds draws collide:
    # https://en.wikipedia.org/wiki/Birthday_problem
    p_collision = 1 - math.perm(unique_seeds, n_seeds) / unique_seeds**n_seeds
    print(f"{n_seeds=}, {p_collision=}")

n_seeds=10, p_collision=4.1909515080540416e-08
n_seeds=100, p_collision=4.610036260510597e-06
n_seeds=1000, p_collision=0.0004650875835883195
n_seeds=10000, p_collision=0.04549425469529611
n_seeds=100000, p_collision=0.9905023499278603
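
As a cross-check on the numbers above, here is a minimal sketch of the usual closed-form birthday-problem approximation, 1 - exp(-n(n-1)/(2N)). The function name `approx_collision_probability` is hypothetical and not part of any library. Unlike the exact formula, it avoids building very large integers, so it stays fast for large n_seeds.

```python
import math


def approx_collision_probability(n_draws: int, n_values: int) -> float:
    """Approximate probability that n_draws uniform draws from n_values collide."""
    # -expm1(-x) evaluates 1 - exp(-x) without losing precision when x is tiny.
    return -math.expm1(-n_draws * (n_draws - 1) / (2 * n_values))


unique_seeds = 2**30
for n_seeds in (10, 100, 1_000, 10_000, 100_000):
    p_collision = approx_collision_probability(n_seeds, unique_seeds)
    print(f"{n_seeds=}, p_collision~{p_collision:.6g}")
```

The approximation should agree closely with the exact values printed above, which suggests the collision risk is a property of the small 2**30 seed space rather than an artifact of the calculation.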
ricardoV94 changed the title from "Should we use seed sequence spawning in RandomStreams" to "Should we use seed sequence spawning in RandomStreams?" on Apr 28, 2022
ricardoV94 changed the title from "Should we use seed sequence spawning in RandomStreams?" to "Should we use SeedSequence spawning in RandomStreams?" on Apr 28, 2022
Contributor Author

ricardoV94 commented Apr 28, 2022

From my understanding, we would only need one level of spawning. For that case, the NumPy docs state that all spawned states are guaranteed to be unique, so there is no chance of collision whatsoever: https://numpy.org/doc/stable/reference/random/parallel.html#id1
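
For reference, one level of spawning with NumPy's public API could look roughly like the sketch below. This is only an illustration, not the change that would go into RandomStream. The root seed `123`, the number of children, and the choice of `PCG64` are assumptions made for the example.

```python
import numpy as np

# Hypothetical root seed, chosen only for illustration.
root = np.random.SeedSequence(123)

# One level of spawning: each child is derived from the root entropy plus
# its own spawn key, (0,), (1,), (2,), ...
children = root.spawn(5)

# Each child seeds its own bit generator, giving independent streams.
rngs = [np.random.Generator(np.random.PCG64(child)) for child in children]

for child, rng in zip(children, rngs):
    print(child.spawn_key, int(rng.integers(2**30)))
```

Each child carries a different spawn key, so the streams are distinct by construction rather than relying on randomly drawn seeds happening not to coincide.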

brandonwillard added the enhancement, help wanted, important, and random variables labels on Apr 28, 2022
brandonwillard changed the title from "Should we use SeedSequence spawning in RandomStreams?" to "Use SeedSequence in RandomStream" on Apr 29, 2022