Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CCRandom initializes its own default state in a non reproducible way #350

Open
nilsbecker opened this issue Feb 4, 2021 · 11 comments
Open

Comments

@nilsbecker
Copy link
Contributor

nilsbecker commented Feb 4, 2021

let __default_state = Random.State.make_self_init ()

here an internal __default_state is created with a random seed, which is later used by default in the run functions. this default state is different from the one in the included Stdlib.Random module, namely Random.default, afaict. i came across this because i encountered non-reproducible simulation runs.

i worked around it by always calling run ~st:(Random.get_default_state ()) which copies the default state from Stdlib.Random, which i make sure to initialized in a reproducible way. i think this copying costs performance; a better way would be to create, initialize and later supply my own random state to give to the ~st argument of run.

anyway, i don't quite understand why CCRandom does not reuse the default state in Stdlib.Random? that way, Random.init would make CCRandom.run with default state reproducible, iiuc.

@c-cube
Copy link
Owner

c-cube commented Feb 4, 2021

Hmm, I don't see a Random.default in merlin nor in the stdlib manual, is it a recent addition?

I see your point about reproducibility, but on the other hand when I call random() i don't necessarily expect it to be the same on each run of a program. I think both defaults have issues. I'm open to using a fixed seed, but I think it needs more discussion.

What do other programming languages do here? If they have a similar situation?

@bluddy
Copy link
Contributor

bluddy commented Feb 4, 2021

Looks like you should call Random.State.get_state() for the default init, so that in case the user seeded the random state (to maintain reproducibility), you use the same state.

@nilsbecker
Copy link
Contributor Author

nilsbecker commented Feb 4, 2021

i think i didn't make my point very clearly, sorry about that. Random.default is not exposed; you can find it in random.ml only. it is the state that gets used (and mutated) when calling Random functions without referring to a specific state.

i didn't mean to suggest using a deterministic seed by default in CCRandom. just calling random should give a pseudo-random, non-deterministic value. however i think it's important to have a way to apply a deterministic seed to make simulations reproducible, if desired. this is possible in Stdlib with Random.init, for all random generators that use the default state at once. currently, with the way CCRandom handles state, i did not see how that can be done. instead one has to specify a state as the ~st argument to Random.run.

@bluddy: that was my workaround: i explicitly called run ~st:(get_state ()). but i think it's not ideal since get_state copies an allocates needlessly. imo it would be better to take (the non-exposed) Random.default directly, if possible. then Random.init would also seed CCRandom.run reproducibly.

@bluddy
Copy link
Contributor

bluddy commented Feb 4, 2021

@nilsbecker it's not possible because it's not exposed in the interface. It's also just one allocation and really not a huge deal IMO.

@nilsbecker
Copy link
Contributor Author

nilsbecker commented Feb 4, 2021

@bluddy ah. i thought maybe include lets you include the raw module without filtering by signature but i guess that would break abstraction. also apparently it's not possible to blit from the default state with the provided interface of Random.State, which would have enabled pre-allocating.

then indeed get_state () seems the only way to get at the state. in my stochastic simulations i call run a lot and i do see a 15-20% performance hit from it in a specific example, which i can avoid by allocating my own state.

there is another unwanted effect of get_state. copying the state means the original state is not modified by creation of a new random number. so calling Random.run ~st:Random.(get_state ()) rng twice in a row will yield two times the same random number (!)

@c-cube
Copy link
Owner

c-cube commented Feb 4, 2021

we could have a toplevel let default_state = Random.get_state() which is called early on, and then use that one as a default: let run ?(state=default_state) g = …

@nilsbecker
Copy link
Contributor Author

nilsbecker commented Feb 4, 2021

i guess. however, Random.set_state state would then not set that one. ah, maybe one could override it: in CCRandom:

let default_state = ref (Random.get_state ())
let set_state state = default_state := state; Stdlib.Random.set_state state

looks ugly but at least a reset would do what one would expect. it's a duplication of effort but i don't see how one could hook into the chain of states that Stdlib.Random generates in its own rngs.

but what should (CC)Random.get_state () then return? the original one? the extra one? i think one wants the invariant that first calling Random.get_state () and then running a rng will yield the same result as let st = Random.get_state () in Random.set_state st; and then running the rng. i think this is not so with the snippet above, since the CCRandom default state will in general have been different from the Stdlib.Random default state before calling set_state. damn.

@c-cube
Copy link
Owner

c-cube commented Feb 4, 2021

This seems confusing, because it depends on whether or not you open Containers;;.

It seems like we have conflicting requirements here:

  • be compatible with OCaml as a drop-in replacement
  • be non-surprising (as in, calling run g several times should look random rather than always returning the same thing)

I'm not sure we can satisfy both. For the second one we could at least make the __default_state deterministic using a fixed seed (which wouldn't be the same as Random, but I think that's ok, one should not depend on that).

@nilsbecker
Copy link
Contributor Author

nilsbecker commented Feb 4, 2021

i would think that not having

* calling `run g` several times should look random rather than always returning the same thing

would be so surprising to qualify as a bug. there is also

  • calling run g without initialization should give pseudo random results on successive runs of the program

which i think is nice to have and expected but not having it may be less of a problem?

[edit: deleted, i think this was not correct. the CCrandom generators mostly require an explicit state argument, so there is no problem. only run seems to be problematic]

@nilsbecker
Copy link
Contributor Author

nilsbecker commented Feb 4, 2021

so a solution could be to be explicit that run without ~st updates its own private state, and to supply setter and getter functions set_run_state and get_run_state specifically to handle it?

@c-cube
Copy link
Owner

c-cube commented Feb 12, 2021

I don't know what to think. set_run_state and get_run_state do seem like a reasonable idea, provided it's all well documented. This is all mostly a documentation issue as people can have as much control as they want already by just passing ~st explicitly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants