
Deterministic sampling with Gym environments #2180

Open
MkuuWaUjinga opened this issue Nov 20, 2020 · 11 comments
Labels
bug Something isn't working

Comments

@MkuuWaUjinga

MkuuWaUjinga commented Nov 20, 2020

Hello everybody,

When I used Garage's EpsilonGreedyStrategy with Gym environments, I found that sampling is not deterministic, even though I had set the seed via deterministic.set_seed(seed).
After some investigation I found that Garage doesn't set any seeds for Gym. Is there a reason for that? As a user, I would expect Garage to handle all of this for me.
Happy to open a PR if you feel this should be added!

@yeukfu
Contributor

yeukfu commented Nov 23, 2020

Hi, thank you for trying garage.

It seems that Gym doesn't have a method that can set the seed for all environments at once. Instead, you can call env.seed(0) to set the seed for a specific environment, according to this.
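For readers unfamiliar with Gym's per-environment seeding, here is a minimal, self-contained sketch of that contract. ToyEnv is a stand-in written for this thread, not the real gym API, but it mirrors the shape of env.seed():

```python
import random

class ToyEnv:
    """Stand-in for a Gym-style environment (illustrative, not the real API)."""

    def __init__(self):
        self._rng = random.Random()  # each instance carries its own RNG

    def seed(self, seed=None):
        # Gym environments expose a per-instance seed() method that
        # returns the list of seeds actually used.
        self._rng.seed(seed)
        return [seed]

    def reset(self):
        # Observations are drawn from the per-environment RNG.
        return self._rng.random()

env_a, env_b = ToyEnv(), ToyEnv()
env_a.seed(0)
env_b.seed(0)
# Two environments seeded identically produce identical rollouts.
assert env_a.reset() == env_b.reset()
```

The point is that the seed lives on each environment instance, so a framework that wants global determinism has to call seed() on every environment it creates.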

@MkuuWaUjinga
Author

MkuuWaUjinga commented Nov 23, 2020

Thanks for your answer.
You're right that setting the seed requires an initialized environment. However, calling env.seed(0) alone doesn't make action sampling deterministic; for that, Gym has a separate seed that can be set with env.action_space.seed(seed).

So there are already two additional seeds users have to be aware of in order to make their experiments reproducible. Furthermore, there might be similar issues with other environment libraries such as dm_control or metaworld. Since reproducibility is a core promise of Garage, I'd suggest adding this information to the docs, or even providing a method for seeding the environment, such as deterministic.set_seed_env(env, seed), that sets seeds depending on the underlying environment library.

@yeukfu
Contributor

yeukfu commented Nov 23, 2020

Thank you for your informative reply. Reproducibility is our core promise. We will address this issue.

@yeukfu yeukfu added the bug Something isn't working label Nov 23, 2020
@MkuuWaUjinga
Author

Cool, I'd be happy to contribute and make a PR.

@yeukfu
Contributor

yeukfu commented Nov 23, 2020

Great! You can follow this git workflow to make a PR.

@krzentner
Contributor

Hi Adrian, thank you for offering to fix this. To get you started on where to look, I believe the best place to make this change is in GymEnv.__init__ (by retrieving the seed with deterministic.get_seed()).

The alternative way of implementing this would be to extend the environment API to include a .seed method (which would be reasonable). In that case, I think the main two places that would need to be changed are this method (which is called in each process when a sampler is constructed) as well as this function (which handles when the environment in a sampler is modified). That should be enough to ensure determinism when sampling (except for TensorFlow, which has several inherently non-deterministic kernels, and when using multi-process sampling, since the OS scheduler can affect results).

Unfortunately ensuring determinism when evaluating policies in off-policy algorithms is somewhat more complicated because there isn't a clear place to do it right now. That evaluation always gathers samples using this function, but there's no guarantee that the algorithm calls env.seed before that point. Because it's difficult to ensure env.seed gets called in these algorithms (or in other algorithms users might write), I think the first approach is probably better.
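The first approach (seeding in the constructor from a globally recorded seed) can be sketched as follows. set_seed/get_seed are simplified stand-ins for garage's deterministic module, and GymEnvSketch is illustrative rather than garage's actual GymEnv:

```python
import random

_GLOBAL_SEED = None

def set_seed(seed):
    """Simplified stand-in for garage's deterministic.set_seed."""
    global _GLOBAL_SEED
    _GLOBAL_SEED = seed

def get_seed():
    """Simplified stand-in for garage's deterministic.get_seed."""
    return _GLOBAL_SEED

class GymEnvSketch:
    """Sketch of seeding inside GymEnv.__init__ (names are illustrative)."""

    def __init__(self):
        self._rng = random.Random()
        seed = get_seed()
        if seed is not None:
            # Seed the wrapped environment at construction time, so every
            # environment created after set_seed() is deterministic without
            # algorithms or samplers having to call env.seed() themselves.
            self._rng.seed(seed)

set_seed(7)
a, b = GymEnvSketch(), GymEnvSketch()
assert a._rng.random() == b._rng.random()
```

The advantage, as noted above, is that any environment constructed after set_seed() is deterministic with no further calls needed from algorithms or samplers.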

@ryanjulian
Member

I would endorse extending the Environment API, since that would let us encode how to set the seed of each environment library in a single place. deterministic.set_seed would then have only one touch point with the Environment API.

@MkuuWaUjinga
Author

MkuuWaUjinga commented Nov 26, 2020

Thanks for your input. Both implementation proposals make sense to me. Though I find extending the Environment API more elegant, I agree that it doesn't guarantee determinism in an off-policy setting; for that, seeding has to happen as soon as the Environment subclass is created.
If we wanted to stick with extending the Environment API, we could set the space seed the first time the action_space property is accessed, or let users call env.seed() manually. But then I don't see a big difference from doing everything in __init__.
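The seed-on-first-access variant could be sketched like this; all names are illustrative, and a plain random.Random stands in for a Gym action space:

```python
import random

class LazySeedEnv:
    """Sketch: seed the action space the first time the property is read."""

    def __init__(self, seed):
        self._seed = seed
        self._action_space = None

    @property
    def action_space(self):
        if self._action_space is None:
            # First access: build the space and apply the stored seed once.
            self._action_space = random.Random()
            self._action_space.seed(self._seed)
        return self._action_space

e1, e2 = LazySeedEnv(3), LazySeedEnv(3)
# Both environments see identically seeded spaces on first access.
assert e1.action_space.randint(0, 99) == e2.action_space.randint(0, 99)
```

As the comment says, this buys little over seeding in __init__, but it shows where the lazy hook would live.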

@MkuuWaUjinga
Author

Just opened a WIP PR. We could also move the discussion there if you feel it's more convenient.

@MkuuWaUjinga
Author

Hi @ryanjulian and @krzentner,
any update on this? Otherwise I'll continue with extending the Environment API and add seeding for libraries other than Gym.

@ryanjulian
Member

I left a couple comments on the PR, otherwise this looks good to me!
