
Deterministic sampling with Gym environments #2180

Open
MkuuWaUjinga opened this issue Nov 20, 2020 · 11 comments
Labels
bug Something isn't working

Comments

@MkuuWaUjinga

MkuuWaUjinga commented Nov 20, 2020

Hello everybody,

When I used Garage's EpsilonGreedyStrategy with Gym environments, I found that sampling is not deterministic, even though I had set the seed via deterministic.set_seed(seed).
After some investigation I found that Garage doesn't set any seeds for Gym. Is there a reason for that? As a user, I would expect Garage to handle all of this for me.
Happy to open a PR if you feel this should be added!

@yeukfu
Contributor

yeukfu commented Nov 23, 2020

Hi, thank you for trying garage.

It seems that Gym doesn't have a method that can set the seed for all environments at once. Instead, you can call env.seed(0) to set the seed for a specific environment, according to this.
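For readers unfamiliar with Gym's per-environment seeding, here is a minimal, self-contained sketch of that contract. ToyEnv is a stand-in written for this thread, not the real gym API, but it mirrors the shape of env.seed():

```python
import random

class ToyEnv:
    """Stand-in for a Gym-style environment (illustrative, not the real API)."""

    def __init__(self):
        self._rng = random.Random()  # each instance carries its own RNG

    def seed(self, seed=None):
        # Gym environments expose a per-instance seed() method that
        # returns the list of seeds actually used.
        self._rng.seed(seed)
        return [seed]

    def reset(self):
        # Observations are drawn from the per-environment RNG.
        return self._rng.random()

env_a, env_b = ToyEnv(), ToyEnv()
env_a.seed(0)
env_b.seed(0)
# Two environments seeded identically produce identical rollouts.
assert env_a.reset() == env_b.reset()
```

The point is that the seed lives on each environment instance, so a framework that wants global determinism has to call seed() on every environment it creates.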

@MkuuWaUjinga
Author

MkuuWaUjinga commented Nov 23, 2020

Thanks for your answer.
You're right that setting the seed requires an initialized environment. However, calling env.seed(0) alone doesn't make action sampling deterministic; for that, Gym has a separate seed that can be set with env.action_space.seed(seed).

So there are already two additional seeds users have to be aware of in order to make their experiments reproducible. Furthermore, there might be similar issues with other environment libraries such as dm_control or metaworld. Since reproducibility is a core promise of Garage, I'd suggest adding this information to the docs, or even providing a method for seeding the environment, such as deterministic.set_seed_env(env, seed), that sets seeds depending on the underlying environment library.

@yeukfu
Contributor

yeukfu commented Nov 23, 2020

Thank you for your informative reply. Reproducibility is our core promise. We will address this issue.

@yeukfu yeukfu added the bug Something isn't working label Nov 23, 2020
@MkuuWaUjinga
Author

Cool, I'd be happy to contribute and make a PR.

@yeukfu
Contributor

yeukfu commented Nov 23, 2020

Great! You can follow this git workflow to make a PR.

@krzentner
Contributor

Hi Adrian, thank you for offering to fix this. To get you started on where to look, I believe the best place to make this change is in GymEnv.__init__ (by retrieving the seed with deterministic.get_seed()).

The alternative way of implementing this would be to extend the environment API to include a .seed method (which would be reasonable). In that case, I think the main two places that would need to be changed are this method (which is called in each process when a sampler is constructed) as well as this function (which handles when the environment in a sampler is modified). That should be enough to ensure determinism when sampling (except for TensorFlow, which has several inherently non-deterministic kernels, and when using multi-process sampling, since the OS scheduler can affect results).

Unfortunately ensuring determinism when evaluating policies in off-policy algorithms is somewhat more complicated because there isn't a clear place to do it right now. That evaluation always gathers samples using this function, but there's no guarantee that the algorithm calls env.seed before that point. Because it's difficult to ensure env.seed gets called in these algorithms (or in other algorithms users might write), I think the first approach is probably better.
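The first approach (seeding in the constructor from a globally recorded seed) can be sketched as follows. set_seed/get_seed are simplified stand-ins for garage's deterministic module, and GymEnvSketch is illustrative rather than garage's actual GymEnv:

```python
import random

_GLOBAL_SEED = None

def set_seed(seed):
    """Simplified stand-in for garage's deterministic.set_seed."""
    global _GLOBAL_SEED
    _GLOBAL_SEED = seed

def get_seed():
    """Simplified stand-in for garage's deterministic.get_seed."""
    return _GLOBAL_SEED

class GymEnvSketch:
    """Sketch of seeding inside GymEnv.__init__ (names are illustrative)."""

    def __init__(self):
        self._rng = random.Random()
        seed = get_seed()
        if seed is not None:
            # Seed the wrapped environment at construction time, so every
            # environment created after set_seed() is deterministic without
            # algorithms or samplers having to call env.seed() themselves.
            self._rng.seed(seed)

set_seed(7)
a, b = GymEnvSketch(), GymEnvSketch()
assert a._rng.random() == b._rng.random()
```

The advantage, as noted above, is that any environment constructed after set_seed() is deterministic with no further calls needed from algorithms or samplers.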

@ryanjulian
Member

I would endorse extending the Environment API, since that would let us encode how to set the seed of each environment library in a single place. deterministic.set_seed would then have only one touch point with the Environment API.

@MkuuWaUjinga
Author

MkuuWaUjinga commented Nov 26, 2020

Thanks for your input. Both implementation proposals make sense to me. Though I find extending the Environment API more elegant, I agree that it doesn't guarantee determinism in an off-policy setting; for that, seeding has to happen as soon as the Environment subclass is created.
If we wanted to stick with extending the Environment API, we could set the space seed the first time the action_space property is accessed, or let users call env.seed() manually. But then I don't see a big difference from doing everything in __init__.
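The seed-on-first-access variant could be sketched like this; all names are illustrative, and a plain random.Random stands in for a Gym action space:

```python
import random

class LazySeedEnv:
    """Sketch: seed the action space the first time the property is read."""

    def __init__(self, seed):
        self._seed = seed
        self._action_space = None

    @property
    def action_space(self):
        if self._action_space is None:
            # First access: build the space and apply the stored seed once.
            self._action_space = random.Random()
            self._action_space.seed(self._seed)
        return self._action_space

e1, e2 = LazySeedEnv(3), LazySeedEnv(3)
# Both environments see identically seeded spaces on first access.
assert e1.action_space.randint(0, 99) == e2.action_space.randint(0, 99)
```

As the comment says, this buys little over seeding in __init__, but it shows where the lazy hook would live.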

@MkuuWaUjinga
Author

Just opened a WIP PR. We could also move the discussion there if you feel it's more convenient.

@MkuuWaUjinga
Author

Hi @ryanjulian and @krzentner,
any update on this? Otherwise I'll continue with extending the Environment API and add seeding for libraries other than Gym.

@ryanjulian
Member

I left a couple comments on the PR, otherwise this looks good to me!
