Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dict space support for VecEnv #207

Merged
merged 14 commits into from
Mar 8, 2019
Merged

Conversation

AdamGleave
Copy link
Collaborator

OpenAI added support for Dict observation spaces to DummyVecEnv and SubprocVecEnv after the fork. This PR cherry-picks these changes, while retaining the improvements made by Stable Baselines. This will be needed (amongst other changes) to support use cases such as #133.

OpenAI also added ShmemVecEnv, with common code factored out into util.py. I've retained the util.py as a separate file in case we wish to also port over ShmemVecEnv in the future.

@araffin
Copy link
Collaborator

araffin commented Feb 18, 2019

Hello,
Thanks for this pr. Before any review, please read "CONTRIBUTING.md" carefully ;)

@AdamGleave
Copy link
Collaborator Author

Thanks @araffin. I think the only things I missed were updating the change log (done now) and creating an accompanying issue for the PR (closes #208). Let me know if there are any changes you'd like to see.

There's also a case for adding Tuple support, as discussed in the issue; let me know if you'd like to see this and I can add it as well.

@araffin
Copy link
Collaborator

araffin commented Feb 18, 2019

I think the only things I missed

Yes for the changelog, but I was more concern about the import and the documentation of each function (and the doc style).

@AdamGleave
Copy link
Collaborator Author

Thanks for the comments. Re: the imports, I assume the issue is with the relative import style. This was already present in DummyVecEnv so I did not want to change it, but have now switched to an absolute import style.

I've expanded on the documentation for the functions added and made them a consistent style. Happy to elaborate on this more but the functions added in the PR are all module-private so documentation doesn't seem as important as if part of public API.

@araffin
Copy link
Collaborator

araffin commented Feb 19, 2019

I should have been more explicit, relative import are fine, i was refering to the import collection that was not at the top of the file.
Sure, go ahead if you can also add support for tuple =)!
Every function documentation matter ;)
EDIT: i will try to do a full review this week, but not sure to have the time.

@AdamGleave
Copy link
Collaborator Author

I've added support for Tuple observations, ready for review.

One area I'd appreciate feedback is what axis ordering we want for actions. Note there are two axes in the observations/actions for Tuple and Dict spaces in VecEnv environments: the index into the tuple or dict, and the index of the environment.

For observations, the first axis indexes into the tuple or dict, and the environment id second. This follows the convention used in OpenAI master, and I think is generally a reasonable design choice: e.g. one might want to feed a dict of observations of differently shaped values into different placeholders in a TensorFlow graph.

However, I'm less sure what the right choice is for actions. Right now, step_async follows the opposite convention: first index is environment, second index is into tuple or dict. This is the same convention as in OpenAI master, and does have its advantages: e.g. random sampling is easy, [action_space.sample() for _ in range(num_envs)]. However, I dislike the inconsistency, and for my own application (multi-agent RL, where each tuple index corresponds to a different agent) it would be more convenient to have the tuple or dict axis come first.

Converting between these two conventions is just a transpose operation, so the user can always change it, but it'd be good to get the default right.

@araffin araffin requested review from hill-a, araffin and ernestum and removed request for hill-a February 22, 2019 13:13
@araffin
Copy link
Collaborator

araffin commented Feb 26, 2019

Hello,

I did a quick test today and found a minor inconsistency:

  • the implementation seems inconsistent between DummyVecEnv and SubprocVecEnv: one requires an ordered dict while the other is ok with a normal dict. Also, that is a problem because it just crashes if this requirement is not satisfied.

It also seems that the doc needs to be updated (+ adding a note saying that currently no model supports those spaces).

I'll try to have a deeper look in the coming days but I'm quite busy right now...

PS: please add your name at the bottom of the changelog too ;)

@AdamGleave
Copy link
Collaborator Author

Thanks for the review and catching the error with dict v.s. OrderedDict. Gym is a little unclear on this point: the Dict space has spaces be an OrderedDict and sample always returns an OrderedDict, but contains accepts any dict instance (and some Gym environments return unordered dicts as observations). Following Postel's law, I've modified the VecEnv classes to accept unordered dict as observations, but to always return an OrderedDict to the user. I've also added a test case to verify this behaviour.

What docs would you like me to update? I believe the VecEnv classes should now support all observation spaces supported in Gym; do you think this should be stated explicitly (currently the docs do not mention anything regarding supported observation spaces)?

@araffin
Copy link
Collaborator

araffin commented Feb 28, 2019

What docs would you like me to update?

I would say this one.

do you think this should be stated explicitly (currently the docs do not mention anything regarding supported observation spaces)?

yes, that what I meant for updating the doc. Please add a note about what is supported by the vec env.
For the algorithm, we already have the table in the doc and a more complete one in the readme.

Copy link
Collaborator

@araffin araffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor changes required.
For your question about the order (key of the dict first or env idx first), I need to think a bit more about that.

stable_baselines/common/vec_env/util.py Outdated Show resolved Hide resolved
stable_baselines/common/vec_env/util.py Outdated Show resolved Hide resolved
stable_baselines/common/vec_env/subproc_vec_env.py Outdated Show resolved Hide resolved
stable_baselines/common/vec_env/subproc_vec_env.py Outdated Show resolved Hide resolved
@AdamGleave
Copy link
Collaborator Author

Thanks for your comments. I've added documentation on supported spaces to the vector environment page, and also clarified on the RL algorithms page that the algorithms do not currently support Dict and Tuple spaces. I believe I've addressed the comments you made on the code. I also realised while writing the documentation I was missing test cases for some of the more obscure spaces (e.g. MultiBinary) so added those.

docs/guide/algos.rst Outdated Show resolved Hide resolved
docs/guide/vec_envs.rst Outdated Show resolved Hide resolved
araffin
araffin previously approved these changes Mar 8, 2019
@araffin
Copy link
Collaborator

araffin commented Mar 8, 2019

Thank you very much for your good PR.
Just minor details in the doc, it will be merged after ;)

@AdamGleave
Copy link
Collaborator Author

I've made the suggested improvements to the documentation, thanks for the feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants