Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Support for VecMonitor for gym3-style environments #302

Closed
1 task done
vwxyzjn opened this issue Jan 25, 2021 · 6 comments
Closed
1 task done
Labels
enhancement New feature or request

Comments

@vwxyzjn
Copy link
Contributor

vwxyzjn commented Jan 25, 2021

🚀 Feature

Gym3-style environments such a procgen directly produces the vectorized environments, so there is no chance of passing a Monitor wrapper to the envs creation process. However, there still is a need to record the episodic stats. One potential way to do this is through the VecMonitor wrapper as done in openai/baselines

### Check list

  • I have checked that there is no similar issue in the repo (required)
@vwxyzjn vwxyzjn added the enhancement New feature or request label Jan 25, 2021
@araffin
Copy link
Member

araffin commented Jan 25, 2021

Hello,

I think VecMonitor would be a good addition, however, I'm not sure that this would make SB3 compatible with Gym3 by any mean...
Also, apply VecMonitor at that stage won't give the true reward if wrappers are used before.

@vwxyzjn
Copy link
Contributor Author

vwxyzjn commented Jan 26, 2021

Hi Antonin,

Thanks for the prompt response.

I'm not sure that this would make SB3 compatible with Gym3 by any mean...

I think this addition should at least push SB3 closer to be compatible with Gym3 envs, that has an API that produces the SB3-style vectorized env. For example, in procgen, they used baselines to train and initialize the env in the following way

# https://github.com/openai/train-procgen/blob/1a2ae2194a61f76a733a39339530401c024c3ad8/train_procgen/train.py#L36-L43
venv = ProcgenEnv(num_envs=num_envs, env_name=env_name, num_levels=num_levels, start_level=start_level, distribution_mode=distribution_mode)
venv = VecExtractDictObs(venv, "rgb")
venv = VecMonitor(
    venv=venv, filename=None, keep_buf=100,
)
venv = VecNormalize(venv=venv, ob=False)

apply VecMonitor at that stage won't give the true reward if wrappers are used before.

That's a great catch. As shown by the snippet above, the users should be quite careful about the order in which the vec wrapper is applied (e.g. VecMonitor should be applied before VecNormalize); same as how regular Monitor wrapper should be applied with care.

@araffin
Copy link
Member

araffin commented Jan 26, 2021

I think this addition should at least push SB3 closer to be compatible with Gym3 envs, that has an API that produces the SB3-style vectorized env. For example, in procgen, they used baselines to train and initialize the env in the following way

Interesting. Then it should work (but the env must be always wrapped with a VecEnvWrapper (VecMonitor should do the job) as we are doing internal checks and wrapping it automatically (in a DummyVecEnv) if needed, but here if we do that, it will fail).

@araffin
Copy link
Member

araffin commented Feb 1, 2021

In case it was not clear, you can go ahead and implement the VecMonitor, make sure to read the contributing guide ;)

@vwxyzjn
Copy link
Contributor Author

vwxyzjn commented Feb 5, 2021

Hey sorry for the delay. I made a fork and added the changes but was having trouble setting up test cases; this could take some time. Additionally, I noticed the existing gym monitor class also has the feature to save a CSV; would this be a desired feature as well?

@araffin
Copy link
Member

araffin commented May 23, 2021

fixed in #311

@araffin araffin closed this as completed May 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants