Support for VecMonitor for gym3-style environments #311

Merged
merged 35 commits into from
Apr 13, 2021
Changes from 4 commits
333482b
add vectorized monitor
vwxyzjn Jan 25, 2021
7553232
auto format of the code
vwxyzjn Jan 25, 2021
287ca33
add documentation and VecExtractDictObs
vwxyzjn Feb 5, 2021
621e8cf
Merge branch 'master' into master
araffin Feb 5, 2021
d698499
refactor and add test cases
vwxyzjn Feb 6, 2021
e5d4636
Merge branch 'master' of https://github.com/vwxyzjn/stable-baselines3
vwxyzjn Feb 6, 2021
dda2436
Merge branch 'master' into master
araffin Feb 6, 2021
696af57
add test cases and format
vwxyzjn Feb 7, 2021
65e7a0f
Merge branch 'master' of https://github.com/vwxyzjn/stable-baselines3
vwxyzjn Feb 7, 2021
23d0e69
avoid circular import and fix doc
vwxyzjn Feb 7, 2021
cc6cbc9
fix type
vwxyzjn Feb 7, 2021
aa0e400
fix type
vwxyzjn Feb 7, 2021
1c7bf32
oops
vwxyzjn Feb 7, 2021
90601fe
Merge branch 'master' into master
araffin Feb 27, 2021
4a85d21
Merge branch 'master' into master
araffin Mar 1, 2021
99229e2
Update stable_baselines3/common/monitor.py
vwxyzjn Mar 1, 2021
a64bf95
Update stable_baselines3/common/monitor.py
vwxyzjn Mar 1, 2021
59781fb
add test cases
vwxyzjn Mar 1, 2021
99022b5
update changelog
vwxyzjn Mar 1, 2021
fa09feb
Merge branch 'master' into master
araffin Mar 5, 2021
b48e956
Merge branch 'master' into master
araffin Mar 6, 2021
c07637a
fix mutable argument
vwxyzjn Mar 7, 2021
91a2fcd
quick fix
vwxyzjn Mar 7, 2021
cfbadbb
Apply suggestions from code review
araffin Mar 8, 2021
52f803b
Merge branch 'master' into master
araffin Mar 17, 2021
cfbb5f0
Merge branch 'master' into master
araffin Apr 4, 2021
4500ec9
fix terminal observation for gym3 envs
vwxyzjn Apr 6, 2021
723224b
delete comment
vwxyzjn Apr 6, 2021
153afa7
Merge branch 'master' into master
araffin Apr 13, 2021
df8e27a
Update doc and bump version
araffin Apr 13, 2021
ca56818
Add warning when already using `Monitor` wrapper
araffin Apr 13, 2021
fcc8609
Update vecmonitor tests
araffin Apr 13, 2021
e59d82c
Merge pull request #1 from vwxyzjn/vwxyzjn/master
araffin Apr 13, 2021
6a58cb7
Fixes
araffin Apr 13, 2021
bdd8b19
Merge pull request #2 from vwxyzjn/vwxyzjn/master
araffin Apr 13, 2021
12 changes: 12 additions & 0 deletions docs/guide/vec_envs.rst
@@ -90,3 +90,15 @@ VecTransposeImage

.. autoclass:: VecTransposeImage
  :members:

Member

As a general remark: as mentioned in the PR template and the contributing guide, please update the changelog as well.

VecMonitor
~~~~~~~~~~~~~~~~~

.. autoclass:: VecMonitor
  :members:

VecExtractDictObs
~~~~~~~~~~~~~~~~~

.. autoclass:: VecExtractDictObs
  :members:
2 changes: 2 additions & 0 deletions stable_baselines3/common/vec_env/__init__.py
@@ -7,7 +7,9 @@
from stable_baselines3.common.vec_env.dummy_vec_env import DummyVecEnv
from stable_baselines3.common.vec_env.subproc_vec_env import SubprocVecEnv
from stable_baselines3.common.vec_env.vec_check_nan import VecCheckNan
from stable_baselines3.common.vec_env.vec_extract_dict_obs import VecExtractDictObs
from stable_baselines3.common.vec_env.vec_frame_stack import VecFrameStack
from stable_baselines3.common.vec_env.vec_monitor import VecMonitor
from stable_baselines3.common.vec_env.vec_normalize import VecNormalize
from stable_baselines3.common.vec_env.vec_transpose import VecTransposeImage
from stable_baselines3.common.vec_env.vec_video_recorder import VecVideoRecorder
26 changes: 26 additions & 0 deletions stable_baselines3/common/vec_env/vec_extract_dict_obs.py
@@ -0,0 +1,26 @@
import time

import numpy as np

from stable_baselines3.common.vec_env.base_vec_env import VecEnv, VecEnvStepReturn, VecEnvWrapper


class VecExtractDictObs(VecEnvWrapper):
    """
    A vectorized wrapper for extracting dictionary observations.

    :param venv: The vectorized environment
    :param key: The key of the dictionary observation
    """

    def __init__(self, venv: VecEnv, key: str):
        self.key = key
        super().__init__(venv=venv, observation_space=venv.observation_space.spaces[self.key])

    def reset(self) -> np.ndarray:
        obs = self.venv.reset()
        return obs[self.key]

    def step_wait(self) -> VecEnvStepReturn:
        obs, reward, done, info = self.venv.step_wait()
        return obs[self.key], reward, done, info
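To illustrate what the wrapper above does, here is a minimal, dependency-free sketch. `FakeDictVecEnv` and `ExtractKey` are hypothetical stand-ins invented for this example (they are not part of stable-baselines3 or procgen), and plain lists replace NumPy arrays; the only point is that one key of the batched dictionary observation is forwarded while rewards, dones and infos pass through unchanged.

```python
class FakeDictVecEnv:
    """Hypothetical vectorized env whose observations are dicts of batches."""

    def __init__(self, num_envs):
        self.num_envs = num_envs

    def reset(self):
        # One entry per sub-environment under each key.
        return {
            "rgb": [[0, 0, 0] for _ in range(self.num_envs)],
            "state": [[i] for i in range(self.num_envs)],
        }

    def step_wait(self):
        obs = {
            "rgb": [[1, 1, 1] for _ in range(self.num_envs)],
            "state": [[i + 1] for i in range(self.num_envs)],
        }
        rewards = [1.0] * self.num_envs
        dones = [False] * self.num_envs
        infos = [{} for _ in range(self.num_envs)]
        return obs, rewards, dones, infos


class ExtractKey:
    """Hypothetical mirror of the wrapper logic: keep a single observation key."""

    def __init__(self, venv, key):
        self.venv = venv
        self.key = key

    def reset(self):
        return self.venv.reset()[self.key]

    def step_wait(self):
        obs, rewards, dones, infos = self.venv.step_wait()
        # Only the observation is narrowed; everything else is untouched.
        return obs[self.key], rewards, dones, infos


venv = ExtractKey(FakeDictVecEnv(num_envs=2), "rgb")
first_obs = venv.reset()
next_obs, rewards, dones, infos = venv.step_wait()
```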
72 changes: 72 additions & 0 deletions stable_baselines3/common/vec_env/vec_monitor.py
@@ -0,0 +1,72 @@
import time

import numpy as np

from stable_baselines3.common.vec_env.base_vec_env import VecEnv, VecEnvObs, VecEnvStepReturn, VecEnvWrapper


class VecMonitor(VecEnvWrapper):
    """
    A vectorized monitor wrapper for *vectorized* Gym environments. It records the episode reward, length, time and other data.

    Some environments like [`openai/procgen`](https://github.com/openai/procgen) or [`gym3`](https://github.com/openai/gym3) directly
    initialize the vectorized environments, without giving us a chance to use the `Monitor` wrapper. So this class simply does the job
    of the `Monitor` wrapper on a vectorized level.

    As an example, the following two ways of initializing vectorized envs should be equivalent:

    ```python
    from stable_baselines3.common.monitor import Monitor
    from stable_baselines3.common.vec_env import DummyVecEnv
    import gym
    def make_env(gym_id):
        def thunk():
            env = gym.make(gym_id, render_mode='rgb_array')
            env = Monitor(env)
            return env
        return thunk
    envs = DummyVecEnv([make_env('procgen-starpilot-v0')])
    ```

    ```python
    from procgen import ProcgenEnv
    from stable_baselines3.common.vec_env import VecExtractDictObs, VecMonitor
    venv = ProcgenEnv(num_envs=1, env_name='starpilot')
    venv = VecExtractDictObs(venv, "rgb")
    venv = VecMonitor(venv=venv)
    ```
    See [here](https://github.com/openai/train-procgen/blob/1a2ae2194a61f76a733a39339530401c024c3ad8/train_procgen/train.py#L36-L43) for a full example.

    :param venv: The vectorized environment
    """

    def __init__(self, venv: VecEnv):
        VecEnvWrapper.__init__(self, venv)
        self.eprets = None
        self.eplens = None
        self.epcount = 0
        self.tstart = time.time()

    def reset(self) -> VecEnvObs:
        obs = self.venv.reset()
        self.eprets = np.zeros(self.num_envs, "f")
        self.eplens = np.zeros(self.num_envs, "i")
        return obs

    def step_wait(self) -> VecEnvStepReturn:
        obs, rews, dones, infos = self.venv.step_wait()
        self.eprets += rews
        self.eplens += 1
        newinfos = list(infos[:])
        for i in range(len(dones)):
            if dones[i]:
                info = infos[i].copy()
                ret = self.eprets[i]
                eplen = self.eplens[i]
                epinfo = {"r": ret, "l": eplen, "t": round(time.time() - self.tstart, 6)}
                info["episode"] = epinfo
Member

You should probably check at the very beginning whether a `Monitor` wrapper is already present (cf. what we do in `evaluate_policy`); otherwise `VecMonitor` would overwrite the `episode` entry that `Monitor` already wrote to `info`.

Contributor Author

I am not sure how to do the check exactly...
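
One possible shape for such a check is sketched below, assuming each sub-environment exposes its inner environment through an `env` attribute, as Gym-style wrappers usually do. `BaseEnv`, `Wrapper`, `Monitor`, `is_wrapped` and `warn_if_already_monitored` are hypothetical stand-ins written for this illustration, not the stable-baselines3 implementation. The commit list for this PR includes "Add warning when already using `Monitor` wrapper", which suggests the final code warns rather than raises, so the sketch warns too.

```python
import warnings


class BaseEnv:
    """Hypothetical unwrapped environment."""


class Wrapper:
    """Hypothetical Gym-style wrapper: keeps the wrapped env in `self.env`."""

    def __init__(self, env):
        self.env = env


class Monitor(Wrapper):
    """Hypothetical stand-in for the per-environment Monitor wrapper."""


def is_wrapped(env, wrapper_class):
    """Walk the `.env` chain and report whether `wrapper_class` appears in it."""
    current = env
    while current is not None:
        if isinstance(current, wrapper_class):
            return True
        current = getattr(current, "env", None)
    return False


def warn_if_already_monitored(envs):
    """Warn once if any sub-environment is already wrapped with Monitor."""
    already_monitored = any(is_wrapped(env, Monitor) for env in envs)
    if already_monitored:
        warnings.warn(
            "The environment is already wrapped with a Monitor wrapper; "
            "its episode statistics would be overwritten.",
            UserWarning,
        )
    return already_monitored
```

Calling `warn_if_already_monitored` at the start of the constructor would surface the conflict before any `episode` entry is overwritten.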

                self.epcount += 1
                self.eprets[i] = 0
                self.eplens[i] = 0
                newinfos[i] = info
        return obs, rews, dones, newinfos
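
The per-environment bookkeeping in `step_wait` can be traced with a small, dependency-free sketch. `EpisodeTracker` is a hypothetical name introduced here, and plain lists stand in for the NumPy arrays used in the diff. It accumulates return and length for each sub-environment and, when an episode ends, attaches an `episode` entry to a copy of that environment's info dict before resetting the counters.

```python
import time


class EpisodeTracker:
    """Hypothetical, list-based mirror of the return/length bookkeeping."""

    def __init__(self, num_envs):
        self.returns = [0.0] * num_envs
        self.lengths = [0] * num_envs
        self.episode_count = 0
        self.t_start = time.time()

    def update(self, rewards, dones, infos):
        new_infos = list(infos)
        for i, (reward, done) in enumerate(zip(rewards, dones)):
            self.returns[i] += reward
            self.lengths[i] += 1
            if done:
                # Copy so the caller's original info dict is not mutated.
                info = dict(infos[i])
                info["episode"] = {
                    "r": self.returns[i],
                    "l": self.lengths[i],
                    "t": round(time.time() - self.t_start, 6),
                }
                new_infos[i] = info
                self.episode_count += 1
                # Start a fresh episode for this sub-environment only.
                self.returns[i] = 0.0
                self.lengths[i] = 0
        return new_infos


tracker = EpisodeTracker(num_envs=2)
step1 = tracker.update([1.0, 2.0], [False, False], [{}, {}])
step2 = tracker.update([1.0, 3.0], [False, True], [{}, {}])
```

After the second call, only the second sub-environment has finished an episode, so only its info dict carries an `episode` entry and only its counters are reset.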