Batch action_space in VectorEnv #2280

tristandeleu · 2021-07-31T18:58:01Z

Given the discussion in #2279, here is a proposal to have a batch action_space instead of a Tuple instance.

import gym
import numpy as np
env = gym.vector.make('CartPole-v1', num_envs=5)
observations = env.reset()

print(env.action_space)
# Before: Tuple(Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2))
# After: MultiDiscrete([2 2 2 2 2])

actions = np.array([1, 0, 0, 1, 1])
observations, rewards, dones, infos = env.step(actions)
print(f'Observations shape: {observations.shape}')
# Observations shape: (5, 4)

This handles any nested action space as well:

import gym
import numpy as np

from gym.spaces import Dict, Box, Discrete
from gym.vector import AsyncVectorEnv

class CustomEnv(gym.Env):
    observation_space = Box(low=0, high=255, shape=(84, 84), dtype=np.uint8)
    action_space = Dict({
        'fire': Discrete(2),
        'jump': Discrete(2),
        'move': Box(low=-1., high=1., shape=(2,), dtype=np.float32)
    })

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        # Do something with action['fire'], action['jump'], action['move']
        observation = self.observation_space.sample()
        return observation, 0., False, {}

def make_env():
    return CustomEnv()

env = AsyncVectorEnv([make_env for _ in range(5)])
print(f'Action space: {env.action_space}')
env.reset()

actions = {
    'fire': np.array([1, 0, 0, 1, 1]),
    'jump': np.array([0, 0, 1, 1, 0]),
    'move': np.random.rand(5, 2)
}
observations, rewards, dones, infos = env.step(actions)
print(f'Observations shape: {observations.shape}')

# Action space: Dict(fire:MultiDiscrete([2 2 2 2 2]), jump:MultiDiscrete([2 2 2 2 2]), move:Box(-1.0, 1.0, (5, 2), float32))
# Observations shape: (5, 84, 84)
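
To make the layout of a batched nested action more concrete, here is a minimal sketch of how such an action could be split back into one action per sub-environment. The helper names `split_actions` and `_index` are hypothetical and are not part of gym; plain Python lists stand in for NumPy arrays, and the first axis of every leaf is assumed to be the environment axis.

```python
# Hypothetical helper (not gym's API): split a batched, possibly nested
# action into one action per sub-environment.
def _index(batched, i):
    """Pick the i-th entry along the environment axis, recursing into containers."""
    if isinstance(batched, dict):
        return {key: _index(value, i) for key, value in batched.items()}
    if isinstance(batched, tuple):
        return tuple(_index(value, i) for value in batched)
    # Leaf: a list whose first axis indexes the sub-environments.
    return batched[i]


def split_actions(batched, num_envs):
    """Yield one action per sub-environment."""
    for i in range(num_envs):
        yield _index(batched, i)


actions = {
    "fire": [1, 0, 0, 1, 1],
    "jump": [0, 0, 1, 1, 0],
    "move": [[0.1, -0.2], [0.0, 0.5], [1.0, 1.0], [-1.0, 0.3], [0.7, -0.7]],
}
per_env = list(split_actions(actions, num_envs=5))
print(per_env[0])  # {'fire': 1, 'jump': 0, 'move': [0.1, -0.2]}
```

Each sub-environment therefore receives a dict with the same keys as its own action_space, which is what the `step` method of `CustomEnv` expects.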

Use batch_space instead of Tuple in VectorEnv
Add the iterate utility function to iterate over items from a (batch) space
Add tests
Check that the action_space is the same in all sub-environments
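
As a rough illustration of the last item, the check could look something like the sketch below. It assumes that spaces support equality comparison; `ToyDiscrete`, `ToyEnv`, and `check_same_action_space` are hypothetical stand-ins and do not reflect the code in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDiscrete:
    """Toy stand-in for a discrete space; equality compares `n`."""
    n: int


class ToyEnv:
    """Toy stand-in for an environment exposing an action_space."""
    def __init__(self, n_actions):
        self.action_space = ToyDiscrete(n_actions)


def check_same_action_space(envs):
    """Return the shared action space, or raise if any sub-environment differs."""
    reference = envs[0].action_space
    for index, env in enumerate(envs[1:], start=1):
        if env.action_space != reference:
            raise RuntimeError(
                f"Sub-environment {index} has action space {env.action_space}, "
                f"expected {reference}."
            )
    return reference


shared = check_same_action_space([ToyEnv(2) for _ in range(5)])
print(shared)  # ToyDiscrete(n=2)
```

Failing early matters here because a batched action space is only well defined when every sub-environment accepts the same kind of action.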

lebrice · 2021-08-03T14:46:46Z

gym/vector/utils/spaces.py

+    >>> next(it)
+    StopIteration
+    """
+    if isinstance(space, _BaseGymSpaces):


Why not put the space as the first argument, and use a singledispatch callable here?
This would let users customize how this function should behave with their custom spaces, which isn't possible as it is.

You're also essentially doing the same thing as a singledispatch here, but worse! :P

tristandeleu · 2021-08-03

It is definitely a prime candidate for singledispatch!
I was waiting for #2093, but I'll get the ball rolling and switch this function to singledispatch already.
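
For context, `functools.singledispatch` dispatches on the type of the first argument, which is why the space has to come first. The sketch below shows the general pattern under that assumption; the `Toy*` classes are hypothetical stand-ins for space types, and the function bodies are illustrative rather than the implementation in this PR.

```python
from functools import singledispatch


class ToySpace:
    """Toy base class standing in for a space type."""


class ToyLeaf(ToySpace):
    """Leaf space: batched samples are a list with one entry per environment."""


class ToyDict(ToySpace):
    """Container space mapping keys to sub-spaces."""
    def __init__(self, spaces):
        self.spaces = spaces


@singledispatch
def iterate(space, items):
    # Fallback for space types with no registered rule.
    raise TypeError(f"No iterate rule registered for {type(space).__name__}")


@iterate.register(ToyLeaf)
def _iterate_leaf(space, items):
    return iter(items)


@iterate.register(ToyDict)
def _iterate_dict(space, items):
    keys = list(space.spaces)
    columns = [iterate(space.spaces[key], items[key]) for key in keys]
    for values in zip(*columns):
        yield dict(zip(keys, values))


# A user-defined space can plug in its own rule without touching the library.
class ToyChunked(ToySpace):
    """Custom space whose batch is stored as one flat list of fixed-size chunks."""
    def __init__(self, size):
        self.size = size


@iterate.register(ToyChunked)
def _iterate_chunked(space, items):
    for start in range(0, len(items), space.size):
        yield items[start:start + space.size]


space = ToyDict({"fire": ToyLeaf(), "move": ToyLeaf()})
rows = list(iterate(space, {"fire": [1, 0], "move": [[0.5, -0.5], [0.0, 1.0]]}))
print(rows[0])  # {'fire': 1, 'move': [0.5, -0.5]}
print(list(iterate(ToyChunked(2), [1, 2, 3, 4])))  # [[1, 2], [3, 4]]
```

Compared with a chain of `isinstance` checks, the registry keeps each rule separate and lets third-party spaces add support from their own module.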

lebrice · 2021-08-04T06:20:15Z

gym/vector/utils/spaces.py

+
+    Parameters
+    ----------
+    items : samples of `space`


Nit: those docstrings and the doctest below still use the old "items, space" argument ordering.

tristandeleu · 2021-09-10T17:56:54Z

What is the status of this PR? Any timeline for this PR getting merged?

tristandeleu · 2021-09-10T17:58:02Z

Pinging @jkterry1

jkterry1 · 2021-09-10T19:32:46Z

I'm just going to reply to all the notifications you sent in one place:

Everything regarding changes to the vector API is temporarily on hold until I get more time to develop a cohesive plan for it going forward. This has been a lower priority for me than many other fixes to Gym at the moment. The documentation stuff isn't on hold per se; there's just been a long list of problems in getting the desired website fully operational and getting other parts of the text of the documentation written.

The status of all your other PRs is that I'm aware they exist and I need to go through them again myself for one reason or another and have not had the free time. Amongst other considerations, I'm working on three first-author ICLR submissions. Gym wasn't maintained for years, so things are going to take a little while to catch up.

tristandeleu · 2021-09-11T13:55:57Z

I see, perhaps you should get other maintainers on board to not be limited by the free time you can allocate to Gym, and so that you can offload some of those tasks.

For the particular case of this PR (to stay focused), these changes were discussed in #2279, and you were welcome to comment on those changes. You could rely on the community more instead of taking the responsibility to devise a plan by yourself if you are too busy.

tristandeleu · 2021-12-07T18:56:42Z

Any update on this PR? @jkterry1

vwxyzjn · 2021-12-09T01:51:23Z

The proposed changes make sense to me. I have a quick question.

print(env.action_space)
# Before: Tuple(Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2))
# After: MultiDiscrete([2 2 2 2 2])

So what happens if you have Tuple(MultiDiscrete([2 2 2 2 2]), MultiDiscrete([2 2 2 2 2]))?

tristandeleu · 2021-12-09T02:05:31Z

You mean if the environment has action space Tuple(MultiDiscrete([2 2 2 2 2]), MultiDiscrete([2 2 2 2 2])) instead of Discrete(2)? Then it follows the rules of batch_space:

from gym.spaces import MultiDiscrete, Tuple
from gym.vector.utils.spaces import batch_space

space = Tuple((MultiDiscrete([2, 2, 2, 2, 2]), MultiDiscrete([2, 2, 2, 2, 2])))
batch_space(space, n=5)
# Tuple(Box([[0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]], [[1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]], (5, 5), int64), Box([[0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]], [[1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]], (5, 5), int64))

Just like for observation_space, any action_space can be batched if they are standard gym Space instances (i.e. Box, Discrete, Tuple, Dict, etc...).
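
The batching rules visible in the outputs in this thread (Discrete becomes MultiDiscrete, MultiDiscrete becomes an integer Box, containers are batched element by element) can be sketched with toy classes. Everything below is a hypothetical re-creation for illustration only: the `Toy*` classes and `toy_batch_space` are not gym's classes or gym's `batch_space`, and nested tuples stand in for arrays.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDiscrete:
    n: int


@dataclass(frozen=True)
class ToyMultiDiscrete:
    nvec: tuple


@dataclass(frozen=True)
class ToyBox:
    low: tuple
    high: tuple
    shape: tuple


def toy_batch_space(space, n):
    """Toy illustration of batching a space over n sub-environments."""
    if isinstance(space, ToyDiscrete):
        # n copies of Discrete(k) -> MultiDiscrete([k, ..., k])
        return ToyMultiDiscrete(nvec=(space.n,) * n)
    if isinstance(space, ToyMultiDiscrete):
        # Each component ranges over 0..k-1, repeated along a new first axis.
        row_low = (0,) * len(space.nvec)
        row_high = tuple(k - 1 for k in space.nvec)
        return ToyBox(low=(row_low,) * n, high=(row_high,) * n,
                      shape=(n, len(space.nvec)))
    if isinstance(space, ToyBox):
        # Bounds are repeated along a new first axis.
        return ToyBox(low=(space.low,) * n, high=(space.high,) * n,
                      shape=(n,) + space.shape)
    if isinstance(space, tuple):
        return tuple(toy_batch_space(sub, n) for sub in space)
    if isinstance(space, dict):
        return {key: toy_batch_space(sub, n) for key, sub in space.items()}
    raise TypeError(f"Cannot batch {type(space).__name__}")


print(toy_batch_space(ToyDiscrete(2), n=5))
# ToyMultiDiscrete(nvec=(2, 2, 2, 2, 2))
batched = toy_batch_space(ToyMultiDiscrete((2, 2, 2, 2, 2)), n=5)
print(batched.shape)  # (5, 5)
```

The recursion over tuples and dicts is what lets nested action spaces be batched with the same function as flat ones.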

vwxyzjn · 2021-12-09T02:12:20Z

No, I meant: if envs.single_action_space is MultiDiscrete([2, 2, 2, 2, 2]), what is envs.action_space?

tristandeleu · 2021-12-09T02:15:48Z

Again, it follows the rules of batch_space:

space = MultiDiscrete([2, 2, 2, 2, 2])
batch_space(space, n=5)
# Box([[0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]], [[1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]], (5, 5), int64)

vwxyzjn · 2021-12-09T02:19:42Z

I see. Thank you. The API makes sense. LGTM @jkterry1.

tristandeleu · 2021-12-09T02:19:47Z

A more complete example:

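import gym
import numpy as np

from gym.spaces import Box, MultiDiscrete
from gym.vector import AsyncVectorEnv

class CustomEnv(gym.Env):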
    observation_space = Box(low=0., high=1., shape=(2,), dtype=np.float32)
    action_space = MultiDiscrete([2, 2, 2, 2, 2])

env = AsyncVectorEnv([lambda: CustomEnv() for _ in range(5)])
print(env.observation_space)
# Box([[0. 0.]
#  [0. 0.]
#  [0. 0.]
#  [0. 0.]
#  [0. 0.]], [[1. 1.]
#  [1. 1.]
#  [1. 1.]
#  [1. 1.]
#  [1. 1.]], (5, 2), float32)

print(env.action_space)
# Box([[0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]], [[1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]], (5, 5), int64)

tristandeleu force-pushed the task/vector-batch-action-space branch from 4b8dbb5 to 7c64777 Compare July 31, 2021 20:34

lebrice reviewed Aug 3, 2021

lebrice reviewed Aug 4, 2021

tristandeleu added 9 commits December 7, 2021 13:51

Batch the action space in VectorEnv and add iterate utility function

350cd33

Add tests for iterate

21cade4

Add tests for action spaces in SyncVectorEnv and AsyncVectorEnv

05db656

Black formatting

a654eb3

Use singledispatch for iterate utility function

6fe9f36

Update the ordering of the arguments in the docstring

809df2e

Fix ordering in docstring example of iterate

f97615a

Check for same action spaces in vectorized environments

8c0cda7

Separate Discrete from other space types in iterate singledispatch

7167c33

tristandeleu force-pushed the task/vector-batch-action-space branch from 6dad0da to 7167c33 Compare December 7, 2021 18:55

jkterry1 merged commit fbe3631 into openai:master Dec 9, 2021

vwxyzjn mentioned this pull request Mar 9, 2022

[Bug Report] VectorEnv action space seed problem #2680 (closed)
