Batch action_space in VectorEnv #2280

Merged: 9 commits merged on Dec 9, 2021


@tristandeleu (Contributor) commented Jul 31, 2021

Following the discussion in #2279, here is a proposal to make the action_space of a VectorEnv a batched space instead of a Tuple instance.

import gym
import numpy as np
env = gym.vector.make('CartPole-v1', num_envs=5)
observations = env.reset()

print(env.action_space)
# Before: Tuple(Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2))
# After: MultiDiscrete([2 2 2 2 2])

actions = np.array([1, 0, 0, 1, 1])
observations, rewards, dones, infos = env.step(actions)
print(f'Observations shape: {observations.shape}')
# Observations shape: (5, 4)
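In other words, five copies of `Discrete(2)` collapse into a single `MultiDiscrete` with one component per sub-environment. The sketch below illustrates that rule with minimal, hypothetical stand-ins for the gym space classes (plain dataclasses, not the real `gym.spaces` API), so it runs without gym installed; `batch_discrete` is likewise a hypothetical helper.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for gym.spaces.Discrete / MultiDiscrete, reduced to the
# fields needed to illustrate the batching rule (not the real gym classes).
@dataclass(frozen=True)
class Discrete:
    n: int


@dataclass(frozen=True)
class MultiDiscrete:
    nvec: tuple

    def contains(self, x) -> bool:
        # One entry per component, each an integer in [0, nvec[i]).
        return len(x) == len(self.nvec) and all(
            0 <= int(a) < int(k) for a, k in zip(x, self.nvec)
        )


def batch_discrete(space: Discrete, n: int) -> MultiDiscrete:
    # n copies of Discrete(k) become a single MultiDiscrete with n components of size k.
    return MultiDiscrete(nvec=(space.n,) * n)


batched = batch_discrete(Discrete(2), n=5)
print(batched)                            # MultiDiscrete(nvec=(2, 2, 2, 2, 2))
print(batched.contains([1, 0, 0, 1, 1]))  # True: a valid batch of actions
print(batched.contains([1, 0, 2, 1, 1]))  # False: 2 is out of range for Discrete(2)
```

A batch of actions such as `[1, 0, 0, 1, 1]` is then a single element of the batched space rather than a tuple of five separate elements.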

This handles any nested action space as well:

import gym
import numpy as np

from gym.spaces import Dict, Box, Discrete
from gym.vector import AsyncVectorEnv

class CustomEnv(gym.Env):
    observation_space = Box(low=0, high=255, shape=(84, 84), dtype=np.uint8)
    action_space = Dict({
        'fire': Discrete(2),
        'jump': Discrete(2),
        'move': Box(low=-1., high=1., shape=(2,), dtype=np.float32)
    })

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        # Do something with action['fire'], action['jump'], action['move']
        observation = self.observation_space.sample()
        return observation, 0., False, {}

def make_env():
    return CustomEnv()

env = AsyncVectorEnv([make_env for _ in range(5)])
print(f'Action space: {env.action_space}')
env.reset()

actions = {
    'fire': np.array([1, 0, 0, 1, 1]),
    'jump': np.array([0, 0, 1, 1, 0]),
    'move': np.random.rand(5, 2)
}
observations, rewards, dones, infos = env.step(actions)
print(f'Observations shape: {observations.shape}')

# Action space: Dict(fire:MultiDiscrete([2 2 2 2 2]), jump:MultiDiscrete([2 2 2 2 2]), move:Box(-1.0, 1.0, (5, 2), float32))
# Observations shape: (5, 84, 84)
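Each sub-environment still expects a single action in the original `Dict` form, so a batched action has to be split along its first axis before it is dispatched. A minimal sketch of that split, assuming nested `dict`/`tuple` containers with plain lists (instead of NumPy arrays) at the leaves; `unbatch_item` and `iterate_batch` are hypothetical helpers, not the PR's `iterate` implementation.

```python
def unbatch_item(batch, index):
    # Containers (dict for Dict spaces, tuple for Tuple spaces) keep their structure;
    # anything else is treated as a leaf and indexed along the batch axis.
    if isinstance(batch, dict):
        return {key: unbatch_item(value, index) for key, value in batch.items()}
    if isinstance(batch, tuple):
        return tuple(unbatch_item(value, index) for value in batch)
    return batch[index]


def iterate_batch(batch, num_envs):
    # Yield the action for each sub-environment, one at a time.
    for index in range(num_envs):
        yield unbatch_item(batch, index)


actions = {
    'fire': [1, 0, 0, 1, 1],
    'jump': [0, 0, 1, 1, 0],
    'move': [[0.1, -0.2], [0.0, 0.5], [0.3, 0.3], [-1.0, 1.0], [0.9, 0.0]],
}
per_env = list(iterate_batch(actions, num_envs=5))
print(per_env[2])  # {'fire': 0, 'jump': 1, 'move': [0.3, 0.3]}
```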
  • Use batch_space instead of Tuple in VectorEnv
  • Add the iterate utility function to iterate over items from a (batch) space
  • Add tests
  • Check that all sub-environments have the same action_space
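The last item matters because a single batched space only describes the actions correctly when every sub-environment exposes the same action space (a `Tuple` could hold different ones). A sketch of such a check, with a hypothetical `check_same_action_spaces` helper and a dataclass standing in for a gym space; the actual check in the PR may be written differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrete:
    # Hypothetical stand-in for gym.spaces.Discrete; dataclasses compare by value.
    n: int


def check_same_action_spaces(spaces):
    # Raise if any sub-environment's action space differs from the first one.
    if not spaces:
        raise ValueError("Expected at least one sub-environment.")
    first = spaces[0]
    for index, space in enumerate(spaces[1:], start=1):
        if space != first:
            raise ValueError(
                f"Sub-environment {index} has action space {space!r}, "
                f"expected {first!r} (same as sub-environment 0)."
            )


check_same_action_spaces([Discrete(2)] * 5)  # passes silently
try:
    check_same_action_spaces([Discrete(2), Discrete(2), Discrete(3)])
except ValueError as error:
    print(error)
```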

@tristandeleu force-pushed the task/vector-batch-action-space branch from 4b8dbb5 to 7c64777 on July 31, 2021 20:34

Review thread on the `iterate` implementation (diff excerpt):
>>> next(it)
StopIteration
"""
if isinstance(space, _BaseGymSpaces):

Why not put the space as the first argument and make this a singledispatch callable?
That would let users customize how this function behaves for their own custom spaces, which isn't currently possible.


You're also essentially doing the same thing as a singledispatch here, but worse! :P

@tristandeleu (Contributor, Author):

It is definitely a prime candidate for singledispatch!
I was waiting for #2093, but I'll get the ball rolling and switch this function to singledispatch already.
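For illustration, a minimal `functools.singledispatch` version with the space as the first argument might look like the following. The `Discrete` and `MyCustomSpace` classes are hypothetical stand-ins (not gym classes), and the per-space rules are simplified; the point is that a user can register support for their own space without editing `iterate` itself.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class Discrete:
    # Hypothetical stand-in for a built-in gym space.
    n: int


@dataclass(frozen=True)
class MyCustomSpace:
    # Hypothetical user-defined space whose batched samples are flat lists.
    size: int


@singledispatch
def iterate(space, items):
    """Iterate over the per-environment elements of `items`, a batch of samples of `space`.

    The space comes first because singledispatch dispatches on the first argument's type.
    """
    raise NotImplementedError(f"Unable to iterate over a space of type {type(space).__name__}.")


@iterate.register(Discrete)
def _iterate_discrete(space, items):
    # A batch of Discrete samples is a flat sequence with one integer per environment.
    return iter(items)


@iterate.register(MyCustomSpace)
def _iterate_custom(space, items):
    # Registered from user code: split a flat batch into chunks of `space.size`.
    for start in range(0, len(items), space.size):
        yield items[start:start + space.size]


it = iterate(Discrete(2), [1, 0])
print(next(it), next(it))  # 1 0
print(list(iterate(MyCustomSpace(2), [1, 2, 3, 4])))  # [[1, 2], [3, 4]]
```

As in the doctest excerpt in the review thread, one more `next(it)` would raise `StopIteration`.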


Review thread on the `iterate` docstring (diff excerpt):
Parameters
----------
items : samples of `space`

Nit: those docstrings and the doctest below still have the "items, space" ordering

@tristandeleu (Contributor, Author):

What is the status of this PR? Any timeline for this PR getting merged?

@tristandeleu (Contributor, Author):

Pinging @jkterry1

@jkterry1 (Collaborator) commented Sep 10, 2021

I'm just going to reply to all the notifications you sent in one place:

Everything regarding changes to the vector API is temporarily on hold until I have more time to develop a cohesive plan for it going forward. This has been a lower priority for me than many other fixes to Gym at the moment. The documentation work isn't on hold per se; there has just been a long list of problems in getting the desired website fully operational and in getting other parts of the documentation text written.

The status of all your other PRs is that I'm aware they exist, and I need to go through them again myself for one reason or another but have not had the free time. Amongst other considerations, I'm working on 3 first-author ICLR submissions. Gym wasn't maintained for years, so things are going to take a little while to catch up.

@tristandeleu (Contributor, Author):

I see. Perhaps you should bring other maintainers on board, so that progress isn't limited by the free time you can allocate to Gym and you can offload some of those tasks.

For the particular case of this PR (to stay focused), these changes were discussed in #2279, and you were welcome to comment on them there. If you are too busy, you could rely more on the community instead of taking sole responsibility for devising a plan.

@tristandeleu force-pushed the task/vector-batch-action-space branch from 6dad0da to 7167c33 on December 7, 2021 18:55

@tristandeleu (Contributor, Author):

Any update on this PR? @jkterry1

@vwxyzjn (Contributor) commented Dec 9, 2021

The proposed changes make sense to me. I have a quick question.

print(env.action_space)
# Before: Tuple(Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2))
# After: MultiDiscrete([2 2 2 2 2])

So what happens if you have Tuple(MultiDiscrete([2 2 2 2 2]), MultiDiscrete([2 2 2 2 2]))?

@tristandeleu (Contributor, Author):

You mean if the environment has action space Tuple(MultiDiscrete([2 2 2 2 2]), MultiDiscrete([2 2 2 2 2])) instead of Discrete(2)? Then it follows the rules of batch_space:

from gym.spaces import MultiDiscrete, Tuple
from gym.vector.utils.spaces import batch_space

space = Tuple((MultiDiscrete([2, 2, 2, 2, 2]), MultiDiscrete([2, 2, 2, 2, 2])))
batch_space(space, n=5)
# Tuple(Box([[0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]], [[1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]], (5, 5), int64), Box([[0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]], [[1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]], (5, 5), int64))

Just like observation_space, any action_space can be batched as long as it is made of standard gym Space instances (e.g. Box, Discrete, Tuple, Dict, etc.).
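The recursion behind this can be sketched with toy space classes: container spaces (`Tuple`, `Dict`) keep their structure, and each leaf is batched independently. The classes and the `batch_space` function below are simplified, hypothetical stand-ins covering only `Discrete` leaves (batched to `MultiDiscrete`, as in the first example of this PR); they are not gym's implementation.

```python
from dataclasses import dataclass


# Simplified, hypothetical stand-ins for gym.spaces classes (not the real API).
@dataclass
class Discrete:
    n: int


@dataclass
class MultiDiscrete:
    nvec: tuple


@dataclass
class Tuple:
    spaces: tuple


@dataclass
class Dict:
    spaces: dict


def batch_space(space, n):
    if isinstance(space, Discrete):
        # Leaf rule: n copies of Discrete(k) -> MultiDiscrete with n components of size k.
        return MultiDiscrete(nvec=(space.n,) * n)
    if isinstance(space, Tuple):
        # Container rule: batch every component, keep the Tuple structure.
        return Tuple(spaces=tuple(batch_space(sub, n) for sub in space.spaces))
    if isinstance(space, Dict):
        # Container rule: batch every value, keep the keys.
        return Dict(spaces={key: batch_space(sub, n) for key, sub in space.spaces.items()})
    raise NotImplementedError(f"No batching rule for {type(space).__name__} in this sketch.")


nested = Dict(spaces={'fire': Discrete(2), 'aim': Tuple(spaces=(Discrete(3), Discrete(4)))})
print(batch_space(nested, n=2))
```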

@vwxyzjn (Contributor) commented Dec 9, 2021

No, I meant: if envs.single_action_space is MultiDiscrete([2, 2, 2, 2, 2]), what is envs.action_space?

@tristandeleu (Contributor, Author):

Again, it follows the rules of batch_space:

from gym.spaces import MultiDiscrete
from gym.vector.utils.spaces import batch_space

space = MultiDiscrete([2, 2, 2, 2, 2])
batch_space(space, n=5)
# Box([[0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]], [[1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]], (5, 5), int64)
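Reading off the output above, the batched space stacks `n` copies of the single-environment bounds: lower bounds of 0 and upper bounds of `nvec - 1` (a `MultiDiscrete` component of size `k` takes values in `{0, ..., k - 1}`), giving shape `(n, len(nvec))`. A plain-Python sketch of those bounds, where `batched_multidiscrete_bounds` is a hypothetical helper rather than gym's code:

```python
def batched_multidiscrete_bounds(nvec, n):
    # Bounds of the Box that n stacked MultiDiscrete(nvec) samples live in.
    low = [[0] * len(nvec) for _ in range(n)]         # every component starts at 0
    high = [[k - 1 for k in nvec] for _ in range(n)]  # component i goes up to nvec[i] - 1
    shape = (n, len(nvec))
    return low, high, shape


low, high, shape = batched_multidiscrete_bounds([2, 2, 2, 2, 2], n=5)
print(shape)    # (5, 5)
print(high[0])  # [1, 1, 1, 1, 1]
print(batched_multidiscrete_bounds([3, 4], n=2)[1])  # [[2, 3], [2, 3]]
```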

@vwxyzjn (Contributor) commented Dec 9, 2021

I see. Thank you. The API makes sense. LGTM @jkterry1.

@tristandeleu (Contributor, Author):

A more complete example:

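import gym
import numpy as np
from gym.spaces import Box, MultiDiscrete
from gym.vector import AsyncVectorEnv

class CustomEnv(gym.Env):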
    observation_space = Box(low=0., high=1., shape=(2,), dtype=np.float32)
    action_space = MultiDiscrete([2, 2, 2, 2, 2])

env = AsyncVectorEnv([lambda: CustomEnv() for _ in range(5)])
print(env.observation_space)
# Box([[0. 0.]
#  [0. 0.]
#  [0. 0.]
#  [0. 0.]
#  [0. 0.]], [[1. 1.]
#  [1. 1.]
#  [1. 1.]
#  [1. 1.]
#  [1. 1.]], (5, 2), float32)

print(env.action_space)
# Box([[0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]], [[1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]
#  [1 1 1 1 1]], (5, 5), int64)
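Both batched spaces follow the same pattern: a new leading axis of size `num_envs` is prepended to the single-environment shape, and the bounds are repeated along it. A dependency-free sketch for a `Box` with scalar bounds, such as `Box(low=0., high=1., shape=(2,))` above; `full` and `batch_box` are hypothetical helpers that use nested lists in place of NumPy arrays and do not handle array-valued bounds.

```python
def full(shape, value):
    # Nested lists of the given shape, every entry equal to `value`.
    if not shape:
        return value
    return [full(shape[1:], value) for _ in range(shape[0])]


def batch_box(low, high, shape, n):
    # Prepend the batch axis, then broadcast the scalar bounds over the new shape.
    batched_shape = (n,) + tuple(shape)
    return full(batched_shape, low), full(batched_shape, high), batched_shape


low, high, shape = batch_box(0.0, 1.0, (2,), n=5)
print(shape)  # (5, 2)
print(low)    # [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
```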

@jkterry1 merged commit fbe3631 into openai:master on Dec 9, 2021