You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The VectorizeAction wrapper is not correctly implemented.
The problem lies in how self.out is initialized, how iterate is called and how concatenate is called. There is a confusion between what should be self.action_space, self.single_action_space, self.env.action_space and self.env.single_action_space.
Cf. example/test below.
I did not submit a PR because the contributing guidelines mention that you have to run the test suite, but the jax functional test seem to run forever on my machine. PR is ready though, let me know if you want me to submit it.
Proposed solution:
diff --git a/gymnasium/wrappers/vector/vectorize_action.py b/gymnasium/wrappers/vector/vectorize_action.py
index 3dc4a797..89677da0 100644
--- a/gymnasium/wrappers/vector/vectorize_action.py
+++ b/gymnasium/wrappers/vector/vectorize_action.py
@@ -138,7 +138,7 @@ class VectorizeTransformAction(VectorActionWrapper):
self.action_space = batch_space(self.single_action_space, self.num_envs)
self.same_out = self.action_space == self.env.action_space
- self.out = create_empty_array(self.single_action_space, self.num_envs)
+ self.out = create_empty_array(self.env.single_action_space, self.num_envs)
def actions(self, actions: ActType) -> ActType:
"""Applies the wrapper to each of the action.
@@ -154,17 +154,17 @@ class VectorizeTransformAction(VectorActionWrapper):
self.single_action_space,
tuple(
self.wrapper.func(action)
- for action in iterate(self.action_space, actions)
+ for action in iterate(self.single_action_space, actions)
),
actions,
)
else:
return deepcopy(
concatenate(
- self.single_action_space,
+ self.env.single_action_space,
tuple(
self.wrapper.func(action)
- for action in iterate(self.env.action_space, actions)
+ for action in iterate(self.single_action_space, actions)
),
self.out,
)
Code example
"""Test suite for VectorizeAction wrapper."""
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from gymnasium.vector import SyncVectorEnv
from tests.testing_env import GenericTestEnv
def create_env():
return GenericTestEnv(
action_space=spaces.Box(
low=0.0,
high=1.0,
shape=(1,),
dtype=np.float32
)
)
def test_vectorize_box_to_dict_action():
def func(x): return x["key"]
env = SyncVectorEnv([create_env for_in range(2)])
env = gym.wrappers.vector.VectorizeTransformAction(
env=env,
wrapper=gym.wrappers.TransformAction,
func=func,
action_space=gym.spaces.Dict(
{
"key": gym.spaces.Box(
low=0.0,
high=1.0,
shape=(1,),
dtype=np.float32
)
}
)
)
env.reset()
# Check that func transforms the outer env action into the inner env action
assert func(
env.single_action_space.sample()
) in env.env.single_action_space
# Check that the VectorizeWrapper correctly vectorizes func
assert env.actions(env.action_space.sample()) in env.env.action_space
System info
installed form source with pip install -e
version 1.0.0
Debian 12
Python 3.11.2
Additional context
No response
Checklist
I have checked that there is no similar issue in the repo
The text was updated successfully, but these errors were encountered:
In the message of this commit, there is some reasoning about which action space should be used where. Maybe that helps identifying issues.
I think the current implementation was adapted from the VectorizeObservation wrapper. But TransformObservation and TransformAction work in an opposite direction. TransformObservation wants a func that transforms an inner obs to an outer obs, TransformAction wants a func that transforms an outer action to an inner action. Hence the reversal of self.env.x_space and self.x_space in the VectorizeObservation and VectorizeAction wrappers.
Yes, in short, the problem was that when I was implementing it I was thinking about the data "moving" in the same direction of observation rather than the opposite.
I'm adding some more testing but yes, I think the primary issue was the out was using self.single_action_space not self.env.single_action_space as you corrected.
Also looking at the actions function, the two cases are identical except for the concatenate output which should be actions and out respectively.
Currently, I'm adding more testing for this to doubly check this.
Describe the bug
The
VectorizeAction
wrapper is not correctly implemented.The problem lies in how
self.out
is initialized, howiterate
is called and howconcatenate
is called. There is a confusion between what should beself.action_space
,self.single_action_space
,self.env.action_space
andself.env.single_action_space
.Cf. example/test below.
I did not submit a PR because the contributing guidelines mention that you have to run the test suite, but the jax functional test seem to run forever on my machine. PR is ready though, let me know if you want me to submit it.
Proposed solution:
Code example
System info
Additional context
No response
Checklist
The text was updated successfully, but these errors were encountered: