
Incompatible return format: gym's Env.reset method when using a custom env with non-(2,) dimensions for the observation space #1694

Closed
5 tasks done
macciocu opened this issue Sep 24, 2023 · 2 comments · Fixed by #1696
Labels: check the checklist · custom gym env · duplicate

Comments


macciocu commented Sep 24, 2023

🐛 Bug

There seems to be an incompatibility in the expected return format of gym's Env.reset when using a custom environment. Note that this problem only occurs when the custom observation space does not have shape (2,).

See the code example, which provides two different Env.reset return formats: Format A, the required one according to the gymnasium documentation (i.e. the same format as the return value of the step method), which results in an error during the model training step (see stack trace below); and Format B, an adjusted format with which training completes successfully but which then produces an error during model prediction (see stack trace below).
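For reference, a minimal sketch of the two return shapes discussed below (illustrative only, not taken from the actual code):

# Format A: Gymnasium-style reset, returns an (observation, info) tuple
obs, info = env.reset()
# Format B: legacy-style reset, returns only the observation array
obs = env.reset()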

I'm able to work around the problem by making the following adjustments in stable_baselines3, file common/vec_env/dummy_vec_env.py, and using the MyEnv.reset method with format A.

class DummyVecEnv(VecEnv):
    ..
    ...
    def step_wait(self) -> VecEnvStepReturn:
        ..
        ...
        if self.buf_dones[env_idx]:
                # save final observation where user can get it, then reset
                self.buf_infos[env_idx]["terminal_observation"] = obs
                # NOTE BUG (gmacciocu - 24-10-2023)
                # Results in a crash when using a custom observation space with non-(2,) dimensions,
                # as the expected return format here is incompatible with the format expected by the
                # observation-to-tensor transformation (policies.py::obs_to_tensor)
                # (original) `obs, self.reset_infos[env_idx] = self.envs[env_idx].reset()`
                obs = self.envs[env_idx].reset()

    def reset(self) -> VecEnvObs:
        for env_idx in range(self.num_envs):
            # NOTE BUG (gmacciocu - 24-10-2023)
            # Results in a crash when using a custom observation space with non-(2,) dimensions,
            # as the expected return format here is incompatible with the format expected by the
            # observation-to-tensor transformation (policies.py::obs_to_tensor)
            # (original) `obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=self._seeds[env_idx])`
            obs = self.envs[env_idx].reset(seed=self._seeds[env_idx])
            self._save_obs(env_idx, obs)

With the adjustments above I can successfully perform model training and predictions. This doesn't seem to be a feasible long-term solution though, as there is a deeper underlying problem which needs to be fixed.
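As a quick way to see whether an env meets the contract DummyVecEnv relies on (a sketch only, assuming the MyEnv class from the code example below), the Gymnasium-style reset return can be asserted directly:

# Sketch: check that reset() follows the Gymnasium API (obs, info) before wrapping the env
env = MyEnv()
result = env.reset()
assert isinstance(result, tuple) and len(result) == 2, "reset() should return (obs, info)"
obs, info = result
assert env.observation_space.contains(obs), "obs should lie in the observation space"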

Code example

# native
from typing import Optional

# installed
import numpy as np
from gymnasium import Env
from gymnasium.core import ObsType, ActType
from gymnasium.spaces import Box
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env


class MyEnv(Env):
    def __init__(self):
        self.observation_space = Box(low=-np.inf, high=np.inf, shape=(5,))
        self.action_space = Box(low=-1.0, high=1.0, shape=tuple())

    def step(self, action: ActType) -> tuple[ObsType, float, bool, bool, dict]:
        return (np.array([0, 0, 0, 0, 0], dtype=np.float32), 1, True, False, {})

    # Format A (the required return format according to gym's Env base class).
    # Causes a crash during model training (see bug report for details).
    def reset(
        self,
        *,
        seed: Optional[int] = None,
        return_info: bool = False,
        options: Optional[dict] = None,
    ) -> tuple[ObsType, dict[str, any]]:
        return (np.array([0.0, 0.0, 0.0, 0, 0], dtype=np.float32), {})

    # Format B -- keep only one of the two reset definitions when running this example.
    # Causes a crash during the model predict step (see bug report for details).
    def reset(
        self,
        *,
        seed: Optional[int] = None,
        return_info: bool = False,
        options: Optional[dict] = None,
    ) -> ObsType:
        return np.array([0.0, 0.0, 0.0, 0, 0])


env = MyEnv()
# No warnings for format A.
# This line must be disabled in order to be able to use format B
# (i.e. in order to use the workaround in the issue description).
check_env(env)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000, progress_bar=True)
model.save("x")

env.reset()
model.predict(obs)

Relevant log output / Error message

***************************************************
*** Stack trace when using MyEnv.reset format A ***
***************************************************

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Traceback (most recent call last):
  File "/.../main.py", line 121, in <module>
    train(args.model, args.steps, args.interval)
  File "/.../main.py", line 21, in train
    model.learn(total_timesteps=max_timesteps, progress_bar=True)
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/ppo/ppo.py", line 308, in learn
    return super().learn(
           ^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 246, in learn
    total_timesteps, callback = self._setup_learn(
                                ^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/base_class.py", line 424, in _setup_learn
    self._last_obs = self.env.reset()  # type: ignore[assignment]
                     ^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 78, in reset
    obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=self._seeds[env_idx])
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)

***************************************************
*** Stack trace when using MyEnv.reset format B ***
***************************************************

Traceback (most recent call last):
  File "/.../main.py", line 121, in <module>
    train(args.model, args.steps, args.interval)
  File "/.../main.py", line 35, in train
    action, _ = model.predict(obs)
                ^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/base_class.py", line 555, in predict
    return self.policy.predict(observation, state, episode_start, deterministic)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/policies.py", line 346, in predict
    observation, vectorized_env = self.obs_to_tensor(observation)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/policies.py", line 260, in obs_to_tensor
    observation = np.array(observation)
                  ^^^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

System Info

Libraries are installed with poetry, from the following pyproject.toml configuration:

[tool.poetry]
name = "x"
version = "0.1.0"

[tool.poetry.dependencies]
python = ">=3.11,<3.12"
gymnasium = "^0.29.1"
torch = {version = "^2.0.1+cpu", source = "pytorch"}
pydantic = "^2.3.0"
numpy = "^1.26.0"
stable-baselines3 = {extras = ["extra"], version = "^2.1.0"}

[tool.poetry.group.dev.dependencies]
pytest = "^7.4.2"

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu"
priority = "supplemental"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"


araffin (Member) commented Sep 24, 2023

Hello,
I guess there must be a confusion between the gym API and the VecEnv API (see our documentation, where we highlight the differences).
The provided code is not enough to reproduce the error (obs is not defined), but the following works without any issue:

# native
from typing import Optional

# installed
import numpy as np
from gymnasium import Env
from gymnasium.core import ActType, ObsType
from gymnasium.spaces import Box

from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env


class MyEnv(Env):
    def __init__(self):
        self.observation_space = Box(low=-np.inf, high=np.inf, shape=(5,))
        self.action_space = Box(low=-1.0, high=1.0, shape=tuple())

    def step(self, action: ActType) -> tuple[ObsType, float, bool, bool, dict]:
        return (np.array([0, 0, 0, 0, 0], dtype=np.float32), 1, True, False, {})

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        return_info: bool = False,
        options: Optional[dict] = None,
    ) -> tuple[ObsType, dict[str, any]]:
        return (np.array([0.0, 0.0, 0.0, 0, 0], dtype=np.float32), {})


env = MyEnv()
check_env(env)  # does not output any warnings

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=5_000)

vec_env = model.get_env()
obs = vec_env.reset()
model.predict(obs)

The important part is here:

vec_env = model.get_env()
obs = vec_env.reset()
model.predict(obs)
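A minimal sketch of the difference (illustrative, assuming a single-env DummyVecEnv around MyEnv):

# Gymnasium API: reset() returns an (observation, info) tuple; the observation has shape (5,)
observation, info = env.reset()

# VecEnv API: reset() returns only the batched observations, with shape (n_envs, 5)
vec_obs = vec_env.reset()

# model.predict expects an observation (optionally batched), not an (observation, info) tuple,
# so feeding it the raw return value of a Gymnasium-style reset makes obs_to_tensor fail.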

Probably a duplicate of #1637 (comment) and #1661

macciocu (Author) commented Sep 24, 2023

Thank you very much @araffin! This indeed does the trick.

vec_env = model.get_env()
obs = vec_env.reset()
model.predict(obs)

FYI, I was taking obs from obs = env.reset(), with env being the instance of the custom environment.
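A short note on why that fails (a sketch, assuming gymnasium >= 0.26 and the Format A reset): with the Gymnasium API, obs = env.reset() binds the whole (observation, info) tuple rather than the observation alone:

obs = env.reset()                # obs == (np.array([...], dtype=np.float32), {})
observation, info = env.reset()  # unpacking is needed with the Gymnasium API

# The VecEnv API instead returns just the batched observations:
vec_env = model.get_env()
obs = vec_env.reset()            # numpy array with shape (1, 5) for a single DummyVecEnv
action, _ = model.predict(obs)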
