
Incompatible return format: gym's Env.reset method when using a custom env with non-(2,) dimensions for the observation space #1694

Closed
5 tasks done
macciocu opened this issue Sep 24, 2023 · 2 comments · Fixed by #1696
Labels: check the checklist · custom gym env · duplicate

Comments


macciocu commented Sep 24, 2023

🐛 Bug

There seems to be an incompatibility in the expected return format of gym's Env.reset when using a custom environment. Note that this problem only occurs when the custom observation space does not have shape (2,).

See the code example, which provides two different Env.reset return formats: Format A, the required one according to the gymnasium documentation (i.e. the same format as the return value of the step method), which results in an error during the model training step (see stack trace below); and Format B, an adjusted format with which training completes successfully but which then produces an error during model prediction (see stack trace below).
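For reference, a minimal sketch of the two return shapes discussed below (illustrative only, not taken from the actual code):

# Format A: Gymnasium-style reset, returns an (observation, info) tuple
obs, info = env.reset()
# Format B: legacy-style reset, returns only the observation array
obs = env.reset()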

I'm able to work around the problem by making the following adjustments in stable_baselines3, file common/vec_env/dummy_vec_env.py, and using the MyEnv.reset method with format A.

class DummyVecEnv(VecEnv):
    ..
    ...
    def step_wait(self) -> VecEnvStepReturn:
        ..
        ...
        if self.buf_dones[env_idx]:
                # save final observation where user can get it, then reset
                self.buf_infos[env_idx]["terminal_observation"] = obs
                # NOTE BUG (gmacciocu - 24-10-2023)
                # Results in a crash when using a custom observation space with non-(2,) dimensions,
                # as the expected return format here is incompatible with the format expected by the
                # observation-to-tensor transformation (policies.py::obs_to_tensor)
                # (original) `obs, self.reset_infos[env_idx] = self.envs[env_idx].reset()`
                obs = self.envs[env_idx].reset()

    def reset(self) -> VecEnvObs:
        for env_idx in range(self.num_envs):
            # NOTE BUG (gmacciocu - 24-10-2023)
            # Results in a crash when using a custom observation space with non-(2,) dimensions,
            # as the expected return format here is incompatible with the format expected by the
            # observation-to-tensor transformation (policies.py::obs_to_tensor)
            # (original) `obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=self._seeds[env_idx])`
            obs = self.envs[env_idx].reset(seed=self._seeds[env_idx])
            self._save_obs(env_idx, obs)

With the adjustments above I can successfully perform model training and predictions. This doesn't seem to be a feasible long-term solution though, as there is a deeper underlying problem which needs to be fixed.
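As a quick way to see whether an env meets the contract DummyVecEnv relies on (a sketch only, assuming the MyEnv class from the code example below), the Gymnasium-style reset return can be asserted directly:

# Sketch: check that reset() follows the Gymnasium API (obs, info) before wrapping the env
env = MyEnv()
result = env.reset()
assert isinstance(result, tuple) and len(result) == 2, "reset() should return (obs, info)"
obs, info = result
assert env.observation_space.contains(obs), "obs should lie in the observation space"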

Code example

# native
from typing import Optional

# installed
import numpy as np
from gymnasium import Env
from gymnasium.core import ObsType, ActType
from gymnasium.spaces import Box
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env


class MyEnv(Env):
    def __init__(self):
        self.observation_space = Box(low=-np.inf, high=np.inf, shape=(5,))
        self.action_space = Box(low=-1.0, high=1.0, shape=tuple())

    def step(self, action: ActType) -> tuple[ObsType, float, bool, bool, dict]:
        return (np.array([0, 0, 0, 0, 0], dtype=np.float32), 1, True, False, {})

    # Format A (the required return format according to gym's Env base class).
    # Causes a crash during model training (see bug report for details).
    def reset(
        self,
        *,
        seed: Optional[int] = None,
        return_info: bool = False,
        options: Optional[dict] = None,
    ) -> tuple[ObsType, dict[str, any]]:
        return (np.array([0.0, 0.0, 0.0, 0, 0], dtype=np.float32), {})

    # Format B -- keep only one of the two reset definitions when running this example.
    # Causes a crash during the model predict step (see bug report for details).
    def reset(
        self,
        *,
        seed: Optional[int] = None,
        return_info: bool = False,
        options: Optional[dict] = None,
    ) -> ObsType:
        return np.array([0.0, 0.0, 0.0, 0, 0])


env = MyEnv()
# No warnings for format A.
# This line must be disabled in order to be able to use format B
# (i.e. in order to use the workaround in the issue description).
check_env(env)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000, progress_bar=True)
model.save("x")

env.reset()
model.predict(obs)

Relevant log output / Error message

***************************************************
*** Stack trace when using MyEnv.reset format A ***
***************************************************

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Traceback (most recent call last):
  File "/.../main.py", line 121, in <module>
    train(args.model, args.steps, args.interval)
  File "/.../main.py", line 21, in train
    model.learn(total_timesteps=max_timesteps, progress_bar=True)
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/ppo/ppo.py", line 308, in learn
    return super().learn(
           ^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 246, in learn
    total_timesteps, callback = self._setup_learn(
                                ^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/base_class.py", line 424, in _setup_learn
    self._last_obs = self.env.reset()  # type: ignore[assignment]
                     ^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 78, in reset
    obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=self._seeds[env_idx])
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)

***************************************************
*** Stack trace when using MyEnv.reset format B ***
***************************************************

Traceback (most recent call last):
  File "/.../main.py", line 121, in <module>
    train(args.model, args.steps, args.interval)
  File "/.../main.py", line 35, in train
    action, _ = model.predict(obs)
                ^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/base_class.py", line 555, in predict
    return self.policy.predict(observation, state, episode_start, deterministic)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/policies.py", line 346, in predict
    observation, vectorized_env = self.obs_to_tensor(observation)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.11/site-packages/stable_baselines3/common/policies.py", line 260, in obs_to_tensor
    observation = np.array(observation)
                  ^^^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

System Info

Libraries are installed with poetry, from the following pyproject.toml configuration:

[tool.poetry]
name = "x"
version = "0.1.0"

[tool.poetry.dependencies]
python = ">=3.11,<3.12"
gymnasium = "^0.29.1"
torch = {version = "^2.0.1+cpu", source = "pytorch"}
pydantic = "^2.3.0"
numpy = "^1.26.0"
stable-baselines3 = {extras = ["extra"], version = "^2.1.0"}

[tool.poetry.group.dev.dependencies]
pytest = "^7.4.2"

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu"
priority = "supplemental"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"


araffin (Member) commented Sep 24, 2023

Hello,
I guess there must be a confusion between the gym API and the VecEnv API (see our documentation, where we highlight the differences).
The provided code is not enough to reproduce the error (obs is not defined), but the following works without any issue:

# native
from typing import Optional

# installed
import numpy as np
from gymnasium import Env
from gymnasium.core import ActType, ObsType
from gymnasium.spaces import Box

from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env


class MyEnv(Env):
    def __init__(self):
        self.observation_space = Box(low=-np.inf, high=np.inf, shape=(5,))
        self.action_space = Box(low=-1.0, high=1.0, shape=tuple())

    def step(self, action: ActType) -> tuple[ObsType, float, bool, bool, dict]:
        return (np.array([0, 0, 0, 0, 0], dtype=np.float32), 1, True, False, {})

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        return_info: bool = False,
        options: Optional[dict] = None,
    ) -> tuple[ObsType, dict[str, any]]:
        return (np.array([0.0, 0.0, 0.0, 0, 0], dtype=np.float32), {})


env = MyEnv()
check_env(env)  # does not output any warnings

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=5_000)

vec_env = model.get_env()
obs = vec_env.reset()
model.predict(obs)

The important part is here:

vec_env = model.get_env()
obs = vec_env.reset()
model.predict(obs)
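A minimal sketch of the difference (illustrative, assuming a single-env DummyVecEnv around MyEnv):

# Gymnasium API: reset() returns an (observation, info) tuple; the observation has shape (5,)
observation, info = env.reset()

# VecEnv API: reset() returns only the batched observations, with shape (n_envs, 5)
vec_obs = vec_env.reset()

# model.predict expects an observation (optionally batched), not an (observation, info) tuple,
# so feeding it the raw return value of a Gymnasium-style reset makes obs_to_tensor fail.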

Probably a duplicate of #1637 (comment) and #1661

macciocu (Author) commented Sep 24, 2023

Thank you very much @araffin! This indeed does the trick.

vec_env = model.get_env()
obs = vec_env.reset()
model.predict(obs)

FYI, I was taking obs from obs = env.reset(), with env being the instance of the custom environment.
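A short note on why that fails (a sketch, assuming gymnasium >= 0.26 and the Format A reset): with the Gymnasium API, obs = env.reset() binds the whole (observation, info) tuple rather than the observation alone:

obs = env.reset()                # obs == (np.array([...], dtype=np.float32), {})
observation, info = env.reset()  # unpacking is needed with the Gymnasium API

# The VecEnv API instead returns just the batched observations:
vec_env = model.get_env()
obs = vec_env.reset()            # numpy array with shape (1, 5) for a single DummyVecEnv
action, _ = model.predict(obs)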
