robo-gym check env issue #866

Closed
isaacncz opened this issue Apr 15, 2022 · 6 comments
Labels
custom gym env, more information needed, No tech support, question

Comments

@isaacncz
isaacncz commented Apr 15, 2022

Important Note: We do not do technical support, nor consulting and don't answer personal questions per email.
Please post your question on the RL Discord, Reddit or Stack Overflow in that case.

🤖 Custom Gym Environment

Please check your environment first using:

from stable_baselines3.common.env_checker import check_env

env = gym.make('EndEffectorPositioningURSim-v0', ip=target_machine_ip, gui=True)
# It will check your custom environment and output additional warnings if needed
check_env(env)

### Describe the bug

A clear and concise description of what the bug is.
Running check_env on this custom environment fails with the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/isaac/robogym_ws/test.ipynb Cell 3' in <cell line: 1>()
----> 1 check_env(env)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py:291, in check_env(env, warn, skip_render_check)
    289 # The check only works with numpy arrays
    290 if _is_numpy_array_space(observation_space) and _is_numpy_array_space(action_space):
--> 291     _check_nan(env)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py:93, in _check_nan(env)
     91 for _ in range(10):
     92     action = np.array([env.action_space.sample()])
---> 93     _, _, _, _ = vec_env.step(action)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py:162, in VecEnv.step(self, actions)
    155 """
    156 Step the environments with the given action
    157
    158 :param actions: the action
    159 :return: observation, reward, done, information
    160 """
    161 self.step_async(actions)
--> 162 return self.step_wait()

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/vec_check_nan.py:35, in VecCheckNan.step_wait(self)
     34 def step_wait(self) -> VecEnvStepReturn:
---> 35     observations, rewards, news, infos = self.venv.step_wait()
     37     self._check_val(async_step=False, observations=observations, rewards=rewards, news=news)
     39     self._observations = observations

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py:51, in DummyVecEnv.step_wait(self)
     49         obs = self.envs[env_idx].reset()
     50     self._save_obs(env_idx, obs)
---> 51 return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones), deepcopy(self.buf_infos))

File /usr/lib/python3.8/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File /usr/lib/python3.8/copy.py:205, in _deepcopy_list(x, memo, deepcopy)
    203 append = y.append
    204 for a in x:
--> 205     append(deepcopy(a, memo))
    206 return y

File /usr/lib/python3.8/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File /usr/lib/python3.8/copy.py:230, in _deepcopy_dict(x, memo, deepcopy)
    228 memo[id(x)] = y
    229 for key, value in x.items():
--> 230     y[deepcopy(key, memo)] = deepcopy(value, memo)
    231 return y

File /usr/lib/python3.8/copy.py:161, in deepcopy(x, memo, _nil)
    159 reductor = getattr(x, "__reduce_ex__", None)
    160 if reductor is not None:
--> 161     rv = reductor(4)
    162 else:
    163     reductor = getattr(x, "__reduce__", None)

TypeError: cannot pickle 'google.protobuf.pyext._message.ScalarMapContainer' object

The observation space: Box([ -inf -inf -inf -1.1 -1.1 -1.1 -1.1 -1.1 -1.1 -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -1.01 -1.01 -1.01 -1.01 -1.01 -1.01], [ inf inf inf 1.1 1.1 1.1 1.1 1.1 1.1 inf inf inf inf inf inf inf inf inf inf inf inf 1.01 1.01 1.01 1.01 1.01 1.01], (27,), float32)
The action space: Box([-1. -1. -1. -1. -1.], [1. 1. 1. 1. 1.], (5,), float32)

### Code example

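import gym
import robo_gym
from robo_gym.wrappers.exception_handling import ExceptionHandling

import stable_baselines3 as sb3
from stable_baselines3 import SAC, PPO
from stable_baselines3.common.env_checker import check_env

# Placeholder (assumption): IP of the machine running the robo-gym server
target_machine_ip = '127.0.0.1'

# Same environment creation as in the snippet at the top of this issue
env = gym.make('EndEffectorPositioningURSim-v0', ip=target_machine_ip, gui=True)
check_env(env)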

Please try to provide a minimal example to reproduce the bug.

I was running the example here.
https://github.com/jr-robotics/robo-gym/blob/master/docs/environments.md#end-effector-positioning

For a custom environment, you need to give at least the observation space, action space, reset() and step() methods
(see working example below).
Error messages and stack traces are also helpful.

Please use the markdown code blocks
for both code and stack traces.

import gym
import numpy as np

from stable_baselines3 import A2C
from stable_baselines3.common.env_checker import check_env


class CustomEnv(gym.Env):

  def __init__(self):
    super(CustomEnv, self).__init__()
    self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(14,))
    self.action_space = gym.spaces.Box(low=-1, high=1, shape=(6,))

  def reset(self):
    return self.observation_space.sample()

  def step(self, action):
    obs = self.observation_space.sample()
    reward = 1.0
    done = False
    info = {}
    return obs, reward, done, info

env = CustomEnv()
check_env(env)

model = A2C("MlpPolicy", env, verbose=1).learn(1000)
Traceback (most recent call last): File ...

### System Info
Describe the characteristic of your environment:

  • Describe how the library was installed (pip, docker, source, ...)
  • GPU models and configuration
  • Python version
  • PyTorch version
  • Gym version
  • Versions of any other relevant libraries

You can use sb3.get_system_info() to print relevant packages info:

import stable_baselines3 as sb3
sb3.get_system_info()

OS: Linux-5.13.0-39-generic-x86_64-with-glibc2.29 #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022
Python: 3.8.10
Stable-Baselines3: 1.5.0
PyTorch: 1.11.0+cu113
GPU Enabled: True
Numpy: 1.20.0
Gym: 0.21.0

({'OS': 'Linux-5.13.0-39-generic-x86_64-with-glibc2.29 #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022',
  'Python': '3.8.10',
  'Stable-Baselines3': '1.5.0',
  'PyTorch': '1.11.0+cu113',
  'GPU Enabled': 'True',
  'Numpy': '1.20.0',
  'Gym': '0.21.0'},
 'OS: Linux-5.13.0-39-generic-x86_64-with-glibc2.29 #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022\nPython: 3.8.10\nStable-Baselines3: 1.5.0\nPyTorch: 1.11.0+cu113\nGPU Enabled: True\nNumpy: 1.20.0\nGym: 0.21.0\n')

Additional context

Add any other context about the problem here.

### Checklist

  • [x] I have read the documentation (required)
  • [x] I have checked that there is no similar issue in the repo (required)
  • [x] I have checked my env using the env checker (required)
  • [x] I have provided a minimal working example to reproduce the bug (required)
@isaacncz isaacncz added the custom gym env and question labels Apr 15, 2022
@araffin araffin added the No tech support label Apr 15, 2022
@Miffyli
Collaborator

Miffyli commented Apr 15, 2022

Hey. It seems like the problem stems from the custom environment you are using. For some reason, calling deepcopy on the info buffer it returns raises this error. This cannot really be fixed in stable-baselines3, but you could try creating a wrapper for your environment that returns normal dictionaries and numpy arrays instead of these Google protobuf objects.

Edit: see answer below ^^
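
A minimal sketch of the conversion such a wrapper would need, assuming the failing value is a mapping-like container nested inside the info dict. The helper `to_plain`, the stand-in class `UnpicklableMap`, and the info keys are all hypothetical; none of them come from robo-gym or stable-baselines3, and the stand-in only imitates the deepcopy failure shown in the traceback.

```python
import copy
from collections.abc import Mapping


def to_plain(value):
    """Recursively rebuild mapping-like and list/tuple containers as builtin dicts and lists."""
    if hasattr(value, "items"):
        # Duck-typed check: anything exposing .items() is rebuilt as a plain dict
        return {to_plain(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    # Scalars, strings, numpy arrays, etc. are passed through unchanged
    return value


class UnpicklableMap(Mapping):
    """Hypothetical stand-in for a container that copy.deepcopy cannot handle."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __reduce_ex__(self, protocol):
        # deepcopy falls back to __reduce_ex__, which is where the real error surfaced
        raise TypeError("cannot pickle 'UnpicklableMap' object")


# Hypothetical info dict, shaped like what an env's step() might return
info = {"status": "running", "target": UnpicklableMap({"x": 0.1, "y": 0.2})}

try:
    copy.deepcopy(info)
    deepcopy_failed = False
except TypeError:
    deepcopy_failed = True

clean_info = to_plain(info)
copied = copy.deepcopy(clean_info)  # works once only builtin containers remain
```

In a `gym.Wrapper` subclass, the overridden `step` could return `obs, reward, done, to_plain(info)`. This has not been tested against robo-gym itself.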

@araffin
Member

araffin commented Apr 15, 2022

Hello,
check_env is made for gym.Env environments, not for already-vectorized ones (if you are using Isaac Gym).
You should use a VecEnvWrapper to use it with SB3, see #772 (comment)

@araffin araffin added the more information needed label Apr 15, 2022
@isaacncz
Author

The environment was built using the Gym API. Could you clarify whether a VecEnvWrapper is required?
I would really appreciate your input. I found that the observation space is returned with
return gym.spaces.Box(low=min_obs, high=max_obs, dtype=np.float32)

and when I run
model = SAC("MultiInputPolicy", env, verbose=1)

the output was

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/isaac/robogym_ws/test.ipynb Cell 6' in <cell line: 1>()
----> 1 model = SAC("MultiInputPolicy", env, verbose=1)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py:144, in SAC.__init__(self, policy, env, learning_rate, buffer_size, learning_starts, batch_size, tau, gamma, train_freq, gradient_steps, action_noise, replay_buffer_class, replay_buffer_kwargs, optimize_memory_usage, ent_coef, target_update_interval, target_entropy, use_sde, sde_sample_freq, use_sde_at_warmup, tensorboard_log, create_eval_env, policy_kwargs, verbose, seed, device, _init_setup_model)
    141 self.ent_coef_optimizer = None
    143 if _init_setup_model:
--> 144     self._setup_model()

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py:147, in SAC._setup_model(self)
    146 def _setup_model(self) -> None:
--> 147     super(SAC, self)._setup_model()
    148     self._create_aliases()
    149     # Target entropy is used when learning the entropy coefficient

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py:216, in OffPolicyAlgorithm._setup_model(self)
    205 if self.replay_buffer is None:
    206     self.replay_buffer = self.replay_buffer_class(
    207         self.buffer_size,
    208         self.observation_space,
   (...)
    213         **self.replay_buffer_kwargs,
    214     )
--> 216 self.policy = self.policy_class(  # pytype:disable=not-instantiable
    217     self.observation_space,
    218     self.action_space,
    219     self.lr_schedule,
    220     **self.policy_kwargs,  # pytype:disable=not-instantiable
    221 )
    222 self.policy = self.policy.to(self.device)
    224 # Convert train freq parameter to TrainFreq object

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:498, in MultiInputPolicy.__init__(self, observation_space, action_space, lr_schedule, net_arch, activation_fn, use_sde, log_std_init, sde_net_arch, use_expln, clip_mean, features_extractor_class, features_extractor_kwargs, normalize_images, optimizer_class, optimizer_kwargs, n_critics, share_features_extractor)
    478 def __init__(
    479     self,
    480     observation_space: gym.spaces.Space,
   (...)
    496     share_features_extractor: bool = True,
    497 ):
--> 498     super(MultiInputPolicy, self).__init__(
    499         observation_space,
    500         action_space,
    501         lr_schedule,
    502         net_arch,
    503         activation_fn,
    504         use_sde,
    505         log_std_init,
    506         sde_net_arch,
    507         use_expln,
    508         clip_mean,
    509         features_extractor_class,
    510         features_extractor_kwargs,
    511         normalize_images,
    512         optimizer_class,
    513         optimizer_kwargs,
    514         n_critics,
    515         share_features_extractor,
    516     )

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:292, in SACPolicy.__init__(self, observation_space, action_space, lr_schedule, net_arch, activation_fn, use_sde, log_std_init, sde_net_arch, use_expln, clip_mean, features_extractor_class, features_extractor_kwargs, normalize_images, optimizer_class, optimizer_kwargs, n_critics, share_features_extractor)
    289 self.critic, self.critic_target = None, None
    290 self.share_features_extractor = share_features_extractor
--> 292 self._build(lr_schedule)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:295, in SACPolicy._build(self, lr_schedule)
    294 def _build(self, lr_schedule: Schedule) -> None:
--> 295     self.actor = self.make_actor()
    296     self.actor.optimizer = self.optimizer_class(self.actor.parameters(), lr=lr_schedule(1), **self.optimizer_kwargs)
    298     if self.share_features_extractor:

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/policies.py:348, in SACPolicy.make_actor(self, features_extractor)
    347 def make_actor(self, features_extractor: Optional[BaseFeaturesExtractor] = None) -> Actor:
--> 348     actor_kwargs = self._update_features_extractor(self.actor_kwargs, features_extractor)
    349     return Actor(**actor_kwargs).to(self.device)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/policies.py:112, in BaseModel._update_features_extractor(self, net_kwargs, features_extractor)
    109 net_kwargs = net_kwargs.copy()
    110 if features_extractor is None:
    111     # The features extractor is not shared, create a new one
--> 112     features_extractor = self.make_features_extractor()
    113 net_kwargs.update(dict(features_extractor=features_extractor, features_dim=features_extractor.features_dim))
    114 return net_kwargs

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/policies.py:118, in BaseModel.make_features_extractor(self)
    116 def make_features_extractor(self) -> BaseFeaturesExtractor:
    117     """Helper method to create a features extractor."""
--> 118     return self.features_extractor_class(self.observation_space, **self.features_extractor_kwargs)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/torch_layers.py:258, in CombinedExtractor.__init__(self, observation_space, cnn_output_dim)
    255 extractors = {}
    257 total_concat_size = 0
--> 258 for key, subspace in observation_space.spaces.items():
    259     if is_image_space(subspace):
    260         extractors[key] = NatureCNN(subspace, features_dim=cnn_output_dim)

AttributeError: 'Box' object has no attribute 'spaces'

@Miffyli
Collaborator

Miffyli commented Apr 16, 2022

You should use MlpPolicy instead of MultiInputPolicy for Box spaces.
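
As a rough illustration of that rule, the sketch below picks a policy name by checking whether the observation space exposes a dict of sub-spaces, which is the attribute the AttributeError above complained about. The function `suggest_policy` and the `SimpleNamespace` stand-ins are hypothetical and only mimic the relevant attributes of gym spaces. Image observations (where `CnnPolicy` would apply) are deliberately ignored.

```python
from types import SimpleNamespace


def suggest_policy(observation_space):
    """Suggest an SB3 policy name from the structure of the observation space."""
    subspaces = getattr(observation_space, "spaces", None)
    if isinstance(subspaces, dict):
        # Dict-style space: a mapping from keys to sub-spaces
        return "MultiInputPolicy"
    # Flat space such as a Box with shape (27,)
    return "MlpPolicy"


# Stand-ins (assumption): only the attributes inspected above are modelled
box_like = SimpleNamespace(shape=(27,))
dict_like = SimpleNamespace(spaces={"state": box_like})

box_choice = suggest_policy(box_like)
dict_choice = suggest_policy(dict_like)
```

With a real environment, the same check could be applied to `env.observation_space` before constructing the model.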

@isaacncz
Author

isaacncz commented Apr 17, 2022

Thank you so much for your patience and your reply. I tried MlpPolicy with model = SAC("MlpPolicy", env, verbose=1) and got the following output:
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.

However, I still can't run model.learn(total_timesteps=10000):

TypeError                                 Traceback (most recent call last)
/home/isaac/robogym_ws/test.ipynb Cell [1](vscode-notebook-cell:/home/isaac/robogym_ws/test.ipynb#ch0000008?line=0)3' in <cell line: 1>()
----> 1[ model.learn(total_timesteps=10000)

File ~/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py:292, in SAC.learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    ]()[279](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=278)[ def learn(
    ]()[280](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=279)[     self,
    ]()[281](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=280)[     total_timesteps: int,
   (...)
    ]()[289](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=288)[     reset_num_timesteps: bool = True,
    ]()[290](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=289)[ ) -> OffPolicyAlgorithm:
--> ]()[292](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=291)[     return super(SAC, self).learn(
    ]()[293](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=292)[         total_timesteps=total_timesteps,
    ]()[294](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=293)[         callback=callback,
    ]()[295](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=294)[         log_interval=log_interval,
    ]()[296](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=295)[         eval_env=eval_env,
    ]()[297](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=296)[         eval_freq=eval_freq,
    ]()[298](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=297)[         n_eval_episodes=n_eval_episodes,
    ]()[299](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=298)[         tb_log_name=tb_log_name,
    ]()[300](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=299)[         eval_log_path=eval_log_path,
    ]()[301](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=300)[         reset_num_timesteps=reset_num_timesteps,
    ]()[302](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/sac/sac.py?line=301)[     )

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py:347, in OffPolicyAlgorithm.learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    ]()[344](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=343)[ callback.on_training_start(locals(), globals())
    ]()[346](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=345)[ while self.num_timesteps < total_timesteps:
--> ]()[347](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=346)[     rollout = self.collect_rollouts(
    ]()[348](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=347)[         self.env,
    ]()[349](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=348)[         train_freq=self.train_freq,
    ]()[350](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=349)[         action_noise=self.action_noise,
    ]()[351](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=350)[         callback=callback,
    ]()[352](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=351)[         learning_starts=self.learning_starts,
    ]()[353](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=352)[         replay_buffer=self.replay_buffer,
    ]()[354](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=353)[         log_interval=log_interval,
    ]()[355](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=354)[     )
    ]()[357](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=356)[     if rollout.continue_training is False:
    ]()[358](file:///home/isaac/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py?line=357)[         break

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py:580, in OffPolicyAlgorithm.collect_rollouts(self, env, callback, train_freq, replay_buffer, action_noise, learning_starts, log_interval)
    577 actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
    579 # Rescale and perform action
--> 580 new_obs, rewards, dones, infos = env.step(actions)
    582 self.num_timesteps += env.num_envs
    583 num_collected_steps += 1

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py:162, in VecEnv.step(self, actions)
    155 """
    156 Step the environments with the given action
    157
    158 :param actions: the action
    159 :return: observation, reward, done, information
    160 """
    161 self.step_async(actions)
--> 162 return self.step_wait()

File ~/.local/lib/python3.8/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py:51, in DummyVecEnv.step_wait(self)
     49         obs = self.envs[env_idx].reset()
     50     self._save_obs(env_idx, obs)
---> 51 return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones), deepcopy(self.buf_infos))

File /usr/lib/python3.8/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File /usr/lib/python3.8/copy.py:205, in _deepcopy_list(x, memo, deepcopy)
    203 append = y.append
    204 for a in x:
--> 205     append(deepcopy(a, memo))
    206 return y

File /usr/lib/python3.8/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File /usr/lib/python3.8/copy.py:230, in _deepcopy_dict(x, memo, deepcopy)
    228 memo[id(x)] = y
    229 for key, value in x.items():
--> 230     y[deepcopy(key, memo)] = deepcopy(value, memo)
    231 return y

File /usr/lib/python3.8/copy.py:161, in deepcopy(x, memo, _nil)
    159 reductor = getattr(x, "__reduce_ex__", None)
    160 if reductor is not None:
--> 161     rv = reductor(4)
    162 else:
    163     reductor = getattr(x, "__reduce__", None)

TypeError: cannot pickle 'google.protobuf.pyext._message.ScalarMapContainer' object

@araffin
Member

araffin commented Apr 17, 2022

You are passing something in the info dict that is not picklable; please remove it or convert it (the error with the env checker was the same).
Closing as we don't do tech support.

@araffin araffin closed this as completed Apr 17, 2022
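
For anyone landing here with the same traceback: the `deepcopy(self.buf_infos)` call in `DummyVecEnv.step_wait` fails because a value inside `info` cannot be copied. Below is a minimal sketch of the "convert it" option, not something taken from robo-gym or SB3. The helper names `to_picklable` and `sanitize_info` are hypothetical, and the sketch assumes the offending value behaves like a mapping, i.e. it exposes `.items()`.

```python
import copy


def to_picklable(value):
    """Rebuild `value` from plain Python containers where possible."""
    # Mapping-like objects (anything exposing .items()) become plain dicts.
    # Assumption: the protobuf map container in the traceback supports .items().
    if hasattr(value, "items"):
        return {to_picklable(k): to_picklable(v) for k, v in value.items()}
    # Lists and tuples are rebuilt element by element as plain lists.
    if isinstance(value, (list, tuple)):
        return [to_picklable(v) for v in value]
    # Anything else is kept if deepcopy accepts it, otherwise replaced by its repr.
    try:
        copy.deepcopy(value)
    except Exception:
        return repr(value)
    return value


def sanitize_info(info):
    """Hypothetical helper: return a copy of an env `info` dict that deepcopy accepts."""
    return to_picklable(info)
```

One way to use it would be to call `sanitize_info(info)` on the `info` returned by the environment's `step()`, for example in a small `gym.Wrapper` subclass, before the environment is handed to `check_env` or to the algorithm. Removing the offending key altogether is the simpler alternative if the value is not needed.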