Beginning training
100%| 200000/200000 [00:00<00:00, 176379478.55it/s]
Traceback (most recent call last):
  File "/home/username/Code/ICRL-benchmarks-public/interface/train_policy.py", line 311, in <module>
    train(args)
  File "/home/username/Code/ICRL-benchmarks-public/interface/train_policy.py", line 211, in train
    policy_agent.learn(total_timesteps=forward_timesteps,
  File "/home/username/Code/ICRL-benchmarks-public/stable_baselines3/ppo/ppo.py", line 255, in learn
    return super(PPO, self).learn(
  File "/home/username/Code/ICRL-benchmarks-public/stable_baselines3/common/on_policy_algorithm.py", line 223, in learn
    total_timesteps, callback = self._setup_learn(
  File "/home/username/Code/ICRL-benchmarks-public/stable_baselines3/common/base_class.py", line 347, in _setup_learn
    self._last_obs = self.env.reset()
  File "/home/username/Code/ICRL-benchmarks-public/stable_baselines3/common/vec_env/vec_normalize.py", line 157, in reset
    return self.normalize_obs(obs)
  File "/home/username/Code/ICRL-benchmarks-public/stable_baselines3/common/vec_env/vec_normalize.py", line 113, in normalize_obs
    obs = np.clip((obs - self.obs_rms.mean) / np.sqrt(self.obs_rms.var + self.epsilon), -self.clip_obs, self.clip_obs)
ValueError: operands could not be broadcast together with shapes (1,18) (17,)
Description: I encountered this error while following the tutorial to train a policy using PPO with train_policy.py. It looks like a shape mismatch during observation normalization in VecNormalize: the observation returned by reset() has shape (1,18), while the running statistics (self.obs_rms.mean) have shape (17,), so the two arrays cannot be broadcast together.
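For anyone trying to narrow this down, here is a minimal sketch that reproduces the same broadcast failure with plain NumPy, outside of stable_baselines3. The arrays are hypothetical stand-ins sized to match the shapes in the traceback, and `normalize` and `shapes_compatible` are illustrative helper names, not part of any library. `epsilon` and `clip_obs` are placeholder values.

```python
import numpy as np

# Hypothetical stand-ins sized like the arrays in the traceback:
# one 18-dimensional observation and 17-dimensional running statistics.
obs = np.zeros((1, 18))
running_mean = np.zeros(17)
running_var = np.ones(17)
epsilon = 1e-8   # placeholder value
clip_obs = 10.0  # placeholder value


def normalize(obs, mean, var):
    """Same arithmetic as the failing line in normalize_obs."""
    return np.clip((obs - mean) / np.sqrt(var + epsilon), -clip_obs, clip_obs)


def shapes_compatible(obs, mean):
    """True when the per-environment observation shape matches the statistics."""
    return obs.shape[1:] == mean.shape


try:
    normalize(obs, running_mean, running_var)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(shapes_compatible(obs, running_mean))
```

Running a check like `shapes_compatible` on the observation from `reset()` before training would show whether the environment and the normalization statistics agree on the observation size.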
Request: Could anyone help with this issue? It seems like a problem with the observation vector dimensions during PPO training. Any advice on how to resolve this shape mismatch would be appreciated.
Thank you!
Thanks for the quick response!
I had initially installed a different version due to issues with Gym 0.21.0, but I managed to figure out how to install this version, and it indeed resolved the problem.
For anyone having trouble installing Gym 0.21.0, follow this link: openai/gym#3202
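If you want to confirm which Gym version is actually installed before training, a small check like the one below may help. It is only a sketch: it assumes the package is distributed under the name `gym` and that 0.21.0 is the version you want, as mentioned above. `installed_version` and `matches_expected` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED_GYM = "0.21.0"  # version mentioned in this thread


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_expected(package, expected):
    """True only when the package is installed at exactly the expected version."""
    return installed_version(package) == expected


print("gym installed:", installed_version("gym"))
print("matches expected:", matches_expected("gym", EXPECTED_GYM))
```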
Steps to reproduce: I ran the following command as suggested in the readme:
python train_policy.py ../config/mujuco_BlockedHalfCheetah/train_ppo_HCWithPos-v0.yaml -n 1 -s 123