The performance of Mujoco Swimmer is low. #401
Comments
@caozhangjie Achieving a reward of 300 on MuJoCo Swimmer is rarely seen in the literature. In fact, I personally don't know of any published algorithm that reaches a reward that high. TRPO, which you seem to be using, only achieves a reward of 120 in PPO's paper, and only ~80 in OpenAI Baselines (see https://github.com/thu-ml/tianshou/tree/master/examples/mujoco).
You can clone his repo and replace the
I changed one configuration (--log-interval from 5 to 10, which has the same effect as changing the seed). However, the performance is still better than tianshou's. I think it is because you changed the observation normalization, but this change also affects other tasks' performance. It is quite possible that you get the best performance on Swimmer but meanwhile fail on Ant/HalfCheetah. (People usually run the MuJoCo benchmark with the same configuration for all environments.)
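For context on why a change to observation normalization affects every environment at once: MuJoCo pipelines typically normalize observations with a running mean and variance that is updated online during training. Below is a generic sketch of that mechanism (this is not tianshou's exact implementation; the class name and parameters are illustrative):

```python
import numpy as np


class RunningObsNorm:
    """Running mean/std observation normalizer, a common MuJoCo
    preprocessing step. Generic sketch, not tianshou's exact code."""

    def __init__(self, shape, eps=1e-8, clip=10.0):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps  # avoids division by zero before the first update
        self.clip = clip

    def update(self, obs_batch):
        # Welford-style parallel update of the running mean and variance.
        batch_mean = obs_batch.mean(axis=0)
        batch_var = obs_batch.var(axis=0)
        batch_count = obs_batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta ** 2 * self.count * batch_count / total) / total
        self.count = total

    def __call__(self, obs):
        # Standardize and clip, as is common for MuJoCo observations.
        return np.clip((obs - self.mean) / np.sqrt(self.var + 1e-8),
                       -self.clip, self.clip)
```

Because the same normalizer settings are shared by all tasks in a benchmark run, tuning them to help Swimmer can easily hurt Ant or HalfCheetah.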
Thanks a lot! But I have a question: how did you change the sensor from neck to head? Is there any forked repo I can refer to?
You don't need to change the sensor (see my last comment).
I have run another RL algorithm here on MuJoCo Swimmer-v3 and got a reward above 300. I'm not sure why tianshou can only achieve a reward below 100 for Swimmer.
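When comparing reward numbers across codebases, it helps to evaluate both policies with the same loop so the episode-return definition matches. A minimal evaluation sketch, assuming the classic gym API (`reset() -> obs`, `step(a) -> (obs, reward, done, info)`); the function name and signature are illustrative:

```python
import numpy as np


def evaluate(env, policy, episodes=10):
    """Return the mean and std of the undiscounted episode return of
    `policy` on `env`. Generic sketch for comparing Swimmer results;
    `policy` maps an observation to an action."""
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))
```

Usage would look like `evaluate(gym.make("Swimmer-v3"), my_policy)`; note that Swimmer-v3 vs Swimmer-v1 and any observation/reward wrappers can change the reported numbers, so the environment version should also match when comparing against tianshou's benchmark.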