
The performance of Mujoco Swimmer is low. #401

Closed
caozhangjie opened this issue Jul 22, 2021 · 7 comments
Labels: question (Further information is requested)

Comments

caozhangjie commented Jul 22, 2021

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system, and environment, where applicable:

    ```python
    import tianshou, torch, numpy, sys
    print(tianshou.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
    ```

I have run another RL algorithm here on MuJoCo Swimmer-v3 and got a reward above 300. I'm not sure why Tianshou can only achieve a reward lower than 100 on Swimmer.

Trinkle23897 (Collaborator) commented:

@ChenDRAG

ChenDRAG (Collaborator) commented Jul 23, 2021

@caozhangjie Achieving a reward of 300 on MuJoCo Swimmer is rarely seen in the literature. As a matter of fact, I personally don't know of any public algorithm that reaches a reward that high. TRPO, the algorithm you seem to be using, only achieves a reward of about 120 in the PPO paper, and only about 80 in OpenAI Baselines (see https://github.com/thu-ml/tianshou/tree/master/examples/mujoco).
Could you specify your detailed environment settings and provide training curves?

Trinkle23897 added the question label on Jul 23, 2021
caozhangjie (Author) commented Jul 23, 2021

You can clone his repo and replace main.py with my attached main.py, in which I wrote an evaluate function. I just do 20 rollouts to evaluate the policy (without a discount factor). You can run `python main.py --env-name Swimmer-v3 --batch-size 25000 --log-interval 10` in the repo. It reaches a reward of about 300 within 20 or 30 episodes (about 5 minutes on 1 CPU).
main.py.zip
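
For reference, here is a minimal sketch of the evaluation loop described above (20 undiscounted rollouts, averaged), assuming the old four-tuple Gym step API and a hypothetical `select_action(obs)` callable standing in for the trained policy; this is only a sketch, not the attached main.py:

```python
import gym
import numpy as np

def evaluate(select_action, env_name="Swimmer-v3", n_rollouts=20):
    """Average undiscounted return of a policy over n_rollouts episodes."""
    env = gym.make(env_name)
    returns = []
    for _ in range(n_rollouts):
        obs = env.reset()
        done, total = False, 0.0
        while not done:
            # select_action is assumed to map an observation to an action
            obs, reward, done, _ = env.step(select_action(obs))
            total += reward  # plain sum of rewards, no discount factor
        returns.append(total)
    env.close()
    return float(np.mean(returns))
```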

Trinkle23897 (Collaborator) commented Jul 23, 2021

I changed one configuration (--log-interval from 5 to 10, which amounts to changing the seed):

[Screenshot: training curve, 2021-07-23]

However, the performance is still better than Tianshou's. I think that is because you changed the observation normalization, but this change also affects other tasks' performance. It is quite possible that you get the best performance on Swimmer but meanwhile fail on Ant/HalfCheetah. (People usually run the MuJoCo benchmark with the same configuration for all environments.)
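
For context, "observation normalization" here refers to standardizing observations with running mean/std statistics before they reach the policy, so changing how those statistics are computed shifts results on every task at once. Below is a minimal illustrative wrapper, assuming the usual Welford-style online update; it is not the code from either repo:

```python
import gym
import numpy as np

class NormalizeObservation(gym.ObservationWrapper):
    """Standardize observations with online mean/variance estimates."""

    def __init__(self, env, eps=1e-8):
        super().__init__(env)
        shape = env.observation_space.shape
        self.count = 0
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.eps = eps

    def observation(self, obs):
        # Welford-style update of the running mean and variance
        self.count += 1
        delta = obs - self.mean
        self.mean = self.mean + delta / self.count
        self.var = self.var + (delta * (obs - self.mean) - self.var) / self.count
        # standardize with the current estimates
        return (obs - self.mean) / np.sqrt(self.var + self.eps)
```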

araffin commented Jul 23, 2021

hill-a/stable-baselines#500 (comment)

Trinkle23897 (Collaborator) commented:

> hill-a/stable-baselines#500 (comment)

Great, thanks!!! But I have a question: how did you change the sensor from neck to head? Is there a forked repo I can refer to?

araffin commented Jul 24, 2021

You don't need to change the sensor (see my last comment).
