
The performance of Mujoco Swimmer is low. #401

Closed
caozhangjie opened this issue Jul 22, 2021 · 7 comments
Labels: question (Further information is requested)

Comments

caozhangjie commented Jul 22, 2021

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system, and environment, where applicable:

    ```python
    import tianshou, torch, numpy, sys
    print(tianshou.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
    ```

I have run another RL algorithm here on MuJoCo Swimmer-v3 and got a reward above 300. I'm not sure why Tianshou can only achieve a reward lower than 100 on Swimmer.

Trinkle23897 (Collaborator) commented:

@ChenDRAG

ChenDRAG (Collaborator) commented Jul 23, 2021

@caozhangjie Achieving a reward of 300 on MuJoCo Swimmer is rarely seen in the literature. As a matter of fact, I personally don't know of any public algorithm that reaches a reward that high. TRPO, the algorithm you seem to be using, only achieves a reward of about 120 in the PPO paper, and only about 80 in OpenAI Baselines (see https://github.com/thu-ml/tianshou/tree/master/examples/mujoco).
Could you specify your detailed environment settings and provide training curves?

Trinkle23897 added the question label on Jul 23, 2021
caozhangjie (Author) commented Jul 23, 2021

You can clone his repo and replace main.py with my attached main.py, in which I wrote an evaluate function. I just do 20 rollouts to evaluate the policy (without a discount factor). You can run `python main.py --env-name Swimmer-v3 --batch-size 25000 --log-interval 10` in the repo. It reaches a reward of about 300 within 20 or 30 episodes (about 5 minutes on 1 CPU).
main.py.zip
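
For reference, here is a minimal sketch of the evaluation loop described above (20 undiscounted rollouts, averaged), assuming the old four-tuple Gym step API and a hypothetical `select_action(obs)` callable standing in for the trained policy; this is only a sketch, not the attached main.py:

```python
import gym
import numpy as np

def evaluate(select_action, env_name="Swimmer-v3", n_rollouts=20):
    """Average undiscounted return of a policy over n_rollouts episodes."""
    env = gym.make(env_name)
    returns = []
    for _ in range(n_rollouts):
        obs = env.reset()
        done, total = False, 0.0
        while not done:
            # select_action is assumed to map an observation to an action
            obs, reward, done, _ = env.step(select_action(obs))
            total += reward  # plain sum of rewards, no discount factor
        returns.append(total)
    env.close()
    return float(np.mean(returns))
```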

Trinkle23897 (Collaborator) commented Jul 23, 2021

I changed one configuration (--log-interval from 5 to 10, which amounts to changing the seed):

[Screenshot: training curve, 2021-07-23]

However, the performance is still better than Tianshou's. I think that is because you changed the observation normalization, but this change also affects other tasks' performance. It is quite possible that you get the best performance on Swimmer but meanwhile fail on Ant/HalfCheetah. (People usually run the MuJoCo benchmark with the same configuration for all environments.)
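
For context, "observation normalization" here refers to standardizing observations with running mean/std statistics before they reach the policy, so changing how those statistics are computed shifts results on every task at once. Below is a minimal illustrative wrapper, assuming the usual Welford-style online update; it is not the code from either repo:

```python
import gym
import numpy as np

class NormalizeObservation(gym.ObservationWrapper):
    """Standardize observations with online mean/variance estimates."""

    def __init__(self, env, eps=1e-8):
        super().__init__(env)
        shape = env.observation_space.shape
        self.count = 0
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.eps = eps

    def observation(self, obs):
        # Welford-style update of the running mean and variance
        self.count += 1
        delta = obs - self.mean
        self.mean = self.mean + delta / self.count
        self.var = self.var + (delta * (obs - self.mean) - self.var) / self.count
        # standardize with the current estimates
        return (obs - self.mean) / np.sqrt(self.var + self.eps)
```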

araffin commented Jul 23, 2021

hill-a/stable-baselines#500 (comment)

Trinkle23897 (Collaborator) commented:

> hill-a/stable-baselines#500 (comment)

Great, thanks!!! But I have a question: how did you change the sensor from neck to head? Is there a forked repo I can refer to?

araffin commented Jul 24, 2021

You don't need to change the sensor (see my last comment).
