
Same episodic returns every epoch #667

Closed · c4cld opened this issue Jun 13, 2022 · 2 comments

Labels: question (Further information is requested)

Comments

c4cld commented Jun 13, 2022

Hello, I use PPO with a customized environment to train an agent. However, I found that the episodic returns are the same every epoch, and I'm not sure what's going wrong...

Here's the output:

Epoch #1: 2001it [00:36, 54.27it/s, env_step=2000, len=1001, loss/actor=11.228, loss/critic1=0.062, loss/critic2=0.064, n/ep=0, n/st=1, rew=-943.40]
Epoch #1: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #2: 2001it [00:38, 52.39it/s, env_step=4000, len=2000, loss/actor=19.322, loss/critic1=0.155, loss/critic2=0.158, n/ep=0, n/st=1, rew=-1884.80]
Epoch #2: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #3: 2001it [00:39, 51.08it/s, env_step=6000, len=2000, loss/actor=26.462, loss/critic1=0.356, loss/critic2=0.359, n/ep=0, n/st=1, rew=-1884.80]
Epoch #3: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #4: 2001it [00:37, 52.73it/s, env_step=8000, len=2000, loss/actor=32.905, loss/critic1=0.821, loss/critic2=0.835, n/ep=0, n/st=1, rew=-1884.80]
Epoch #4: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #5: 2001it [00:37, 53.59it/s, env_step=10000, len=2000, loss/actor=38.699, loss/critic1=0.514, loss/critic2=0.513, n/ep=0, n/st=1, rew=-1884.80]
Epoch #5: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #6: 2001it [00:37, 53.61it/s, env_step=12000, len=2000, loss/actor=43.835, loss/critic1=0.993, loss/critic2=0.992, n/ep=0, n/st=1, rew=-1884.80]
Epoch #6: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #7: 2001it [00:37, 53.69it/s, env_step=14000, len=2000, loss/actor=48.587, loss/critic1=1.452, loss/critic2=1.450, n/ep=0, n/st=1, rew=-1884.80]
Epoch #7: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #8: 2001it [00:37, 53.64it/s, env_step=16000, len=2000, loss/actor=52.685, loss/critic1=1.356, loss/critic2=1.346, n/ep=0, n/st=1, rew=-1884.80]
Epoch #8: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #9: 2001it [00:37, 53.65it/s, env_step=18000, len=2000, loss/actor=56.454, loss/critic1=1.917, loss/critic2=1.921, n/ep=0, n/st=1, rew=-1884.80]
Epoch #9: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #10: 2001it [00:37, 53.78it/s, env_step=20000, len=2000, loss/actor=59.955, loss/critic1=2.231, loss/critic2=2.229, n/ep=0, n/st=1, rew=-1884.80]
Trinkle23897 added the "question" label on Jun 13, 2022
Trinkle23897 (Collaborator) commented Jun 13, 2022

Have you tried logging the result of each env step (obs, rew, done, info)?
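A minimal way to do this is shown in the sketch below, which steps the environment directly with random actions and prints what comes back. Note the assumptions: "Pendulum-v1" is only a stand-in for the customized environment, and the loop uses the classic pre-0.26 Gym API where `step` returns a 4-tuple.

```python
import gym

# Stand-in environment; replace "Pendulum-v1" with the customized env class.
env = gym.make("Pendulum-v1")
obs = env.reset()
for t in range(10):
    act = env.action_space.sample()        # random probe action
    obs, rew, done, info = env.step(act)   # classic Gym API: 4-tuple
    print(f"t={t}: act={act}, rew={rew:.3f}, done={done}, info={info}")
    if done:
        obs = env.reset()
```

If the reward never changes even under distinct random actions, the environment itself is suspect; if it does change, the problem is more likely on the policy side.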

Trinkle23897 changed the title from 每回合的最终奖励一致 ("The final reward of every episode is the same") to Same episodic returns every epoch on Jun 13, 2022
c4cld (Author) commented Jun 14, 2022

Thanks for your answer! I just found the bug: the actor network outputs the same action every time.
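A quick sanity check for this failure mode is to feed the actor a batch of distinct observations and compare the outputs. The sketch below is self-contained, with a toy network standing in for the real actor and made-up sizes; with the buggy actor described above, every row of the output would be identical.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2                      # toy sizes, for illustration only
actor = nn.Sequential(                       # toy actor standing in for the real one
    nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
)

obs_batch = torch.randn(8, obs_dim)          # eight distinct observations
with torch.no_grad():
    actions = actor(obs_batch)
print(actions)
# All rows identical => the actor ignores its input (the bug described above).
print("all identical:", torch.allclose(actions, actions[0].expand_as(actions)))
```

(A saturated output activation, e.g. a final tanh driven to its bounds, is one common way an actor ends up producing the same action for every input.)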

c4cld closed this as completed on Jun 14, 2022