
Same episodic returns every epoch #667

Closed · c4cld opened this issue Jun 13, 2022 · 2 comments

Labels: question (Further information is requested)

Comments

c4cld commented Jun 13, 2022

Hello, I use PPO with a customized environment to train an agent. However, I found that the episodic returns are the same every epoch, and I'm not sure what's going wrong...

Here's the output:

Epoch #1: 2001it [00:36, 54.27it/s, env_step=2000, len=1001, loss/actor=11.228, loss/critic1=0.062, loss/critic2=0.064, n/ep=0, n/st=1, rew=-943.40]
Epoch #1: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #2: 2001it [00:38, 52.39it/s, env_step=4000, len=2000, loss/actor=19.322, loss/critic1=0.155, loss/critic2=0.158, n/ep=0, n/st=1, rew=-1884.80]
Epoch #2: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #3: 2001it [00:39, 51.08it/s, env_step=6000, len=2000, loss/actor=26.462, loss/critic1=0.356, loss/critic2=0.359, n/ep=0, n/st=1, rew=-1884.80]
Epoch #3: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #4: 2001it [00:37, 52.73it/s, env_step=8000, len=2000, loss/actor=32.905, loss/critic1=0.821, loss/critic2=0.835, n/ep=0, n/st=1, rew=-1884.80]
Epoch #4: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #5: 2001it [00:37, 53.59it/s, env_step=10000, len=2000, loss/actor=38.699, loss/critic1=0.514, loss/critic2=0.513, n/ep=0, n/st=1, rew=-1884.80]
Epoch #5: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #6: 2001it [00:37, 53.61it/s, env_step=12000, len=2000, loss/actor=43.835, loss/critic1=0.993, loss/critic2=0.992, n/ep=0, n/st=1, rew=-1884.80]
Epoch #6: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #7: 2001it [00:37, 53.69it/s, env_step=14000, len=2000, loss/actor=48.587, loss/critic1=1.452, loss/critic2=1.450, n/ep=0, n/st=1, rew=-1884.80]
Epoch #7: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #8: 2001it [00:37, 53.64it/s, env_step=16000, len=2000, loss/actor=52.685, loss/critic1=1.356, loss/critic2=1.346, n/ep=0, n/st=1, rew=-1884.80]
Epoch #8: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #9: 2001it [00:37, 53.65it/s, env_step=18000, len=2000, loss/actor=56.454, loss/critic1=1.917, loss/critic2=1.921, n/ep=0, n/st=1, rew=-1884.80]
Epoch #9: test_reward: -943.400000 ± 0.000000, best_reward: -943.400000 ± 0.000000 in #0
Epoch #10: 2001it [00:37, 53.78it/s, env_step=20000, len=2000, loss/actor=59.955, loss/critic1=2.231, loss/critic2=2.229, n/ep=0, n/st=1, rew=-1884.80]
Trinkle23897 added the "question" label on Jun 13, 2022
Trinkle23897 (Collaborator) commented Jun 13, 2022

Have you tried logging the result of each env step (obs, rew, done, info)?
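A minimal way to do this is shown in the sketch below, which steps the environment directly with random actions and prints what comes back. Note the assumptions: "Pendulum-v1" is only a stand-in for the customized environment, and the loop uses the classic pre-0.26 Gym API where `step` returns a 4-tuple.

```python
import gym

# Stand-in environment; replace "Pendulum-v1" with the customized env class.
env = gym.make("Pendulum-v1")
obs = env.reset()
for t in range(10):
    act = env.action_space.sample()        # random probe action
    obs, rew, done, info = env.step(act)   # classic Gym API: 4-tuple
    print(f"t={t}: act={act}, rew={rew:.3f}, done={done}, info={info}")
    if done:
        obs = env.reset()
```

If the reward never changes even under distinct random actions, the environment itself is suspect; if it does change, the problem is more likely on the policy side.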

Trinkle23897 changed the title from 每回合的最终奖励一致 ("The final reward of every episode is the same") to Same episodic returns every epoch on Jun 13, 2022
c4cld (Author) commented Jun 14, 2022

Thanks for your answer! I just found the bug: the actor network outputs the same action every time.
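A quick sanity check for this failure mode is to feed the actor a batch of distinct observations and compare the outputs. The sketch below is self-contained, with a toy network standing in for the real actor and made-up sizes; with the buggy actor described above, every row of the output would be identical.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2                      # toy sizes, for illustration only
actor = nn.Sequential(                       # toy actor standing in for the real one
    nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
)

obs_batch = torch.randn(8, obs_dim)          # eight distinct observations
with torch.no_grad():
    actions = actor(obs_batch)
print(actions)
# All rows identical => the actor ignores its input (the bug described above).
print("all identical:", torch.allclose(actions, actions[0].expand_as(actions)))
```

(A saturated output activation, e.g. a final tanh driven to its bounds, is one common way an actor ends up producing the same action for every input.)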

c4cld closed this as completed on Jun 14, 2022