
Atari example for policy-based method like A2C, PPO #374

Closed · FastWind123 opened this issue May 23, 2021 · 13 comments · Fixed by #523
Labels: enhancement (Feature that is not a new algorithm or an algorithm enhancement)

@FastWind123

Is there any example using A2C or PPO to train Atari?

@Trinkle23897 (Collaborator) commented May 23, 2021

Will do in the next few weeks @ChenDRAG @yingchengyang

@Trinkle23897 added the enhancement label May 23, 2021
@FastWind123 (Author)

> Will do in the next few weeks @ChenDRAG @yingchengyang

OK, thank you!

@Syzygianinfern0 commented Jul 25, 2021

Any updates on this? It would be really helpful.

@Syzygianinfern0

https://gist.github.com/Syzygianinfern0/63d6b2b2ec5342865510da861eb97938
This is my attempt at it, but the agent does not converge.

@Syzygianinfern0 commented Jul 25, 2021

[Screenshot: training reward curves]

🔶 => DQN
🔷 => DiscreteSAC

Environment: PongNoFrameskip-v4
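(As a side note, the NoFrameskip-v4 variant performs no frame skipping internally, so the usual DQN-style preprocessing has to come from wrappers. A minimal sketch using gym's built-in wrappers, offered as an illustrative assumption rather than the exact setup from the gist above:)

```python
import gym
from gym.wrappers import AtariPreprocessing, FrameStack

# PongNoFrameskip-v4 applies no frame skipping itself, so the standard
# DQN-style preprocessing is added via wrappers.
env = gym.make("PongNoFrameskip-v4")
env = AtariPreprocessing(
    env,
    frame_skip=4,        # act every 4 frames, max-pooling the last two
    screen_size=84,      # downsample to 84x84
    grayscale_obs=True,  # single-channel observations
    noop_max=30,         # random no-ops at episode start
)
env = FrameStack(env, num_stack=4)  # stack 4 frames into one observation
```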

@Syzygianinfern0

@Trinkle23897 do you have any script for hyperparameter search for agents?

@Trinkle23897 (Collaborator)

I'll take a look this weekend, but SAC-Discrete usually cannot achieve good performance on Atari games.

@destinyyzy

The performance of PPO (as implemented in Tianshou) hardly improves in Atari.

@Trinkle23897 (Collaborator) commented Aug 24, 2021

I know, but I have already found the bug: list(param) + list(param) causes undefined behavior and should be changed to set(param).union(param). After fixing this bug, I can train Pong within 10 minutes on my laptop:

[Screenshot: Pong training result, 2021-08-24]

I'll push the code after finishing benchmarking.
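(For context, a minimal sketch, not the actual Tianshou code, of why the concatenation is a problem when the actor and critic share layers: the shared parameters land in the optimizer's parameter list twice.)

```python
import torch.nn as nn

# Illustrative actor/critic pair sharing a feature extractor, as in the
# Atari actor-critic networks (module names here are made up).
feature_net = nn.Linear(4, 16)
actor = nn.Sequential(feature_net, nn.Linear(16, 2))
critic = nn.Sequential(feature_net, nn.Linear(16, 1))

# Buggy: the shared feature_net weight and bias are counted twice.
params_buggy = list(actor.parameters()) + list(critic.parameters())
print(len(params_buggy))  # 8 (feature_net's 2 tensors appear twice)

# Fixed: a set union yields each parameter tensor exactly once.
params_fixed = set(actor.parameters()).union(critic.parameters())
print(len(params_fixed))  # 6
```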

@Trinkle23897 mentioned this issue Aug 24, 2021
@destinyyzy

> I have already found the bug ... I can train Pong within 10 minutes on my laptop. I'll push the code after finishing benchmarking.

Can you share the code with me by email?
I changed the list + list in PPO, but I did not see any improvement in performance.

@Trinkle23897 (Collaborator)

I'll make a pull request, but I have no time in the next two days because I'm flying from Singapore to America (which takes a long time though...).

@bingykang commented Aug 26, 2021

Hi @destinyyzy,

Please find the demo code by @Trinkle23897.

Besides, you also need to replace

    list(self.actor.parameters()) + list(self.critic.parameters()),

with

    set(self.actor.parameters()).union(self.critic.parameters()),

Hope you find it helpful.

P.S. Please change Pong-v5 to PongNoFrameskip-v4.
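(Concretely, a hedged sketch of the fixed optimizer construction; the actor/critic stand-ins below are illustrative, not Tianshou's actual network classes:)

```python
import torch
import torch.nn as nn

# Stand-ins for the policy's actor and critic sharing a torso.
shared = nn.Linear(8, 32)
actor = nn.Sequential(shared, nn.Linear(32, 6))
critic = nn.Sequential(shared, nn.Linear(32, 1))

# After the fix: each parameter reaches the optimizer exactly once,
# even though both modules own `shared`. Passing the concatenated
# lists instead would hand Adam duplicate tensors.
optim = torch.optim.Adam(
    set(actor.parameters()).union(critic.parameters()), lr=1e-4
)
```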

@nuance1979 mentioned this issue Feb 8, 2022
@Trinkle23897 linked a pull request Feb 10, 2022 that will close this issue
Trinkle23897 pushed a commit that referenced this issue Feb 10, 2022

I needed a policy gradient baseline myself and it has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for hyper-parameters.

Note that using lr=2.5e-4 will result in an "Invalid Value" error for 2 games. The fix is to reduce the learning rate; that's why I set the default lr to 1e-4. See the discussion in DLR-RM/rl-baselines3-zoo#156.
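(For orientation, a hedged sketch of the kind of hyper-parameter block this implies. Only the learning-rate choice is taken from the commit message above; the remaining values are typical PPO-on-Atari settings, not quoted from the merged PR:)

```python
# Hypothetical defaults for an Atari PPO script. Only lr comes from the
# commit message above; the other values are common PPO-on-Atari
# settings and may differ from the merged code.
ppo_atari_config = dict(
    lr=1e-4,             # 2.5e-4 triggered "Invalid Value" on 2 games
    gamma=0.99,          # discount factor
    gae_lambda=0.95,     # GAE smoothing coefficient
    eps_clip=0.1,        # PPO clipping range
    vf_coef=0.5,         # value-loss weight
    ent_coef=0.01,       # entropy bonus
    max_grad_norm=0.5,   # gradient clipping
)
```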
@GongYanfu

So is there any example using A2C to train Atari games?

BFAnas pushed a commit to BFAnas/tianshou that referenced this issue May 5, 2024 (same commit message as above).