
Atari example for policy-based method like A2C, PPO #374

Closed · FastWind123 opened this issue May 23, 2021 · 13 comments · Fixed by #523
Labels: enhancement (Feature that is not a new algorithm or an algorithm enhancement)

@FastWind123

Is there any example using A2C or PPO to train Atari?

@Trinkle23897 (Collaborator) commented May 23, 2021

Will do in the next few weeks @ChenDRAG @yingchengyang

@Trinkle23897 added the enhancement label May 23, 2021
@FastWind123 (Author)

> Will do in the next few weeks @ChenDRAG @yingchengyang

OK, thank you!

@Syzygianinfern0 commented Jul 25, 2021

Any updates on this? It would be really helpful.

@Syzygianinfern0

https://gist.github.com/Syzygianinfern0/63d6b2b2ec5342865510da861eb97938
This is my attempt at it, but the agent does not converge.

@Syzygianinfern0 commented Jul 25, 2021

[Screenshot: training reward curves]

🔶 => DQN
🔷 => DiscreteSAC

Environment: PongNoFrameskip-v4
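(As a side note, the NoFrameskip-v4 variant performs no frame skipping internally, so the usual DQN-style preprocessing has to come from wrappers. A minimal sketch using gym's built-in wrappers, offered as an illustrative assumption rather than the exact setup from the gist above:)

```python
import gym
from gym.wrappers import AtariPreprocessing, FrameStack

# PongNoFrameskip-v4 applies no frame skipping itself, so the standard
# DQN-style preprocessing is added via wrappers.
env = gym.make("PongNoFrameskip-v4")
env = AtariPreprocessing(
    env,
    frame_skip=4,        # act every 4 frames, max-pooling the last two
    screen_size=84,      # downsample to 84x84
    grayscale_obs=True,  # single-channel observations
    noop_max=30,         # random no-ops at episode start
)
env = FrameStack(env, num_stack=4)  # stack 4 frames into one observation
```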

@Syzygianinfern0

@Trinkle23897 do you have any script for hyperparameter search for agents?

@Trinkle23897 (Collaborator)

I'll take a look this weekend, but SAC-Discrete usually cannot achieve good performance on Atari games.

@destinyyzy

The performance of PPO (as implemented in Tianshou) hardly improves in Atari.

@Trinkle23897 (Collaborator) commented Aug 24, 2021

I know, but I have already found the bug: list(param) + list(param) causes undefined behavior and should be changed to set(param).union(param). After fixing this bug, I can train Pong within 10 minutes on my laptop:

[Screenshot: Pong training result, 2021-08-24]

I'll push the code after finishing benchmarking.
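(For context, a minimal sketch, not the actual Tianshou code, of why the concatenation is a problem when the actor and critic share layers: the shared parameters land in the optimizer's parameter list twice.)

```python
import torch.nn as nn

# Illustrative actor/critic pair sharing a feature extractor, as in the
# Atari actor-critic networks (module names here are made up).
feature_net = nn.Linear(4, 16)
actor = nn.Sequential(feature_net, nn.Linear(16, 2))
critic = nn.Sequential(feature_net, nn.Linear(16, 1))

# Buggy: the shared feature_net weight and bias are counted twice.
params_buggy = list(actor.parameters()) + list(critic.parameters())
print(len(params_buggy))  # 8 (feature_net's 2 tensors appear twice)

# Fixed: a set union yields each parameter tensor exactly once.
params_fixed = set(actor.parameters()).union(critic.parameters())
print(len(params_fixed))  # 6
```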

@Trinkle23897 mentioned this issue Aug 24, 2021
@destinyyzy

> I have already found the bug ... I can train Pong within 10 minutes on my laptop. I'll push the code after finishing benchmarking.

Can you share the code with me by email?
I changed the list + list in PPO, but I did not see any improvement in performance.

@Trinkle23897 (Collaborator)

I'll make a pull request, but I have no time in the next two days because I'm flying from Singapore to America (which takes a long time though...).

@bingykang commented Aug 26, 2021

Hi @destinyyzy,

Please find the demo code by @Trinkle23897.

Besides, you also need to replace

    list(self.actor.parameters()) + list(self.critic.parameters()),

with

    set(self.actor.parameters()).union(self.critic.parameters()),

Hope you find it helpful.

P.S. Please change Pong-v5 to PongNoFrameskip-v4.
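(Concretely, a hedged sketch of the fixed optimizer construction; the actor/critic stand-ins below are illustrative, not Tianshou's actual network classes:)

```python
import torch
import torch.nn as nn

# Stand-ins for the policy's actor and critic sharing a torso.
shared = nn.Linear(8, 32)
actor = nn.Sequential(shared, nn.Linear(32, 6))
critic = nn.Sequential(shared, nn.Linear(32, 1))

# After the fix: each parameter reaches the optimizer exactly once,
# even though both modules own `shared`. Passing the concatenated
# lists instead would hand Adam duplicate tensors.
optim = torch.optim.Adam(
    set(actor.parameters()).union(critic.parameters()), lr=1e-4
)
```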

@nuance1979 mentioned this issue Feb 8, 2022
@Trinkle23897 linked a pull request Feb 10, 2022 that will close this issue
Trinkle23897 pushed a commit that referenced this issue Feb 10, 2022

I needed a policy gradient baseline myself and it has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for hyper-parameters.

Note that using lr=2.5e-4 will result in an "Invalid Value" error for 2 games. The fix is to reduce the learning rate; that's why I set the default lr to 1e-4. See the discussion in DLR-RM/rl-baselines3-zoo#156.
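(For orientation, a hedged sketch of the kind of hyper-parameter block this implies. Only the learning-rate choice is taken from the commit message above; the remaining values are typical PPO-on-Atari settings, not quoted from the merged PR:)

```python
# Hypothetical defaults for an Atari PPO script. Only lr comes from the
# commit message above; the other values are common PPO-on-Atari
# settings and may differ from the merged code.
ppo_atari_config = dict(
    lr=1e-4,             # 2.5e-4 triggered "Invalid Value" on 2 games
    gamma=0.99,          # discount factor
    gae_lambda=0.95,     # GAE smoothing coefficient
    eps_clip=0.1,        # PPO clipping range
    vf_coef=0.5,         # value-loss weight
    ent_coef=0.01,       # entropy bonus
    max_grad_norm=0.5,   # gradient clipping
)
```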
@GongYanfu

So is there any example using A2C to train Atari games?

BFAnas pushed a commit to BFAnas/tianshou that referenced this issue May 5, 2024 (same commit message as above).