Atari example for policy-based methods like A2C, PPO #374
Comments
Will do in the next few weeks. @ChenDRAG @yingchengyang
OK, thank you!
Any updates on this? It would be really helpful.
https://gist.github.com/Syzygianinfern0/63d6b2b2ec5342865510da861eb97938 (DQN, environment: PongNoFrameskip-v4)
@Trinkle23897 do you have any script for hyperparameter search for agents?
I'll take a look this weekend, but usually sac-discrete cannot achieve good performance on Atari games.
The performance of PPO (as implemented in tianshou) hardly improves on Atari.
I'll make a pull request, but I have no time these next two days because I'm flying from Singapore to America (which takes a long time, though...).
Hi @destinyyzy, please find the demo code by @Trinkle23897. Besides, you also need to replace this line in tianshou/policy/modelfree/ppo.py (line 142 at commit 291be08):

set(self.actor.parameters()).union(self.critic.parameters()),

Hope you find it helpful. P.S. please change ...
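For clarity, here is a minimal sketch of what the quoted expression does: it collects the parameters of both the actor and the critic into a single set, which can then be passed anywhere an iterable of parameters is expected, such as an optimizer or gradient clipping. The modules below are placeholders, not tianshou's actual networks.

```python
import torch
from torch import nn

# Placeholder actor/critic modules standing in for the real networks.
actor = nn.Linear(4, 2)
critic = nn.Linear(4, 1)

# The quoted expression: one set holding the parameters of both networks.
params = set(actor.parameters()).union(critic.parameters())

# Such a set can be handed to an optimizer or to gradient clipping.
optim = torch.optim.Adam(params)
nn.utils.clip_grad_norm_(params, max_norm=0.5)
```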
I needed a policy gradient baseline myself and it has been requested several times (#497, #374, #440). I used https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari.py as a reference for hyper-parameters. Note that using lr=2.5e-4 will result in an "Invalid Value" error for 2 games. The fix is to reduce the learning rate; that's why I set the default lr to 1e-4. See discussion in DLR-RM/rl-baselines3-zoo#156.
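To make the learning-rate point concrete, here is a minimal sketch of constructing a discrete-action PPO policy in tianshou with the reduced lr=1e-4. The API names (PPOPolicy, Net, Actor, Critic) follow tianshou ~0.4.x, and the shapes and hidden sizes are illustrative, not the Atari configuration; an actual Atari run would use a CNN backbone and the frame-stacking wrappers from the repo's examples.

```python
import torch
from torch.distributions import Categorical

from tianshou.policy import PPOPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.discrete import Actor, Critic

# Illustrative shapes for a small discrete-action task, not Atari.
state_shape, action_shape = (4,), 2

# Shared MLP backbone with separate actor and critic heads.
net = Net(state_shape, hidden_sizes=[64, 64])
actor = Actor(net, action_shape)
critic = Critic(net)

# One optimizer over both parameter sets, using the reduced learning rate
# (1e-4 rather than 2.5e-4) discussed above.
optim = torch.optim.Adam(
    set(actor.parameters()).union(critic.parameters()), lr=1e-4
)

policy = PPOPolicy(actor, critic, optim, dist_fn=Categorical)
```

The same construction with lr=2.5e-4 is what reportedly triggered the "Invalid Value" error on two games.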
So is there any example using A2C to train Atari games?
Is there any example using A2C or PPO to train Atari?
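Not an official answer, but as a sketch of where to start: tianshou's A2CPolicy accepts the same actor/critic/optimizer construction as the PPO sketch above. Again this assumes the ~0.4.x API, and the vf_coef/ent_coef values are just the documented defaults, not tuned Atari settings.

```python
import torch
from torch.distributions import Categorical

from tianshou.policy import A2CPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.discrete import Actor, Critic

# Same illustrative setup as the PPO sketch above; swap in a CNN for Atari.
state_shape, action_shape = (4,), 2
net = Net(state_shape, hidden_sizes=[64, 64])
actor = Actor(net, action_shape)
critic = Critic(net)
optim = torch.optim.Adam(
    set(actor.parameters()).union(critic.parameters()), lr=1e-4
)

# vf_coef/ent_coef shown at their defaults; tune per game.
policy = A2CPolicy(actor, critic, optim, dist_fn=Categorical,
                   vf_coef=0.5, ent_coef=0.01)
```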