
Add A2C, ACER, and TRPO for Reinforcement Learning #596

Closed
blahBlahhhJ opened this issue Mar 16, 2021 · 4 comments
Labels
enhancement · help wanted · model

Comments

@blahBlahhhJ (Contributor) commented Mar 16, 2021

🚀 Feature

Implementation of more actor-critic RL algorithms (models) such as A2C, ACER, and TRPO.

Motivation

The RL section in this project has very few popular algorithms and is especially lacking in policy-based and actor-critic methods. The only policy-based algorithms available now are policy gradient and REINFORCE, both of which are quite old; this was also pointed out in #186. I would like to contribute to the RL section by adding more modern algorithms such as Advantage Actor Critic (A2C), Soft Actor Critic (SAC), Actor Critic with Experience Replay (ACER), and Trust Region Policy Optimization (TRPO).

Pitch

This will add several RL algorithms to make experiments easier and more convenient.
The implementation will follow roughly the same structure as the existing policy gradient model: a new Agent class (an actor-critic agent) will be added, and everything else will live within each specific new algorithm. A minimal sketch of such an agent is given below.
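As a rough illustration (not the final API), here is what the actor-critic agent could look like in PyTorch. The class name, constructor, and method signatures are hypothetical; the sketch assumes a network that returns both action logits and a state value.

```python
import torch
from torch import nn
from torch.distributions import Categorical


class ActorCriticAgent:
    """Hypothetical actor-critic agent: samples actions from the policy head
    and exposes log-probs/values for the actor and critic losses."""

    def __init__(self, net: nn.Module):
        # `net` is assumed to map a batch of states to (action_logits, state_values)
        self.net = net

    @torch.no_grad()
    def __call__(self, state: torch.Tensor) -> int:
        # Sample a single action for environment interaction
        logits, _ = self.net(state.unsqueeze(0))
        return int(Categorical(logits=logits).sample().item())

    def evaluate(self, states: torch.Tensor, actions: torch.Tensor):
        # Quantities needed by the training step: log-probs and entropies
        # for the policy loss, value estimates for the critic loss
        logits, values = self.net(states)
        dist = Categorical(logits=logits)
        return dist.log_prob(actions), dist.entropy(), values.squeeze(-1)
```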

Additional context

If there is no strong preference on which algorithm to start with, I will first work on A2C, which is the simplest of the three; its code will be the easiest for people to understand compared to the other two more sophisticated methods. A sketch of the A2C update follows this paragraph.
I chose A2C because it is the foundation of the actor-critic methods and is definitely worth having in this project. The other two build on it with better convergence properties and better sample efficiency.
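For concreteness, here is a minimal sketch of the standard A2C loss (the function name and coefficient values are illustrative, not the repo's API): the advantage is the return minus the critic's value estimate, the actor maximizes advantage-weighted log-probabilities, and the critic regresses toward the returns, with an entropy bonus for exploration.

```python
import torch
import torch.nn.functional as F


def a2c_loss(log_probs, entropies, values, returns,
             value_coef: float = 0.5, entropy_coef: float = 0.01):
    # Advantage: how much better the taken action was than the critic expected.
    # Detached so the policy gradient does not flow through the value head.
    advantages = (returns - values).detach()

    policy_loss = -(log_probs * advantages).mean()  # actor: maximize advantage-weighted log-probs
    value_loss = F.mse_loss(values, returns)        # critic: regress values toward returns
    entropy_bonus = entropies.mean()                # encourage exploration

    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
```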

A2C/A3C: https://arxiv.org/abs/1602.01783
SAC: https://arxiv.org/abs/1801.01290
TRPO: https://arxiv.org/abs/1502.05477
ACER: https://arxiv.org/abs/1611.01224

@blahBlahhhJ added the enhancement and help wanted labels on Mar 16, 2021
@github-actions commented

Hi! Thanks for your contribution, great first issue!

@akihironitta (Contributor) commented

@blahBlahhhJ Hi, thank you for suggesting your ideas! Feel free to submit PRs!

@plutasnyy commented

Hello! I would like to ask @blahBlahhhJ whether you have already started working on, for example, TRPO. If not, @NaIwo and I would be happy to try to work on it. Let us know!

@blahBlahhhJ (Contributor, Author) commented

> Hello! I would like to ask @blahBlahhhJ whether you have already started working on, for example, TRPO. If not, @NaIwo and I would be happy to try to work on it. Let us know!

Hi. I'm kind of busy right now, so I have no plans to implement TRPO. Feel free to work on it; you can simply submit a draft pull request following the existing format and mention this issue, and you should be good to go. Good luck coding!

@plutasnyy mentioned this issue on Jan 19, 2022