
Add A2C, ACER, and TRPO for Reinforcement Learning #596

Closed
blahBlahhhJ opened this issue Mar 16, 2021 · 4 comments
Labels
enhancement · help wanted · model

Comments

@blahBlahhhJ (Contributor) commented Mar 16, 2021

🚀 Feature

Implementation of more actor-critic RL algorithms (models) such as A2C, ACER, and TRPO.

Motivation

The RL section in this project has very few popular algorithms and is especially lacking in policy-based and actor-critic methods. The only policy-based algorithms available now are policy gradient and REINFORCE, both of which are quite old; this was also pointed out in #186. I would like to contribute to the RL section by adding more modern algorithms such as Advantage Actor Critic (A2C), Soft Actor Critic (SAC), Actor Critic with Experience Replay (ACER), and Trust Region Policy Optimization (TRPO).

Pitch

This will add several RL algorithms to make experiments easier and more convenient.
The implementation will follow roughly the same structure as the existing policy gradient model: a new Agent class (an actor-critic agent) will be added, and everything else will live within each specific new algorithm. A minimal sketch of such an agent is given below.
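As a rough illustration (not the final API), here is what the actor-critic agent could look like in PyTorch. The class name, constructor, and method signatures are hypothetical; the sketch assumes a network that returns both action logits and a state value.

```python
import torch
from torch import nn
from torch.distributions import Categorical


class ActorCriticAgent:
    """Hypothetical actor-critic agent: samples actions from the policy head
    and exposes log-probs/values for the actor and critic losses."""

    def __init__(self, net: nn.Module):
        # `net` is assumed to map a batch of states to (action_logits, state_values)
        self.net = net

    @torch.no_grad()
    def __call__(self, state: torch.Tensor) -> int:
        # Sample a single action for environment interaction
        logits, _ = self.net(state.unsqueeze(0))
        return int(Categorical(logits=logits).sample().item())

    def evaluate(self, states: torch.Tensor, actions: torch.Tensor):
        # Quantities needed by the training step: log-probs and entropies
        # for the policy loss, value estimates for the critic loss
        logits, values = self.net(states)
        dist = Categorical(logits=logits)
        return dist.log_prob(actions), dist.entropy(), values.squeeze(-1)
```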

Additional context

If there is no strong preference on which algorithm to start with, I will first work on A2C, which is the simplest of the three; its code will be the easiest for people to understand compared to the other two more sophisticated methods. A sketch of the A2C update follows this paragraph.
I chose A2C because it is the foundation of the actor-critic methods and is definitely worth having in this project. The other two build on it with better convergence properties and better sample efficiency.
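For concreteness, here is a minimal sketch of the standard A2C loss (the function name and coefficient values are illustrative, not the repo's API): the advantage is the return minus the critic's value estimate, the actor maximizes advantage-weighted log-probabilities, and the critic regresses toward the returns, with an entropy bonus for exploration.

```python
import torch
import torch.nn.functional as F


def a2c_loss(log_probs, entropies, values, returns,
             value_coef: float = 0.5, entropy_coef: float = 0.01):
    # Advantage: how much better the taken action was than the critic expected.
    # Detached so the policy gradient does not flow through the value head.
    advantages = (returns - values).detach()

    policy_loss = -(log_probs * advantages).mean()  # actor: maximize advantage-weighted log-probs
    value_loss = F.mse_loss(values, returns)        # critic: regress values toward returns
    entropy_bonus = entropies.mean()                # encourage exploration

    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
```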

A2C/A3C: https://arxiv.org/abs/1602.01783
SAC: https://arxiv.org/abs/1801.01290
TRPO: https://arxiv.org/abs/1502.05477
ACER: https://arxiv.org/abs/1611.01224

@blahBlahhhJ added the enhancement and help wanted labels on Mar 16, 2021
@github-actions commented

Hi! Thanks for your contribution, great first issue!

@akihironitta (Contributor) commented

@blahBlahhhJ Hi, thank you for suggesting your ideas! Feel free to submit PRs!

@plutasnyy commented

Hello! I would like to ask @blahBlahhhJ whether you have already started working on, for example, TRPO. If not, @NaIwo and I would be happy to try to work on it. Let us know!

@blahBlahhhJ (Contributor, Author) commented

> Hello! I would like to ask @blahBlahhhJ whether you have already started working on, for example, TRPO. If not, @NaIwo and I would be happy to try to work on it. Let us know!

Hi. I'm kind of busy right now, so I have no plans to implement TRPO. Feel free to work on it; you can simply submit a draft pull request following the existing format and mention this issue, and you should be good to go. Good luck coding!

@plutasnyy mentioned this issue on Jan 19, 2022