Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] LSTM policies with custom feature extractors #160

Closed
cisprague opened this issue Sep 18, 2020 · 14 comments
Closed

[feature request] LSTM policies with custom feature extractors #160

cisprague opened this issue Sep 18, 2020 · 14 comments
Labels
enhancement New feature or request help wanted Help from contributors is welcomed

Comments

@cisprague
Copy link

cisprague commented Sep 18, 2020

Hi! It would be awesome to be able to implement LSTM policies in this library, like in the former version. Is there an straightforward way to accomplish this with the current version?

@araffin
Copy link
Member

araffin commented Sep 18, 2020

Hello,
As mentioned in the roadmap #1 and in the doc, recurrent policies are not yet supported but planned for v1.1+.

@araffin araffin added the enhancement New feature or request label Sep 18, 2020
@JadenTravnik
Copy link

How would recurrent policies be approached here? Any recommendations?

@Miffyli
Copy link
Collaborator

Miffyli commented Dec 10, 2020

@JadenTravnik

I think the implementation style of SB2 would be a good starting point, and there is already some support for that in SB3 (e.g. predict returns hidden_state despite there no being any hidden states yet). However I short review and discussion over it and its weaknesses would be a good idea, for the quality of the implementations and clarity of the code.

Edit: One thing we might want to consider is updating buffers to always store trajectory-information and/or even keep different trajectories separate to make this easier.

@AntoineMath
Copy link

@Miffyli
Do you mean :

  • We should implement ActorCriticLstmPolicy class in the same way as ActorCriticCnnPolicy is implemented in SB3 (with a LstmExtractor class in torch_layers.py), and following the lstm architecture of SB2.

or

  • We should re-implement ActorCriticPolicy class and all its different sublasses in the same way as in SB2 (e.g ReccurentActorCriticPolicy -> LstmPolicy -> MlpLstmPolicy classes). So the Lstm network is directly defined in the LstmPolicy class.

?

@Miffyli Miffyli added the Maintainers on vacation Maintainers are on vacation so they can recharge their batteries, we will be back soon ;) label Dec 29, 2020
@Miffyli Miffyli removed the Maintainers on vacation Maintainers are on vacation so they can recharge their batteries, we will be back soon ;) label Jan 4, 2021
@Miffyli
Copy link
Collaborator

Miffyli commented Jan 4, 2021

I do not dare to give any guides just yet. PyTorch makes things much easier when it comes to RNN-type of networks and we are able to save ourselves from a ton of headache compared to TF if we do things right. On a quick ponder I think we might be able to implement RNN support in a very low level in policies (e.g. arguments to BasePolicy or ActorCriticPolicy). One option would be to assume that all observations have sequence-dimension on top of batch-dimension. This would make RNNs easier to add (just add nn.LSTM layer and things work out) and also avoid if-elseing, at the cost of remembering that we always have the time dimension.

My point being: First step for RNN support would be to think of clean and functionally satisfactory ("RNN agents that work") solutions before starting to think of code level things.

@jcarlosroldan
Copy link

Shouldn't this be part of the breaking changes on the migration guide? At least in the meantime

@Miffyli
Copy link
Collaborator

Miffyli commented May 12, 2021

@juancroldan do you mean something like this: "LstmPolicy is not supported for the time being"? Just to make sure an user does not go looking for this. If so, a PR to add this would be welcome :)

@jcarlosroldan jcarlosroldan mentioned this issue May 17, 2021
13 tasks
@araffin araffin mentioned this issue May 23, 2021
14 tasks
@glebarez
Copy link

glebarez commented May 29, 2021

How does recurrent policy differ from actually implementing custom policy with LSTM/GRU network and just prepare your state in a way that it will already contain previous experience/timeline?
Would you please list benefits here? Thank you

@Miffyli
Copy link
Collaborator

Miffyli commented May 30, 2021

@glebarez Storing hidden states during training, and the whole backprop through time, is the largest headache here. We need to update the code to store the hidden states between rollouts, load them up correctly and then do the training. Some of this infrastructure is there, so it is not that big of a deal (you can get plenty of hints from SB2), but only implementing it on the policy level is not enough (training needs to be updated as well).

@glebarez
Copy link

@Miffyli thank you for taking time in making explanation.
Indeed there seem to be much inner workings that are well suitable to be incapsulated in the policy.
I glanced through the SB2 code and find it somewhat overcomplicated.
Also SB2 documentation lacks comprehensive examples for recurrent policies.
I'm looking forward for SB3 release that would include recurrent policies on torch.
Again, many thanks.

@Miffyli
Copy link
Collaborator

Miffyli commented May 30, 2021

Yup the plan is to have much simpler/clearer RNN implementation this time around, using the things we learned during SB2 :). PS: If you feel like working towards anything like this, PRs are always welcome!

bstee615 added a commit to bstee615/stable-baselines3 that referenced this issue Jun 7, 2021
Recurrent policies are not supported yet as of (DLR-RM#160 (comment)), but the docs say that A2C supports them. Changing it to avoid misleading.
araffin pushed a commit that referenced this issue Jun 7, 2021
* Remove recurrent policies from A2C docs

Recurrent policies are not supported yet as of (#160 (comment)), but the docs say that A2C supports them. Changing it to avoid misleading.

* Update changelog

Co-authored-by: benjaminjsteenhoek@gmail.com <benjis@iastate.edu>
@micronoz
Copy link

any updates on recurrent policies?

@Miffyli
Copy link
Collaborator

Miffyli commented Jul 24, 2021

No active development as of yet. You could try using original stable-baselines algorithms if those are suitable for you.

leor-c pushed a commit to leor-c/stable-baselines3 that referenced this issue Aug 26, 2021
* Remove recurrent policies from A2C docs

Recurrent policies are not supported yet as of (DLR-RM#160 (comment)), but the docs say that A2C supports them. Changing it to avoid misleading.

* Update changelog

Co-authored-by: benjaminjsteenhoek@gmail.com <benjis@iastate.edu>
@araffin araffin added the help wanted Help from contributors is welcomed label Nov 4, 2021
@araffin
Copy link
Member

araffin commented Nov 25, 2021

I have a very experimental version of recurrent PPO in a SB3 contrib branch based on SB2/cleanRL implementation: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/feat/ppo-lstm

Use it at your own risk :p

closing in favor of #18

@araffin araffin closed this as completed Nov 25, 2021
TiagoFer0128 pushed a commit to TiagoFer0128/stable-baselines3 that referenced this issue Dec 15, 2023
* Remove recurrent policies from A2C docs

Recurrent policies are not supported yet as of (DLR-RM/stable-baselines3#160 (comment)), but the docs say that A2C supports them. Changing it to avoid misleading.

* Update changelog

Co-authored-by: benjaminjsteenhoek@gmail.com <benjis@iastate.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Help from contributors is welcomed
Projects
None yet
Development

No branches or pull requests

8 participants