[Question] What is the difference between a custom feature extractor and a custom policy? #347
Comments
Feature extractors only concern themselves with processing inputs of whatever shape into nice 1D vectors. Policies then take this 1D vector and map it to value/pi predictions, et cetera. The policy holds the feature extractor and also handles initialization and such. So, to change most of the network, you probably want to define a new feature extractor. If your observations are already 1D vectors, then you can use the default extractor. If some part of the docs was unclear, do point it out so it can be refined for clarity.
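The split described above can be illustrated with a minimal sketch. This is plain Python, not the actual SB3 classes (`FeaturesExtractor`, `Policy`, and the stand-in heads are all hypothetical names for illustration): the extractor turns an arbitrarily shaped observation into a flat 1D vector, and the policy owns the extractor and maps its output to the pi/value heads.

```python
class FeaturesExtractor:
    """Flattens a dict observation of mixed shapes into a single 1D list."""
    def extract(self, obs):
        flat = []
        for key in sorted(obs):  # deterministic key order
            values = obs[key]
            flat.extend(values if isinstance(values, list) else [values])
        return flat

class Policy:
    """Holds the features extractor and maps its 1D output to pi/value heads."""
    def __init__(self, extractor, n_actions):
        self.extractor = extractor
        self.n_actions = n_actions

    def forward(self, obs):
        features = self.extractor.extract(obs)     # whatever shape -> 1D vector
        logits = [sum(features)] * self.n_actions  # stand-in for the pi head
        value = sum(features) / len(features)      # stand-in for the value head
        return logits, value

policy = Policy(FeaturesExtractor(), n_actions=2)
logits, value = policy.forward({"pos": [1.0, 2.0], "speed": 3.0})
```

Swapping in a different `FeaturesExtractor` changes how observations become a 1D vector without touching the heads, which is exactly the separation the default policies rely on.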
I think we should update the doc. I created two diagrams to explain things faster: the feature extractor is usually shared between the policy and value networks to save computation (this can be disabled), and the network architecture comes afterward.
Ok, I see. So if I want true customisation, I should follow the "Advanced" example here: https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html#on-policy-algorithms and then, in addition to that, I should also create a custom feature extractor. So, to summarise, if I create a feature extractor
If you define a custom policy, there is no need for a custom feature extractor. The separation is merely done in the default policies to make it clear which part processes the input into a 1D vector and which part maps it to pi/value, and also to make it easier to change this preprocessing network without having to touch other parts. For
Thank you for the clear explanations! However, I'm still confused about a few points. I'm actually using PPO, and from the paper it seems that if you train a network that shares parameters between actor and critic, the loss should be different. If I use a CNN as a features extractor shared between the actor and critic networks and train all three parts together, do I need to change the loss? Or maybe the proper way is to write a custom policy that does not share the CNN's parameters? Thank you!
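For reference, when actor and critic share parameters, the PPO paper combines the clipped surrogate, the value error, and an entropy bonus into a single objective, so one loss drives the shared weights. A minimal scalar sketch of that combined loss (plain Python; the function name and default coefficients are illustrative, with signs following the usual "minimize the loss" convention):

```python
def ppo_loss(ratio, advantage, value, value_target, entropy,
             clip_eps=0.2, vf_coef=0.5, ent_coef=0.01):
    """Combined PPO objective for a network with shared actor/critic parameters.

    ratio: pi_new(a|s) / pi_old(a|s) for the sampled action
    advantage: estimated advantage A(s, a)
    value, value_target: critic prediction and its regression target
    entropy: entropy of the current action distribution (bonus term)
    """
    # Clipped surrogate objective (negated, since we minimize).
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    policy_loss = -min(ratio * advantage, clipped_ratio * advantage)
    # Squared-error value loss for the shared critic head.
    value_loss = (value - value_target) ** 2
    # Entropy enters with a negative sign to encourage exploration.
    return policy_loss + vf_coef * value_loss - ent_coef * entropy
```

With a ratio of 1.5 and a positive advantage, the clipping caps the policy term at `1 + clip_eps`, which is the mechanism that keeps updates to the shared network small.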
Question
When should I use a custom feature extractor vs a custom policy? It's a little unclear in the docs what the differences are. If I want to use a custom neural net, should I replace the feature extractor, define it as a custom policy, or define both a custom policy AND a custom feature extractor?