
[Question] What is the difference between a custom feature extractor and a custom policy? #347

Closed
outdoteth opened this issue Mar 9, 2021 · 5 comments · Fixed by #354
Labels
documentation (Improvements or additions to documentation), question (Further information is requested)

Comments

@outdoteth

outdoteth commented Mar 9, 2021

Question

When should I use a custom feature extractor vs a custom policy? It's a little unclear in the docs what the differences are. If I want to use a custom neural net, should I replace the feature extractor, define it as a custom policy, or define both a custom policy AND a custom feature extractor?

Checklist

  • I have read the documentation (required)
  • I have checked that there is no similar issue in the repo (required)
@outdoteth outdoteth added the question Further information is requested label Mar 9, 2021
@araffin araffin added the documentation Improvements or additions to documentation label Mar 10, 2021
@Miffyli
Collaborator

Miffyli commented Mar 10, 2021

Feature extractors only concern themselves with processing inputs (of whatever shape) into nice 1D vectors. Policies then take this 1D vector and map it into value/pi predictions, etc. The policy holds the feature extractor and also handles initialization and such.
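(For illustration, a minimal feature-extractor sketch in the spirit of the custom CNN example from the SB3 docs. `CustomCNN`, the layer sizes and the env are placeholders; `BaseFeaturesExtractor` and the `policy_kwargs` keys are the actual SB3 API.)

```python
import torch as th
import torch.nn as nn
from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class CustomCNN(BaseFeaturesExtractor):
    """Turns image observations (C, H, W) into a flat `features_dim` vector."""

    def __init__(self, observation_space, features_dim: int = 128):
        super().__init__(observation_space, features_dim)
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened size with one dummy forward pass
        with th.no_grad():
            sample = th.as_tensor(observation_space.sample()[None]).float()
            n_flatten = self.cnn(sample).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        return self.linear(self.cnn(observations))


# The policy then builds its pi/value heads on top of this 1D feature vector
policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    features_extractor_kwargs=dict(features_dim=128),
)
# model = PPO("CnnPolicy", your_image_env, policy_kwargs=policy_kwargs)  # your_image_env: any env with image observations
```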

So, to change most of the network, you probably want to define a new feature extractor. If your observations are 1D vectors, then you can use the net_arch argument to change the network. If you want something more custom, however, then you need to create a custom policy.
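(Something along these lines; note the exact net_arch format has changed between SB3 versions, this is the list-of-dict form used around the time of this issue.)

```python
from stable_baselines3 import PPO

# 1D observations: keep the default (flatten) feature extractor and only
# change the sizes of the policy (pi) and value (vf) heads via net_arch.
policy_kwargs = dict(net_arch=[dict(pi=[128, 128], vf=[128, 128])])
model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs)
```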

If some part of the docs was unclear, do point it out so it can be refined for clarity.

@araffin
Member

araffin commented Mar 10, 2021

I think we should update the doc. I created two diagrams to explain things faster:

[Diagram: CustomPolicy]
[Diagram: SB3Policy]

The feature extractor is usually shared between the networks to save computation (this can be disabled), and the network architecture comes afterward.
As mentioned in the docs, SB3 abuses the term "Policy": in the code it refers to all the networks + optimizer + target networks, whereas in RL it refers to the actor only (the part taking actions).
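(You can see this directly from code; the attribute names below come from ActorCriticPolicy in SB3 1.x and may differ slightly between versions.)

```python
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1")
# "Policy" here is the whole bundle, not just the actor:
print(model.policy.features_extractor)  # shared feature extractor
print(model.policy.mlp_extractor)       # pi/vf network bodies
print(model.policy.action_net)          # actor head
print(model.policy.value_net)           # critic head
print(model.policy.optimizer)           # the optimizer is held by the policy too
```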

@outdoteth
Author

outdoteth commented Mar 10, 2021

OK, I see. So if I want true customisation, I should follow the "Advanced" example here: https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html#on-policy-algorithms

And then in addition to that I should also create a custom feature extractor.

So to summarise: if I create a feature extractor F and a custom policy P, then the parameters of F will be shared between the actor and the critic, that is, the actor and critic will use the same F. Then, inside P, there are no constraints other than that there must be two outputs (one for the value function and one for the policy).

@Miffyli
Collaborator

Miffyli commented Mar 11, 2021

If you define a custom policy, there is no need for a custom feature extractor. The separation is only done in the default policies to make it clear which part processes the input into a 1D vector and which part maps it to pi/value, and also to make it easier to change this preprocessing network without having to touch other parts.

For P the only real constraint is that you implement the public functions correctly (see the original ActorCriticPolicy). Other than that you are free to do almost anything. The example behind the link you shared is somewhat limited but a good starting point.
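(Roughly, a trimmed-down version of that advanced docs example: the custom trunk only has to expose latent_dim_pi / latent_dim_vf and return the two latent vectors, everything else is inherited from ActorCriticPolicy. The CustomNetwork name and layer sizes are placeholders.)

```python
import torch as th
import torch.nn as nn
from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy


class CustomNetwork(nn.Module):
    """Two separate MLPs producing the latent vectors for the pi and value heads."""

    def __init__(self, feature_dim: int, pi_dim: int = 64, vf_dim: int = 64):
        super().__init__()
        # ActorCriticPolicy reads these to size the action/value heads
        self.latent_dim_pi = pi_dim
        self.latent_dim_vf = vf_dim
        self.policy_net = nn.Sequential(nn.Linear(feature_dim, pi_dim), nn.ReLU())
        self.value_net = nn.Sequential(nn.Linear(feature_dim, vf_dim), nn.ReLU())

    def forward(self, features: th.Tensor):
        return self.forward_actor(features), self.forward_critic(features)

    def forward_actor(self, features: th.Tensor) -> th.Tensor:
        return self.policy_net(features)

    def forward_critic(self, features: th.Tensor) -> th.Tensor:
        return self.value_net(features)


class CustomActorCriticPolicy(ActorCriticPolicy):
    # Feature extractor, distributions, heads and optimizer are all inherited;
    # we only swap the network sitting between the features and the heads.
    def _build_mlp_extractor(self) -> None:
        self.mlp_extractor = CustomNetwork(self.features_dim)


model = PPO(CustomActorCriticPolicy, "CartPole-v1")
```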

@araffin araffin mentioned this issue Mar 16, 2021
@pengzhi1998

pengzhi1998 commented May 24, 2022

Thank you for the clear explanations!

However, I'm still confused about a few points. I'm actually using PPO, and from the paper it seems that when training a network that shares parameters between the actor and critic, the loss should be different:
[Screenshot: the combined PPO objective L^{CLIP+VF+S} from the PPO paper, which adds the value-function loss and entropy bonus to the clipped objective when the actor and critic share parameters]

If I use a CNN as a features extractor shared between the actor and critic networks and train all three parts together, do I need to change the loss? Or would the proper way be to write a custom policy that does not share the CNN's parameters?

Thank you!
