Question about the shared feature extractor for PPO #922

Closed
pengzhi1998 opened this issue May 26, 2022 · 4 comments

Labels: question (Further information is requested)

Comments

@pengzhi1998

Question

I'm extremely sorry to keep bothering you. I found an issue from a year ago regarding the custom feature extractor and custom policy, but I'm still confused about a few points. I'm using PPO. I'd like to add a CNN part, feed its output vector into the linear net_arch layers, and train them together. From the paper, though, it seems that when the network shares parameters between the actor and the critic, the loss should be different:
[Screenshot of the combined (policy surrogate + value error + entropy bonus) loss term from the PPO paper for the shared-parameter case]

If I use the CNN as a shared feature extractor in front of the actor/critic networks and train all three parts (shared custom feature extractor, separate actor network, separate critic network) together, do I need to change the loss manually? Or would the proper way be to write a custom policy that does not share the CNN's parameters?

I REALLY REALLY appreciate your help!

Checklist

  • [yes] I have read the documentation (required)
  • [yes] I have checked that there is no similar issue in the repo (required)
pengzhi1998 added the question label on May 26, 2022
@Miffyli (Collaborator) commented May 26, 2022

Hey. Generally people have used the same loss function when using shared parameters, but you are right: having two losses update the same parameters may cause some problems (e.g., slower learning). Hence I would recommend using the defaults, which are generally OK. Some older papers reported better results when sharing the CNN between the policy and value heads, since both are doing a similar task and "support each other", but your mileage may vary :).
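
For reference, a minimal sketch of that single combined objective (the coefficient names mirror SB3's vf_coef and ent_coef hyperparameters; this is a paraphrase, not the exact library code):

```python
import torch as th

def combined_ppo_loss(policy_loss: th.Tensor,
                      value_loss: th.Tensor,
                      entropy: th.Tensor,
                      vf_coef: float = 0.5,
                      ent_coef: float = 0.0) -> th.Tensor:
    # One scalar objective is backpropagated through everything (shared or not):
    # clipped surrogate loss + weighted value error - weighted entropy bonus.
    return policy_loss + vf_coef * value_loss - ent_coef * entropy
```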

But yes, if you are worried about this, you should create a custom policy network where there are two separate feature extractors. If you do not use CNNs, you can use the net_arch argument.
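
For the CNN + net_arch route, here is a minimal sketch adapted from the custom feature extractor example in the docs (the CNN layout, features_dim, and the Atari env id are just placeholders). Note that with this setup the features extractor is still shared between actor and critic by default for on-policy algorithms; making it fully separate is what requires a custom policy:

```python
import gym
import torch as th
import torch.nn as nn

from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class CustomCNN(BaseFeaturesExtractor):
    """CNN that maps image observations to a flat feature vector."""

    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 128):
        super().__init__(observation_space, features_dim)
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened size with one forward pass on a dummy observation
        with th.no_grad():
            n_flatten = self.cnn(
                th.as_tensor(observation_space.sample()[None]).float()
            ).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        return self.linear(self.cnn(observations))


policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    features_extractor_kwargs=dict(features_dim=128),
    # Separate actor (pi) and critic (vf) MLP heads on top of the extractor
    net_arch=[dict(pi=[64, 64], vf=[64, 64])],
)
model = PPO("CnnPolicy", "BreakoutNoFrameskip-v4", policy_kwargs=policy_kwargs, verbose=1)
```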

araffin closed this as completed on Jun 10, 2022
@pengzhi1998 (Author)

Sorry to bother you again, but would using the two losses to update the shared parameters cause instability?

@Miffyli (Collaborator) commented Jun 20, 2022

Potentially, but it might still be beneficial. Phasic Policy Gradient is one work that discusses this.

@pengzhi1998 (Author)

@Miffyli I'm extremely sorry to keep bothering you, but I'm still confused about a few things regarding the custom policy.

From your previous answer in this issue:

But yes, if you are worried of this, you should create a custom policy network where there are two separate feature extractors.

Are you referring to this custom policy network? In issue #347, though, you mentioned that:

If you define a custom policy there is no need to do a custom feature extractor.

I'm kind of confused by the two comments: when defining a custom policy, do I need to explicitly define a custom feature extractor, or two separate custom feature extractors?

Actually, I'm trying to construct an attention block (it doesn't include sequential parts, but it's not a simple MLP either) for PPO. Its actor and critic networks (both of which include the attention block) don't share parameters with each other. So I'm thinking a custom feature extractor might not be enough (per the note in your user guide, by default the custom feature extractor is shared between the actor and critic networks for on-policy algorithms).

But if I use this advanced example, it seems that the network architecture is limited to an MLP (the advanced example uses _build_mlp_extractor). I'm thinking of borrowing from your advanced example and the _build_mlp_extractor function, but directly defining my attention block for the policy and value networks in the forward_actor and forward_critic functions of the CustomNetwork class (sketched below). May I have your suggestions?

Thank you again for your great help!!
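
Concretely, I mean something like the following minimal sketch adapted from that advanced example (the AttentionBranch module is only a placeholder for my actual attention block, and CartPole-v1 is just an example env):

```python
from typing import Tuple

import torch as th
from torch import nn

from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy


class AttentionBranch(nn.Module):
    """Placeholder attention block: single-head self-attention over the
    feature vector (treated as a length-1 sequence), then a projection."""

    def __init__(self, feature_dim: int, out_dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=feature_dim, num_heads=1, batch_first=True)
        self.proj = nn.Sequential(nn.Linear(feature_dim, out_dim), nn.ReLU())

    def forward(self, features: th.Tensor) -> th.Tensor:
        seq = features.unsqueeze(1)             # (batch, 1, feature_dim)
        attended, _ = self.attn(seq, seq, seq)  # self-attention
        return self.proj(attended.squeeze(1))


class CustomNetwork(nn.Module):
    """Replaces the default MlpExtractor: separate (non-shared) attention
    branches for the policy and the value function."""

    def __init__(self, feature_dim: int, latent_dim_pi: int = 64, latent_dim_vf: int = 64):
        super().__init__()
        # Required by ActorCriticPolicy to size the action and value heads
        self.latent_dim_pi = latent_dim_pi
        self.latent_dim_vf = latent_dim_vf
        self.policy_net = AttentionBranch(feature_dim, latent_dim_pi)
        self.value_net = AttentionBranch(feature_dim, latent_dim_vf)

    def forward(self, features: th.Tensor) -> Tuple[th.Tensor, th.Tensor]:
        return self.forward_actor(features), self.forward_critic(features)

    def forward_actor(self, features: th.Tensor) -> th.Tensor:
        return self.policy_net(features)

    def forward_critic(self, features: th.Tensor) -> th.Tensor:
        return self.value_net(features)


class CustomActorCriticPolicy(ActorCriticPolicy):
    def _build_mlp_extractor(self) -> None:
        # Swap in the attention-based extractor; the features extractor
        # in front of it (Flatten/CNN) is left unchanged.
        self.mlp_extractor = CustomNetwork(self.features_dim)


model = PPO(CustomActorCriticPolicy, "CartPole-v1", verbose=1)
```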
