Question about the shared feature extractor for PPO #922
Comments
Hey. Generally people have used the same loss function when using shared parameters, but you are right; having two losses updating the same parameters may cause some problems (i.e., slower learning). Hence I would recommend using the default parameters, which are generally ok. In some older results, papers claimed they got better results by sharing the CNN between the policy and value heads, since both are doing a similar task and "supporting each other", but your mileage may vary :). But yes, if you are worried about this, you should create a custom policy network with two separate feature extractors. If you do not use CNNs, you can use the …
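(For reference, a minimal sketch of passing a custom feature extractor through `policy_kwargs`, following the pattern in the SB3 custom-policy docs. The class name `CustomCNN`, the layer sizes, and the `CarRacing-v2` environment are illustrative assumptions, not code from this thread.)

```python
import torch as th
import torch.nn as nn
from gymnasium import spaces  # for SB3 < 2.0, import from gym instead

from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class CustomCNN(BaseFeaturesExtractor):
    """Illustrative CNN feature extractor; the architecture is an assumption."""

    def __init__(self, observation_space: spaces.Box, features_dim: int = 128):
        super().__init__(observation_space, features_dim)
        # SB3 passes image observations to the extractor channel-first
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened size with one dummy forward pass
        with th.no_grad():
            n_flatten = self.cnn(
                th.as_tensor(observation_space.sample()[None]).float()
            ).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        return self.linear(self.cnn(observations))


policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    features_extractor_kwargs=dict(features_dim=128),
)
# Any image-observation env works; "CarRacing-v2" (needs gymnasium[box2d]) is just
# an example. Note: by default this single extractor is shared by both heads.
model = PPO("CnnPolicy", "CarRacing-v2", policy_kwargs=policy_kwargs, verbose=1)
```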
Sorry to bother you again, but would using two losses to update the shared parameters cause instability?
Potentially, but it might still be beneficial. Phasic Policy Gradient is one work that discusses this.
@Miffyli I'm extremely sorry to keep bothering you. But I'm still confused about a few things regarding the custom policy. From your previous answer in this issue:
Are you referring to this custom policy network? But from issue #347, you mentioned that:
I'm kind of confused by the two comments: when defining a custom policy, do I need to write one custom feature extractor, or two separate custom feature extractors explicitly? Actually, I'm trying to build an attention block (it doesn't include sequential parts, but it is not a simple MLP either) for PPO. Its actor and critic networks (both of which include the attention block) don't share parameters with each other. So I'm thinking a custom feature extractor might not be enough (from the note in your user guide: by default, the custom feature extractor is shared between the actor and critic networks for on-policy algorithms). But if I use this advanced example, it seems that the network architecture is limited to an MLP (the advanced example is using …).
Thank you again for your great help!!
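(A minimal sketch under the assumptions of SB3's advanced custom-policy example: despite the `_build_mlp_extractor` name, the module you plug in can be any `torch.nn.Module`, e.g. one with attention blocks, and its policy and value branches need not share parameters. The class names, attention layout, and `CartPole-v1` environment below are illustrative assumptions, not the exact code from the docs.)

```python
import torch as th
import torch.nn as nn

from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy


class AttentionExtractor(nn.Module):
    """Replaces the default MlpExtractor: separate (non-shared) actor and critic
    branches, each with its own self-attention block. Layout is illustrative."""

    def __init__(self, feature_dim: int, latent_dim: int = 64, n_heads: int = 4):
        super().__init__()
        # These two attributes are required by ActorCriticPolicy
        self.latent_dim_pi = latent_dim
        self.latent_dim_vf = latent_dim
        # Separate attention blocks and heads -> no parameter sharing
        self.pi_attn = nn.MultiheadAttention(feature_dim, n_heads, batch_first=True)
        self.vf_attn = nn.MultiheadAttention(feature_dim, n_heads, batch_first=True)
        self.policy_net = nn.Sequential(nn.Linear(feature_dim, latent_dim), nn.ReLU())
        self.value_net = nn.Sequential(nn.Linear(feature_dim, latent_dim), nn.ReLU())

    def _attend(self, attn: nn.MultiheadAttention, features: th.Tensor) -> th.Tensor:
        # Treat the feature vector as a length-1 sequence, purely for illustration
        x = features.unsqueeze(1)
        out, _ = attn(x, x, x)
        return out.squeeze(1)

    def forward(self, features: th.Tensor):
        return self.forward_actor(features), self.forward_critic(features)

    def forward_actor(self, features: th.Tensor) -> th.Tensor:
        return self.policy_net(self._attend(self.pi_attn, features))

    def forward_critic(self, features: th.Tensor) -> th.Tensor:
        return self.value_net(self._attend(self.vf_attn, features))


class AttentionActorCriticPolicy(ActorCriticPolicy):
    def _build_mlp_extractor(self) -> None:
        # self.features_dim is set by the policy before this is called
        self.mlp_extractor = AttentionExtractor(self.features_dim)


model = PPO(AttentionActorCriticPolicy, "CartPole-v1", verbose=1)
model.learn(5_000)
```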
Question
I'm extremely sorry to keep bothering you. I found an issue from a year ago regarding the custom feature extractor and the custom policy, but I'm still confused about a few points. I'm actually using PPO. I'd like to add a CNN part, feed its output vector to the linear net_arch, and train them together. But from the paper, it seems that if training a network that shares parameters between the actor and critic, the loss should be different:
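(This presumably refers to the combined objective from the PPO paper, eq. 9, used when the policy and value function share parameters; $c_1$ weights the value-function error and $c_2$ the entropy bonus:)

$$
L_t^{CLIP+VF+S}(\theta) = \hat{\mathbb{E}}_t\!\left[\, L_t^{CLIP}(\theta) - c_1\, L_t^{VF}(\theta) + c_2\, S[\pi_\theta](s_t) \,\right]
$$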
If I use the CNN as a shared feature extractor before the actor/critic networks and train all three parts (the shared custom feature extractor plus the separate actor and critic networks) together, do I need to change the loss manually? Or would the proper way be to write a custom policy that doesn't share the CNN's parameters?
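(For context, SB3's PPO already optimizes one combined loss in a single optimizer step; its `vf_coef` and `ent_coef` constructor arguments play the roles of $c_1$ and $c_2$ above, so gradients from all terms flow into a shared extractor without any manual change to the loss. A minimal sketch; the coefficient values and environment are illustrative, not a recommendation:)

```python
from stable_baselines3 import PPO

# vf_coef ~ c1 (value-function term), ent_coef ~ c2 (entropy bonus) in the
# combined objective; the same mechanism applies with a CNN feature extractor.
model = PPO(
    "MlpPolicy",
    "CartPole-v1",
    vf_coef=0.5,
    ent_coef=0.01,
    verbose=1,
)
model.learn(total_timesteps=10_000)
```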
I REALLY REALLY appreciate your help!