Tips on how to use RevGrad #6

Open
RSKothari opened this issue May 7, 2021 · 6 comments

Comments

@RSKothari

This question is for users experienced with RevGrad. What is the recommended approach to using RevGrad? (A rough sketch of my current setup follows the questions below.)

  1. Do we first train without RevGrad and then fine-tune with it?
  2. Do we ramp up the lambda parameter, or leave it as a learnable parameter?
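
For context, here is roughly how I am wiring it in (a minimal sketch assuming the `RevGrad` module from this package and its `alpha` argument; the layer sizes are made up):

```python
import torch
from torch import nn
from pytorch_revgrad import RevGrad

feature_extractor = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
task_head = nn.Linear(32, 10)                      # main objective (e.g. 10-class classification)
domain_head = nn.Sequential(RevGrad(alpha=0.1),    # reverses and scales gradients on the backward pass
                            nn.Linear(32, 2))      # domain classifier

x = torch.randn(8, 64)
features = feature_extractor(x)
task_logits = task_head(features)
domain_logits = domain_head(features)              # gradients flowing back into the extractor are reversed here
```
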
@janfreyberg
Owner

This is a really good question!

I actually don't see how lambda (I assume you mean the parameter called alpha in this library) could be learned. It's not implemented as a learnable parameter here, and it only applies during the backward pass to scale the gradients. It doesn't feed into any objective, so I'm not sure it can be optimised directly - only through some sort of second-order optimisation. Please correct me if I'm wrong, though.

I have personally not done a huge amount of experimentation with this, and usually keep it at a low constant value. I can imagine that increasing it later in training may work, but I also wonder whether its relative weight naturally increases as the main objective is learned better (and its gradients shrink).
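
If you do want to try a ramp-up, the schedule from the original DANN paper (Ganin & Lempitsky, 2015) would be a common starting point. A minimal sketch - this is not built into the library, you'd assign the returned value to the layer's alpha yourself at each step:

```python
import numpy as np

def dann_lambda(step, total_steps, gamma=10.0):
    """Ramp the gradient-reversal weight from 0 toward 1 over training."""
    p = step / total_steps
    return 2.0 / (1.0 + np.exp(-gamma * p)) - 1.0
```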

Would be cool to do some experiments with this. What are you attempting to do?

@RSKothari
Author

@janfreyberg Sorry for the late response. I am attempting to produce a network that is invariant to domain identity. I think alpha could be backpropagated if it were set as a learnable parameter, although I am not sure that is a good idea - I can imagine the network completely corrupting the feature space. I find that RevGrad tends to destroy the primary objective if the weight of the loss is not properly balanced. I intend to explore the effect of a ramp-up.

@janfreyberg
Owner

Cool. I'm still unclear if backpropping the parameter will be possible, but maybe you could try it? If you fork the repo and set it to be a learnable parameter, let me know how it goes and I can merge it if it works well.

And yes, fully agree that the balance has to be done pretty carefully, because it can have a detrimental effect on performance. It would be great to see how the performance on the domain label changes, too, as you'd ideally want it to be at chance by the time you finish training. Some experiments charting primary metrics, domain metrics, and gradient scaling over time would be very interesting.

@dingtao1

#6 (comment)
Hello, I've been using GRL (gradient reversal) recently. If I set lambda larger, the main task loss increases a lot; if I set it smaller, the test results barely change (or even get worse). How should I balance the losses? Can you give me some suggestions? Thank you.

@janfreyberg
Owner

Hi @dingtao1,

my recommendation would be to sweep over the parameter, between 0 and 1, and to track the accuracy (or MSE, or whatever metric is appropriate) for the label you are using RevGrad for, and your main label of interest. Ideally, you'd find a point somewhere that reduces the accuracy on the RevGrad label, while keeping the performance on your main label as high (or nearly as high) as if you had no RevGrad at all (this should be your baseline).
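
Something along these lines (a rough sketch; `train_and_evaluate` is a stand-in for your own training loop, stubbed here so the snippet runs):

```python
import numpy as np

def train_and_evaluate(alpha):
    # Placeholder: train your model with RevGrad(alpha=alpha) from scratch and
    # return (main-task metric, domain-classifier metric) on a validation set.
    return 1.0 - 0.05 * alpha, 0.9 - 0.4 * alpha

for alpha in np.linspace(0.0, 1.0, 6):
    main_metric, domain_metric = train_and_evaluate(alpha)
    print(f"alpha={alpha:.1f}  main={main_metric:.3f}  domain={domain_metric:.3f}")
```

You're looking for the smallest alpha that pushes the domain metric towards chance without hurting the main metric much relative to the alpha=0 baseline.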

Lastly, one recommendation is to scale the losses so they roughly match. For example, if you are using RevGrad on a regression target and your main target is classification, I would look at the magnitudes of both losses and scale one so they roughly match. This is independent of the lambda parameter in RevGrad.
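
For example (a sketch with made-up tensors; the 0.1 weight is just illustrative and separate from RevGrad's alpha):

```python
import torch
from torch import nn

task_loss_fn = nn.CrossEntropyLoss()   # main objective: classification
domain_loss_fn = nn.MSELoss()          # RevGrad objective: regression

task_logits = torch.randn(8, 10, requires_grad=True)
task_labels = torch.randint(0, 10, (8,))
domain_pred = torch.randn(8, 1, requires_grad=True)
domain_target = torch.randn(8, 1)

task_loss = task_loss_fn(task_logits, task_labels)
domain_loss = domain_loss_fn(domain_pred, domain_target)

# Weight chosen so the two terms are of similar magnitude; pick it by
# inspecting the raw loss values early in training.
domain_weight = 0.1
total_loss = task_loss + domain_weight * domain_loss
total_loss.backward()
```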

Hope that helps

@ammarlam10

Hi, I have a question related to training the model for domain adaptation: are both loss functions trained simultaneously, or do I train the task loss on the source domain first and then train the model again on the domain task?
