Tips on how to use RevGrad #6
This is a really good question! I actually don't see how lambda (I assume you mean the parameter called alpha in this library) could be learned. It's not implemented as a learnable parameter here, and it only applies during the backward pass to scale the gradients. It doesn't feed into any objective, so I'm not sure it can be optimised directly - only through some sort of second-order optimisation. Please correct me if I'm wrong, though. I have personally not done a huge amount of experimentation with this and usually keep it at a low constant. I can imagine that increasing it later on during training may work, but I also wonder whether its relative weight naturally increases as the main objective is learned better (and those gradients shrink). It would be cool to do some experiments with this. What are you attempting to do?
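For reference, here is a minimal sketch of how a gradient reversal layer of this kind typically works (the class and function names are illustrative, not this library's exact internals). The forward pass is the identity, and the backward pass multiplies the incoming gradient by -alpha; because alpha only appears in `backward()`, it never enters the loss and receives no gradient itself:

```python
import torch
from torch.autograd import Function


class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # reverse and scale the gradient; alpha itself gets no gradient (None)
        return -ctx.alpha * grad_output, None


def grad_reverse(x, alpha=1.0):
    return GradReverse.apply(x, alpha)
```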
@janfreyberg Sorry for the late response. I am attempting to produce a network that is invariant to domain identity. I think alpha can be backpropagated if it is set to be a learnable parameter, although I am not sure that is a good idea - I can imagine the network completely corrupting the feature space. I find that RevGrad tends to destroy the primary objective if the weight of the loss is not properly balanced. I intend to explore the effect with a ramp-up.
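One possible ramp-up is the sigmoid schedule from the original DANN paper (Ganin & Lempitsky), where the reversal weight grows from 0 towards 1 as training progresses. A small sketch, assuming you recompute the weight yourself each epoch and pass it to the reversal layer; `gamma=10` is the paper's default, not something this library provides:

```python
import math


def dann_lambda(progress: float, gamma: float = 10.0) -> float:
    """Reversal weight as a function of training progress in [0, 1]."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


# e.g. recompute once per epoch and use it as the reversal weight
# alpha = dann_lambda(epoch / num_epochs)
```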
Cool. I'm still unclear whether backpropagating the parameter will be possible, but maybe you could try it? If you fork the repo and set it to be a learnable parameter, let me know how it goes and I can merge it if it works well. And yes, I fully agree that the balance has to be struck carefully because it can have a detrimental effect on performance. It would also be great to see how the performance on the domain label changes, as you'd ideally want it to be at chance by the time you finish training, so some experiments charting primary metrics, domain metrics, and gradient scaling over time would be very interesting.
Hi @dingtao1, my recommendation would be to sweep over the parameter between 0 and 1, and to track the accuracy (or MSE, or whatever metric is appropriate) both for the label you are using RevGrad on and for your main label of interest. Ideally, you'd find a point that reduces the accuracy on the RevGrad label while keeping the performance on your main label as high (or nearly as high) as with no RevGrad at all (this should be your baseline). Lastly, one recommendation is to scale the losses so they roughly match. For example, if you are using RevGrad on a regression target and your main target is classification, I would look at both losses and roughly scale them to match. This is independent of the lambda parameter in RevGrad. Hope that helps.
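To make the loss balancing concrete, here is a hedged sketch of a single training step where both objectives are optimised together. `encoder`, `task_head`, `domain_head`, and `revgrad` are placeholder modules (the last being any gradient reversal layer), and `loss_scale` is the rough loss-matching factor described above, applied independently of the reversal weight:

```python
import torch.nn.functional as F


def training_step(encoder, task_head, domain_head, revgrad,
                  x, y_task, y_domain, optimizer, loss_scale=1.0):
    features = encoder(x)
    # main objective: standard supervised loss on the task labels
    task_loss = F.cross_entropy(task_head(features), y_task)
    # adversarial objective: the domain head sees reversed gradients, so
    # minimising this loss pushes the encoder towards domain-invariant features
    domain_loss = F.cross_entropy(domain_head(revgrad(features)), y_domain)

    loss = task_loss + loss_scale * domain_loss  # both losses, one backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return task_loss.item(), domain_loss.item()
```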
Hi, I have a question related to training the model for domain adaptation: are both loss functions trained simultaneously, or do I train the task loss on the source domain first and then train the model again on the domain task?
This question is for users experienced with RevGrad. What is the recommended approach to using RevGrad?