Matching losses with miners #169
I have two questions regarding using losses with miners:

1. Is it possible to pass the output of any miner into any loss function, or do only certain combinations work?
2. Are there particular loss+miner combinations that are known to work better than others, and how should I choose?
A primary goal of this library is a high degree of flexibility, to make it easy to try new ideas. So yes, you can pass any tuple miner's output into any loss function, and it'll do something sensible with it. Specifically, if the miner outputs pairs but the loss operates on triplets, the pairs are converted to triplets internally, and vice versa.
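For concreteness, here is a minimal sketch of the miner-to-loss handoff; the batch size, embedding dimension, and the specific MultiSimilarityMiner + TripletMarginLoss pairing are illustrative choices, not recommendations:

```python
import torch
from pytorch_metric_learning import losses, miners

embeddings = torch.randn(32, 128)        # a batch of 32 embeddings (dim 128)
labels = torch.randint(0, 4, (32,))      # 4 hypothetical classes

miner = miners.MultiSimilarityMiner()    # mines hard pairs
loss_func = losses.TripletMarginLoss()   # operates on triplets

# The mined tuples are passed as the optional indices_tuple argument;
# since the tuple types differ, the pairs are converted to triplets internally.
hard_tuples = miner(embeddings, labels)
loss = loss_func(embeddings, labels, hard_tuples)
```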
There probably are certain combinations of loss+miner that work better than others, but I don't know which ones. There also isn't a great way of selecting hyperparameters, and there are so many other factors like model, optimizer, batch size, and learning rate. So unfortunately I don't have a great answer for this. Based on my experience, if I were trying to solve a new problem I would limit my focus to:
And maybe also try combining these with the MultiSimilarityMiner. Since you're interested in using miners, I suggest trying out the ThresholdReducer, which can be passed into any loss function via the reducer argument.
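As a sketch of that, using the reducer keyword argument of pytorch-metric-learning losses (the low/high thresholds below are illustrative, not tuned values):

```python
from pytorch_metric_learning import losses, reducers

# Only per-element losses falling inside (low, high) contribute to the
# reduced (averaged) loss, filtering out trivially easy and extreme elements.
reducer = reducers.ThresholdReducer(low=0.0, high=0.3)
loss_func = losses.ContrastiveLoss(reducer=reducer)
```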
Thank you very much for the detailed answer. As a follow-up, I have one observation regarding the loss functions that use cosine similarity as the distance metric, and I am curious to hear your opinion on it. Euclidean distance depends on the embedding size and the values it contains, so the optimal margin parameter may change with the specific problem at hand and is not easily transferable from one domain to another. Applying L2 normalization right before the loss calculation might help, but I am skeptical of that, since doing so can alter the relative similarities among embeddings. Cosine similarity, however, gives a value between -1 and 1 (between 0 and 2 for cosine distance) regardless of what the embeddings contain. Do you think the optimized parameters reported in the literature for losses that employ cosine similarity can safely be transferred to other domains? I am just trying to figure out whether there is one less hyperparameter to worry about.
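A quick numerical check of this observation (the dimension and scales below are arbitrary): scaling the embeddings grows the Euclidean distance proportionally, while cosine distance stays bounded in [0, 2]:

```python
import torch
import torch.nn.functional as F

a, b = torch.randn(128), torch.randn(128)
for scale in (1.0, 10.0, 100.0):
    x, y = a * scale, b * scale
    euclidean = torch.dist(x, y)                   # grows linearly with scale
    cosine = 1 - F.cosine_similarity(x, y, dim=0)  # invariant to scale
    print(f"scale={scale:>5}: euclidean={euclidean:.2f}, cosine_dist={cosine:.4f}")
```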
Yeah, I think that is one advantage of cosine similarity over unnormalized embeddings: you can usually expect to use a triplet margin between 0 and 0.2. I think the optimized hyperparameters from one domain are a good starting place, but the true optimum is probably something different.
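For example, a minimal configuration of a triplet loss with cosine similarity as the distance, using pytorch-metric-learning's distances module; margin=0.1 is just an illustrative starting point within that 0-0.2 range:

```python
from pytorch_metric_learning import losses
from pytorch_metric_learning.distances import CosineSimilarity

# With CosineSimilarity, the margin applies to similarities in [-1, 1],
# so small values like 0.1 are a reasonable place to start.
loss_func = losses.TripletMarginLoss(margin=0.1, distance=CosineSimilarity())
```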
I see. Thanks!