Visualisation of Loss value #7
Thanks for your interest in our work. We didn't do anything extra to stabilize the training process. Did you try using a large batch size? The loss is aggregated over the batch, so a larger batch size lowers the loss variance.
I am using gradient accumulation to reach an effective batch size of 4096.
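For readers following along, here is a minimal sketch of what gradient accumulation toward a 4096-sample effective batch can look like in PyTorch. The toy model, micro-batch size, and learning rate are illustrative assumptions, not values from this repo or the paper.

```python
# Minimal gradient-accumulation sketch: 16 micro-batches of 256 emulate one
# optimizer step over an effective batch of 4096. The toy model and random
# data stand in for the real CLIP towers and dataloader.
import torch

micro_batch = 256
accum_steps = 4096 // micro_batch                 # 16 micro-batches per optimizer step

model = torch.nn.Linear(512, 512)                 # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

optimizer.zero_grad()
for _ in range(accum_steps):
    x = torch.randn(micro_batch, 512)             # stand-in for one micro-batch of inputs
    loss = model(x).pow(2).mean() / accum_steps   # divide so gradients average over the full 4096
    loss.backward()                               # gradients accumulate across micro-batches
optimizer.step()                                  # one update for the whole effective batch
optimizer.zero_grad()
```

One caveat: for in-batch-negative contrastive losses, plain accumulation only averages gradients across micro-batches; each micro-batch still sees only its own negatives, so this is not identical to a true 4096-sample batch unless embeddings are cached across micro-batches (as in GradCache-style approaches).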
Which code and data did you use to train?
Since there are no training scripts in the repo, I followed the procedure mentioned in the paper - parameters in a similar range - and used the open_clip implementation.
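For reference, a rough sketch (not the authors' script) of a single contrastive training step built on open_clip; the backbone, pretrained tag, and the plain unweighted in-batch loss are assumptions for illustration, and the GCL weighting discussed in this thread is not included here.

```python
# Sketch of a plain in-batch contrastive step using open_clip.
import torch
import torch.nn.functional as F
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")   # illustrative choice of backbone
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def clip_step(images, captions):
    """images: preprocessed (B, 3, H, W) tensor; captions: list of B strings."""
    img_f = F.normalize(model.encode_image(images), dim=-1)
    txt_f = F.normalize(model.encode_text(tokenizer(captions)), dim=-1)
    logits = model.logit_scale.exp() * img_f @ txt_f.t()         # (B, B) similarities
    labels = torch.arange(logits.size(0), device=logits.device)  # matched pairs on the diagonal
    # symmetric cross-entropy over rows (image->text) and columns (text->image)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
```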
If you train on DeepFashion, what ranking score do you use? It is possible for the validation loss to stay high: if the ranking is arbitrary, it doesn't generalize from the training set to the validation set. But the training loss should still converge.
I created a custom ranker using an LLM based on product categories. I just wanted to test on product categories, so I ranked them by their similarity to the text.
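To make that setup concrete, here is a hedged sketch of ranking product categories by text similarity and turning the scores into per-pair relevance weights. A text embedding model stands in for the LLM ranker described above, and the min-max weight mapping is purely an assumption, not this repo's scheme.

```python
# Sketch: rank candidate category names by cosine similarity to a query text
# and map the scores to [0, 1] weights that could feed a weighted contrastive loss.
import torch
import torch.nn.functional as F
import open_clip

model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def rank_categories(query: str, categories: list[str]):
    with torch.no_grad():
        q = F.normalize(model.encode_text(tokenizer([query])), dim=-1)      # (1, D)
        c = F.normalize(model.encode_text(tokenizer(categories)), dim=-1)   # (N, D)
    sims = (q @ c.t()).squeeze(0)                      # cosine similarity per category
    order = sims.argsort(descending=True)              # ranking, best match first
    weights = (sims - sims.min()) / (sims.max() - sims.min() + 1e-8)  # crude [0, 1] relevance weights
    return order, weights
```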
The gap between validation and training loss looks quite normal, though I found it is often possible to get a training loss much lower than 0.7. That is a good question. I implemented something to avoid query collision but haven't found a reliable way to solve this problem yet.
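For readers wondering what handling query collisions can look like, here is a hedged sketch of one common workaround - masking in-batch negatives that share the same query text - which is an assumption about the problem, not the fix actually implemented in this repo.

```python
# Sketch: drop in-batch "negatives" whose query text is identical to the anchor's,
# so duplicated queries are not pushed apart from each other's positives.
import torch

def mask_query_collisions(logits: torch.Tensor, query_ids: torch.Tensor) -> torch.Tensor:
    """logits: (B, B) query-to-document similarities; query_ids: (B,) hash of each query text."""
    same_query = query_ids.unsqueeze(0) == query_ids.unsqueeze(1)            # (B, B) matching queries
    off_diagonal = ~torch.eye(len(query_ids), dtype=torch.bool, device=logits.device)
    collisions = same_query & off_diagonal                                    # duplicates other than the positive
    return logits.masked_fill(collisions, float("-inf"))                      # excluded from the softmax
```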
Understood, thanks for the insight.
Thanks for sharing such awesome work.
Since you have used weighted GCL, I wonder what the loss values actually looked like. Contrastive losses usually have a lot of stability issues, so I was wondering what the train and validation loss curves looked like in your case.
If you can't share that, I would be happy to know how the instability was handled.
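For context on the question, a minimal sketch of what a weighted, GCL-style contrastive loss can look like: per-pair relevance weights scaling a symmetric in-batch InfoNCE term. This is an illustration under that assumption, not the paper's exact formulation.

```python
# Sketch of a relevance-weighted in-batch contrastive loss.
import torch
import torch.nn.functional as F

def weighted_contrastive_loss(query_f, doc_f, weights, logit_scale=100.0):
    """query_f, doc_f: (B, D) L2-normalized embeddings; weights: (B,) relevance in [0, 1]."""
    logits = logit_scale * query_f @ doc_f.t()                         # (B, B) similarities
    labels = torch.arange(logits.size(0), device=logits.device)        # positives on the diagonal
    per_pair = F.cross_entropy(logits, labels, reduction="none")       # query -> document direction
    per_pair = per_pair + F.cross_entropy(logits.t(), labels, reduction="none")  # document -> query direction
    return (weights * per_pair).mean() / 2                             # down-weight low-relevance pairs
```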