
Visualisation of Loss value #7

Open · aretius opened this issue Aug 21, 2024 · 8 comments
@aretius commented Aug 21, 2024

Thanks for sharing such awesome work.

Since you used weighted GCL, I wonder what the loss values actually looked like. Contrastive losses usually come with a lot of stability issues, so I was wondering what the train and validation loss curves looked like in your case.
If you can't share those, I'd be happy to know how the instability was handled.

@alanzty (Collaborator) commented Aug 21, 2024

Thanks for your interest in our work. We didn't do anything extra to stabilize the training process. Did you try using a large batch size? The loss is aggregated over the batch, so a larger batch size lowers the loss variance.
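
For reference, a minimal sketch of what "aggregated over the batch" means here, along the lines of the symmetric in-batch loss open_clip uses (an illustrative reimplementation, not the exact code):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, logit_scale):
    """Symmetric in-batch contrastive loss: every other item in the
    batch serves as a negative for each pair."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = logit_scale * image_emb @ text_emb.t()   # [B, B] similarities
    labels = torch.arange(logits.shape[0], device=logits.device)
    # cross_entropy takes the mean over the batch, so a larger batch
    # averages over more pairs and reduces the variance of the estimate.
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2
```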

@aretius (Author) commented Aug 21, 2024

I am using gradient accumulation to reach a batch size of 4096.
But my train and validation losses are quite high, on the order of ~2. The loss does decrease from 2.7 to 2 after a few epochs, but then it more or less saturates.
In my previous experiments with open_clip I have usually seen good performance when my loss is around 0.2-0.4.
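
(For illustration, the usual accumulation pattern looks like the sketch below; `micro_loader` and `accum_steps` are placeholder names, and `clip_contrastive_loss` is the sketch from the previous comment. One caveat worth noting: with in-batch negatives, accumulating k micro-batches of size B is not equivalent to a true batch of k·B, because each micro-batch only contrasts against its own B−1 negatives.)

```python
# Placeholder setup: `model` returns (image_emb, text_emb, logit_scale),
# `micro_loader` yields micro-batches, accum_steps * micro_batch = 4096.
optimizer.zero_grad()
for step, (images, texts) in enumerate(micro_loader):
    img_emb, txt_emb, logit_scale = model(images, texts)
    loss = clip_contrastive_loss(img_emb, txt_emb, logit_scale) / accum_steps
    loss.backward()                     # gradients accumulate across steps
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```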

@alanzty (Collaborator) commented Aug 21, 2024

Which code and data did you use to train?

@aretius (Author) commented Aug 22, 2024

Since there are no training scripts in the repo, I followed the procedure described in the paper, with parameters in a similar range, and used the open_clip implementation.
I was trying it on one of the fashion datasets, DeepFashion.
If you want, I can share the code here.
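
In the meantime, the setup is roughly along these lines (an illustrative sketch; the model tag, learning rate, and helper names are placeholders rather than the paper's exact configuration):

```python
import torch
import torch.nn.functional as F
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.2)

def train_step(images, captions):
    """One step on preprocessed DeepFashion images and their captions.
    open_clip's forward returns normalized features plus the scaled
    temperature, so the logits can be formed directly."""
    image_emb, text_emb, logit_scale = model(images, tokenizer(captions))
    logits = logit_scale * image_emb @ text_emb.t()
    labels = torch.arange(len(images), device=logits.device)
    loss = (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```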

@alanzty (Collaborator) commented Aug 22, 2024

If you train on DeepFashion, what ranking score do you use? It is possible for the validation loss to stay high: if the ranking is arbitrary, it doesn't generalize from the training set to the validation set. But the training loss should converge.

@aretius (Author) commented Aug 26, 2024

I created a custom ranker using an LLM over product categories; I just wanted to test on product categories and rank them by similarity to the text (a sketch of the loss shape I have in mind follows after these questions).
My training loss keeps decreasing but the validation loss unfortunately plateaus. My train loss goes down to 0.7 while the validation loss keeps hovering around 1.5. The loss starts at ~3.5 for both.

  • Have you observed something similar?
  • Also, since you trained GCL on Google Shopping queries, did you do random negative sampling, or did you ensure that a batch contains only unique queries?
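
(For concreteness, the rank-weighted loss shape I have in mind; this is my reading of weighted GCL rather than necessarily the authors' exact formulation, and `weights` would come from the LLM ranking scores:)

```python
import torch
import torch.nn.functional as F

def weighted_contrastive_loss(query_emb, doc_emb, weights, logit_scale):
    """In-batch contrastive loss where each query->document term is
    scaled by a relevance weight in [0, 1]."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    logits = logit_scale * query_emb @ doc_emb.t()
    labels = torch.arange(logits.shape[0], device=logits.device)
    per_pair = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_pair).mean()   # weights from the LLM ranker
```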

@alanzty (Collaborator) commented Aug 27, 2024

The gap between validation and training loss looks quite normal, though I have found it is often possible to get a training loss much lower than 0.7.

That is a good question. I implemented something to avoid query collisions, but I haven't found a reliable way to solve this problem yet.
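
(For illustration, one simple way to avoid collisions is to bucket the data by query and draw each batch from distinct queries; this is a hypothetical sketch, not the implementation mentioned above:)

```python
import random
from collections import defaultdict

def unique_query_batches(samples, batch_size, seed=0):
    """Yield batches of (query, document) rows such that no query
    appears twice in a batch, so a document is never treated as a
    false negative for its own query."""
    rng = random.Random(seed)
    by_query = defaultdict(list)
    for row in samples:                 # row = {"query": ..., "doc": ...}
        by_query[row["query"]].append(row)
    queries = list(by_query)
    rng.shuffle(queries)
    for i in range(0, len(queries) - batch_size + 1, batch_size):
        yield [rng.choice(by_query[q]) for q in queries[i:i + batch_size]]
```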

@aretius (Author) commented Aug 27, 2024

Understood, thanks for the insight.
So in your experience query collision is not a good thing, right? Ideally I should sample only distinct queries within a batch?
