
Visualisation of Loss value #7

Open · aretius opened this issue Aug 21, 2024 · 8 comments
@aretius commented Aug 21, 2024

Thanks for sharing such awesome work.

Since you used weighted GCL, I wonder what the loss values actually looked like. Contrastive losses usually come with a lot of stability issues, so I was wondering what the train and validation loss curves looked like in your case.
If you can't share those, I'd be happy to know how the instability was handled.

@alanzty (Collaborator) commented Aug 21, 2024

Thanks for your interest in our work. We didn't do anything extra to stabilize the training process. Did you try using a large batch size? The loss is aggregated over the batch, so a larger batch size lowers the loss variance.
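
For reference, a minimal sketch of what "aggregated over the batch" means here, along the lines of the symmetric in-batch loss open_clip uses (an illustrative reimplementation, not the exact code):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, logit_scale):
    """Symmetric in-batch contrastive loss: every other item in the
    batch serves as a negative for each pair."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = logit_scale * image_emb @ text_emb.t()   # [B, B] similarities
    labels = torch.arange(logits.shape[0], device=logits.device)
    # cross_entropy takes the mean over the batch, so a larger batch
    # averages over more pairs and reduces the variance of the estimate.
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2
```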

@aretius (Author) commented Aug 21, 2024

I am using gradient accumulation to reach a batch size of 4096.
But my train and validation losses are quite high, on the order of ~2. The loss does decrease from 2.7 to 2 after a few epochs, but then it more or less saturates.
In my previous experiments with open_clip I have usually seen good performance when my loss is around 0.2-0.4.
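
(For illustration, the usual accumulation pattern looks like the sketch below; `micro_loader` and `accum_steps` are placeholder names, and `clip_contrastive_loss` is the sketch from the previous comment. One caveat worth noting: with in-batch negatives, accumulating k micro-batches of size B is not equivalent to a true batch of k·B, because each micro-batch only contrasts against its own B−1 negatives.)

```python
# Placeholder setup: `model` returns (image_emb, text_emb, logit_scale),
# `micro_loader` yields micro-batches, accum_steps * micro_batch = 4096.
optimizer.zero_grad()
for step, (images, texts) in enumerate(micro_loader):
    img_emb, txt_emb, logit_scale = model(images, texts)
    loss = clip_contrastive_loss(img_emb, txt_emb, logit_scale) / accum_steps
    loss.backward()                     # gradients accumulate across steps
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```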

@alanzty (Collaborator) commented Aug 21, 2024

Which code and data did you use to train?

@aretius (Author) commented Aug 22, 2024

Since there are no training scripts in the repo, I followed the procedure described in the paper, with parameters in a similar range, and used the open_clip implementation.
I was trying it on one of the fashion datasets, DeepFashion.
If you want, I can share the code here.
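
In the meantime, the setup is roughly along these lines (an illustrative sketch; the model tag, learning rate, and helper names are placeholders rather than the paper's exact configuration):

```python
import torch
import torch.nn.functional as F
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.2)

def train_step(images, captions):
    """One step on preprocessed DeepFashion images and their captions.
    open_clip's forward returns normalized features plus the scaled
    temperature, so the logits can be formed directly."""
    image_emb, text_emb, logit_scale = model(images, tokenizer(captions))
    logits = logit_scale * image_emb @ text_emb.t()
    labels = torch.arange(len(images), device=logits.device)
    loss = (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```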

@alanzty (Collaborator) commented Aug 22, 2024

If you train on DeepFashion, what ranking score do you use? It is possible for the validation loss to stay high: if the ranking is arbitrary, it doesn't generalize from the training set to the validation set. But the training loss should converge.

@aretius (Author) commented Aug 26, 2024

I created a custom ranker using an LLM over product categories; I just wanted to test on product categories and rank them by similarity to the text (a sketch of the loss shape I have in mind follows after these questions).
My training loss keeps decreasing but the validation loss unfortunately plateaus. My train loss goes down to 0.7 while the validation loss keeps hovering around 1.5. The loss starts at ~3.5 for both.

  • Have you observed something similar?
  • Also, since you trained GCL on Google Shopping queries, did you do random negative sampling, or did you ensure that a batch contains only unique queries?
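
(For concreteness, the rank-weighted loss shape I have in mind; this is my reading of weighted GCL rather than necessarily the authors' exact formulation, and `weights` would come from the LLM ranking scores:)

```python
import torch
import torch.nn.functional as F

def weighted_contrastive_loss(query_emb, doc_emb, weights, logit_scale):
    """In-batch contrastive loss where each query->document term is
    scaled by a relevance weight in [0, 1]."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    logits = logit_scale * query_emb @ doc_emb.t()
    labels = torch.arange(logits.shape[0], device=logits.device)
    per_pair = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_pair).mean()   # weights from the LLM ranker
```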

@alanzty (Collaborator) commented Aug 27, 2024

The gap between validation and training loss looks quite normal, though I have found it is often possible to get a training loss much lower than 0.7.

That is a good question. I implemented something to avoid query collisions, but I haven't found a reliable way to solve this problem yet.
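
(For illustration, one simple way to avoid collisions is to bucket the data by query and draw each batch from distinct queries; this is a hypothetical sketch, not the implementation mentioned above:)

```python
import random
from collections import defaultdict

def unique_query_batches(samples, batch_size, seed=0):
    """Yield batches of (query, document) rows such that no query
    appears twice in a batch, so a document is never treated as a
    false negative for its own query."""
    rng = random.Random(seed)
    by_query = defaultdict(list)
    for row in samples:                 # row = {"query": ..., "doc": ...}
        by_query[row["query"]].append(row)
    queries = list(by_query)
    rng.shuffle(queries)
    for i in range(0, len(queries) - batch_size + 1, batch_size):
        yield [rng.choice(by_query[q]) for q in queries[i:i + batch_size]]
```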

@aretius (Author) commented Aug 27, 2024

Understood, thanks for the insight.
So in your experience query collision is not a good thing, right? Ideally I should sample only distinct queries within a batch?
