
Feat/cross encoder trainer lambdaloss #4

Open · milistu wants to merge 16 commits into feat/cross_encoder_trainer

Conversation


milistu commented on Feb 26, 2025

LambdaLoss Implementation for Cross Encoder Trainer

This PR adds LambdaLoss functionality to the Cross Encoder Trainer feature.

Changes

  • Implemented LambdaLoss as a pairwise loss function
  • Added support for various weighting schemes for LambdaLoss
  • Created an example script demonstrating LambdaLoss usage

Implementation Details

LambdaLoss is a pairwise loss function that can be used for ranking problems. It's particularly useful for information retrieval tasks where the relative order of results is more important than absolute scores.

The implementation includes flexible weighting schemes that allow for different prioritization strategies when training the model.

Reference: "The LambdaLoss Framework for Ranking Metric Optimization" (CIKM 2018): https://marc.najork.org/papers/cikm2018.pdf
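
A minimal usage sketch (hedged: the loss import path, the default weighting scheme, and the query/docs/labels column layout shown here are assumptions and may differ from the final API):

```python
from datasets import Dataset
from sentence_transformers.cross_encoder import CrossEncoder, CrossEncoderTrainer
from sentence_transformers.cross_encoder.losses import LambdaLoss

# Toy listwise dataset: one list of candidate docs (with relevance labels) per query.
# The query/docs/labels column names are assumed from the other listwise losses.
train_dataset = Dataset.from_dict({
    "query": ["how do planes fly", "what is python"],
    "docs": [
        ["Lift is generated by airflow over the wings.", "Bananas are yellow."],
        ["Python is a programming language.", "Snakes are reptiles."],
    ],
    "labels": [[1, 0], [1, 0]],
})

model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)
loss = LambdaLoss(model)  # default weighting scheme; see the schemes added in this PR

trainer = CrossEncoderTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```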

milistu (Author) commented on Mar 8, 2025

Hi @tomaarsen,

I updated LambdaLoss with the changes you made in ListNetLoss.

I've trained the model using the same parameters you used for ListNetLoss, and I'm very pleased with the results. Notably, there's an improvement over ListNetLoss on evaluation, and training for just 1 epoch produced better results than models trained for 20 epochs with the old implementation of this loss.

I would kindly ask you to review these changes and let me know if you're satisfied with them. Additionally, I plan to remove the cache from the example training script once we finalize the implementation.

I'm also considering completely removing this argument:

reduction: Literal["sum", "mean"] = "sum"

When "sum" is set, the loss is in hundreds (approximately ~190), however, when using "mean" I get something more reasonable (~1). What do you think? I am not sure if this argument gives any value, but maybe I am missing something

Here's the model from my test: https://huggingface.co/Studeni/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-lambdaloss

Lastly, I added mini_batch_size as an argument in get_config_dict. In my opinion, it's good to include all the details - if you agree, we could add this to ListNetLoss as well.

milistu (Author) commented on Mar 10, 2025

Hi @tomaarsen,

I was testing an idea: I wanted to see if we could get better results by expanding the MS MARCO dataset. I used mine_hard_negatives to find 9 negative texts for every query-positive pair, based on similarity to the positive text. I concatenated this new dataset with the v1.1 train split and trained the model. I think it's a good example of how to use mine_hard_negatives for this use case. What do you think?

Here is the trained model: https://huggingface.co/Studeni/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-lambdaloss-hard-neg
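
A rough sketch of the mining step (the dataset, model, and parameter choices below are illustrative assumptions, not the exact script; my variant anchors on the positive text rather than the query):

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import mine_hard_negatives

# Placeholder (anchor, positive) pair dataset; swap in the MS MARCO pairs.
pairs = load_dataset("sentence-transformers/natural-questions", split="train")
embedder = SentenceTransformer("all-MiniLM-L6-v2")

hard_dataset = mine_hard_negatives(
    pairs,
    embedder,
    num_negatives=9,   # 9 negatives per query-positive pair, as described above
    range_min=10,      # skip the nearest hits, which are often unlabeled positives
    range_max=100,     # sample negatives from the top-100 retrieved candidates
    use_faiss=True,    # FAISS speeds up the nearest-neighbor search
)
```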

tomaarsen (Owner) commented

> I'm also considering completely removing this argument:
>
> reduction: Literal["sum", "mean"] = "sum"
>
> When "sum" is set, the loss is in the hundreds (~190), whereas "mean" gives something more reasonable (~1). What do you think? I'm not sure this argument adds any value, but maybe I'm missing something.

I think exclusively having mean is fine: otherwise the loss depends heavily on the batch size, and people get the (incorrect) impression that a smaller batch size makes for a better model (it's a smaller loss, after all), when that's not the case.
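
A toy illustration of that batch-size dependence, with made-up per-query loss values:

```python
import torch

# With "sum", the reported loss grows with the number of queries in the batch;
# with "mean", it stays on a comparable scale regardless of batch size.
per_query_loss = torch.rand(64)           # pretend per-query LambdaLoss values
print(per_query_loss[:8].sum().item())    # batch of 8  -> small "sum" loss
print(per_query_loss.sum().item())        # batch of 64 -> roughly 8x larger
print(per_query_loss[:8].mean().item())   # "mean" is comparable across both
print(per_query_loss.mean().item())
```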

> Lastly, I added mini_batch_size as an argument in get_config_dict. In my opinion, it's good to include all the details - if you agree, we could add this to ListNetLoss as well.

I agree, I think this is fine. mini_batch_size does not affect the model performance (only training speed and memory usage), but it's nice for reproducibility.
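
For example, get_config_dict could look roughly like this (a sketch only; the weighting_scheme, k, and sigma attribute names are assumptions about this PR's implementation):

```python
def get_config_dict(self) -> dict:
    # Sketch: record mini_batch_size alongside the other hyperparameters so a
    # saved config fully describes the training setup. Attribute names here
    # are assumed and may differ from the actual code in this PR.
    return {
        "weighting_scheme": self.weighting_scheme.__class__.__name__,
        "k": self.k,
        "sigma": self.sigma,
        "mini_batch_size": self.mini_batch_size,
    }
```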

> I was testing an idea: I wanted to see if we could get better results by expanding the MS MARCO dataset. I used mine_hard_negatives to find 9 negative texts for every query-positive pair, based on similarity to the positive text. I concatenated this new dataset with the v1.1 train split and trained the model. I think it's a good example of how to use mine_hard_negatives for this use case. What do you think?

I like it! Sounds good to me. I'll review this later today and try to get it merged as well!

- Tom Aarsen

tomaarsen (Owner) commented

I made some changes here and there, and trained a few models:

I think this is just about ready to go!

- Tom Aarsen
