I found Contriever quite interesting based on Table 3 of the paper (few-shot retrieval): Contriever-MSMarco achieves a score of 38.1 when finetuned on FiQA, which is much higher than BERT-MSMarco at ~31. The difference is even bigger when comparing contriever and BERT (the checkpoints that were not first finetuned on MS MARCO), with roughly a 10-point improvement:
I’ve tried a similar setup (along the lines of DPR), with the following differences (a minimal training sketch follows the list):
- Trained for 20 epochs instead of 500
- AdamW instead of ASAM
- Included BM25 hard negatives (i.e. top results that are not a gold label) in addition to in-batch negative sampling
- Batch size of 128 instead of 256 (though the number of negatives should be the same due to the hard negatives)
- Instead of early stopping, I just trained for 20 epochs and saved the checkpoint from the epoch with the best dev NDCG@10
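For reference, this is roughly the training step I have in mind. It is only a minimal sketch, assuming a shared mean-pooled HuggingFace encoder and one BM25 hard negative per query; the function names and data plumbing are placeholders, not code from this repo:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Assumed setup: dual encoder with shared weights, mean pooling, trained with
# in-batch negatives plus BM25 hard negatives (DPR-style), AdamW optimizer.
tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
encoder = AutoModel.from_pretrained("facebook/contriever")
optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-5)

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)          # mean pooling

def train_step(queries, positives, bm25_negatives):
    # Candidate pool = gold passages + BM25 hard negatives; every passage in
    # the pool other than the query's own gold acts as a negative.
    q = embed(queries)                                    # (B, H)
    p = embed(positives + bm25_negatives)                 # (B + N, H)
    scores = q @ p.T                                      # (B, B + N)
    labels = torch.arange(len(queries))                   # gold for query i is at index i
    loss = F.cross_entropy(scores, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```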
It seems that under those settings, the improvement isn't as high as the difference reported in the paper:
| split | epoch | metric | model_name | learning_rate | k=1 | k=3 | k=5 | k=10 | k=100 | k=1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| test | 7 | ndcg | facebook/contriever-msmarco | 1e-05 | 0.24383 | 0.25005 | 0.2608 | 0.28715 | 0.36118 | 0.39975 |
| test | 16 | ndcg | facebook/contriever | 3e-05 | 0.25 | 0.23583 | 0.24952 | 0.2732 | 0.35149 | 0.39019 |
| test | 12 | ndcg | roberta-base | 5e-05 | 0.25309 | 0.22701 | 0.24416 | 0.26293 | 0.33809 | 0.37927 |
| test | 16 | ndcg | bert-base-uncased | 2e-05 | 0.21451 | 0.20465 | 0.21947 | 0.23826 | 0.31088 | 0.35118 |
Note the NDCG@10 of the contriever model is 3.49 points higher than that of bert-base-uncased (I tried learning rates between 1e-5 and 5e-5), which is smaller than the 10.3-point improvement shown in the screenshot (26.1 -> 36.4). I am not surprised that the results themselves are lower given the differences in hyperparameters, but the delta in improvements surprises me. Is contriever harder to finetune when using the Adam optimizer? Or are we expected to use a batch size of 256 and/or avoid hard negatives from BM25?
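For concreteness, these are the two deltas I am comparing, taking the NDCG@10 column from the table above and the rounded numbers from the paper:

```python
# NDCG@10 improvement of contriever over bert-base-uncased
mine = 0.2732 - 0.23826    # ≈ 0.0349, i.e. 3.49 points (my table above)
paper = 0.364 - 0.261      # ≈ 0.103, i.e. 10.3 points (paper's Table 3)
print(mine, paper)
```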
Is it possible to either:
- provide the huggingface checkpoints of contriever and contriever-msmarco finetuned on FiQA, or
- share scripts that let me reproduce the process of finetuning contriever or contriever-msmarco on FiQA and saving the checkpoint as a huggingface model (see the sketch below)
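For clarity, by "saving the checkpoint as a huggingface model" I mean something like the following. This is just a sketch: the output directory is made up, and in practice the weights would come from the finetuned encoder rather than the released checkpoint:

```python
from transformers import AutoModel, AutoTokenizer

# Sketch of the export step: save the (finetuned) encoder and tokenizer so they
# can be reloaded with from_pretrained, like the released contriever checkpoints.
model = AutoModel.from_pretrained("facebook/contriever-msmarco")      # placeholder: finetuned weights in practice
tokenizer = AutoTokenizer.from_pretrained("facebook/contriever-msmarco")
model.save_pretrained("contriever-msmarco-fiqa")                      # made-up output directory
tokenizer.save_pretrained("contriever-msmarco-fiqa")
```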
Thank you!