v0.2.2: Margin-MSE loss for training dense retrievers and OpenNLP Meetup
This is a small release update! We made the following changes in this release of the beir package:
1. Train dense retriever (SBERT) models using the Margin-MSE loss
We have added a new loss, Margin-MSE, which learns from (query, positive passage, hard negative passage) triples: the student model is trained to reproduce the teacher's score margin between the positive and the hard negative. Thanks to @kwang2049 for the motivation; the loss is now part of the beir repo. It is most effective in a knowledge-distillation setup with a powerful teacher model. For more details, we suggest the paper by Hofstätter et al., https://arxiv.org/abs/2010.02666. A minimal sketch of the objective is shown after the links below.
Margin-MSE Loss function: https://github.com/UKPLab/beir/blob/main/beir/losses/margin_mse_loss.py
Train a (SOTA) SBERT model using Margin-MSE: https://github.com/UKPLab/beir/blob/main/examples/retrieval/training/train_msmarco_v3_margin_MSE.py
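To make the objective concrete, here is a minimal PyTorch sketch of the Margin-MSE idea. This is not the exact beir implementation (which operates on sentence-embedding features; see the linked loss file for the actual API), just the underlying loss under those assumptions:

```python
import torch
import torch.nn as nn

class MarginMSESketch(nn.Module):
    """Margin-MSE (Hofstätter et al., 2020): train the student so that its
    score margin between a positive and a hard-negative passage matches
    the margin produced by a powerful teacher model."""

    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, student_pos, student_neg, teacher_pos, teacher_neg):
        # Each argument is a 1-D tensor of relevance scores, one per
        # (query, passage) pair in the batch.
        return self.mse(student_pos - student_neg, teacher_pos - teacher_neg)

# Toy usage with random scores. In practice, the student scores come from
# dot products of dense query/passage embeddings and the teacher scores
# from a stronger model.
loss_fn = MarginMSESketch()
s_pos, s_neg = torch.randn(8), torch.randn(8)
t_pos, t_neg = torch.randn(8), torch.randn(8)
loss = loss_fn(s_pos, s_neg, t_pos, t_neg)
```

Because only the margin is matched, the student is free to produce scores on its own scale, as long as it preserves the teacher's preference gap between positives and hard negatives.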
2. Spoke about Neural Search and BEIR at the OpenNLP Meetup on 12.08.2021
I had fun speaking about BEIR and Neural Search at a recent OpenNLP Meetup event on benchmarking search.
If you are interested, the talk was recorded and is available below:
YouTube: https://www.youtube.com/watch?v=e9nNr4ugNAo
Slides: https://drive.google.com/file/d/1gghRVv6nWWmMZRqkYvuCO0HWOTEVnPNz/view?usp=sharing
3. Added splits for each dataset in the datasets table in the README
I plan to add the new, bigger msmarco-v2 version of the passage collection soon; it contains 138,364,198 passages (13.5 GB) and comes with two dev splits (dev1, dev2). Listing the splits in the table makes it possible to incorporate datasets that don't follow the traditional convention of a single train, dev, and test split.
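As a reminder of how splits come into play, here is how a BEIR dataset is loaded with a specific split today, following the download pattern from the beir examples. `GenericDataLoader.load(split=...)` reads the matching `qrels/<split>.tsv`, so a non-standard split name such as dev1 should work the same way once the dataset is available (the msmarco-v2 loading line below is hypothetical until then):

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader

# Download and unzip a BEIR dataset (standard beir API).
dataset = "msmarco"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"
data_path = util.download_and_unzip(url, "datasets")

# Load a standard split; the loader reads qrels/dev.tsv under the hood.
corpus, queries, qrels = GenericDataLoader(data_path).load(split="dev")

# For a dataset with non-standard splits (e.g. msmarco-v2's dev1/dev2),
# the same call would read qrels/dev1.tsv:
# corpus, queries, qrels = GenericDataLoader(data_path_v2).load(split="dev1")
```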