Add adore reproduce experiments (castorini#785)
Add ADORE retrieval stage reproduction
Co-authored-by: Hang Li <cecillll.lee@gmail.com>
ArvinZhuang authored and MXueguang committed Nov 5, 2021
1 parent ed0c6e2 commit b3b2b05
Showing 3 changed files with 115 additions and 2 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -408,6 +408,7 @@ With Pyserini, it's easy to [reproduce](docs/reproducibility.md) runs on a numbe
+ Reproducing [DistilBERT KD experiments](docs/experiments-distilbert_kd.md)
+ Reproducing [DistilBERT Balanced Topic Aware Sampling experiments](docs/experiments-distilbert_tasb.md)
+ Reproducing [SBERT dense retrieval experiments](docs/experiments-sbert.md)
+ Reproducing [ADORE dense retrieval experiments](docs/experiments-adore.md)
+ Reproducing [Vector PRF experiments](docs/experiments-vector-prf.md)

## Baselines
102 changes: 102 additions & 0 deletions docs/experiments-adore.md
@@ -0,0 +1,102 @@
# Pyserini: Reproducing ADORE Results

This guide provides instructions to reproduce the following dense retrieval work:

> Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma. [Optimizing Dense Retrieval Model Training with Hard Negatives](https://arxiv.org/pdf/2104.08051.pdf)

Starting with v0.12.0, you can reproduce these results directly from the [Pyserini PyPI package](https://pypi.org/project/pyserini/).
Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature.
See [package installation notes](../README.md#package-installation) for more details.

Note that we have observed minor differences in scores between different computing environments (e.g., Linux vs. macOS).
However, the differences usually appear in the fifth digit after the decimal point, and do not appear to be a cause for concern from a reproducibility perspective.
Thus, while the scoring script provides results to much higher precision, we have intentionally rounded to four digits after the decimal point.

## MS MARCO Passage

**ADORE retrieval** with brute-force index:

```bash
$ python -m pyserini.dsearch --topics msmarco-passage-dev-subset \
--index msmarco-passage-adore-bf \
--encoded-queries adore-msmarco-passage-dev-subset \
--batch-size 36 \
--threads 12 \
--output runs/run.msmarco-passage.adore.bf.tsv \
--output-format msmarco
```

The option `--encoded-queries` specifies the use of encoded queries (i.e., queries that have already been converted into dense vectors and cached).

Unfortunately, "on-the-fly" query encoding, i.e., converting text queries into dense vectors as part of the dense retrieval process, is not available for this model. The original ADORE implementation is based on an old version of transformers (`transformers==2.8.0`), while Pyserini uses a newer version, under which the base model (`roberta-base`) behaves differently.
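
The same retrieval can also be run from Python. Here is a sketch using the `pyserini.dsearch` API of contemporary Pyserini releases (treat the class and method names as assumptions if your version differs):

```python
from pyserini.dsearch import QueryEncoder, SimpleDenseSearcher

# Load the cached ADORE query vectors; they are keyed by query text.
encoder = QueryEncoder.load_encoded_queries('adore-msmarco-passage-dev-subset')

# Brute-force (flat) Faiss index over the ADORE passage embeddings.
searcher = SimpleDenseSearcher.from_prebuilt_index('msmarco-passage-adore-bf', encoder)

# The query text must match a dev-subset query exactly, since there is
# no on-the-fly encoding for this model.
hits = searcher.search("what is paula deen's brother")
for i, hit in enumerate(hits[:5]):
    print(f'{i + 1:2} {hit.docid:7} {hit.score:.5f}')
```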

To evaluate:

```bash
$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset runs/run.msmarco-passage.adore.bf.tsv
#####################
MRR @10: 0.34661947969254514
QueriesRanked: 6980
#####################
```
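
For reference, the MRR@10 that this script reports can be computed by hand. A minimal sketch, assuming the standard MS MARCO TSV formats (qrels rows `qid 0 docid 1`; run rows `qid docid rank`, with ranks ascending per query):

```python
from collections import defaultdict

def mrr_at_10(qrels_path: str, run_path: str) -> float:
    relevant = defaultdict(set)
    with open(qrels_path) as f:
        for line in f:
            qid, _, docid, _ = line.split()
            relevant[qid].add(docid)

    queries = set()
    best_rank = {}  # qid -> rank of the first relevant passage within the top 10
    with open(run_path) as f:
        for line in f:
            qid, docid, rank = line.split()
            queries.add(qid)
            if int(rank) <= 10 and docid in relevant[qid] and qid not in best_rank:
                best_rank[qid] = int(rank)

    # Queries with no relevant passage in the top 10 contribute 0 to the mean.
    return sum(1.0 / r for r in best_rank.values()) / len(queries)
```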

We can also use the official TREC evaluation tool, `trec_eval`, to compute metrics other than MRR@10.
To do so, we first need to convert the run file to the TREC format:

```bash
$ python -m pyserini.eval.convert_msmarco_run_to_trec_run --input runs/run.msmarco-passage.adore.bf.tsv --output runs/run.msmarco-passage.adore.bf.trec
$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset runs/run.msmarco-passage.adore.bf.trec
map all 0.3523
recall_1000 all 0.9688
```
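
The conversion itself is mechanical: each `qid docid rank` row becomes a six-column TREC row with a pseudo-score. A sketch of the idea (not the actual converter script; the rank-derived score is an assumption):

```python
def msmarco_run_to_trec(msmarco_path: str, trec_path: str, tag: str = 'adore') -> None:
    # MS MARCO: qid <tab> docid <tab> rank  ->  TREC: qid Q0 docid rank score tag
    with open(msmarco_path) as fin, open(trec_path, 'w') as fout:
        for line in fin:
            qid, docid, rank = line.split()
            # Use a rank-derived pseudo-score so higher-ranked passages sort first.
            score = 1.0 / int(rank)
            fout.write(f'{qid} Q0 {docid} {rank} {score} {tag}\n')
```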

## TREC DL2019 Passage

**ADORE retrieval** with brute-force index:

```bash
$ python -m pyserini.dsearch --topics dl19-passage \
--index msmarco-passage-adore-bf \
--encoded-queries adore-dl19-passage \
--batch-size 36 \
--threads 12 \
--output runs/run.dl19-passage.adore.bf.trec
```

As above, "on-the-fly" query encoding is not available for this model.

To evaluate:

```bash
$ python -m pyserini.eval.trec_eval -c -m map -m ndcg_cut.10 -m recall.1000 -l 2 dl19-passage runs/run.dl19-passage.adore.bf.trec
map all 0.4188
recall_1000 all 0.7759
ndcg_cut_10 all 0.6832
```

## TREC DL2020 Passage

**ADORE retrieval** with brute-force index:

```bash
$ python -m pyserini.dsearch --topics dl20 \
--index msmarco-passage-adore-bf \
--encoded-queries adore-dl20-passage \
--batch-size 36 \
--threads 12 \
--output runs/run.dl20-passage.adore.bf.trec
```

As above, "on-the-fly" query encoding is not available for this model.

To evaluate:

```bash
$ python -m pyserini.eval.trec_eval -c -m map -m ndcg_cut.10 -m recall.1000 -l 2 dl20-passage runs/run.dl20-passage.adore.bf.trec
map all 0.4418
recall_1000 all 0.8151
ndcg_cut_10 all 0.6655
```

## Reproduction Log[*](reproducibility.md)

14 changes: 12 additions & 2 deletions docs/experiments-vector-prf.md
@@ -40,6 +40,10 @@ Here's how our results stack up against all available models and datasets in Pys
| SBERT | Original | 0.4060 | 0.5985 | 0.7872 |
| SBERT | Average PRF 3 | 0.4354 | 0.6149 | 0.7937 |
| SBERT | Rocchio PRF 5 A0.4 B0.6 | 0.4371 | 0.6149 | 0.7941 |
| ADORE | Original | 0.4188 | 0.5946 | 0.7759 |
| ADORE | Average PRF 3 | 0.4672 | 0.6263 | 0.7890 |
| ADORE | Rocchio PRF 5 A0.4 B0.6 | 0.4629 | 0.6325 | 0.7950 |


#### TREC DL 2020 Passage

@@ -63,6 +67,9 @@ Here's how our results stack up against all available models and datasets in Pys
| SBERT | Original | 0.4124 | 0.5734 | 0.7937 |
| SBERT | Average PRF 3 | 0.4258 | 0.5781 | 0.8169 |
| SBERT | Rocchio PRF 5 A0.4 B0.6 | 0.4342 | 0.5851 | 0.8226 |
| ADORE | Original | 0.4418 | 0.5949 | 0.8151 |
| ADORE | Average PRF 3 | 0.4706 | 0.6176 | 0.8323 |
| ADORE | Rocchio PRF 5 A0.4 B0.6 | 0.4760 | 0.6193 | 0.8251 |

#### MS MARCO Passage V1

@@ -87,7 +94,10 @@ The PRF does not perform well with sparse judgements like in MS MARCO, the resul
| DistilBERT Balanced | Rocchio PRF 5 A0.4 B0.6 | 0.2969 | 0.4178 | 0.9702 |
| SBERT | Original | 0.3373 | 0.4453 | 0.9558 |
| SBERT | Average PRF 3 | 0.3094 | 0.4183 | 0.9446 |
| SBERT | Rocchio PRF 5 A0.4 B0.6 | 0.3034 | 0.4157 | 0.9529 |
| ADORE | Original | 0.3523 | 0.4637 | 0.9688 |
| ADORE | Average PRF 3 | 0.3188 | 0.4330 | 0.9583 |
| ADORE | Rocchio PRF 5 A0.4 B0.6 | 0.3209 | 0.4376 | 0.9669 |
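
Both PRF variants above operate directly on the dense vectors: Average PRF k replaces the query vector with the mean of the query and top-k passage vectors, while Rocchio PRF weights the original query by alpha and the mean feedback vector by beta. A minimal numpy sketch of that arithmetic (illustrative only, not the code path used in these experiments):

```python
import numpy as np

def average_prf(query_vec: np.ndarray, feedback_vecs: np.ndarray) -> np.ndarray:
    # New query = mean of the original query vector and the top-k passage vectors.
    return np.mean(np.vstack([query_vec[np.newaxis, :], feedback_vecs]), axis=0)

def rocchio_prf(query_vec: np.ndarray, feedback_vecs: np.ndarray,
                alpha: float = 0.4, beta: float = 0.6) -> np.ndarray:
    # New query = alpha * original query + beta * mean of the top-k passage
    # vectors; "Rocchio PRF 5 A0.4 B0.6" corresponds to k=5, alpha=0.4, beta=0.6.
    return alpha * query_vec + beta * np.mean(feedback_vecs, axis=0)
```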

## Reproducing Results

@@ -145,7 +155,7 @@ _Note: TREC DL 2019, TREC DL 2020, and MS MARCO Passage V1 use the same passage

_Note: If you have pre-computed queries available, the `--encoder` can be replaced with `--encoded-queries` to avoid "on-the-fly" query encoding by passing in the path to your pre-computed query file.
For example, Pyserini has the ANCE pre-computed query available for MS MARCO Passage V1, so instead of using `--encoder castorini/ance-msmarco-passage`,
one can use `--encoded-queries ance-msmarco-passage-dev-subset`. For the ADORE model, only `--encoded-queries` can be used; on-the-fly encoding is not available._

With these parameters, one can easily reproduce the results above; for example, `TREC DL 2019 Passage with ANCE Average Vector PRF 3` is obtained by running the corresponding retrieval command with the Average PRF parameters set accordingly.
