Add TREC-COVID round 4 baselines (#1308)

lintool authored Jun 28, 2020
1 parent 2af2026 commit 68ace26
Showing 4 changed files with 455 additions and 11 deletions.
3 changes: 3 additions & 0 deletions docs/experiments-cord19.md
@@ -185,6 +185,9 @@ All versions of pre-built indexes:

| Version | Type | Size | Link | Checksum |
|:-----------|:----------|:------|:-----|:---------|
| 2020-06-19 | Abstract | 2.0G | [[Dropbox]](https://www.dropbox.com/s/bj6lx80wwiy5hxf/lucene-index-cord19-abstract-2020-06-19.tar.gz) | `029bd55daba8800fbae2be9e5fcd7b33`
| 2020-06-19 | Full-Text | 3.8G | [[Dropbox]](https://www.dropbox.com/s/vkhhxj8u36rgdu9/lucene-index-cord19-full-text-2020-06-19.tar.gz) | `3d0eb12094a24cff9bcacd1f17c3ea1c`
| 2020-06-19 | Paragraph | 5.5G | [[Dropbox]](https://www.dropbox.com/s/yk6egw6op4jccpi/lucene-index-cord19-paragraph-2020-06-19.tar.gz) | `5cd8cd6998177bed7a3e0057ef8b3595`
| 2020-06-12 | Abstract | 1.9G | [[Dropbox]](https://www.dropbox.com/s/7uy406atbcu7f2l/lucene-index-cord19-abstract-2020-06-12.tar.gz) | `e0d9d312a83d67c21069717957a56f47`
| 2020-06-12 | Full-Text | 3.7G | [[Dropbox]](https://www.dropbox.com/s/glh8n0c3odd6prm/lucene-index-cord19-full-text-2020-06-12.tar.gz) | `72018ee46556cc72d01885203ea386dc`
| 2020-06-12 | Paragraph | 5.3G | [[Dropbox]](https://www.dropbox.com/s/cbjxc89ti4fd218/lucene-index-cord19-paragraph-2020-06-12.tar.gz) | `72732d298885c2c317236af33b08197c`
60 changes: 49 additions & 11 deletions docs/experiments-covid.md
@@ -3,6 +3,31 @@
This document describes various baselines for the [TREC-COVID Challenge](https://ir.nist.gov/covidSubmit/), which uses the [COVID-19 Open Research Dataset (CORD-19)](https://pages.semanticscholar.org/coronavirus-research) from the [Allen Institute for AI](https://allenai.org/).
Here, we focus on running retrieval experiments; for basic instructions on building Anserini indexes, see [this page](experiments-cord19.md).

## Round 4

These runs can be easily replicated with Anserini, using the pre-built indexes available [here](experiments-cord19.md#pre-built-indexes-all-versions) (version of 2020/06/19, the official corpus used in the evaluation).
They were prepared _for_ round 4 (for participants who wish to have a baseline run to rerank), and so effectiveness is computed with the NIST-prepared cumulative qrels from rounds 1, 2, and 3 ([`qrels_covid_d3_j0.5-3.txt`](https://ir.nist.gov/covidSubmit/data/qrels-covid_d3_j0.5-3.txt) provided by NIST, stored in our repo as [`qrels.covid-round3-cumulative.txt`](../src/main/resources/topics-and-qrels/qrels.covid-round3-cumulative.txt)).

| | index | field(s) | nDCG@10 | Judged@10 | Recall@1000 | run file | checksum |
|---:|:----------|:--------------------------------|--------:|----------:|------------:|:---------|----------|
| 1 | abstract | query+question | 0.3143 | 0.4467 | 0.4257 | [[download](https://www.dropbox.com/s/mf79huhxfy96g6i/anserini.covid-r4.abstract.qq.bm25.txt)] | `56ac5a0410e235243ca6e9f0f00eefa1` |
| 2 | abstract | UDel qgen | 0.3260 | 0.4378 | 0.4432 | [[download](https://www.dropbox.com/s/4zau6ejrkvgn9m7/anserini.covid-r4.abstract.qdel.bm25.txt)] | `115d6d2e308b47ffacbc642175095c74` |
| 3 | full-text | query+question | 0.2108 | 0.4044 | 0.3891 | [[download](https://www.dropbox.com/s/bpdopie6gqffv0w/anserini.covid-r4.full-text.qq.bm25.txt)] | `af0d10a5344f4007e6781e8d2959eb54` |
| 4 | full-text | UDel qgen | 0.3499 | 0.5067 | 0.4537 | [[download](https://www.dropbox.com/s/rh0uy71ogbpas0v/anserini.covid-r4.full-text.qdel.bm25.txt)] | `594d469b8f45cf808092a3d8e870eaf5` |
| 5 | paragraph | query+question | 0.3229 | 0.5267 | 0.4863 | [[download](https://www.dropbox.com/s/ifkjm8ff8g2aoh1/anserini.covid-r4.paragraph.qq.bm25.txt)] | `6f468b7b60aaa05fc215d237b5475aec` |
| 6 | paragraph | UDel qgen | 0.4016 | 0.5333 | 0.5050 | [[download](https://www.dropbox.com/s/keuogpx1dzinsgy/anserini.covid-r4.paragraph.qdel.bm25.txt)] | `b7b39629c12573ee0bfed8687dacc743` |
| 7 | - | reciprocal rank fusion(1, 3, 5) | 0.3424 | 0.5289 | 0.5033 | [[download](https://www.dropbox.com/s/zjc0069do0a4gu3/anserini.covid-r4.fusion1.txt)] | `8ae9d1fca05bd1d9bfe7b24d1bdbe270` |
| 8 | - | reciprocal rank fusion(2, 4, 6) | 0.4004 | 0.5400 | 0.5291 | [[download](https://www.dropbox.com/s/qekc9vr3oom777n/anserini.covid-r4.fusion2.txt)] | `e1894209c815c96c6ddd4cacb578261a` |
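
Runs 7 and 8 above are produced by reciprocal rank fusion (RRF) of the individual runs. As a rough illustration of the technique — a minimal sketch, not the exact script used to produce these runs — assuming standard six-column TREC run files and the conventional k = 60:

```
from collections import defaultdict

def load_run(path):
    """Parse a TREC run file into {topic: [docid, ...]}, in file (rank) order."""
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            topic, _q0, docid, _rank, _score, _tag = line.split()
            run[topic].append(docid)
    return run

def reciprocal_rank_fusion(runs, k=60):
    """RRF: score(d) = sum over runs of 1 / (k + rank of d in that run)."""
    fused = defaultdict(lambda: defaultdict(float))
    for run in runs:
        for topic, docids in run.items():
            for rank, docid in enumerate(docids, start=1):
                fused[topic][docid] += 1.0 / (k + rank)
    # For each topic, sort documents by fused score, descending.
    return {topic: sorted(scores, key=scores.get, reverse=True)
            for topic, scores in fused.items()}

# Hypothetical file names, for illustration only:
runs = [load_run(f) for f in ["runs/run1.txt", "runs/run3.txt", "runs/run5.txt"]]
fused = reciprocal_rank_fusion(runs)
```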

**IMPORTANT NOTES!!!**

+ These runs were generated at commit [`b8609a`](https://github.com/castorini/anserini/commit/b8609aa8b640a0322641d823fcb3c169acb2f79a), corresponding to the release of Anserini 0.9.4.
+ The evaluation numbers are produced with the NIST-prepared cumulative qrels from rounds 1, 2, and 3 ([`qrels_covid_d3_j0.5-3.txt`](https://ir.nist.gov/covidSubmit/data/qrels-covid_d3_j0.5-3.txt) provided by NIST, stored in our repo as [`qrels.covid-round3-cumulative.txt`](../src/main/resources/topics-and-qrels/qrels.covid-round3-cumulative.txt)) on the round 4 collection (release of 6/19); a sample evaluation command is shown after this list.
+ For the abstract and full-text indexes, we request up to 10k hits per topic; the number of hits actually retrieved is fairly close to this (slightly fewer because of deduping). For the paragraph index, we request up to 50k hits per topic; because multiple paragraphs are retrieved from the same document, the number of unique documents in each hit list is much smaller.
+ A cautionary note: in our experience, the choice of how many top documents to rerank has a large impact on end-to-end effectiveness. Reranking the top 100 seems to provide higher precision than reranking the top 1000, but the likely tradeoff is lower recall. It is very likely that you _don't_ want to rerank all available hits.
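
For reference, numbers like those in the table can be computed with `trec_eval` against the cumulative qrels. A minimal sketch, assuming a `trec_eval` binary built from Anserini's bundled copy (adjust the path to your checkout; Judged@10 is not a stock `trec_eval` metric and may require Anserini's evaluation scripts):

```
$ eval/trec_eval.9.0.4/trec_eval -c -m ndcg_cut.10 -m recall.1000 \
    src/main/resources/topics-and-qrels/qrels.covid-round3-cumulative.txt \
    runs/anserini.covid-r4.abstract.qq.bm25.txt
```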

Exact commands for replicating these runs are found [further down on this page](experiments-covid.md#round-4-replication-commands).


## Round 3

These are runs that can be easily replicated with Anserini, from pre-built indexes available [here](experiments-cord19.md#pre-built-indexes-all-versions) (version from 2020/05/19, which is the official corpus used in the evaluation).
@@ -11,11 +36,11 @@ They were prepared _for_ round 3 (for participants who wish to have a baseline r
| | index | field(s) | nDCG@10 | Judged@10 | Recall@1000 | run file | checksum |
|---:|:----------|:-------------------------|--------:|----------:|------------:|:---------|----------|
| 1 | abstract | query+question | 0.2118 | 0.3300 | 0.4398 | [[download](https://www.dropbox.com/s/g80cqdxud1l06wq/anserini.covid-r3.abstract.qq.bm25.txt)] | `d08d85c87e30d6c4abf54799806d282f` |
| 2 | abstract | UDel qgen | 0.2470 | 0.3375 | 0.4537 | [[download](https://www.dropbox.com/s/sjcnxq7h0a3j3xz/anserini.covid-r3.abstract.qdel.bm25.txt)] | `d552dff90995cd860a5727637f0be4d1` |
| 3 | full-text | query+question | 0.2337 | 0.4650 | 0.4817 | [[download](https://www.dropbox.com/s/4bjx35sgosu0jz0/anserini.covid-r3.full-text.qq.bm25.txt)] | `6c9f4c09d842b887262ca84d61c61a1f` |
| 4 | full-text | UDel qgen | 0.3430 | 0.5025 | 0.5267 | [[download](https://www.dropbox.com/s/mjt7y1ywae784d0/anserini.covid-r3.full-text.qdel.bm25.txt)] | `c5f9db7733c72eea78ece2ade44d3d35` |
| 5 | paragraph | query+question | 0.2848 | 0.5175 | 0.5527 | [[download](https://www.dropbox.com/s/qwn7jd8vg2chjik/anserini.covid-r3.paragraph.qq.bm25.txt)] | `872673b3e12c661748d8899f24d3ba48` |
| 6 | paragraph | UDel qgen | 0.3604 | 0.5050 | 0.5676 | [[download](https://www.dropbox.com/s/2928i60fj2i09bt/anserini.covid-r3.paragraph.qdel.bm25.txt)] | `c1b966e4c3f387b6810211f339b35852` |
| 7 | - | reciprocal rank fusion(1, 3, 5) | 0.3093 | 0.4975 | 0.5566 | [[download](https://www.dropbox.com/s/6vk5iohqf81iy8b/anserini.covid-r3.fusion1.txt)] | `61cbd73c6e60ba44f18ce967b5b0e5b3` |
| 8 | - | reciprocal rank fusion(2, 4, 6) | 0.3568 | 0.5250 | 0.5769 | [[download](https://www.dropbox.com/s/n09595t1eqymkks/anserini.covid-r3.fusion2.txt)] | `d7eabf3dab840104c88de925e918fdab` |

@@ -65,16 +90,16 @@ And the effectiveness of the above runs:
| `anserini` | `r3.rf` | 0.6883 | 0.9750 | 0.2817

**(Updates 2020/06/23)** We report the effectiveness of all the runs that were prepared _for_ round 3 using the cumulative qrels file (containing qrels from round 1, 2 and 3) mapped to the Round 3 (May 19) release of CORD-19.
This qrels file, provided by NIST as [`qrels_covid_d3_j0.5-3.txt`](https://ir.nist.gov/covidSubmit/data/qrels-covid_d3_j0.5-3.txt), is stored in our repo as [`qrels.covid-round3-cumulative.txt`](../src/main/resources/topics-and-qrels/qrels.covid-round3-cumulative.txt).

| | index | field(s) | nDCG@10 | Judged@10 | Recall@1000 |
|---:|:----------|:-------------------------|--------:|----------:|------------:|
| 1 | abstract | query+question | 0.5781 | 0.8875 | 0.5040 |
| 2 | abstract | UDel qgen | 0.6291 | 0.9300 | 0.5215 |
| 3 | full-text | query+question | 0.3977 | 0.7500 | 0.4708 |
| 4 | full-text | UDel qgen | 0.5790 | 0.9050 | 0.5313 |
| 5 | paragraph | query+question | 0.5396 | 0.9425 | 0.5766 |
| 6 | paragraph | UDel qgen | 0.6327 | 0.9600 | 0.5923 |
| 7 | - | reciprocal rank fusion(1, 3, 5) | 0.5924 | 0.9625 | 0.5956 |
| 8 | - | reciprocal rank fusion(2, 4, 6) | 0.6515 | 0.9875 | 0.6194 |

@@ -173,12 +198,17 @@ Caveats:
+ These runs represent, essentially, testing on training data. Beware of generalization or lack thereof.
+ Beware of unjudged documents.

Exact commands for replicating these runs are found [further down on this page](experiments-covid.md#round-1-replication-commands).

## Round 4: Replication Commands

We have written scripts that make replicating the round 4 baselines easy:

```
$ python src/main/python/trec-covid/download_indexes.py --date 2020-06-19
$ python src/main/python/trec-covid/generate_round4_baselines.py
```
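
Under the hood, these scripts download the pre-built indexes and invoke Anserini's `SearchCollection`. For a sense of what a single baseline run looks like, here is a hand-written sketch of a plausible invocation for run 1 (the exact index path, topics file, and parameters used by the script are assumptions, not verified against it):

```
$ sh target/appassembler/bin/SearchCollection -index lucene-index-cord19-abstract-2020-06-19 \
    -topicreader Covid -topics src/main/resources/topics-and-qrels/topics.covid-round4.xml \
    -topicfield query+question -removedups -bm25 -hits 10000 \
    -output runs/anserini.covid-r4.abstract.qq.bm25.txt -runtag anserini.covid-r4.abstract.qq.bm25.txt
```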

## Round 3: Replication Commands

@@ -283,6 +313,14 @@ python src/main/python/trec-covid/remove_judged_docids.py --qrels src/main/resou
--input runs/anserini.covid-r3.fusion2.txt --output runs/anserini.final-r3.fusion2.txt --runtag r3.fusion2
```

**(Updates 2020/05/26)** We have written scripts that make replicating the round 3 baselines easy:

```
$ python src/main/python/trec-covid/download_indexes.py --date 2020-05-19
$ python src/main/python/trec-covid/generate_round3_baselines.py
```


## Round 2: Replication Commands

Here are the replication commands for the individual runs:
