Add regression for the TREC background linking 2020 (#1493)
+ Add topics
+ Add qrels
+ Add regression file
+ Update 2019 regression file (it refers to the wrong qrel and topic files)
+ Generate docs according to these changes
chriskamphuis authored Mar 30, 2021
1 parent 0867efd commit c75c63b
Showing 7 changed files with 18,257 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/regressions-backgroundlinking19.md
@@ -27,7 +27,7 @@ For additional details, see explanation of [common indexing options](common-inde
Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/), downloaded from NIST:

+ [`topics.backgroundlinking19.txt`](../src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt): topics for the background linking task of the TREC 2019 News Track
+ [`qrels.backgroundlinking18.txt`](../src/main/resources/topics-and-qrels/qrels.backgroundlinking18.txt): qrels for the background linking task of the TREC 2019 News Track
+ [`qrels.backgroundlinking19.txt`](../src/main/resources/topics-and-qrels/qrels.backgroundlinking19.txt): qrels for the background linking task of the TREC 2019 News Track

After indexing has completed, you should be able to perform retrieval as follows:

73 changes: 73 additions & 0 deletions docs/regressions-backgroundlinking20.md
@@ -0,0 +1,73 @@
# Anserini: Regressions for [TREC 2020 Background Linking](http://trec-news.org/)

This page describes regressions for the background linking task in the [TREC 2020 News Track](http://trec-news.org/).
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking20.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking20.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

## Indexing

Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection WashingtonPostCollection \
-input /path/to/backgroundlinking20 \
-index indexes/lucene-index.core18-v3.pos+docvectors+raw \
-generator WashingtonPostGenerator \
-threads 1 -storePositions -storeDocvectors -storeRaw \
>& logs/log.backgroundlinking20 &
```

The directory `/path/to/backgroundlinking20/` (the `-input` argument above) should be the root directory of the [TREC Washington Post Corpus *v3*](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/backgroundlinking20/`
should bring up a single JSON file.
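The `ls` check above can be scripted before starting a long indexing job. A minimal sketch, assuming hidden files (names starting with `.`) should be ignored; `check_corpus_root` is a hypothetical helper, not part of Anserini, and it only checks that exactly one file is present, not that the file is valid JSON:

```python
from pathlib import Path


def check_corpus_root(root: str) -> Path:
    """Return the single visible file in `root`, raising if there is not exactly one.

    `root` is whatever directory you pass to IndexCollection's `-input` flag.
    Hidden entries (names starting with '.') are skipped: an assumption of this sketch.
    """
    entries = [p for p in Path(root).iterdir() if not p.name.startswith(".")]
    if len(entries) != 1 or not entries[0].is_file():
        names = sorted(p.name for p in entries)
        raise ValueError(f"expected exactly one file in {root}, found {names}")
    return entries[0]
```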

For additional details, see the explanation of [common indexing options](common-indexing-options.md).

## Retrieval

Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/), downloaded from NIST:

+ [`topics.backgroundlinking20.txt`](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt): topics for the background linking task of the TREC 2020 News Track
+ [`qrels.backgroundlinking20.txt`](../src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt): qrels for the background linking task of the TREC 2020 News Track

After indexing has completed, you should be able to perform retrieval as follows:

```
nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18-v3.pos+docvectors+raw \
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt \
-output runs/run.backgroundlinking20.bm25.topics.backgroundlinking20.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 &
nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18-v3.pos+docvectors+raw \
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt \
-output runs/run.backgroundlinking20.bm25+rm3.topics.backgroundlinking20.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18-v3.pos+docvectors+raw \
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt \
-output runs/run.backgroundlinking20.bm25+rm3+df.topics.backgroundlinking20.txt \
-backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
```
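The three runs differ only in their model flags and the output file name. A minimal sketch, with hypothetical helper names, that regenerates the commands above from the per-model `params` in `backgroundlinking20.yaml` (included further down in this commit) and the `runs/run.<name>.<model>.<topics>` naming visible in the `-output` paths. It omits `nohup` and `&`, and it is not Anserini's own command generator:

```python
# Hypothetical helper: rebuilds the SearchCollection invocations from the
# per-model flags in backgroundlinking20.yaml. Not Anserini's own generator.
SEARCH = "target/appassembler/bin/SearchCollection"
INDEX = "indexes/lucene-index.core18-v3.pos+docvectors+raw"
TOPIC_ROOT = "src/main/resources/topics-and-qrels/"
TOPICS = "topics.backgroundlinking20.txt"

# Model name -> flags, copied from the `params` entries of the YAML.
MODELS = {
    "bm25": "-backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100",
    "bm25+rm3": "-backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100",
    "bm25+rm3+df": "-backgroundlinking -backgroundlinking.datefilter "
                   "-backgroundlinking.k 100 -bm25 -rm3 -hits 100",
}


def run_path(name: str, model: str, topics: str) -> str:
    # Run files follow runs/run.<regression>.<model>.<topics file>.
    return f"runs/run.{name}.{model}.{topics}"


def search_command(name: str, model: str) -> str:
    # Only the -output path and the trailing model flags vary between runs.
    return " ".join([
        SEARCH, "-index", INDEX,
        "-topicreader", "BackgroundLinking",
        "-topics", TOPIC_ROOT + TOPICS,
        "-output", run_path(name, model, TOPICS),
        MODELS[model],
    ])


for model in MODELS:
    print(search_command("backgroundlinking20", model))
```

Keeping the flags in one table makes it easy to see that `+RM3+DF` adds only `-backgroundlinking.datefilter` on top of `+RM3`.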

Evaluation can be performed using `trec_eval`:

```
tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.backgroundlinking20.bm25.topics.backgroundlinking20.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.backgroundlinking20.bm25+rm3.topics.backgroundlinking20.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.backgroundlinking20.bm25+rm3+df.topics.backgroundlinking20.txt
```
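Each `trec_eval` call prints one line per measure in the layout `measure<TAB>query-id<TAB>score`, with `all` as the query id for the aggregate score; this appears to be what the `separator: "\t"` and `parse_index: 2` fields of the YAML describe. A sketch of that parsing, where `parse_trec_eval` is a hypothetical helper and the sample lines reuse the BM25 figures from the table below:

```python
def parse_trec_eval(output: str, qid: str = "all") -> dict[str, float]:
    """Map measure name -> score for one query id from trec_eval output.

    Assumes the layout `measure<TAB>qid<TAB>value`, so the score is the
    field at index 2 after splitting on a tab.
    """
    scores = {}
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        measure, line_qid, value = (f.strip() for f in fields)
        if line_qid == qid:
            scores[measure] = float(value)
    return scores


# Illustrative output in trec_eval's padded layout.
sample = "ndcg_cut_5            \tall\t0.5231\nmap                   \tall\t0.3286\n"
print(parse_trec_eval(sample))
```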

## Effectiveness

With the above commands, you should be able to replicate the following results:

nDCG@5 | BM25 | +RM3 | +RM3+DF |
:---------------------------------------|-----------|-----------|-----------|
[TREC 2020 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt)| 0.5231 | 0.5673 | 0.5279 |


AP | BM25 | +RM3 | +RM3+DF |
:---------------------------------------|-----------|-----------|-----------|
[TREC 2020 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt)| 0.3286 | 0.4504 | 0.3421 |
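The tables report nDCG@5 and AP. For reference, a sketch of both measures for a single topic using common definitions: for nDCG@k, graded gains taken directly from the qrels with a 1/log2(rank + 1) discount; for AP, any positive grade counted as relevant and the sum normalised by the number of relevant documents in the qrels. `trec_eval` additionally re-sorts each run by score with its own tie-breaking and averages over topics, which this sketch does not model, so treat it as illustrative rather than a drop-in replacement:

```python
import math


def ndcg_at_k(ranking: list[str], qrels: dict[str, int], k: int = 5) -> float:
    """nDCG@k with gains taken from the qrels grades and a 1/log2(rank + 1)
    discount, where rank counts from 1 (hence log2(i + 2) for 0-based i)."""
    dcg = sum(qrels.get(doc, 0) / math.log2(i + 2)
              for i, doc in enumerate(ranking[:k]))
    # Ideal DCG: the k highest positive grades in the qrels, best first.
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def average_precision(ranking: list[str], qrels: dict[str, int]) -> float:
    """AP with any positive grade counted as relevant, normalised by the
    number of relevant documents in the qrels (retrieved or not)."""
    num_rel = sum(1 for g in qrels.values() if g > 0)
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if qrels.get(doc, 0) > 0:
            hits += 1
            total += hits / i  # precision at this relevant document
    return total / num_rel if num_rel else 0.0


qrels = {"a": 4, "b": 2, "c": 0, "d": 8}  # toy judgments, not real data
ranking = ["a", "c", "d", "e"]
print(ndcg_at_k(ranking, qrels), average_precision(ranking, qrels))
```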

@@ -22,7 +22,7 @@ For additional details, see explanation of [common indexing options](common-inde
Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/), downloaded from NIST:

+ [`topics.backgroundlinking19.txt`](../src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt): topics for the background linking task of the TREC 2019 News Track
+ [`qrels.backgroundlinking18.txt`](../src/main/resources/topics-and-qrels/qrels.backgroundlinking18.txt): qrels for the background linking task of the TREC 2019 News Track
+ [`qrels.backgroundlinking19.txt`](../src/main/resources/topics-and-qrels/qrels.backgroundlinking19.txt): qrels for the background linking task of the TREC 2019 News Track

After indexing has completed, you should be able to perform retrieval as follows:

44 changes: 44 additions & 0 deletions src/main/resources/docgen/templates/backgroundlinking20.template
@@ -0,0 +1,44 @@
# Anserini: Regressions for [TREC 2020 Background Linking](http://trec-news.org/)

This page describes regressions for the background linking task in the [TREC 2020 News Track](http://trec-news.org/).
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking20.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking20.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

## Indexing

Typical indexing command:

```
${index_cmds}
```

The directory `/path/to/backgroundlinking20/` (the `-input` argument above) should be the root directory of the [TREC Washington Post Corpus *v3*](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/backgroundlinking20/`
should bring up a single JSON file.

For additional details, see the explanation of [common indexing options](common-indexing-options.md).

## Retrieval

Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/), downloaded from NIST:

+ [`topics.backgroundlinking20.txt`](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt): topics for the background linking task of the TREC 2020 News Track
+ [`qrels.backgroundlinking20.txt`](../src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt): qrels for the background linking task of the TREC 2020 News Track

After indexing has completed, you should be able to perform retrieval as follows:

```
${ranking_cmds}
```

Evaluation can be performed using `trec_eval`:

```
${eval_cmds}
```

## Effectiveness

With the above commands, you should be able to replicate the following results:

${effectiveness}
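The `${...}` placeholders in this template use the same syntax as Python's `string.Template`. A sketch of filling them under that assumption; the template text is trimmed and the values are short stand-ins, so this is not necessarily how Anserini's doc generator works:

```python
from string import Template

# Trimmed stand-in for backgroundlinking20.template; only the placeholder
# names (index_cmds, effectiveness) come from the real template.
template = Template(
    "## Indexing\n\nTypical indexing command:\n\n${index_cmds}\n\n"
    "## Effectiveness\n\n${effectiveness}\n"
)

# Stand-in values; the regression pipeline would supply the full commands
# and the generated results tables.
page = template.substitute(
    index_cmds="nohup sh target/appassembler/bin/IndexCollection -collection WashingtonPostCollection",
    effectiveness="AP | BM25 | +RM3 | +RM3+DF |",
)
print(page)
```

`substitute` raises `KeyError` when a placeholder is left unfilled, which surfaces template typos early; `safe_substitute` would silently leave the `${...}` text in the generated page.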

75 changes: 75 additions & 0 deletions src/main/resources/regression/backgroundlinking20.yaml
@@ -0,0 +1,75 @@
---
name: backgroundlinking20
index_command: target/appassembler/bin/IndexCollection
index_utils_command: target/appassembler/bin/IndexReaderUtils
search_command: target/appassembler/bin/SearchCollection
topic_root: src/main/resources/topics-and-qrels/
qrels_root: src/main/resources/topics-and-qrels/
ranking_root:
generator: WashingtonPostGenerator
threads: 1
index_options:
  - -storePositions
  - -storeDocvectors
  - -storeRaw
topic_reader: BackgroundLinking
input_roots:
  - /tuna1/ # on tuna
  - /store/ # on orca
  - /scratch2/ # on damiano
input: collections/newswire/WashingtonPost.v3/data/
index_path: indexes/lucene-index.core18-v3.pos+docvectors+raw
collection: WashingtonPostCollection
index_stats:
  documents: 671945
  documents (non-empty): 671945
  total terms: 366108299
topics:
  - name: "[TREC 2020 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt)"
    path: topics.backgroundlinking20.txt
    qrel: qrels.backgroundlinking20.txt
evals:
  - command: tools/eval/trec_eval.9.0.4/trec_eval
    params:
      - -c -M1000 -m ndcg_cut.5
    separator: "\t"
    parse_index: 2
    metric: nDCG@5
    metric_precision: 4
    can_combine: true
  - command: tools/eval/trec_eval.9.0.4/trec_eval
    params:
      - -c -M1000 -m map
    separator: "\t"
    parse_index: 2
    metric: AP
    metric_precision: 4
    can_combine: true
models:
  - name: bm25
    display: BM25
    params:
      - -backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100
    results:
      AP:
        - 0.3286
      nDCG@5:
        - 0.5231
  - name: bm25+rm3
    display: +RM3
    params:
      - -backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100
    results:
      AP:
        - 0.4504
      nDCG@5:
        - 0.5673
  - name: bm25+rm3+df
    display: +RM3+DF
    params:
      - -backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100
    results:
      AP:
        - 0.3421
      nDCG@5:
        - 0.5279
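The `results` entries together with `metric_precision: 4` suggest a regression check of the form: round each fresh score to four decimals and compare it with the recorded value. A minimal sketch under that assumption, keyed by `trec_eval` measure names (`map`, `ndcg_cut_5`) rather than the YAML display labels; `check_scores` is hypothetical and not Anserini's regression script:

```python
METRIC_PRECISION = 4  # mirrors `metric_precision` in the YAML

# Expected scores copied from the `results` entries above, keyed by the
# measure names trec_eval prints (an assumption of this sketch).
EXPECTED = {
    "bm25":        {"map": 0.3286, "ndcg_cut_5": 0.5231},
    "bm25+rm3":    {"map": 0.4504, "ndcg_cut_5": 0.5673},
    "bm25+rm3+df": {"map": 0.3421, "ndcg_cut_5": 0.5279},
}


def check_scores(model: str, actual: dict[str, float]) -> list[str]:
    """Describe every metric whose rounded score differs from the expected one.

    A missing metric is treated as NaN, which never compares equal, so it is
    reported as a failure too.
    """
    failures = []
    for metric, expected in EXPECTED[model].items():
        got = round(actual.get(metric, float("nan")), METRIC_PRECISION)
        if got != expected:
            failures.append(f"{model} {metric}: expected {expected}, got {got}")
    return failures


print(check_scores("bm25", {"map": 0.32861, "ndcg_cut_5": 0.52309}))
```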