Add regression for the TREC background linking 2020 (#1493)
+ Add topics
+ Add qrels
+ Add regression file
+ Update 2019 regression file (it refers to the wrong qrel and topic files)
+ Generate docs according to these changes
1 parent
0867efd
commit c75c63b
Showing 7 changed files with 18,257 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
# Anserini: Regressions for [TREC 2020 Background Linking](http://trec-news.org/) | ||
|
||
This page describes regressions for the background linking task in the [TREC 2020 News Track](http://trec-news.org/). | ||
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking20.yaml). | ||
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking20.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. | ||
|
||
## Indexing | ||
|
||
Typical indexing command: | ||
|
||
``` | ||
nohup sh target/appassembler/bin/IndexCollection -collection WashingtonPostCollection \ | ||
-input /path/to/backgroundlinking20 \ | ||
-index indexes/lucene-index.core18-v3.pos+docvectors+raw \ | ||
-generator WashingtonPostGenerator \ | ||
-threads 1 -storePositions -storeDocvectors -storeRaw \ | ||
>& logs/log.backgroundlinking20 & | ||
``` | ||
|
||
The directory `/path/to/backgroundlinking20/` should be the root directory of the [TREC Washington Post Corpus *v3*](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/backgroundlinking20/` | ||
should list a single JSON file. | ||
|
||
For additional details, see the explanation of [common indexing options](common-indexing-options.md). | ||
|
||
## Retrieval | ||
|
||
Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/), downloaded from NIST: | ||
|
||
+ [`topics.backgroundlinking20.txt`](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt): topics for the background linking task of the TREC 2020 News Track | ||
+ [`qrels.backgroundlinking20.txt`](../src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt): qrels for the background linking task of the TREC 2020 News Track | ||
|
||
After indexing has completed, you should be able to perform retrieval as follows: | ||
|
||
``` | ||
nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18-v3.pos+docvectors+raw \ | ||
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt \ | ||
-output runs/run.backgroundlinking20.bm25.topics.backgroundlinking20.txt \ | ||
-backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 & | ||
nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18-v3.pos+docvectors+raw \ | ||
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt \ | ||
-output runs/run.backgroundlinking20.bm25+rm3.topics.backgroundlinking20.txt \ | ||
-backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 & | ||
nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18-v3.pos+docvectors+raw \ | ||
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt \ | ||
-output runs/run.backgroundlinking20.bm25+rm3+df.topics.backgroundlinking20.txt \ | ||
-backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 & | ||
``` | ||
|
||
Evaluation can be performed using `trec_eval`: | ||
|
||
``` | ||
tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.backgroundlinking20.bm25.topics.backgroundlinking20.txt | ||
tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.backgroundlinking20.bm25+rm3.topics.backgroundlinking20.txt | ||
tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m ndcg_cut.5 -c -M1000 -m map src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt runs/run.backgroundlinking20.bm25+rm3+df.topics.backgroundlinking20.txt | ||
``` | ||
|
||
## Effectiveness | ||
|
||
With the above commands, you should be able to replicate the following results: | ||
|
||
NDCG@5 | BM25 | +RM3 | +RM3+DF | | ||
:---------------------------------------|-----------|-----------|-----------| | ||
[TREC 2020 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt)| 0.5231 | 0.5673 | 0.5279 | | ||
|
||
|
||
AP | BM25 | +RM3 | +RM3+DF | | ||
:---------------------------------------|-----------|-----------|-----------| | ||
[TREC 2020 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt)| 0.3286 | 0.4504 | 0.3421 | | ||
|
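The three ranking commands above differ only in the model name embedded in the run file and in the model-specific flags. As a rough illustration (this is not Anserini's own regression driver), the pattern could be reproduced in Python as follows. `run_file` and `search_command` are hypothetical helper names; all paths and flags are copied from the commands shown above.

```python
import shlex
from typing import List

# Values copied from the retrieval commands shown above.
SEARCH_COMMAND = "target/appassembler/bin/SearchCollection"
INDEX_PATH = "indexes/lucene-index.core18-v3.pos+docvectors+raw"
TOPIC_ROOT = "src/main/resources/topics-and-qrels/"
TOPIC_READER = "BackgroundLinking"


def run_file(regression: str, model: str, topics: str) -> str:
    """Return the run file path, following the naming pattern seen above."""
    return f"runs/run.{regression}.{model}.{topics}"


def search_command(regression: str, model: str, params: str, topics: str) -> List[str]:
    """Assemble one SearchCollection invocation as an argument list (hypothetical helper)."""
    return [
        SEARCH_COMMAND,
        "-index", INDEX_PATH,
        "-topicreader", TOPIC_READER,
        "-topics", TOPIC_ROOT + topics,
        "-output", run_file(regression, model, topics),
        *shlex.split(params),  # model-specific flags, split like a shell would
    ]


cmd = search_command(
    "backgroundlinking20",
    "bm25+rm3",
    "-backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100",
    "topics.backgroundlinking20.txt",
)
print(" ".join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if the list is later passed to `subprocess.run`.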
src/main/resources/docgen/templates/backgroundlinking20.template (44 additions, 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# Anserini: Regressions for [TREC 2020 Background Linking](http://trec-news.org/) | ||
|
||
This page describes regressions for the background linking task in the [TREC 2020 News Track](http://trec-news.org/). | ||
The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/backgroundlinking20.yaml). | ||
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/backgroundlinking20.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. | ||
|
||
## Indexing | ||
|
||
Typical indexing command: | ||
|
||
``` | ||
${index_cmds} | ||
``` | ||
|
||
The directory `/path/to/backgroundlinking20/` should be the root directory of the [TREC Washington Post Corpus *v3*](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/backgroundlinking20/` | ||
should list a single JSON file. | ||
|
||
For additional details, see the explanation of [common indexing options](common-indexing-options.md). | ||
|
||
## Retrieval | ||
|
||
Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/), downloaded from NIST: | ||
|
||
+ [`topics.backgroundlinking20.txt`](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt): topics for the background linking task of the TREC 2020 News Track | ||
+ [`qrels.backgroundlinking20.txt`](../src/main/resources/topics-and-qrels/qrels.backgroundlinking20.txt): qrels for the background linking task of the TREC 2020 News Track | ||
|
||
After indexing has completed, you should be able to perform retrieval as follows: | ||
|
||
``` | ||
${ranking_cmds} | ||
``` | ||
|
||
Evaluation can be performed using `trec_eval`: | ||
|
||
``` | ||
${eval_cmds} | ||
``` | ||
|
||
## Effectiveness | ||
|
||
With the above commands, you should be able to replicate the following results: | ||
|
||
${effectiveness} | ||
|
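The template above relies on `${...}` placeholders (`${index_cmds}`, `${ranking_cmds}`, `${eval_cmds}`, `${effectiveness}`) that are filled in when the documentation page is generated. The sketch below shows the same substitution idea using Python's standard `string.Template`, which happens to use the same placeholder syntax. It is only an illustration and not how Anserini's docgen is implemented; the shortened template text and the substituted values are stand-ins.

```python
from string import Template

# A shortened stand-in for the template; the real file has more sections
# and wraps the commands in fenced code blocks.
TEMPLATE_TEXT = """## Indexing

Typical indexing command:

${index_cmds}

## Effectiveness

${effectiveness}
"""

# Illustrative values only; the real ones are derived from the YAML file.
values = {
    "index_cmds": "target/appassembler/bin/IndexCollection -collection WashingtonPostCollection",
    "effectiveness": "(results tables go here)",
}

# substitute() raises KeyError if a placeholder has no value, which makes
# a forgotten field fail loudly instead of leaving "${...}" in the page.
page = Template(TEMPLATE_TEXT).substitute(values)
print(page)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: a missing value stops generation instead of silently producing a page with unresolved placeholders.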
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
--- | ||
name: backgroundlinking20 | ||
index_command: target/appassembler/bin/IndexCollection | ||
index_utils_command: target/appassembler/bin/IndexReaderUtils | ||
search_command: target/appassembler/bin/SearchCollection | ||
topic_root: src/main/resources/topics-and-qrels/ | ||
qrels_root: src/main/resources/topics-and-qrels/ | ||
ranking_root: | ||
generator: WashingtonPostGenerator | ||
threads: 1 | ||
index_options: | ||
- -storePositions | ||
- -storeDocvectors | ||
- -storeRaw | ||
topic_reader: BackgroundLinking | ||
input_roots: | ||
- /tuna1/ # on tuna | ||
- /store/ # on orca | ||
- /scratch2/ # on damiano | ||
input: collections/newswire/WashingtonPost.v3/data/ | ||
index_path: indexes/lucene-index.core18-v3.pos+docvectors+raw | ||
collection: WashingtonPostCollection | ||
index_stats: | ||
documents: 671945 | ||
documents (non-empty): 671945 | ||
total terms: 366108299 | ||
topics: | ||
- name: "[TREC 2020 Topics](../src/main/resources/topics-and-qrels/topics.backgroundlinking20.txt)" | ||
path: topics.backgroundlinking20.txt | ||
qrel: qrels.backgroundlinking20.txt | ||
evals: | ||
- command: tools/eval/trec_eval.9.0.4/trec_eval | ||
params: | ||
- -c -M1000 -m ndcg_cut.5 | ||
separator: "\t" | ||
parse_index: 2 | ||
metric: NDCG@5 | ||
metric_precision: 4 | ||
can_combine: true | ||
- command: tools/eval/trec_eval.9.0.4/trec_eval | ||
params: | ||
- -c -M1000 -m map | ||
separator: "\t" | ||
parse_index: 2 | ||
metric: AP | ||
metric_precision: 4 | ||
can_combine: true | ||
models: | ||
- name: bm25 | ||
display: BM25 | ||
params: | ||
- -backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 | ||
results: | ||
AP: | ||
- 0.3286 | ||
NDCG@5: | ||
- 0.5231 | ||
- name: bm25+rm3 | ||
display: +RM3 | ||
params: | ||
- -backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 | ||
results: | ||
AP: | ||
- 0.4504 | ||
NDCG@5: | ||
- 0.5673 | ||
- name: bm25+rm3+df | ||
display: +RM3+DF | ||
params: | ||
- -backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 | ||
results: | ||
AP: | ||
- 0.3421 | ||
NDCG@5: | ||
- 0.5279 |
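Each `evals` entry above carries a `separator`, a `parse_index`, and a `metric_precision`, which suggests that the score is read from a tab-separated `trec_eval` output line and compared to the expected value at four decimal places. A minimal sketch of that check, assuming a single-metric output whose last line has the form `measure<TAB>all<TAB>value`, might look like this. `parse_metric` and `matches` are hypothetical names, and the sample line is constructed for illustration rather than taken from an actual run.

```python
def parse_metric(output: str, separator: str = "\t", parse_index: int = 2) -> float:
    """Extract the score from the last non-empty line of trec_eval-style output."""
    lines = [line for line in output.splitlines() if line.strip()]
    return float(lines[-1].split(separator)[parse_index])


def matches(expected: float, actual: float, precision: int = 4) -> bool:
    """Compare two scores after rounding both to the configured precision."""
    return round(actual, precision) == round(expected, precision)


# Sample line built by hand from the AP value the YAML expects for BM25.
sample = "map                   \tall\t0.3286\n"
score = parse_metric(sample)
print(score, matches(0.3286, score))
```

Rounding both sides before comparing avoids spurious mismatches from floating-point representation while still flagging any change in the fourth decimal place.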