diff --git a/README.md b/README.md index 1a7b7900ee..3d7c7a12a8 100644 --- a/README.md +++ b/README.md @@ -52,6 +52,7 @@ For the most part, these runs are based on [_default_ parameter settings](https: + Regressions for [Tweets2011 (MB11 & MB12)](docs/regressions-mb11.md), [Tweets2013 (MB13 & MB14)](docs/regressions-mb13.md) + Regressions for Complex Answer Retrieval (CAR17): [[v1.5](docs/regressions-car17v1.5.md)] [[v2.0](docs/regressions-car17v2.0.md)] [[v2.0 with doc2query](docs/regressions-car17v2.0-doc2query.md)] + Regressions for MS MARCO Passage Ranking: [[base](docs/regressions-msmarco-passage.md)] [[doc2query](docs/regressions-msmarco-passage-doc2query.md)] [[docTTTTTquery](docs/regressions-msmarco-passage-docTTTTTquery.md)] ++ Regressions for MS MARCO Passage Ranking: [[DeepImpact](docs/regressions-msmarco-passage-deepimpact.md)] [[uniCOIL](docs/regressions-msmarco-passage-unicoil.md)] + Regressions for MS MARCO Document Ranking, Per Doc: [[base](docs/regressions-msmarco-doc.md)] [[docTTTTTquery](docs/regressions-msmarco-doc-docTTTTTquery-per-doc.md)] + Regressions for MS MARCO Document Ranking, Per Passage: [[base](docs/regressions-msmarco-doc-per-passage.md)] [[docTTTTTquery](docs/regressions-msmarco-doc-docTTTTTquery-per-passage.md)] + Regressions for the TREC 2019 Deep Learning Track (Passage): [[base](docs/regressions-dl19-passage.md)] [[docTTTTTquery](docs/regressions-dl19-passage-docTTTTTquery.md)] diff --git a/docs/experiments-msmarco-passage-deepimpact.md b/docs/experiments-msmarco-passage-deepimpact.md index 78562ee930..a27e78a493 100644 --- a/docs/experiments-msmarco-passage-deepimpact.md +++ b/docs/experiments-msmarco-passage-deepimpact.md @@ -15,7 +15,7 @@ We're going to use the repository's root directory as the working directory. 
First, we need to download and extract the MS MARCO passage dataset with DeepImpact processing: ```bash -wget https://git.uwaterloo.ca/jimmylin/deep-impact/raw/master/msmarco-passage-deepimpact-b8.tar -P collections/ +wget https://git.uwaterloo.ca/jimmylin/deepimpact/raw/master/msmarco-passage-deepimpact-b8.tar -P collections/ # Alternate mirror wget https://vault.cs.uwaterloo.ca/s/57AE5aAjzw2ox2n/download -O collections/msmarco-passage-deepimpact-b8.tar @@ -51,7 +51,7 @@ The queries are already stored in the repo, so we can run retrieval directly: ```bash target/appassembler/bin/SearchCollection -index indexes/lucene-index.msmarco-passage-deepimpact-b8 \ - -topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deep-impact.tsv.gz \ + -topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz \ -output runs/run.msmarco-passage-deepimpact-b8.trec \ -impact -pretokenized ``` @@ -59,8 +59,8 @@ target/appassembler/bin/SearchCollection -index indexes/lucene-index.msmarco-pas The queries are also available to download at the following locations: ```bash -wget https://git.uwaterloo.ca/jimmylin/deep-impact/raw/master/topics.msmarco-passage.dev-subset.deep-impact.tsv.gz -P collections/ -wget https://vault.cs.uwaterloo.ca/s/NYibRJ9bXs5PspH/download -O collections/topics.msmarco-passage.dev-subset.deep-impact.tsv.gz +wget https://git.uwaterloo.ca/jimmylin/deepimpact/raw/master/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz -P collections/ +wget https://vault.cs.uwaterloo.ca/s/NYibRJ9bXs5PspH/download -O collections/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz # MD5 checksum: 88a2987d6a25b1be11c82e87677a262e ``` diff --git a/docs/experiments-msmarco-passage-unicoil.md b/docs/experiments-msmarco-passage-unicoil.md index b3b89847c6..965fd98f45 100644 --- a/docs/experiments-msmarco-passage-unicoil.md +++ b/docs/experiments-msmarco-passage-unicoil.md @@ -90,6 +90,8 @@ 
QueriesRanked: 6980 ##################### ``` +This corresponds to the effectiveness reported in the paper. + ## Reproduction Log[*](reproducibility.md) diff --git a/docs/regressions-msmarco-passage-deepimpact.md b/docs/regressions-msmarco-passage-deepimpact.md new file mode 100644 index 0000000000..d0f06af808 --- /dev/null +++ b/docs/regressions-msmarco-passage-deepimpact.md @@ -0,0 +1,91 @@ +# Anserini: Regressions for DeepImpact on [MS MARCO Passage](https://github.com/microsoft/MSMARCO-Passage-Ranking) + +This page documents regression experiments for DeepImpact on the MS MARCO Passage Ranking Task, which is integrated into Anserini's regression testing framework. +DeepImpact is described in the following paper: + +> Antonio Mallia, Omar Khattab, Nicola Tonellotto, and Torsten Suel. [Learning Passage Impacts for Inverted Indexes.](https://dl.acm.org/doi/10.1145/3404835.3463030) _SIGIR 2021_. + +For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage-deepimpact.md). + +The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-deepimpact.yaml). +Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-deepimpact.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. + +## Indexing + +Typical indexing command: + +``` +nohup sh target/appassembler/bin/IndexCollection -collection JsonVectorCollection \ + -input /path/to/msmarco-passage-deepimpact \ + -index indexes/lucene-index.msmarco-passage-deepimpact.raw \ + -generator DefaultLuceneDocumentGenerator \ + -threads 16 -impact -pretokenized -storeRaw \ + >& logs/log.msmarco-passage-deepimpact & +``` + +The directory `/path/to/msmarco-passage-deepimpact/` should be a directory containing the compressed `jsonl` files that comprise the corpus. 
+See [this page](experiments-msmarco-passage-deepimpact.md) for additional details. + +For additional details, see explanation of [common indexing options](common-indexing-options.md). + +## Retrieval + +Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/). +The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. + +After indexing has completed, you should be able to perform retrieval as follows: + +``` +nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.msmarco-passage-deepimpact.raw \ + -topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz \ + -output runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz \ + -impact -pretokenized & +``` + +Evaluation can be performed using `trec_eval`: + +``` +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -m recip_rank -c -m recall.1000 -c src/main/resources/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz +``` + +## Effectiveness + +With the above commands, you should be able to reproduce the following results: + +MAP | DeepImpact| +:---------------------------------------|-----------| +[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.3334 | + + +MRR | DeepImpact| +:---------------------------------------|-----------| +[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.3386 | + + +R@1000 | DeepImpact| +:---------------------------------------|-----------| +[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.9476 | + +The above runs are in TREC output format and evaluated with `trec_eval`. 
+In order to reproduce results reported in the paper, we need to convert to MS MARCO output format and then evaluate: + +```bash +python tools/scripts/msmarco/convert_trec_to_msmarco_run.py \ + --input runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz \ + --output runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz.msmarco --quiet + +python tools/scripts/msmarco/msmarco_passage_eval.py \ + collections/msmarco-passage/qrels.dev.small.tsv \ + runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz.msmarco +``` + +The results should be as follows: + +``` +##################### +MRR @10: 0.3252764133351524 +QueriesRanked: 6980 +##################### +``` + +The final evaluation metric is very close to the one reported in the paper (0.326). diff --git a/docs/regressions-msmarco-passage-unicoil.md b/docs/regressions-msmarco-passage-unicoil.md new file mode 100644 index 0000000000..d11ab23f5d --- /dev/null +++ b/docs/regressions-msmarco-passage-unicoil.md @@ -0,0 +1,91 @@ +# Anserini: Regressions for uniCOIL on [MS MARCO Passage](https://github.com/microsoft/MSMARCO-Passage-Ranking) + +This page documents regression experiments for uniCOIL on the MS MARCO Passage Ranking Task, which is integrated into Anserini's regression testing framework. +The uniCOIL model is described in the following paper: + +> Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. + +For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage-unicoil.md). + +The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-unicoil.yaml). 
+Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. + +## Indexing + +Typical indexing command: + +``` +nohup sh target/appassembler/bin/IndexCollection -collection JsonVectorCollection \ + -input /path/to/msmarco-passage-unicoil \ + -index indexes/lucene-index.msmarco-passage-unicoil.raw \ + -generator DefaultLuceneDocumentGenerator \ + -threads 16 -impact -pretokenized -storeRaw \ + >& logs/log.msmarco-passage-unicoil & +``` + +The directory `/path/to/msmarco-passage-unicoil/` should be a directory containing the compressed `jsonl` files that comprise the corpus. +See [this page](experiments-msmarco-passage-unicoil.md) for additional details. + +For additional details, see explanation of [common indexing options](common-indexing-options.md). + +## Retrieval + +Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/). +The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. 
+ +After indexing has completed, you should be able to perform retrieval as follows: + +``` +nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.msmarco-passage-unicoil.raw \ + -topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.unicoil.tsv.gz \ + -output runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz \ + -impact -pretokenized & +``` + +Evaluation can be performed using `trec_eval`: + +``` +tools/eval/trec_eval.9.0.4/trec_eval -m map -c -m recip_rank -c -m recall.1000 -c src/main/resources/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz +``` + +## Effectiveness + +With the above commands, you should be able to reproduce the following results: + +MAP | uniCOIL | +:---------------------------------------|-----------| +[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.3574 | + + +MRR | uniCOIL | +:---------------------------------------|-----------| +[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.3625 | + + +R@1000 | uniCOIL | +:---------------------------------------|-----------| +[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.9582 | + +The above runs are in TREC output format and evaluated with `trec_eval`. 
+In order to reproduce results reported in the paper, we need to convert to MS MARCO output format and then evaluate: + +```bash +python tools/scripts/msmarco/convert_trec_to_msmarco_run.py \ + --input runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz \ + --output runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz.msmarco --quiet + +python tools/scripts/msmarco/msmarco_passage_eval.py \ + tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt \ + runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz.msmarco +``` + +The results should be as follows: + +``` +##################### +MRR @10: 0.35155222404147896 +QueriesRanked: 6980 +##################### +``` + +This corresponds to the effectiveness reported in the paper. \ No newline at end of file diff --git a/docs/regressions.md b/docs/regressions.md index e3599f3374..7a2e97bb63 100644 --- a/docs/regressions.md +++ b/docs/regressions.md @@ -56,6 +56,9 @@ nohup python src/main/python/run_regression.py --collection msmarco-passage >& l nohup python src/main/python/run_regression.py --collection msmarco-passage-doc2query >& logs/log.msmarco-passage-doc2query & nohup python src/main/python/run_regression.py --collection msmarco-passage-docTTTTTquery >& logs/log.msmarco-passage-docTTTTTquery & +nohup python src/main/python/run_regression.py --collection msmarco-passage-deepimpact >& logs/log.msmarco-passage-deepimpact & +nohup python src/main/python/run_regression.py --collection msmarco-passage-unicoil >& logs/log.msmarco-passage-unicoil & + nohup python src/main/python/run_regression.py --collection msmarco-doc >& logs/log.msmarco-doc & nohup python src/main/python/run_regression.py --collection msmarco-doc-per-passage >& logs/log.msmarco-doc-per-passage & nohup python src/main/python/run_regression.py --collection msmarco-doc-docTTTTTquery-per-doc >& logs/log.msmarco-doc-docTTTTTquery-per-doc & @@ 
-121,6 +124,9 @@ nohup python src/main/python/run_regression.py --index --collection msmarco-pass nohup python src/main/python/run_regression.py --index --collection msmarco-passage-doc2query >& logs/log.msmarco-passage-doc2query & nohup python src/main/python/run_regression.py --index --collection msmarco-passage-docTTTTTquery >& logs/log.msmarco-passage-docTTTTTquery & +nohup python src/main/python/run_regression.py --index --collection msmarco-passage-deepimpact >& logs/log.msmarco-passage-deepimpact & +nohup python src/main/python/run_regression.py --index --collection msmarco-passage-unicoil >& logs/log.msmarco-passage-unicoil & + nohup python src/main/python/run_regression.py --index --collection msmarco-doc >& logs/log.msmarco-doc & nohup python src/main/python/run_regression.py --index --collection msmarco-doc-per-passage >& logs/log.msmarco-doc-per-passage & nohup python src/main/python/run_regression.py --index --collection msmarco-doc-docTTTTTquery-per-doc >& logs/log.msmarco-doc-docTTTTTquery-per-doc & diff --git a/src/main/resources/docgen/templates/msmarco-passage-deepimpact.template b/src/main/resources/docgen/templates/msmarco-passage-deepimpact.template new file mode 100644 index 0000000000..492696b94d --- /dev/null +++ b/src/main/resources/docgen/templates/msmarco-passage-deepimpact.template @@ -0,0 +1,71 @@ +# Anserini: Regressions for DeepImpact on [MS MARCO Passage](https://github.com/microsoft/MSMARCO-Passage-Ranking) + +This page documents regression experiments for DeepImpact on the MS MARCO Passage Ranking Task, which is integrated into Anserini's regression testing framework. +DeepImpact is described in the following paper: + +> Antonio Mallia, Omar Khattab, Nicola Tonellotto, and Torsten Suel. [Learning Passage Impacts for Inverted Indexes.](https://dl.acm.org/doi/10.1145/3404835.3463030) _SIGIR 2021_. + +For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage-deepimpact.md). 
+ +The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-deepimpact.yaml). +Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-deepimpact.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. + +## Indexing + +Typical indexing command: + +``` +${index_cmds} +``` + +The directory `/path/to/msmarco-passage-deepimpact/` should be a directory containing the compressed `jsonl` files that comprise the corpus. +See [this page](experiments-msmarco-passage-deepimpact.md) for additional details. + +For additional details, see explanation of [common indexing options](common-indexing-options.md). + +## Retrieval + +Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/). +The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. + +After indexing has completed, you should be able to perform retrieval as follows: + +``` +${ranking_cmds} +``` + +Evaluation can be performed using `trec_eval`: + +``` +${eval_cmds} +``` + +## Effectiveness + +With the above commands, you should be able to reproduce the following results: + +${effectiveness} + +The above runs are in TREC output format and evaluated with `trec_eval`. 
+In order to reproduce results reported in the paper, we need to convert to MS MARCO output format and then evaluate: + +```bash +python tools/scripts/msmarco/convert_trec_to_msmarco_run.py \ + --input runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz \ + --output runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz.msmarco --quiet + +python tools/scripts/msmarco/msmarco_passage_eval.py \ + collections/msmarco-passage/qrels.dev.small.tsv \ + runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz.msmarco +``` + +The results should be as follows: + +``` +##################### +MRR @10: 0.3252764133351524 +QueriesRanked: 6980 +##################### +``` + +The final evaluation metric is very close to the one reported in the paper (0.326). diff --git a/src/main/resources/docgen/templates/msmarco-passage-unicoil.template b/src/main/resources/docgen/templates/msmarco-passage-unicoil.template new file mode 100644 index 0000000000..21964fbd62 --- /dev/null +++ b/src/main/resources/docgen/templates/msmarco-passage-unicoil.template @@ -0,0 +1,71 @@ +# Anserini: Regressions for uniCOIL on [MS MARCO Passage](https://github.com/microsoft/MSMARCO-Passage-Ranking) + +This page documents regression experiments for uniCOIL on the MS MARCO Passage Ranking Task, which is integrated into Anserini's regression testing framework. +The uniCOIL model is described in the following paper: + +> Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_. + +For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage-unicoil.md). + +The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-unicoil.yaml). 
+Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead. + +## Indexing + +Typical indexing command: + +``` +${index_cmds} +``` + +The directory `/path/to/msmarco-passage-unicoil/` should be a directory containing the compressed `jsonl` files that comprise the corpus. +See [this page](experiments-msmarco-passage-unicoil.md) for additional details. + +For additional details, see explanation of [common indexing options](common-indexing-options.md). + +## Retrieval + +Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/). +The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details. + +After indexing has completed, you should be able to perform retrieval as follows: + +``` +${ranking_cmds} +``` + +Evaluation can be performed using `trec_eval`: + +``` +${eval_cmds} +``` + +## Effectiveness + +With the above commands, you should be able to reproduce the following results: + +${effectiveness} + +The above runs are in TREC output format and evaluated with `trec_eval`. 
+In order to reproduce results reported in the paper, we need to convert to MS MARCO output format and then evaluate: + +```bash +python tools/scripts/msmarco/convert_trec_to_msmarco_run.py \ + --input runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz \ + --output runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz.msmarco --quiet + +python tools/scripts/msmarco/msmarco_passage_eval.py \ + tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt \ + runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz.msmarco +``` + +The results should be as follows: + +``` +##################### +MRR @10: 0.35155222404147896 +QueriesRanked: 6980 +##################### +``` + +This corresponds to the effectiveness reported in the paper. \ No newline at end of file diff --git a/src/main/resources/regression/msmarco-passage-deepimpact.yaml b/src/main/resources/regression/msmarco-passage-deepimpact.yaml new file mode 100644 index 0000000000..2dfc6b03bb --- /dev/null +++ b/src/main/resources/regression/msmarco-passage-deepimpact.yaml @@ -0,0 +1,72 @@ +--- +name: msmarco-passage-deepimpact +index_command: target/appassembler/bin/IndexCollection +index_utils_command: target/appassembler/bin/IndexReaderUtils +search_command: target/appassembler/bin/SearchCollection +topic_root: src/main/resources/topics-and-qrels/ +qrels_root: src/main/resources/topics-and-qrels/ +index_root: +ranking_root: +collection: JsonVectorCollection +generator: DefaultLuceneDocumentGenerator +threads: 16 +index_options: + - -impact + - -pretokenized + - -storeRaw +topic_reader: TsvInt +evals: + - command: tools/eval/trec_eval.9.0.4/trec_eval + params: + - -m map + - -c + separator: "\t" + parse_index: 2 + metric: map + metric_precision: 4 + can_combine: true + - command: tools/eval/trec_eval.9.0.4/trec_eval + params: + - -m recip_rank + - -c + separator: "\t" + parse_index: 2 + metric: mrr + 
metric_precision: 4 + can_combine: true + - command: tools/eval/trec_eval.9.0.4/trec_eval + params: + - -m recall.1000 + - -c + separator: "\t" + parse_index: 2 + metric: R@1000 + metric_precision: 4 + can_combine: true +input_roots: + - /tuna1/ # on tuna + - /store/ # on orca + - /scratch2/ # on damiano +input: collections/msmarco/msmarco-passage-deepimpact-b8/ +index_path: indexes/lucene-index.msmarco-passage-deepimpact.raw +index_stats: + documents: 8841823 + documents (non-empty): 8841823 + total terms: 35455908214 +topics: + - name: "[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)" + path: topics.msmarco-passage.dev-subset.deepimpact.tsv.gz + qrel: qrels.msmarco-passage.dev-subset.txt +models: + - name: deepimpact + display: DeepImpact + params: + - -impact -pretokenized + results: + map: + - 0.3334 + mrr: + - 0.3386 + R@1000: + - 0.9476 + diff --git a/src/main/resources/regression/msmarco-passage-unicoil.yaml b/src/main/resources/regression/msmarco-passage-unicoil.yaml new file mode 100644 index 0000000000..9deac9aa1c --- /dev/null +++ b/src/main/resources/regression/msmarco-passage-unicoil.yaml @@ -0,0 +1,72 @@ +--- +name: msmarco-passage-unicoil +index_command: target/appassembler/bin/IndexCollection +index_utils_command: target/appassembler/bin/IndexReaderUtils +search_command: target/appassembler/bin/SearchCollection +topic_root: src/main/resources/topics-and-qrels/ +qrels_root: src/main/resources/topics-and-qrels/ +index_root: +ranking_root: +collection: JsonVectorCollection +generator: DefaultLuceneDocumentGenerator +threads: 16 +index_options: + - -impact + - -pretokenized + - -storeRaw +topic_reader: TsvInt +evals: + - command: tools/eval/trec_eval.9.0.4/trec_eval + params: + - -m map + - -c + separator: "\t" + parse_index: 2 + metric: map + metric_precision: 4 + can_combine: true + - command: tools/eval/trec_eval.9.0.4/trec_eval + params: + - -m recip_rank + - -c + separator: "\t" + parse_index: 2 + metric: mrr + 
metric_precision: 4 + can_combine: true + - command: tools/eval/trec_eval.9.0.4/trec_eval + params: + - -m recall.1000 + - -c + separator: "\t" + parse_index: 2 + metric: R@1000 + metric_precision: 4 + can_combine: true +input_roots: + - /tuna1/ # on tuna + - /store/ # on orca + - /scratch2/ # on damiano +input: collections/msmarco/msmarco-passage-unicoil-b8/ +index_path: indexes/lucene-index.msmarco-passage-unicoil.raw +index_stats: + documents: 8841823 + documents (non-empty): 8841823 + total terms: 44495093768 +topics: + - name: "[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)" + path: topics.msmarco-passage.dev-subset.unicoil.tsv.gz + qrel: qrels.msmarco-passage.dev-subset.txt +models: + - name: unicoil + display: uniCOIL + params: + - -impact -pretokenized + results: + map: + - 0.3574 + mrr: + - 0.3625 + R@1000: + - 0.9582 + diff --git a/src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deep-impact.tsv.gz b/src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz similarity index 100% rename from src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deep-impact.tsv.gz rename to src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz diff --git a/src/test/java/io/anserini/search/topicreader/TsvIntTopicReaderGzTest.java b/src/test/java/io/anserini/search/topicreader/TsvIntTopicReaderGzTest.java index 6d1d0594a9..0b59959460 100644 --- a/src/test/java/io/anserini/search/topicreader/TsvIntTopicReaderGzTest.java +++ b/src/test/java/io/anserini/search/topicreader/TsvIntTopicReaderGzTest.java @@ -30,7 +30,7 @@ public class TsvIntTopicReaderGzTest { @Test public void test() throws IOException { TopicReader reader = new TsvIntTopicReader( - Paths.get("src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deep-impact.tsv.gz")); + 
Paths.get("src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz")); SortedMap<Integer, Map<String, String>> topics = reader.read();