Regression Test for Car17 #379

Merged: 28 commits, Aug 9, 2018

Victor0118
Member

  1. delete the doc for CAR18
  2. add the automatically generated doc for CAR17
  3. add qrels file and topic file for CAR17-test200
  4. update the CarTopicReader to make it work on CAR17

@Peilin-Yang @lintool Could you review this PR?

@Victor0118 changed the title from "Regression car17" to "Regression Test for Car17" on Aug 2, 2018
map:
- 0.1354
recip_rank:
- 0.1861
Contributor

Why not BM25+AX and QL+AX?

Member Author

Will do it!

@Victor0118
Member Author

@Peilin-Yang @lintool The axiom results are added. Could you review this PR now?

```
${ranking_cmds}
```

Evaluation can be performed using `trec_eval` and `gdeval.pl`:
Contributor

It seems this is not actually using gdeval.pl.

threads: 40
index_options:
- -storePositions
- -storeDocvectors
Contributor

We've decided to enable -storeRawDocs for regression tests.
Could you please add it?

Member Author

Sure.

- name: bm25+ax
params:
- -bm25
- -axiom
Contributor

To make the axiomatic reranking deterministic, you will need to add:

- -rerankCutoff 20
- -axiom.deterministic

Please see the other yaml files for examples.
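
As a rough sketch of what is being asked for, the `bm25+ax` entry with both flags added might look like the fragment below. The names and flags are taken from this thread. The indentation and list layout are assumptions, since the hunk above shows only the flattened entry.

```yaml
# Sketch only: indentation and list layout are assumed, not copied from the repository.
- name: bm25+ax
  params:
    - -bm25
    - -axiom
    - -rerankCutoff 20      # flag suggested in the review comment above
    - -axiom.deterministic  # flag suggested in the review comment above
```

The same two lines are requested for the `ql+ax` entry.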

Member Author

Thanks. I will add it.

- name: ql+ax
params:
- -bm25
- -axiom
Contributor

- -rerankCutoff 20
- -axiom.deterministic

Topics and qrels are stored in `src/main/resources/topics-and-qrels/`, downloaded from NIST:

+ `topics.car17.test200.txt`: [Topics for the test200 subset (TREC 2017 Complex Answer Retrieval Track)](http://trec-car.cs.unh.edu/datareleases/v1.5/test200-v1.5.tar.xz)
+ `qrel: qrels.car17.test200.hierarchical.txt`: [adhoc qrels (TREC 2017 Complex Answer Retrieval Track)](http://trec-car.cs.unh.edu/datareleases/v1.5/test200-v1.5.tar.xz)
Contributor

Why is this file called qrels.car17.test200.hierarchical? That name does not match the topics file.
I think topics.car17.txt and qrels.car17.txt would be better.

Member Author

@Victor0118, Aug 8, 2018

In car17, there are two test sets: test200 and benchmark_test. I follow the benchmark paper and use test200 as the evaluation set in Anserini.

For each test set, three kinds of qrels files are provided: article, toplevel, and hierarchical. Car17 uses hierarchical for the final evaluation.

That's where test200 and hierarchical come from. To make clear which test set and qrels file I use, I put both in the file name. Is that good?

Contributor

Yea, this makes more sense to me!

generator: LuceneDocumentGenerator
threads: 40
index_options:
- -storeRawDocs
Contributor

put -storeRawDocs right before -optimize
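
A minimal sketch of the requested ordering is shown below. It assumes that `-optimize` is the last entry in `index_options` and that the two options from the earlier hunk are still present. The full list is not shown in this thread.

```yaml
# Sketch only: the complete option list is assumed, not taken from the merged file.
index_options:
  - -storePositions
  - -storeDocvectors
  - -storeRawDocs   # placed right before -optimize, as requested
  - -optimize
```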

Contributor

@Peilin-Yang left a comment

lgtm

@Peilin-Yang merged commit d4b3272 into castorini:master on Aug 9, 2018
crystina-z pushed a commit to crystina-z/anserini that referenced this pull request Oct 28, 2022