Skip to content

data for the emnlp'20 paper "Ad-hoc Document Retrieval using Weak-Supervision with BERT and GPT2"

License

Notifications You must be signed in to change notification settings

YosiMass/ad-hoc-retrieval

Repository files navigation

ad-hoc-retrieval

This package contains the generated paraphrases for the three ad-hoc datasets (TREC-COVID, WSK and AP) used in the EMNLP'20 paper "Ad-hoc Document Retrieval using Weak-Supervision with BERT and GPT2"

The package contains the following files

  • TREC-Covid-paraphrases-filtered.tsv
  • WSJ-paraphrases-filtered.tsv
  • AP-paraphrases-filtered

The three files are the generated paraphrases for each of the three datasets used in the paper. Those paraphrases have passed the filter as described in the paper.

Each line in those files contains two fields that are tab separated

  • The original title from the document
  • A generated paraphrase

For example -

Clinical review: A systematic review of corticosteroid use in infections Strong evidence for the use of corticosteroids in septic shock

If you use the data please include citations to the following paper:

Yosi Mass, Haggai Roitman, 2020,
Ad-hoc Document Retrieval using Weak-Supervision with BERT and GPT2,
in proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Nov 16-20

About

data for the emnlp'20 paper "Ad-hoc Document Retrieval using Weak-Supervision with BERT and GPT2"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published