This repository contains the datasets used in the following paper:
@inproceedings{gan-ng-2019-improving,
    title = "Improving the Robustness of Question Answering Systems to Question Paraphrasing",
    author = "Gan, Wee Chung and
      Ng, Hwee Tou",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
}
The datasets are organised according to the original .json format of SQuAD v1.1, so models can be evaluated on them in exactly the same manner as the original development set (see the usage sketch after the file list). There are a total of 4 .json files in this repository:
Non-adversarial paraphrased dataset, used to evaluate models' over-sensitivity to small paraphrases of the questions:
- `dev_para.json`: Dataset containing paraphrased SQuAD questions.
- `dev_orig.json`: Dataset containing the corresponding original SQuAD questions for performance comparison.

Adversarial paraphrased dataset, used to evaluate models' over-reliance on string matching to obtain the answer:
- `adv_para.json`: Dataset containing paraphrased SQuAD questions.
- `adv_orig.json`: Dataset containing the corresponding original SQuAD questions for performance comparison.
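
Because every file follows the SQuAD v1.1 JSON schema, they can be loaded and evaluated with the same tooling as the original dev set. The sketch below is a minimal example, assuming the files sit in the working directory and that `predictions.json` (a dict mapping question ids to answer strings) is produced by your own model; the official `evaluate-v1.1.py` script shown in the comment is the standard SQuAD v1.1 evaluator, not part of this repository.

```python
import json

# Each file follows the SQuAD v1.1 schema:
# {"version": ..., "data": [{"title": ..., "paragraphs": [{"context": ...,
#   "qas": [{"id": ..., "question": ..., "answers": [...]}]}]}]}
with open("dev_para.json") as f:
    dataset = json.load(f)["data"]

num_questions = sum(
    len(paragraph["qas"])
    for article in dataset
    for paragraph in article["paragraphs"]
)
print(f"dev_para.json contains {num_questions} questions")

# Evaluation works exactly as for the original SQuAD v1.1 dev set, e.g.:
#   python evaluate-v1.1.py dev_para.json predictions.json
```

To measure over-sensitivity or over-reliance, evaluate the same model on the paraphrased file and its corresponding original file (e.g. `dev_para.json` vs. `dev_orig.json`) and compare the scores.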