Datasets for the paper "Improving the Robustness of Question Answering Systems to Question Paraphrasing" (ACL 2019)

nusnlp/paraphrasing-squad

Datasets of Paraphrased SQuAD Questions

This repository contains the datasets used in the following paper:

@inproceedings{gan-ng-2019-improving,
    title = "Improving the Robustness of Question Answering Systems to Question Paraphrasing",
    author = "Gan, Wee Chung  and
      Ng, Hwee Tou",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
}

The datasets follow the original .json format of SQuAD v1.1, so models can be evaluated on them in exactly the same way as on the original development set. This repository contains four .json files:

Non-adversarial paraphrased dataset, used to evaluate models' over-sensitivity to slight paraphrasing of the questions:

  1. dev_para.json: Dataset containing paraphrased SQuAD questions.
  2. dev_orig.json: Dataset containing the corresponding original SQuAD questions for performance comparison.

Adversarial paraphrased dataset used to evaluate models' over-reliance on string matching to obtain the answer:

  1. adv_para.json: Dataset containing paraphrased SQuAD questions.
  2. adv_orig.json: Dataset containing the corresponding original SQuAD questions for performance comparison.
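
Because the files follow the SQuAD v1.1 layout (`data` → `paragraphs` → `qas`), they can be read with a few lines of Python. The following is a minimal sketch, not the official SQuAD evaluation script: the helper names (`iter_questions`, `normalize`, `exact_match`, `load`) are hypothetical, and the `predictions` argument is assumed to be a dict mapping question id to the model's answer string.

```python
import json
import re
import string


def iter_questions(squad):
    """Yield (id, question, gold answer texts) from a SQuAD v1.1-style dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield qa["id"], qa["question"], [a["text"] for a in qa["answers"]]


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(squad, predictions):
    """Percentage of questions whose prediction matches any gold answer."""
    total = correct = 0
    for qid, _question, golds in iter_questions(squad):
        total += 1
        # A missing prediction is treated as an empty answer (assumption).
        pred = normalize(predictions.get(qid, ""))
        correct += any(pred == normalize(g) for g in golds)
    return 100.0 * correct / total if total else 0.0


def load(path):
    """Read one of the .json files from this repository."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

To compare a model's behaviour on paraphrased and original questions, one could call `exact_match(load("dev_para.json"), preds_para)` and `exact_match(load("dev_orig.json"), preds_orig)`, where `preds_para` and `preds_orig` are hypothetical prediction dicts produced by the model under test. The same applies to `adv_para.json` and `adv_orig.json`.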
