
GQNLI: The Generalized Quantifier NLI Challenge Dataset

GQNLI is an evaluation corpus aimed at testing language models' generalized quantifier reasoning ability (version 1.0: English).

Introduction

Logical approaches to representing language have developed and evaluated computational models of quantifier words since the 19th century, but today's NLU models still struggle to capture their semantics.

We rely on Generalized Quantifier Theory for language-independent representations of the semantics of quantifier words, to quantify their contribution to the errors of NLU models.

We find that quantifiers are pervasive in NLU benchmarks, and their occurrence at test time is associated with performance drops.

To facilitate directly targeted probing, we present an adversarial generalized quantifier NLI task (GQNLI) and show that pre-trained language models clearly lack robustness in generalized quantifier reasoning.

The GQNLI Corpus

GQNLI is a generalized quantifier NLI challenge dataset, consisting of 30 premises and 300 hypotheses.

To choose the premises, we first randomly sampled 100 premises containing GQs from each of the SNLI and ANLI test sets, and then selected 10 premises in total that we consider semantically adequate for adding GQs and constructing simple hypotheses.

To construct the hypotheses, we rely on RoBERTa fine-tuned on MNLI and manually select examples about which the model is unsure or incorrect. The labels are uniformly distributed.

We augmented the examples twice by substituting non-quantifier words (e.g., replacing "dogs" with "cats") while keeping the labels, to exclude the effect of specific lexical items.
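As a rough illustration of the hypothesis-selection step above, the sketch below scores a single premise–hypothesis pair with a publicly available MNLI-fine-tuned RoBERTa checkpoint (roberta-large-mnli). The checkpoint and the example sentences are illustrative assumptions, not the exact setup used to build GQNLI.

```python
# Minimal sketch of NLI scoring with an MNLI-fine-tuned RoBERTa model.
# The checkpoint and the example pair below are illustrative, not the exact
# setup used to construct GQNLI.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

premise = "Most of the children in the yard are playing with a ball."
hypothesis = "At least one child in the yard is not playing with a ball."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1).squeeze()

# Label order for roberta-large-mnli: contradiction, neutral, entailment.
for label, p in zip(["contradiction", "neutral", "entailment"], probs.tolist()):
    print(f"{label}: {p:.3f}")
```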


Download

Version 1.0 is available in this repository as gqnli-1.0.zip.
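A minimal way to inspect the release, assuming only the archive name given above (the internal file layout is not described here):

```python
# List the members of the GQNLI release archive and preview their first lines.
# Only the archive name is taken from the README; nothing else about the
# internal layout is assumed beyond the members being text files.
import zipfile

with zipfile.ZipFile("gqnli-1.0.zip") as zf:
    for name in zf.namelist():
        print(name)
        if name.endswith("/"):
            continue  # skip directory entries
        with zf.open(name) as f:
            for i, raw in enumerate(f):
                if i >= 3:
                    break
                print("   ", raw.decode("utf-8", errors="replace").rstrip())
```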

Leaderboard

If you want to have your model added to the leaderboard, please reach out to us.

| Model | Training Data | Accuracy (%) |
| --- | --- | --- |
| DeBERTa-v3 (base) | MNLI, FeverNLI, ANLI | 48 |
| DeBERTa-v3 (base) | MNLI, FeverNLI, LingNLI, DocNLI | 45 |
| BART (large) | SNLI, MNLI, FeverNLI, ANLI | 42.7 |
| ALBERT (xxlarge) | SNLI, MNLI, FeverNLI, ANLI | 41.7 |
| BART (large) | MNLI | 41.3 |
| RoBERTa (large) | SNLI, MNLI, FeverNLI, ANLI | 39.3 |
| SBERT (large) | SNLI, MNLI, FeverNLI, ANLI | 39.3 |
| ELECTRA (large) | SNLI, MNLI, FeverNLI, ANLI | 38 |
| DeBERTa-v3 (base) | MNLI | 34.7 |
| BERT (large) | SNLI, MNLI, FeverNLI, ANLI | 30 |
| RoBERTa (large) | MNLI | 28.2 |
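For reference, an evaluation loop of the kind used to produce such accuracy numbers might look as follows. The data file name, field names (premise, hypothesis, label), and label strings are assumptions about the released format, and the checkpoint is a stand-in for the models above; adapt them as needed.

```python
# Hypothetical GQNLI evaluation loop. File name, field names, and label strings
# are assumptions about the released data format; adjust to the actual files.
import json
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

correct = total = 0
with open("gqnli-1.0.jsonl", encoding="utf-8") as f:  # assumed file name
    for line in f:
        ex = json.loads(line)  # assumed fields: premise, hypothesis, label
        pred = nli([{"text": ex["premise"], "text_pair": ex["hypothesis"]}])[0]["label"]
        correct += int(pred.lower() == ex["label"].lower())
        total += 1

print(f"Accuracy: {100 * correct / total:.1f}%")
```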

Citations

If you use this dataset, please cite the following:

@inproceedings{cui-etal-2022-generalized,
    title = "Generalized Quantifiers as a Source of Error in Multilingual NLU Benchmarks",
    author = "Cui, Ruixiang and
      Hershcovich, Daniel and
      S{\o}gaard, Anders",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    year = "2022",
    publisher = "Association for Computational Linguistics",
    address = "Seattle, USA",
}

Contact

For questions and usage issues, please contact rc@di.ku.dk.

License

GQNLI is released under the CC-BY license.
