Coreference resolution is the task of clustering mentions in text that refer to the same underlying real world entities.
Example:
+-----------+
| |
I voted for Obama because he was most aligned with my values", she said.
| | |
+-------------------------------------------------+------------+
"I", "my", and "she" belong to the same cluster and "Obama" and "he" belong to the same cluster.
Experiments are conducted on the data of the CoNLL-2012 shared task, which uses OntoNotes coreference annotations. Papers report the precision, recall, and F1 of the MUC, B3, and CEAFφ4 metrics using the official CoNLL-2012 evaluation scripts. The main evaluation metric is the average F1 of the three metrics.
[1] Joshi et al. (2019): (Lee et al., 2017)+coarse-to-fine & second-order inference (Lee et al., 2018)+SpanBERT (Joshi et al., 2019)
[2] Joshi et al. (2019): (Lee et al., 2017)+coarse-to-fine & second-order inference (Lee et al., 2018)+BERT (Devlin et al., 2019)
Experiments are conducted on GAP dataset. Metrics used are F1 score on Masculine (M) and Feminine (F) examples, Overall, and a Bias factor calculated as F / M.
Model | Overall F1 | Masculine F1 (M) | Feminine F1 (F) | Bias (F/M) | Paper / Source | Code |
---|---|---|---|---|---|---|
Attree et al. (2019) | 92.5 | 94.0 | 91.1 | 0.97 | Gendered Ambiguous Pronouns Shared Task: Boosting Model Confidence by Evidence Pooling | GREP |
Chada et al. (2019) | 90.2 | 90.9 | 89.5 | 0.98 | Gendered Pronoun Resolution using BERT and an extractive question answering formulation | CorefQA |