This repository contains the data and code for the MT-GenEval benchmark, which evaluates gender accuracy in translation from English into Arabic, French, German, Hindi, Italian, Portuguese, Russian, and Spanish. MT-GenEval was introduced in the EMNLP 2022 paper MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation by Anna Currey, Maria Nadejde, Raghavendra Pappagari, Mia Mayer, Stanislas Lauly, Xing Niu, Benjamin Hsu, and Georgiana Dinu. If you use MT-GenEval, please cite the paper:
@inproceedings{currey-etal-2022-mtgeneval,
    title = "{MT-GenEval}: {A} Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation",
    author = "Currey, Anna and
      Nadejde, Maria and
      Pappagari, Raghavendra and
      Mayer, Mia and
      Lauly, Stanislas and
      Niu, Xing and
      Hsu, Benjamin and
      Dinu, Georgiana",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/pdf/2211.01355.pdf",
}
The data is originally sourced from Wikipedia.
We include sentence-level development and test segments in data/sentences/ and inter-sentence (contextual) test segments in data/context/.
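As a quick orientation, here is a minimal Python sketch for loading a counterfactual masculine/feminine reference pair. The file names below are hypothetical placeholders, since the exact naming scheme is not described here; check data/sentences/ for the actual files:

from pathlib import Path

# NOTE: hypothetical file names for illustration only; see data/sentences/
# in this repository for the actual naming scheme.
data_dir = Path("data/sentences")
masc = (data_dir / "counterfactual.test.en_ru.M.ru").read_text(encoding="utf-8").splitlines()
fem = (data_dir / "counterfactual.test.en_ru.F.ru").read_text(encoding="utf-8").splitlines()

# The masculine and feminine references are line-aligned variants of the
# same segments, differing only in the gendered words.
assert len(masc) == len(fem)
for m, f in zip(masc[:3], fem[:3]):
    print(m, "||", f)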
To compute accuracy, use the accuracy_metric.py script.
Example usage for the English-Russian contextual test set is as follows:
python3 accuracy_metric.py \
--target_lang ru \
--dataset contextual \
--data_split test \
--hyp PATH_FOR_YOUR_SYSTEM_TRANSLATIONS
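Here --hyp should point to a plain-text file of your system's translations of the contextual test set, presumably one detokenized translation per line, in the same order as the released source segments.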
Example usage for the English-Russian counterfactual test set is as follows:
python3 accuracy_metric.py \
--target_lang ru \
--dataset counterfactual \
--data_split test \
--hyp_masculine PATH_FOR_YOUR_SYSTEM_TRANSLATIONS_FOR_MASCULINE_SEGMENTS \
--hyp_feminine PATH_FOR_YOUR_SYSTEM_TRANSLATIONS_FOR_FEMININE_SEGMENTS
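For intuition, the paper's accuracy metric counts a hypothesis as correct if it contains none of the annotated gender-specific words of the incorrect gender. The following is a simplified Python sketch of that idea, not the exact logic of accuracy_metric.py; in particular, extracting the wrong-gender words by set difference of the two references is a naive stand-in for the released annotations:

def gender_accuracy(hyps, refs, wrong_refs):
    # Count a hypothesis as correct when it shares no words with the
    # set of words unique to the opposite-gender (wrong) reference.
    # NOTE: simplified illustration, not the accuracy_metric.py logic.
    correct = 0
    for hyp, ref, wrong in zip(hyps, refs, wrong_refs):
        wrong_only = set(wrong.lower().split()) - set(ref.lower().split())
        if not wrong_only & set(hyp.lower().split()):
            correct += 1
    return correct / len(hyps)

# Example: masculine-set hypotheses scored against masculine (correct)
# and feminine (wrong) references.
# acc_m = gender_accuracy(masc_hyps, masc_refs, fem_refs)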
See CONTRIBUTING for more information.
The data and code are released under the CC-BY-SA-3.0 License. See LICENSE for details.