A Study on Errors in Multilingual Machine Translation

Authors: Stefania Radu, Lisa Koopmans, Alina Dima

This repository contains two Python notebooks: one for the multilingual error analysis and one for the editing effort prediction. It also contains the DivEMT dataset, annotated with new features for prediction.

These notebooks can be run in Google Colaboratory. You can also install all the requirements by running: pip install -r requirements.txt

Multilingual error analysis

This notebook generates the plots for the following tasks:

High-level analysis of the error distribution across all languages with respect to BLEU, chrF, CER, and BAD rate. See error_distrib.png for an example plot.

In-depth analysis of the error distribution across all languages and across different linguistic features:
- part-of-speech (POS)
- named-entity recognition (NER)
- dependency relations (deprel)
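Of the metrics named above, CER (character error rate) is the simplest to illustrate. The following is a minimal sketch, assuming CER is defined as the character-level edit distance between a translation and its reference, divided by the reference length. The function names `levenshtein` and `cer` are hypothetical, and the notebook may compute the metric differently (for example, through a library).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn one string into the other."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)
```

Under this definition, an exact match scores 0.0, and the score can exceed 1.0 when the translation is much longer than the reference.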

Editing effort prediction

In this notebook, we train three linear regression models to predict the HTER score and the post-editing time (PET). The models differ in the set of features they use; see the paper for an explanation. The annotated dataset, including the features, is provided in dataset_with_features.csv. We do not provide a pre-trained model, since the linear regression models can easily be trained in Google Colaboratory.
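To make the idea concrete, here is a minimal sketch of fitting a linear regression by ordinary least squares in plain Python. It is not the notebook's implementation: the helper names, the two feature columns (a token count and an error count) and the toy values are all assumptions made for illustration, and the target is generated from a known linear rule rather than taken from DivEMT.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept, solved through the normal
    equations. Returns [intercept, w_1, ..., w_k]."""
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend bias term
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * target for r, target in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for k in range(n):
            if k != col:
                factor = aug[k][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] for i in range(n)]


def predict(weights, features):
    """Apply the fitted weights to one feature vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# Synthetic toy data (hypothetical features: token count, error count).
toy_X = [[10, 1], [20, 0], [15, 3], [30, 2], [25, 4]]
# Synthetic target built from a known rule so the fit can be checked.
toy_y = [0.1 + 0.02 * tokens + 0.05 * errors for tokens, errors in toy_X]

weights = fit_linear_regression(toy_X, toy_y)
prediction = predict(weights, [12, 2])
```

Because the toy target is exactly linear in the features, the fitted weights recover the generating rule up to floating-point error. With real data, a library such as scikit-learn would be the more usual choice, and comparing feature configurations would amount to fitting the same model on different column subsets.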

