This repository contains the scripts used in the experiments of my dissertation for the master's in Specialized Translation at the University of Bologna.
It represents an attempt to assess machine translation quality from the source text alone. The work first involved building a corpus that pairs source segments with the evaluation scores obtained by their respective machine-translated versions, using state-of-the-art metrics for MT evaluation (cushLEPOR, BERTScore) and MT quality estimation (COMET, TransQuest). On the basis of that corpus, the multilingual model XLM-RoBERTa (base) was fine-tuned and evaluated to predict those same scores, once as a single-task model and once as a multi-task model.
If you wish to know more about this research, please refer to the paper:
Francesco Fernicola, Silvia Bernardini, Federico Garcea, Adriano Ferraresi and Alberto Barrón-Cedeño. 2023. Return to the Source: Assessing Machine Translation Suitability. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 79-89, Tampere, Finland. European Association for Machine Translation.
All required dependencies are listed in the `Pipfile`. You can create a virtual environment and install them with pipenv by running:

```shell
pipenv install
```
`corpus_creation` contains the scripts used to generate the machine-translated versions of the texts, as well as the scripts for their automatic evaluation.
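The corpus this step produces pairs each source segment with the scores assigned to its machine translation by the four metrics. A minimal sketch of that pairing is shown below; the column layout, segment texts, and score values are purely illustrative, not taken from the repository's actual data.

```python
import csv
import io

# Hypothetical rows: each source segment paired with the evaluation scores
# obtained by its machine-translated version (values are made up).
rows = [
    {"source": "Il gatto dorme.", "cushLEPOR": 0.82, "BERTScore": 0.91,
     "COMET": 0.75, "TransQuest": 0.68},
    {"source": "La ricerca continua.", "cushLEPOR": 0.77, "BERTScore": 0.88,
     "COMET": 0.71, "TransQuest": 0.64},
]

# Write the corpus as a tab-separated file (here to an in-memory buffer).
buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=["source", "cushLEPOR", "BERTScore", "COMET", "TransQuest"],
    delimiter="\t",
)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```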
`models` contains the scripts used to fine-tune XLM-RoBERTa, once as a single-task model and once as a multi-task model.
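As a rough illustration of the multi-task setup, the sketch below adds one linear regression head per metric on top of a shared sentence representation (e.g. XLM-RoBERTa's pooled output). The class and variable names are hypothetical and this is not the repository's actual implementation, just one common way to structure such a model.

```python
import torch
import torch.nn as nn

class MultiTaskRegressionHead(nn.Module):
    """One shared encoder representation, one linear head per metric.

    `hidden_size=768` matches XLM-RoBERTa (base); the task names mirror
    the metrics used to build the corpus.
    """

    def __init__(self, hidden_size=768,
                 tasks=("cushLEPOR", "BERTScore", "COMET", "TransQuest")):
        super().__init__()
        self.heads = nn.ModuleDict({t: nn.Linear(hidden_size, 1) for t in tasks})

    def forward(self, pooled):
        # pooled: (batch, hidden_size), e.g. the encoder's pooled output.
        # Returns one score per segment per task.
        return {t: head(pooled).squeeze(-1) for t, head in self.heads.items()}

head = MultiTaskRegressionHead()
pooled = torch.randn(4, 768)  # stand-in for a batch of encoder outputs
preds = head(pooled)
```

In a single-task variant, the same structure collapses to one head, trained separately per metric.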
Francesco Fernicola – @FrancescoDaimon – francesco.fernicola2@unibo.it