XMoverScore is a metric for reference-free machine translation (MT) evaluation, described in the paper "On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation" (ACL 2020).
XMoverScore is a cross-lingual extension of MoverScore for machine translation evaluation. We have also released a similar metric for summarization, called SUPERT.
CURRENT VERSION:
- We provide a reference-free metric coupled with re-aligned multilingual BERT and a target-side LM (GPT-2).
- We provide an evaluation of the metric on the WMT17, WMT18 and WMT19 datasets.
- The remapping weights are released for 11 supported languages: German, Chinese, Czech, Latvian, Finnish, Russian, Turkish, Gujarati, Kazakh, Lithuanian and Estonian.
- We provide manual annotations of cross-lingual (German-English and Russian-English) DA scores for source-translation pairs.
- Since our metric uses BERT and GPT-2, a GPU is necessary.
- Note that the current version uses normalized language model scores. To reproduce the results reported in the paper, replace them with negative log-likelihoods.
DEPENDENCIES:
- Python 3.6
- PyTorch, tested with 1.3.1
- NumPy, tested with 1.18.4
- pyemd, a fast Earth Mover's Distance implementation, tested with 0.5.1
- Transformers, multilingual BERT and GPT-2, tested with 2.7.0
- Mosestokenizer, tokenization from the Moses toolkit, tested with 1.0.0
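The note above about language model scores contrasts a normalized score with a plain negative log-likelihood. The sketch below illustrates one way the two can differ. It assumes that "normalized" means dividing by the number of tokens, which this README does not specify, and the per-token probabilities are made-up values rather than GPT-2 outputs. The function names are hypothetical and are not part of this repository.

```python
import math


def negative_log_likelihood(token_probs):
    """Sum of -log p over the tokens of one sentence."""
    return -sum(math.log(p) for p in token_probs)


def length_normalized_nll(token_probs):
    """NLL divided by token count (an assumed normalization, for illustration)."""
    return negative_log_likelihood(token_probs) / len(token_probs)


# Hypothetical per-token probabilities, standing in for target-side LM outputs.
short_sentence = [0.5, 0.25]
long_sentence = [0.5, 0.25, 0.5, 0.25]

nll_short = negative_log_likelihood(short_sentence)
nll_long = negative_log_likelihood(long_sentence)
norm_short = length_normalized_nll(short_sentence)
norm_long = length_normalized_nll(long_sentence)

print(f"NLL: short={nll_short:.3f}, long={nll_long:.3f}")
print(f"Length-normalized: short={norm_short:.3f}, long={norm_long:.3f}")
```

Under this assumption the plain negative log-likelihood grows with sentence length while the normalized score does not, so the two can rank translations differently.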
We provide a Python object, XMOVERScorer, which caches multilingual BERT and a target-side LM and wraps the implementation of our metric. See our demo on the WMT17 test set for its usage, and refer to score_utils.py for the implementation details.
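To give a rough intuition for what a mover-style score measures, the following sketch compares two sets of token embeddings with a relaxed word mover's distance. This is not the implementation in score_utils.py: it swaps the exact Earth Mover's Distance (which the repository computes with pyemd) for a simpler relaxed lower bound, uses uniform token weights, and uses toy 2-dimensional vectors in place of re-aligned multilingual BERT embeddings. All function names are hypothetical.

```python
import numpy as np


def cosine_distance_matrix(src, tgt):
    """Pairwise cosine distances between rows of src and rows of tgt."""
    src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return 1.0 - src_n @ tgt_n.T


def relaxed_mover_distance(src, tgt):
    """Relaxed word mover's distance with uniform token weights.

    Each token sends all of its mass to its nearest counterpart on the
    other side; the larger of the two directional costs is a lower bound
    on the exact Earth Mover's Distance.
    """
    dist = cosine_distance_matrix(src, tgt)
    src_to_tgt = dist.min(axis=1).mean()
    tgt_to_src = dist.min(axis=0).mean()
    return max(src_to_tgt, tgt_to_src)


# Toy embeddings (assumed values): one row per token.
source = np.array([[1.0, 0.0], [0.0, 1.0]])
good_translation = np.array([[1.0, 0.0], [0.0, 1.0]])
bad_translation = np.array([[-1.0, 0.0], [0.0, -1.0]])

good_distance = relaxed_mover_distance(source, good_translation)
bad_distance = relaxed_mover_distance(source, bad_translation)
print(good_distance, bad_distance)
```

A smaller distance means the translation's embeddings lie closer to the source's, so a score could be derived by negating the distance or converting it to a similarity.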
We provide main.py, which reproduces the results reported in the paper, i.e., the system-level and segment-level correlations between metrics and human judgments (DA scores).
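The difference between segment-level and system-level correlation can be sketched as follows. This is a minimal illustration, assuming Pearson correlation and assuming that system-level scores are plain averages over each system's segments; main.py may compute these differently. The records and function names are hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def segment_level_correlation(records):
    """Correlate metric and human scores over all individual segments."""
    metric = [m for _, m, _ in records]
    human = [h for _, _, h in records]
    return pearson(metric, human)


def system_level_correlation(records):
    """Average scores per MT system first, then correlate the averages."""
    by_system = defaultdict(list)
    for system, metric_score, human_score in records:
        by_system[system].append((metric_score, human_score))
    metric_means = [sum(m for m, _ in v) / len(v) for v in by_system.values()]
    human_means = [sum(h for _, h in v) / len(v) for v in by_system.values()]
    return pearson(metric_means, human_means)


# Made-up (system, metric score, human DA score) triples for illustration.
records = [
    ("sysA", 0.9, 0.8), ("sysA", 0.7, 0.6),
    ("sysB", 0.4, 0.5), ("sysB", 0.2, 0.1),
    ("sysC", 0.6, 0.4), ("sysC", 0.5, 0.7),
]

seg_corr = segment_level_correlation(records)
sys_corr = system_level_correlation(records)
print(f"segment-level: {seg_corr:.3f}, system-level: {sys_corr:.3f}")
```

Averaging per system smooths out segment-level noise, which is why system-level correlations are typically higher than segment-level ones for the same metric.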
XMoverScore is Apache-licensed, as found in the LICENSE file in the root directory of this source tree.
@inproceedings{zhao-etal-2020-limitations,
title = "On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation",
author = "Zhao, Wei and Glava{\v{s}}, Goran and Peyrard, Maxime and Gao, Yang and West, Robert and Eger, Steffen",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.151",
pages = "1656--1671"
}
@inproceedings{gao-etal-2020-supert,
title = "{SUPERT}: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization",
author = "Gao, Yang and Zhao, Wei and Eger, Steffen",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.124",
pages = "1347--1354"
}
@inproceedings{zhao-etal-2019-moverscore,
title = "{M}over{S}core: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance",
author = "Zhao, Wei and Peyrard, Maxime and Liu, Fei and Gao, Yang and Meyer, Christian M. and Eger, Steffen",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D19-1053",
doi = "10.18653/v1/D19-1053",
pages = "563--578",
}