# Available Evaluation Models

Within COMET, there are several evaluation models available. The primary reference-based and reference-free models are:

- **Default model:** `Unbabel/wmt22-comet-da` - This reference-based regression model is built on top of XLM-R. It was trained on direct assessments from WMT17 to WMT20 and produces scores between 0 and 1, where 1 signifies a perfect translation (see the usage example after this list).
- **Reference-free model:** `Unbabel/wmt22-cometkiwi-da` - This reference-free regression model is built on top of InfoXLM. It was trained on direct assessments from WMT17 to WMT20 as well as direct assessments from the MLQE-PE corpus, and likewise produces scores between 0 and 1. Larger versions are also available: `Unbabel/wmt23-cometkiwi-da-xl` with 3.5 billion parameters and `Unbabel/wmt23-cometkiwi-da-xxl` with 10.7 billion parameters.
- **eXplainable COMET (XCOMET):** `Unbabel/XCOMET-XXL` - Our latest model is trained to identify error spans and to assign a final quality score, resulting in an explainable neural metric. It is available in an XXL version with 10.7 billion parameters and an XL variant with 3.5 billion parameters (`Unbabel/XCOMET-XL`). These models have shown the highest correlation with MQM and are our best-performing evaluation models.
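
As a quick way to try these checkpoints, the `comet-score` CLI shown at the end of this document can point directly at the identifiers above. The sketch below assumes COMET >= 2.0 is installed (e.g. via `pip install unbabel-comet`) and that `src.de`, `hyp1.en`, and `ref.en` are placeholder files with one segment per line; reference-free models are simply run without the `-r` flag, and some checkpoints may be gated on the Hugging Face Hub and require accepting their license before download.

```bash
# Reference-based scoring with the default model (checkpoint is fetched from the Hugging Face Hub)
comet-score -s src.de -t hyp1.en -r ref.en --model Unbabel/wmt22-comet-da

# Reference-free (quality estimation) scoring: no reference file is passed
comet-score -s src.de -t hyp1.en --model Unbabel/wmt22-cometkiwi-da
```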

Please note that different models are released under different licenses. For details, refer to LICENSES.models and the respective model licenses.

If you intend to compare your results with papers published before 2022, it's likely that they used older evaluation models. In such cases, please refer to Unbabel/wmt20-comet-da and Unbabel/wmt20-comet-qe-da, which were the primary checkpoints used in previous versions (<2.0) of COMET.
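
The same CLI can reproduce scores from those legacy checkpoints; a minimal sketch, again with placeholder file names, where the reference-free `wmt20-comet-qe-da` checkpoint is run without `-r`:

```bash
# Reference-based checkpoint used by COMET < 2.0
comet-score -s src.de -t hyp1.en -r ref.en --model Unbabel/wmt20-comet-da

# Its reference-free (quality estimation) counterpart
comet-score -s src.de -t hyp1.en --model Unbabel/wmt20-comet-qe-da
```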

## UniTE Models

The UniTE metric was developed by the NLP2CT Lab at the University of Macau together with Alibaba Group, and all credit for it belongs to those groups. The COMET framework fully supports running UniTE, so we have made the original UniTE-MUP checkpoint available on the Hugging Face Hub, and we have also trained our own UniTE model on the same data as `wmt22-comet-da`. Both models can be accessed there.
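
UniTE checkpoints can be scored with the same CLI. Note that the Hub identifier below is an assumption based on the UniTE-MUP name above; please verify the exact model names on the Hugging Face Hub:

```bash
# UniTE-MUP checkpoint (identifier assumed from the name above; verify it on the Hugging Face Hub)
comet-score -s src.de -t hyp1.en -r ref.en --model Unbabel/unite-mup
```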

## Older Models

All other models developed over the years can be accessed via the links below:

| Model | Download Link | Paper |
| --- | --- | --- |
| `emnlp20-comet-rank` | 🔗 | 🔗 |
| `wmt20-comet-qe-da` | 🔗 | 🔗 |
| `wmt21-comet-da` | 🔗 | 🔗 |
| `wmt21-comet-mqm` | 🔗 | 🔗 |
| `wmt21-comet-qe-da` | 🔗 | 🔗 |
| `wmt21-comet-qe-mqm` | 🔗 | 🔗 |
| `wmt21-cometinho-mqm` | 🔗 | 🔗 |
| `wmt21-cometinho-da` | 🔗 | 🔗 |
| `eamt22-cometinho-da` | 🔗 | 🔗 |
| `eamt22-prune-comet-da` | 🔗 | 🔗 |

**Example:**

```bash
wget https://unbabel-experimental-models.s3.amazonaws.com/comet/eamt22/eamt22-cometinho-da.tar.gz
tar -xf eamt22-cometinho-da.tar.gz
comet-score -s src.de -t hyp1.en -r ref.en --model eamt22-cometinho-da/checkpoints/model.ckpt
```