This is a fork of the original XTREME repository, used for my bachelor thesis. We adjust the code so that source languages other than English can be used for three downstream tasks: UD POS, Panx (NER), and XNLI. Additionally, we make another dataset compatible with this benchmark: CLS+ (sentiment analysis).
Clone this repo and follow the installation instructions from the original repository.
To train and evaluate models, use the same commands as for the original repository, for example:
>> bash scripts/train.sh xlm-roberta-large udpos
In addition, you can select the training and testing languages by changing the TRAIN_LANGS and PRED_LANGS variables in train_udpos.sh, train_panx.sh, or train_xnli.sh, depending on which task you want to run. By default, we use all available languages for a given dataset.
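As a rough illustration of the language selection, the two variables might be edited as in the sketch below. The values are hypothetical, and the comma-separated format is an assumption carried over from the language lists in the original XTREME scripts; check the script you are editing for the format it actually expects.

```shell
# Hypothetical excerpt of train_udpos.sh after editing.
# Assumption: language lists are comma-separated language codes.
TRAIN_LANGS="de"          # fine-tune on German instead of English
PRED_LANGS="de,en,fr"     # evaluate on German, English, and French

echo "train on: ${TRAIN_LANGS}; predict on: ${PRED_LANGS}"

# Then launch the task as usual:
# bash scripts/train.sh xlm-roberta-large udpos
```

The same two variables would be changed in train_panx.sh or train_xnli.sh for the other tasks.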
To run experiments on CLS+ with the XLM-R (Large) model, execute:
>> bash scripts/train_cls.sh xlm-roberta-large