These scripts perform vocabulary pruning on the classification model (XLMRobertaForSequenceClassification
) and evaluate the performance.
We use the English and Chinese training sets as the vocabulary file.
Download the fine-tuned model or train your own model on XNLI dataset, and save the files to ../models/xlmr_xnli
.
Download link: * Hugging Face Models
See the README in ../datasets/xnli for how to construct the dataset.
- Pruning with the python script:
VOCABULARY_FILE=../datasets/xnli/multinli.train.en_zh.tsv
MODEL_PATH=../models/xlmr_xnli
python vocabulary_pruning.py $MODEL_PATH $VOCABULARY_FILE
- Evaluate the model:
Set $PRUNED_MODEL_PATH
to the directory where the pruned model is stored.
python measure_performance.py $PRUNED_MODEL_PATH