README.md

Pruning the Classification Model

These scripts prune the vocabulary of the classification model (XLMRobertaForSequenceClassification) and evaluate the performance of the pruned model.

We use the English and Chinese training sets as the vocabulary file.
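
As a rough illustration of the idea (not the code in vocabulary_pruning.py), vocabulary pruning can be sketched as: collect the tokens that occur in the vocabulary file, keep only their embedding rows, and renumber the token ids. The sketch below assumes whitespace tokenization, a tiny hand-made vocabulary, and a 2-dimensional embedding matrix stored as nested lists. The helper names `collect_kept_tokens` and `prune` are hypothetical. The real model uses a SentencePiece tokenizer and a far larger embedding matrix.

```python
# Toy illustration of vocabulary pruning (assumptions: whitespace tokens,
# hand-made vocab, embeddings as nested lists; names here are hypothetical).

SPECIAL_TOKENS = ["<s>", "<pad>", "</s>", "<unk>"]


def collect_kept_tokens(lines, vocab):
    """Return special tokens plus every vocab token seen in `lines`, in original id order."""
    seen = set()
    for line in lines:
        for token in line.split():
            if token in vocab:
                seen.add(token)
    # Sorting by the original id keeps the pruned vocabulary deterministic.
    return [t for t in sorted(vocab, key=vocab.get) if t in seen or t in SPECIAL_TOKENS]


def prune(vocab, embeddings, kept_tokens):
    """Keep only the embedding rows of `kept_tokens` and assign new consecutive ids."""
    new_vocab = {token: new_id for new_id, token in enumerate(kept_tokens)}
    new_embeddings = [embeddings[vocab[token]] for token in kept_tokens]
    return new_vocab, new_embeddings


vocab = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3,
         "hello": 4, "bonjour": 5, "你好": 6, "world": 7}
embeddings = [[float(i), float(i) + 0.5] for i in range(len(vocab))]  # one row per token
corpus = ["hello world", "你好 world"]  # stands in for the vocabulary file

kept = collect_kept_tokens(corpus, vocab)
new_vocab, new_embeddings = prune(vocab, embeddings, kept)
print(new_vocab)
print(len(embeddings), "->", len(new_embeddings))
```

Here "bonjour" never appears in the corpus, so its row is dropped and the tokens after it shift down by one id. Special tokens are always kept so the pruned model can still encode sequence boundaries, padding, and unknown tokens.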

Download the fine-tuned model, or train your own model on the XNLI dataset, and save the files to ../models/xlmr_xnli.

See the README in ../datasets/xnli for how to construct the dataset.

VOCABULARY_FILE=../datasets/xnli/multinli.train.en_zh.tsv
MODEL_PATH=../models/xlmr_xnli
python vocabulary_pruning.py $MODEL_PATH $VOCABULARY_FILE

After pruning, set $PRUNED_MODEL_PATH to the directory where the pruned model is stored, then measure its performance:

python measure_performance.py $PRUNED_MODEL_PATH