CLUE Classification
Here is a short summary of our solution to the CLUE classification benchmark. We submitted two results to the benchmark, UER and UER-ensemble. The UER result is based on the cluecorpussmall_roberta_wwm_large_seq512_model.bin pre-trained weights. The UER-ensemble result is based on an ensemble of a large number of models. This section focuses mainly on the single model. More details of the ensemble are discussed here.
For AFQMC, we first do multi-task learning, with LCQMC and XNLI as auxiliary tasks:
python3 finetune/run_classifier_mt.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--dataset_path_list datasets/afqmc/ datasets/lcqmc/ datasets/xnli/ \
--output_model_path models/afqmc_multitask_classifier_model.bin \
--epochs_num 1 --batch_size 64
Then we load afqmc_multitask_classifier_model.bin and fine-tune it on AFQMC:
python3 finetune/run_classifier.py --pretrained_model_path models/afqmc_multitask_classifier_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--train_path datasets/afqmc/train.tsv \
--dev_path datasets/afqmc/dev.tsv \
--output_model_path models/afqmc_classifier_model.bin \
--epochs_num 3 --batch_size 32
Then we do inference with afqmc_classifier_model.bin:
python3 inference/run_classifier_infer.py --load_model_path models/afqmc_classifier_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--test_path datasets/afqmc/test_nolabel.tsv \
--prediction_path datasets/afqmc/prediction.tsv \
--seq_length 128 --labels_num 2
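The multi-task step takes dataset directories rather than individual files. The following is a minimal sketch of a pre-flight check, assuming each directory passed to --dataset_path_list should contain train.tsv and dev.tsv (an assumption suggested by the layout used in the fine-tuning commands, not something stated here). The helper name missing_files is hypothetical.

```python
import os


def missing_files(dataset_dirs, required=("train.tsv", "dev.tsv")):
    """Return the paths of expected files that do not exist.

    Assumption: every dataset directory should hold train.tsv and dev.tsv.
    """
    missing = []
    for directory in dataset_dirs:
        for name in required:
            path = os.path.join(directory, name)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Directories taken from the AFQMC multi-task command.
    dirs = ["datasets/afqmc/", "datasets/lcqmc/", "datasets/xnli/"]
    for path in missing_files(dirs):
        print("missing:", path)
```

Running the check before launching training can save a failed run caused by a misplaced auxiliary dataset.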
For CMNLI, we first do multi-task learning, with XNLI as the auxiliary task:
python3 finetune/run_classifier_mt.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--dataset_path_list datasets/cmnli/ datasets/xnli/ \
--output_model_path models/cmnli_multitask_classifier_model.bin \
--epochs_num 1 --batch_size 64
Then we load cmnli_multitask_classifier_model.bin and fine-tune it on CMNLI:
python3 finetune/run_classifier.py --pretrained_model_path models/cmnli_multitask_classifier_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--train_path datasets/cmnli/train.tsv \
--dev_path datasets/cmnli/dev.tsv \
--output_model_path models/cmnli_classifier_model.bin \
--epochs_num 1 --batch_size 64
Then we do inference with cmnli_classifier_model.bin:
python3 inference/run_classifier_infer.py --load_model_path models/cmnli_classifier_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--test_path datasets/cmnli/test_nolabel.tsv \
--prediction_path datasets/cmnli/prediction.tsv \
--seq_length 128 --labels_num 3
An example of fine-tuning and inference on the IFLYTEK dataset:
python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--train_path datasets/iflytek/train.tsv \
--dev_path datasets/iflytek/dev.tsv \
--output_model_path models/iflytek_classifier_model.bin \
--epochs_num 3 --batch_size 32 --seq_length 256
python3 inference/run_classifier_infer.py --load_model_path models/iflytek_classifier_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--test_path datasets/iflytek/test_nolabel.tsv \
--prediction_path datasets/iflytek/prediction.tsv \
--seq_length 256 --labels_num 119
The Chinese Scientific Literature (CSL) task is to tell whether the given keywords are the real keywords of a paper. The key to achieving competitive results on CSL is to use a special symbol to separate the keywords. We find that the pseudo keywords in the CSL dataset are usually short, and the special symbols explicitly tell the model the length of each keyword. An example of fine-tuning and inference on the CSL dataset:
python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--train_path datasets/csl/train.tsv \
--dev_path datasets/csl/dev.tsv \
--output_model_path models/csl_classifier_model.bin \
--epochs_num 3 --batch_size 32 --seq_length 384
python3 inference/run_classifier_infer.py --load_model_path models/csl_classifier_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--test_path datasets/csl/test_nolabel.tsv \
--prediction_path datasets/csl/prediction.tsv \
--seq_length 384 --labels_num 2
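The keyword-separator idea for CSL can be sketched as a small preprocessing step. This is only an illustration under several assumptions: the raw CSL files are JSON lines with the fields abst, keyword (a list), and label; the classifier reads a TSV with the columns label, text_a, and text_b; and "_" stands in for the special symbol, which is not named in this section. The function names are hypothetical.

```python
import json

SEP = "_"  # placeholder separator; the actual symbol is an assumption


def csl_to_tsv_row(example, sep=SEP):
    """Join the keywords with an explicit separator and build one TSV row."""
    keywords = sep.join(example["keyword"])
    # Tabs and newlines inside the abstract would break the TSV layout.
    abstract = example["abst"].replace("\t", " ").replace("\n", " ")
    label = str(example.get("label", ""))
    return "\t".join([label, keywords, abstract])


def convert(json_path, tsv_path, sep=SEP):
    """Convert a JSON-lines file into a TSV file with a header row."""
    with open(json_path, encoding="utf-8") as src, \
            open(tsv_path, "w", encoding="utf-8") as dst:
        dst.write("label\ttext_a\ttext_b\n")
        for line in src:
            if line.strip():
                dst.write(csl_to_tsv_row(json.loads(line), sep) + "\n")
```

Whatever symbol is chosen, it should exist in the vocabulary file so that it is not mapped to an unknown token.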
An example of fine-tuning and inference on the CLUEWSC2020 dataset:
python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--train_path datasets/cluewsc2020/train.tsv \
--dev_path datasets/cluewsc2020/dev.tsv \
--output_model_path models/cluewsc2020_classifier_model.bin \
--learning_rate 5e-6 --epochs_num 20 --batch_size 8
python3 inference/run_classifier_infer.py --load_model_path models/cluewsc2020_classifier_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--test_path datasets/cluewsc2020/test_nolabel.tsv \
--prediction_path datasets/cluewsc2020/prediction.tsv \
--seq_length 128 --labels_num 2
A useful trick for CLUEWSC2020 is to use the training set of WSC (the former version of CLUEWSC2020) as training samples.
An example of fine-tuning and inference on the TNEWS dataset:
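One way to apply the WSC trick is to concatenate the two training sets into a single file and pass that file to --train_path. The sketch below assumes both files are TSVs with an identical header row and that the WSC data has already been converted to the same format; the path datasets/wsc/train.tsv and the function name merge_tsv are hypothetical.

```python
def merge_tsv(paths, out_path):
    """Concatenate TSV files that share a header; return the number of data rows."""
    header = None
    rows = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as src:
                first = src.readline().rstrip("\n")
                if header is None:
                    # Write the header once, taken from the first file.
                    header = first
                    out.write(header + "\n")
                elif first != header:
                    raise ValueError(f"header mismatch in {path}")
                for line in src:
                    line = line.rstrip("\n")
                    if line:
                        out.write(line + "\n")
                        rows += 1
    return rows


if __name__ == "__main__":
    # Hypothetical locations; adjust to where the converted files live.
    merge_tsv(
        ["datasets/cluewsc2020/train.tsv", "datasets/wsc/train.tsv"],
        "datasets/cluewsc2020/train_merged.tsv",
    )
```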
python3 finetune/run_classifier.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--train_path datasets/tnews/train.tsv \
--dev_path datasets/tnews/dev.tsv \
--output_model_path models/tnews_classifier_model.bin \
--epochs_num 3 --batch_size 32
python3 inference/run_classifier_infer.py --load_model_path models/tnews_classifier_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--test_path datasets/tnews/test_nolabel.tsv \
--prediction_path datasets/tnews/prediction.tsv \
--seq_length 128 --labels_num 15
For OCNLI, we first do multi-task learning, with XNLI and CMNLI as auxiliary tasks:
python3 finetune/run_classifier_mt.py --pretrained_model_path models/cluecorpussmall_roberta_wwm_large_seq512_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--dataset_path_list datasets/ocnli/ datasets/cmnli/ datasets/xnli/ \
--output_model_path models/ocnli_multitask_classifier_model.bin \
--epochs_num 1 --batch_size 64
Then we load ocnli_multitask_classifier_model.bin and fine-tune it on OCNLI:
python3 finetune/run_classifier.py --pretrained_model_path models/ocnli_multitask_classifier_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--train_path datasets/ocnli/train.tsv \
--dev_path datasets/ocnli/dev.tsv \
--output_model_path models/ocnli_classifier_model.bin \
--epochs_num 1 --batch_size 64
Then we do inference with ocnli_classifier_model.bin:
python3 inference/run_classifier_infer.py --load_model_path models/ocnli_classifier_model.bin \
--vocab_path models/google_zh_vocab.txt \
--config_path models/bert/large_config.json \
--test_path datasets/ocnli/test_nolabel.tsv \
--prediction_path datasets/ocnli/prediction.tsv \
--seq_length 128 --labels_num 3
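The AFQMC, CMNLI, and OCNLI recipes share the same three stages: multi-task learning, fine-tuning on the target task, and inference. A minimal sketch of building those commands from a task name follows. It uses only the scripts, flags, and paths that appear in the commands above; the function build_commands and its parameters are hypothetical, and the fine-tuning defaults follow the CMNLI and OCNLI settings (AFQMC uses 3 epochs and a batch size of 32).

```python
import shlex

PRETRAINED = "models/cluecorpussmall_roberta_wwm_large_seq512_model.bin"
COMMON = [
    "--vocab_path", "models/google_zh_vocab.txt",
    "--config_path", "models/bert/large_config.json",
]


def build_commands(task, aux_tasks, labels_num, epochs_num=1, batch_size=64):
    """Return [multi-task, fine-tune, inference] argument lists for one task."""
    mt_model = f"models/{task}_multitask_classifier_model.bin"
    final_model = f"models/{task}_classifier_model.bin"
    dataset_dirs = [f"datasets/{name}/" for name in [task, *aux_tasks]]

    multitask = [
        "python3", "finetune/run_classifier_mt.py",
        "--pretrained_model_path", PRETRAINED, *COMMON,
        "--dataset_path_list", *dataset_dirs,
        "--output_model_path", mt_model,
        "--epochs_num", "1", "--batch_size", "64",
    ]
    finetune = [
        "python3", "finetune/run_classifier.py",
        "--pretrained_model_path", mt_model, *COMMON,
        "--train_path", f"datasets/{task}/train.tsv",
        "--dev_path", f"datasets/{task}/dev.tsv",
        "--output_model_path", final_model,
        "--epochs_num", str(epochs_num), "--batch_size", str(batch_size),
    ]
    infer = [
        "python3", "inference/run_classifier_infer.py",
        "--load_model_path", final_model, *COMMON,
        "--test_path", f"datasets/{task}/test_nolabel.tsv",
        "--prediction_path", f"datasets/{task}/prediction.tsv",
        "--seq_length", "128", "--labels_num", str(labels_num),
    ]
    return [multitask, finetune, infer]


if __name__ == "__main__":
    # Print the three OCNLI commands without running them.
    for cmd in build_commands("ocnli", ["cmnli", "xnli"], labels_num=3):
        print(shlex.join(cmd))
```

Each argument list could be passed to subprocess.run(cmd, check=True) so that a failing stage stops the pipeline before the next one starts.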