# Train No Evil: Selective Masking for Task-Guided Pre-Training

Source code for "Train No Evil: Selective Masking for Task-Guided Pre-Training".
The datasets can be downloaded from this link and need to be put in `data/datasets`.

To run the full pipeline:

- Modify `config/test.json` to set the input path, output path, BERT model path, GPU usage, etc.
- Run `bash scripts/run_all_pipeline.sh`.
The meaning of each step can be found in the appendix of our paper. The input/output paths are also set in `config/test.json`. Run `python3 convert_config.py config/test.json` to convert the `.json` file to a `.sh` file.
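Purely as an illustration of the kind of settings the config holds (the actual key names in `config/test.json` may differ; all field names below are hypothetical):

```json
{
  "input_path": "data/datasets",
  "output_path": "results/",
  "bert_model_path": "models/bert-base-uncased",
  "gpu": "0,1"
}
```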
We use the training scripts from https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/LanguageModeling/BERT for general pre-training.
The pipeline consists of the following steps:

```bash
bash scripts/finetune_origin.sh
bash data/create_data_rule/run.sh
bash scripts/run_mask_model.sh
bash data/create_data_model/run.sh
bash scripts/run_pretraining.sh
bash scripts/finetune_ckpt_all_seed.sh
python3 gather_results.py $PATH_TO_THE_FINETUNE_OUTPUT
```
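The last step aggregates fine-tuning results over the different random seeds. As a rough sketch of what such an aggregator does (the actual `gather_results.py` in this repository may use a different directory layout and file format; the `eval_results.txt` name and the `acc = ...` line format below are assumptions):

```python
import os

def gather_results(output_dir):
    """Average the accuracy recorded in each seed's result file.

    Assumes output_dir contains one sub-directory per seed, each with an
    eval_results.txt file holding a line like "acc = 0.85" (hypothetical
    format for illustration).
    """
    accs = []
    for name in sorted(os.listdir(output_dir)):
        path = os.path.join(output_dir, name, "eval_results.txt")
        if not os.path.exists(path):
            continue
        with open(path) as f:
            for line in f:
                if line.startswith("acc"):
                    accs.append(float(line.split("=")[-1]))
    # Return the mean accuracy across seeds, or None if nothing was found.
    return sum(accs) / len(accs) if accs else None
```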
If you use the code, please cite our paper:

```
@inproceedings{gu2020train,
  title={Train No Evil: Selective Masking for Task-Guided Pre-Training},
  author={Yuxian Gu and Zhengyan Zhang and Xiaozhi Wang and Zhiyuan Liu and Maosong Sun},
  booktitle={Proceedings of EMNLP 2020},
  year={2020}
}
```