Skip to content

Code repo for ACL22 paper "DeepStruct: Pretraining of Language Models for Structure Prediction"

License

Notifications You must be signed in to change notification settings

wang-research-lab/deepstruct

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DeepStruct: Pretraining of Language Models for Structure Prediction

Source code repo for paper DeepStruct: Pretraining of Language Models for Structure Prediction, ACL 2022.

Setup Environment

DeepStruct is based on GLM dependency. Please use GLM's docker as follow to setup the basic GPU environment (zxdu20/glm-cuda112 for Ampere GPUs and zxdu20/glm-cuda102 for older version GPUs such as Tesla V100).

git clone --recursive git@github.com:cgraywang/deepstruct.git
cd ./deepstruct

docker run --net=host --privileged --pid=host --gpus all --rm -it --ipc=host -v ./deepstruct:/workspace/deepstruct zxdu20/glm-cuda112
cd /workspace/deepstruct

and install the dependency via setup.sh:

bash setup.sh

The final directory structure should be as follows:

workspace/
├─ deepstruct/
├─ data/
├─ ckpt/

Download Checkpoints

Most of our experiments are based on 10-billion-parameter DeepStruct checkpoint. Run the following shell scripts to download all multi-task trained DeepStruct checkpoints from huggingface hub (might take a while).

bash download_ckpt.sh

Data Preparation & Reproduce

To run following experiments on DeepStruct-10B, our experiments adopt batch_size_per_gpu=1 and require at least 32 GB GPU memory to run. The scripts default use --num-gpus-per-node=1 in src/tasks/mt/*.sh, and if you want to use multiple gpu for acceleration, please customize it in src/tasks/mt/*.sh.

Notice that CoNLL12, CoNLL05 for semantic role labeling, ACE2005 for event extraction require manual download from LDC (LDC2006T06, LDC2013T19, PTB-3).

Task Dataset Data preparation Multi-task Result
Joint entity and relation extraction CoNLL04 bash run_scripts/conll04.sh Ent. 88.4/Rel. 72.8
Joint entity and relation extraction ADE bash run_scripts/ade.sh Ent. 90.5/Rel. 83.6
Joint entity and relation extraction NYT bash run_scripts/nyt.sh Ent. 95.4/Rel. 93.7
Joint entity and relation extraction ACE2005 bash run_scripts/ace2005_jer.sh <abs_path_to_LDC2006T06> Ent. 90.2/Rel. 58.9
Semantic role labeling CoNLL05 WSJ bash run_scripts/conll05_srl_wsj.sh <abs_path_to_PTB_3> 95.5
Semantic role labeling CoNLL05 Brown bash run_scripts/conll05_srl_brown.sh <abs_path_to_PTB_3> 92.0
Semantic role labeling CoNLL12 bash run_scripts/conll12_srl.sh <abs_path_to_LDC2013T19> 97.2
Event extraction ACE2005 bash run_scripts/ace2005event.sh <abs_path_to_LDC2006T06> Trigger: Id-72.7/Cl-69.2 Argument: Id-67.5/Cl-63.9
Intent detection ATIS bash run_scripts/atis.sh 97.3
Intent detection SNIPS bash run_scripts/snips.sh 97.4
Dialogue state tracking MultiWOZ 2.1 bash run_scripts/multi_woz.sh 53.5

Arguments in running scripts

The arguments in src/tasks/mt/*.sh configure the training and inference of DeepStruct. Here are their meanings:

  • --model-type: the type of model backbone to use. Currently we only support model_blocklm_10B, which means using the 10-billion DeepStruct model as the backbone.
  • --model-checkpoint: the path to the directory of DeepStruct checkpoint.
  • --task: the task being trained or inferenced.
  • --task-epochs: number of epochs to run. If set to 0, it means evaluation only.
  • --length-penalty: a hyperparameter to configure the lengths of generated sequences in the beam search.

Scripts for Pretraining

Following the commands below to prepare pretraining data and run training.

# prepare pretraining data
bash data_scripts/PRETRAIN.sh

# run pretraining
cd ./glm/
bash scripts/ds_finetune_seq2seq_pretrain.sh config_tasks/<MODEL_TYPE>.sh config_tasks/pretrain.sh cnn_dm_original

Currently <MODEL_TYPE> supports model_blocklm_10B_pretrain, which refers to the 10 billion pretrained model as backbone.

Please customize NUM_GPUS_PER_WORKER in glm/scripts/ds_finetune_seq2seq_pretrain.sh and train_micro_batch_size_per_gpu in glm/config_tasks/config.json according to your environment, as fine-tuning a 10B language model requires quite sufficient GPU memory. The data preprocessing for pretraining may require over 600G main memory, as the current dataloader implementation preloads all tokenized data into main memory in pretraining.

Citation

@inproceedings{wang-etal-2022-deepstruct,
    title = "{D}eep{S}truct: Pretraining of Language Models for Structure Prediction",
    author = "Wang, Chenguang  and
      Liu, Xiao  and
      Chen, Zui  and
      Hong, Haoyun  and
      Tang, Jie  and
      Song, Dawn",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
    year = "2022",
}

About

Code repo for ACL22 paper "DeepStruct: Pretraining of Language Models for Structure Prediction"

Resources

License

Stars

Watchers

Forks