CentraBert

We present an efficient BERT-based multi-task (MT) framework that is particularly suitable for iterative and incremental development of tasks. Unlike conventional multi-task learning, where the tasks are coupled through joint training, in our framework the tasks are independent of each other and can be updated on a per-task basis. One key advantage of our framework is that the owner of a task does not need to coordinate with other task owners in order to update the model for that task, and any modification made to that task does not interfere with the rest of the tasks.

Installation

We recommend installing all relevant packages in a virtual environment:

conda create -n centra-bert --file requirements.txt python=3.6
conda activate centra-bert

Getting Started

The proposed framework is based on the idea of partial fine-tuning, i.e. fine-tuning only some of the top layers of BERT while keeping the other layers frozen. It features a pipeline that consists of three steps:

  1. Single-task partial fine-tuning
  2. Single-task knowledge-distillation
  3. Model merging

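To make the layer split concrete, here is a small, purely illustrative Python sketch (not part of the library) of how a 12-layer BERT-base encoder is partitioned under this scheme. The values L = 9 and l = 2 are just an example; they correspond to the RTE configuration used later in this README.

# Illustrative sketch only: partition a 12-layer BERT-base encoder into a frozen
# backbone and a per-task branch, as described above.
NUM_LAYERS = 12   # BERT-base
L = 9             # top layers fine-tuned per task (teacher)
l = 2             # layers the task branch is distilled into (student)

frozen_backbone = list(range(1, NUM_LAYERS - L + 1))               # layers 1..3, shared by all tasks
teacher_branch = list(range(NUM_LAYERS - L + 1, NUM_LAYERS + 1))   # layers 4..12, task-specific
print("frozen backbone layers:", frozen_backbone)
print("teacher branch layers (L=%d):" % L, teacher_branch)
print("student branch depth after distillation: l=%d" % l)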

In what follows, we demonstrate the functionality that this library offers, using two GLUE tasks, MRPC and RTE, as examples.

Step 0. Preparation

Before proceeding, we need to download the task corpora and convert them to the expected format. This can be done with the following commands:

python convert_data_format.py --task=rte --input_dir=data/glue --output_dir=data/glue
python convert_data_format.py --task=mrpc --input_dir=data/glue --output_dir=data/glue

Next, we need to create a task config file conf/glue_task_config.cfg that specifies the meta information for each task, including the task name, task type, corpus paths, etc.

[rte_conf]
task_name = rte
task_type = classification
input_file = data/glue/rte/train_json_format.txt,data/glue/rte/dev_json_format.txt,data/glue/rte/test_json_format.txt
max_seq_length = 128
output_method = cls
is_eng = True

[mrpc_conf]
task_name = mrpc
task_type = classification
input_file = data/glue/mrpc/train_json_format.txt,data/glue/mrpc/dev_json_format.txt,data/glue/mrpc/test_json_format.txt
max_seq_length = 128
output_method = cls
is_eng = True
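The file uses standard INI syntax. As an optional sanity check (this snippet is just a convenience, not part of the library's own config loading), you can read it back with Python's configparser:

import configparser

# Optional sanity check: print every task section and its options.
cfg = configparser.ConfigParser()
cfg.read("conf/glue_task_config.cfg")
for section in cfg.sections():
    print(section, dict(cfg[section]))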

Step 1. Single-Task Partial Fine-Tuning

In the first step, we partially fine-tune an independent copy of BERT for each task. The exact number of layers L to fine-tune may vary across tasks. We propose to experiment with different values of L for each task and to select the best one according to some predefined criterion.

The following code snippet (see also shell/fine_tuning.sh) trains a number of models for the task of RTE with different hyper-parameters.

#!/usr/bin/env bash

# General param
bert_config_file=conf/uncased_bert_base/bert_config.json
vocab_file=conf/uncased_bert_base/vocab.txt
output_dir=model/glue/teacher
init_checkpoint=model/uncased_bert_base/bert_model.ckpt
task_config=conf/glue_task_config.cfg
num_train_epoch=10
train_batch_size=64
gpu_id=2

# Current task
task=rte

# Hyper param, separated by commas
learning_rate=2e-5,5e-5
fine_tuning_layers=4,5,6,7,8,9,10

# Number of repetitions for each hyper parameter
exam_num=3

for lr in ${learning_rate//,/ }
do
    for layers in ${fine_tuning_layers//,/ }
    do
        for i in $(seq 1 ${exam_num})
        do
            python fine_tuning.py \
                --bert_config_file=${bert_config_file} \
                --vocab_file=${vocab_file} \
                --output_dir=${output_dir} \
                --init_checkpoint=${init_checkpoint} \
                --task_config=${task_config} \
                --available_tasks=${task} \
                --current_task=${task} \
                --ex_idx=${i} \
                --num_train_epoch=${num_train_epoch} \
                --train_batch_size=${train_batch_size} \
                --learning_rate=${lr} \
                --fine_tuning_layers=${layers} \
                --gpu_id=${gpu_id}
        done
    done
done

# Result summary
python result_summary.py \
    --output_dir=${output_dir} \
    --task=${task} \
    --learning_rate=${learning_rate} \
    --fine_tuning_layers=${fine_tuning_layers} \
    --exam_num=${exam_num} \
    --dev=True \
    --version=teacher

When training is complete, the log file model/glue/teacher/rte/summary.txt contains information on the model with the best dev result:

Best metrics: 89.77, best checkpoint: model/glue/teacher/mrpc/Lr-2e-05-Layers-8/ex-3/best_checkpoint/1623838640/model.ckpt-570

This is the teacher model that will be compressed in the next step.

Step 2. Single-Task Knowledge Distillation

In this step, we compress the L fine-tuned layers of the teacher model into a smaller l-layer module. The following snippet (see also shell/distill.sh) trains three student models for each l in {1, 2, 3}. The training process is basically the same as in the previous step; the only difference is that we need to specify the path to the teacher model that is going to be distilled.

#!/usr/bin/env bash

# General param
bert_config_file=conf/uncased_bert_base/bert_config.json
vocab_file=conf/uncased_bert_base/vocab.txt
output_dir=model/glue/student
task_config=conf/glue_task_config.cfg
num_train_epoch=10
train_batch_size=64
gpu_id=6

# Teacher info
teacher_fine_tuning_layers=9
best_teacher_checkpoint=model/glue/teacher/rte/Lr-2e-05-Layers-9/ex-3/best_checkpoint/1623902794/model.ckpt-380

# Current task
task=rte

# Hyper param, separated by commas
learning_rate=2e-5
fine_tuning_layers=1,2,3

# Number of repetitions for each hyper parameter
exam_num=3

for lr in ${learning_rate//,/ }
do
    for layers in ${fine_tuning_layers//,/ }
    do
        for i in $(seq 1 ${exam_num})
        do
            python distill.py \
                --bert_config_file=${bert_config_file} \
                --vocab_file=${vocab_file} \
                --output_dir=${output_dir} \
                --best_teacher_checkpoint=${best_teacher_checkpoint} \
                --teacher_fine_tuning_layers=${teacher_fine_tuning_layers} \
                --task_config=${task_config} \
                --available_tasks=${task} \
                --current_task=${task} \
                --ex_idx=${i} \
                --num_train_epoch=${num_train_epoch} \
                --train_batch_size=${train_batch_size} \
                --learning_rate=${lr} \
                --student_fine_tuning_layers=${layers} \
                --gpu_id=${gpu_id}
        done
    done
done

# Result summary
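# keep_layers = 12 - teacher_fine_tuning_layers, i.e. the number of frozen BERT-base layers kept as the shared backbone underneath the distilled branch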
python result_summary.py \
    --output_dir=${output_dir} \
    --task=${task} \
    --learning_rate=${learning_rate} \
    --fine_tuning_layers=${fine_tuning_layers} \
    --exam_num=${exam_num} \
    --dev=True \
    --keep_layers=$((12-teacher_fine_tuning_layers)) \
    --version=student

Step 3. Model Merging

In the final step, we merge the single-task models into one multi-task model. To do this, we need to specify in the config file conf/branch.cfg which checkpoint to load and which layers are fine-tuned for each of the tasks:

[ckpt_conf]
mrpc = model/glue/student/mrpc/Lr-2e-05-Layers-4-2/ex-2/best_checkpoint/1623900741/model.ckpt-572
rte = model/glue/student/rte/Lr-2e-05-Layers-3-2/ex-3/best_checkpoint/1623910751/model.ckpt-382

[layer_conf]
mrpc = 5,6
rte = 4,5

Then we run the script shell/merge.sh to merge all task branches:

#!/usr/bin/env bash

# General param
bert_config_file=conf/uncased_bert_base/bert_config.json
vocab_file=conf/uncased_bert_base/vocab.txt
output_dir=model/glue/merge
init_checkpoint=model/uncased_bert_base/bert_model.ckpt
task_config=conf/glue_task_config.cfg
branch_config=conf/branch.cfg
gather_from_student=True
gpu_id=3
input_file=data/glue/tmp_input_file.txt

# Current tasks
available_tasks=mrpc,rte

python merge_branch.py \
    --bert_config_file=${bert_config_file} \
    --vocab_file=${vocab_file} \
    --output_dir=${output_dir} \
    --init_checkpoint=${init_checkpoint} \
    --task_config=${task_config} \
    --branch_config=${branch_config} \
    --available_tasks=${available_tasks} \
    --gather_from_student=${gather_from_student} \
    --gpu_id=${gpu_id} \
    --input_file=${input_file}

Basically, this script iteratively adds task branches to a frozen backbone model and saves checkpoint files at each intermediate step. The checkpoint in the last task's directory (in our example, model/glue/merge/rte) contains the final merged multi-task model.
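As an optional sanity check, you can list the variables stored in the merged checkpoint and confirm that both task scopes are present. The snippet below is only an illustration: the checkpoint prefix is hypothetical (substitute the one actually written by merge_branch.py), and it assumes the task branches live under variable scopes named after the tasks, as suggested by the update_scope argument.

import tensorflow as tf

# Hypothetical checkpoint prefix; substitute the one written by merge_branch.py.
merged_ckpt = "model/glue/merge/rte/model.ckpt"

# Print every variable that belongs to one of the task scopes.
for name, shape in tf.train.list_variables(merged_ckpt):
    if name.startswith(("mrpc", "rte")):
        print(name, shape)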

Follow-Up: Update an Existing Merged Multi-Task Model

Remove a task branch

Assume that we have a merged multi-task model containing three task branches: mrpc, rte and mnli. To remove a branch, e.g. rte, we run the following script:

#!/usr/bin/env bash

# General param
bert_config_file=conf/uncased_bert_base/bert_config.json
vocab_file=conf/uncased_bert_base/vocab.txt
output_dir=model/glue/merge
init_checkpoint=somewhere/merged_model/model.ckpt
task_config=conf/glue_task_config.cfg
branch_config=conf/branch.cfg
gather_from_student=True
gpu_id=3
input_file=data/glue/tmp_input_file.txt

# after removing rte, only mrpc and mnli remain
available_tasks=mrpc,mnli

python update.py \
    --bert_config_file=${bert_config_file} \
    --vocab_file=${vocab_file} \
    --output_dir=${output_dir} \
    --init_checkpoint=${init_checkpoint} \
    --task_config=${task_config} \
    --branch_config=${branch_config} \
    --available_tasks=${available_tasks} \
    --gather_from_student=${gather_from_student} \
    --gpu_id=${gpu_id} \
    --input_file=${input_file}

Note that in the script above we did not explicitly specify which task to remove; instead, the available_tasks argument specifies which tasks to keep.

Add a new task branch

Assume that we have a merged multi-task model containing three task branches: mrpc, rte and mnli. The following snippet adds a new task (qnli) to an existing merged multi-task model:

#!/usr/bin/env bash

# General param
bert_config_file=conf/uncased_bert_base/bert_config.json
vocab_file=conf/uncased_bert_base/vocab.txt
output_dir=model/glue/merge
init_checkpoint=somewhere/merged_model/model.ckpt
task_config=conf/glue_task_config.cfg
branch_config=conf/branch.cfg # A new branch config which contains qnli's fine-tuning layers
gather_from_student=True
gpu_id=3
input_file=data/glue/tmp_input_file.txt

# Current tasks, including the new task qnli
available_tasks=mrpc,rte,mnli,qnli

# new task info
update_checkpoint=somewhere/qnli/model.ckpt
update_scope=qnli


python update.py \
    --bert_config_file=${bert_config_file} \
    --vocab_file=${vocab_file} \
    --output_dir=${output_dir} \
    --init_checkpoint=${init_checkpoint} \
    --update_checkpoint=${update_checkpoint} \
    --update_scope=${update_scope} \
    --task_config=${task_config} \
    --branch_config=${branch_config} \
    --available_tasks=${available_tasks} \
    --gather_from_student=${gather_from_student} \
    --gpu_id=${gpu_id} \
    --input_file=${input_file}

Update an existing task branch

The following snippet updates the task branch for task rte:

#!/usr/bin/env bash

# General param
bert_config_file=conf/uncased_bert_base/bert_config.json
vocab_file=conf/uncased_bert_base/vocab.txt
output_dir=model/glue/merge
init_checkpoint=somewhere/merged_model/model.ckpt
task_config=conf/glue_task_config.cfg
branch_config=conf/branch.cfg # A new branch config if you change rte's fine-tuning layers
gather_from_student=True
gpu_id=3
input_file=data/glue/tmp_input_file.txt

# Current tasks
available_tasks=mrpc,rte

# updated task info
update_checkpoint=somewhere/new-rte/model.ckpt
update_scope=rte


python update.py \
    --bert_config_file=${bert_config_file} \
    --vocab_file=${vocab_file} \
    --output_dir=${output_dir} \
    --init_checkpoint=${init_checkpoint} \
    --update_checkpoint=${update_checkpoint} \
    --update_scope=${update_scope} \
    --task_config=${task_config} \
    --branch_config=${branch_config} \
    --available_tasks=${available_tasks} \
    --gather_from_student=${gather_from_student} \
    --gpu_id=${gpu_id} \
    --input_file=${input_file}

How does it work?

The merged model is generated as follows (a toy illustration follows the list):

  1. Build a graph that contains a frozen part and several task-specific fine-tuned parts, according to layer_conf in branch.cfg.
  2. Load parameters from init_checkpoint to initialize the frozen 'backbone' model.
  3. Load parameters under the scope update_scope from update_checkpoint to initialize the new task branch, or to reinitialize an existing one.
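
The following toy Python sketch (plain dicts stand in for TensorFlow checkpoints, and all variable names are made up) illustrates why the branches remain independent: the frozen backbone is copied once, and each task contributes only the variables under its own scope.

# Toy illustration only -- dicts stand in for checkpoints, variable names are made up.
frozen_backbone = {"bert/layer_0/w": 0.1, "bert/layer_5/w": 0.2}   # from init_checkpoint
task_checkpoints = {
    "mrpc": {"mrpc/layer_10/w": 0.4, "mrpc/output/w": 0.3},
    "rte":  {"rte/layer_10/w": 0.7, "rte/output/w": 0.9},
}

def merge(backbone, branches):
    """Start from the frozen backbone, then add each task's variables under its own scope."""
    merged = dict(backbone)
    for scope, ckpt in branches.items():
        merged.update({name: value for name, value in ckpt.items() if name.startswith(scope + "/")})
    return merged

merged_model = merge(frozen_backbone, task_checkpoints)
print(sorted(merged_model))

Because each task writes only to its own scope, adding, removing, or updating one branch never touches the backbone or the other branches.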

Result Analysis

One can use the following script to plot the task performance under different hyper-parameter settings:

output_dir=model/glue/teacher
task=mrpc
learning_rate=2e-5,1e-4
fine_tuning_layers=6,8
exam_num=3

# key_param=fine_tuning_layers
key_param=learning_rate

python result_summary.py \
    --job=plot \
    --key_param=${key_param} \
    --output_dir=${output_dir} \
    --task=${task} \
    --learning_rate=${learning_rate} \
    --fine_tuning_layers=${fine_tuning_layers} \
    --exam_num=${exam_num} \
    --dev=True \
    --version=teacher

