This is an implementation of training scripts in PyTorch*. The scripts allow you to fine-tune, distill, and quantize a pretrained Transformers BERT-Large model for two question answering (QA) tasks:
- Find the answer's start and stop positions for a given question in a given context.
- Calculate embedding vectors for questions and contexts to quickly find the context that contains the answer to a given question.
For details about the original model, check out BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding and HuggingFace's Transformers: State-of-the-art Natural Language Processing.
- Requirements
- Preparation
- Train QA model
- Train Embedding model
- Convert a Model to OpenVINO™ format for Demo
To run the scripts, Python* 3.6 has to be installed on the machine. To install the required packages, run the following:
pip install -r requirements.txt
To fine-tune BERT for the QA task, the SQuAD 1.1 dataset is used. Please download and unpack SQuAD. The dataset should have 3 files:
- train-v1.1.json, which contains the training data
- dev-v1.1.json, which contains the validation data
- evaluate-v1.1.py, which contains the code to evaluate results

The folder that contains all these files is referred to as ${SQUAD} below.
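If you prefer to fetch the dataset from a script, a minimal sketch is shown below. The download URLs are the commonly used SQuAD 1.1 locations and are an assumption of this example; evaluate-v1.1.py is distributed separately by the SQuAD authors and should be placed into the same folder.
# Minimal sketch: download the SQuAD 1.1 train and dev files into a local folder.
# The URLs are the commonly used SQuAD 1.1 locations (assumption of this example).
import os
import urllib.request

squad_dir = "squad"  # use this folder as ${SQUAD} in the commands below
base_url = "https://rajpurkar.github.io/SQuAD-explorer/dataset/"

os.makedirs(squad_dir, exist_ok=True)
for name in ("train-v1.1.json", "dev-v1.1.json"):
    urllib.request.urlretrieve(base_url + name, os.path.join(squad_dir, name))
# evaluate-v1.1.py is not hosted at the same location; obtain it from the SQuAD site.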
After you have prepared the data, you can train the model. The training process consists of 3 steps:
- Fine-tune the pretrained BERT-Large model for the QA task with the train_qa.py script.
- Distill the fine-tuned model into a much smaller model with the pack_and_distill.py script.
- Quantize the small model to INT8 with the train_qa.py script.
In this step, the Transformers BertForQuestionAnswering model is initialized with the pretrained bert-large-uncased-whole-word-masking model and fine-tuned on the SQuAD training dataset into a model that finds the answer start and stop positions for a given question and a context containing the answer.
To do this, you may use the following command line:
python3 train_qa.py \
--freeze_list=embedding \
--supervision_weight=0 \
--model_student=bert-large-uncased-whole-word-masking \
--output_dir=models/bert-large-uncased-wwm-squad-qa-fp32 \
--squad_train_data=${SQUAD}/train-v1.1.json \
--squad_dev_data=${SQUAD}/dev-v1.1.json \
--squad_eval_script=${SQUAD}/evaluate-v1.1.py \
--learning_rate=3e-5 \
--num_train_epochs=2 \
--max_seq_length_q=64 \
--max_seq_length_c=384 \
--per_gpu_eval_batch_size=16 \
--per_gpu_train_batch_size=2 \
--total_train_batch_size=48
As a result, the fine-tuned model will be located in the newly created folder 'models/bert-large-uncased-wwm-squad-qa-fp32'.
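To sanity-check the fine-tuned checkpoint, it can be loaded with the regular Transformers API. This is only an illustration, not part of the training scripts; it assumes the output folder contains a standard Transformers checkpoint with tokenizer files, and the exact output layout depends on your Transformers version.
# Illustration only: load the fine-tuned QA checkpoint and pick answer start/stop positions.
import torch
from transformers import BertForQuestionAnswering, BertTokenizer

model_dir = "models/bert-large-uncased-wwm-squad-qa-fp32"  # output of train_qa.py
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertForQuestionAnswering.from_pretrained(model_dir).eval()

question = "What does BERT stand for?"
context = "BERT stands for Bidirectional Encoder Representations from Transformers."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, return_dict=True)
start = int(outputs.start_logits.argmax())  # answer start token index
stop = int(outputs.end_logits.argmax())     # answer stop token index
print(tokenizer.decode(inputs["input_ids"][0][start:stop + 1]))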
In this step, the model from the previous step can be packed by reducing the number of layers, the hidden size, and the number of self-attention heads, and by replacing some blocks with more efficient ones. To do this, you may use the following command line:
python3 pack_and_distill.py \
--model_student=models/bert-large-uncased-wwm-squad-qa-fp32 \
--model_teacher=models/bert-large-uncased-wwm-squad-qa-fp32 \
--output_dir=models/bert-small-uncased-wwm-squad-qa-fp32 \
--squad_train_data=${SQUAD}/train-v1.1.json \
--squad_dev_data=${SQUAD}/dev-v1.1.json \
--squad_eval_script=${SQUAD}/evaluate-v1.1.py \
--pack_cfg=num_hidden_layers:12,ff_iter_num:4,num_attention_heads:8,hidden_size:512,pack_emb:1,hidden_act:orig \
--loss_weight_alpha=1.5 \
--learning_rate=5e-4 \
--learning_rate_for_tune=10e-4 \
--num_train_epochs=16 \
--max_seq_length_q=64 \
--max_seq_length_c=384 \
--per_gpu_eval_batch_size=32 \
--per_gpu_train_batch_size=4 \
--total_train_batch_size_for_tune=64 \
--total_train_batch_size=32
As a result, the packed small model will be located in the newly created folder 'models/bert-small-uncased-wwm-squad-qa-fp32'.
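The pack_cfg value above requests a student with 12 hidden layers, hidden size 512, and 8 self-attention heads (ff_iter_num, pack_emb, and hidden_act are additional pack_cfg options handled by the script). Purely for reference, this geometry roughly corresponds to a Transformers BertConfig like the sketch below; pack_and_distill.py builds the student itself, so this is not something you need to run.
# Reference sketch of the student geometry requested by pack_cfg (not used by the scripts).
from transformers import BertConfig

student_config = BertConfig(
    num_hidden_layers=12,    # num_hidden_layers:12
    hidden_size=512,         # hidden_size:512
    num_attention_heads=8,   # num_attention_heads:8
    intermediate_size=2048,  # assumption: the usual 4 * hidden_size; the script may choose differently
)
print(student_config)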
In this step, the model from the previous step can be quantized to INT8 using the NNCF tool. To do this, you may use the following command line:
python train_qa.py \
--freeze_list=none \
--supervision_weight=0.02 \
--kd_weight=1 \
--model_student=models/bert-small-uncased-wwm-squad-qa-fp32 \
--model_teacher=models/bert-large-uncased-wwm-squad-qa-fp32 \
--output_dir=models/bert-small-uncased-wwm-squad-int8 \
--squad_train_data=${SQUAD}/train-v1.1.json \
--squad_dev_data=${SQUAD}/dev-v1.1.json \
--squad_eval_script=${SQUAD}/evaluate-v1.1.py \
--learning_rate=1e-4 \
--num_train_epochs=16 \
--max_seq_length_q=64 \
--max_seq_length_c=384 \
--nncf_config=nncf_config.json \
--per_gpu_eval_batch_size=16 \
--per_gpu_train_batch_size=8 \
--total_train_batch_size=32
As a result, the packed small INT8 model will be located in the newly created folder 'models/bert-small-uncased-wwm-squad-int8'.
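The --nncf_config option points to the NNCF configuration shipped with this repository (nncf_config.json). Purely as an illustration of the kind of content such a file holds, a minimal quantization config written from Python could look like the sketch below; the input shapes and types are assumptions, so use the provided nncf_config.json rather than this example.
# Illustration only: a minimal NNCF-style quantization config (the repository provides its own nncf_config.json).
import json

nncf_config = {
    # Description of a sample model input; the shape and type here are assumptions.
    "input_info": [{"sample_size": [1, 384], "type": "long"}],
    # INT8 quantization is the compression algorithm applied in this step.
    "compression": {"algorithm": "quantization"},
}
with open("nncf_config_example.json", "w") as f:
    json.dump(nncf_config, f, indent=4)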
This model calculates embedding vectors for questions and contexts. The L2 distance from a question embedding to several context embeddings can be measured to find the best candidate context containing the answer.
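Once question and context embeddings have been computed, retrieval reduces to picking the context with the smallest distance. Below is a minimal sketch with made-up vectors; the 1024-dimensional size matches BERT-Large hidden states and is only an assumption of this example.
# Minimal sketch: pick the context whose embedding is closest (L2) to the question embedding.
import numpy as np

question_emb = np.random.rand(1024).astype(np.float32)           # embedding of the question
context_embs = np.random.rand(10, 1024).astype(np.float32)       # embeddings of 10 candidate contexts

distances = np.linalg.norm(context_embs - question_emb, axis=1)  # L2 distance to every context
best = int(np.argmin(distances))
print("best context:", best, "distance:", distances[best])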
After you have prepared the data, you can train the model. The training process consists of 3 steps:
- Fine-tune the pretrained BERT-Large model for the embedding task with the train_qcemb.py script.
- Distill the fine-tuned model into a much smaller model with the pack_and_distill.py script.
- Quantize the small model to INT8 with the train_qcemb.py script.
In this step, the BertModelEMB model is initialized with the pretrained bert-large-uncased-whole-word-masking model and fine-tuned on the SQuAD training dataset into a model that produces embeddings for a question or a context. The L2 distance from a question embedding to several context embeddings can be measured to find the best candidate context containing the answer.
To tune the embedding model, you may use the following command line:
python train_qcemb.py \
--freeze_list=embedding \
--supervision_weight=0.02 \
--model_teacher=bert-large-uncased-whole-word-masking \
--model_student=bert-large-uncased-whole-word-masking \
--output_dir=models/bert-large-uncased-wwm-squad-emb-fp32 \
--hnm_batch_size=8 \
--hnm_hist_num=32 \
--hnm_num=256 \
--loss_cfg=triplet_num:1,emb_loss:none \
--squad_train_data=${SQUAD}/train-v1.1.json \
--squad_dev_data=${SQUAD}/dev-v1.1.json \
--learning_rate=3e-5 \
--num_train_epochs=4 \
--max_seq_length_q=32 \
--max_seq_length_c=384 \
--per_gpu_eval_batch_size=16 \
--per_gpu_train_batch_size=2 \
--total_train_batch_size=32
As a result, the fine-tuned embedding model will be located in the newly created folder 'models/bert-large-uncased-wwm-squad-emb-fp32'.
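The exact way the vector is produced is defined by BertModelEMB in this repository. Purely as a rough illustration of the general idea, a question vector could be obtained from a plain Transformers BertModel by pooling the token states; the mean pooling below is an assumption of this example and may differ from what the trained model does.
# Rough illustration only: produce a vector for a question with a plain Transformers BertModel.
# BertModelEMB in this repository defines the actual computation; mean pooling here is an assumption.
import torch
from transformers import BertModel, BertTokenizer

model_dir = "models/bert-large-uncased-wwm-squad-emb-fp32"
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir).eval()

inputs = tokenizer("Who wrote Hamlet?", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs, return_dict=True).last_hidden_state  # [1, seq_len, hidden_size]
embedding = hidden.mean(dim=1).squeeze(0)                         # mean pooling over tokens (assumption)
print(embedding.shape)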
In this step, the model from the previous step can be packed by reducing the number of layers, the hidden size, and the number of self-attention heads, and by replacing some blocks with more efficient ones. To do this, you may use the following command line:
python pack_and_distill.py \
--model_student=models/bert-large-uncased-wwm-squad-emb-fp32 \
--model_teacher=models/bert-large-uncased-wwm-squad-emb-fp32 \
--output_dir=models/bert-small-uncased-wwm-squad-emb-fp32 \
--squad_train_data=${SQUAD}/train-v1.1.json \
--squad_dev_data=${SQUAD}/dev-v1.1.json \
--squad_eval_script=${SQUAD}/evaluate-v1.1.py \
--pack_cfg=num_hidden_layers:12,ff_iter_num:4,num_attention_heads:8,hidden_size:512,pack_emb:1,hidden_act:orig \
--loss_weight_alpha=1.5 \
--learning_rate=3e-4 \
--learning_rate_for_tune=3e-4 \
--num_train_epochs=16 \
--max_seq_length_q=32 \
--max_seq_length_c=384 \
--per_gpu_eval_batch_size=32 \
--per_gpu_train_batch_size=4 \
--total_train_batch_size_for_tune=32 \
--total_train_batch_size=32
As a result, the packed small embedding model will be located in the newly created folder 'models/bert-small-uncased-wwm-squad-emb-fp32'.
In this step, the embedding model from the previous step can be quantized to INT8 using the NNCF tool. To do this, you may use the following command line:
python train_qcemb.py \
--nncf_config=nncf_config.json \
--freeze_list=none \
--supervision_weight=0.02 \
--model_teacher=models/bert-large-uncased-wwm-squad-emb-fp32 \
--model_student=models/bert-small-uncased-wwm-squad-emb-fp32 \
--output_dir=models/bert-small-uncased-wwm-squad-emb-int8 \
--hnm_batch_size=8 \
--hnm_hist_num=32 \
--hnm_num=256 \
--loss_cfg=triplet_num:1,emb_loss:L2 \
--squad_train_data=${SQUAD}/train-v1.1.json \
--squad_dev_data=${SQUAD}/dev-v1.1.json \
--learning_rate=3e-5 \
--num_train_epochs=16 \
--max_seq_length_q=32 \
--max_seq_length_c=384 \
--per_gpu_eval_batch_size=16 \
--per_gpu_train_batch_size=8 \
--total_train_batch_size=32
As a result, the packed small INT8 embedding model will be located in the newly created folder 'models/bert-small-uncased-wwm-squad-emb-int8'.
Each script also saves an ONNX* model, together with the PyTorch* model, into the same output folder. This ONNX* model can be converted to the OpenVINO™ format:
mo.py --input_model <path_to_output_onnx>
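If you want to check the exported ONNX* file (for example, to see its input and output names), onnxruntime can be used. This is optional, not part of the scripts, and assumes onnxruntime is installed.
# Optional: inspect the exported ONNX* model to see its input and output names and shapes.
import onnxruntime as ort  # assumption: onnxruntime is installed separately

session = ort.InferenceSession("<path_to_output_onnx>")  # same path as passed to mo.py
print("inputs: ", [(i.name, i.shape) for i in session.get_inputs()])
print("outputs:", [(o.name, o.shape) for o in session.get_outputs()])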
After converting them to the OpenVINO™ format, you can try the models using the demo for QA models or the demo for Embedding models.