Release Note:
- This is the initial release.
- Thanks to @guoday for providing the code for the grammars and the logical form search method.
- Some programs have not been tested since they were moved to this repo and anonymized, so if any bug occurs while running them, do not hesitate to contact me via
This code is based on TensorFlow (1.10 or 1.11) and implemented in Python 3.6. Please install the following Python packages:
timeout_decorator, fuzzywuzzy, tqdm, flask, nltk, ftfy
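As a quick sanity check before running anything, the package list above can be verified from Python. This is a minimal sketch, not part of the repository: it assumes each package's import name matches the name listed above, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Names copied from the package list above; the import names are assumed
# to be identical to the listed names.
REQUIRED = ["timeout_decorator", "fuzzywuzzy", "tqdm", "flask", "nltk", "ftfy"]


def missing_packages(names):
    """Return the top-level names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # TensorFlow is added here because the code base depends on it as well.
    missing = missing_packages(REQUIRED + ["tensorflow"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only confirms that a package can be found; it does not verify versions such as TensorFlow 1.10 or 1.11.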
To reproduce our experiments, you need to download the dataset from the website.
Download and unzip dialog files ( ) to the data folder
unzip data/ -d data/
mv data/CSQA_v9 data/CSQA
Download Wikidata and move all Wikidata JSON files to "data/kb"
mkdir data/kb
Download the BERT base model from the official GitHub repo to any location
Simply run the following code to preprocess the data and build the KB
Some key parameters are:
num_parallel: the number of processes used to perform the BFS; each consumes about 55 GB of RAM
max_train: the number of dialogs to search
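Because each BFS process needs roughly 55 GB of RAM, it helps to size `num_parallel` against the memory of the machine before launching. The following is a small arithmetic sketch; the function names are hypothetical and the 55 GB figure is the approximate value quoted above.

```python
def bfs_memory_gb(num_parallel, per_process_gb=55):
    """Approximate total RAM (in GB) needed by num_parallel BFS processes."""
    return num_parallel * per_process_gb


def max_parallel_for_ram(available_gb, per_process_gb=55):
    """Largest num_parallel that fits into the available RAM."""
    return max(0, int(available_gb // per_process_gb))


print(bfs_memory_gb(10))          # 550, i.e. about 550 GB for -num_parallel 10
print(max_parallel_for_ram(256))  # 4
```

With the `-num_parallel 10` setting used in the commands of this section, the search would need on the order of 550 GB of RAM, so a smaller value is likely required on most machines.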
90000 dialogs will be searched
python -mode offline -num_parallel 10 -beam_size 1000 -start_index 0 -max_train 90000 -data_mode dir -data_path data/BFS/train -shuffle 1 -out_dir_suffix wo_con -mask_mode direct -all_lf 0
6000 dialogs will be searched
python -mode offline -num_parallel 10 -beam_size 1000 -start_index 0 -max_train 6000 -data_mode dir -data_path data/BFS/dev -shuffle 1 -out_dir_suffix subset_wo_con -mask_mode direct -all_lf 0
The BFS results are written to data/BFS/train_proc_direct_1000_wo_con
and data/BFS/dev_proc_direct_1000_subset_wo_con
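The two output directories appear to follow the pattern `<data_path>_proc_<mask_mode>_<beam_size>_<out_dir_suffix>`. This pattern is inferred only from the two examples above, not from the source code, so treat the sketch below as an assumption; `bfs_output_dir` is a hypothetical helper.

```python
def bfs_output_dir(data_path, mask_mode, beam_size, out_dir_suffix):
    """Guess the BFS output directory from the command-line arguments.

    The naming scheme is inferred from the two example paths in this
    README and may not match the actual implementation.
    """
    return f"{data_path.rstrip('/')}_proc_{mask_mode}_{beam_size}_{out_dir_suffix}"


print(bfs_output_dir("data/BFS/train", "direct", 1000, "wo_con"))
print(bfs_output_dir("data/BFS/dev", "direct", 1000, "subset_wo_con"))
```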
Key arguments:
- pretrained_num_layers: -2 to load no pretrained weights, -1 to load only the word embeddings.
- num_parallels: the number of processes used for parallel decoding; each consumes 70-80 GB of RAM.
python3 --network_class bert --network_type bert_template --dataset e2e_wo_con \
--preprocessing_hparams \
bert_pretrained_dir=path_to_bert_base,max_sequence_len=72 --training_hparams \
train_batch_size=128,test_batch_size=128,num_epochs=8,eval_period=2000,save_model=True \
--model_hparams \
pos_gain=10.,use_qt_loss_gain=True,seq_label_loss_weight=1.,seq2seq_loss_weight=1.5,pretrained_num_layers=-2,level_for_ner=1,level_for_predicate=1,level_for_type=1,level_for_dec=1,decoder_layer=2,warmup_proportion=0.01,learning_rate=1e-4,hidden_size_input=300,num_attention_heads_input=6,intermediate_size_input=1200,hn=300,clf_head_num=6, \
--model_dir_prefix exp \
--gpu 0;
This will create a directory at runtime/run_model/xxx, where xxx is the model id.
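To pass the trained model to the decoding step, the model directory has to be located first. The sketch below picks the most recently modified subdirectory of `runtime/run_model` and appends `ckpt`, mirroring the `load_path=.../ckpt` form used in the decoding command. Both helper names are hypothetical, and choosing the newest directory is an assumption about which run you want.

```python
from pathlib import Path


def latest_model_dir(root="runtime/run_model"):
    """Return the most recently modified subdirectory of root, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    candidates = [p for p in root.iterdir() if p.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def checkpoint_path(model_dir):
    """Build a load_path value of the form <model_dir>/ckpt."""
    return (Path(model_dir) / "ckpt").as_posix()


if __name__ == "__main__":
    model_dir = latest_model_dir()
    if model_dir is None:
        print("No model directory found under runtime/run_model")
    else:
        print("load_path=" + checkpoint_path(model_dir))
```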
Key arguments:
- timeout: the timeout in seconds for each example
- num_parallels: the number of processes used for parallel decoding; each consumes 70-80 GB of RAM.
- gpu: each GPU with 16 GB of memory can support at most three parallel processes
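The limits above can be combined into a quick feasibility check for a given `num_parallels` and GPU list. This is an illustrative sketch using only the figures quoted in the list (three processes per 16 GB GPU, 70-80 GB of RAM per process); `decoding_budget` is a hypothetical helper.

```python
def decoding_budget(num_parallels, gpu_ids, per_gpu_limit=3, ram_gb_range=(70, 80)):
    """Estimate whether a parallel decoding setup fits the stated limits."""
    capacity = len(gpu_ids) * per_gpu_limit
    return {
        "gpu_capacity": capacity,                  # max processes across all GPUs
        "fits_on_gpus": num_parallels <= capacity,
        "ram_gb_min": num_parallels * ram_gb_range[0],
        "ram_gb_max": num_parallels * ram_gb_range[1],
    }


# Settings used in the decoding command: num_parallels=7 on GPUs 0, 1 and 2.
print(decoding_budget(7, [0, 1, 2]))
# {'gpu_capacity': 9, 'fits_on_gpus': True, 'ram_gb_min': 490, 'ram_gb_max': 560}
```

Under these figures, seven processes fit on three GPUs but need roughly 490-560 GB of host RAM.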
python3 --mode parallel_test --network_class bert --network_type bert_template \
--dataset e2e_wo_con \
--preprocessing_hparams bert_pretrained_dir=path_to_bert_base,timeout=5.,num_parallels=7,dump_dir=multi_sp,kb_mode=offline,use_filtered_ent=True \
--training_hparams \
load_model=True,load_path=/path_to_the_dir_gen_previously/ckpt \
--model_dir_prefix parallel_decoding \
--gpu 0,1,2
author = {Tao Shen and
Xiubo Geng and
Tao Qin and
Daya Guo and
Duyu Tang and
Nan Duan and
Guodong Long and
Daxin Jiang},
title = {Multi-Task Learning for Conversational Question Answering over a Large-Scale
Knowledge Base},
journal = {CoRR},
volume = {abs/1910.05069},
year = {2019},
url = {}