Release Note:
- This is the initial release version.
- Thanks to @guoday for providing the code for the grammars and the logical form search method.
- Some programs have not been tested after being moved to this repo and anonymized, so if any bug occurs while running, do not hesitate to contact me via tao.shen@student.uts.edu.au.
This code is based on TensorFlow (1.11 or 1.10) and implemented with Python 3.6. Please install the following Python packages:
timeout_decorator, fuzzywuzzy, tqdm, flask, nltk, ftfy
spacy
numpy
tensorflow-gpu==1.11.0
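
As a convenience, the packages above can be installed in one shot with pip; this is just a sketch assuming a Python 3.6 environment (tensorflow-gpu==1.11.0 additionally needs a matching CUDA/cuDNN setup on the machine):

```shell
# Sketch: install the Python dependencies listed above into the current environment.
# tensorflow-gpu==1.11.0 expects a compatible CUDA 9.0 / cuDNN 7 installation.
pip install timeout_decorator fuzzywuzzy tqdm flask nltk ftfy spacy numpy tensorflow-gpu==1.11.0
```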
To reproduce our experiments, you need to download the datasets as follows:

- Download and unzip the dialog files (CSQA_v9.zip) to the data folder:

  ```shell
  unzip data/CSQA_v9.zip -d data/
  mv data/CSQA_v9 data/CSQA
  ```
- Download Wikidata and move all of the Wikidata JSON files to data/kb (see the sketch after this list):

  ```shell
  mkdir data/kb
  ```
- Download the BERT base model from the official GitHub repo to anywhere (see the sketch after this list).
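
As a rough sketch of the last two steps: the source path for the Wikidata JSON files below is a placeholder for wherever you downloaded them, and the checkpoint shown is the uncased English BERT-Base release from the official BERT repo (assuming that is the variant you want to use):

```shell
# Move the downloaded Wikidata JSON files into the expected location
# (replace /path/to/wikidata_jsons with wherever the dump was actually downloaded).
mv /path/to/wikidata_jsons/*.json data/kb/

# Fetch and unpack the BERT-Base (uncased) checkpoint from the official BERT release;
# the unpacked directory is what path_to_bert_base should point to in the commands below.
wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip -d bert_base/
```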
Simply run the following command to preprocess the data and build the KB:

```shell
python main_bfs_preprocess.py
```
Some key parameters are:

- num_parallel: the number of processes used to perform the BFS; each consumes about 55GB of RAM.
- max_train: the number of dialogs to search.

To search 90,000 dialogs, run:
```shell
python main_bfs_run.py -mode offline -num_parallel 10 -beam_size 1000 -start_index 0 -max_train 90000 -data_mode dir -data_path data/BFS/train -shuffle 1 -out_dir_suffix wo_con -mask_mode direct -all_lf 0
```
To search 6,000 dialogs, run:
```shell
python main_bfs_run.py -mode offline -num_parallel 10 -beam_size 1000 -start_index 0 -max_train 6000 -data_mode dir -data_path data/BFS/dev -shuffle 1 -out_dir_suffix subset_wo_con -mask_mode direct -all_lf 0
```
The resulting BFS outputs are located at data/BFS/train_proc_direct_1000_wo_con and data/BFS/dev_proc_direct_1000_subset_wo_con.
Key arguments:

- pretrained_num_layers: -2 for not loading any pretrained weights and -1 for loading only the word embeddings.
- num_parallels: the number of processes for parallel decoding; each consumes 70-80GB of RAM.

To train the model, run:
```shell
python3 main_e2e.py --network_class bert --network_type bert_template --dataset e2e_wo_con \
  --preprocessing_hparams \
  bert_pretrained_dir=path_to_bert_base,max_sequence_len=72 --training_hparams \
  train_batch_size=128,test_batch_size=128,num_epochs=8,eval_period=2000,save_model=True \
  --model_hparams \
  pos_gain=10.,use_qt_loss_gain=True,seq_label_loss_weight=1.,seq2seq_loss_weight=1.5,pretrained_num_layers=-2,level_for_ner=1,level_for_predicate=1,level_for_type=1,level_for_dec=1,decoder_layer=2,warmup_proportion=0.01,learning_rate=1e-4,hidden_size_input=300,num_attention_heads_input=6,intermediate_size_input=1200,hn=300,clf_head_num=6, \
  --model_dir_prefix exp \
  --gpu 0;
```
This will produce a directory located at runtime/run_model/xxx, where xxx is the model id.
Key arguments:

- timeout: the timeout in seconds for each example.
- num_parallels: the number of processes for parallel decoding; each consumes 70-80GB of RAM.
- gpu: each GPU with 16GB of memory can support at most three parallel processes (e.g., the command below runs num_parallels=7 across three GPUs, which stays within this limit).

To run parallel decoding for testing, run:
```shell
python3 main_e2e.py --mode parallel_test --network_class bert --network_type bert_template \
  --dataset e2e_wo_con \
  --preprocessing_hparams bert_pretrained_dir=path_to_bert_base,timeout=5.,num_parallels=7,dump_dir=multi_sp,kb_mode=offline,use_filtered_ent=True \
  --training_hparams \
  load_model=True,load_path=/path_to_the_dir_gen_previously/ckpt \
  --model_dir_prefix parallel_decoding \
  --gpu 0,1,2
```
BibTeX:

```bibtex
@article{shen2019masp,
  author  = {Tao Shen and
             Xiubo Geng and
             Tao Qin and
             Daya Guo and
             Duyu Tang and
             Nan Duan and
             Guodong Long and
             Daxin Jiang},
  title   = {Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base},
  journal = {CoRR},
  volume  = {abs/1910.05069},
  year    = {2019},
  url     = {http://arxiv.org/abs/1910.05069}
}
```