- Ubuntu 18.04
- Python 3.8.12
- PyTorch 1.11.0
Download the code and set up the development environment:
pip install -r requirements.txt
The repository is organized as follows:
```
KB-cache/          # KB cache files
BLINK/             # entity linking tool
  models/          # the pretrained models, indices, and entity embeddings for entity linking
Datasets/
  CANARD/          # open-domain conversational QA dataset
  ConvQuestions/   # conversational KBQA dataset
config/            # config files for training and evaluating
Rewriter/          # implementation of the question rewriter
Reasoner/
  NSM/             # implementation of the retrieval-based NSM reasoner
  KoPL/            # implementation of the semantic parsing-based KoPL reasoner
```
To save time, we download and adopt the KB cache collected by Focal Entity in our experiments. You can download it here and put the KB-cache directory inside the root directory. For KB facts the cache does not cover, we query the Wikidata API. You can simply run:
python Rewriter/sparqlretriever.py
to check whether the Wikidata API is working. The expected output is:
{'head': {'vars': ['r', 'e1']}, 'results': {'bindings': [{'e1': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q7985008'}, 'r': {'type': 'uri', 'value': 'http://www.wikidata.org/prop/direct/P175'}}]}}
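If you prefer a quick manual check, the following minimal Python sketch queries the public Wikidata SPARQL endpoint directly. The subject entity (wd:Q42) and the query itself are illustrative, not the exact query inside Rewriter/sparqlretriever.py, so the bindings you get will differ from the output above, but the response shape (head.vars plus results.bindings) should match.

```python
# Minimal sketch of a Wikidata SPARQL check; the entity and query are
# illustrative and may differ from what sparqlretriever.py actually sends.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?r ?e1 WHERE {
  wd:Q42 ?r ?e1 .
  FILTER(STRSTARTS(STR(?r), "http://www.wikidata.org/prop/direct/"))
} LIMIT 1
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "convqa-kb-check/0.1 (example)"},
)
resp.raise_for_status()
data = resp.json()
# Same shape as the expected output above: head.vars plus results.bindings
print(data["head"]["vars"], data["results"]["bindings"][:1])
```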
We use ELQ as our entity linking tool. First, clone the BLINK repo as follows:
git clone https://github.com/facebookresearch/BLINK.git
Place the BLINK directory inside the root directory and follow the setup steps here to prepare the entity linking environment (the "models" directory will be created during the setup process).
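If the setup succeeded, ELQ can be invoked from Python. The sketch below follows the usage example in the ELQ instructions of the BLINK repo; the config keys and model file names are taken from that README and may differ across BLINK versions, so treat them as assumptions rather than this repository's exact invocation.

```python
# Sketch of ELQ inference, following the example in the BLINK/ELQ README.
# Paths under models/ are the files the setup step downloads; treat the
# exact keys and values as assumptions that may vary across BLINK versions.
import argparse
import elq.main_dense as main_dense

models_path = "models/"
config = {
    "interactive": False,
    "biencoder_model": models_path + "elq_wiki_large.bin",
    "biencoder_config": models_path + "elq_large_params.txt",
    "cand_token_ids_path": models_path + "entity_token_ids_128.t7",
    "entity_catalogue": models_path + "entity.jsonl",
    "entity_encoding": models_path + "all_entities_large.t7",
    "output_path": "logs/",
    "faiss_index": "hnsw",
    "index_path": models_path + "faiss_hnsw_index.pkl",
    "num_cand_mentions": 10,
    "num_cand_entities": 10,
    "threshold_type": "joint",
    "threshold": -4.5,
}
args = argparse.Namespace(**config)
models = main_dense.load_models(args, logger=None)

queries = [{"id": 0, "text": "who is the lead singer of maroon 5"}]
predictions = main_dense.run(args, None, *models, test_data=queries)
print(predictions)  # mention spans with linked entity ids and scores
```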
We employ GloVe as our initial word embeddings and use the pre-trained word vectors for Wikipedia. You can download them here and put them into Datasets/ConvQuestions/. Rename the vocabulary file and the word embedding file to "vocab_new.txt" and "word_emb_300d.npy", respectively.
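If you start from a raw GloVe text file, the two files can be produced roughly as follows. This is a sketch under the assumption that vocab_new.txt is one token per line and word_emb_300d.npy is the aligned (num_words x 300) float32 matrix; verify both against the repository's data loader.

```python
# Sketch: split a GloVe .txt file into a vocabulary list and an aligned
# embedding matrix. The file layout is an assumption -- check the repo's reader.
import numpy as np

words, vectors = [], []
with open("glove.6B.300d.txt", encoding="utf-8") as f:  # hypothetical GloVe file
    for line in f:
        parts = line.rstrip().split(" ")
        words.append(parts[0])
        vectors.append(np.asarray(parts[1:], dtype=np.float32))

with open("Datasets/ConvQuestions/vocab_new.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(words))
np.save("Datasets/ConvQuestions/word_emb_300d.npy", np.stack(vectors))
```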
We use the CANARD dataset to pre-train the question rewriter. You can download it here and put it into Datasets/.
We evaluate our method on the ConvQuestions benchmark. You can download it here and put it into Datasets/.
- How to construct the pseudo (question, relation) dataset? (An illustrative record is sketched after the commands.)
python Rewriter/retrieve_subgraph.py --pre_train
python Rewriter/retrieve_relation.py --construct
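To make the goal of these two steps concrete, a pseudo (question, relation) pair could look like the record below. The field names are hypothetical and only illustrate the idea of pairing a question with a relation mined from its retrieved subgraph; the actual schema written by retrieve_relation.py --construct may differ.

```python
# Hypothetical pseudo (question, relation) record; field names are
# illustrative, not the exact schema produced by the repository's scripts.
pseudo_pair = {
    "question": "who is the lead singer of maroon 5",
    "relation": "P175",  # Wikidata property "performer"
    "label": 1,          # positive pair mined from the retrieved subgraph (assumed)
}
```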
- How to train the relation retriever?
python Rewriter/train_relation_retriever.py
- How to pre-train the question rewriter?
python Rewriter/train_rewriter.py --pre_train
- How to produce pseudo labels for self-training?
python Rewriter/train_rewriter.py --pretrain_generate
python Rewriter/retriever_topic__entity.py --pre_train
python Rewriter/retrieve_subgraph.py --pre_train
python Rewriter/retrieve_relation.py --infer
python Rewriter/generate_selftrain_datset.py
- How to self-train the question rewriter?
python Rewriter/train_rewriter.py --self_train
- How to generate self-contained rewritten questions? (A usage sketch follows the command.)
python Rewriter/train_rewriter.py --selftrain_generate
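After this step, the directory t5_selftrain_rr holds the final self-trained rewriter (it is copied into the reasoners below). As a sanity check, it can be loaded with HuggingFace transformers, assuming a standard T5 checkpoint; the " ||| " separator and the history-plus-question input format here are assumptions, so match whatever train_rewriter.py actually feeds the model.

```python
# Minimal sketch: load the self-trained rewriter as a T5 checkpoint and
# rewrite a follow-up question. The input format is an assumption.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5_selftrain_rr")
model = T5ForConditionalGeneration.from_pretrained("t5_selftrain_rr")

history = "Who directed Inception? Christopher Nolan."
question = "What else did he direct?"
inputs = tokenizer(history + " ||| " + question, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```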
- Prepare the environment for NSM. Prepare the Question Rewriter for NSM:
- copy the directory "models" into "Reasoner/NSM/" for entity linking
- copy the self-trained rewriter model "t5_selftrain_rr" into "Reasoner/NSM/QuestionRewrite" for question rewriting
- copy the relation retriever model "bert_finetune" into "Reasoner/NSM/QuestionRewrite" for relation retrieval
- How to prepare the NSM dataset?
python Rewriter/retriever_topic__entity.py --self_train
python Rewriter/retrieve_subgraph.py --self_train
python Rewriter/generate_nsm_dataset.py
Execute in the Datasets/ConvQuestions directory:
cp entities.txt relations.txt vocab_new.txt word_emb_300d.npy train_set/train_simple.json dev_set/dev_simple.json test_set/test_simple.json ../../Reasoner/NSM/ConvQuestions
Execute in the Reasoner/NSM/preprocessing/parse directory:
First change the file paths in const_parse.sh and dependecy_parse.sh, then run:
bash run.sh
- How to train NSM? Execute in the Reasoner/NSM directory:
bash run_ConvQuestions.sh
- How to evaluate Question Rewriter combined with NSM? Execute in the Reasoner/NSM directory:
bash test_ConvQuestions.sh
- Download the pre-trained models and KBs for KoPL and organize them as follows (a quick sanity check for the KB files is sketched after the tree):
```
KoPL/
  KB/              # KB files
    item.json
    kb.json
    wikidata.json
  Question2KoPL/   # pre-trained model
    config.json
    merges.txt
    pytorch_model.bin
    training_args.bin
    vocab.json
```
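To verify that the KB files are in place, kb.json can be loaded with plain json. The "concepts"/"entities" keys below follow the KQA Pro KB schema that KoPL builds on; this is an assumption worth checking against your download.

```python
# Sanity check for the KoPL KB. The "concepts"/"entities" keys are assumed
# from the KQA Pro schema; adjust if your kb.json is organized differently.
import json

with open("Reasoner/KoPL/KB/kb.json", encoding="utf-8") as f:
    kb = json.load(f)

if isinstance(kb, dict):
    print("top-level keys:", list(kb)[:5])
    print("concepts:", len(kb.get("concepts", {})))
    print("entities:", len(kb.get("entities", {})))
```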
- How to generate pseudo labels for KoPL? Execute in the Reasoner/KoPL/code directory and change the file paths to your local paths:
python infer_ConvQuestions.py --pretrain
- How to self-train KoPL? Execute in the Reasoner/KoPL/code directory:
python finetune_kopl.py
- How to evaluate Question Rewriter combined with KoPL?
python Rewriter/test_kopl.py
- How to evaluate Question Rewriter combined with Relation Retriever?
python Rewriter/retrieve_relation.py --eval
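For reference, relation retrieval with the fine-tuned BERT checkpoint can be sketched as sentence-pair scoring. Whether bert_finetune is a cross-encoder with a sequence-classification head (as assumed here) or something else is not specified above, so treat this as illustrative only; retrieve_relation.py --eval is the authoritative implementation.

```python
# Illustrative only: rank candidate relation labels for a question with a
# BERT sequence-classification checkpoint saved via HuggingFace transformers.
# Using the positive-class logit as the ranking score is an assumption.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert_finetune")
model = AutoModelForSequenceClassification.from_pretrained("bert_finetune")
model.eval()

question = "what else did christopher nolan direct"
relations = ["director", "performer", "screenwriter"]
with torch.no_grad():
    batch = tokenizer([question] * len(relations), relations,
                      padding=True, truncation=True, return_tensors="pt")
    scores = model(**batch).logits[:, -1]  # assumed positive-class logit
print(sorted(zip(relations, scores.tolist()), key=lambda x: -x[1]))
```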