- Ubuntu 18.04
- Python 3.8.12
- PyTorch 1.11.0
Download the code and set up the development environment:
pip install -r requirements.txt
The repository is organized as follows:
```
KB-cache/          # KB cache files
BLINK/             # entity linking tool
  models/          # the pretrained models, indices, and entity embeddings for entity linking
Datasets/
  CANARD/          # open-domain conversational QA dataset
  ConvQuestions/   # conversational KBQA dataset
config/            # config files for training and evaluating
Rewriter/          # implementation of the question rewriter
Reasoner/
  NSM/             # implementation of the retrieval-based NSM reasoner
  KoPL/            # implementation of the semantic parsing-based KoPL reasoner
```
To save time, we download and adopt the KB cache collected by Focal Entity in our experiments. You can download it here and put the KB-cache directory inside the root directory. For KB facts the cache does not cover, we query the Wikidata API. You can simply run:
python Rewriter/sparqlretriever.py
to check whether the Wikidata API is working. The expected output is:
{'head': {'vars': ['r', 'e1']}, 'results': {'bindings': [{'e1': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q7985008'}, 'r': {'type': 'uri', 'value': 'http://www.wikidata.org/prop/direct/P175'}}]}}
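If you prefer a quick manual check, the following minimal Python sketch queries the public Wikidata SPARQL endpoint directly. The subject entity (wd:Q42) and the query itself are illustrative, not the exact query inside Rewriter/sparqlretriever.py, so the bindings you get will differ from the output above, but the response shape (head.vars plus results.bindings) should match.

```python
# Minimal sketch of a Wikidata SPARQL check; the entity and query are
# illustrative and may differ from what sparqlretriever.py actually sends.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?r ?e1 WHERE {
  wd:Q42 ?r ?e1 .
  FILTER(STRSTARTS(STR(?r), "http://www.wikidata.org/prop/direct/"))
} LIMIT 1
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "convqa-kb-check/0.1 (example)"},
)
resp.raise_for_status()
data = resp.json()
# Same shape as the expected output above: head.vars plus results.bindings
print(data["head"]["vars"], data["results"]["bindings"][:1])
```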
We use ELQ as our entity linking tool. First, clone the BLINK repo as follows:
git clone https://github.com/facebookresearch/BLINK.git
Place the BLINK directory inside the root directory and follow the setup steps here to prepare the entity linking environment (the "models" directory will be created during the setup process).
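If the setup succeeded, ELQ can be invoked from Python. The sketch below follows the usage example in the ELQ instructions of the BLINK repo; the config keys and model file names are taken from that README and may differ across BLINK versions, so treat them as assumptions rather than this repository's exact invocation.

```python
# Sketch of ELQ inference, following the example in the BLINK/ELQ README.
# Paths under models/ are the files the setup step downloads; treat the
# exact keys and values as assumptions that may vary across BLINK versions.
import argparse
import elq.main_dense as main_dense

models_path = "models/"
config = {
    "interactive": False,
    "biencoder_model": models_path + "elq_wiki_large.bin",
    "biencoder_config": models_path + "elq_large_params.txt",
    "cand_token_ids_path": models_path + "entity_token_ids_128.t7",
    "entity_catalogue": models_path + "entity.jsonl",
    "entity_encoding": models_path + "all_entities_large.t7",
    "output_path": "logs/",
    "faiss_index": "hnsw",
    "index_path": models_path + "faiss_hnsw_index.pkl",
    "num_cand_mentions": 10,
    "num_cand_entities": 10,
    "threshold_type": "joint",
    "threshold": -4.5,
}
args = argparse.Namespace(**config)
models = main_dense.load_models(args, logger=None)

queries = [{"id": 0, "text": "who is the lead singer of maroon 5"}]
predictions = main_dense.run(args, None, *models, test_data=queries)
print(predictions)  # mention spans with linked entity ids and scores
```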
We employ GloVe as our initial word embeddings and use the pre-trained word vectors for Wikipedia. You can download them here and put them into Datasets/ConvQuestions/. Rename the vocabulary file and the word embedding file to "vocab_new.txt" and "word_emb_300d.npy", respectively.
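If you start from a raw GloVe text file, the two files can be produced roughly as follows. This is a sketch under the assumption that vocab_new.txt is one token per line and word_emb_300d.npy is the aligned (num_words x 300) float32 matrix; verify both against the repository's data loader.

```python
# Sketch: split a GloVe .txt file into a vocabulary list and an aligned
# embedding matrix. The file layout is an assumption -- check the repo's reader.
import numpy as np

words, vectors = [], []
with open("glove.6B.300d.txt", encoding="utf-8") as f:  # hypothetical GloVe file
    for line in f:
        parts = line.rstrip().split(" ")
        words.append(parts[0])
        vectors.append(np.asarray(parts[1:], dtype=np.float32))

with open("Datasets/ConvQuestions/vocab_new.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(words))
np.save("Datasets/ConvQuestions/word_emb_300d.npy", np.stack(vectors))
```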
We use the CANARD dataset to pre-train the question rewriter. You can download it here and put it into Datasets/.
We evaluate our method on the ConvQuestions benchmark. You can download it here and put it into Datasets/.
- How to construct the pseudo (question, relation) dataset? (An illustrative record is sketched after the commands.)
python Rewriter/retrieve_subgraph.py --pre_train
python Rewriter/retrieve_relation.py --construct
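To make the goal of these two steps concrete, a pseudo (question, relation) pair could look like the record below. The field names are hypothetical and only illustrate the idea of pairing a question with a relation mined from its retrieved subgraph; the actual schema written by retrieve_relation.py --construct may differ.

```python
# Hypothetical pseudo (question, relation) record; field names are
# illustrative, not the exact schema produced by the repository's scripts.
pseudo_pair = {
    "question": "who is the lead singer of maroon 5",
    "relation": "P175",  # Wikidata property "performer"
    "label": 1,          # positive pair mined from the retrieved subgraph (assumed)
}
```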
- How to train the relation retriever?
python Rewriter/train_relation_retriever.py
- How to pre-train the question rewriter?
python Rewriter/train_rewriter.py --pre_train
- How to produce pseudo labels for self-training?
python Rewriter/train_rewriter.py --pretrain_generate
python Rewriter/retriever_topic__entity.py --pre_train
python Rewriter/retrieve_subgraph.py --pre_train
python Rewriter/retrieve_relation.py --infer
python Rewriter/generate_selftrain_datset.py
- How to self-train the question rewriter?
python Rewriter/train_rewriter.py --self_train
- How to generate self-contained rewritten questions? (A usage sketch follows the command.)
python Rewriter/train_rewriter.py --selftrain_generate
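After this step, the directory t5_selftrain_rr holds the final self-trained rewriter (it is copied into the reasoners below). As a sanity check, it can be loaded with HuggingFace transformers, assuming a standard T5 checkpoint; the " ||| " separator and the history-plus-question input format here are assumptions, so match whatever train_rewriter.py actually feeds the model.

```python
# Minimal sketch: load the self-trained rewriter as a T5 checkpoint and
# rewrite a follow-up question. The input format is an assumption.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5_selftrain_rr")
model = T5ForConditionalGeneration.from_pretrained("t5_selftrain_rr")

history = "Who directed Inception? Christopher Nolan."
question = "What else did he direct?"
inputs = tokenizer(history + " ||| " + question, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```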
- Prepare the environment for NSM. Prepare the Question Rewriter for NSM:
- copy the directory "models" into "Reasoner/NSM/" for entity linking
- copy the self-trained rewriter model "t5_selftrain_rr" into "Reasoner/NSM/QuestionRewrite" for question rewriting
- copy the relation retriever model "bert_finetune" into "Reasoner/NSM/QuestionRewrite" for relation retrieval
- How to prepare the NSM dataset?
python Rewriter/retriever_topic__entity.py --self_train
python Rewriter/retrieve_subgraph.py --self_train
python Rewriter/generate_nsm_dataset.py
Execute in the Datasets/ConvQuestions directory:
cp entities.txt relations.txt vocab_new.txt word_emb_300d.npy train_set/train_simple.json dev_set/dev_simple.json test_set/test_simple.json ../../Reasoner/NSM/ConvQuestions
Execute in the Reasoner/NSM/preprocessing/parse directory:
First change the file paths in const_parse.sh and dependecy_parse.sh, then run:
bash run.sh
- How to train NSM? Execute in the Reasoner/NSM directory:
bash run_ConvQuestions.sh
- How to evaluate Question Rewriter combined with NSM? Execute in the Reasoner/NSM directory:
bash test_ConvQuestions.sh
- Download the pre-trained models and KBs for KoPL and organize them as follows (a quick sanity check for the KB files is sketched after the tree):
```
KoPL/
  KB/              # KB files
    item.json
    kb.json
    wikidata.json
  Question2KoPL/   # pre-trained model
    config.json
    merges.txt
    pytorch_model.bin
    training_args.bin
    vocab.json
```
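To verify that the KB files are in place, kb.json can be loaded with plain json. The "concepts"/"entities" keys below follow the KQA Pro KB schema that KoPL builds on; this is an assumption worth checking against your download.

```python
# Sanity check for the KoPL KB. The "concepts"/"entities" keys are assumed
# from the KQA Pro schema; adjust if your kb.json is organized differently.
import json

with open("Reasoner/KoPL/KB/kb.json", encoding="utf-8") as f:
    kb = json.load(f)

if isinstance(kb, dict):
    print("top-level keys:", list(kb)[:5])
    print("concepts:", len(kb.get("concepts", {})))
    print("entities:", len(kb.get("entities", {})))
```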
- How to generate pseudo labels for KoPL? Execute in the Reasoner/KoPL/code directory and change the file paths to your local paths:
python infer_ConvQuestions.py --pretrain
- How to self-train KoPL? Execute in the Reasoner/KoPL/code directory:
python finetune_kopl.py
- How to evaluate Question Rewriter combined with KoPL?
python Rewriter/test_kopl.py
- How to evaluate Question Rewriter combined with Relation Retriever?
python Rewriter/retrieve_relation.py --eval
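For reference, relation retrieval with the fine-tuned BERT checkpoint can be sketched as sentence-pair scoring. Whether bert_finetune is a cross-encoder with a sequence-classification head (as assumed here) or something else is not specified above, so treat this as illustrative only; retrieve_relation.py --eval is the authoritative implementation.

```python
# Illustrative only: rank candidate relation labels for a question with a
# BERT sequence-classification checkpoint saved via HuggingFace transformers.
# Using the positive-class logit as the ranking score is an assumption.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert_finetune")
model = AutoModelForSequenceClassification.from_pretrained("bert_finetune")
model.eval()

question = "what else did christopher nolan direct"
relations = ["director", "performer", "screenwriter"]
with torch.no_grad():
    batch = tokenizer([question] * len(relations), relations,
                      padding=True, truncation=True, return_tensors="pt")
    scores = model(**batch).logits[:, -1]  # assumed positive-class logit
print(sorted(zip(relations, scores.tolist()), key=lambda x: -x[1]))
```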