Code and data for the paper "Learning from Both Structural and Textual Knowledge for Inductive Knowledge Graph Completion"
- Python 3.8
- PyTorch 1.10.0
- TensorFlow 1.15.0 (for LSTK-NeuralLP and LSTK-DRUM)
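A minimal environment setup might look as follows. This is only a sketch: the PyPI package names (`torch`, `tensorflow`) are assumptions on our part, and TensorFlow 1.15 wheels are only published for Python 3.7 and earlier, so the TensorFlow-based baselines may need a separate environment.

```sh
# Sketch of an environment setup; package names are assumptions.
pip install torch==1.10.0
# TensorFlow 1.15 is only needed for LSTK-NeuralLP and LSTK-DRUM;
# its wheels require Python <= 3.7, so use a separate environment for it.
pip install tensorflow==1.15.0
```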
We use three datasets in our experiments.
Dataset | Download Link (original) |
---|---|
HacRED | https://github.com/qiaojiim/HacRED |
DocRED | https://github.com/thunlp/DocRED |
BioRel | https://bit.ly/biorel_dataset |
We use four models in our experiments.
Model | Code Download Link (original) |
---|---|
NeuralLP | https://github.com/fanyangxyz/Neural-LP |
DRUM | https://github.com/alisadeghian/DRUM |
RNNLogic | https://github.com/DeepGraphLearning/RNNLogic |
TELM | This work |
LSTK is a two-stage framework. In the first stage, it generates a set of soft triples for reasoning.
You can generate the soft triples as follows:
Path for code: src_nli
- Training a textual entailment model:
python main_nli.py [dataset]
- Searching triples with corresponding texts:
python generate_triples_by_index.py [dataset]
- If the dataset is in Chinese, please use:
python generate_triples_by_index_zh.py [dataset]
- Applying the trained textual entailment model to generate soft triples:
python apply_model_nli.py [dataset]
After these steps, you will obtain three files (train/valid/test_triple_scores.txt) storing the soft triples.
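For example, the whole first stage can be run end to end roughly as follows. This is a sketch: the dataset argument `docred` is an assumption chosen for illustration, and for a Chinese dataset such as HacRED the `_zh` variant of the search script should be used instead.

```sh
cd src_nli
# 1. Train the textual entailment model
python main_nli.py docred
# 2. Search triples with their corresponding texts
#    (for Chinese data, use generate_triples_by_index_zh.py instead)
python generate_triples_by_index.py docred
# 3. Apply the trained entailment model to generate soft triples
python apply_model_nli.py docred
# Outputs: train/valid/test_triple_scores.txt
```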
You can also directly download our processed soft triples:
Dataset | Download Link (processed) |
---|---|
HacRED | Google Drive |
DocRED | Google Drive |
BioRel | Google Drive |
In the second stage, you can use the generated soft triples to train state-of-the-art neural approximate rule-based models.
Path for code: src/LSTK-TELM
The script for both training and evaluation on the HacRED dataset is:
sh run_hacred.sh
The script for both training and evaluation on the DocRED dataset is:
sh run_docred.sh
The script for both training and evaluation on the BioRel dataset is:
sh run_biorel.sh
The script for rule extraction is:
sh run_rules.sh [dataset]
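For instance, a complete TELM run on HacRED followed by rule extraction might look like this (passing `hacred` as the dataset argument to run_rules.sh is our assumption):

```sh
cd src/LSTK-TELM
sh run_hacred.sh        # train and evaluate on HacRED
sh run_rules.sh hacred  # extract the learned rules
```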
We also provide the running scripts for the baseline methods:
Path for code: src/LSTK-NeuralLP or src/LSTK-DRUM
The training script is:
python -u src/main.py --datadir=[dataset]/ --exp_name=[dataset] --num_step 4 --gpu 0 --exps_dir exps --max_epoch 10 --seed 1234
The evaluation commands are:
sh eval/collect_all_facts.sh [dataset]
python eval/get_truths.py [dataset]
python eval/evaluate.py --preds=exps/[dataset]/test_predictions.txt --truths=[dataset]/truths.pckl
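Putting training and evaluation together, a concrete run on DocRED might look like the following (the data directory name `docred` is an assumption; adjust it to your local layout):

```sh
# Train LSTK-NeuralLP (or LSTK-DRUM) on DocRED
python -u src/main.py --datadir=docred/ --exp_name=docred --num_step 4 --gpu 0 \
    --exps_dir exps --max_epoch 10 --seed 1234
# Collect facts and evaluate the test predictions
sh eval/collect_all_facts.sh docred
python eval/get_truths.py docred
python eval/evaluate.py --preds=exps/docred/test_predictions.txt --truths=docred/truths.pckl
```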
Path for code: src/LSTK-RNNLogic
The commands for environment installation are:
cd LSTK-RNNLogic/codes/pyrnnlogiclib/
python setup.py install
The commands for data preparation are:
python process_dicts.py
python get_scores.py
python process_soft.py
The script for both training and evaluation is:
python run.py --data_path [dataset] --num_generated_rules 2000 --num_rules_for_test 500 --num_important_rules 0 --prior_weight 0.01 --cuda --predictor_learning_rate 0.1 --generator_epochs 5000 --max_rule_length 2
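For example, a run on BioRel might look like this (using `biorel` as the --data_path value is our assumption):

```sh
python run.py --data_path biorel --num_generated_rules 2000 --num_rules_for_test 500 \
    --num_important_rules 0 --prior_weight 0.01 --cuda \
    --predictor_learning_rate 0.1 --generator_epochs 5000 --max_rule_length 2
```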
Please consider citing the following paper if you find our code helpful. Thank you!
@inproceedings{QiDW23,
author = {Kunxun Qi and
Jianfeng Du and
Hai Wan},
title = {Learning from Both Structural and Textual Knowledge for Inductive
Knowledge Graph Completion},
booktitle = {NeurIPS},
year = {2023},
url = {http://papers.nips.cc/paper\_files/paper/2023/hash/544242770e8333875325d013328b2079-Abstract-Conference.html},
}