The repository is modified from KG-BERT and tested on Python 3.7+. It is partially based on Hugging Face Transformers, KG-BERT, and AnyBURL.
```shell
pip install -r requirements.txt
```
(1) The benchmark knowledge graph datasets are in `./data`.
(2) The demo datasets in `./demo_data` can be used to run small-scale demos.
(3) `entity2text.txt` or `entity2textlong.txt` in each dataset contains the textual description of each entity.
(4) `relation2text.txt` in each dataset contains the textual description of each relation.
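The ID-to-text mapping files can be loaded with a few lines of Python. This is a minimal sketch that assumes the common tab-separated `ID<TAB>text` layout; the helper name `load_id2text` is ours, not the repository's:

```python
def load_id2text(path):
    """Load a tab-separated ID-to-text mapping file (assumed layout)."""
    id2text = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) >= 2:
                id2text[parts[0]] = parts[1]
    return id2text

# entity2text = load_id2text("./demo_data/umls/entity2text.txt")
# relation2text = load_id2text("./demo_data/umls/relation2text.txt")
```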
| Augmented Datasets | Number of New Relations | Number of New Facts |
|---|---|---|
| WN18RR-DAMH | 84 | 138,711 |
| WN18RR-DAI | 9 | 60,403 |
| UMLS-DAMH | 778 | 16,035 |
| UMLS-DAI | 7 | 10,575 |
| FB15k-237 (few-shot) | 702 | 116,544 |
| NELL-ONE (few-shot) | 569 | 136,342 |
Here is an example of extracting 2-hop facts for the demo UMLS dataset:

```shell
python3 extract_multihop_triples.py \
    --data_dir ./demo_data/umls \
    --data_type multihop \
    --K 2 \
    --T 10
```
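The idea behind 2-hop extraction can be sketched as joining triples over a shared intermediate entity. This is a simplified illustration, not the repository's implementation; in particular, sampling at most `T` paths per pair (the `--T` flag) is omitted here:

```python
from collections import defaultdict

def extract_two_hop(triples):
    """Compose (h, r1, m) and (m, r2, t) into 2-hop facts (h, (r1, r2), t).

    Simplified sketch: trivial cycles (t == h) are skipped; real extraction
    would also cap the number of paths per relation pair.
    """
    by_head = defaultdict(list)
    for h, r, t in triples:
        by_head[h].append((r, t))
    two_hop = []
    for h, r1, m in triples:
        for r2, t in by_head.get(m, []):
            if t != h:  # skip trivial back-and-forth cycles
                two_hop.append((h, (r1, r2), t))
    return two_hop
```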
Here is an example of extracting implicit facts for the demo UMLS dataset:

```shell
python3 extract_implicit_triples.py \
    --data_dir ./demo_data/umls \
    --data_type multihop \
    --cs 0.85 \
    --n_body 100 \
    --T 1
```
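Conceptually, implicit facts come from applying mined rules (e.g. from AnyBURL, mentioned above) whose confidence clears the `--cs` threshold. The sketch below uses a simplified single-body rule representation `(body_relation, head_relation, confidence)` for illustration only; it is not the AnyBURL rule format or the repository's code:

```python
def apply_rules(triples, rules, cs=0.85):
    """Derive implicit facts from single-body rules: body(h, t) => head(h, t).

    Only rules with confidence >= cs fire, mirroring the --cs flag.
    """
    known = set(triples)
    implied = set()
    for body_rel, head_rel, conf in rules:
        if conf < cs:  # discard low-confidence rules
            continue
        for h, r, t in triples:
            if r == body_rel and (h, head_rel, t) not in known:
                implied.add((h, head_rel, t))
    return implied
```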
Link prediction on the demo WN18RR dataset:

```shell
python3 run_bert_link_prediction.py \
    --task_name kg \
    --do_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/WN18RR \
    --bert_model bert-base-uncased \
    --max_seq_length 50 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_WN18RR/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 5000
```
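As in KG-BERT, each triple is scored by feeding the concatenated entity and relation texts to BERT. A minimal sketch of the input construction (the `[CLS]`/`[SEP]` layout follows KG-BERT; the helper name and the whitespace tokenization are simplifications, as the real script tokenizes with the `bert-base-uncased` WordPiece vocabulary):

```python
def triple_to_sequence(head_text, rel_text, tail_text, max_seq_length=50):
    """Build a KG-BERT-style input: [CLS] head [SEP] relation [SEP] tail [SEP]."""
    tokens = ["[CLS]"]
    for text in (head_text, rel_text, tail_text):
        tokens.extend(text.lower().split())
        tokens.append("[SEP]")
    # truncate to the --max_seq_length budget
    return tokens[:max_seq_length]
```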
Link prediction on the demo UMLS dataset:

```shell
python3 run_bert_link_prediction.py \
    --task_name kg \
    --do_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/umls \
    --bert_model bert-base-uncased \
    --max_seq_length 15 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_umls/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 135
```
Link prediction on the demo FB15k-237 dataset:

```shell
python3 run_bert_link_prediction.py \
    --task_name kg \
    --do_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/FB15k-237 \
    --bert_model bert-base-uncased \
    --max_seq_length 150 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_FB15k-237/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 1500
```
Link prediction on the demo NELL-ONE (reconstructed) dataset:

```shell
# optionally add --load_weights_path $load_path to reload model weights from a .h5 file
python3 run_bert_link_prediction.py \
    --task_name kg \
    --do_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/NELL_ONE_reconstructed \
    --bert_model bert-base-uncased \
    --max_seq_length 32 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_NELL-ONE/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 5000
```
Multi-hop link prediction on the demo WN18RR dataset:

```shell
# optionally add --load_weights_path $load_path to reload model weights from a .h5 file
python3 run_bert_link_prediction_multi_hop.py \
    --task_name kg \
    --do_train \
    --do_extend_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/WN18RR \
    --bert_model bert-base-uncased \
    --max_seq_length 60 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_WN18RR_multi-hop/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 4500
```
Multi-hop link prediction on the demo UMLS dataset:

```shell
# optionally add --load_weights_path $load_path to reload model weights from a .h5 file
python3 run_bert_link_prediction_multi_hop.py \
    --task_name kg \
    --do_train \
    --do_extend_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/umls \
    --bert_model bert-base-uncased \
    --max_seq_length 20 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_UMLS_multi-hop/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 135
```
Multi-hop link prediction on the demo FB15k-237 dataset:

```shell
# optionally add --load_weights_path $load_path to reload model weights from a .h5 file
python3 run_bert_link_prediction_multi_hop.py \
    --task_name kg \
    --do_train \
    --do_extend_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/FB15k-237 \
    --bert_model bert-base-uncased \
    --max_seq_length 150 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_FB15k-237_multi-hop/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 1500
```
Multi-hop link prediction on the demo NELL-ONE (reconstructed) dataset:

```shell
# optionally add --load_weights_path $load_path to reload model weights from a .h5 file
python3 run_bert_link_prediction_multi_hop.py \
    --task_name kg \
    --do_train \
    --do_extend_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/NELL_ONE_reconstructed \
    --bert_model bert-base-uncased \
    --max_seq_length 40 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_NELL-ONE_multi-hop/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 5000
```
Link prediction with dropout on the demo WN18RR dataset:

```shell
# optionally add --load_weights_path $load_path to reload model weights from a .h5 file
python3 run_bert_link_prediction_dropout.py \
    --task_name kg \
    --do_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/WN18RR \
    --bert_model bert-base-uncased \
    --max_seq_length 50 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_WN18RR_dropout/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 5000
```
Link prediction with dropout on the demo UMLS dataset:

```shell
# optionally add --load_weights_path $load_path to reload model weights from a .h5 file
python3 run_bert_link_prediction_dropout.py \
    --task_name kg \
    --do_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/umls \
    --bert_model bert-base-uncased \
    --max_seq_length 15 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_UMLS_dropout/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 135
```
Mixed link prediction on the demo WN18RR dataset:

```shell
# optionally add --load_weights_path $load_path to reload model weights from a .h5 file
python3 run_bert_link_prediction_mixed.py \
    --task_name kg \
    --do_train \
    --do_extend_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/WN18RR \
    --bert_model bert-base-uncased \
    --max_seq_length 50 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_WN18RR_mixed/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 5000
```
Mixed link prediction on the demo UMLS dataset:

```shell
# optionally add --load_weights_path $load_path to reload model weights from a .h5 file
python3 run_bert_link_prediction_mixed.py \
    --task_name kg \
    --do_train \
    --do_eval \
    --do_predict \
    --data_dir ./demo_data/umls \
    --bert_model bert-base-uncased \
    --max_seq_length 20 \
    --train_batch_size 32 \
    --learning_rate 5e-5 \
    --num_train_epochs 5.0 \
    --output_dir ./output_UMLS_mixed/ \
    --gradient_accumulation_steps 1 \
    --eval_batch_size 135
```