This repository contains the code & data of the paper Linking Surface Facts to Large-Scale Knowledge Graphs, published at EMNLP 2023.
Using miniconda, the virtual environment with all dependencies can be reproduced with `conda env create -f environment.yml`.
The models that will be released soon are:
Model | Dataset | Download |
---|---|---|
OIE pre-ranker | REBEL | Link |
OIE pre-ranker + Context | REBEL | Link |
OIE-Fact re-ranker | REBEL | Link |
OIE pre-ranker | SynthIE | Link |
Note that we are currently waiting for approval from our legal team before the release. Please reach out to the authors via email and we can provide temporary access.
Downloading and processing Wikidata takes a while, and for that reason we release a `.json` file dump of Wikidata. After cloning the repository, you can obtain the processed version of Wikidata inside the `data/wikidata` directory as:
conda install -c conda-forge git-lfs
git lfs install
git lfs pull
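If `git lfs pull` was skipped or interrupted, the files under `data/wikidata` remain small text pointers rather than the actual dump. The following is a minimal sketch for spotting that case; it relies only on the fact that Git LFS pointer files start with a fixed version line, and the helper names `is_lfs_pointer` and `find_unpulled_files` are illustrative and not part of this repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file, as defined by the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` still holds an LFS pointer instead of real data."""
    with path.open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unpulled_files(directory: Path) -> list[Path]:
    """List files under `directory` (recursively) that are still LFS pointers."""
    return sorted(
        p for p in directory.rglob("*") if p.is_file() and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # data/wikidata is the directory named above; adjust if your layout differs.
    wikidata_dir = Path("data/wikidata")
    if wikidata_dir.is_dir():
        for pointer in find_unpulled_files(wikidata_dir):
            print(f"Still an LFS pointer, re-run `git lfs pull`: {pointer}")
```

An empty output means every file in the directory has been replaced by its real content.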
The datasets are released with the same license as REBEL. You can find our data here.
The Wikidata Knowledge Graph embeddings are needed for inference and will be released with the models. Until then, reach out to the authors via email and we can provide you with the embeddings.
Assuming the virtual environment is activated (`conda activate kg-grounding`), the models are downloaded (e.g., the OIE pre-ranker trained on REBEL, and the OIE-Fact re-ranker) in the `experiments/` directory, the datasets are either (re-)created or downloaded, and Wikidata is downloaded, inference can be run as:
python src/inference-slot-linking.py --slot_linking_experiment_path "experiments/preranker-rebel-context/" --fact_reranking_experiment_path "experiments/reranker-rebel/" --reranker_k 2 --opts DEVICE "cuda:0" BATCH_SIZE 128 NUM_WORKERS 8 TEST_DATASET_PATH "data/datasets/val_inductive.json" INDEX_PATH "experiments/preranker-rebel-context/kg-index"
where `INDEX_PATH` indicates whether we perform OIE linking on a benchmark-restricted KG index (`kg-index`) or a large-scale KG index (`full-kg-index`).
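To switch between the two indices without editing the long command by hand, the arguments can be assembled programmatically. The sketch below only reuses the flags and paths shown in the command above; the helper `build_inference_command` is hypothetical, and it assumes that `full-kg-index` sits in the same experiment directory as `kg-index`, which may not match your setup.

```python
import shlex
import sys

# The two index variants named in the README.
KNOWN_INDICES = ("kg-index", "full-kg-index")


def build_inference_command(
    index_name: str,
    preranker_dir: str = "experiments/preranker-rebel-context",
    reranker_dir: str = "experiments/reranker-rebel",
    dataset_path: str = "data/datasets/val_inductive.json",
    device: str = "cuda:0",
) -> list[str]:
    """Assemble the inference command for the chosen KG index."""
    if index_name not in KNOWN_INDICES:
        raise ValueError(f"index_name must be one of {KNOWN_INDICES}")
    return [
        sys.executable, "src/inference-slot-linking.py",
        "--slot_linking_experiment_path", f"{preranker_dir}/",
        "--fact_reranking_experiment_path", f"{reranker_dir}/",
        "--reranker_k", "2",
        "--opts",
        "DEVICE", device,
        "BATCH_SIZE", "128",
        "NUM_WORKERS", "8",
        "TEST_DATASET_PATH", dataset_path,
        # Assumption: both indices live inside the pre-ranker experiment directory.
        "INDEX_PATH", f"{preranker_dir}/{index_name}",
    ]


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(build_inference_command("full-kg-index")))
```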
Training new models also assumes that the environment is activated and that the datasets and Wikidata are downloaded. Then, you can train a model (e.g., an OIE pre-ranker) as:
python src/train-slot-linking.py --config_path "configs/preranker.yaml"
To modify any of the `config.yaml` values, provide `--opts ...` after the `config_path`, as in: `--opts BATCH_SIZE 128 NUM_WORKERS 12 ...`
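For intuition, `--opts` takes a flat list of alternating keys and values that replace the corresponding entries of the loaded config. The sketch below illustrates that idea on a plain dictionary; it is not the code this repository uses, the function names `parse_value` and `apply_opts` are hypothetical, and the default values in the usage example are made up for illustration.

```python
import ast
from typing import Any


def parse_value(raw: str) -> Any:
    """Interpret a command-line string as a Python literal when possible."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        # Not a literal (e.g. a device name or a path): keep it as a string.
        return raw


def apply_opts(config: dict[str, Any], opts: list[str]) -> dict[str, Any]:
    """Return a copy of `config` with KEY VALUE pairs from `opts` applied."""
    if len(opts) % 2 != 0:
        raise ValueError("--opts expects an even number of items: KEY VALUE ...")
    updated = dict(config)
    for key, raw in zip(opts[0::2], opts[1::2]):
        if key not in updated:
            # Rejecting unknown keys catches typos such as BATCHSIZE.
            raise KeyError(f"Unknown config key: {key}")
        updated[key] = parse_value(raw)
    return updated


if __name__ == "__main__":
    # Illustrative defaults only; the real values come from the YAML config.
    defaults = {"BATCH_SIZE": 32, "NUM_WORKERS": 4, "DEVICE": "cpu"}
    print(apply_opts(defaults, ["BATCH_SIZE", "128", "NUM_WORKERS", "12"]))
```

Values that parse as Python literals are converted (so `128` becomes an integer), while anything else, such as `cuda:0` or a file path, stays a string.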
Please see the license file.
If you use our work or resources for your research, please cite the following paper:
@article{radevski2023linking,
title={Linking surface facts to large-scale knowledge graphs},
author={Radevski, Gorjan and Gashteovski, Kiril and Hung, Chia-Chien and Lawrence, Carolin and Glava{\v{s}}, Goran},
journal={arXiv preprint arXiv:2310.14909},
year={2023}
}