This repository contains a benchmark evaluation of knowledge editing (KE) using logical rules. Our methodology uses multi-hop questions, generated with logical rules, to evaluate the effectiveness of knowledge editing methods. We conducted experiments on the popular approaches ROME, FT, and KN; the results show a considerable performance gap of up to 24% between evaluations on directly edited knowledge and on entailed knowledge, particularly for ROME and FT.
To start, install the required packages:

```shell
cd evaluate_rules
pip install torch
pip install -r requirements.txt
```

Ensure that all dependencies are correctly installed.
To get triples from the KG, we extract the entities in MLaKE and MQuAKE and use them to query the DICE DBpedia endpoint and construct our sub-knowledge graph. We use the following SPARQL query:
```python
query = f"""SELECT ?s ?p ?o
WHERE {{
  {{
    SELECT ?s ?p ?o WHERE {{
      VALUES ?s {{ <{entity_uris}> }}
      ?s ?p ?o
      FILTER (
        ?p != <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> &&
        ?p != <http://www.w3.org/2000/01/rdf-schema#label> &&
        ?p != <http://www.w3.org/2002/07/owl#sameAs> &&
        ?p != <http://dbpedia.org/property/wikiPageUsesTemplate> &&
        ?p != <http://dbpedia.org/ontology/wikiPageRedirects> &&
        ?p != <http://dbpedia.org/ontology/almaMater> &&
        ?p != <http://dbpedia.org/ontology/wikiPageExternalLink> &&
        ?p != <http://dbpedia.org/ontology/wikiPageWikiLink> &&
        ?p != <http://www.w3.org/2000/01/rdf-schema#comment>
      )
    }}
    LIMIT 100
  }} UNION
  {{
    SELECT ?s ?p ?o WHERE {{
      VALUES ?o {{ <{entity_uris}> }}
      ?s ?p ?o
      FILTER (
        ?p != <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> &&
        ?p != <http://www.w3.org/2000/01/rdf-schema#label> &&
        ?p != <http://www.w3.org/2002/07/owl#sameAs> &&
        ?p != <http://dbpedia.org/property/wikiPageUsesTemplate> &&
        ?p != <http://dbpedia.org/ontology/wikiPageRedirects> &&
        ?p != <http://dbpedia.org/ontology/almaMater> &&
        ?p != <http://dbpedia.org/ontology/wikiPageExternalLink> &&
        ?p != <http://dbpedia.org/ontology/wikiPageWikiLink> &&
        ?p != <http://www.w3.org/2000/01/rdf-schema#comment>
      )
    }}
    LIMIT 100
  }}
}}"""
```
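The `{entity_uris}` placeholder is filled in per batch of entities. A minimal sketch of assembling such a query string for a list of entity URIs (the helper name, the abbreviated predicate list, and the single trailing `LIMIT` are illustrative simplifications of the query above):

```python
# Predicates excluded from the sub-KG; abbreviated here for brevity.
EXCLUDED_PREDICATES = [
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/2002/07/owl#sameAs",
]

def build_query(entity_uris, limit=100):
    """Build a SPARQL query fetching triples where any given entity
    appears as subject or as object, minus the excluded predicates."""
    values = " ".join(f"<{u}>" for u in entity_uris)
    filters = " && ".join(f"?p != <{p}>" for p in EXCLUDED_PREDICATES)
    return (
        f"SELECT ?s ?p ?o WHERE {{ "
        f"{{ VALUES ?s {{ {values} }} ?s ?p ?o FILTER ({filters}) }} UNION "
        f"{{ VALUES ?o {{ {values} }} ?s ?p ?o FILTER ({filters}) }} "
        f"}} LIMIT {limit}"
    )

q = build_query(["http://dbpedia.org/resource/Paris"])
print(q)
```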
To get the triples files in the corrected form, run:

```shell
cd evaluate_rules/
python SparqlQuery.py
```
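The "corrected form" is a plain-text triples file that AMIE can consume. A hedged sketch of the conversion step, assuming standard SPARQL JSON result bindings and one tab-separated triple per line (the exact output format is defined in `SparqlQuery.py`):

```python
# Convert SPARQL JSON result bindings into one tab-separated triple
# per line, the plain-text format the rule miner reads.
def bindings_to_triples(bindings):
    lines = []
    for b in bindings:
        s = b["s"]["value"]
        p = b["p"]["value"]
        o = b["o"]["value"]
        lines.append(f"<{s}>\t<{p}>\t<{o}>")
    return "\n".join(lines)

sample = [{"s": {"value": "http://dbpedia.org/resource/Paris"},
           "p": {"value": "http://dbpedia.org/ontology/country"},
           "o": {"value": "http://dbpedia.org/resource/France"}}]
triples_text = bindings_to_triples(sample)
print(triples_text)
```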
To get the rules, run the following commands:

```shell
cd evaluate_rules/amie
java -jar amie-dev.jar -mins 1 ../all_triples/processed_triples3.txt > ../all_triples/output_file.txt
```
Make sure you have a recent version of [Java] installed to run AMIE, download an AMIE executable JAR file [AMIE-JAR], and run the commands above.
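AMIE writes mined rules as tab-separated lines whose first column is the rule itself (`body atoms => head atom`), followed by quality metrics such as head coverage and confidence. A small sketch of parsing one such line into structured atoms (the sample line and its metric columns are illustrative; check the header AMIE prints for the exact column order):

```python
# Parse one rule line from AMIE's tab-separated output into
# (body_atoms, head_atom), each atom a (subject, predicate, object) tuple.
def parse_rule(line):
    rule = line.split("\t")[0]  # first column holds the rule text
    body_str, head_str = (part.strip() for part in rule.split("=>"))
    def atoms(s):
        toks = s.split()
        # Atoms come in groups of three tokens: ?s <p> ?o
        return [tuple(toks[i:i + 3]) for i in range(0, len(toks), 3)]
    return atoms(body_str), atoms(head_str)[0]

sample_line = ("?a <birthPlace> ?b  ?b <country> ?c => ?a <nationality> ?c"
               "\t0.42\t0.35")
body, head = parse_rule(sample_line)
print(body, head)
```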
To generate the questions and answers, run:

```shell
python generateQA.py
```
The script prints the question and answer derived from the given rules and facts and saves the QA dictionary to a file.
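The core idea is to chain facts along a mined rule into one multi-hop question whose answer is the tail of the chain. A minimal sketch under that assumption (the entities, relations, and question phrasing here are illustrative; the real templates live in `generateQA.py`):

```python
# Compose a two-hop question from two chained facts:
# (subject, rel1, bridge) and (bridge, rel2, answer).
def two_hop_qa(subject, rel1, bridge, rel2, answer):
    """Return a multi-hop question about `subject` whose gold
    answer is `answer`, reached through the bridge entity."""
    question = f"What is the {rel2} of the {rel1} of {subject}?"
    return question, answer

question, answer = two_hop_qa("Eiffel Tower", "country", "France",
                              "capital", "Paris")
print(question)  # What is the capital of the country of Eiffel Tower?
```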
The datasets used in the experiments (all triples, rules, and multi-hop QA pairs for each dataset) are found in /evaluate_rules/all_triples.
We first run KE methods over the selected datasets (MLaKE and MQuAKE) and save the model weights. To do so, clone the ROME repository into your local folder and run the following commands:
```shell
git clone https://github.com/kmeng01/rome.git
cd rome/rome  # or cd rome/baselines for the other KE methods
python rome_main.py --model_name openai-community/gpt2-large --dataset_path ../evaluate_rules/all_triples/MLaKE/new_en-qa.json --config ../hparams/ROME/gpt2-large.json --save_dir edited_models  # feel free to change the model and dataset paths
```
Config files for each KE method can be found in /hparams, and the other KE methods are placed in /baselines. Examples of Python code used to run each KE method are found in the /examples folder.
To evaluate existing KE techniques on directly edited or correlated knowledge after saving the model weights, run the following commands:
```shell
python evaluate_rules/rome_eval.py         # for correlated knowledge
python evaluate_rules/rome_eval_direct.py  # for directly edited knowledge
```
This will save the evaluation results in /results.
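The tables below report F1 scores. A common way to compute token-overlap F1 for QA answers, shown here as a hedged sketch (the repository's exact metric implementation lives in the evaluation scripts):

```python
from collections import Counter

# Token-overlap F1 between a predicted answer and the gold answer,
# as commonly used in QA evaluation.
def f1_score(prediction, gold):
    pred_toks = prediction.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the city of Paris", "Paris"))  # 0.4
```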
| Models | MLaKE (F1) | MQuAKE (F1) |
|---|---|---|
| gpt2-medium | 16.36 | 4.21 |
| gpt2-large | 10.23 | 2.13 |
| gpt2-xl | 12.90 | 1.51 |
| gpt-j | 8.56 | - |
| **Correlated knowledge** | | |
| gpt2-medium | 8.67 | 1.91 |
| gpt2-large | 6.08 | 2.42 |
| gpt2-xl | 7.17 | 3.84 |
| gpt-j | 15.91 | - |
| Models | MLaKE (F1) | MQuAKE (F1) |
|---|---|---|
| gpt2-medium_constr | 15.18 | 4.97 |
| gpt2-large_constr | 24.58 | 9.10 |
| gpt2-xl_constr | 17.15 | 4.25 |
| **Correlated knowledge** | | |
| gpt2-medium_constr | 0.90 | 0.008 |
| gpt2-large_constr | 0.45 | 0.28 |
| gpt2-xl_constr | 0.63 | 0.0 |
| Models | MLaKE (F1) | MQuAKE (F1) |
|---|---|---|
| gpt2-xl | 1.34 | 4.67 |
| **Correlated knowledge** | | |
| gpt2-xl | 14.26 | 18.53 |
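The up-to-24% gap quoted in the introduction can be read off these tables. For example, on MLaKE the `_constr` runs drop from 24.58 F1 on directly edited knowledge to 0.45 F1 on correlated knowledge for gpt2-large:

```python
# Largest direct-vs-correlated F1 gap across the reported MLaKE _constr runs.
direct = {"gpt2-medium_constr": 15.18, "gpt2-large_constr": 24.58,
          "gpt2-xl_constr": 17.15}
correlated = {"gpt2-medium_constr": 0.90, "gpt2-large_constr": 0.45,
              "gpt2-xl_constr": 0.63}
gaps = {m: direct[m] - correlated[m] for m in direct}
worst = max(gaps, key=gaps.get)
print(worst, round(gaps[worst], 2))  # gpt2-large_constr 24.13
```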
In the future, we will extend these experiments with other KE methods, such as:

- MEND
- KE
- MeLLo

and include the following knowledge graphs:

- Wikidata
- YAGO
| Model | Status |
|---|---|
| LLama2 | Upcoming |
| GPT-3-based architectures | In progress |
| Mistral | Upcoming |
Moteu Ngoli, T. (2025). Benchmarking Knowledge Editing using Logical Rules (1.0.0) [Data set]. The 24th International Semantic Web Conference (ISWC 2025), Nara, Japan. Zenodo. https://doi.org/10.5281/zenodo.15697400
Feel free to contact us at tatianam@mail.uni-paderborn.de if you have any questions.
