
Benchmarking-KE

This repository contains a benchmark evaluation of knowledge editing (KE) using logical rules. Our methodology uses multi-hop questions generated from logical rules to evaluate the effectiveness of knowledge editing methods. We conducted experiments on the popular approaches ROME, FT, and KN; the results show a considerable performance gap of up to 24% between evaluations on directly edited knowledge and on entailed knowledge, particularly for ROME and FT.

Approach Diagram

Installation

To start, install the required packages:

cd evaluate_rules
pip install torch
pip install -r requirements.txt

Ensure that all dependencies are correctly installed.
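A quick, illustrative way to verify the environment before running the pipeline is to check that the required packages import (the package names below are assumptions based on the install commands above):

```python
import importlib

def check_deps(names):
    """Return {package: importable?} so missing requirements surface early."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status

if __name__ == "__main__":
    # torch and transformers are assumed to be the core dependencies here
    print(check_deps(["torch", "transformers"]))
```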

SPARQL Query

To obtain triples from the knowledge graph, we extracted the entities in MLaKE and MQuAKE and used them to query the DICE DBpedia endpoint and construct our sub-knowledge graph. We use the following SPARQL query:

    query = f"""SELECT ?s ?p ?o
    WHERE {{
      {{
        SELECT ?s ?p ?o WHERE {{
          VALUES ?s {{ <{entity_uris}> }}
          ?s ?p ?o
          FILTER (
            ?p != <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> &&
            ?p != <http://www.w3.org/2000/01/rdf-schema#label> &&
            ?p != <http://www.w3.org/2002/07/owl#sameAs> &&
            ?p != <http://dbpedia.org/property/wikiPageUsesTemplate> &&
            ?p != <http://dbpedia.org/ontology/wikiPageRedirects> &&
            ?p != <http://dbpedia.org/ontology/almaMater> &&
            ?p != <http://dbpedia.org/ontology/wikiPageExternalLink> &&
            ?p != <http://dbpedia.org/ontology/wikiPageWikiLink> &&
            ?p != <http://www.w3.org/2000/01/rdf-schema#comment>
          )
        }}
        LIMIT 100
      }} UNION
      {{
        SELECT ?s ?p ?o WHERE {{
          VALUES ?o {{ <{entity_uris}> }}
          ?s ?p ?o
          FILTER (
            ?p != <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> &&
            ?p != <http://www.w3.org/2000/01/rdf-schema#label> &&
            ?p != <http://www.w3.org/2002/07/owl#sameAs> &&
            ?p != <http://dbpedia.org/property/wikiPageUsesTemplate> &&
            ?p != <http://dbpedia.org/ontology/wikiPageRedirects> &&
            ?p != <http://dbpedia.org/ontology/almaMater> &&
            ?p != <http://dbpedia.org/ontology/wikiPageExternalLink> &&
            ?p != <http://dbpedia.org/ontology/wikiPageWikiLink> &&
            ?p != <http://www.w3.org/2000/01/rdf-schema#comment>
          )
        }}
        LIMIT 100
      }}
    }}"""
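The `{entity_uris}` slot above is a Python f-string placeholder. A minimal sketch of how it might be filled for a set of entities (the `"> <"` joining convention is an assumption derived from the `VALUES ?s {{ <{entity_uris}> }}` pattern in the query):

```python
def format_entity_uris(uris):
    """Join URIs so that '<{entity_uris}>' expands to '<u1> <u2> ...'
    inside the VALUES clause (assumed convention, not the repo's code)."""
    return "> <".join(uris)

entity_uris = format_entity_uris([
    "http://dbpedia.org/resource/Paris",
    "http://dbpedia.org/resource/Berlin",
])
values_clause = f"VALUES ?s {{ <{entity_uris}> }}"
print(values_clause)
# VALUES ?s { <http://dbpedia.org/resource/Paris> <http://dbpedia.org/resource/Berlin> }
```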

To get the triple files in the corrected form, run:

cd evaluate_rules/
python SparqlQuery.py
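Conceptually, the correction step flattens the endpoint's SPARQL JSON bindings into whitespace-separated triples that AMIE can read. A hedged sketch of that conversion (an illustrative helper, not the actual `SparqlQuery.py`):

```python
def bindings_to_triples(bindings):
    """Flatten SPARQL JSON result bindings into tab-separated
    subject/predicate/object lines, a format AMIE can consume
    (illustrative only - the repository's script may differ)."""
    lines = []
    for b in bindings:
        lines.append(f"{b['s']['value']}\t{b['p']['value']}\t{b['o']['value']}")
    return lines

# Example binding in the shape returned by a SPARQL endpoint's JSON format
sample = [{"s": {"value": "dbr:Paris"},
           "p": {"value": "dbo:country"},
           "o": {"value": "dbr:France"}}]
print(bindings_to_triples(sample))
```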

Running AMIE

To get the rules, run the following commands:

cd evaluate_rules/amie
java -jar amie-dev.jar  -mins 1 ../all_triples/processed_triples3.txt > ../all_triples/output_file.txt

Make sure that you have the latest [Java] version installed to run AMIE, download an AMIE executable JAR file ([AMIE-JAR]), and run the commands above.
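AMIE writes one mined rule per output line, followed by tab-separated quality metrics. A sketch of pulling the body and head atoms out of such a line (the exact column layout depends on the AMIE version, so this assumes the rule text is the first tab-separated field and uses `=>` between body and head):

```python
def parse_amie_rule(line):
    """Split one AMIE output line into (body, head) atom strings.
    Assumes the rule is the first tab-separated column with '=>'
    separating body from head (column layout varies by AMIE version)."""
    rule = line.split("\t")[0]
    body, head = (part.strip() for part in rule.split("=>"))
    return body, head

line = "?a  <dbo:birthPlace>  ?b   => ?a  <dbo:nationality>  ?b\t0.5\t0.8"
print(parse_amie_rule(line))
```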

Generating Multihop Questions and Answers

To generate the questions and answers, run:

python generateQA.py

The script prints the question and answer derived from the given rules and facts, and saves the QA dict to a file.
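The core idea of composing a multi-hop question from chained facts can be sketched as follows. The templates and phrasing here are purely illustrative assumptions; `generateQA.py` may phrase questions differently:

```python
def two_hop_question(fact1, fact2, templates):
    """Compose a 2-hop question from two chained facts (s, r1, o) and
    (o, r2, answer): ask for the answer given only s.
    Illustrative only - not the repository's actual generator."""
    s, r1, o = fact1
    bridge, r2, answer = fact2
    assert o == bridge, "facts must chain on the bridge entity"
    inner = templates[r1].format(s)         # describes the bridge entity
    question = templates[r2].format(inner)  # asks about it without naming it
    return question, answer

templates = {
    "birthPlace": "the place where {} was born",
    "country": "In which country is {}?",
}
q, a = two_hop_question(("Marie Curie", "birthPlace", "Warsaw"),
                        ("Warsaw", "country", "Poland"), templates)
print(q)  # In which country is the place where Marie Curie was born?
print(a)  # Poland
```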

The datasets used in the experiments (all triples, rules, and multi-hop QA pairs for each dataset) can be found in /evaluate_rules/all_triples.

Running KE Methods

We first run the KE methods over the selected datasets (MLaKE and MQuAKE) and save the model weights. To do so, clone the ROME repository into your local folder and run the following commands:

git clone https://github.com/kmeng01/rome.git
cd rome/rome        # or cd rome/baselines for the other KE methods

python rome_main.py --model_name openai-community/gpt2-large --dataset_path ../evaluate_rules/all_triples/MLaKE/new_en-qa.json --config ../hparams/ROME/gpt2-large.json --save_dir edited_models  # Feel free to change the model and the dataset paths

Config files for each KE method can be found in /hparams, and the other KE methods are placed in /baselines. Example Python scripts for running each KE method are in the /examples folder.

Evaluating KE Methods

To evaluate the existing KE techniques on directly edited or correlated knowledge after saving the model weights, run the following commands:

python evaluate_rules/rome_eval.py         # for correlated knowledge
python evaluate_rules/rome_eval_direct.py  # for directly edited knowledge

This will save the evaluation results in /results.
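The results below are reported as F1 scores. A common way such answer-level F1 is computed is token overlap between the predicted and gold answers, sketched here (SQuAD-style; the repository's evaluation scripts may normalise answers differently):

```python
from collections import Counter

def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string.
    Illustrative of the metric, not the repository's exact code."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the united states", "united states"))  # 0.8
```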

Results on LLMs

ROME

| Models | MLaKE (F1) | MQuAKE (F1) |
| --- | --- | --- |
| **Directly edited knowledge** | | |
| gpt2-medium | 16.36 | 4.21 |
| gpt2-large | 10.23 | 2.13 |
| gpt2-xl | 12.90 | 1.51 |
| gpt-j | 8.56 | - |
| **Correlated knowledge** | | |
| gpt2-medium | 8.67 | 1.91 |
| gpt2-large | 6.08 | 2.42 |
| gpt2-xl | 7.17 | 3.84 |
| gpt-j | 15.91 | - |

FT

| Models | MLaKE (F1) | MQuAKE (F1) |
| --- | --- | --- |
| **Directly edited knowledge** | | |
| gpt2-medium_constr | 15.18 | 4.97 |
| gpt2-large_constr | 24.58 | 9.10 |
| gpt2-xl_constr | 17.15 | 4.25 |
| **Correlated knowledge** | | |
| gpt2-medium_constr | 0.90 | 0.008 |
| gpt2-large_constr | 0.45 | 0.28 |
| gpt2-xl_constr | 0.63 | 0.0 |

KN

| Models | MLaKE (F1) | MQuAKE (F1) |
| --- | --- | --- |
| **Directly edited knowledge** | | |
| gpt2-xl | 1.34 | 4.67 |
| **Correlated knowledge** | | |
| gpt2-xl | 14.26 | 18.53 |

Maintenance Plan

In the future, we plan to extend the experiments by adding other KE methods such as:

  • MEND
  • KE
  • MeLLo

and by including the following knowledge graphs:

  • Wikidata
  • YAGO

| Model | Status |
| --- | --- |
| LLama2 | Upcoming |
| GPT-3-based architectures | In progress |
| Mistral | Upcoming |

Citation

Moteu Ngoli, T. (2025). Benchmarking Knowledge Editing using Logical Rules (1.0.0) [Data set]. The 24th International Semantic Web Conference (ISWC 2025), Nara, Japan. Zenodo. https://doi.org/10.5281/zenodo.15697400


Contact

Feel free to contact us at tatianam@mail.uni-paderborn.de if you have any questions.
