This is the official GitHub page for the paper *Evaluating Deep Unlearning in Large Language Models*. This repository provides the code for using the benchmark EDU-RELAT, the evaluation code for any unlearning result, and the scripts for running the four unlearning methods presented in the paper.
- Install the environment:

```bash
conda env create -f environment.yml
```
- Download the model checkpoints, finetuned on our synthetic dataset EDU-RELAT, from this link. The layout would be:

```
deep_unlearning/
    ft_model_checkpoint/
        ft_gpt2-xl
        ft_llama2-7b
        ft_llama3-8b
        ft_phi
```
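For example, a minimal sketch of loading one of these checkpoints with Hugging Face `transformers` (this assumes the checkpoints are saved in the standard `transformers` format, tokenizer included; the actual loading code lives in our scripts):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ft_model_checkpoint/ft_phi"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```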
- Set up your Hugging Face key `HF_TOKEN` as an environment variable.
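For example, in a POSIX shell (replace the placeholder with your own token):

```bash
export HF_TOKEN=<your_huggingface_token>
```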
Here we provide the code to load the knowledge base and the rule set.
- Load the QA-form knowledge base as Hugging Face datasets:

```python
import torch
from datasets import Dataset

dataset_relationships = Dataset.from_dict(torch.load("synthetic_data/family_relationships.pt"))  # facts in family relationships
dataset_biographies = Dataset.from_dict(torch.load("synthetic_data/family_biographies.pt"))  # facts in biographies
```

The question and answer of each fact are stored under the keys `question4` and `answer4`.
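Continuing from the block above, for example:

```python
# Inspect the first QA pair in the relationship knowledge base
example = dataset_relationships[0]
print(example["question4"])
print(example["answer4"])
```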
- Load the rule set:

```python
import torch
from calculate_recall_and_acc import Rule  # needed so torch.load can unpickle Rule objects

rule_list = torch.load("synthetic_data/family_rule.pt")
```
- Load the relationships as a list of tuples:

```python
import torch
from calculate_recall_and_acc import Person  # needed so torch.load can unpickle Person objects

(edge_list, relation_list, _, _) = torch.load("synthetic_data/family-200-graph.pt")
# edge_list is a list of pairs of two people; relation_list is a list of
# relationships in string form, e.g. "child"
```
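Continuing from the block above, you can print a few facts directly; `Person` objects are shown via whatever repr the class defines:

```python
# Print the first three (person, relation, person) triples
for (person_a, person_b), relation in zip(edge_list[:3], relation_list[:3]):
    print(person_a, relation, person_b)
```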
We select a random subset of 55 facts from the family relationships to evaluate deep unlearning. Given any `unlearn_target_data_id` in 0-54, the id of the corresponding fact in the relationships is:

```python
shuffled_edge_id_list = torch.load("synthetic_data/subsample.pt")
shuffled_unlearn_data_id = shuffled_edge_id_list[unlearn_target_data_id]
```
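For instance, to see which relationship fact a given target id refers to (a sketch assuming, per the loading code above, that this id indexes into `edge_list` and `relation_list`):

```python
import torch
from calculate_recall_and_acc import Person  # needed to unpickle Person objects

unlearn_target_data_id = 0  # any id in 0-54
shuffled_edge_id_list = torch.load("synthetic_data/subsample.pt")
shuffled_unlearn_data_id = shuffled_edge_id_list[unlearn_target_data_id]

# Look up the fact in the relationship graph
(edge_list, relation_list, _, _) = torch.load("synthetic_data/family-200-graph.pt")
person_a, person_b = edge_list[shuffled_unlearn_data_id]
print(person_a, relation_list[shuffled_unlearn_data_id], person_b)
```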
For any unlearning method, suppose `relationships_correct.pt` and `biographies_correct.pt` are two 0-1 vectors saved under the directory `input_dir`, indicating which facts in the family relationships and biographies are retained after unlearning the fact `unlearn_target_data_id`. The following script calculates the recall and accuracy of this unlearning method on the target fact:

```bash
python calculate_recall_and_acc.py --unlearn_data_id $unlearn_target_data_id --input_dir $input_dir
```
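As a reference for producing these input files, here is a minimal sketch, not the evaluation used in the paper: it assumes `model`, `tokenizer`, and `input_dir` are already defined (e.g. a checkpoint loaded from `ft_model_checkpoint/`), and `check_retained` with its substring criterion is our own simplification.

```python
import os
import torch
from datasets import Dataset

def check_retained(model, tokenizer, question, answer):
    """Return 1 if the model's greedy completion of `question` still
    contains `answer` (i.e. the fact is retained), else 0."""
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    completion = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)
    return int(answer.strip().lower() in completion.lower())

dataset_relationships = Dataset.from_dict(torch.load("synthetic_data/family_relationships.pt"))
dataset_biographies = Dataset.from_dict(torch.load("synthetic_data/family_biographies.pt"))

# Build the 0-1 retention vectors over both parts of the knowledge base
relationships_correct = torch.tensor(
    [check_retained(model, tokenizer, ex["question4"], ex["answer4"]) for ex in dataset_relationships])
biographies_correct = torch.tensor(
    [check_retained(model, tokenizer, ex["question4"], ex["answer4"]) for ex in dataset_biographies])

os.makedirs(input_dir, exist_ok=True)
torch.save(relationships_correct, os.path.join(input_dir, "relationships_correct.pt"))
torch.save(biographies_correct, os.path.join(input_dir, "biographies_correct.pt"))
```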
In the paper, we test four unlearning methods: gradient ascent (GA), negative preference optimization (NPO), task vector (TV), and Who's Harry Potter (WHP). The hyperparameter list of each method is saved in `config/model_config.yaml`, and the related scripts are saved in `./unlearning_methods`. Setting any `unlearning_method` (`ga`, `npo`, `tv`, `whp`), `target_model` (`phi`, `gpt2-xl`, `llama2-7b`, `llama3-8b`), and `unlearn_target_data_id` (0-54), the script is:

```bash
bash unlearning_methods/${unlearning_method}.sh $target_model $unlearn_target_data_id
```
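For example, running gradient ascent on the finetuned Phi model for the first target fact:

```bash
bash unlearning_methods/ga.sh phi 0
```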
After an unlearning method runs, the code saves the two 0-1 vectors `relationships_correct.pt` and `biographies_correct.pt` under the directory `unlearning_checkpoint/${unlearning_method}/${target_model}/${unlearn_target_data_id}/checkpoint-${hyperparameter}`. Then run the script in the section above to calculate the recall and accuracy.
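For example, to evaluate every hyperparameter checkpoint produced for one run (a sketch assuming the directory layout above, here for GA on Phi with target fact 0):

```bash
for ckpt_dir in unlearning_checkpoint/ga/phi/0/checkpoint-*; do
    python calculate_recall_and_acc.py --unlearn_data_id 0 --input_dir $ckpt_dir
done
```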
If you find our codebase and dataset beneficial, please cite our work:
```bibtex
@misc{wu2024evaluatingdeepunlearninglarge,
    title={Evaluating Deep Unlearning in Large Language Models},
    author={Ruihan Wu and Chhavi Yadav and Russ Salakhutdinov and Kamalika Chaudhuri},
    year={2024},
    eprint={2410.15153},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2410.15153},
}
```
We would like to thank the authors of TOFU and MUSE; our code is built upon their GitHub repositories.