Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning

Figure 1: Systematic overview and experiment highlights of SimNPO.

This is the official code repository for the paper Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning.

Abstract

In this work, we address the problem of large language model (LLM) unlearning, aiming to remove unwanted data influences and associated model capabilities (e.g., copyrighted data or harmful content generation) while preserving essential model utilities, without the need for retraining from scratch. Despite the growing need for LLM unlearning, a principled optimization framework remains lacking. To this end, we revisit the state-of-the-art approach, negative preference optimization (NPO), and identify the issue of reference model bias, which could undermine NPO's effectiveness, particularly when unlearning forget data of varying difficulty. Given that, we propose a simple yet effective unlearning optimization framework, called SimNPO, showing that 'simplicity' in removing the reliance on a reference model (through the lens of simple preference optimization) benefits unlearning. We also provide deeper insights into SimNPO's advantages, supported by analysis using mixtures of Markov chains. Furthermore, we present extensive experiments validating SimNPO's superiority over existing unlearning baselines in benchmarks like TOFU and MUSE, and robustness against relearning attacks.
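To make the contrast between NPO and SimNPO concrete, the sketch below computes both forget losses for a single sequence from per-token log-probabilities. It is an illustration only, not the code in this repository: the formulas follow our reading of the paper's formulation (NPO compares against a reference model, while SimNPO drops the reference model and normalizes by sequence length with a margin `gamma`), and the function names and default values of `beta` and `gamma` are assumptions chosen for the example.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def npo_loss(token_logps, ref_token_logps, beta=0.1):
    """NPO-style forget loss for one sequence (assumed form).

    token_logps / ref_token_logps: per-token log-probabilities of the
    forget response under the current model and the reference model.
    """
    log_ratio = sum(token_logps) - sum(ref_token_logps)
    return -(2.0 / beta) * log_sigmoid(-beta * log_ratio)


def simnpo_loss(token_logps, beta=2.5, gamma=0.0):
    """SimNPO-style forget loss for one sequence (assumed form).

    No reference model is used; the sequence log-probability is
    averaged over tokens, and gamma acts as a reward margin.
    """
    avg_logp = sum(token_logps) / len(token_logps)
    return -(2.0 / beta) * log_sigmoid(-beta * avg_logp - gamma)


if __name__ == "__main__":
    short = [-0.5] * 4    # hypothetical short forget response
    long = [-0.5] * 40    # same per-token likelihood, ten times longer
    print("SimNPO, short:", simnpo_loss(short))
    print("SimNPO, long: ", simnpo_loss(long))
    print("NPO at the reference model:", npo_loss(short, short))
```

In this sketch, two responses with the same average per-token log-probability receive the same SimNPO loss regardless of length, and the loss shrinks as the model assigns lower likelihood to the forget data. The NPO variant, by contrast, depends on how far the current model has moved from the reference model.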

Getting Started

Download Models

To use our unlearned models directly, please refer to our HuggingFace collection:

🤗OPTML-Group/SimNPO-Unlearned-Models

Contributors

Cite This Work

@article{fan2024simplicity,
  title={Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning},
  author={Fan, Chongyu and Liu, Jiancheng and Lin, Licong and Jia, Jinghan and Zhang, Ruiqi and Mei, Song and Liu, Sijia},
  journal={arXiv preprint arXiv:2410.07163},
  year={2024}
}
