
Conversation


@leeparkuky leeparkuky commented Nov 6, 2025

What does this PR do?

Adds SimPER (Simple alignment with Perplexity optimization) support to the CPOTrainer.


Files changed:

cpo_trainer.py, cpo_config.py
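If the new loss follows `CPOConfig`'s existing `loss_type` pattern, enabling SimPER might look like the sketch below. The `loss_type="simper"` value is an assumption based on this PR's description, not confirmed API:

```python
from trl import CPOConfig

# Hypothetical usage, assuming the PR exposes SimPER via loss_type.
# SimPER needs no beta tuning or reference model, so the config stays minimal.
config = CPOConfig(
    output_dir="cpo-simper",
    loss_type="simper",  # assumed value added by this PR
)
# trainer = CPOTrainer(model=model, args=config, train_dataset=dataset, ...)
```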

Motivation / context:
Adding SimPER to the TRL library provides a highly effective, efficient, and simple method for Large Language Model (LLM) alignment.

  • Zero Alignment Hyperparameters, Promising Results: SimPER's core objective function requires no tunable hyperparameters such as $\beta$ or temperature. The paper shows promising performance, often outperforming existing methods like DPO, KTO, and SimPO on various leaderboards and benchmarks. This design dramatically simplifies the alignment configuration.
  • Efficiency: SimPER does not require training or storing a separate reference model, contributing to reduced memory and computational overhead during training.
  • Industry Validation: SimPER's efficacy is demonstrated in cutting-edge models from LG AI Research. Both the EXAONE Deep: Reasoning Enhanced Language Models and EXAONE 4.0: Unified Large Language Models use SimPER for preference optimization, achieving superior or competitive performance in complex domains.

Proposing SimPER: Simple Alignment with Perplexity Optimization

Short Description of the Method and Link to the Paper

SimPER (Simple alignment with Perplexity optimization) is a minimalist, hyperparameter-free preference optimization algorithm for aligning Large Language Models (LLMs).

The method simplifies alignment by optimizing a single quantity, inverse perplexity: it is maximized for chosen responses and minimized for rejected responses within a preference dataset. This eliminates preference-specific hyperparameters (like $\beta$ in DPO) and the separate reference model, making training computationally and memory efficient. The paper reports strong results, often significantly outperforming methods like DPO, KTO, and SimPO on benchmarks such as AlpacaEval 2 and the Open LLM Leaderboard.

  • Paper Link: https://arxiv.org/abs/2502.00883
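Since the inverse perplexity of a response is the exponential of its mean per-token log-probability, the per-pair objective can be sketched as below. The function name and input shapes are illustrative assumptions, not the PR's actual implementation:

```python
import torch

def simper_loss(chosen_logps: torch.Tensor, rejected_logps: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of the SimPER objective.

    Inputs are *length-averaged* sequence log-probabilities under the
    policy, shape (batch,). Inverse perplexity = exp(mean per-token
    log-prob), so the loss maximizes it for the chosen response and
    minimizes it for the rejected one. Note there is no beta, no
    temperature, and no reference-model term.
    """
    chosen_invppl = torch.exp(chosen_logps)      # 1 / PPL(chosen)
    rejected_invppl = torch.exp(rejected_logps)  # 1 / PPL(rejected)
    return (-chosen_invppl + rejected_invppl).mean()
```

Because both terms are bounded in (0, 1], the loss lives in (-1, 1): it approaches -1 when the policy assigns near-probability-1 tokens to the chosen response and near-zero to the rejected one.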

Link to the Implementation (if open-sourced)

The SimPER source code is publicly available on GitHub.

  • GitHub Repository: https://github.com/tengxiao1/SimPER

Link to Model Weights Trained with the Method (if available)

SimPER has been used as the core training algorithm for a new series of models released by LG AI Research, demonstrating its efficacy in complex, real-world tasks.

  • EXAONE Deep Models (LG AI Research): The EXAONE Deep series models, including the 2.4B, 7.8B, and 32B variants, utilize SimPER for training and show superior capabilities in reasoning tasks like math and coding benchmarks.
    • Hugging Face Link: https://huggingface.co/LGAI-EXAONE

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
    -> readme.md does not mention CPOTrainer and associated methods
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@qgallouedec
Member

thanks! can you please also add a small section in https://huggingface.co/docs/trl/main/en/paper_index

