
Conversation


@leeparkuky leeparkuky commented Nov 6, 2025

What does this PR do?

Adds SimPER (Simple alignment with Perplexity optimization) support to the CPOTrainer.


Files changed:

cpo_trainer.py, cpo_config.py
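If the new loss follows `CPOConfig`'s existing `loss_type` pattern, enabling SimPER might look like the sketch below. The `loss_type="simper"` value is an assumption based on this PR's description, not confirmed API:

```python
from trl import CPOConfig

# Hypothetical usage, assuming the PR exposes SimPER via loss_type.
# SimPER needs no beta tuning or reference model, so the config stays minimal.
config = CPOConfig(
    output_dir="cpo-simper",
    loss_type="simper",  # assumed value added by this PR
)
# trainer = CPOTrainer(model=model, args=config, train_dataset=dataset, ...)
```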

Motivation / context:
Adding SimPER to the TRL library provides a highly effective, efficient, and simple method for Large Language Model (LLM) alignment.

  • Zero Alignment Hyperparameters, Promising Results: SimPER's core objective function requires no tunable hyperparameters such as $\beta$ or temperature. The paper shows promising performance, often outperforming existing methods like DPO, KTO, and SimPO on various leaderboards and benchmarks. This design dramatically simplifies the alignment configuration.
  • Efficiency: SimPER does not require training or storing a separate reference model, contributing to reduced memory and computational overhead during training.
  • Industry Validation: SimPER's efficacy is demonstrated in cutting-edge models from LG AI Research. Both the EXAONE Deep: Reasoning Enhanced Language Models and EXAONE 4.0: Unified Large Language Models use SimPER for preference optimization, achieving superior or competitive performance in complex domains.

Proposing SimPER: Simple Alignment with Perplexity Optimization

Short Description of the Method and Link to the Paper

SimPER (Simple alignment with Perplexity optimization) is a minimalist, hyperparameter-free preference optimization algorithm for aligning Large Language Models (LLMs).

The method simplifies alignment by optimizing a single quantity, inverse perplexity: it is maximized for chosen responses and minimized for rejected responses within a preference dataset. This eliminates preference-specific hyperparameters (like $\beta$ in DPO) and the separate reference model, making training computationally and memory efficient. The paper reports strong results, often significantly outperforming methods like DPO, KTO, and SimPO on benchmarks such as AlpacaEval 2 and the Open LLM Leaderboard.

  • Paper Link: https://arxiv.org/abs/2502.00883
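Since the inverse perplexity of a response is the exponential of its mean per-token log-probability, the per-pair objective can be sketched as below. The function name and input shapes are illustrative assumptions, not the PR's actual implementation:

```python
import torch

def simper_loss(chosen_logps: torch.Tensor, rejected_logps: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of the SimPER objective.

    Inputs are *length-averaged* sequence log-probabilities under the
    policy, shape (batch,). Inverse perplexity = exp(mean per-token
    log-prob), so the loss maximizes it for the chosen response and
    minimizes it for the rejected one. Note there is no beta, no
    temperature, and no reference-model term.
    """
    chosen_invppl = torch.exp(chosen_logps)      # 1 / PPL(chosen)
    rejected_invppl = torch.exp(rejected_logps)  # 1 / PPL(rejected)
    return (-chosen_invppl + rejected_invppl).mean()
```

Because both terms are bounded in (0, 1], the loss lives in (-1, 1): it approaches -1 when the policy assigns near-probability-1 tokens to the chosen response and near-zero to the rejected one.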

Link to the Implementation (if open-sourced)

The SimPER source code is publicly available on GitHub.

  • GitHub Repository: https://github.com/tengxiao1/SimPER

Link to Model Weights Trained with the Method (if available)

SimPER has been used as the core training algorithm for a new series of models released by LG AI Research, demonstrating its efficacy in complex, real-world tasks.

  • EXAONE Deep Models (LG AI Research): The EXAONE Deep series models, including the 2.4B, 7.8B, and 32B variants, utilize SimPER for training and show superior capabilities in reasoning tasks like math and coding benchmarks.
    • Hugging Face Link: https://huggingface.co/LGAI-EXAONE

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
    -> readme.md does not mention CPOTrainer and associated methods
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@qgallouedec
Member

thanks! can you please also add a small section in https://huggingface.co/docs/trl/main/en/paper_index

