adding [SimPER](https://arxiv.org/abs/2502.00883) #4486
What does this PR do?
Adds SimPER (Simple Alignment with Perplexity Optimization) support to the `CPOTrainer`.
Files changed: `cpo_trainer.py`, `cpo_config.py`
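As a quick illustration of how the new option might be used (a hypothetical sketch: the exact argument name, e.g. `loss_type="simper"`, is an assumption about this PR rather than confirmed API; `model`, `tokenizer`, and `dataset` are placeholders):

```python
from trl import CPOConfig, CPOTrainer

# Hypothetical sketch: assumes this PR exposes SimPER via CPOConfig's loss_type.
# The actual option name may differ in the merged code.
config = CPOConfig(
    output_dir="simper-model",
    loss_type="simper",  # assumed new value added by this PR
)

trainer = CPOTrainer(
    model=model,                 # placeholder: any causal LM
    args=config,
    train_dataset=dataset,       # placeholder: preference dataset with chosen/rejected pairs
    processing_class=tokenizer,  # placeholder tokenizer
)
trainer.train()
```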
Motivation / context:
Adding SimPER to the TRL library gives users a simple and efficient method for Large Language Model (LLM) alignment that needs no preference-specific hyperparameters and no reference model.
Proposing SimPER: Simple Alignment with Perplexity Optimization
Short Description of the Method and Link to the Paper
SimPER (Simple alignment with Perplexity optimization) is a minimalist, hyperparameter-free preference optimization algorithm for aligning Large Language Models (LLMs).
The method simplifies alignment by optimizing perplexity alone: it minimizes the perplexity of chosen responses and maximizes the perplexity of rejected responses in a preference dataset (equivalently, it maximizes the inverse perplexity of chosen responses and minimizes that of rejected ones). This eliminates preference-specific hyperparameters (such as $\beta$ in DPO) and the separate reference model, making it computationally and memory efficient. The paper reports strong performance, often significantly outperforming methods like DPO, KTO, and SimPO on benchmarks such as AlpacaEval 2 and the Open LLM Leaderboard.
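For concreteness, the objective reduces to a reference-free difference of inverse perplexities (the exponential of the length-normalized log-likelihood). A minimal PyTorch sketch, assuming `chosen_logps` and `rejected_logps` are the average per-token log-probabilities of the chosen and rejected responses (an illustration of the idea, not necessarily the exact code in this PR):

```python
import torch


def simper_loss(chosen_logps: torch.Tensor, rejected_logps: torch.Tensor) -> torch.Tensor:
    """Illustrative SimPER loss.

    Both inputs are length-normalized (average per-token) log-probabilities of
    shape (batch_size,). exp(average log-prob) is the inverse perplexity of a
    response, so minimizing this loss raises the inverse perplexity of chosen
    responses and lowers it for rejected ones. No beta and no reference model.
    """
    chosen_inv_ppl = torch.exp(chosen_logps)      # 1 / perplexity of chosen responses
    rejected_inv_ppl = torch.exp(rejected_logps)  # 1 / perplexity of rejected responses
    return -(chosen_inv_ppl - rejected_inv_ppl).mean()
```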
https://arxiv.org/abs/2502.00883
Link to the Implementation (if open-sourced)
The SimPER source code is publicly available on GitHub.
https://github.com/tengxiao1/SimPER
Link to Model Weights Trained with the Method (if available)
SimPER has been used as the core training algorithm for a new series of models released by LG AI Research, demonstrating its efficacy in complex, real-world tasks.
https://huggingface.co/LGAI-EXAONE
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
-> readme.md does not mention the CPOTrainer and its associated methods.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.