Add num_generations_eval parameter for efficient evaluation #4458
Conversation
qgallouedec left a comment
Thanks for your contribution, and welcome to the open-source community!
Regarding the PR — I don’t see a scenario where having a different number of generations during evaluation and training would be necessary. I’ll leave the PR open for now to see if the community expresses interest in this feature. If not, we can close it later.
Thank you @qgallouedec for the review! Community need: both issues #3539 and #3566 specifically request this feature because evaluation overhead was a major bottleneck in their training pipelines. You also mentioned in your replies to those issues that this problem needs to be addressed, which confirms the need for this feature.
@qgallouedec, gentle ping. This PR directly addresses the problem you acknowledged in issue #3539 and is also a prerequisite for #3566. A quick decision on it would be appreciated.
Thanks for the PR. We have limited bandwidth, so please be patient.
Thanks for the update. No problem, I completely understand the bandwidth constraints and appreciate the team's hard work. I'll stay patient; please don't hesitate to reach out if you have any questions. I look forward to your review when time permits.



Add `num_generations_eval` parameter for efficient evaluation
This is my first open-source PR contribution, and I would greatly appreciate any feedback on areas for improvement. Please don't hesitate to suggest changes; I'm eager to learn and make this contribution as good as possible!
What does this PR do?
This PR adds support for using a different number of generations during evaluation than during training in `GRPOTrainer`. This allows users to save computation time during evaluation while maintaining training quality.
Fixes #3539 and #3566
Motivation
During training, multiple generations per prompt are often needed for better exploration and diversity. However, during evaluation, fewer generations are typically sufficient to assess model performance. This feature enables more efficient evaluation without compromising training effectiveness.
For example, users can train with 16 generations per prompt but evaluate with only 2 generations, reducing evaluation time by 8x.
Changes Made
1. Added `num_generations_eval` parameter to `GRPOConfig`
File: `trl/trainer/grpo_config.py`
Added a new optional parameter after `num_generations` (see the sketch below).
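A minimal sketch of what the new field could look like, assuming the usual dataclass `field` pattern used in `GRPOConfig`; the class shown here is heavily simplified and the help text and `num_generations` default are illustrative:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GRPOConfig:  # simplified; the real class lives in trl/trainer/grpo_config.py
    # Existing parameter: generations sampled per prompt during training.
    num_generations: int = 8
    # New parameter: generations sampled per prompt during evaluation.
    # None (default) means "use num_generations", preserving current behavior.
    num_generations_eval: Optional[int] = field(
        default=None,
        metadata={"help": "Number of generations per prompt during evaluation."},
    )
```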
2. Modified `GRPOTrainer.__init__` to store the parameter
File: `trl/trainer/grpo_trainer.py`
Added line 383 to store the new parameter (see the sketch below).
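A sketch of the stored attribute, mirroring how `num_generations` is already kept on the trainer; this is an excerpt from `__init__` with the surrounding code omitted:

```python
# Inside GRPOTrainer.__init__, next to the existing num_generations handling.
self.num_generations = args.num_generations  # existing line
# New: may be None; downstream code falls back to num_generations in that case.
self.num_generations_eval = args.num_generations_eval
```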
3. Updated `_get_eval_sampler` method
File: `trl/trainer/grpo_trainer.py`
Modified the eval sampler to use `num_generations_eval` when available (see the sketch below).
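A sketch of the updated eval sampler, assuming it follows the same repeat-sampler pattern as the existing train sampler; the `RepeatSampler` call and its arguments are illustrative:

```python
def _get_eval_sampler(self, eval_dataset):
    # Use the evaluation-specific count when provided, otherwise fall back
    # to the training value so current behavior is unchanged.
    num_generations = (
        self.num_generations_eval
        if self.num_generations_eval is not None
        else self.num_generations
    )
    # Illustrative call; the real implementation mirrors _get_train_sampler.
    return RepeatSampler(
        data_source=eval_dataset,
        mini_repeat_count=num_generations,
        seed=self.args.seed,
    )
```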
4. Updated vLLM server mode generation logic
File: `trl/trainer/grpo_trainer.py` (lines 1166-1173)
Modified to dynamically select the correct number of generations based on mode (see the sketch below).
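A sketch of the mode-dependent selection; the `mode` check and the way the count is passed to the vLLM request are illustrative of the surrounding code:

```python
# Decide how many completions to request per prompt for this batch.
mode = "train" if self.model.training else "eval"  # illustrative mode check
num_generations = (
    self.num_generations_eval
    if mode == "eval" and self.num_generations_eval is not None
    else self.num_generations
)
# The sampling request to the vLLM server then uses this count wherever the
# hard-coded self.num_generations was used before.
```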
5. Updated prompt repetition logic in server mode
File: `trl/trainer/grpo_trainer.py` (lines 1223-1231)
Modified to repeat prompts the correct number of times (see the sketch below).
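A sketch of the prompt repetition change, using a hypothetical local name (`ordered_prompts`) to stand in for the gathered prompt list in that region of the code, and the mode-dependent `num_generations` from the previous sketch:

```python
# Each prompt must appear once per completion returned by the server, so
# repeat prompts with the same mode-dependent count used for generation.
repeated_prompts = [
    prompt
    for prompt in ordered_prompts  # hypothetical name for the gathered prompts
    for _ in range(num_generations)
]
```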
6. Updated reward computation logic
File: `trl/trainer/grpo_trainer.py` (lines 1616-1621)
Modified to handle different generation counts for train/eval modes (see the sketch below).
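A sketch of the reward grouping change; the tensor names are illustrative, the point being that the group size used for per-prompt statistics now matches the count that was actually generated:

```python
# Group rewards by the number of generations used for this batch before
# computing per-prompt statistics (shape: [num_prompts, num_generations]).
grouped_rewards = rewards.view(-1, num_generations)
mean_grouped_rewards = grouped_rewards.mean(dim=1)
std_grouped_rewards = grouped_rewards.std(dim=1)
```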
Summary of Modified Files
- `trl/trainer/grpo_config.py`: Added `num_generations_eval` parameter definition
- `trl/trainer/grpo_trainer.py`: Modified 4 locations, including `__init__` and the `_get_eval_sampler` method
Backward Compatibility
Fully backward compatible: when `num_generations_eval` is `None` (the default), the trainer falls back to using `num_generations`, ensuring existing configurations work without any changes.
Example Usage
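A minimal usage sketch; the output directory is a placeholder and the rest of the training setup is unchanged:

```python
from trl import GRPOConfig

# Train with 16 generations per prompt, but evaluate with only 2 per prompt,
# cutting evaluation-time generation roughly 8x.
training_args = GRPOConfig(
    output_dir="grpo-output",   # placeholder
    num_generations=16,
    num_generations_eval=2,     # new parameter added by this PR; defaults to None
)
```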
Benefits
Faster evaluation (in the example above, 2 eval generations instead of 16 reduces evaluation-time generation by 8x), no impact on training behavior, and full backward compatibility with existing configurations.
Who can review?
This PR is ready for review! Any community member is welcome to provide feedback.
A special thanks to @qgallouedec for considering this PR.
As this is my first open-source contribution, I'm excited to learn; please don't hesitate to suggest any enhancements!