
Conversation


@colinzhaoxp colinzhaoxp commented Nov 17, 2025

Add compute_metrics parameter for GRPOTrainer

This is my first open-source PR contribution, and I would greatly appreciate any feedback on areas for improvement. Please don't hesitate to suggest changes; I'm eager to learn and make this contribution as good as possible!

What does this PR do?

This PR adds a compute_metrics parameter to GRPOTrainer, mirroring the parameter already supported by Trainer, so that users can compute accuracy or other downstream evaluation metrics over the evaluation dataset.

Fixes related issues

#3729
#2959

Changes Made

Added a compute_metrics parameter to GRPOTrainer
File: trl/trainer/grpo_trainer.py

Added a new optional parameter after num_generations:

from transformers.trainer_utils import seed_worker, EvalPrediction

class GRPOTrainer(BaseTrainer):
    """
    ...
    compute_metrics (`Callable[[EvalPrediction], dict]`, *optional*):
        The function that will be used to compute metrics at evaluation. Must take an [`EvalPrediction`] and
        return a dictionary mapping metric names to metric values. *Note*: when passing `TrainingArguments` with
        `batch_eval_metrics` set to `True`, your `compute_metrics` function must take a boolean `compute_result`
        argument. This will be set to `True` after the last eval batch to signal that the function needs to
        calculate and return the global summary statistics rather than accumulating batch-level statistics.
    ...
    """

    def __init__(
        self,
        ...
        compute_metrics: Callable[[EvalPrediction], dict] | None = None,
        ...
    ):
        ...
        super().__init__(
            ...
            compute_metrics=compute_metrics,
            ...
        )
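
To illustrate the batch_eval_metrics note above: with TrainingArguments(batch_eval_metrics=True), the metrics function is called once per eval batch and receives compute_result=True on the last batch. A rough sketch of what such a function could look like (the accumulator class, the exact_match metric, and the assumption that predictions/label_ids arrive as aligned arrays are purely illustrative, not part of this PR):

class BatchedExactMatch:
    """Accumulates batch-level counts and returns the global metric on the last eval batch."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def __call__(self, eval_pred, compute_result=False):
        preds, labels = eval_pred.predictions, eval_pred.label_ids
        # What predictions/label_ids contain depends on your eval setup; here we assume aligned arrays.
        self.correct += int((preds == labels).sum())
        self.total += len(labels)
        if compute_result:
            # Last eval batch: return the global summary and reset for the next evaluation run.
            metrics = {"exact_match": self.correct / max(self.total, 1)}
            self.correct, self.total = 0, 0
            return metrics
        # Intermediate batches only accumulate; the final call provides the reported metrics.
        return {}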

Example Usage

def my_eval_function(eval_predict):
    # Return a dict mapping metric names to values.
    return {}

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_func,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=my_eval_function,
)
# trainer.train() # evaluation during training
trainer.evaluate() # or directly evaluate your model.
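
Fleshing out the placeholder above, a minimal non-batched metrics function might look like this (the exact_match metric and the assumption that predictions/label_ids come back as aligned NumPy arrays are illustrative, not part of this PR):

import numpy as np

def my_eval_function(eval_predict):
    preds, labels = eval_predict.predictions, eval_predict.label_ids
    # Report a simple exact-match rate; adapt this to whatever your eval data actually contains.
    return {"exact_match": float(np.mean(preds == labels))}

metrics = trainer.evaluate()
print(metrics)  # the returned dict is merged into the eval metrics with an "eval_" prefix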

More examples are available in this blog

Benefits

  • Flexible: Users can choose their own function to evaluate their model during training.

Who can review?

Any community member is welcome to provide feedback.
This is my first open-source contribution and I'm excited to learn, so please don't hesitate to suggest any enhancements!

