
Conversation


@colinzhaoxp colinzhaoxp commented Nov 17, 2025

Add compute_metrics parameter for GRPOTrainer

This is my first open-source PR contribution, and I would greatly appreciate any feedback on areas for improvement. Please don't hesitate to suggest changes; I'm eager to learn and make this contribution as good as possible!

What does this PR do?

This PR adds a compute_metrics parameter to GRPOTrainer, mirroring the parameter already supported by Trainer, so that users can compute accuracy or other downstream evaluation metrics over the evaluation dataset.

Fixes related issues

#3729
#2959

Changes Made

Added a compute_metrics parameter to GRPOTrainer
File: trl/trainer/grpo_trainer.py

Added a new optional parameter after num_generations:

from transformers.trainer_utils import seed_worker, EvalPrediction

class GRPOTrainer(BaseTrainer):
    """
    ...
    compute_metrics (`Callable[[EvalPrediction], dict]`, *optional*):
        The function that will be used to compute metrics at evaluation. Must take an [`EvalPrediction`] and
        return a dictionary mapping metric names to metric values. *Note*: when passing `TrainingArguments` with
        `batch_eval_metrics` set to `True`, your `compute_metrics` function must take a boolean `compute_result`
        argument. This will be set to `True` after the last eval batch to signal that the function needs to
        calculate and return the global summary statistics rather than accumulating batch-level statistics.
    ...
    """

    def __init__(
        self,
        ...
        compute_metrics: Callable[[EvalPrediction], dict] | None = None,
        ...
    ):
        ...
        super().__init__(
            ...
            compute_metrics=compute_metrics,
            ...
        )
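
To illustrate the batch_eval_metrics note above: with TrainingArguments(batch_eval_metrics=True), the metrics function is called once per eval batch and receives compute_result=True on the last batch. A rough sketch of what such a function could look like (the accumulator class, the exact_match metric, and the assumption that predictions/label_ids arrive as aligned arrays are purely illustrative, not part of this PR):

class BatchedExactMatch:
    """Accumulates batch-level counts and returns the global metric on the last eval batch."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def __call__(self, eval_pred, compute_result=False):
        preds, labels = eval_pred.predictions, eval_pred.label_ids
        # What predictions/label_ids contain depends on your eval setup; here we assume aligned arrays.
        self.correct += int((preds == labels).sum())
        self.total += len(labels)
        if compute_result:
            # Last eval batch: return the global summary and reset for the next evaluation run.
            metrics = {"exact_match": self.correct / max(self.total, 1)}
            self.correct, self.total = 0, 0
            return metrics
        # Intermediate batches only accumulate; the final call provides the reported metrics.
        return {}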

Example Usage

def my_eval_function(eval_predict):
    # Return a dict mapping metric names to values.
    return {}

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_func,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=my_eval_function,
)
# trainer.train() # evaluation during training
trainer.evaluate() # or directly evaluate your model.
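
Fleshing out the placeholder above, a minimal non-batched metrics function might look like this (the exact_match metric and the assumption that predictions/label_ids come back as aligned NumPy arrays are illustrative, not part of this PR):

import numpy as np

def my_eval_function(eval_predict):
    preds, labels = eval_predict.predictions, eval_predict.label_ids
    # Report a simple exact-match rate; adapt this to whatever your eval data actually contains.
    return {"exact_match": float(np.mean(preds == labels))}

metrics = trainer.evaluate()
print(metrics)  # the returned dict is merged into the eval metrics with an "eval_" prefix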

More examples are available in this blog

Benefits

  • Flexible: Users can choose their own function to evaluate their model during training.

Who can review?

Any community member is welcome to provide feedback.
This is my first open-source contribution and I'm excited to learn, so please don't hesitate to suggest any enhancements!

