😷 Refactor GRPO/RLOO to isolate `_generate` #4114

qgallouedec · 2025-09-20T05:11:36Z

This PR is part of a sequence of PRs that aims to refactor the generation part of GRPO/RLOO, to allow easier customization and, ultimately, tool calling:

🧺 [1/N] Refactor _generate in GRPO/RLOO: list of ints instead of tensors #4146
🧺 [2/N] Refactor _generate in GRPO/RLOO: Use prompt_ids from generation #4152
🧺 [3/N] Refactor _generate in GRPO/RLOO: Rely on generator for prompt truncation #4153
🧺 [4/N] Refactor _generate in GRPO/RLOO: Move forward_kwargs outside generation method #4154
🧺 [5/N] Refactor _generate in GRPO/RLOO: Insert images in the prompt #4155
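Item [1/N] switches `_generate` to lists of ints instead of tensors. This sketch shows why that is convenient. It assumes completions have different lengths. They can stay unpadded Python lists and be padded into a rectangular batch, with an attention mask, only once a tensor is actually needed. `pad_token_ids` is a hypothetical helper written for illustration and is not TRL's API.

```python
from typing import List, Tuple


def pad_token_ids(
    sequences: List[List[int]], pad_token_id: int, padding_side: str = "right"
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: pad variable-length token-id lists and build an attention mask."""
    max_len = max((len(seq) for seq in sequences), default=0)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        if padding_side == "left":
            # Left padding, as is usual for prompts fed to a decoder.
            padded.append([pad_token_id] * n_pad + seq)
            mask.append([0] * n_pad + [1] * len(seq))
        elif padding_side == "right":
            # Right padding, as is usual for completions.
            padded.append(seq + [pad_token_id] * n_pad)
            mask.append([1] * len(seq) + [0] * n_pad)
        else:
            raise ValueError(f"padding_side must be 'left' or 'right', got {padding_side!r}")
    return padded, mask


# Completions of different lengths, kept as plain lists until a batch is needed.
completion_ids = [[11, 12, 13], [21], [31, 32]]
padded, mask = pad_token_ids(completion_ids, pad_token_id=0)
print(padded)  # [[11, 12, 13], [21, 0, 0], [31, 32, 0]]
print(mask)    # [[1, 1, 1], [1, 0, 0], [1, 1, 0]]
```

Keeping ragged lists until the last moment avoids carrying padding through generation-side logic. That logic could include per-sample truncation or appending extra tokens.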

HuggingFaceDocBuilderDev · 2025-09-20T05:15:30Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2025-09-20T17:14:32Z

cc @Peter-Chou, customization is made easier with this one

lewtun

LGTM once the tests pass. One open question: should we split `_generate` from scoring entirely?
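The question above concerns the scoring half of `_generate_and_score_completions`. This sketch illustrates what a standalone scoring step could compute once rewards are known. It covers the two advantage estimators the trainers are named after: GRPO's group-normalized advantage and RLOO's leave-one-out baseline. It assumes rewards arrive as a flat list in which each consecutive run of `group_size` entries belongs to the same prompt. The function names, the population standard deviation, and `eps=1e-4` are illustrative choices and not necessarily what TRL does.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(rewards: List[float], group_size: int, eps: float = 1e-4) -> List[float]:
    """Group-relative advantage: (r - group mean) / (group std + eps)."""
    if group_size < 1 or len(rewards) % group_size != 0:
        raise ValueError("len(rewards) must be a multiple of a positive group_size")
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start : start + group_size]
        mu, sigma = mean(group), pstdev(group)  # population std (an assumption)
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


def rloo_advantages(rewards: List[float], group_size: int) -> List[float]:
    """Leave-one-out advantage: r_i minus the mean reward of the other samples in its group."""
    if group_size < 2 or len(rewards) % group_size != 0:
        raise ValueError("need group_size >= 2 and len(rewards) divisible by group_size")
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start : start + group_size]
        total = sum(group)
        advantages.extend(r - (total - r) / (group_size - 1) for r in group)
    return advantages


# Two prompts, two completions each.
rewards = [1.0, 0.0, 0.5, 0.5]
print(rloo_advantages(rewards, group_size=2))  # [1.0, -1.0, 0.0, 0.0]
```

Both functions need only the rewards, not the generation internals. That independence is what would make a clean split between generating and scoring possible.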

trl/trainer/grpo_trainer.py

Peter-Chou · 2025-09-22T13:31:55Z

@qgallouedec Yes. The original `_generate_and_score_completions` method was way too lengthy.
Breaking it down into finer-grained sub-methods and chaining them together like this is an excellent approach!
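The decomposition described above can be sketched with a toy class. Only the names `_generate` and `_generate_and_score_completions` come from this thread. `ToyTrainer`, `EchoTrainer`, `_score`, and the fake generation and reward logic are hypothetical stand-ins, not `trl.GRPOTrainer`. The sketch shows the customization pattern the PR presumably enables: a subclass overrides only `_generate`, while the orchestrating method and the scoring step are inherited unchanged.

```python
from typing import Any, Dict, List


class ToyTrainer:
    """Hypothetical stand-in for a GRPO/RLOO-style trainer (not trl.GRPOTrainer)."""

    def __init__(self, num_generations: int = 2) -> None:
        self.num_generations = num_generations

    def _generate(self, prompts: List[str]) -> List[List[int]]:
        # Default "generation": one fake completion (token-id list) per prompt copy.
        return [[len(p)] for p in prompts for _ in range(self.num_generations)]

    def _score(self, completion_ids: List[List[int]]) -> List[float]:
        # Placeholder reward: the completion length.
        return [float(len(ids)) for ids in completion_ids]

    def _generate_and_score_completions(self, prompts: List[str]) -> Dict[str, Any]:
        # Orchestrator: chains the finer-grained steps instead of doing everything inline.
        completion_ids = self._generate(prompts)
        rewards = self._score(completion_ids)
        return {"completion_ids": completion_ids, "rewards": rewards}


class EchoTrainer(ToyTrainer):
    # Customization point: only generation is replaced; scoring is inherited.
    def _generate(self, prompts: List[str]) -> List[List[int]]:
        return [[ord(c) for c in p] for p in prompts for _ in range(self.num_generations)]


out = EchoTrainer(num_generations=2)._generate_and_score_completions(["hi"])
print(out)  # {'completion_ids': [[104, 105], [104, 105]], 'rewards': [2.0, 2.0]}
```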

LeonMalteW · 2025-10-22T17:30:41Z

I'm quite interested in following this refactoring. My biggest problem is scaling to context sizes of up to 32K, where I currently hit a lot of OOM errors.
Will the final refactor turn `_generate_and_score_completions` and all its sub-functions into a pipeline? Then we wouldn't need to keep identical copies of all the data on every GPU, and only the necessary data would get synced. Or is that too specific and outside the scope of the refactor?

qgallouedec · 2025-10-22T20:32:30Z

This refactoring is actually meant to enable multi-turn RL (with tool calling).
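To give context on that goal, this sketch shows the kind of multi-turn loop that an isolated `_generate` makes easier to write. Generation is called once per turn, and any tool output is appended to the context before the next call. `multi_turn_rollout`, `parse_tool_call`, the token-level toy tools, and `max_turns` are all hypothetical. This is not a description of how TRL implements tool calling.

```python
from typing import Callable, Dict, List, Optional


def multi_turn_rollout(
    prompt: List[int],
    generate: Callable[[List[int]], List[int]],
    parse_tool_call: Callable[[List[int]], Optional[str]],
    tools: Dict[str, Callable[[], List[int]]],
    max_turns: int = 3,
) -> List[int]:
    """Hypothetical loop alternating single-turn generation with tool execution."""
    context = list(prompt)
    for _ in range(max_turns):  # max_turns caps the number of generation calls
        completion = generate(context)  # one isolated generation call per turn
        context += completion
        tool_name = parse_tool_call(completion)
        if tool_name is None or tool_name not in tools:
            break  # the model answered directly (or asked for an unknown tool)
        context += tools[tool_name]()  # append the tool's output, then generate again
    return context


# Toy model: emits a "tool call" token (1) until a tool result (99) is in context, then answers (2).
gen = lambda ctx: [2] if 99 in ctx else [1]
parse = lambda comp: "calc" if comp == [1] else None
tools = {"calc": lambda: [99]}
print(multi_turn_rollout([0], gen, parse, tools))  # [0, 1, 99, 2]
```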

qgallouedec added 16 commits September 19, 2025 20:57

Refactor image handling: replace `image_split_sizes` with `image_grid_thw` in GRPO and RLOO trainers; update `split_pixel_values_by_grid` to use `image_grid_thw`

552e899

simpler

449ef07

gfpo

c8933aa

multi-image grpo

229c554

log with wandb

3ca6ad5

no vlm reward models

dcf4b92

rloo

30ad7ca

gfpo

86cc30b

fix

088897b

test peft

d2adc63

fix gfpo

f4c82bf

rloo test

1257796

peft rloo

099a39b

oops

529add6

update test

fc6b11f

generate method

ae1f497
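The first commit in the list above replaces `image_split_sizes` with `image_grid_thw` and makes `split_pixel_values_by_grid` use `image_grid_thw`. This is a minimal sketch of the idea, assuming Qwen2-VL-style inputs:

- Each row of `image_grid_thw` holds one image's (temporal, height, width) patch-grid size.
- `pixel_values` stacks every image's patches along the first dimension.

Each image then owns `t * h * w` consecutive rows, so the split sizes can be derived from the grid instead of being passed around separately. `split_by_grid` is a hypothetical name and uses plain lists instead of tensors. It is not TRL's implementation.

```python
from math import prod
from typing import List, Sequence


def split_by_grid(
    pixel_rows: Sequence[Sequence[float]], image_grid_thw: Sequence[Sequence[int]]
) -> List[List[Sequence[float]]]:
    """Hypothetical helper: split stacked patch rows into one chunk per image (t*h*w rows each)."""
    lengths = [prod(thw) for thw in image_grid_thw]  # patches per image
    if sum(lengths) != len(pixel_rows):
        raise ValueError(f"grid implies {sum(lengths)} patch rows, got {len(pixel_rows)}")
    chunks, start = [], 0
    for n in lengths:
        chunks.append(list(pixel_rows[start : start + n]))
        start += n
    return chunks


# Two images: a 1x2x2 grid (4 patches) and a 1x1x2 grid (2 patches).
rows = [[float(i)] for i in range(6)]
chunks = split_by_grid(rows, [[1, 2, 2], [1, 1, 2]])
print([len(c) for c in chunks])  # [4, 2]
```

Deriving the sizes from `image_grid_thw` keeps a single source of truth for each image's size. A separate `image_split_sizes` list would have to stay consistent with it.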

qgallouedec changed the base branch from main to multi-image-support September 20, 2025 05:11

qgallouedec and others added 7 commits September 20, 2025 05:18

debug

f998432

skip failing test

fa73876

Merge branch 'main' into drop-image_split_sizes

52d8bd9

Merge branch 'drop-image_split_sizes' into multi-image-support

dfc0d38

test fixed!

fc52e68

Merge branch 'multi-image-support' into generate-method

4d12aeb

gfpo

4fc2b5b

qgallouedec added 2 commits September 20, 2025 17:15

rm vllm

b628744

fix doc

d3a769f

lewtun approved these changes Sep 22, 2025


trl/trainer/grpo_trainer.py

Merge branch 'main' into generate-method

a6a8c44

qgallouedec changed the title ~~Refactor GRPO to isolate _generate~~ 😷 Refactor GRPO to isolate _generate Sep 23, 2025

qgallouedec and others added 5 commits September 22, 2025 20:21

Merge branch 'main' into generate-method

d8665e1

Merge branch 'main' into generate-method

365d501

Merge branch 'main' into generate-method

cdb4c76

same for rloo

c83e710

nits style and align

ec6ad25

qgallouedec changed the title ~~😷 Refactor GRPO to isolate _generate~~ 😷 Refactor GRPO/RLOO to isolate _generate Sep 24, 2025

qgallouedec and others added 3 commits September 24, 2025 13:57

Merge branch 'main' into generate-method

b4cadde

restart

b0dceb9

progress

ebe32c2

qgallouedec changed the title ~~😷 Refactor GRPO/RLOO to isolate _generate~~ [WIP] 😷 Refactor GRPO/RLOO to isolate _generate Sep 25, 2025

qgallouedec and others added 5 commits September 25, 2025 18:24

progress continues

0213662

progress again again

8b3a724

back to working point

c1ae6aa

revert change data utils

1a66b43

Merge branch 'main' into generate-method

2dc69a6

qgallouedec changed the title ~~[WIP] 😷 Refactor GRPO/RLOO to isolate _generate~~ 😷 Refactor GRPO/RLOO to isolate _generate Sep 26, 2025

qgallouedec merged commit 9603b41 into main Sep 26, 2025
5 of 12 checks passed

qgallouedec deleted the generate-method branch September 26, 2025 02:48

qgallouedec mentioned this pull request Sep 26, 2025

🧺 [1/N] Refactor _generate in GRPO/RLOO: list of ints instead of tensors #4146

Merged

kashif pushed a commit that referenced this pull request Sep 30, 2025

😷 Refactor GRPO/RLOO to isolate _generate (#4114)

85ff28d

6 participants