@qgallouedec qgallouedec changed the base branch from main to multi-image-support September 20, 2025 05:11
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec
Member Author

cc @Peter-Chou, customization is made easier with this one

Member

@lewtun lewtun left a comment

LGTM once the tests pass. One question: should we split _generate from scoring entirely?

@Peter-Chou
Contributor

@qgallouedec Yes. The original _generate_and_score_completions method was way too lengthy.
Breaking it down into finer-grained sub-methods and chaining them together like this is an excellent approach!
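
For readers following along, the split described above can be sketched roughly as follows. This is a toy illustration only, not TRL's actual implementation: `ToyTrainer`, `_score`, `num_generations`, and the placeholder generation and reward logic are all assumptions made for the example. Only the method names `_generate` and `_generate_and_score_completions` come from this thread.

```python
from dataclasses import dataclass


@dataclass
class ToyTrainer:
    """Toy stand-in for a trainer (hypothetical; not TRL's GRPOTrainer)."""

    num_generations: int = 2  # assumed: completions sampled per prompt

    def _generate(self, prompts: list[str]) -> list[str]:
        # Placeholder "generation": fake completions, grouped by prompt.
        return [
            f"{prompt} -> completion {i}"
            for prompt in prompts
            for i in range(self.num_generations)
        ]

    def _score(self, completions: list[str]) -> list[float]:
        # Placeholder reward: completion length in characters.
        return [float(len(c)) for c in completions]

    def _generate_and_score_completions(self, prompts: list[str]) -> dict:
        # The formerly monolithic method now only chains the sub-steps.
        completions = self._generate(prompts)
        rewards = self._score(completions)
        return {"completions": completions, "rewards": rewards}
```

With generation isolated like this, a subclass can replace `_generate` without touching the scoring step.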

@qgallouedec qgallouedec changed the title Refactor GRPO to isolate _generate 😷 Refactor GRPO to isolate _generate Sep 23, 2025
@qgallouedec qgallouedec changed the title 😷 Refactor GRPO to isolate _generate 😷 Refactor GRPO/RLOO to isolate _generate Sep 24, 2025
@qgallouedec qgallouedec changed the title 😷 Refactor GRPO/RLOO to isolate _generate [WIP] 😷 Refactor GRPO/RLOO to isolate _generate Sep 25, 2025
@LeonMalteW

I'm quite interested in following this refactoring, as my biggest problem is scaling to context sizes of up to 32K, and I hit a lot of OOM errors right now.
Will the final refactor turn _generate_and_score_completions and all its sub-functions into a pipeline, so that we don't need to keep identical copies of all the data on every GPU and only the necessary data gets synced? Or is this too specific and outside the scope of the refactor?

@qgallouedec
Member Author

This refactoring is actually for enabling multi-turn RL (with tool calling)
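
As a rough sketch of why an isolated `_generate` helps here: a subclass could override it with a loop that alternates model steps and tool calls. Everything below is hypothetical and self-contained (`BaseTrainer`, `MultiTurnTrainer`, `_model_step`, `_run_tool`, `max_turns`, and the `CALL ...` / `TOOL_RESULT=` text format are invented for illustration); it does not reflect TRL's API or this PR's code.

```python
class BaseTrainer:
    """Toy base class (hypothetical; not TRL's API)."""

    def _model_step(self, text: str) -> str:
        # Fake policy: ask for the tool once, then answer from its result.
        if "TOOL_RESULT=" in text:
            return "final answer: " + text.rsplit("TOOL_RESULT=", 1)[1]
        return "CALL add 2 3"

    def _generate(self, prompts: list[str]) -> list[str]:
        # Single-turn generation: one model step per prompt.
        return [self._model_step(p) for p in prompts]


class MultiTurnTrainer(BaseTrainer):
    max_turns = 4  # assumed cap on model/tool round trips

    def _run_tool(self, call: str) -> str:
        # Parse "CALL add <a> <b>" and execute the only tool we know.
        _, name, a, b = call.split()
        if name != "add":
            raise ValueError(f"unknown tool: {name}")
        return str(int(a) + int(b))

    def _generate(self, prompts: list[str]) -> list[str]:
        completions = []
        for prompt in prompts:
            context, output = prompt, ""
            for _ in range(self.max_turns):
                output = self._model_step(context)
                if not output.startswith("CALL "):
                    break  # the model produced a final answer
                # Feed the tool result back into the context and continue.
                context += f"\n{output}\nTOOL_RESULT={self._run_tool(output)}"
            completions.append(output)
        return completions
```

Only `_generate` is overridden; whatever scoring the base trainer performs afterwards would stay unchanged.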
