-
Notifications
You must be signed in to change notification settings - Fork 2.3k
😷 Refactor GRPO/RLOO to isolate _generate
#4114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…_thw` in GRPO and RLOO trainers; update `split_pixel_values_by_grid` to use `image_grid_thw`
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
cc @Peter-Chou, customization is made easier with this one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the tests pass and a question about whether we should split _generate from scoring entirely
|
@qgallouedec Yes. The original |
_generate_generate
_generate_generate
_generate_generate
_generate_generate
|
I'm quite interested in following the refactoring process, as the biggest problem for me is scaling to context sizes of up to 32K, and a lot of OOM bombs get hit right now. |
|
This refactoring is actually for enabling multi-turn RL (with tool calling) |
This PR belongs to a sequence of PR that aims to refactor the generation part of GRPO/RLOO to allow for easier customization and ultimately tool calling
Previous:
image_split_sizesin favour ofimage_grid_thw#4111Next:
_generatein GRPO/RLOO: list of ints instead of tensors #4146_generatein GRPO/RLOO: Useprompt_idsfrom generation #4152_generatein GRPO/RLOO: Rely on generator for prompt truncation #4153_generatein GRPO/RLOO: Moveforward_kwargsoutside generation method #4154_generatein GRPO/RLOO: Insert images in the prompt #4155