Exposing GenerationConfig in the GRPO Trainer #2702
Comments
More control over generation does make sense. A reasonable way to allow it is probably to add more generation args to the GRPOConfig.
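To make the suggestion above concrete, here is a minimal sketch of generation args living on a config object. `GenerationArgsSketch` and `to_generation_kwargs` are hypothetical names for illustration, not part of TRL. The field names mirror attributes of the Hugging Face `GenerationConfig`, and the defaults are assumptions rather than the library's actual values.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class GenerationArgsSketch:
    """Hypothetical stand-in for generation fields on a trainer config."""

    max_new_tokens: int = 256          # assumed default, for illustration only
    temperature: float = 1.0           # assumed default, for illustration only
    top_p: Optional[float] = None
    top_k: Optional[int] = None
    repetition_penalty: Optional[float] = None
    stop_strings: Optional[List[str]] = None

    def to_generation_kwargs(self) -> dict:
        # Keep only the values that were set, so that anything left as None
        # falls back to whatever default the generation backend uses.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


args = GenerationArgsSketch(temperature=0.7, stop_strings=["</answer>"])
kwargs = args.to_generation_kwargs()
print(kwargs)
```

The resulting dict could then be forwarded to whatever builds the generation config inside the trainer. Dropping unset values keeps the config from silently overriding backend defaults.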
Yes, I will contribute this feature.
Please add stop_strings or stopping criteria :) Although I don't see why we wouldn't expose the full generation config, to avoid the next issue of this type in a few weeks.
@qgallouedec wdyt? Otherwise, we should just directly expose the entire generation config, because there are all kinds of tricks that people might want to tune there.
Would love to see this in Online DPO as well. Currently it's hard-coded to
Since we are using vLLM now, this might need some extra design. Either we expose one of them (preferably HF) and convert it automatically into the vLLM config, or we allow two mutually exclusive configs to be supplied. But I guess explicit is better than implicit.
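A rough sketch of the two options mentioned above: converting HF-style generation settings to vLLM-style names, or accepting two mutually exclusive settings. The helpers `hf_to_vllm_kwargs` and `resolve_sampling_kwargs` are hypothetical and not part of TRL. The name mapping is an assumption based on the parameter names of the HF `GenerationConfig` and vLLM `SamplingParams`, and it ignores cases where the two libraries treat the same value differently.

```python
from typing import Optional

# HF-style name -> vLLM-style name. Assumed mapping, for illustration only.
HF_TO_VLLM = {
    "max_new_tokens": "max_tokens",
    "stop_strings": "stop",
    "temperature": "temperature",
    "top_p": "top_p",
    "top_k": "top_k",
    "repetition_penalty": "repetition_penalty",
}


def hf_to_vllm_kwargs(hf_kwargs: dict) -> dict:
    """Rename HF-style keys to vLLM-style keys, failing loudly on unknown ones."""
    unknown = set(hf_kwargs) - set(HF_TO_VLLM)
    if unknown:
        # Explicit is better than implicit: never drop a setting silently.
        raise ValueError(f"No vLLM equivalent known for: {sorted(unknown)}")
    return {HF_TO_VLLM[key]: value for key, value in hf_kwargs.items()}


def resolve_sampling_kwargs(
    hf_kwargs: Optional[dict] = None,
    vllm_kwargs: Optional[dict] = None,
) -> dict:
    """Return vLLM-style kwargs from exactly one of the two inputs."""
    if hf_kwargs is not None and vllm_kwargs is not None:
        raise ValueError("Pass either hf_kwargs or vllm_kwargs, not both.")
    if vllm_kwargs is not None:
        return dict(vllm_kwargs)
    return hf_to_vllm_kwargs(hf_kwargs or {})


converted = resolve_sampling_kwargs(
    hf_kwargs={"max_new_tokens": 64, "stop_strings": ["\n\n"]}
)
print(converted)
```

Raising on unmapped keys means a setting that cannot be translated is reported rather than ignored, which keeps the automatic conversion from hiding differences between the two backends.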
Feature request
People often need to customize the generation config, but it is currently embedded in the training loop. It should be easy to extract.
Motivation
Customization
Your contribution
I can help to contribute.