docs: Extend CLI basic usage examples to all supported CLIs #4425
Conversation
Resolves huggingface#4378

- Add GRPO CLI examples with the `trl-lib/ultrafeedback-prompt` dataset
- Add RLOO CLI examples with the `AI-MO/NuminaMath-TIR` dataset
- Add KTO CLI examples with the `trl-lib/kto-mix-14k` dataset
- Add examples to all sections: Basic Usage, Config Files, Accelerate, `accelerate_config`, and dataset mixtures
- Ensure parity in documentation coverage across all 6 training CLIs
- Verified CLI commands exist in `trl/cli.py` (lines 47-51)
- Verified datasets match official examples in `examples/scripts/`
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
sergiopaniego left a comment:
We could simplify the `<hfoptions>` by giving all of them the same id, so that once you click on your preferred trainer, all of the option groups switch to it.

For the "... inline" / "... w/ config file" cases, we can merge these sections, concatenating both and using only the trainer name, so the options would just list the trainer names.
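For illustration, a rough sketch of the suggested markup (the shared `trainers` id follows the follow-up commit below; per the review comment, option groups sharing an id stay in sync when the reader picks a trainer):

```html
<!-- Sketch: one <hfoptions> group per section, all sharing the same id -->
<hfoptions id="trainers">
<hfoption id="SFT">
<!-- SFT example for this section -->
</hfoption>
<hfoption id="GRPO">
<!-- GRPO example for this section -->
</hfoption>
<!-- ... one <hfoption> per trainer ... -->
</hfoptions>
```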
Addresses review comments on PR #4425:

1. Unified all `<hfoptions>` IDs to `trainers` (5 sections)
   - Enables persistent trainer selection across all documentation sections
   - Improves the user experience when navigating between examples
2. Consolidated inline/config-file options (12 pairs → 12 single options)
   - Merged the separate "inline" and "w/ config file" options
   - Used the "Or with config file:" pattern for clarity
   - Applied to Sections 3 (Scaling) and 4 (Accelerate Config)

Result: 30 balanced options (6 trainers × 5 sections) with improved navigation.
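Concretely, the "Or with config file:" consolidation might render a single option like this (a sketch assembled from the diff hunks below):

````md
<hfoption id="GRPO">

```bash
trl grpo \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/ultrafeedback-prompt
```

Or with config file:

```bash
trl grpo --config grpo_config.yaml
```

</hfoption>
````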
```bash
trl grpo \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/ultrafeedback-prompt
```

</hfoption>
<hfoption id="RLOO">

```bash
trl rloo \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name AI-MO/NuminaMath-TIR
```
Both examples are missing a reward function or reward model.
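A possible fix, sketched under the assumption that the GRPO script accepts a `--reward_model_name_or_path` flag (unverified in this thread; the RLOO example would take the same addition, and `trl-lib/Qwen2-0.5B-Reward` stands in for any sequence-classification reward model):

```bash
# Sketch: adds a reward model to the GRPO example; the flag name and the
# reward model id are assumptions, not confirmed in this thread
trl grpo \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --reward_model_name_or_path trl-lib/Qwen2-0.5B-Reward \
    --dataset_name trl-lib/ultrafeedback-prompt
```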
```yaml
# grpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: trl-lib/ultrafeedback-prompt
```

Launch with:

```bash
trl grpo --config grpo_config.yaml
```

</hfoption>
<hfoption id="RLOO">

```yaml
# rloo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: AI-MO/NuminaMath-TIR
```
Same here.
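The same assumption, applied to the config-file variant:

```yaml
# grpo_config.yaml — sketch; reward_model_name_or_path is an assumed field
# name mirroring the CLI flag above, and the reward model id is illustrative
model_name_or_path: Qwen/Qwen2.5-0.5B
reward_model_name_or_path: trl-lib/Qwen2-0.5B-Reward
dataset_name: trl-lib/ultrafeedback-prompt
```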
Resolves #4378
Summary
Extends the CLI documentation to include basic usage examples for GRPO, RLOO, and KTO, achieving parity with the existing SFT, DPO, and Reward examples.
Changes
- GRPO CLI examples with the `trl-lib/ultrafeedback-prompt` dataset
- RLOO CLI examples with the `AI-MO/NuminaMath-TIR` dataset
- KTO CLI examples with the `trl-lib/kto-mix-14k` dataset

All three CLIs now have examples in every documentation section:
- Basic Usage
- Config Files
- Scaling with Accelerate
- `--accelerate_config`
- Dataset mixtures

Verification
- CLI commands verified to exist in `trl/cli.py` (lines 47-51)
- Datasets match the official examples in `examples/scripts/`

Testing
Documentation structure was validated with a Python script to verify that all `hfoptions` blocks are properly balanced.
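The script itself is not included in the PR description; a minimal sketch of what such a balance check could look like (the doc path and the exact checks are assumptions):

```python
import re
import sys
from pathlib import Path


def check_hfoptions_balance(path: Path) -> list[str]:
    """Report unbalanced <hfoptions>/<hfoption> tags in a markdown doc."""
    text = path.read_text(encoding="utf-8")
    errors = []
    for tag in ("hfoptions", "hfoption"):
        # \b keeps <hfoption from also matching <hfoptions
        opens = len(re.findall(rf"<{tag}\b", text))
        closes = len(re.findall(rf"</{tag}>", text))
        if opens != closes:
            errors.append(f"{path}: {opens} <{tag}> vs {closes} </{tag}>")
    return errors


if __name__ == "__main__":
    # Hypothetical path to the CLI docs page touched by this PR
    problems = check_hfoptions_balance(Path("docs/source/clis.md"))
    sys.exit("\n".join(problems) if problems else 0)
```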