🗂 Update paper_index section #3937
455f543 to b9859c6
Can you also mention that we don't support dynamic sampling?
- Added DAPO (An Open-Source LLM Reinforcement Learning System at Scale) section
- Includes proper paper reference and implementation details
- Added training configuration parameters from DAPO paper section 4.1
b9859c6 to 7c4665a
Done. I also added
- Added Dr. GRPO configuration example with training parameters
- Includes paper reference and implementation details from training section
- Added parameters: loss_type, batch_size, num_generations, prompt/completion lengths, beta
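For readers unfamiliar with the parameters named in this commit, here is a minimal sketch of how they might map onto keyword arguments for `trl.GRPOConfig`. The argument names reflect TRL's GRPO configuration as understood at the time of this PR (`max_prompt_length` in particular is an assumption and may differ between TRL versions), and every numeric value is a placeholder rather than a setting from the Dr. GRPO paper or repository. The `is_batch_compatible` helper is hypothetical; it only illustrates the assumption that the global batch should be divisible by `num_generations`.

```python
# Illustrative keyword arguments for trl.GRPOConfig, covering the parameter
# families listed in the commit message. All numeric values are placeholders,
# NOT the settings from the Dr. GRPO paper or repository.
dr_grpo_kwargs = {
    "loss_type": "dr_grpo",             # selects the Dr. GRPO loss variant
    "per_device_train_batch_size": 8,   # "batch_size" in the commit message
    "num_generations": 8,               # completions sampled per prompt
    "max_prompt_length": 512,           # assumed name; may vary by TRL version
    "max_completion_length": 1024,
    "beta": 0.0,                        # KL coefficient; 0.0 disables the KL term
}


def is_batch_compatible(kwargs, num_processes=1):
    """Hypothetical helper: check that the global batch divides evenly
    into groups of `num_generations` completions (an assumption about
    what GRPO-style training expects)."""
    global_batch = kwargs["per_device_train_batch_size"] * num_processes
    return global_batch % kwargs["num_generations"] == 0


def build_config(kwargs):
    """Build the config; requires TRL to be installed."""
    from trl import GRPOConfig  # imported lazily so the sketch loads without TRL
    return GRPOConfig(**kwargs)
```

With TRL installed, `build_config(dr_grpo_kwargs)` would produce the configuration object to pass to the trainer; the values actually documented in the paper index should be taken from the PR diff itself.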
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
qgallouedec left a comment
it's looking gooooood
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: behroozazarkhalili <ermiaazarkhalili> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Summary
This PR updates the paper index documentation to include additional research papers and their corresponding implementation configurations within the TRL package.
Changes
This update expands the paper index to provide better coverage of research implementations available in TRL, making it easier for users to find and use relevant configurations for their research and experiments.