docs: add initial version of docs for PPOTrainer
#665
The documentation is not available anymore as the PR was closed or merged.
Thanks a lot for the docs contribution! The preview is also working now :) The PR looks in pretty good shape to me! I added some small suggestions here and there. I'll also let @vwxyzjn and @younesbelkada have a look.
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
- specified reference to reward model
- added batched generator
- added line of saving model
- remove reference model
@lvwerra I have processed your comments and suggestions.
Looks good to me, some last small nits only!
This is very cool! Thanks a lot for your great effort on this!
* docs: add initial version of docs for `PPOTrainer`
* Apply suggestions from code review Leandro

  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Apply suggestions from code review

  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* updated docs based on feedback leandro
  - specified reference to reward model
  - added batched generator
  - added line of saving model
  - remove reference model
* Apply suggestions from code review

  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

---------

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
As discussed in #623, I am proposing more elaborate docs for the `PPOTrainer`.

Closes #623