[DPO] IPO Training loss #1022
The documentation is not available anymore as the PR was closed or merged.
Thanks for this great addition @kashif !
Thanks for adding this so quickly @kashif, it's very elegant that you can now just swap out the losses for each method!
I left a nit about where to mention `beta`, but otherwise this LGTM
Generally looks good to me, left one comment
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
LGTM 🚀
* initial IPO loss
* fix loss
* fixed comments
* added docs
* fix doc-strings
* add tests
* Update trl/trainer/dpo_trainer.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* fixes for review
* Added doc about beta in the Trainer's docstring

---------

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Implemented the IPO training loss from the paper:
A General Theoretical Paradigm to Understand Learning from Human Preferences
https://arxiv.org/pdf/2310.12036.pdf
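
For readers skimming this thread, here is a rough sketch of how the IPO objective differs from the default sigmoid DPO objective for a single preference pair. It is not the code merged in this PR: the function name `preference_loss`, its arguments, and the `loss_type` strings are hypothetical and chosen only for illustration, and it uses plain Python floats rather than tensors. It assumes the per-sequence log-probabilities of the chosen and rejected completions are already computed for both the policy and the reference model. Under IPO, `beta` acts as a regularisation strength: the loss pulls the log-ratio gap toward the target `1 / (2 * beta)` instead of pushing it ever larger.

```python
import math


def preference_loss(policy_chosen_logp, policy_rejected_logp,
                    ref_chosen_logp, ref_rejected_logp,
                    beta=0.1, loss_type="sigmoid"):
    """Illustrative per-example preference loss (hypothetical helper, not the PR's code)."""
    # Gap between the policy's and the reference model's chosen-vs-rejected log-ratios.
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    h = pi_logratio - ref_logratio

    if loss_type == "sigmoid":
        # DPO: -log(sigmoid(beta * h)), written as a numerically stable softplus(-x).
        x = beta * h
        return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
    if loss_type == "ipo":
        # IPO: squared distance of the gap from the target 1 / (2 * beta).
        return (h - 1.0 / (2.0 * beta)) ** 2
    raise ValueError(f"unknown loss_type: {loss_type!r}")
```

With `beta=0.1` the IPO target gap is `1 / (2 * 0.1) = 5`, so a pair whose gap is exactly 5 incurs zero IPO loss, while the sigmoid loss would still reward widening the gap further.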