[DPO] IPO Training loss #1022

kashif · 2023-11-22T13:12:07Z

Implemented the IPO training loss from the paper:
A General Theoretical Paradigm to Understand Learning from Human Preferences
https://arxiv.org/pdf/2310.12036.pdf
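
For readers skimming the thread, here is a minimal sketch of how the IPO objective differs from the standard DPO sigmoid loss on a single preference pair. It is not the code from this PR: the function name `preference_loss`, its arguments, and the `loss_type` values are illustrative assumptions, and it uses plain floats instead of tensors so it runs without any dependencies. It also assumes that `beta` plays the role of the paper's regularisation parameter τ.

```python
import math


def preference_loss(policy_chosen_logp, policy_rejected_logp,
                    ref_chosen_logp, ref_rejected_logp,
                    beta=0.1, loss_type="sigmoid"):
    """Loss for one (chosen, rejected) pair; hypothetical helper for illustration."""
    # Log-ratio of chosen vs. rejected under the policy and under the reference model.
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = pi_logratio - ref_logratio

    if loss_type == "sigmoid":
        # DPO: -log(sigmoid(beta * logits)), written in a numerically stable form.
        z = beta * logits
        if z > 0:
            return math.log1p(math.exp(-z))
        return -z + math.log1p(math.exp(z))
    if loss_type == "ipo":
        # IPO: squared distance between the log-ratio gap and 1 / (2 * beta).
        return (logits - 1.0 / (2.0 * beta)) ** 2
    raise ValueError(f"unknown loss_type: {loss_type}")


# With beta = 0.1 the IPO target gap is 1 / (2 * 0.1) = 5, so a gap of 5 gives zero loss.
ipo = preference_loss(-1.0, -6.0, -3.0, -3.0, beta=0.1, loss_type="ipo")
dpo = preference_loss(-3.0, -3.0, -3.0, -3.0, beta=0.1, loss_type="sigmoid")
```

Under these assumptions the IPO loss regresses the log-ratio gap towards a finite target set by `beta`, whereas the sigmoid loss keeps decreasing as the gap grows, which is why the meaning of `beta` is worth documenting separately for each loss.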

HuggingFaceDocBuilderDev · 2023-11-22T13:16:06Z

The documentation is not available anymore as the PR was closed or merged.

younesbelkada

Thanks for this great addition @kashif !

lewtun

Thanks for adding this so quickly @kashif, it's very elegant that you can now just swap out the losses for each method!

I left a nit about where to mention beta, but otherwise this LGTM

trl/trainer/dpo_trainer.py

lvwerra

Generally looks good to me, left one comment

trl/trainer/dpo_trainer.py

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

lvwerra

LGTM 🚀

* initial IPO loss
* fix loss
* fixed comments
* added docs
* fix doc-strings
* add tests
* Update trl/trainer/dpo_trainer.py
  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* fixes for review
* Added doc about beta in the Trainer's docstring

---------

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

initial IPO loss

2899163

kashif added 5 commits November 22, 2023 14:25

fix loss

0dc4d27

fixed comments

52befc3

added docs

195d1f6

fix doc-strings

ed4147c

add tests

9dacd07

younesbelkada approved these changes Nov 23, 2023

lewtun approved these changes Nov 23, 2023

lvwerra reviewed Nov 23, 2023

kashif and others added 3 commits November 23, 2023 12:30

Update trl/trainer/dpo_trainer.py

1caa27f

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

fixes for review

8410b2d

Added doc about beta in the Trainer's docstring

de476f0

lvwerra approved these changes Nov 24, 2023

lvwerra merged commit 55d7c95 into huggingface:main Nov 24, 2023
9 checks passed

kashif deleted the ipo branch November 24, 2023 14:53
