feat: anchored pref optimization #1928

karel-contextual · 2024-08-13T23:00:50Z

Add the APO objectives, specifically equations 7 and 8 of the APO paper (https://huggingface.co/papers/2408.06266).
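For readers unfamiliar with the objectives, here is a minimal per-example sketch of the two variants usually called APO-zero and APO-down. It assumes the sigmoid-based forms as I read them from the paper; the function names, the scalar (non-batched) signature, and the default `beta` are illustrative and are not the code added in this PR.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def apo_zero_loss(chosen_logratio: float, rejected_logratio: float, beta: float = 0.1) -> float:
    """Sketch of APO-zero (assumed form).

    Each log-ratio is log pi_theta(y|x) - log pi_ref(y|x) for one completion.
    The loss pushes the chosen log-ratio up and the rejected log-ratio down.
    """
    loss_chosen = 1.0 - sigmoid(beta * chosen_logratio)
    loss_rejected = sigmoid(beta * rejected_logratio)
    return loss_chosen + loss_rejected


def apo_down_loss(chosen_logratio: float, rejected_logratio: float, beta: float = 0.1) -> float:
    """Sketch of APO-down (assumed form).

    The loss pushes the chosen log-ratio down, and pushes the rejected
    log-ratio down further so that the chosen-minus-rejected margin grows.
    """
    loss_chosen = sigmoid(beta * chosen_logratio)
    loss_rejected = 1.0 - sigmoid(beta * (chosen_logratio - rejected_logratio))
    return loss_chosen + loss_rejected
```

Under these assumptions both losses equal 1.0 when the policy matches the reference model (both log-ratios are zero), which gives a quick sanity check for any batched implementation.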

qgallouedec · 2024-08-14T11:50:02Z

Nice, thanks! I'll take the opportunity to update the documentation for the losses we already support (in another PR that I'd like to merge first).

HuggingFaceDocBuilderDev · 2024-08-14T11:53:00Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2024-08-14T14:40:08Z

#1929

kashif · 2024-08-14T14:44:10Z

@karel-contextual can you kindly run:

pre-commit run --all-files

in the root directory of TRL to fix the formatting?

qgallouedec · 2024-08-14T15:37:17Z

LGTM now, thanks @karel-contextual!

karel-contextual and others added 2 commits August 13, 2024 22:27

feat: anchored pref optimization

d47d1f2

Merge branch 'main' into main

edc0fc7

Merge branch 'main' into main

8e52225

kashif approved these changes Aug 14, 2024

Merge branch 'main' into main

0105ef3

kashif reviewed Aug 14, 2024

trl/trainer/dpo_trainer.py (outdated, resolved)

kashif and others added 8 commits August 14, 2024 16:58

Update trl/trainer/dpo_trainer.py

e125e7c

format and properly deprecate loss_type

314db55

Merge branch 'main' of https://github.com/karel-contextual/trl-apo into apo

120f497

add aot in error message and reorder

5a3c73e

add "sppo_hard", "nca_pair" in label_smoothing warning

fac99bb

add tests

9c6f4da

doc

023c425

doc fixes

c6ff3d8

qgallouedec merged commit a7dc892 into huggingface:main Aug 14, 2024
9 checks passed
