Optionally logging reference response #847

vwxyzjn · 2023-10-09T13:34:48Z

This PR optionally logs the reference model's response to wandb alongside the policy's response, so the two can be compared directly on the same queries.

https://wandb.ai/costa-huang/trl/runs/0m8ylgjy
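
A minimal sketch of what such a side-by-side log could contain, assuming each batch carries decoded queries, policy responses, reference responses, and a scalar reward for each. The helpers `build_comparison_rows` and `mean_reward_gap` and the column names are hypothetical, not TRL's API. Rows shaped like this could then be passed to something like `wandb.Table(columns=..., data=...)`.

```python
from typing import List, Sequence, Tuple

# Hypothetical column layout: policy and reference outputs side by side.
COLUMNS = ["query", "response", "ref_response", "reward", "ref_reward"]


def build_comparison_rows(
    queries: Sequence[str],
    responses: Sequence[str],
    ref_responses: Sequence[str],
    rewards: Sequence[float],
    ref_rewards: Sequence[float],
) -> Tuple[List[str], List[list]]:
    """Zip per-sample policy and reference outputs into table rows."""
    n = len(queries)
    if not all(len(col) == n for col in (responses, ref_responses, rewards, ref_rewards)):
        raise ValueError("all columns must have the same length")
    rows = [
        [q, r, rr, float(rw), float(rrw)]
        for q, r, rr, rw, rrw in zip(queries, responses, ref_responses, rewards, ref_rewards)
    ]
    return COLUMNS, rows


def mean_reward_gap(rewards: Sequence[float], ref_rewards: Sequence[float]) -> float:
    """Mean of (policy reward - reference reward); > 0 means the policy scores higher."""
    return sum(r - rr for r, rr in zip(rewards, ref_rewards)) / len(rewards)


columns, rows = build_comparison_rows(
    queries=["The movie was", "I think"],
    responses=[" fantastic!", " it is great."],
    ref_responses=[" okay.", " so."],
    rewards=[2.5, 1.5],
    ref_rewards=[0.5, 0.25],
)
print(rows[0])  # ['The movie was', ' fantastic!', ' okay.', 2.5, 0.5]
print(mean_reward_gap([2.5, 1.5], [0.5, 0.25]))  # 1.625
```

Logging the reference reward next to the policy reward makes it easy to see, per sample, whether a high reward comes from the fine-tuned policy or was already achievable by the reference model.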

I also tested it with PEFT (this required refactoring the logic a bit); see https://wandb.ai/costa-huang/trl/runs/10i8zl27
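
With PEFT there is no separate reference copy: the reference model is the same base model with its adapter switched off. The toy sketch below illustrates that idea; `ToyPeftModel` and `generate_policy_and_ref` are hypothetical stand-ins, although real `peft.PeftModel` objects do expose a `disable_adapter()` context manager used in the same way.

```python
from contextlib import contextmanager
from typing import Iterator, Tuple


class ToyPeftModel:
    """Hypothetical stand-in for a base model plus a LoRA-style adapter."""

    def __init__(self) -> None:
        self.adapter_enabled = True

    @contextmanager
    def disable_adapter(self) -> Iterator[None]:
        # Temporarily route generation through the frozen base weights only.
        previous = self.adapter_enabled
        self.adapter_enabled = False
        try:
            yield
        finally:
            self.adapter_enabled = previous

    def generate(self, query: str) -> str:
        # Pretend the adapter changes the output; the bare base model is the reference.
        suffix = " [policy]" if self.adapter_enabled else " [reference]"
        return query + suffix


def generate_policy_and_ref(model: ToyPeftModel, query: str) -> Tuple[str, str]:
    """Generate with the adapter on (policy), then off (reference), from one model."""
    response = model.generate(query)
    with model.disable_adapter():
        ref_response = model.generate(query)
    return response, ref_response


model = ToyPeftModel()
print(generate_policy_and_ref(model, "Hello"))  # ('Hello [policy]', 'Hello [reference]')
print(model.adapter_enabled)  # True: the adapter is re-enabled after the context exits
```

The design choice here is memory: toggling the adapter avoids holding a second full copy of the model just to produce reference responses.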

HuggingFaceDocBuilderDev · 2023-10-09T13:44:47Z

The documentation is not available anymore as the PR was closed or merged.

lvwerra

Generally like the idea of adding reference responses. Since this touches a lot of PEFT logic, I'll let @younesbelkada make sure it doesn't break anything :)

trl/trainer/ppo_trainer.py

younesbelkada

Hi @vwxyzjn, thanks a lot for iterating! I really like the `optional_peft_ctx` context manager!
I left a suggestion to call `self.accelerator.unwrap_model` only once and use a local variable to pick the correct reference model. What do you think?

trl/trainer/ppo_trainer.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

younesbelkada

Very clean! Thanks very much Costa for your work on this! 🙏

* Optionally logging reference response
* log ref rewards as welll
* peft logic re-write
* fix peft test case
* refactor
* push changes
* test
* Apply suggestions from code review

  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* quick fix
* black

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

vwxyzjn added 2 commits October 9, 2023 09:34

Optionally logging reference response

8a05483

log ref rewards as welll

57c2b5f

vwxyzjn marked this pull request as ready for review October 9, 2023 13:40

vwxyzjn requested review from lvwerra and younesbelkada October 9, 2023 13:40

vwxyzjn added 2 commits October 9, 2023 09:57

peft logic re-write

976110a

fix peft test case

847aa60

edbeeching mentioned this pull request Oct 10, 2023

Fixes reward and text gathering in distributed training #850

Merged

lvwerra reviewed Oct 10, 2023

Merge branch 'main' into logging-ref-response

d79f486

younesbelkada reviewed Oct 30, 2023

trl/trainer/ppo_trainer.py (outdated, resolved)

vwxyzjn added 3 commits October 30, 2023 15:08

refactor

9e33709

push changes

09a448a

test

361b73d

younesbelkada reviewed Oct 31, 2023

vwxyzjn and others added 3 commits October 31, 2023 11:00

Apply suggestions from code review

7c0d8e6

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

quick fix

843be71

black

8eebb59

vwxyzjn requested a review from younesbelkada October 31, 2023 20:20

younesbelkada approved these changes Oct 31, 2023

vwxyzjn merged commit 5b32372 into huggingface:main Oct 31, 2023
8 checks passed