[Feature, NOMERGE] RLHF Rollouts (reopened) #1329

vmoens · 2023-06-28T09:16:34Z

See #1315 for discussion

vmoens · 2023-06-28T09:26:07Z

@tcbegley I think we should merge/rebase main into this

tcbegley · 2023-06-28T11:15:09Z

I've rebased and fixed up the tests. The changes from #1328 are still visible in the history here because I need the reward model for the rollout. Once that's merged (I think it's ready), I'll do a final merge / rebase and we should be ready here too.

vmoens

I can't approve it because it's my PR, but LGTM anyway!

vmoens · 2023-06-28T17:53:52Z

torchrl/data/rlhf/utils.py

+    EOS_TOKEN_ID = 50256
+
+    def __init__(
+        self, model, ref_model, reward_model, max_new_tokens=50, score_clip=10.0


Long term, we may want to make this reward model optional and let the user compute the reward at a different time (e.g. when populating the replay buffer).
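The suggestion above could look roughly like the following minimal sketch. It is plain Python (no torch or transformers) so it runs standalone; `RolloutSketch`, `compute_reward`, and `clip_score` are hypothetical names, not TorchRL's `RolloutFromModel` API, and clipping the score symmetrically to `[-score_clip, score_clip]` is an assumption about what `score_clip` does.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins: a "model" maps a prompt to a generated response,
# and a "reward model" maps the full text to a scalar score.
GenerateFn = Callable[[str], str]
RewardFn = Callable[[str], float]
Sample = Dict[str, object]


def clip_score(score: float, score_clip: float) -> float:
    # Assumption: scores are clipped symmetrically to [-score_clip, score_clip].
    return max(-score_clip, min(score_clip, score))


def compute_reward(sample: Sample, reward_model: RewardFn, score_clip: float = 10.0) -> float:
    # Score prompt + response, then clip. Usable at rollout time or later.
    text = str(sample["prompt"]) + str(sample["response"])
    return clip_score(reward_model(text), score_clip)


class RolloutSketch:
    """Hypothetical rollout helper whose reward model is optional."""

    def __init__(
        self,
        model: GenerateFn,
        reward_model: Optional[RewardFn] = None,
        score_clip: float = 10.0,
    ):
        self.model = model
        self.reward_model = reward_model
        self.score_clip = score_clip

    def rollout(self, prompts: List[str]) -> List[Sample]:
        samples = []
        for prompt in prompts:
            sample: Sample = {"prompt": prompt, "response": self.model(prompt)}
            # Score immediately only if a reward model was given; otherwise
            # leave "reward" out so the caller can compute it later.
            if self.reward_model is not None:
                sample["reward"] = compute_reward(sample, self.reward_model, self.score_clip)
            samples.append(sample)
        return samples


# Usage: no reward model at rollout time; rewards are added while filling a buffer.
replay_buffer: List[Sample] = []
for s in RolloutSketch(model=lambda p: p + "!").rollout(["hello", "world"]):
    s["reward"] = compute_reward(s, reward_model=lambda text: float(len(text)))
    replay_buffer.append(s)
```

Keeping `compute_reward` a free function is what lets the same scoring run either inside the rollout or later, when samples are written to a replay buffer.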

Co-authored-by: Alessandro Pietro Bardelli <apbard@users.noreply.github.com>

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

apbard

LGTM! thanks

facebook-github-bot added the CLA Signed label Jun 28, 2023

vmoens changed the title ~~[Feature, NOMERGE] RLHF Rollouts~~ [Feature, NOMERGE] RLHF Rollouts (reopened) Jun 28, 2023

vmoens added the enhancement label Jun 28, 2023

tcbegley force-pushed the rlhf-rollout branch 2 times, most recently from 897649b to ebed304 June 28, 2023 11:08

vmoens commented Jun 28, 2023


vmoens force-pushed the rlhf-rollout branch from 810ed9a to 31b85fb June 28, 2023 18:00

tcbegley and others added 11 commits June 29, 2023 10:47

Add RolloutFromModel class

b6fecbb

Add rollout tests

bd8fbb6

Apply suggestions from code review

6fbb603

Co-authored-by: Alessandro Pietro Bardelli <apbard@users.noreply.github.com>

Address comments

3e80a55

Docstring lint

385ac90

Apply suggestions from code review

8d0a152

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

Address comments

fcddc97

Fix tests

5c7c72e

Handle missing transformers import

92d5757

Import transformers locally

eec0eaf

lint

87501ea

tcbegley force-pushed the rlhf-rollout branch from 9bfed41 to 87501ea June 29, 2023 09:54

vmoens added 2 commits July 3, 2023 10:43

Merge branch 'main' into rlhf-rollout

9851259

Add doc

bc05960

apbard approved these changes Jul 3, 2023


vmoens merged commit 8700e15 into pytorch:main Jul 3, 2023

vmoens deleted the rlhf-rollout branch July 3, 2023 11:11
