Add margin to RM training #719
Conversation
The documentation is not available anymore as the PR was closed or merged.
@younesbelkada could you give me a quick hint whether the interface is OK this way? In particular, I only use the margin if it is provided and implicitly assume it is zero otherwise. Is that OK? Also, some tests are failing; is that related to my changes?
@jvhoffbauer, the interface looks really great to me, as it seems to preserve the previous behaviour.
Can you give an example of how to use this interface to enable the margin? It would also be nice to add a few lines in the documentation about it. What do you think?
Regarding the failing tests, don't worry: the timeout issues happen sometimes and are not related to your PR.
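To make that concrete, here is a minimal sketch of what enabling the margin could look like on the data side. The `margin` column name and the tokenized field names are assumptions about the interface, chosen to match the reward trainer's existing chosen/rejected fields, not the final API.

```python
from datasets import Dataset

# Hypothetical preference dataset: tokenized chosen/rejected pairs plus an
# optional per-example "margin" column. Omitting the column should fall
# back to the previous behaviour (an implicit margin of zero).
train_dataset = Dataset.from_dict({
    "input_ids_chosen": [[1, 2, 3]],
    "attention_mask_chosen": [[1, 1, 1]],
    "input_ids_rejected": [[4, 5]],
    "attention_mask_rejected": [[1, 1]],
    "margin": [0.8],  # how much better "chosen" is than "rejected"
})
```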
Awesome! I will look into this tomorrow or by the end of the week. Should I also add a test?
Great! If possible, yes: a simple test that checks that computing the loss with a margin works as expected would be really great! 🙏
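A test along those lines could look like the sketch below. The loss form (an optional margin subtracted inside the log-sigmoid, as in the Llama 2 reward-modeling recipe) is an assumption about what this PR implements, and `compute_pairwise_loss` is a hypothetical stand-in rather than a TRL function.

```python
import torch
import torch.nn.functional as F

def compute_pairwise_loss(rewards_chosen, rewards_rejected, margin=None):
    # Hypothetical stand-in for the trainer's loss: the standard pairwise
    # reward loss, with an optional margin subtracted before the
    # log-sigmoid. margin=None reproduces the previous behaviour.
    diff = rewards_chosen - rewards_rejected
    if margin is not None:
        diff = diff - margin
    return -F.logsigmoid(diff).mean()

def test_margin_increases_loss():
    chosen = torch.tensor([1.0])
    rejected = torch.tensor([0.0])
    # For the same reward gap, requiring a margin makes the loss larger.
    loss_plain = compute_pairwise_loss(chosen, rejected)
    loss_margin = compute_pairwise_loss(chosen, rejected, margin=torch.tensor([0.5]))
    assert loss_margin > loss_plain
```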
Done. Let me know what you think! One thing: I noticed that the rewards are actually tensors with shape [1, 2] even if I process just one sample. Is that correct?
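One plausible explanation for the `[1, 2]` shape (an assumption, not something confirmed in this thread): `AutoModelForSequenceClassification` defaults to `num_labels=2`, so even a single sequence produces logits of shape `[1, 2]`; a scalar reward head would be loaded with `num_labels=1`.

```python
from transformers import AutoModelForSequenceClassification

# num_labels=1 gives one scalar reward per sequence, i.e. logits of shape
# [batch_size, 1]; the default two-label head would produce [1, 2] for a
# single sample, matching the shape observed above.
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
```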
Thanks a lot, this looks great! I left a single comment about a merge conflict you might have forgotten to fix; apart from that, LGTM!
docs/source/reward_trainer.mdx (Outdated)

```
<<<<<<< HEAD
After standardizing your dataset, you can use the `RewardTrainer` as a classic Hugging Face Trainer.
You should pass an `AutoModelForSequenceClassification` model to the `RewardTrainer`.
=======
After preparing your dataset, you can use the [`RewardTrainer`] in the same way as the `Trainer` class from 🤗 Transformers.
You should pass an `AutoModelForSequenceClassification` model to the [`RewardTrainer`], along with a [`RewardConfig`] which configures the hyperparameters of the training.
>>>>>>> origin/main
```
Hmm, there is a merge conflict here that has not been properly dealt with; can you please have a look? 🙏
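For readers following the diff: the `origin/main` side is the wording that should survive, and the usage it describes looks roughly like the sketch below. Argument names and defaults here are assumptions based on the doc text, not verified against the TRL API.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

# RewardTrainer driven like a regular 🤗 Transformers Trainer, with a
# RewardConfig holding the training hyperparameters.
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

training_args = RewardConfig(output_dir="reward_model")
trainer = RewardTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # preference pairs as sketched earlier,
                                  # optionally with a "margin" column
)
trainer.train()
```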
Thanks! I will fix this tomorrow.
Should be ready
Thanks a lot for adding this @jvhoffbauer - it's going to be really interesting to see if we can use this on datasets like SHP :)
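As a rough illustration of that idea, one could derive a margin from how strongly one answer is preferred in SHP. This is purely a sketch; the log-ratio mapping is an assumption made here, not something established in this thread or in the PR.

```python
import math

def shp_margin(preferred_score: int, other_score: int) -> float:
    # SHP records community upvote scores for both answers; the log of the
    # score ratio is one plausible way to turn "how much better" into a
    # margin for the reward loss. Clamp to avoid division by zero.
    ratio = max(preferred_score, 1) / max(other_score, 1)
    return math.log(ratio)

# e.g. an answer with 40 upvotes vs. 10 gives a margin of log(4) ≈ 1.39
```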
Looking great, thank you very much for your great contribution!
Thanks a lot for iterating @jvhoffbauer - this LGTM! 🔥
* Start adding margin to RM training
* Fix typo and cleanup
* Fix incompatibilities when not using margin
* Format using 'make precommit'
* Add documentation and test for reward trainer
* Run 'make precommit'
* Update docs/source/reward_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Fix missed merge conflict in reward trainer docs

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Fix #718