
Add margin to RM training #719

Merged
11 commits merged into huggingface:main on Sep 20, 2023

Conversation

jvhoffbauer
Contributor

Fix #718

@jvhoffbauer jvhoffbauer changed the title Add margin to RM training [WIP ]Add margin to RM training Aug 31, 2023
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Aug 31, 2023

The documentation is not available anymore as the PR was closed or merged.

@jvhoffbauer
Contributor Author

jvhoffbauer commented Aug 31, 2023

@younesbelkada could you give me a quick hint if the interface is ok that way?

In particular, I only apply the margin if it is provided and implicitly assume it is zero otherwise. Is that ok?
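For reference, a minimal pure-Python sketch of what a margin term does to the pairwise loss, assuming the Llama-2-style formulation `-logsigmoid(r_chosen - r_rejected - margin)` (with `margin` defaulting to zero, which preserves the margin-free behaviour discussed above):

```python
import math

def pairwise_reward_loss(chosen: float, rejected: float, margin: float = 0.0) -> float:
    """-log(sigmoid(chosen - rejected - margin)), computed as log(1 + exp(-x)).

    With margin=0.0 this reduces to the standard pairwise reward loss,
    so omitting the margin keeps the previous behaviour.
    """
    x = chosen - rejected - margin
    return math.log1p(math.exp(-x))

# A positive margin demands a larger reward gap, so the same gap is penalized more.
assert pairwise_reward_loss(1.0, 0.0, margin=0.5) > pairwise_reward_loss(1.0, 0.0)
```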

Also, the tests for (3.9, ubuntu-latest) seem to hit a timeout. Is that something I can fix, or is it unrelated to this PR?

@jvhoffbauer jvhoffbauer changed the title [WIP ]Add margin to RM training [WIP] Add margin to RM training Aug 31, 2023
Contributor

@younesbelkada younesbelkada left a comment


@jvhoffbauer, the interface looks great to me, as it seems to preserve the previous behaviour.
Can you give an example of how to use this interface to enable the margin? It would also be nice to add a few lines about it in the documentation! What do you think?
Regarding the failing tests, don't worry: the timeout issues happen from time to time and are not related to your PR.
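For anyone reading along, here is one shape the dataset-side convention could take (the column names here are assumptions for illustration, not the final interface): each example carries a chosen/rejected pair plus an optional per-example `margin` column, and the collator falls back to zeros when the column is absent:

```python
# Hypothetical batch: chosen/rejected token ids plus an optional "margin"
# column (e.g. derived from how far apart two human ratings were).
features = [
    {"input_ids_chosen": [1, 2, 3], "input_ids_rejected": [1, 2, 4], "margin": 0.7},
    {"input_ids_chosen": [5, 6], "input_ids_rejected": [5, 7], "margin": 0.1},
]

def collect_margins(features: list) -> list:
    """Pull the optional margin column out of a batch, defaulting to zeros."""
    if all("margin" in f for f in features):
        return [float(f["margin"]) for f in features]
    return [0.0] * len(features)

assert collect_margins(features) == [0.7, 0.1]
# A batch without the column yields zero margins, preserving the old loss.
assert collect_margins([{"input_ids_chosen": [1], "input_ids_rejected": [2]}]) == [0.0]
```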

@jvhoffbauer
Contributor Author

Awesome! I will look into this tomorrow/eow. Should I also add a test?

@younesbelkada
Contributor

Great! If possible, yes: a simple test that checks that computing the loss with a margin works as expected would be really great! 🙏

@jvhoffbauer
Contributor Author

Done. Let me know what you think!

One thing: I noticed that the rewards are actually tensors with shape [1, 2] even if I process just one sample. Is that correct?

batch = [[dummy_dataset[0]]]
batch = trainer.data_collator(batch)
loss = trainer.compute_loss(trainer.model, batch, return_outputs=True)

# Output: 
{
  'rewards_chosen': tensor([[ 0.2797, -0.1509]], grad_fn=<IndexBackward0>),
  'rewards_rejected': tensor([[0.0570, 0.0088]], grad_fn=<IndexBackward0>)
}
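On the shape question: a reward head is a linear layer on the final hidden state, and its output width equals the model's `num_labels`. The `[1, 2]` above is what a two-label classification head produces per sample; a scalar reward model would typically be configured with `num_labels=1` and yield `[1, 1]`. A toy sketch in plain Python (weights and hidden state are made up):

```python
def linear_head(hidden: list, weights: list) -> list:
    """Bias-free linear head: one output per weight row, i.e. per label."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]

hidden_state = [0.5, -1.0, 2.0]                       # toy final hidden state
two_label_head = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]  # num_labels=2
one_label_head = [[0.1, 0.2, 0.3]]                    # num_labels=1

assert len(linear_head(hidden_state, two_label_head)) == 2  # shape [1, 2] per sample
assert len(linear_head(hidden_state, one_label_head)) == 1  # shape [1, 1] per sample
```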

@jvhoffbauer jvhoffbauer changed the title [WIP] Add margin to RM training Add margin to RM training Sep 10, 2023
Contributor

@younesbelkada younesbelkada left a comment


Thanks a lot, this looks great! I left a single comment about a merge conflict you might have forgotten to fix; apart from that, LGTM!

Comment on lines 24 to 31
<<<<<<< HEAD
After standardizing your dataset, you can use the `RewardTrainer` as a classic Hugging Face Trainer.
You should pass an `AutoModelForSequenceClassification` model to the `RewardTrainer`.
=======
After preparing your dataset, you can use the [`RewardTrainer`] in the same way as the `Trainer` class from 🤗 Transformers.
You should pass an `AutoModelForSequenceClassification` model to the [`RewardTrainer`], along with a [`RewardConfig`] which configures the hyperparameters of the training.

>>>>>>> origin/main
Contributor


Hmm, there is a merge conflict here that has not been properly dealt with; can you please have a look? 🙏

Contributor Author


Thanks! I will fix this tomorrow.

Contributor Author


Should be ready

Member

@lewtun lewtun left a comment


Thanks a lot for adding this @jvhoffbauer - it's going to be really interesting to see if we can use this on datasets like SHP :)

docs/source/reward_trainer.mdx (outdated, resolved)
Contributor

@younesbelkada younesbelkada left a comment


Looking great, thank you very much for your great contribution!

Member

@lewtun lewtun left a comment


Thanks a lot for iterating @jvhoffbauer - this LGTM 🔥 !

@younesbelkada younesbelkada merged commit 08cfc41 into huggingface:main Sep 20, 2023
lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024
* Start adding margin to RM training

* Fix typo and cleanup

* Fix incompatibilities when not using margin

* Format using 'make precommit'

* Add documentation and test for reward trainer

* Run 'make precommit'

* Update docs/source/reward_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Fix missed merge conflict in reward trainer docs

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

Successfully merging this pull request may close these issues.

Add margin to reward trainer, similar to LLAMA-2
4 participants