Add margin to RM training #719
Conversation
The documentation is not available anymore as the PR was closed or merged.
@younesbelkada could you give me a quick hint whether the interface is OK this way? In particular, I only use the margin if it is provided and implicitly assume it is zero otherwise. Is that OK? Also, some tests are failing; is that related to my changes?
@jvhoffbauer, the interface looks really great to me, as it seems to preserve the previous behaviour.
Can you give an example of how to use this interface to enable the margin? It would also be nice to add a few lines in the documentation about it. What do you think?
Regarding the failing tests, don't worry: the timeout issues happen sometimes and are not related to your PR.
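To make that concrete, here is a minimal sketch of what enabling the margin could look like on the data side. The `margin` column name and the tokenized field names are assumptions about the interface, chosen to match the reward trainer's existing chosen/rejected fields, not the final API.

```python
from datasets import Dataset

# Hypothetical preference dataset: tokenized chosen/rejected pairs plus an
# optional per-example "margin" column. Omitting the column should fall
# back to the previous behaviour (an implicit margin of zero).
train_dataset = Dataset.from_dict({
    "input_ids_chosen": [[1, 2, 3]],
    "attention_mask_chosen": [[1, 1, 1]],
    "input_ids_rejected": [[4, 5]],
    "attention_mask_rejected": [[1, 1]],
    "margin": [0.8],  # how much better "chosen" is than "rejected"
})
```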
Awesome! I will look into this tomorrow or by the end of the week. Should I also add a test?
Great! If possible, yes: a simple test that checks that computing the loss with a margin works as expected would be really great! 🙏
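A test along those lines could look like the sketch below. The loss form (an optional margin subtracted inside the log-sigmoid, as in the Llama 2 reward-modeling recipe) is an assumption about what this PR implements, and `compute_pairwise_loss` is a hypothetical stand-in rather than a TRL function.

```python
import torch
import torch.nn.functional as F

def compute_pairwise_loss(rewards_chosen, rewards_rejected, margin=None):
    # Hypothetical stand-in for the trainer's loss: the standard pairwise
    # reward loss, with an optional margin subtracted before the
    # log-sigmoid. margin=None reproduces the previous behaviour.
    diff = rewards_chosen - rewards_rejected
    if margin is not None:
        diff = diff - margin
    return -F.logsigmoid(diff).mean()

def test_margin_increases_loss():
    chosen = torch.tensor([1.0])
    rejected = torch.tensor([0.0])
    # For the same reward gap, requiring a margin makes the loss larger.
    loss_plain = compute_pairwise_loss(chosen, rejected)
    loss_margin = compute_pairwise_loss(chosen, rejected, margin=torch.tensor([0.5]))
    assert loss_margin > loss_plain
```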
Done. Let me know what you think! One thing: I noticed that the rewards are actually tensors with shape [1, 2] even if I process just one sample. Is that correct?
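One plausible explanation for the `[1, 2]` shape (an assumption, not something confirmed in this thread): `AutoModelForSequenceClassification` defaults to `num_labels=2`, so even a single sequence produces logits of shape `[1, 2]`; a scalar reward head would be loaded with `num_labels=1`.

```python
from transformers import AutoModelForSequenceClassification

# num_labels=1 gives one scalar reward per sequence, i.e. logits of shape
# [batch_size, 1]; the default two-label head would produce [1, 2] for a
# single sample, matching the shape observed above.
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
```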
Thanks a lot, this looks great! I left a single comment about a merge conflict you might have forgotten to fix; apart from that, LGTM!
docs/source/reward_trainer.mdx (Outdated)

```
<<<<<<< HEAD
After standardizing your dataset, you can use the `RewardTrainer` as a classic Hugging Face Trainer.
You should pass an `AutoModelForSequenceClassification` model to the `RewardTrainer`.
=======
After preparing your dataset, you can use the [`RewardTrainer`] in the same way as the `Trainer` class from 🤗 Transformers.
You should pass an `AutoModelForSequenceClassification` model to the [`RewardTrainer`], along with a [`RewardConfig`] which configures the hyperparameters of the training.
>>>>>>> origin/main
```
Hmm, there is a merge conflict here that has not been properly dealt with; can you please have a look? 🙏
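For readers following the diff: the `origin/main` side is the wording that should survive, and the usage it describes looks roughly like the sketch below. Argument names and defaults here are assumptions based on the doc text, not verified against the TRL API.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

# RewardTrainer driven like a regular 🤗 Transformers Trainer, with a
# RewardConfig holding the training hyperparameters.
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

training_args = RewardConfig(output_dir="reward_model")
trainer = RewardTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # preference pairs as sketched earlier,
                                  # optionally with a "margin" column
)
trainer.train()
```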
Thanks! I will fix this tomorrow.
Should be ready
Thanks a lot for adding this @jvhoffbauer - it's going to be really interesting to see if we can use this on datasets like SHP :)
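As a rough illustration of that idea, one could derive a margin from how strongly one answer is preferred in SHP. This is purely a sketch; the log-ratio mapping is an assumption made here, not something established in this thread or in the PR.

```python
import math

def shp_margin(preferred_score: int, other_score: int) -> float:
    # SHP records community upvote scores for both answers; the log of the
    # score ratio is one plausible way to turn "how much better" into a
    # margin for the reward loss. Clamp to avoid division by zero.
    ratio = max(preferred_score, 1) / max(other_score, 1)
    return math.log(ratio)

# e.g. an answer with 40 upvotes vs. 10 gives a margin of log(4) ≈ 1.39
```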
Looking great, thank you very much for your great contribution!
Thanks a lot for iterating @jvhoffbauer - this LGTM! 🔥
* Start adding margin to RM training
* Fix typo and cleanup
* Fix incompatibilities when not using margin
* Format using 'make precommit'
* Add documentation and test for reward trainer
* Run 'make precommit'
* Update docs/source/reward_trainer.mdx

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Fix missed merge conflict in reward trainer docs

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Fix #718