Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add toxcitiy example #162
Add toxcitiy example #162
Changes from 25 commits
ad1a26f
fe3d50b
767b5fb
cf9204f
dcdc888
cc190c2
3c11002
0b1e684
fd7a4d5
0d8dcbc
8fef8d6
2bb0560
519050c
5b89554
c73c1f2
ba6f598
b5f8690
65dfe25
ba36a90
fd3367c
1aff6e8
fd30965
dd01369
452aa87
0aa67c7
ead3e22
d36293b
c8238df
69807bd
b028b98
c40c250
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this different to mixed precision training? Since there you don't really save memory (usually the opposite) since you have the model in half precision and the weights additionally in full precision for the updates, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think it is different from mixed precision, here I meant training the model in full
bfloat16
- maybe I can add few sentences to clarify thisThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a sentence clarifying that this then means that we only train the last 4 layers of the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes makes sense!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the other way around? We don't train the first 4 layers and train the rest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to update the model name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, updated the model name on the Hub and here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there is a lot of uncertainty in those statements. I would maybe go for something like:
In addition to human feedback this could be a useful additional signal when training large language models to ensure there outputs are less toxic as well as useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! Proposed something below