Add toxicity example #162
Conversation
The documentation is not available anymore as the PR was closed or merged.
Hi @younesbelkada, I did a first pass over the text and it looks good. Left a few comments and suggestions.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.bfloat16)
```

- Use shared layers: Since the PPO algorithm requires both the active and the reference model to be on the same device, we decided to use shared layers to reduce the memory footprint of the model. This can be achieved by simply specifying the `num_shared_layers` argument when creating a `PPOTrainer`:
Maybe add a sentence clarifying that this then means that we only train the last 4 layers of the model.
Yes makes sense!
Isn't it the other way around? We don't train the first 4 layers and train the rest.
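To make the shared-layers discussion above concrete, here is a minimal sketch, assuming the trl API referenced in this PR (in particular `create_reference_model` accepting a `num_shared_layers` argument; exact names and defaults may differ in other versions):

```python
# Minimal sketch, not the exact script from this PR: the shared-layers setup
# is assumed from the trl API discussed in this review.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer, create_reference_model

model_name = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Load the active model in bfloat16 to roughly halve its memory footprint.
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Share the first 4 transformer blocks between the active and the reference model:
# the shared (bottom) layers stay frozen, and only the layers above them are updated by PPO.
ref_model = create_reference_model(model, num_shared_layers=4)

ppo_trainer = PPOTrainer(
    config=PPOConfig(model_name=model_name, batch_size=16, mini_batch_size=16),
    model=model,
    ref_model=ref_model,
    tokenizer=tokenizer,
)
```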
docs/source/detoxifying_a_lm.mdx
Outdated
We also think we could have trained the models on a "more toxic" dataset, as the one we used is much cleaner than the dataset we used for testing our models (from our observation).
A hypothesis we made is that larger models tend to be more toxic. Therefore, one could also have played with the KL-penalty term to allow the model to deviate a bit more from its original distribution. We also believe that fine-tuning a model that is known to be toxic (i.e. trained on a toxic dataset) could lead to better results.
We have also observed that training the model with a larger context helps get better results for larger models. Therefore one could also have played with this factor for the larger models and produced a better model.
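As an aside on the KL-penalty term mentioned here: in trl it is exposed through `PPOConfig`. A hedged sketch, assuming the `init_kl_coef` and `adap_kl_ctrl` parameter names from the trl API at the time (verify against the installed version):

```python
from trl import PPOConfig

# Lowering the initial KL coefficient lets the policy deviate further from the
# reference model; the adaptive controller then adjusts it during training.
config = PPOConfig(
    model_name="EleutherAI/gpt-j-6B",
    learning_rate=1e-5,
    init_kl_coef=0.1,   # assumed name; smaller value = weaker KL penalty
    adap_kl_ctrl=True,  # assumed name; enables the adaptive KL controller
)
```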
I feel like there is a lot of uncertainty in those statements. I would maybe go for something like:
In addition to human feedback, this could be a useful additional signal when training large language models to ensure their outputs are less toxic as well as useful.
Thanks for the feedback! Proposed something below
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
…da/trl into toxicity-example-new
…da/trl into toxicity-example-new
Thanks a million for the extensive review @lvwerra! Should have addressed everything now and left a few minor questions :)
A few minor comments, then we can merge :)
docs/source/detoxifying_a_lm.mdx
Outdated
<img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/trl-collapse-mode.png">
</div>

The final training run of `ybelkada/gpt-j-6b-detoxified-1000-20shdl` looks like this:
Do you need to update the model name?
Nice catch, updated the model name on the Hub and here as well
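For readers following along, a minimal sketch of trying out the resulting checkpoint, using the name quoted in the doc excerpt above (which, per the comment above, may have since been renamed on the Hub):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint name taken from the quoted doc excerpt; it may have been renamed
# on the Hub after this review.
model_id = "ybelkada/gpt-j-6b-detoxified-1000-20shdl"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The movie was absolutely", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```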
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
…da/trl into toxicity-example-new
What does this PR do?
This PR adds the toxicity example and replaces #142
TODO:
cc @lvwerra