
Add toxicity example #162

Merged
31 commits merged into huggingface:main on Feb 28, 2023

Conversation

@younesbelkada (Contributor) commented on Feb 17, 2023

What does this PR do?

This PR adds the toxicity example and replaces #142

TODO:

  • update docs with tips

cc @lvwerra

@HuggingFaceDocBuilderDev commented on Feb 17, 2023

The documentation is not available anymore as the PR was closed or merged.

@lvwerra (Member) left a comment

Hi @younesbelkada, I did a first pass over the text and it looks good. Left a few comments and suggestions.

trl/trainer/ppo_config.py (resolved, outdated)
docs/source/detoxifying_a_lm.mdx (resolved, outdated)
docs/source/detoxifying_a_lm.mdx (resolved, outdated)
```python
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.bfloat16)
```

- Use shared layers: Since the PPO algorithm requires both the active and reference models to be on the same device, we decided to use shared layers to reduce the memory footprint of the model. This can be achieved by simply specifying the `num_shared_layers` argument when creating a `PPOTrainer` (a sketch of this setup follows the review thread below):
Member

Maybe add a sentence clarifying that this then means that we only train the last 4 layers of the model.

Contributor Author

Yes makes sense!

Contributor Author

Isn't it the other way around? We don't train the first 4 layers and we train the rest.
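
For reference, here is a minimal sketch of the setup discussed in this thread: loading GPT-J in bfloat16 and sharing the lower layers between the active and reference models so that only the top, unshared layers are updated by PPO. It assumes TRL's `create_reference_model` helper and the `PPOConfig`/`PPOTrainer` API; the exact place where `num_shared_layers` is passed (the helper vs. the trainer) and the argument names may differ in the final docs:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer, create_reference_model

# Load GPT-J in bfloat16 to roughly halve the memory footprint of the weights.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.pad_token = tokenizer.eos_token

# Share the first `num_shared_layers` layers between the active and reference models.
# Shared layers stay frozen (they must match the reference), so PPO only trains the
# remaining top layers, which also saves memory.
ref_model = create_reference_model(model, num_shared_layers=20)

config = PPOConfig(model_name="EleutherAI/gpt-j-6B", learning_rate=1.41e-5, batch_size=16)
ppo_trainer = PPOTrainer(config=config, model=model, ref_model=ref_model, tokenizer=tokenizer)
```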

docs/source/detoxifying_a_lm.mdx (resolved, outdated)
docs/source/detoxifying_a_lm.mdx (resolved, outdated)
docs/source/detoxifying_a_lm.mdx (resolved, outdated)
docs/source/detoxifying_a_lm.mdx (resolved)
docs/source/detoxifying_a_lm.mdx (resolved, outdated)
Comment on lines 164 to 166
We also think we could have trained the models on a "more toxic" dataset, as the one we used is, from our observations, much cleaner than the dataset we used for testing our models.
One hypothesis is that larger models tend to be more toxic. Therefore, one could also have played with the KL-penalty term to allow the model to deviate a bit more from its original distribution. We also believe that fine-tuning a model that is known to be toxic (i.e. trained on a toxic dataset) could lead to better results.
We have also observed that training with a larger context helps get better results for larger models, so one could also have tuned this factor to produce a better large model.
Member

I feel like there is a lot of uncertainty in those statements. I would maybe go for something like:

In addition to human feedback, this could be a useful additional signal when training large language models to ensure their outputs are less toxic as well as useful.

Contributor Author

Thanks for the feedback! I've proposed something below.
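
On the KL-penalty knob mentioned in the quoted passage above, here is a hedged sketch of how the penalty strength could be loosened. It assumes the `init_kl_coef` and `adap_kl_ctrl` fields of TRL's `PPOConfig`; names and defaults may differ across versions:

```python
from trl import PPOConfig

# Assumed parameter names from TRL's PPOConfig; check the installed version.
config = PPOConfig(
    model_name="EleutherAI/gpt-j-6B",
    init_kl_coef=0.05,  # lower than the usual ~0.2 default: weaker KL penalty,
                        # so the policy can drift further from the reference model
    adap_kl_ctrl=True,  # keep adaptive KL control so the coefficient tracks a target KL
)
```

A weaker penalty trades off closeness to the original distribution against detoxification strength, which is why the passage frames it as something to experiment with.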

@younesbelkada (Contributor Author)

Thanks a mile for the extensive review @lvwerra! I should have addressed everything now and left a few minor questions :)

@lvwerra (Member) left a comment

A few minor comments, then we can merge :)

docs/source/detoxifying_a_lm.mdx (resolved, outdated)
<img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/trl-collapse-mode.png">
</div>

The final training run of `ybelkada/gpt-j-6b-detoxified-1000-20shdl` looks like this:
Member

Do you need to update the model name?

Contributor Author

Nice catch, updated the model name on the Hub and here as well

docs/source/detoxifying_a_lm.mdx (resolved, outdated)
docs/source/detoxifying_a_lm.mdx (resolved, outdated)
@younesbelkada merged commit 8ec80d5 into huggingface:main on Feb 28, 2023