[Doc] Improve docs #91

Merged: 15 commits, merged Jan 18, 2023


@younesbelkada (Contributor) commented Jan 17, 2023

Adds the following to the documentation:

API

  • Model classes (AutoModelForCausalLMWithValueHead & PreTrainedModelWrapper)
  • Trainer (PPOTrainer & PPOConfig)

@HuggingFaceDocBuilderDev commented Jan 17, 2023

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada mentioned this pull request Jan 17, 2023
26 tasks
@younesbelkada changed the title from "[Draft] Improve docs" to "[Doc] Improve docs" Jan 17, 2023
@younesbelkada marked this pull request as ready for review January 17, 2023 16:32
@younesbelkada requested a review from lvwerra January 17, 2023 16:36
@lvwerra (Member) left a comment

Looks great! Left a few small comments. Also, for `train_minibatch` we could make the docstring a bit better: "Train the model for a PPO mini-batch."

@@ -0,0 +1,12 @@
# Trainer

At TRL we plan to release several RLHF algorithms; we started our journey with PPO (Proximal Policy Optimisation), with an implementation that largely follows the structure introduced in the paper "Fine-Tuning Language Models from Human Preferences" by D. Ziegler et al. [[paper](https://arxiv.org/pdf/1909.08593.pdf), [code](https://github.com/openai/lm-human-preferences)].
Member commented:

Since adding new algorithms is not on the roadmap at the moment, maybe let's just focus on PPO :)

We could also add a sentence or two about the classes, e.g. that they are inspired by `transformers.Trainer` and adapted to RL.

@younesbelkada (Contributor, Author) replied:

Thanks! Adapted the text in da456cf

trl/trainer/ppo_trainer.py (outdated, resolved)
trl/trainer/ppo_trainer.py (outdated, resolved)
trl/trainer/ppo_trainer.py (outdated, resolved)
docs/source/models.mdx (outdated, resolved)
@younesbelkada requested a review from lvwerra January 18, 2023 14:33
@younesbelkada merged commit 77273d1 into huggingface:main Jan 18, 2023
3 participants