🃏 Model card for TRL #2123
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Amazing work on the model cards @qgallouedec; now they're really packed with useful information 🔥!!
LGTM with a tweak to the example inference code
@@ -133,6 +133,6 @@ def tokenize(element):
 # Save and push to hub
 trainer.save_model(training_args.output_dir)
 if training_args.push_to_hub:
-    trainer.push_to_hub()
+    trainer.push_to_hub(dataset_name="trl-internal-testing/descriptiveness-sentiment-trl-style")
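The one-line change in this diff relies on the fact that, in transformers, `Trainer.push_to_hub` forwards extra keyword arguments to `create_model_card`, so `dataset_name` ends up in the generated card. The sketch below mimics that flow with hypothetical stand-in classes (`ToyTrainer` and `ModelCardRecord` are illustrative only, not TRL or transformers code) so it runs without a Hub account; only the `dataset_name` value comes from the diff.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelCardRecord:
    """Hypothetical container for the metadata a model card would display."""
    model_name: Optional[str] = None
    dataset_name: Optional[str] = None
    tags: List[str] = field(default_factory=list)


class ToyTrainer:
    """Hypothetical stand-in for a trainer; NOT the TRL or transformers class."""

    def __init__(self) -> None:
        self.last_card: Optional[ModelCardRecord] = None

    def create_model_card(self, model_name=None, dataset_name=None, tags=None):
        # Record the metadata instead of rendering and uploading a README.md.
        self.last_card = ModelCardRecord(model_name, dataset_name, list(tags or []))

    def push_to_hub(self, commit_message="End of training", **kwargs):
        # Extra keyword arguments (e.g. dataset_name) are forwarded to the card.
        self.create_model_card(**kwargs)
        return commit_message


trainer = ToyTrainer()
trainer.push_to_hub(dataset_name="trl-internal-testing/descriptiveness-sentiment-trl-style")
print(trainer.last_card.dataset_name)
```

Forwarding `**kwargs` keeps `push_to_hub` agnostic of which card fields exist, which is why the example script only needs to add one keyword argument.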
Note to self: we should move these datasets that aren't strictly used for tests to trl-lib
Note that the code demo will be wrong for diffusion models and VLMs, but we can probably keep it as is for now.
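One way to picture the issue: a card template that always emits a text-generation demo produces a snippet that does not apply to diffusion or vision-language models. The hypothetical helper below (`quick_start_snippet` and the `TEXT_TASKS` set are assumptions for illustration, not TRL's template) only emits a `transformers.pipeline("text-generation", ...)` demo for text models and falls back to a placeholder comment otherwise.

```python
TEXT_TASKS = {"text-generation"}  # hypothetical: tasks the demo template supports


def quick_start_snippet(hub_model_id: str, task: str) -> str:
    """Return a demo snippet for a model card (hypothetical helper)."""
    if task not in TEXT_TASKS:
        # Diffusion or vision-language models need a different demo,
        # so emit a placeholder rather than code that would not run.
        return f"<!-- No quick-start demo available for task '{task}' -->"
    return "\n".join(
        [
            "from transformers import pipeline",
            "",
            f'generator = pipeline("text-generation", model="{hub_model_id}")',
            'output = generator("Hello!", max_new_tokens=32)',
            'print(output[0]["generated_text"])',
        ]
    )


print(quick_start_snippet("qgallouedec/dpo-qwen2", "text-generation"))
print(quick_start_snippet("my-user/my-diffusion-model", "text-to-image"))
```

The repository `my-user/my-diffusion-model` is a made-up name used only to exercise the fallback branch.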
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
When will this commit be released? Will it be part of v0.12?
What does this PR do?
Gives TRL its own model card.
Demo
Result: https://huggingface.co/qgallouedec/dpo-qwen2
It adds:
Link to the paper
Link to the dataset
TRL's own model card
Other
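As a rough picture of what such a card carries, here is a minimal sketch that assembles a Markdown card with the base model, dataset, and paper links. `render_model_card` and its parameters are hypothetical and this is not TRL's template; it only assumes the Hub's public URL patterns (`https://huggingface.co/<repo>`, `/datasets/<repo>`, `/papers/<arxiv-id>`). The repository names in the usage call are made up; the arXiv ID 2305.18290 is the DPO paper.

```python
from typing import Optional


def render_model_card(
    hub_model_id: str,
    base_model: Optional[str] = None,
    dataset_name: Optional[str] = None,
    paper_title: Optional[str] = None,
    paper_id: Optional[str] = None,
) -> str:
    """Assemble a Markdown model card (hypothetical sketch, not TRL's template)."""
    parts = [f"# Model Card for {hub_model_id}", ""]
    if base_model:
        # Link to the model this checkpoint was fine-tuned from.
        parts.append(f"Fine-tuned from [{base_model}](https://huggingface.co/{base_model}).")
    if dataset_name:
        # Link to the training dataset on the Hub.
        parts.append(f"Trained on [{dataset_name}](https://huggingface.co/datasets/{dataset_name}).")
    if paper_title and paper_id:
        # Link to the paper describing the training method.
        parts.append(f"Method: [{paper_title}](https://huggingface.co/papers/{paper_id}).")
    parts += ["", "Trained with TRL."]
    return "\n".join(parts)


card = render_model_card(
    "my-user/my-dpo-model",
    dataset_name="my-user/my-preference-dataset",
    paper_title="Direct Preference Optimization: Your Language Model is Secretly a Reward Model",
    paper_id="2305.18290",
)
print(card)
```

Each link is emitted only when its metadata is known, so a card for a run without a paper or base model simply omits those lines.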
TODO
AlignPropTrainer
BCOTrainer
CPOTrainer
DPOTrainer
GKDTrainer
IterativeSFTTrainer
KTOTrainer
NashMDTrainer
OnlineDPOTrainer
ORPOTrainer
PPOv2Trainer
RewardTrainer
RLOOTrainer
SFTTrainer
XPOTrainer
Before submitting
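Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines.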
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.