Roadmap - trl 0.2 #64
Comments
BTW, I can confirm that SetFit does make for a really good zero-shot RM. There are some issues with using contrastive models as RMs, though. It often requires very careful data cleaning, and identifying which kinds of clusters work as RMs is a dark art, to the point where we decided it wasn't worth seriously pursuing further after CARP CoOp. Rerank models are much better.
I think the "coolest" dataset we could use to train a model is https://huggingface.co/datasets/openai/webgpt_comparisons, but it is hard to evaluate this sort of model after we train it. I might start by adding a summarization example, along with some decent ways to evaluate it, and then add the webgpt comparisons example.
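On the evaluation point, one simple option (a suggestion, not something settled in this thread) is pairwise accuracy on held-out comparisons: how often the reward model scores the human-preferred answer above the rejected one. A minimal sketch in plain Python follows. The function names and the scores are hypothetical, and the loss is the usual negative log-sigmoid of the score margin, not necessarily what `trl` would end up using.

```python
import math


def pairwise_loss(score_preferred, score_rejected):
    """Negative log-sigmoid of the score margin for one comparison."""
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pairwise_accuracy(pairs):
    """Fraction of (preferred, rejected) score pairs ranked correctly."""
    if not pairs:
        raise ValueError("need at least one comparison")
    correct = sum(1 for preferred, rejected in pairs if preferred > rejected)
    return correct / len(pairs)


# Hypothetical reward-model scores for four held-out comparisons,
# each given as (score of preferred answer, score of rejected answer).
held_out = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0), (0.5, 0.2)]

accuracy = pairwise_accuracy(held_out)
mean_loss = sum(pairwise_loss(p, r) for p, r in held_out) / len(held_out)
print(f"pairwise accuracy: {accuracy:.2f}, mean loss: {mean_loss:.3f}")
```

Pairwise accuracy only needs the comparison labels, so it works for any dataset of preferred/rejected pairs without a separate ground-truth reward.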
https://colab.research.google.com/drive/1hkPBFtMP5xBAjNYMjWH7NqYn118kRLOJ?usp=sharing
@AlexWortega can you open a separate issue / PR for this? It looks interesting, but it may get lost in this big 1.0 roadmap thread.
We ended up calling this release |
A list of cool things that we can aim for `trl` 0.2!

API:
- `xxxForCausalLM` support #53
- `accelerate` integration for training in mixed precision, DP (multi-GPU), using DeepSpeed: `accelerate` integration #58
- [`PPOTrainer`] make the reference model optional #67
- (`wandb`) | [core] remove `wandb` dependency #92
- [`PPOTrainer`] Support generic optimizers #78
- `step` (sanity checks) | [core] refactor `step` method #76
- `dataset` attribute should be optional ? | [API] Make `dataset` attribute optional #85
- `v_head` when using `AutoModelForCausalLMWithValueHead` #86

Documentation

Improvements
- `05` Convert notebook 05 #80
- (`from trl.trainer` -> `from .trainer`) Improvements 1a #70
- `setup.py` and remove `settings.ini` (legacy from `nbdev`) Improvements 1a #70

Any suggestion very welcome!