[4/7] Refactor preference dataset with transforms design #1276
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1276
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 8fd001d with merge base 0531dcb. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Codecov Report: All modified and coverable lines are covered by tests ✅

```diff
@@            Coverage Diff             @@
##             main    #1276      +/-   ##
==========================================
+ Coverage   68.57%   69.18%   +0.61%
==========================================
  Files         258      262       +4
  Lines       11972    12129     +157
==========================================
+ Hits         8210     8392     +182
+ Misses       3762     3737      -25
==========================================
```

☔ View full report in Codecov by Sentry.
I've included a couple of examples of preference datasets below; see here for a good list, too.

- Chat format (example image)
- Instruct format* (example image)

*this is actually a mixed dataset, but there are good examples of instruct-based prompts here.

These datasets are nice examples of processed datasets that are ready for consumption in a standard format. This is one of the things I was unclear about: it seems like we currently have a prompt template for supporting each specific dataset type OOTB? Do you think this scales in a sensible way, or are we just being deliberate with the datasets we provide OOTB to ensure there are sufficient examples for users to specify their own? For example, with preference datasets in standard chat/instruct formats as above, would it be possible for the user to specify a standard template we provide (such as ChatML) rather than having to write one for their specific use case?

Continuing on from the above RE the default datasets we provide: I would also prefer the default exemplar preference dataset to be chat/instruct style. Chat/instruct-style preference datasets are the standard for open-source preference/DPO-trained models; I think it's a far more common use case (the Stack Exchange paired dataset is pretty specific), and one of the motivators for this refactor is to unify chat and instruct preference datasets, right? I'd like to make sure there's at least one example of a chat/instruct dataset which allows users to replicate, e.g., instruct models tuned with DPO in torchtune.

Sorry to make you explain probably obvious things here; I'm trying to get the full picture of how an end user would configure one of these datasets with the current example. Thank you so much for helping out with this, I'm really enjoying seeing your datasets refactor masterplan fall into place so neatly.

edit: will review a bit closer once I have a better grasp of your vision here.
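For concreteness, here is a minimal, torchtune-agnostic sketch of the shape a chat-format preference sample typically takes (field names like `chosen`/`rejected` and `role`/`content` follow the common HH-RLHF-style convention and are assumptions, not a specific torchtune schema):

```python
# Hypothetical chat-format preference sample: two parallel conversations
# sharing the same user prompt, differing only in the assistant response.
sample = {
    "chosen": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ],
    "rejected": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "It might be Lyon."},
    ],
}


def shared_prompt(sample: dict) -> str:
    """Return the user turn, which both completions are expected to share."""
    chosen_user = [m["content"] for m in sample["chosen"] if m["role"] == "user"]
    rejected_user = [m["content"] for m in sample["rejected"] if m["role"] == "user"]
    assert chosen_user == rejected_user, "prompt should be identical across the pair"
    return chosen_user[0]
```

Because the prompt is shared, only the two assistant turns differ, which is what makes a standard template (rather than a dataset-specific one) plausible for this family of datasets.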
To further motivate my points above: my grand plan for alignment in torchtune is for a user to roughly follow the fine-tuning steps used by e.g. Llama 2 or InstructGPT. The dataset in the PPO config I provide is the chat-style HH-RLHF dataset above. I want to demonstrate an end-to-end journey for what I anticipate will be the most common use case for someone using torchtune for alignment: from SFT, to reward modelling, and finally RLHF using PPO. If I were to write a tutorial for this, I would love to show users how to follow this process with comprehensible changes to our configs (in an ideal world, without modifying any code at all!).
@SalmanMohammadi Thanks for sharing these examples. I suppose I generalized a bit based off of stack exchange paired, but as you mention it is rather niche, although already present in our library. I'm happy to add one or both of the datasets you recommended as builders; both of those seem to share the same
Not exactly; the prompt template is not required. A lot of the exemplar datasets have one just to show plenty of different examples of how you would configure a custom dataset. Users can use any prompt template, or none, when making their own builders.
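To illustrate why a prompt template can be optional, here is a hedged sketch (not torchtune's actual API): a template is just a formatting callable applied to the prompt, and omitting it leaves the prompt untouched. The `format_prompt` and `chatml` names are made up for this example.

```python
from typing import Callable, Optional


def format_prompt(prompt: str, template: Optional[Callable[[str], str]] = None) -> str:
    """Apply an optional prompt template; pass the prompt through unchanged if none."""
    return template(prompt) if template is not None else prompt


def chatml(prompt: str) -> str:
    """A ChatML-style template, shown purely as an example of a standard format."""
    return f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
```

Under this view, shipping a small set of standard templates (ChatML-style, instruct-style) and letting users pass none at all are both cheap, which is the scaling question above.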
Some minor comments; the changes make sense to me. I did not build the docs. Have you had a chance to make sure it renders well? There is also a docs build error.

Wait for tests to pass and resolve conflicts.
A few comments but no major concerns. Please make sure other folks' comments are addressed as well.
torchtune/datasets/_preference.py
sourced from Hugging Face Hub, local files, or remote files. This class requires
the dataset to have "chosen" and "rejected" model responses. At a high level, this
Consider giving an example of the raw data format here.
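For instance, a hedged example of what such a raw row might look like. Only the `chosen`/`rejected` keys are implied by the docstring; the `prompt` key and all values here are made up for illustration:

```python
# Hypothetical raw row for an instruct-style preference dataset: one shared
# prompt with string-valued chosen/rejected completions.
row = {
    "prompt": "Question: How do I reverse a list in Python?\n\nAnswer: ",
    "chosen": "Use slicing: my_list[::-1] returns a reversed copy.",
    "rejected": "Lists cannot be reversed in Python.",
}


def to_paired_text(row: dict) -> tuple:
    """Prepend the shared prompt to each response, yielding the two
    sequences that are tokenized separately downstream."""
    return row["prompt"] + row["chosen"], row["prompt"] + row["rejected"]
```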
Could we please add a builder for https://huggingface.co/datasets/RLHFlow/HH-RLHF-Helpful-standard?row=1, which is a chat-style preference dataset? Let me know if you'd rather see this as a follow-up.
def stack_exchange_paired_dataset(
    tokenizer: ModelTokenizer,
Why is this `tokenizer: ModelTokenizer`, but all other builders use `model_transform: Transform`?
Context

Following the RFC in #1186, we will use the unified message_transform -> template -> tokenization data pipeline in all our datasets. This PR updates `PreferenceDataset` to follow this pipeline. We can't use `SFTDataset` here because chosen and rejected messages are treated separately.

Covers #1214
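The unified pipeline can be sketched schematically as follows. This is an illustration with made-up stand-ins, not torchtune's implementation; in particular the template step is elided to a simple join:

```python
from typing import Any, Callable, Dict, List


def preference_pipeline(
    sample: Dict[str, Any],
    message_transform: Callable[[Dict[str, Any]], Dict[str, List[dict]]],
    tokenize: Callable[[str], List[int]],
) -> Dict[str, List[int]]:
    """Schematic message_transform -> template -> tokenization flow for a
    preference sample. Chosen and rejected are processed independently,
    which is why a single-sequence SFT-style dataset does not fit here."""
    messages = message_transform(sample)
    out: Dict[str, List[int]] = {}
    for key in ("chosen", "rejected"):
        # "Template" step elided: here we simply join message contents.
        text = " ".join(m["content"] for m in messages[key])
        out[key] = tokenize(text)
    return out
```

The key structural point is the loop: each branch of the pair runs through the same template and tokenization steps, just twice per sample.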
Changelog

- Refactored `PreferenceDataset` and `stack_exchange_paired_dataset`
- Added the `ChosenRejectedToMessages` message transform
- Renamed `StackExchangedPairedTemplate` -> `QuestionAnswerTemplate`
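As a rough illustration of what a `ChosenRejectedToMessages`-style transform does (a sketch under assumed input field names `prompt`/`chosen`/`rejected`, not torchtune's implementation):

```python
def chosen_rejected_to_messages(sample: dict) -> dict:
    """Map raw chosen/rejected columns into two message lists that share
    the same user prompt and differ only in the assistant response."""

    def as_messages(response: str) -> list:
        return [
            {"role": "user", "content": sample["prompt"]},
            {"role": "assistant", "content": response},
        ]

    return {
        "chosen": as_messages(sample["chosen"]),
        "rejected": as_messages(sample["rejected"]),
    }
```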
Test plan