[RFC][DOCS] Recipe Documentation #1230
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1230
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 3d06179 with merge base 3653c4a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
This first draft is looking great, and right in line with how I think we would want to expose the recipes.
Yes, if this is something @felipemello1 is planning to do. But we can leave it as a follow-up.
@RdoubleA I think I've addressed everything. Thanks again for such a thorough review.
torchtune comes with a host of plug-and-play memory optimization components which give you lots of flexibility
to ``tune`` our recipes to your hardware. This page provides a brief glossary of these components and how you might use them.
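As a back-of-envelope illustration of one such component, activation checkpointing trades compute for memory: only a subset of layer activations are kept, and the rest are recomputed during the backward pass. The sketch below is illustrative only; the function, shapes, and numbers are made up for this example and are not torchtune APIs.

```python
# Rough, illustrative estimate of the activation-checkpointing trade-off:
# keep only every k-th layer's activations and recompute the rest during
# backward. Names and numbers are hypothetical, not torchtune's.

def activation_memory_bytes(n_layers: int, tokens: int, hidden: int,
                            bytes_per_elem: int = 2,
                            checkpoint_every: int = 1) -> int:
    """Approximate memory for stored activations when only every
    `checkpoint_every`-th layer's activations are kept."""
    kept_layers = -(-n_layers // checkpoint_every)  # ceiling division
    return kept_layers * tokens * hidden * bytes_per_elem

# A 32-layer model, 4096 tokens in flight, hidden size 4096, bf16 (2 bytes):
full = activation_memory_bytes(32, 4096, 4096)                      # keep all
ckpt = activation_memory_bytes(32, 4096, 4096, checkpoint_every=4)  # keep 1/4
print(full // ckpt)  # 4: roughly 4x less activation memory, paid in recompute
```

The extra recompute typically slows each step down, which is exactly the speed/memory compromise the glossary discusses.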
Commenting at the top after having read this whole tutorial: I think this is a great start and a sorely-needed tutorial. Just a couple things to think about to give it a more unified feel (doesn't have to be part of this PR, this is more aspirational).
Right now this leans more towards an index of techniques with some trade-off discussions; I'd like to see it become more of a cookbook of what to apply and when. To that end, we could shoot for some kind of table enumerating (a) various memory-savings techniques and their impact, (b) implications for performance and/or model quality, and (c) guidance on which ones to try or combine under different regimes. Anyway, this is a longer-term thing, so no need to worry about it for this PR; I'd definitely be interested to hear your thoughts on whether something like that makes sense to you.
Great changes! Thank you so much for doing them. I am sure that the users will be thrilled :D
My overall comments can probably be summarized as two:
- IMO, we can probably make some content more focused (e.g. fewer links);
- There is some room to share trade-offs in the memory-related parameters and suggest reasonable defaults (I really liked this section!).
@@ -0,0 +1,60 @@
.. _lora_finetune_recipe_label:
My two unfiltered cents: I am not sure how much I like all the links. I understand the intention, but in general, a good rule of thumb for me is "less is more". Also, as an engineer, I feel that most of us like it when things go straight to the point, e.g. "show me code" or "a picture is worth a thousand words". However, when I try to think, "OK, how would I rewrite it?", it becomes a bit hard for me to articulate something intelligent. So, if others are comfortable with it, it's fine with me. But if it's a shared feeling, maybe we could revisit it.
The links at the bottom are interesting though, as a type of "keep reading".
TLDR: Maybe increase the ratio of LoRA information to total information. If most of the information is links or notes, then it may be too much noise.
I'm not sure I 100% agree here - IMO these docs aren't necessarily aimed at engineers who are used to quickly reading condensed information; they're meant to maximise discovery of what we offer in torchtune - this was my primary motivation for writing them.
I will comb through the links and make sure they're relevant/necessary though : )
Eh, I do get @felipemello1's sentiment on this one, especially because we immediately lead with 5 links. While many of them may be useful, I think we should instead lead with an example or something. Otherwise, as a reader who just wants to understand how this recipe works and what it does, I am immediately overwhelmed with pointers to literally half our live docs, making it hard to tease out what the actual relevant information is.
I see now, sorry Felipe I don't think I fully grasped your original point : ) I'll address.
I've updated to try to reduce the noise and provide concrete examples, leaving additional information at the bottom.
* :ref:`glossary_lora`.
* :ref:`glossary_qlora`.

As with all of our recipes, you can also:
maybe this is redundant, and pointing the user to "<config_tutorial_label>" is enough?
Sorry, maybe this file has been changed - what was this referring to?
On line 43, we say "Check out our :ref:`configs tutorial <config_tutorial_label>` to learn how to customize recipes to suit your needs."
But then from lines 51 to 57 we talk about config options not related to LoRA. I was thinking that maybe we don't need to add 4 links and 6 lines if this is already covered under config_tutorial_label.
I don't think the config tutorial includes these things. This section is more to make it abundantly clear that you can still use other memory optimization features alongside LoRA. It's a generic section we can include in most of the recipe docs, so someone doesn't need to figure out which components they can use with which recipes.
LoRA Single Device Finetuning
=============================

This recipe supports finetuning on next-token prediction tasks using `LoRA <https://arxiv.org/abs/2106.09685>`_,
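For intuition, here is a sketch of the LoRA idea itself, not of torchtune's implementation: LoRA freezes the original weight W and learns a low-rank update (alpha/r) * B @ A, so the trainable parameter count per linear layer drops from d_in * d_out to r * (d_in + d_out). The dimensions below are illustrative.

```python
# Sketch of LoRA's parameter savings (illustrative, not torchtune code):
# a frozen d_out x d_in weight W gets a trainable low-rank update B @ A,
# with B of shape (d_out, r) and A of shape (r, d_in).

def full_trainable_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when finetuning the full linear layer."""
    return d_in * d_out

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for the LoRA factors A and B alone."""
    return rank * (d_in + d_out)

# A 4096x4096 projection (typical of a ~7B model) with rank 8:
full = full_trainable_params(4096, 4096)     # 16_777_216
lora = lora_trainable_params(4096, 4096, 8)  # 65_536
print(f"{lora / full:.4%}")  # 0.3906% of the full layer's parameters
```

This is why LoRA checkpoints (the adapters alone) are small and why optimizer state shrinks so dramatically compared to full finetuning.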
idea: Maybe for every recipe, we could add at the very start:
Use this recipe if:
- A
- B
- C
For example:
Use this recipe if you:
- Only have access to one GPU;
- Want a small checkpoint;
- Don't have much GPU memory available;
- Are OK with possibly compromising a bit of accuracy in exchange for the above.
I like this idea!
I wonder if it might be better to include in the recipe overviews? Almost like a table of different compute desiderata and which recipes fulfill them.
.. _recipes_overview_label:

================
Recipes Overview
Maybe we could delete it, and this could be the first part of the recipe deepdive? Or if it's just supposed to be an index, maybe it can just be a list, leaving the information for the recipe pages.
I personally like having a little context so this can be a kind-of standalone document for a reader. Second opinions? @joecummings @RdoubleA @ebsmothers @pbontrager
Personally I like it. Without it there we just jump right into individual recipe pages and it's not really clear what their purpose is; I feel like this page provides useful framing for the entire section.
Okay, okay, I'll add a table. I saw this https://pytorch.org/docs/stable/torch.compiler_fine_grain_apis.html and I liked it.
Think I've addressed all the comments. Added a table at the top of the tutorial. Have DPO/PPO recipe docs in the oven too.
@@ -8,14 +8,29 @@ Memory Optimization Overview

torchtune comes with a host of plug-and-play memory optimization components which give you lots of flexibility
to ``tune`` our recipes to your hardware. This page provides a brief glossary of these components and how you might use them.
To make things easy, we've summarized these components in the following table:

.. csv-table:: Memory optimization components
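One entry such a table typically covers is gradient accumulation, which preserves the effective batch size while holding fewer samples' activations in memory at once. A minimal sketch of the mechanism; the names below are illustrative and are not torchtune's API.

```python
# Illustrative sketch of gradient accumulation (not torchtune's API):
# run several micro-batches, averaging their gradients, before taking
# a single optimizer step.

def effective_batch_size(micro_batch_size: int, accumulation_steps: int) -> int:
    """The batch size the optimizer effectively sees per step."""
    return micro_batch_size * accumulation_steps

def accumulate(grads_per_microbatch: list) -> list:
    """Average per-parameter gradients over the micro-batches."""
    steps = len(grads_per_microbatch)
    total = [0.0] * len(grads_per_microbatch[0])
    for grads in grads_per_microbatch:
        for i, g in enumerate(grads):
            total[i] += g / steps
    return total

print(effective_batch_size(2, 4))            # 8
print(accumulate([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

The cost here is time rather than memory: more forward/backward passes happen per optimizer step, so wall-clock time per effective batch goes up.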
I like the table, but I think it's missing the trade-offs/numbers, and for the most part it repeats "use when memory constrained". Maybe an index would be fine?
When we have the table with all the VRAM/tps percentage values, I think we could put it back here. WDYT?
I've tried to call out a bit more the compromises each component makes on training speed; does that help? I'd like to make this useful to users who don't want to read the whole tutorial, so I'm very open to suggestions.
Thanks for making the changes. I think they addressed most if not all of my concerns.
tysm for reviewing : )
Let's get this in and iterate as needed.
It's a 100% improvement from having zero information on our recipes. Thanks for this Herculean effort @SalmanMohammadi!
What is the purpose of this PR? Is it to
Let's map out a user journey here:
Right now, we don't have a clear way to communicate to our users which recipes we support, and how to quickly configure them. Documentation for recipes is:
I want to maximise the surface area of the features we expose in torchtune. Our design philosophy keeps things flat and modular, users can swap out components, models, and datasets freely. I wish for the ML PhD and the l33t gamer to be able to discover what we offer, and how to use it, with equal ease.
My contribution addresses this in the following ways:
I provide two examples in this PR: documentation for LoRA single device, and for QAT distributed. I'd like to put an issue up for documenting additional recipes so other people can help out here - it'll be a good first issue for many contributors.
There are also a couple of things missing from the memory glossary: FSDP/FSDP2 (and maybe something else?). I'll put issues up for this too.
Test plan
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⣤⣤⣤⣤⣴⣤⣤⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⣀⣴⣾⠿⠛⠋⠉⠁⠀⠀⠀⠈⠙⠻⢷⣦⡀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⣤⣾⡿⠋⠁⠀⣠⣶⣿⡿⢿⣷⣦⡀⠀⠀⠀⠙⠿⣦⣀⠀⠀⠀⠀
⠀⠀⢀⣴⣿⡿⠋⠀⠀⢀⣼⣿⣿⣿⣶⣿⣾⣽⣿⡆⠀⠀⠀⠀⢻⣿⣷⣶⣄⠀
⠀⣴⣿⣿⠋⠀⠀⠀⠀⠸⣿⣿⣿⣿⣯⣿⣿⣿⣿⣿⠀⠀⠀⠐⡄⡌⢻⣿⣿⡷
⢸⣿⣿⠃⢂⡋⠄⠀⠀⠀⢿⣿⣿⣿⣿⣿⣯⣿⣿⠏⠀⠀⠀⠀⢦⣷⣿⠿⠛⠁
⠀⠙⠿⢾⣤⡈⠙⠂⢤⢀⠀⠙⠿⢿⣿⣿⡿⠟⠁⠀⣀⣀⣤⣶⠟⠋⠁⠀⠀⠀
⠀⠀⠀⠀⠈⠙⠿⣾⣠⣆⣅⣀⣠⣄⣤⣴⣶⣾⣽⢿⠿⠟⠋⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠙⠛⠛⠙⠋⠉⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀