[RFC][DOCS] Recipe documentation #1230
.. _lora_finetune_recipe_label:

=============================
LoRA Single Device Finetuning
=============================

This recipe supports finetuning on next-token prediction tasks using parameter-efficient fine-tuning (PEFT) techniques
such as `LoRA <https://arxiv.org/abs/2106.09685>`_ and `QLoRA <https://arxiv.org/abs/2305.14314>`_. These techniques
significantly reduce memory consumption during training whilst still maintaining competitive performance.

We provide pre-tested out-of-the-box configs which you can get up and running with the latest `Llama models <https://llama.meta.com/>`_
in just two steps:
.. note::

    You may need to be granted access to the Llama model you're interested in. See
    :ref:`here <download_llama_label>` for details on accessing gated repositories.

.. code-block:: bash

    tune download meta-llama/Meta-Llama-3.1-8B-Instruct \
        --output-dir /tmp/Meta-Llama-3.1-8B-Instruct \
        --ignore-patterns "original/consolidated.00.pth"

    tune run lora_finetune_single_device \
        --config llama3_1/8B_lora_single_device
You can quickly customize this recipe through the :ref:`cli_label`. For example, when fine-tuning with LoRA, you can adjust the layers to which LoRA is applied,
and the scale of LoRA's impact during training:
.. code-block:: bash

    tune run lora_finetune_single_device \
        --config llama3_1/8B_lora_single_device \
        --model.lora_attn_modules=["q_proj","k_proj","v_proj"] \
        --model.apply_lora_to_mlp=True \
        --model.lora_rank=64 \
        --model.lora_alpha=128
This particular configuration results in a more aggressive LoRA policy, which trades
increased memory usage and slower training for potentially higher accuracy.
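To build intuition for what ``lora_rank`` and ``lora_alpha`` control, here is a minimal NumPy sketch of the LoRA idea. This is purely illustrative, not torchtune's implementation; the variable names are hypothetical:

```python
import numpy as np

# Minimal LoRA sketch: the frozen weight W (out x in) is augmented by a
# low-rank product B @ A, scaled by alpha / rank. Only A and B are trained.
rng = np.random.default_rng(0)
in_dim, out_dim, rank, alpha = 16, 16, 4, 8

W = rng.normal(size=(out_dim, in_dim))      # frozen pretrained weight
A = rng.normal(size=(rank, in_dim)) * 0.01  # trainable low-rank factor
B = np.zeros((out_dim, rank))               # trainable, initialized to zero

def lora_forward(x):
    # Base linear output plus the scaled low-rank correction.
    return x @ W.T + (x @ A.T @ B.T) * (alpha / rank)

x = rng.normal(size=(2, in_dim))
# With B initialized to zero, the LoRA output equals the frozen base output.
assert np.allclose(lora_forward(x), x @ W.T)
```

Only ``A`` and ``B`` (``rank * (in_dim + out_dim)`` values) are trained, versus ``out_dim * in_dim`` for full fine-tuning, which is why raising ``lora_rank`` to 64 as above increases both capacity and memory use.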
For a deeper understanding of the different levers you can pull when using this recipe,
see our documentation for the different PEFT training paradigms we support:

* :ref:`glossary_lora`
* :ref:`glossary_qlora`
Many of our other memory optimization features can be used in this recipe, too:

* Adjust :ref:`model precision <glossary_precision>`.
* Use :ref:`activation checkpointing <glossary_act_ckpt>`.
* Enable :ref:`gradient accumulation <glossary_grad_accm>`.
* Use :ref:`lower precision optimizers <glossary_low_precision_opt>`. However, note that since LoRA
  significantly reduces memory usage due to gradient state, you will likely not need this
  feature.

You can learn more about all of our memory optimization features in our :ref:`memory optimization overview <memory_optimization_overview_label>`.
Interested in seeing this recipe in action? Check out some of our tutorials showing how it can be used:

* :ref:`Finetuning Llama2 with LoRA <lora_finetune_label>`
* :ref:`End-to-End Workflow with torchtune <dataset_tutorial_label>`
* :ref:`Fine-tuning Llama3 with Chat Data <chat_tutorial_label>`
* :ref:`Meta Llama3 in torchtune <llama3_label>`
* :ref:`Fine-Tune Your First LLM <finetune_llama_label>`
.. _qat_distributed_recipe_label:

=============================================
Distributed Quantization-Aware Training (QAT)
=============================================

QAT allows for taking advantage of memory-saving optimizations from quantization at inference time, without significantly
degrading model performance. In torchtune, we use `torchao <https://github.com/pytorch/ao>`_ to implement QAT.
This works by :ref:`simulating quantization numerics during fine-tuning <what_is_qat_label>`. While this may introduce memory and
compute overheads during training, our tests found that QAT significantly reduced performance degradation in evaluations of
quantized models, without compromising on model size reduction gains.

.. note::

    The `PyTorch blog post <https://pytorch.org/blog/quantization-aware-training/>`_ on QAT provides further insight into how QAT works.

We provide pre-tested out-of-the-box configs which you can get up and running with the latest `Llama models <https://llama.meta.com/>`_
in just two steps:

.. note::

    You may need to be granted access to the Llama model you're interested in. See
    :ref:`here <download_llama_label>` for details on accessing gated repositories.
.. code-block:: bash

    tune download meta-llama/Meta-Llama-3-8B-Instruct \
        --output-dir /tmp/Meta-Llama-3-8B-Instruct \
        --ignore-patterns "original/consolidated.00.pth" \
        --hf-token <HF_TOKEN>

    tune run --nproc_per_node 6 qat_distributed \
        --config llama3/8B_qat_full

.. note::

    This workload requires at least 6 GPUs, each with VRAM of at least 80GB.
Currently, the main lever you can pull for QAT is *delayed fake quantization*,
which allows for control over the step after which fake quantization begins.

Empirically, allowing the model to finetune without fake quantization initially allows the
weight and activation values to stabilize before fake quantizing them, potentially leading
to improved quantized accuracy. This can be specified through ``fake_quant_after_n_steps``. To
give you a rough idea of how to configure this parameter, we've achieved best results with
``fake_quant_after_n_steps ~= total_steps // 2``.
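As a rough illustration of what "fake quantization" means here, the sketch below rounds weights onto an int4 grid and immediately dequantizes them, so training sees quantization error while the weights stay in floating point. This is a simplified symmetric per-tensor sketch, not torchao's implementation:

```python
import numpy as np

def fake_quantize(w, n_bits=4):
    # Simplified fake quantization sketch: map w onto 2**n_bits integer
    # levels, then immediately dequantize back to float, so the model
    # "sees" quantization error during the forward pass.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale  # still float, but restricted to the int4 grid

w = np.array([0.31, -0.52, 0.07, 0.99])
w_fq = fake_quantize(w)
# w_fq remains a float array; the rounding error is at most scale / 2.
```

Delaying this step (via ``fake_quant_after_n_steps``) means the early training steps use ``w`` directly, and only later steps use ``fake_quantize(w)``.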
In the future we plan to support different quantization strategies. For now, note that you'll need at least
``torch>=2.4.0`` to use the `Int8DynActInt4WeightQATQuantizer <https://github.com/pytorch/ao/blob/08024c686fdd3f3dc2817094f817f54be7d3c4ac/torchao/quantization/prototype/qat/api.py#L35>`_
strategy. Generally, the pipeline for training, quantizing, and evaluating a model using QAT is:

#. Run the ``qat_distributed`` recipe using the above command, or by following the tutorial. By default, this will use ``Int8DynActInt4WeightQATQuantizer``.
#. This produces an un-quantized model in the original data type. To get an actual quantized model, follow this with
   ``tune run quantize`` while specifying the same quantizer in the config, e.g.
.. code-block:: yaml

    # QAT specific args
    quantizer:
      _component_: torchtune.utils.quantization.Int8DynActInt4WeightQATQuantizer
      groupsize: 256
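To illustrate what ``groupsize`` controls (a simplified sketch, not torchao's actual kernel): group quantization computes one scale per contiguous group of weights rather than one per tensor, which typically tracks local weight magnitudes better and reduces quantization error:

```python
import numpy as np

def group_quantize(w, groupsize=4, n_bits=4):
    # Simplified sketch: split a 1-D weight vector into groups of
    # `groupsize` values and quantize each group with its own scale.
    qmax = 2 ** (n_bits - 1) - 1
    groups = w.reshape(-1, groupsize)
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def group_dequantize(q, scales):
    # Reconstruct an approximation of the original weights.
    return (q * scales).reshape(-1)

w = np.linspace(-1.0, 1.0, 16)
q, scales = group_quantize(w)
w_hat = group_dequantize(q, scales)
```

A larger ``groupsize`` (such as 256 above) stores fewer scales and so compresses better, at the cost of coarser per-group resolution.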
#. :ref:`Evaluate <qat_eval_label>` or `run inference <https://github.com/pytorch/torchtune/blob/main/recipes/quantization.md#generate>`_
   using your quantized model by specifying the corresponding post-training quantizer:

.. code-block:: yaml

    quantizer:
      _component_: torchtune.utils.quantization.Int8DynActInt4WeightQuantizer
      groupsize: 256
.. note::

    We're using config files to show how to customize the recipe in these examples. Check out the
    :ref:`configs tutorial <config_tutorial_label>` to learn more.

Many of our other memory optimization features can be used in this recipe, too:

* Adjust :ref:`model precision <glossary_precision>`.
* Use :ref:`activation checkpointing <glossary_act_ckpt>`.
* Enable :ref:`gradient accumulation <glossary_grad_accm>`.
* Use :ref:`lower precision optimizers <glossary_low_precision_opt>`.

You can learn more about all of our memory optimization features in our :ref:`memory optimization overview <memory_optimization_overview_label>`.

Interested in seeing this recipe in action? Check out some of our tutorials showing how it can be used:

* :ref:`qat_finetune_label`
.. _recipes_overview_label:

================
Recipes Overview
================
Recipes are the primary entry points for torchtune users.
These can be thought of as **hackable, singularly-focused scripts for interacting with LLMs**, including fine-tuning,
inference, evaluation, and quantization.

Each recipe consists of three components:

* **Configurable parameters**, specified through yaml configs and command-line overrides
* **Recipe script**, the entry point which puts everything together, including parsing and validating configs, setting up the environment, and correctly using the recipe class
* **Recipe class**, the core logic needed for fine-tuning, exposed through a set of APIs
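As a purely illustrative sketch of how these three components relate (the class and function names here are hypothetical, not torchtune's actual API):

```python
# Hypothetical sketch of a recipe's structure; names are illustrative
# and do not correspond to torchtune's actual API.

class FinetuneRecipe:
    """Recipe class: core fine-tuning logic behind a small set of APIs."""

    def __init__(self, cfg: dict):
        self.cfg = cfg
        self.steps_run = 0

    def setup(self):
        # In a real recipe: build the model, optimizer, and dataloader
        # from self.cfg here.
        pass

    def train(self):
        for _ in range(self.cfg["max_steps"]):
            self.steps_run += 1  # stand-in for one optimizer step


def main(cfg: dict):
    """Recipe script: validate the config, set up, and drive the class."""
    assert "max_steps" in cfg, "config must define max_steps"
    recipe = FinetuneRecipe(cfg)
    recipe.setup()
    recipe.train()
    return recipe


# Configurable parameters: normally a yaml file plus CLI overrides,
# collapsed here to a plain dict for illustration.
recipe = main({"max_steps": 3})
```

In a real recipe the config dict would be parsed from yaml and command-line overrides before being handed to the script's entry point.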
.. note::

    To learn more about the concept of "recipes", check out our technical deep-dive: :ref:`recipe_deepdive`.

Supervised Finetuning
---------------------

torchtune provides built-in recipes for finetuning on a single device, or on multiple devices with `FSDP <https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/>`_,
using a variety of :ref:`memory optimization features <memory_optimization_overview_label>`. Our fine-tuning recipes support all of our models and all our dataset types.
This includes continued pre-training, and various supervised fine-tuning paradigms, which can be customized through our datasets. Check out our
:ref:`dataset tutorial <dataset_tutorial_label>` for more information.
Our supervised fine-tuning recipes include:

* :ref:`Single-device <lora_finetune_recipe_label>` LoRA fine-tuning.
* :ref:`Distributed Quantization-Aware Training <qat_distributed_recipe_label>`.
.. Alignment finetuning
.. --------------------
.. Interested in alignment fine-tuning? You've come to the right place! We support the following alignment techniques:

.. Direct Preference Optimization (DPO) Fine-Tuning
.. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. `Direct Preference Optimization <https://arxiv.org/abs/2305.18290>`_ (DPO) style techniques allow for aligning language models with respect
.. to a reward model objective function without the use of reinforcement learning. We support DPO preference fine-tuning with:

.. * :ref:`Single-device <lora_finetune_recipe_label>` and :ref:`multi-device <lora_finetune_recipe_label>` LoRA finetuning.

.. note::

    Want to learn more about a certain recipe, but can't find the documentation here?
    Not to worry! Our recipe documentation is currently under construction - come back soon
    to see documentation of your favourite fine-tuning techniques.

.. interested in contributing documentation? Check out our issue here TODO (SalmanMohammadi)