
[RFC][DOCS] Recipe [DOCS] ([DOC]umentation) #1230

Merged: 28 commits on Aug 24, 2024

Commits
09443a0
testing
SalmanMohammadi Jul 22, 2024
487ebae
init
SalmanMohammadi Jul 25, 2024
1b914ad
adding recipe docs draft
SalmanMohammadi Jul 26, 2024
416e6a1
label bug....
SalmanMohammadi Jul 26, 2024
2422b11
updating
SalmanMohammadi Jul 26, 2024
a279cc4
init commit
SalmanMohammadi Aug 1, 2024
6a38504
Merge branch 'memory-optimisations-overview' into recipe_docs
SalmanMohammadi Aug 1, 2024
04f4407
updating doc preview
SalmanMohammadi Aug 1, 2024
78d0984
updating doc preview
SalmanMohammadi Aug 1, 2024
e970e89
adding qat recipe, updating template
SalmanMohammadi Aug 2, 2024
8c0204c
fixing titling
SalmanMohammadi Aug 2, 2024
7adcaec
Update _recipe_template.rst
SalmanMohammadi Aug 2, 2024
2330e63
nits
SalmanMohammadi Aug 2, 2024
3d67a51
addressing comments
SalmanMohammadi Aug 3, 2024
c323d67
Merge branch 'main' into recipe_docs
SalmanMohammadi Aug 5, 2024
eb93be1
addressing comments
SalmanMohammadi Aug 5, 2024
4292014
Merge branch 'recipe_docs' of github.com:SalmanMohammadi/torchtune in…
SalmanMohammadi Aug 5, 2024
c7bd45a
adding guidance for using different mdoels before I forget
SalmanMohammadi Aug 6, 2024
683a139
addressing comments
SalmanMohammadi Aug 8, 2024
89da954
removing config
SalmanMohammadi Aug 8, 2024
dc3bcbf
sp
SalmanMohammadi Aug 10, 2024
5d09ca1
adding mem opt
SalmanMohammadi Aug 19, 2024
1f35783
updating template
SalmanMohammadi Aug 19, 2024
dbfc121
adding n step note
SalmanMohammadi Aug 19, 2024
f212380
further callouts in table
SalmanMohammadi Aug 19, 2024
546224a
updated table
SalmanMohammadi Aug 19, 2024
cdbe0d9
a little more callout
SalmanMohammadi Aug 21, 2024
3d06179
updating tutorials
SalmanMohammadi Aug 24, 2024
6 changes: 3 additions & 3 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -11,11 +11,11 @@ Please link to any issues this PR addresses.
What are the changes made in this PR?

#### Test plan
Please make sure to do each of the following if applicable to your PR. (If you're not sure about any one of these just ask and we will happily help. We also have a [contributing page](../CONTRIBUTING.md) for some guidance on contributing.)
Please make sure to do each of the following if applicable to your PR. (If you're not sure about any one of these just ask and we will happily help. We also have a [contributing page](https://github.com/pytorch/torchtune/blob/main/CONTRIBUTING.md) for some guidance on contributing.)

- [ ] run pre-commit hooks and linters (make sure you've first installed via `pre-commit install`)
- [ ] add [unit tests](../tests/torchtune) for any new functionality
- [ ] update [docstrings](../docs/source) for any new or updated methods or classes
- [ ] add [unit tests](https://github.com/pytorch/torchtune/tree/main/tests/torchtune) for any new functionality
- [ ] update [docstrings](https://github.com/pytorch/torchtune/tree/main/docs/source) for any new or updated methods or classes
- [ ] run unit tests via `pytest tests`
- [ ] run recipe tests via `pytest tests -m integration_test`
- [ ] manually run any new or modified recipes with sufficient proof of correctness
2 changes: 2 additions & 0 deletions CONTRIBUTING.md
@@ -66,6 +66,8 @@ Each API and class should be clearly documented. Well-documented code is easier

Documentation is written in [RST](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html) format.

If you've contributed a new recipe, please ensure you've created a corresponding recipe doc file in [the recipes directory](docs/source/recipes), and updated both the [recipe overview page](docs/source/recipes/recipes_overview.rst) and the [index sidebar](docs/source/index.rst). You can find a template to fill in [here](docs/source/_templates/_recipe_template.rst).

### Adding a new class/method to the API References
Once you've added an API that is meant to be exposed publicly, you should add it to the appropriate rst file. For example, any new API within the [configs/](torchtune/configs)
directory should be added to `api_ref_configs.rst`, [data/](torchtune/data) should be added to `api_ref_data.rst`, [datasets](torchtune/datasets) should be added to
11 changes: 11 additions & 0 deletions docs/source/index.rst
@@ -97,6 +97,16 @@ torchtune tutorials.
tutorials/first_finetune_tutorial
tune_cli

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Finetuning Recipes
   :hidden:

   recipes/recipes_overview
   recipes/lora_finetune_single_device
   recipes/qat_distributed

.. toctree::
   :glob:
   :maxdepth: 1
@@ -110,6 +120,7 @@ torchtune tutorials.
   tutorials/e2e_flow
   tutorials/datasets
   tutorials/chat
   tutorials/memory_optimizations

.. toctree::
   :glob:
2 changes: 2 additions & 0 deletions docs/source/overview.rst
@@ -32,6 +32,8 @@ Excited? To get started, check out some of our tutorials, including:
- our :ref:`LoRA tutorial <lora_finetune_label>` to learn about parameter-efficient finetuning with torchtune.
- our :ref:`QLoRA tutorial <qlora_finetune_label>` to attain maximal memory efficiency with torchtune.

Eager for more? Check out our :ref:`recipes index<recipes_overview_label>` to see all the fine-tuning techniques we support.

Key Concepts
------------

67 changes: 67 additions & 0 deletions docs/source/recipes/lora_finetune_single_device.rst
@@ -0,0 +1,67 @@
.. _lora_finetune_recipe_label:

felipemello1 (Contributor), Aug 13, 2024:
My two unfiltered cents: I am not sure how much I like all the links. I understand the intention, but in general, a good rule of thumb for me is "less is more". Also, as an engineer, I feel that most of us like when things go straight to the point, e.g. "show me code" or "a picture is worth a thousand words". However, when I try to think, "ok, how would I rewrite it?", it becomes a bit hard for me to articulate something intelligent. So, if others are comfortable with it, it's fine with me. But if it's a shared feeling, maybe we could revisit it.

The links at the bottom are interesting though, as a type of "keep reading".

TLDR: Maybe increase ratio of LoRA-information / information. If most of the information are links or notes, then it may be too much noise.

SalmanMohammadi (Collaborator, Author):

I'm not sure I 100% agree here - IMO these docs aren't necessarily aimed at engineers who are used to quickly reading condensed information, but at maximising discovery of what we offer in torchtune - this was my primary motivation for writing these.

I will comb through the links and make sure they're relevant/necessary though : )

Contributor:

Eh, I do get @felipemello1's sentiment on this one, especially because we immediately lead with 5 links. While many of them may be useful, I think we should instead lead with an example or something. Otherwise, as a reader who just wants to understand how this recipe works and what it does, I am immediately overwhelmed with pointers to literally half our live docs, making it hard to tease out the actual relevant information.

SalmanMohammadi (Collaborator, Author):

I see now, sorry Felipe I don't think I fully grasped your original point : ) I'll address.

SalmanMohammadi (Collaborator, Author):

I've updated to try to reduce the noise, provide concrete examples, and leave additional information at the bottom.

=============================
LoRA Single Device Finetuning
=============================

This recipe supports finetuning on next-token prediction tasks using parameter-efficient fine-tuning (PEFT) techniques
such as `LoRA <https://arxiv.org/abs/2106.09685>`_ and `QLoRA <https://arxiv.org/abs/2305.14314>`_. These techniques
significantly reduce memory consumption during training whilst still maintaining competitive performance.
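To make the memory savings concrete, here is a rough sketch of the LoRA idea in NumPy (illustrative only, not torchtune's implementation): the pretrained weight is frozen, and only a low-rank update is trained, so trainable parameters (and hence gradient and optimizer state) shrink dramatically.

```python
import numpy as np

# Frozen pretrained weight W (d_out x d_in) plus a trainable low-rank
# update: W_eff = W + (alpha / r) * B @ A, with A (r x d_in) and
# B (d_out x r). Only A and B receive gradients, so trainable params
# drop from d_out * d_in to r * (d_in + d_out).
d_in, d_out, r, alpha = 4096, 4096, 8, 16
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen
A = rng.standard_normal((r, d_in)) * 0.01   # trainable
B = np.zeros((d_out, r))                    # trainable, zero-init

def lora_forward(x):
    # Base path plus scaled low-rank path; because B starts at zero,
    # the adapted model initially matches the pretrained one exactly.
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(f"trainable params: {lora_params} vs {full_params} "
      f"({100 * lora_params / full_params:.2f}%)")
```

Raising the rank ``r`` (as in the aggressive config below) grows the trainable parameter count linearly, which is exactly the memory/accuracy trade-off the recipe exposes.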

We provide pre-tested out-of-the-box configs which you can get up and running with the latest `Llama models <https://llama.meta.com/>`_
in just two steps:

.. note::

   You may need to be granted access to the Llama model you're interested in. See
   :ref:`here <download_llama_label>` for details on accessing gated repositories.


.. code-block:: bash

   tune download meta-llama/Meta-Llama-3.1-8B-Instruct \
       --output-dir /tmp/Meta-Llama-3.1-8B-Instruct \
       --ignore-patterns "original/consolidated.00.pth"

   tune run lora_finetune_single_device \
       --config llama3_1/8B_lora_single_device

You can quickly customize this recipe through the :ref:`cli_label`. For example, when fine-tuning with LoRA, you can adjust the layers to which LoRA
is applied, and the scale of the impact of LoRA during training:

.. code-block:: bash

   tune run lora_finetune_single_device \
       --config llama3_1/8B_lora_single_device \
       --model.lora_attn_modules='["q_proj", "k_proj", "v_proj"]' \
       --model.apply_lora_to_mlp=True \
       --model.lora_rank=64 \
       --model.lora_alpha=128

This particular configuration results in an aggressive LoRA policy, which
trades increased memory usage and slower training for higher training accuracy.

For a deeper understanding of the different levers you can pull when using this recipe,
see our documentation for the different PEFT training paradigms we support:

* :ref:`glossary_lora`
* :ref:`glossary_qlora`

Many of our other memory optimization features can be used in this recipe, too:

* Adjust :ref:`model precision <glossary_precision>`.
* Use :ref:`activation checkpointing <glossary_act_ckpt>`.
* Enable :ref:`gradient accumulation <glossary_grad_accm>`.
* Use :ref:`lower precision optimizers <glossary_low_precision_opt>`. Note, however, that since LoRA
  significantly reduces the memory used by gradient and optimizer state, you will likely not need this
  feature.

You can learn more about all of our memory optimization features in our :ref:`memory optimization overview<memory_optimization_overview_label>`.
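To make one of these levers concrete, gradient accumulation can be sketched as follows (a framework-agnostic toy with made-up gradient values, not torchtune's training loop):

```python
def accumulate_gradients(micro_grads, accum_steps):
    """Average gradients over accum_steps micro-batches, yielding one
    'optimizer step' per group. This lets a memory-limited run mimic a
    batch accum_steps times larger, at the cost of extra wall time."""
    updates = []
    running = 0.0
    for i, g in enumerate(micro_grads, start=1):
        running += g                      # accumulate the micro-batch grad
        if i % accum_steps == 0:          # take one optimizer step
            updates.append(running / accum_steps)
            running = 0.0                 # zero the accumulated grads
    return updates

# Four micro-batch gradients, accumulated in pairs -> two updates.
updates = accumulate_gradients([1.0, 3.0, 2.0, 2.0], accum_steps=2)
```

Only one micro-batch's activations need to be held in memory at a time, which is why this pairs well with small per-device batch sizes.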

Interested in seeing this recipe in action? Check out some of our tutorials showing how it can be used:

* :ref:`Finetuning Llama2 with LoRA<lora_finetune_label>`
* :ref:`End-to-End Workflow with torchtune<dataset_tutorial_label>`
* :ref:`Fine-tuning Llama3 with Chat Data<chat_tutorial_label>`
* :ref:`Meta Llama3 in torchtune<llama3_label>`
* :ref:`Fine-Tune Your First LLM<finetune_llama_label>`
88 changes: 88 additions & 0 deletions docs/source/recipes/qat_distributed.rst
@@ -0,0 +1,88 @@
.. _qat_distributed_recipe_label:

=============================================
Distributed Quantization-Aware Training (QAT)
=============================================

QAT allows for taking advantage of memory-saving optimizations from quantization at inference time, without significantly
degrading model performance. In torchtune, we use `torchao <https://github.com/pytorch/ao>`_ to implement QAT.
This works by :ref:`simulating quantization numerics during fine-tuning <what_is_qat_label>`. While this may introduce memory and
compute overheads during training, our tests found that QAT significantly reduced performance degradation in evaluations of the
quantized model, without compromising the model-size reduction gains.
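At its core, fake quantization is a quantize-dequantize round trip applied during training. Here is a hand-rolled sketch of symmetric int8 fake quantization (illustrative only; torchao's actual quantizers handle per-group scales, activations, and more):

```python
import numpy as np

def fake_quantize_int8(w):
    """Simulate symmetric int8 quantization: round-trip the tensor
    through the int8 grid but return floating-point values, so the
    model experiences quantization error during training while
    gradients still flow in full precision."""
    max_abs = np.max(np.abs(w))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -128, 127)  # snap to integer grid
    return q * scale                             # dequantize back to float

w = np.array([0.5, -1.27, 0.004, 1.0])
w_fq = fake_quantize_int8(w)
# w_fq stays float, but only takes values representable under int8
```

Small values like ``0.004`` collapse to zero here, which is exactly the kind of error QAT teaches the model to tolerate before real quantization happens at inference time.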

.. note::

   The `PyTorch blogpost <https://pytorch.org/blog/quantization-aware-training/>`_ on QAT provides further insight into how QAT works.


We provide pre-tested out-of-the-box configs which you can get up and running with the latest `Llama models <https://llama.meta.com/>`_
in just two steps:

.. note::

   You may need to be granted access to the Llama model you're interested in. See
   :ref:`here <download_llama_label>` for details on accessing gated repositories.

.. code-block:: bash

   tune download meta-llama/Meta-Llama-3-8B-Instruct \
       --output-dir /tmp/Meta-Llama-3-8B-Instruct \
       --ignore-patterns "original/consolidated.00.pth" \
       --hf-token <HF_TOKEN>

   tune run --nproc_per_node 6 qat_distributed \
       --config llama3/8B_qat_full

.. note::

   This workload requires at least 6 GPUs, each with at least 80GB of VRAM.


Currently, the main lever you can pull for QAT is *delayed fake quantization*,
which gives you control over the step after which fake quantization begins.
Empirically, fine-tuning without fake quantization for some initial steps gives the
weight and activation values time to stabilize before fake quantizing them, potentially leading
to improved quantized accuracy. This can be specified through ``fake_quant_after_n_steps``. As
a rough rule of thumb, we've achieved best results with
``fake_quant_after_n_steps ~= total_steps // 2``.
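The delayed schedule is simple to picture. Here is a sketch of the gating logic (illustrative; the function and variable names are ours, only ``fake_quant_after_n_steps`` comes from the config):

```python
def fake_quant_enabled(step, fake_quant_after_n_steps=None):
    """Whether fake quantization is active at this global step.
    None means fake quantize from the very first step."""
    if fake_quant_after_n_steps is None:
        return True
    return step >= fake_quant_after_n_steps

total_steps = 1000
# Rule of thumb from above: enable roughly halfway through training.
threshold = total_steps // 2
schedule = [fake_quant_enabled(s, threshold) for s in range(total_steps)]
# The first half of training runs without fake quantization; it kicks
# in at step 500 and stays on for the remainder.
```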

In the future we plan to support different quantization strategies. For now, note that you'll need at least
``torch>=2.4.0`` to use the `Int8DynActInt4WeightQATQuantizer <https://github.com/pytorch/ao/blob/08024c686fdd3f3dc2817094f817f54be7d3c4ac/torchao/quantization/prototype/qat/api.py#L35>`_
strategy. Generally, the pipeline for training, quantizing, and evaluating a model using QAT is:

#. Run the ``qat_distributed`` recipe using the above command, or by following the tutorial. By default, this will use ``Int8DynActInt4WeightQATQuantizer``.
#. This produces an un-quantized model in the original data type. To get an actual quantized model, follow this with
   ``tune run quantize`` while specifying the same quantizer in the config, e.g.

   .. code-block:: yaml

      # QAT specific args
      quantizer:
        _component_: torchtune.utils.quantization.Int8DynActInt4WeightQATQuantizer
        groupsize: 256

#. :ref:`Evaluate<qat_eval_label>` or `run inference <https://github.com/pytorch/torchtune/blob/main/recipes/quantization.md#generate>`_
   using your quantized model by specifying the corresponding post-training quantizer:

   .. code-block:: yaml

      quantizer:
        _component_: torchtune.utils.quantization.Int8DynActInt4WeightQuantizer
        groupsize: 256

.. note::

   We're using config files to show how to customize the recipe in these examples. Check out the
   :ref:`configs tutorial <config_tutorial_label>` to learn more.

Many of our other memory optimization features can be used in this recipe, too:

* Adjust :ref:`model precision <glossary_precision>`.
* Use :ref:`activation checkpointing <glossary_act_ckpt>`.
* Enable :ref:`gradient accumulation <glossary_grad_accm>`.
* Use :ref:`lower precision optimizers <glossary_low_precision_opt>`.

You can learn more about all of our memory optimization features in our :ref:`memory optimization overview<memory_optimization_overview_label>`.

Interested in seeing this recipe in action? Check out some of our tutorials showing how it can be used:

* :ref:`qat_finetune_label`
53 changes: 53 additions & 0 deletions docs/source/recipes/recipes_overview.rst
@@ -0,0 +1,53 @@
.. _recipes_overview_label:

================
Recipes Overview
================

felipemello1 (Contributor), Aug 13, 2024:

Maybe we could delete it and this could be the first part of the recipe_deepdive? Or, if it's just supposed to be an index, maybe it can just be a list and leave the information for the recipe pages.

SalmanMohammadi (Collaborator, Author):

I personally like having a little context so this can be a kind-of standalone document for a reader. Second opinions? @joecummings @RdoubleA @ebsmothers @pbontrager

Contributor:

Personally I like it. Without it we just jump right into individual recipe pages and it's not really clear what their purpose is; I feel like this page provides useful framing for the entire section.

Recipes are the primary entry points for torchtune users.
These can be thought of as **hackable, singularly-focused scripts for interacting with LLMs** including fine-tuning,
inference, evaluation, and quantization.

Each recipe consists of three components:

* **Configurable parameters**, specified through yaml configs and command-line overrides
* **Recipe script**, the entry point which puts everything together, including parsing and validating configs, setting up the environment, and correctly using the recipe class
* **Recipe class**, the core logic needed for fine-tuning, exposed through a set of APIs
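Schematically, the three components fit together like this (an illustrative sketch, not torchtune's actual code; the class and function names are ours, and a plain dict stands in for the YAML config):

```python
# Hypothetical "yaml" config with command-line overrides already applied.
config = {"lr": 2e-5, "epochs": 1}

class FinetuneRecipe:
    """Recipe class: holds the core fine-tuning logic behind a small API."""

    def __init__(self, cfg):
        self.lr = cfg["lr"]
        self.epochs = cfg["epochs"]
        self.state = "new"

    def setup(self):
        # In a real recipe: build the model, dataloader, optimizer, etc.
        self.state = "ready"

    def train(self):
        assert self.state == "ready", "setup() must run before train()"
        return f"trained for {self.epochs} epoch(s) at lr={self.lr}"

def main(cfg):
    """Recipe script: the entry point that validates config, sets up the
    environment, and drives the recipe class in the right order."""
    recipe = FinetuneRecipe(cfg)
    recipe.setup()
    return recipe.train()
```

The separation matters: the script stays thin and disposable, while the class is the hackable core users are encouraged to copy and modify.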

.. note::

   To learn more about the concept of "recipes", check out our technical deep-dive: :ref:`recipe_deepdive`.


Supervised Finetuning
---------------------

torchtune provides built-in recipes for finetuning on a single device, or on multiple devices with `FSDP <https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/>`_,
using a variety of :ref:`memory optimization features <memory_optimization_overview_label>`. Our fine-tuning recipes support all of our models and all our dataset types.
This includes continued pre-training and various supervised finetuning paradigms, which can be customized through our datasets. Check out our
:ref:`dataset tutorial <dataset_tutorial_label>` for more information.

Our supervised fine-tuning recipes include:

* :ref:`Single-device <lora_finetune_recipe_label>` LoRA fine-tuning.
* :ref:`Distributed Quantization-Aware Training<qat_distributed_recipe_label>`.

.. Alignment finetuning
.. --------------------
.. Interested in alignment fine-tuning? You've come to the right place! We support the following alignment techniques:

.. Direct Preference Optimization (DPO) Fine-Tuning
.. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. `Direct Preference Optimization <https://arxiv.org/abs/2305.18290>`_ (DPO) style techniques allow for aligning language models with respect
.. to a reward model objective function without the use of reinforcement learning. We support DPO preference fine-tuning with:

.. * :ref:`Single-device <lora_finetune_recipe_label>` and :ref:`multi-device <lora_finetune_recipe_label>` LoRA finetuning.

.. note::

   Want to learn more about a certain recipe, but can't find the documentation here?
   Not to worry! Our recipe documentation is currently under construction - come back soon
   to see documentation of your favourite fine-tuning techniques.

.. interested in contributing documentation? Check out our issue here TODO (SalmanMohammadi)
2 changes: 2 additions & 0 deletions docs/source/tutorials/chat.rst
@@ -1,3 +1,5 @@
.. _chat_tutorial_label:

=================================
Fine-tuning Llama3 with Chat Data
=================================
7 changes: 3 additions & 4 deletions docs/source/tutorials/first_finetune_tutorial.rst
@@ -66,13 +66,12 @@ Each recipe consists of three components:

.. note::

Check out our :ref:`recipes index<recipes_overview_label>` to see all the fine-tuning techniques we support.
To learn more about the concept of "recipes", check out our technical deep-dive: :ref:`recipe_deepdive`.

torchtune provides built-in recipes for finetuning on single device, on multiple devices with `FSDP <https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/>`_,
using memory efficient techniques like `LoRA <https://arxiv.org/abs/2106.09685>`_, and more! You can view all built-in recipes `on GitHub <https://github.com/pytorch/torchtune/tree/main/recipes>`_. You can also utilize the
:ref:`tune ls <tune_ls_label>` command to print out all recipes and corresponding configs.

.. TODO (SalmanMohammadi) point to recipe index page here.
using memory efficient techniques like `LoRA <https://arxiv.org/abs/2106.09685>`_, and more! Check out all our built-in recipes in our :ref:`recipe index<recipes_overview_label>`. You can also utilize the
:code:`tune ls` command to print out all recipes and corresponding configs.

.. code-block:: bash

1 change: 1 addition & 0 deletions docs/source/tutorials/lora_finetune.rst
@@ -301,6 +301,7 @@ A comparison of the (smoothed) loss curves between this run and our baseline ove
to generate similar loss curves, but you will need to install W&B and setup an account separately. For more details on
using W&B in torchtune, see our ":ref:`wandb_logging`" recipe.

.. _lora_tutorial_memory_tradeoff_label:

Trading off memory and model performance with LoRA
--------------------------------------------------