Merge branch 'main' into mm_recipe
pbontrager committed Sep 19, 2024
2 parents 5d11ac0 + c5db813 commit fe1a781
Showing 22 changed files with 170 additions and 295 deletions.
11 changes: 6 additions & 5 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -9,9 +9,10 @@ Please link to any issues this PR addresses.

#### Changelog
What are the changes made in this PR?
*

#### Test plan
Please make sure to do each of the following if applicable to your PR. (If you're not sure about any one of these just ask and we will happily help. We also have a [contributing page](https://github.com/pytorch/torchtune/blob/main/CONTRIBUTING.md) for some guidance on contributing.)
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a [contributing page](https://github.com/pytorch/torchtune/blob/main/CONTRIBUTING.md) for some guidance on contributing.

- [ ] run pre-commit hooks and linters (make sure you've first installed via `pre-commit install`)
- [ ] add [unit tests](https://github.com/pytorch/torchtune/tree/main/tests/torchtune) for any new functionality
@@ -23,8 +24,8 @@ Please make sure to do each of the following if applicable to your PR. (If you'r

#### UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Example of docstring: https://github.com/pytorch/torchtune/blob/6a7951f1cdd0b56a9746ef5935106989415f50e3/torchtune/modules/vision_transformer.py#L285
Example in our docs: https://pytorch.org/torchtune/main/tutorials/qat_finetune.html#applying-qat-to-llama3-models
Here is a [docstring example](https://github.com/pytorch/torchtune/blob/6a7951f1cdd0b56a9746ef5935106989415f50e3/torchtune/modules/vision_transformer.py#L285)
and a [tutorial example](https://pytorch.org/torchtune/main/tutorials/qat_finetune.html#applying-qat-to-llama3-models)

- [ ] I did not change any public API;
- [ ] I have added an example to docs or docstrings;
- [ ] I did not change any public API
- [ ] I have added an example to docs or docstrings
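
As a quick illustration of the `pre-commit install` step referenced in the test plan above, here is a sketch using the standard [pre-commit](https://pre-commit.com) tool that manages the hooks:

```bash
# one-time setup: install the tool and register the repo's git hooks
pip install pre-commit
pre-commit install

# run every hook against all files before opening the PR
pre-commit run --all-files
```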
2 changes: 2 additions & 0 deletions docs/source/api_ref_data.rst
@@ -49,6 +49,8 @@ Converts data from common JSON formats into a torchtune :class:`Message`.
get_sharegpt_messages
get_openai_messages

.. _message_transforms_ref:

Message transforms
------------------

1 change: 1 addition & 0 deletions docs/source/api_ref_datasets.rst
@@ -37,6 +37,7 @@ Multimodal datasets
multimodal.llava_instruct_dataset
multimodal.the_cauldron_dataset

.. _dataset_builders:

Generic dataset builders
------------------------
69 changes: 34 additions & 35 deletions docs/source/api_ref_models.rst
@@ -11,22 +11,28 @@ llama3 & llama3.1

All models from the `Llama3 family <https://llama.meta.com/llama3/>`_.

Request Access on `Hugging Face <https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct>`__.
Important: You need to request access on `Hugging Face <https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct>`__ before downloading it.

To download the Llama3-8B-Instruct model:
To download the Llama3.1-8B-Instruct model:

.. code-block:: bash
tune download meta-llama/Meta-Llama-3-8B-Instruct --hf-token <HF_TOKEN>
tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>
To download the Llama3-70B-Instruct model:
To download the Llama3.1-70B-Instruct model:

.. code-block:: bash
tune download meta-llama/Meta-Llama-3-70B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>
tune download meta-llama/Meta-Llama-3.1-70B-Instruct --output-dir /tmp/Meta-Llama-3.1-70B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>
To download the Llama3.1 weights of the above models, you can instead download from `Meta-Llama-3.1-8B-Instruct`,
`Meta-Llama-3.1-70B-Instruct`, or `Meta-Llama-3.1-405B-Instruct`.
To download the Llama3.1-405B-Instruct model:

.. code-block:: bash
tune download meta-llama/Meta-Llama-3.1-405B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>
To download the Llama3 weights of the above models, you can instead download from `Meta-Llama-3-8B-Instruct` and
`Meta-Llama-3-70B-Instruct`.
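
For example, following the same pattern as the commands above, the Llama3-8B-Instruct weights could be fetched as follows (the output directory is illustrative):

.. code-block:: bash

   tune download meta-llama/Meta-Llama-3-8B-Instruct --output-dir /tmp/Meta-Llama-3-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>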

.. autosummary::
:toctree: generated/
@@ -41,7 +47,6 @@ To download the Llama3.1 weights of the above models, you can instead download f
llama3.lora_llama3_70b
llama3.qlora_llama3_70b
llama3.llama3_tokenizer
llama3.Llama3Tokenizer

|
@@ -67,25 +72,25 @@ llama2

All models from the `Llama2 family <https://llama.meta.com/llama2/>`_.

Request Access on `Hugging Face <https://huggingface.co/meta-llama/Llama-2-7b>`__.
Important: You need to request access on `Hugging Face <https://huggingface.co/meta-llama/Llama-2-7b-hf>`__ before downloading it.

To download the Llama2-7B model:

.. code-block:: bash
tune download meta-llama/Llama-2-7b-hf --hf-token <HF_TOKEN>
tune download meta-llama/Llama-2-7b-hf --output-dir /tmp/Llama-2-7b-hf --hf-token <HF_TOKEN>
To download the Llama2-13B model:

.. code-block:: bash
tune download meta-llama/Llama-2-13b-hf --hf-token <HF_TOKEN>
tune download meta-llama/Llama-2-13b-hf --output-dir /tmp/Llama-2-13b-hf --hf-token <HF_TOKEN>
To download the Llama2-70B model:

.. code-block:: bash
tune download meta-llama/Llama-2-70b-hf --hf-token <HF_TOKEN>
tune download meta-llama/Llama-2-70b-hf --output-dir /tmp/Llama-2-70b-hf --hf-token <HF_TOKEN>
.. autosummary::
:toctree: generated/
@@ -103,7 +108,6 @@ To download the Llama2-70B model:
llama2.lora_llama2_70b
llama2.qlora_llama2_70b
llama2.llama2_tokenizer
llama2.Llama2Tokenizer
llama2.llama2_reward_7b
llama2.lora_llama2_reward_7b
llama2.qlora_llama2_reward_7b
@@ -115,13 +119,13 @@ code llama

Models from the `Code Llama family <https://arxiv.org/pdf/2308.12950>`_.

Request Access on `Hugging Face <https://huggingface.co/meta-llama/Llama-2-7b>`__.
Important: You need to request access on `Hugging Face <https://huggingface.co/meta-llama/CodeLlama-7b-hf>`__ before downloading it.

To download the CodeLlama-7B model:

.. code-block:: bash
tune download codellama/CodeLlama-7b-hf --hf-token <HF_TOKEN>
tune download meta-llama/CodeLlama-7b-hf --output-dir /tmp/CodeLlama-7b-hf --hf-token <HF_TOKEN>
.. autosummary::
:toctree: generated/
@@ -161,7 +165,6 @@ To download the Qwen2 1.5B model, for example:
qwen2.lora_qwen2_0_5b
qwen2.lora_qwen2_1_5b
qwen2.qwen2_tokenizer
qwen2.Qwen2Tokenizer

phi-3
-----
@@ -172,7 +175,7 @@ To download the Phi-3 Mini 4k instruct model:

.. code-block:: bash
tune download microsoft/Phi-3-mini-4k-instruct --ignore-patterns None --hf-token <HF_TOKEN>
tune download microsoft/Phi-3-mini-4k-instruct --output-dir /tmp/Phi-3-mini-4k-instruct --ignore-patterns None --hf-token <HF_TOKEN>
.. autosummary::
:toctree: generated/
@@ -184,21 +187,19 @@ To download the Phi-3 Mini 4k instruct model:
phi3.lora_phi3_mini
phi3.qlora_phi3_mini
phi3.phi3_mini_tokenizer
phi3.Phi3MiniTokenizer


mistral
-------

All models from `Mistral AI family <https://mistral.ai/technology/#models>`_.

Request Access on `Hugging Face <https://huggingface.co/mistralai/Mistral-7B-v0.3>`__.
Important: You need to request access on `Hugging Face <https://huggingface.co/mistralai/Mistral-7B-v0.1>`__ to download this model.

To download the Mistral 7B v0.1 model:

.. code-block:: bash
tune download mistralai/Mistral-7B-v0.1 --hf-token <HF_TOKEN>
tune download mistralai/Mistral-7B-v0.1 --output-dir /tmp/Mistral-7B-v0.1 --hf-token <HF_TOKEN>
.. autosummary::
:toctree: generated/
@@ -215,7 +216,6 @@ To download the Mistral 7B v0.1 model:
mistral.lora_mistral_reward_7b
mistral.qlora_mistral_reward_7b
mistral.mistral_tokenizer
mistral.MistralTokenizer
mistral.MistralChatTemplate


@@ -224,9 +224,9 @@ gemma

Models of size 2B and 7B from the `Gemma family <https://blog.google/technology/developers/gemma-open-models/>`_.

Request Access on `Hugging Face <https://huggingface.co/google/gemma-2b>`__.
Important: You need to request access on `Hugging Face <https://huggingface.co/google/gemma-2b>`__ to use this model.

To download the Gemma 2B model:
To download the Gemma 2B model (not Gemma2):

.. code-block:: bash
@@ -251,19 +251,18 @@ To download the Gemma 7B model:
gemma.lora_gemma_7b
gemma.qlora_gemma_7b
gemma.gemma_tokenizer
gemma.GemmaTokenizer


clip
-----
.. clip
.. -----
Vision components to support multimodality using `CLIP encoder <https://arxiv.org/abs/2103.00020>`_.
.. Vision components to support multimodality using `CLIP encoder <https://arxiv.org/abs/2103.00020>`_.
.. autosummary::
:toctree: generated/
:nosignatures:
.. .. autosummary::
.. :toctree: generated/
.. :nosignatures:
clip.clip_vision_encoder
clip.TokenPositionalEmbedding
clip.TiledTokenPositionalEmbedding
clip.TilePositionalEmbedding
.. clip.clip_vision_encoder
.. clip.TokenPositionalEmbedding
.. clip.TiledTokenPositionalEmbedding
.. clip.TilePositionalEmbedding
1 change: 0 additions & 1 deletion docs/source/api_ref_rlhf.rst
@@ -16,5 +16,4 @@ Components and losses for RLHF algorithms like PPO and DPO.
loss.PPOLoss
loss.DPOLoss
loss.RSOLoss
loss.IPOLoss
loss.SimPOLoss
31 changes: 10 additions & 21 deletions docs/source/recipes/lora_finetune_single_device.rst
@@ -5,11 +5,10 @@ LoRA Single Device Finetuning
=============================

This recipe supports finetuning on next-token prediction tasks using parameter-efficient fine-tuning (PEFT) techniques
such as `LoRA <https://arxiv.org/abs/2106.09685>`_ and `QLoRA <https://arxiv.org/abs/2305.14314>`_. These techniques
such as :ref:`glossary_lora` and :ref:`glossary_qlora`. These techniques
significantly reduce memory consumption during training whilst still maintaining competitive performance.

We provide pre-tested out-of-the-box configs that let you get up and running with the latest `Llama models <https://llama.meta.com/>`_
in just two steps:
We provide configs that let you get up and running quickly. Here is an example with Llama 3.1 8B:

.. note::

@@ -19,44 +18,34 @@ in just two steps:

.. code-block:: bash
# download the model
tune download meta-llama/Meta-Llama-3.1-8B-Instruct \
--output-dir /tmp/Meta-Llama-3.1-8B-Instruct \
--ignore-patterns "original/consolidated.00.pth"
# run the recipe
tune run lora_finetune_single_device \
--config llama3_1/8B_lora_single_device
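
You can also copy a built-in config to a local file and edit it directly. Here is a minimal sketch using the ``tune cp`` command (the destination filename is just an example):

.. code-block:: bash

   # copy the built-in config to a local file you can edit
   tune cp llama3_1/8B_lora_single_device my_custom_config.yaml

   # run the recipe with the customized copy
   tune run lora_finetune_single_device --config my_custom_config.yaml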
You can quickly customize this recipe through the :ref:`cli_label`. For example, when fine-tuning with LoRA, you can adjust the layers to which LoRA is applied,
and the scale of the impact of LoRA during training:
You can customize this recipe through the :ref:`cli_label`. For example, when fine-tuning with LoRA, you can adjust the layers to which LoRA is applied:

.. code-block:: bash
tune run lora_finetune_single_device \
--config llama3_1/8B_lora_single_device \
--model.lora_attn_modules=["q_proj", "k_proj", "v_proj"] \
--model.apply_lora_to_mlp=True \
--model.lora_rank=64 \
--model.lora_alpha=128
model.lora_attn_modules="[q_proj,k_proj,v_proj]" \
model.apply_lora_to_mlp=True \
model.lora_rank=64 \
model.lora_alpha=128
This configuration in particular results in an aggressive LoRA policy which
will trade off higher training accuracy against increased memory usage and slower training.
For a deeper understanding of the different levers you can pull when using this recipe,
see our documentation for the different PEFT training paradigms we support:

* :ref:`glossary_lora`
* :ref:`glossary_qlora`

Many of our other memory optimization features can be used in this recipe, too:

* Adjust :ref:`model precision <glossary_precision>`.
* Use :ref:`activation checkpointing <glossary_act_ckpt>`.
* Enable :ref:`gradient accumulation <glossary_grad_accm>`.
* Use :ref:`lower precision optimizers <glossary_low_precision_opt>`. However, note that since LoRA
significantly reduces memory usage due to gradient state, you will likely not need this
feature.

You can learn more about all of our memory optimization features in our :ref:`memory optimization overview<memory_optimization_overview_label>`.
Many of our other memory optimization features can be used in this recipe, too. You can learn more about all of them in our :ref:`memory optimization overview<memory_optimization_overview_label>`.
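
As a rough sketch of what this looks like in practice, several of these features can be toggled straight from the command line, assuming your config exposes the standard ``dtype``, ``enable_activation_checkpointing``, and ``gradient_accumulation_steps`` keys, as the built-in configs do:

.. code-block:: bash

   tune run lora_finetune_single_device \
   --config llama3_1/8B_lora_single_device \
   dtype=bf16 \
   enable_activation_checkpointing=True \
   gradient_accumulation_steps=8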

Interested in seeing this recipe in action? Check out some of our tutorials that show how it can be used:

29 changes: 16 additions & 13 deletions docs/source/recipes/recipes_overview.rst
@@ -19,18 +19,24 @@ Each recipe consists of three components:
To learn more about the concept of "recipes", check out our technical deep-dive: :ref:`recipe_deepdive`.


Supervised Finetuning
---------------------
Finetuning
----------

torchtune provides built-in recipes for finetuning on single device, on multiple devices with `FSDP <https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/>`_,
using a variety of :ref:`memory optimization features <memory_optimization_overview_label>`. Our fine-tuning recipes support all of our models and all our dataset types.
This includes continued pre-training and various supervised finetuning paradigms, which can be customized through our datasets. Check out our
:ref:`dataset tutorial <dataset_tutorial_label>` for more information.
Our recipes include:

Our supervised fine-tuning recipes include:
* :ref:`Single-device LoRA fine-tuning <lora_finetune_recipe_label>`.
* Single-device full fine-tuning
* Distributed full fine-tuning
* Distributed LoRA fine-tuning
* Direct Preference Optimization (DPO)
* Proximal Policy Optimization (PPO)
* :ref:`Distributed Quantization-Aware Training (QAT)<qat_distributed_recipe_label>`.

* :ref:`Single-device <lora_finetune_recipe_label>` LoRA fine-tuning.
* :ref:`Distributed Quantization-Aware Training<qat_distributed_recipe_label>`.
For a full list, please run:

.. code-block:: bash
tune ls
.. Alignment finetuning
.. --------------------
@@ -46,8 +52,5 @@ Our supervised fine-tuning recipes include:
.. note::

Want to learn more about a certain recipe, but can't find the documentation here?
Not to worry! Our recipe documentation is currently in construction - come back soon
to see documentation of your favourite fine-tuning techniques. We'd love to support
your contributions if you're interested in helping out here. Check out our tracker
Our recipe documentation is currently under construction. Please feel free to follow the progress in our tracker
issue `here <https://github.com/pytorch/torchtune/issues/1408>`_.