Merge branch 'main' into mm_recipe
pbontrager committed Sep 19, 2024
2 parents 5d11ac0 + c5db813 commit fe1a781
Showing 22 changed files with 170 additions and 295 deletions.
11 changes: 6 additions & 5 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -9,9 +9,10 @@ Please link to any issues this PR addresses.

#### Changelog
What are the changes made in this PR?
*

#### Test plan
Please make sure to do each of the following if applicable to your PR. (If you're not sure about any one of these just ask and we will happily help. We also have a [contributing page](https://github.com/pytorch/torchtune/blob/main/CONTRIBUTING.md) for some guidance on contributing.)
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a [contributing page](https://github.com/pytorch/torchtune/blob/main/CONTRIBUTING.md) for some guidance on contributing.

- [ ] run pre-commit hooks and linters (make sure you've first installed via `pre-commit install`)
- [ ] add [unit tests](https://github.com/pytorch/torchtune/tree/main/tests/torchtune) for any new functionality
@@ -23,8 +24,8 @@ Please make sure to do each of the following if applicable to your PR. (If you'r

#### UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Example of docstring: https://github.com/pytorch/torchtune/blob/6a7951f1cdd0b56a9746ef5935106989415f50e3/torchtune/modules/vision_transformer.py#L285
Example in our docs: https://pytorch.org/torchtune/main/tutorials/qat_finetune.html#applying-qat-to-llama3-models
Here is a [docstring example](https://github.com/pytorch/torchtune/blob/6a7951f1cdd0b56a9746ef5935106989415f50e3/torchtune/modules/vision_transformer.py#L285)
and a [tutorial example](https://pytorch.org/torchtune/main/tutorials/qat_finetune.html#applying-qat-to-llama3-models)

- [ ] I did not change any public API;
- [ ] I have added an example to docs or docstrings;
- [ ] I did not change any public API
- [ ] I have added an example to docs or docstrings
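
As a quick illustration of the `pre-commit install` step referenced in the test plan above, here is a sketch using the standard [pre-commit](https://pre-commit.com) tool that manages the hooks:

```bash
# one-time setup: install the tool and register the repo's git hooks
pip install pre-commit
pre-commit install

# run every hook against all files before opening the PR
pre-commit run --all-files
```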
2 changes: 2 additions & 0 deletions docs/source/api_ref_data.rst
@@ -49,6 +49,8 @@ Converts data from common JSON formats into a torchtune :class:`Message`.
get_sharegpt_messages
get_openai_messages

.. _message_transforms_ref:

Message transforms
------------------

1 change: 1 addition & 0 deletions docs/source/api_ref_datasets.rst
@@ -37,6 +37,7 @@ Multimodal datasets
multimodal.llava_instruct_dataset
multimodal.the_cauldron_dataset

.. _dataset_builders:

Generic dataset builders
------------------------
69 changes: 34 additions & 35 deletions docs/source/api_ref_models.rst
@@ -11,22 +11,28 @@ llama3 & llama3.1

All models from the `Llama3 family <https://llama.meta.com/llama3/>`_.

Request Access on `Hugging Face <https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct>`__.
Important: You need to request access on `Hugging Face <https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct>`__ before downloading it.

To download the Llama3-8B-Instruct model:
To download the Llama3.1-8B-Instruct model:

.. code-block:: bash
tune download meta-llama/Meta-Llama-3-8B-Instruct --hf-token <HF_TOKEN>
tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>
To download the Llama3-70B-Instruct model:
To download the Llama3.1-70B-Instruct model:

.. code-block:: bash
tune download meta-llama/Meta-Llama-3-70B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>
tune download meta-llama/Meta-Llama-3.1-70B-Instruct --output-dir /tmp/Meta-Llama-3.1-70B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>
To download the Llama3.1 weights of the above models, you can instead download from `Meta-Llama-3.1-8B-Instruct`,
`Meta-Llama-3.1-70B-Instruct`, or `Meta-Llama-3.1-405B-Instruct`.
To download the Llama3.1-405B-Instruct model:

.. code-block:: bash
tune download meta-llama/Meta-Llama-3.1-405B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>
To download the Llama3 weights of the above models, you can instead download from `Meta-Llama-3-8B-Instruct` and
`Meta-Llama-3-70B-Instruct`.
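
For example, following the same pattern as the commands above, the Llama3-8B-Instruct weights could be fetched as follows (the output directory is illustrative):

.. code-block:: bash

   tune download meta-llama/Meta-Llama-3-8B-Instruct --output-dir /tmp/Meta-Llama-3-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>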

.. autosummary::
:toctree: generated/
@@ -41,7 +47,6 @@ To download the Llama3.1 weights of the above models, you can instead download f
llama3.lora_llama3_70b
llama3.qlora_llama3_70b
llama3.llama3_tokenizer
llama3.Llama3Tokenizer

|
@@ -67,25 +72,25 @@ llama2

All models from the `Llama2 family <https://llama.meta.com/llama2/>`_.

Request Access on `Hugging Face <https://huggingface.co/meta-llama/Llama-2-7b>`__.
Important: You need to request access on `Hugging Face <https://huggingface.co/meta-llama/Llama-2-7b-hf>`__ before downloading it.

To download the Llama2-7B model:

.. code-block:: bash
tune download meta-llama/Llama-2-7b-hf --hf-token <HF_TOKEN>
tune download meta-llama/Llama-2-7b-hf --output-dir /tmp/Llama-2-7b-hf --hf-token <HF_TOKEN>
To download the Llama2-13B model:

.. code-block:: bash
tune download meta-llama/Llama-2-13b-hf --hf-token <HF_TOKEN>
tune download meta-llama/Llama-2-13b-hf --output-dir /tmp/Llama-2-13b-hf --hf-token <HF_TOKEN>
To download the Llama2-70B model:

.. code-block:: bash
tune download meta-llama/Llama-2-70b-hf --hf-token <HF_TOKEN>
tune download meta-llama/Llama-2-70b-hf --output-dir /tmp/Llama-2-70b-hf --hf-token <HF_TOKEN>
.. autosummary::
:toctree: generated/
@@ -103,7 +108,6 @@ To download the Llama2-70B model:
llama2.lora_llama2_70b
llama2.qlora_llama2_70b
llama2.llama2_tokenizer
llama2.Llama2Tokenizer
llama2.llama2_reward_7b
llama2.lora_llama2_reward_7b
llama2.qlora_llama2_reward_7b
@@ -115,13 +119,13 @@ code llama

Models from the `Code Llama family <https://arxiv.org/pdf/2308.12950>`_.

Request Access on `Hugging Face <https://huggingface.co/meta-llama/Llama-2-7b>`__.
Important: You need to request access on `Hugging Face <https://huggingface.co/meta-llama/CodeLlama-7b-hf>`__ before downloading it.

To download the CodeLlama-7B model:

.. code-block:: bash
tune download codellama/CodeLlama-7b-hf --hf-token <HF_TOKEN>
tune download meta-llama/CodeLlama-7b-hf --output-dir /tmp/CodeLlama-7b-hf --hf-token <HF_TOKEN>
.. autosummary::
:toctree: generated/
@@ -161,7 +165,6 @@ To download the Qwen2 1.5B model, for example:
qwen2.lora_qwen2_0_5b
qwen2.lora_qwen2_1_5b
qwen2.qwen2_tokenizer
qwen2.Qwen2Tokenizer

phi-3
-----
@@ -172,7 +175,7 @@ To download the Phi-3 Mini 4k instruct model:

.. code-block:: bash
tune download microsoft/Phi-3-mini-4k-instruct --ignore-patterns None --hf-token <HF_TOKEN>
tune download microsoft/Phi-3-mini-4k-instruct --output-dir /tmp/Phi-3-mini-4k-instruct --ignore-patterns None --hf-token <HF_TOKEN>
.. autosummary::
:toctree: generated/
@@ -184,21 +187,19 @@ To download the Phi-3 Mini 4k instruct model:
phi3.lora_phi3_mini
phi3.qlora_phi3_mini
phi3.phi3_mini_tokenizer
phi3.Phi3MiniTokenizer


mistral
-------

All models from `Mistral AI family <https://mistral.ai/technology/#models>`_.

Request Access on `Hugging Face <https://huggingface.co/mistralai/Mistral-7B-v0.3>`__.
Important: You need to request access on `Hugging Face <https://huggingface.co/mistralai/Mistral-7B-v0.1>`__ to download this model.

To download the Mistral 7B v0.1 model:

.. code-block:: bash
tune download mistralai/Mistral-7B-v0.1 --hf-token <HF_TOKEN>
tune download mistralai/Mistral-7B-v0.1 --output-dir /tmp/Mistral-7B-v0.1 --hf-token <HF_TOKEN>
.. autosummary::
:toctree: generated/
@@ -215,7 +216,6 @@ To download the Mistral 7B v0.1 model:
mistral.lora_mistral_reward_7b
mistral.qlora_mistral_reward_7b
mistral.mistral_tokenizer
mistral.MistralTokenizer
mistral.MistralChatTemplate


@@ -224,9 +224,9 @@ gemma

Models of size 2B and 7B from the `Gemma family <https://blog.google/technology/developers/gemma-open-models/>`_.

Request Access on `Hugging Face <https://huggingface.co/google/gemma-2b>`__.
Important: You need to request access on `Hugging Face <https://huggingface.co/google/gemma-2b>`__ to use this model.

To download the Gemma 2B model:
To download the Gemma 2B model (not Gemma2):

.. code-block:: bash
@@ -251,19 +251,18 @@ To download the Gemma 7B model:
gemma.lora_gemma_7b
gemma.qlora_gemma_7b
gemma.gemma_tokenizer
gemma.GemmaTokenizer


clip
-----
.. clip
.. -----
Vision components to support multimodality using `CLIP encoder <https://arxiv.org/abs/2103.00020>`_.
.. Vision components to support multimodality using `CLIP encoder <https://arxiv.org/abs/2103.00020>`_.
.. autosummary::
:toctree: generated/
:nosignatures:
.. .. autosummary::
.. :toctree: generated/
.. :nosignatures:
clip.clip_vision_encoder
clip.TokenPositionalEmbedding
clip.TiledTokenPositionalEmbedding
clip.TilePositionalEmbedding
.. clip.clip_vision_encoder
.. clip.TokenPositionalEmbedding
.. clip.TiledTokenPositionalEmbedding
.. clip.TilePositionalEmbedding
1 change: 0 additions & 1 deletion docs/source/api_ref_rlhf.rst
@@ -16,5 +16,4 @@ Components and losses for RLHF algorithms like PPO and DPO.
loss.PPOLoss
loss.DPOLoss
loss.RSOLoss
loss.IPOLoss
loss.SimPOLoss
31 changes: 10 additions & 21 deletions docs/source/recipes/lora_finetune_single_device.rst
@@ -5,11 +5,10 @@ LoRA Single Device Finetuning
=============================

This recipe supports finetuning on next-token prediction tasks using parameter-efficient fine-tuning (PEFT) techniques
such as `LoRA <https://arxiv.org/abs/2106.09685>`_ and `QLoRA <https://arxiv.org/abs/2305.14314>`_. These techniques
such as :ref:`glossary_lora` and :ref:`glossary_qlora`. These techniques
significantly reduce memory consumption during training whilst still maintaining competitive performance.

We provide pre-tested out-of-the-box configs that let you get up and running with the latest `Llama models <https://llama.meta.com/>`_
in just two steps:
We provide configs that let you get up and running quickly. Here is an example with Llama 3.1 8B:

.. note::

@@ -19,44 +18,34 @@ in just two steps:

.. code-block:: bash
# download the model
tune download meta-llama/Meta-Llama-3.1-8B-Instruct \
--output-dir /tmp/Meta-Llama-3.1-8B-Instruct \
--ignore-patterns "original/consolidated.00.pth"
# run the recipe
tune run lora_finetune_single_device \
--config llama3_1/8B_lora_single_device
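
You can also copy a built-in config to a local file and edit it directly. Here is a minimal sketch using the ``tune cp`` command (the destination filename is just an example):

.. code-block:: bash

   # copy the built-in config to a local file you can edit
   tune cp llama3_1/8B_lora_single_device my_custom_config.yaml

   # run the recipe with the customized copy
   tune run lora_finetune_single_device --config my_custom_config.yaml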
You can quickly customize this recipe through the :ref:`cli_label`. For example, when fine-tuning with LoRA, you can adjust the layers to which LoRA is applied,
and the scale of the impact of LoRA during training:
You can customize this recipe through the :ref:`cli_label`. For example, when fine-tuning with LoRA, you can adjust the layers to which LoRA is applied:

.. code-block:: bash
tune run lora_finetune_single_device \
--config llama3_1/8B_lora_single_device \
--model.lora_attn_modules=["q_proj", "k_proj", "v_proj"] \
--model.apply_lora_to_mlp=True \
--model.lora_rank=64 \
--model.lora_alpha=128
model.lora_attn_modules="[q_proj,k_proj,v_proj]" \
model.apply_lora_to_mlp=True \
model.lora_rank=64 \
model.lora_alpha=128
This configuration in particular results in an aggressive LoRA policy which
will trade off higher training accuracy against increased memory usage and slower training.
For a deeper understanding of the different levers you can pull when using this recipe,
see our documentation for the different PEFT training paradigms we support:

* :ref:`glossary_lora`
* :ref:`glossary_qlora`

Many of our other memory optimization features can be used in this recipe, too:

* Adjust :ref:`model precision <glossary_precision>`.
* Use :ref:`activation checkpointing <glossary_act_ckpt>`.
* Enable :ref:`gradient accumulation <glossary_grad_accm>`.
* Use :ref:`lower precision optimizers <glossary_low_precision_opt>`. However, note that since LoRA
significantly reduces memory usage due to gradient state, you will likely not need this
feature.

You can learn more about all of our memory optimization features in our :ref:`memory optimization overview<memory_optimization_overview_label>`.
Many of our other memory optimization features can be used in this recipe, too. You can learn more about all of them in our :ref:`memory optimization overview<memory_optimization_overview_label>`.
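
As a rough sketch of what this looks like in practice, several of these features can be toggled straight from the command line, assuming your config exposes the standard ``dtype``, ``enable_activation_checkpointing``, and ``gradient_accumulation_steps`` keys, as the built-in configs do:

.. code-block:: bash

   tune run lora_finetune_single_device \
   --config llama3_1/8B_lora_single_device \
   dtype=bf16 \
   enable_activation_checkpointing=True \
   gradient_accumulation_steps=8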

Interested in seeing this recipe in action? Check out some of our tutorials that show how it can be used:

29 changes: 16 additions & 13 deletions docs/source/recipes/recipes_overview.rst
@@ -19,18 +19,24 @@ Each recipe consists of three components:
To learn more about the concept of "recipes", check out our technical deep-dive: :ref:`recipe_deepdive`.


Supervised Finetuning
---------------------
Finetuning
----------

torchtune provides built-in recipes for finetuning on single device, on multiple devices with `FSDP <https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/>`_,
using a variety of :ref:`memory optimization features <memory_optimization_overview_label>`. Our fine-tuning recipes support all of our models and all our dataset types.
This includes continued pre-training and various supervised finetuning paradigms, which can be customized through our datasets. Check out our
:ref:`dataset tutorial <dataset_tutorial_label>` for more information.
Our recipes include:

Our supervised fine-tuning recipes include:
* :ref:`Single-device LoRA fine-tuning <lora_finetune_recipe_label>`.
* Single-device full fine-tuning
* Distributed full fine-tuning
* Distributed LoRA fine-tuning
* Direct Preference Optimization (DPO)
* Proximal Policy Optimization (PPO)
* :ref:`Distributed Quantization-Aware Training (QAT)<qat_distributed_recipe_label>`.

* :ref:`Single-device <lora_finetune_recipe_label>` LoRA fine-tuning.
* :ref:`Distributed Quantization-Aware Training<qat_distributed_recipe_label>`.
For a full list, please run:

.. code-block:: bash
tune ls
.. Alignment finetuning
.. --------------------
@@ -46,8 +52,5 @@ Our supervised fine-tuning recipes include:
.. note::

Want to learn more about a certain recipe, but can't find the documentation here?
Not to worry! Our recipe documentation is currently in construction - come back soon
to see documentation of your favourite fine-tuning techniques. We'd love to support
your contributions if you're interested in helping out here. Check out our tracker
Our recipe documentation is currently under construction. Please feel free to follow the progress in our tracker
issue `here <https://github.com/pytorch/torchtune/issues/1408>`_.