From bfe2d2f41dad53d950112b866c6b4cc41b81821b Mon Sep 17 00:00:00 2001
From: MKhalusova
Date: Tue, 18 Apr 2023 08:17:35 -0400
Subject: [PATCH 1/5] WIP LoRA conceptual guide

---
 docs/source/conceptual_guides/lora.mdx | 50 ++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 docs/source/conceptual_guides/lora.mdx

diff --git a/docs/source/conceptual_guides/lora.mdx b/docs/source/conceptual_guides/lora.mdx
new file mode 100644
index 0000000000..a36ee8dab2
--- /dev/null
+++ b/docs/source/conceptual_guides/lora.mdx
@@ -0,0 +1,50 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# LoRA
+
+This conceptual guide gives a brief overview of [LoRA](https://arxiv.org/abs/2106.09685), a technique that accelerates
+the fine-tuning of large models while consuming less memory.
+
+To make the fine-tuning more efficient, the original model's weight matrix is represented with two smaller
+matrices (called **update matrices**) through low-rank decomposition. These new matrices can be trained to adapt to the
+new data while keeping the overall number of changes low. The original weight matrix remains frozen and doesn't receive
+any further adjustments. To produce the final results, both the original and the adapted weights are combined.
+
+This approach has a number of advantages:
+
+* LoRA makes fine-tuning more efficient by drastically reducing the number of trainable parameters
+* The original pre-trained weights are kept frozen, and you can have many lightweight and portable LoRA models for various downstream tasks built on top of them
+* LoRA is orthogonal to many other parameter-efficient methods and can be combined with many of them
+
+In principle, LoRA can be applied to any subset of weight matrices in a neural network to reduce the number of trainable
+parameters. However, for simplicity and further parameter efficiency, in Transformer models LoRA is typically applied to
+attention blocks only. The number of trainable parameters in a LoRA model depends on the size of the low-rank update
+matrices, which is determined mainly by the rank `r` and the shape of the original weight matrix.
+
+
+
+## Common LoRA parameters in PEFT
+
+- `r`: the rank of the update matrices, expressed in `int`. Lower rank results in smaller update matrices with fewer trainable parameters.
+- `target_modules`: The modules to use as the base build LoRA update matrices. E.g. attention blocks.
+- `lora_alpha`: LoRA scaling factor.
+- `bias`: Specifies if the `bias` parameters should be trained. Can be `'none'`, `'all'` or `'lora_only'`.
+- `modules_to_save`: List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint. These typically include the model's custom head that is randomly initialized for the fine-tuning task.
+
+## LoRA examples
+
+Image classification
+Semantic segmentation
+
+While the original paper focuses on language models, the technique can be applied to any dense layers in deep learning
+models. As such, you can also apply this technique to diffusion models.
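The low-rank update introduced in this patch is compact enough to sketch in a few lines. Below is a minimal, self-contained NumPy illustration of the idea; the shapes, rank, and scaling factor are illustrative assumptions, not PEFT internals.

```python
import numpy as np

d, k, r = 1024, 1024, 8            # original weight is d x k; rank r << min(d, k)

W = np.random.randn(d, k)          # pre-trained weight: stays frozen
A = np.random.randn(r, k) * 0.01   # update matrix A: trainable
B = np.zeros((d, r))               # update matrix B: trainable, zero-initialized so
                                   # training starts from the unmodified base model
lora_alpha = 16                    # scaling factor applied to the update

# Combining the original and the adapted weights produces the final result:
W_adapted = W + (lora_alpha / r) * (B @ A)

print(f"full fine-tuning trains {W.size:,} parameters")        # 1,048,576
print(f"LoRA at r={r} trains {A.size + B.size:,} parameters")  # 16,384
```

Since `A` and `B` together hold only `r * (d + k)` values, the number of trainable parameters grows linearly with the rank `r` rather than with the full weight shape.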
From 47bf02395eb55d8d0681575ae20028b74e45fcc1 Mon Sep 17 00:00:00 2001
From: MKhalusova
Date: Tue, 18 Apr 2023 13:40:26 -0400
Subject: [PATCH 2/5] conceptual guide for LoRA

---
 docs/source/_toctree.yml               |  5 +++++
 docs/source/conceptual_guides/lora.mdx | 29 +++++++++++++++++---------
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 8090d858cd..ac20d9b1c1 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -20,6 +20,11 @@
   - local: task_guides/ptuning-seq-classification
     title: P-tuning for sequence classification
 
+- title: Conceptual guides
+  sections:
+  - local: conceptual_guides/lora
+    title: LoRA
+
 - title: Reference
   sections:
   - local: package_reference/peft_model

diff --git a/docs/source/conceptual_guides/lora.mdx b/docs/source/conceptual_guides/lora.mdx
index a36ee8dab2..afb33734f1 100644
--- a/docs/source/conceptual_guides/lora.mdx
+++ b/docs/source/conceptual_guides/lora.mdx
@@ -15,25 +15,32 @@ specific language governing permissions and limitations under the License.
 This conceptual guide gives a brief overview of [LoRA](https://arxiv.org/abs/2106.09685), a technique that accelerates
 the fine-tuning of large models while consuming less memory.
 
-To make the fine-tuning more efficient, the original model's weight matrix is represented with two smaller
+To make the fine-tuning more efficient, LoRA's approach is to represent the original model's weight matrix with two smaller
 matrices (called **update matrices**) through low-rank decomposition. These new matrices can be trained to adapt to the
 new data while keeping the overall number of changes low. The original weight matrix remains frozen and doesn't receive
 any further adjustments. To produce the final results, both the original and the adapted weights are combined.
 
 This approach has a number of advantages:
 
-* LoRA makes fine-tuning more efficient by drastically reducing the number of trainable parameters
-* The original pre-trained weights are kept frozen, and you can have many lightweight and portable LoRA models for various downstream tasks built on top of them
-* LoRA is orthogonal to many other parameter-efficient methods and can be combined with many of them
+* LoRA makes fine-tuning more efficient by drastically reducing the number of trainable parameters.
+* The original pre-trained weights are kept frozen, which means you can have multiple lightweight and portable LoRA models for various downstream tasks built on top of them.
+* LoRA is orthogonal to many other parameter-efficient methods and can be combined with many of them.
 
 In principle, LoRA can be applied to any subset of weight matrices in a neural network to reduce the number of trainable
 parameters. However, for simplicity and further parameter efficiency, in Transformer models LoRA is typically applied to
-attention blocks only. The number of trainable parameters in a LoRA model depends on the size of the low-rank update
-matrices, which is determined mainly by the rank `r` and the shape of the original weight matrix.
+attention blocks only. The resulting number of trainable parameters in a LoRA model depends on the size of the low-rank
+update matrices, which is determined mainly by the rank `r` and the shape of the original weight matrix.
 
+## Common LoRA parameters in PEFT
+
+As with other methods supported by PEFT, to fine-tune a model using LoRA, you need to:
 
-## Common LoRA parameters in PEFT
 
+1. Instantiate a base model.
+2. Create a configuration (`LoraConfig`) where you define LoRA-specific parameters.
+3. Wrap the base model with `get_peft_model()` to get a trainable `PeftModel`.
+4. Train the `PeftModel` as you normally would train the base model.
+
+`LoraConfig` allows you to control how LoRA is applied to the base model through the following parameters:
 
 - `r`: the rank of the update matrices, expressed in `int`. Lower rank results in smaller update matrices with fewer trainable parameters.
 - `target_modules`: The modules to use as the base build LoRA update matrices. E.g. attention blocks.
 - `lora_alpha`: LoRA scaling factor.
 - `bias`: Specifies if the `bias` parameters should be trained. Can be `'none'`, `'all'` or `'lora_only'`.
 - `modules_to_save`: List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint. These typically include the model's custom head that is randomly initialized for the fine-tuning task.
 
 ## LoRA examples
 
-Image classification
-Semantic segmentation
+For examples of applying LoRA to various downstream tasks, refer to the following guides:
+
+* [Image classification using LoRA](../task_guides/image_classification_lora)
+* [Semantic segmentation](../task_guides/semantic_segmentation_lora)
 
 While the original paper focuses on language models, the technique can be applied to any dense layers in deep learning
-models. As such, you can also apply this technique to diffusion models.
+models. As such, you can leverage this technique with diffusion models. See the [Dreambooth fine-tuning with LoRA](../task_guides/dreambooth_lora) task guide for an example.

From 04bbff62104369e47dd88a3da8c4a8c1321bb80c Mon Sep 17 00:00:00 2001
From: Maria Khalusova
Date: Wed, 19 Apr 2023 08:35:18 -0400
Subject: [PATCH 3/5] Update docs/source/conceptual_guides/lora.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/conceptual_guides/lora.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/conceptual_guides/lora.mdx b/docs/source/conceptual_guides/lora.mdx
index afb33734f1..7e080190f8 100644
--- a/docs/source/conceptual_guides/lora.mdx
+++ b/docs/source/conceptual_guides/lora.mdx
@@ -15,7 +15,7 @@ specific language governing permissions and limitations under the License.
 This conceptual guide gives a brief overview of [LoRA](https://arxiv.org/abs/2106.09685), a technique that accelerates
 the fine-tuning of large models while consuming less memory.
 
-To make the fine-tuning more efficient, LoRA's approach is to represent the original model's weight matrix with two smaller
+To make fine-tuning more efficient, LoRA's approach is to represent the original model's weight matrix with two smaller
 matrices (called **update matrices**) through low-rank decomposition. These new matrices can be trained to adapt to the
 new data while keeping the overall number of changes low. The original weight matrix remains frozen and doesn't receive
 any further adjustments. To produce the final results, both the original and the adapted weights are combined.
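The four-step workflow described in this patch maps directly onto PEFT's API. Here is a minimal sketch of it; the checkpoint and the `target_modules` names are illustrative choices for a seq2seq model, not requirements.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

# 1. Instantiate a base model.
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")

# 2. Create a configuration with LoRA-specific parameters.
config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                          # rank of the update matrices
    lora_alpha=32,                # LoRA scaling factor
    target_modules=["q", "v"],    # attention query/value projections in this model
    lora_dropout=0.1,
    bias="none",
)

# 3. Wrap the base model to get a trainable PeftModel.
model = get_peft_model(model, config)

# 4. Train as usual; only the LoRA parameters receive gradients.
model.print_trainable_parameters()
```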
From 1dd201a7207e1e3730a13073bd06e0227c30c2fe Mon Sep 17 00:00:00 2001
From: Maria Khalusova
Date: Wed, 19 Apr 2023 08:35:34 -0400
Subject: [PATCH 4/5] Update docs/source/conceptual_guides/lora.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/conceptual_guides/lora.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/conceptual_guides/lora.mdx b/docs/source/conceptual_guides/lora.mdx
index 7e080190f8..4b191b909b 100644
--- a/docs/source/conceptual_guides/lora.mdx
+++ b/docs/source/conceptual_guides/lora.mdx
@@ -43,7 +43,7 @@ As with other methods supported by PEFT, to fine-tune a model using LoRA, you ne
 `LoraConfig` allows you to control how LoRA is applied to the base model through the following parameters:
 
 - `r`: the rank of the update matrices, expressed in `int`. Lower rank results in smaller update matrices with fewer trainable parameters.
-- `target_modules`: The modules to use as the base build LoRA update matrices. E.g. attention blocks.
+- `target_modules`: The modules (for example, attention blocks) to apply the LoRA update matrices.
 - `lora_alpha`: LoRA scaling factor.
 - `bias`: Specifies if the `bias` parameters should be trained. Can be `'none'`, `'all'` or `'lora_only'`.
 - `modules_to_save`: List of modules apart from LoRA layers to be set as trainable and saved in the final checkpoint. These typically include the model's custom head that is randomly initialized for the fine-tuning task.

From ab025448e8a36cf9f5e4f07ab1fad5df49bb71ad Mon Sep 17 00:00:00 2001
From: MKhalusova
Date: Wed, 19 Apr 2023 08:40:46 -0400
Subject: [PATCH 5/5] feedback addressed

---
 docs/source/conceptual_guides/lora.mdx | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/source/conceptual_guides/lora.mdx b/docs/source/conceptual_guides/lora.mdx
index 4b191b909b..5b18303b9e 100644
--- a/docs/source/conceptual_guides/lora.mdx
+++ b/docs/source/conceptual_guides/lora.mdx
@@ -15,7 +15,7 @@ specific language governing permissions and limitations under the License.
 This conceptual guide gives a brief overview of [LoRA](https://arxiv.org/abs/2106.09685), a technique that accelerates
 the fine-tuning of large models while consuming less memory.
 
-To make fine-tuning more efficient, LoRA's approach is to represent the original model's weight matrix with two smaller
+To make fine-tuning more efficient, LoRA's approach is to represent the weight updates with two smaller
 matrices (called **update matrices**) through low-rank decomposition. These new matrices can be trained to adapt to the
 new data while keeping the overall number of changes low. The original weight matrix remains frozen and doesn't receive
 any further adjustments. To produce the final results, both the original and the adapted weights are combined.
@@ -25,6 +25,8 @@ This approach has a number of advantages:
 * LoRA makes fine-tuning more efficient by drastically reducing the number of trainable parameters.
 * The original pre-trained weights are kept frozen, which means you can have multiple lightweight and portable LoRA models for various downstream tasks built on top of them.
 * LoRA is orthogonal to many other parameter-efficient methods and can be combined with many of them.
+* Performance of models fine-tuned using LoRA is comparable to the performance of fully fine-tuned models.
+* LoRA does not add any inference latency because adapter weights can be merged with the base model.
 
 In principle, LoRA can be applied to any subset of weight matrices in a neural network to reduce the number of trainable
 parameters. However, for simplicity and further parameter efficiency, in Transformer models LoRA is typically applied to
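The inference-latency bullet added in the final patch follows from the update being a plain matrix sum: after training, the adapter can be folded back into the base weights. A short sketch using PEFT's `merge_and_unload()` helper, with a placeholder adapter path and an illustrative base checkpoint:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel

# Load the frozen base model and attach trained LoRA adapter weights.
# "path/to/lora-adapter" is a placeholder for your own saved adapter.
base_model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
model = PeftModel.from_pretrained(base_model, "path/to/lora-adapter")

# Fold the low-rank update into the base weights (W + scaled B @ A).
# The result is a plain transformers model with no extra LoRA layers,
# and therefore no additional inference latency.
merged_model = model.merge_and_unload()
```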