Docs / LoRA: Add more information on merge_and_unload docs (#1805)

* put back lora merging diagram
* push
* Update docs/source/developer_guides/lora.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
huggingface · May 28, 2024 · 1b26216
1 parent 39c60ff
Showing 1 changed file with 9 additions and 1 deletion.
diff --git a/docs/source/developer_guides/lora.md b/docs/source/developer_guides/lora.md
@@ -140,10 +140,18 @@ Assuming the original model had 5 layers `[0, 1, 2 ,3, 4]`, this would create a
 [Fewshot-Metamath-OrcaVicuna-Mistral-10B](https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B) is an example of a model trained using this method on Mistral-7B expanded to 10B. The
 [adapter_config.json](https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B/blob/main/adapter_config.json) shows a sample LoRA adapter config applying this method for fine-tuning.
 
-## Merge adapters
+## Merge LoRA weights into the base model
 
 While LoRA is significantly smaller and faster to train, you may encounter latency issues during inference due to separately loading the base model and the LoRA adapter. To eliminate latency, use the [`~LoraModel.merge_and_unload`] function to merge the adapter weights with the base model. This allows you to use the newly merged model as a standalone model. The [`~LoraModel.merge_and_unload`] function doesn't keep the adapter weights in memory.
 
+Below is a diagram that explains the intuition of LoRA adapter merging:
+
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora_diagram.png"/>
+</div>
+
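As a rough numerical illustration of the merging intuition (not part of the committed docs), the sketch below assumes the standard LoRA formulation, where the adapter adds `scaling * B @ A` to a frozen weight `W` and `scaling = lora_alpha / r`. The shapes and the names `W`, `A`, `B`, `W_merged` are hypothetical and chosen only for this example; it uses plain NumPy rather than PEFT.

```python
import numpy as np

# Hypothetical toy dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, r = 6, 4, 2
lora_alpha = 4
scaling = lora_alpha / r  # standard LoRA scaling factor (assumed here)

W = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(r, d_in))      # LoRA down-projection
B = rng.normal(size=(d_out, r))     # LoRA up-projection
x = rng.normal(size=(3, d_in))      # a small batch of inputs

# Unmerged: base path plus a separate low-rank adapter path.
y_adapter = x @ W.T + scaling * (x @ A.T) @ B.T

# Merged: fold the low-rank update into the base weight once.
W_merged = W + scaling * (B @ A)
y_merged = x @ W_merged.T

# Both paths give the same outputs, up to floating-point error.
assert np.allclose(y_adapter, y_merged)
```

After merging, inference needs a single matrix multiplication per layer and the `A` and `B` matrices no longer have to be kept around, which is the effect the paragraph above attributes to `merge_and_unload`.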
+We show in the snippets below how to run that using PEFT.
+
 ```py
 from transformers import AutoModelForCausalLM
 from peft import PeftModel