From 1b262167f39b5f454624180bf01947a7e2ba1d65 Mon Sep 17 00:00:00 2001
From: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date: Tue, 28 May 2024 11:13:44 +0200
Subject: [PATCH] Docs / LoRA: Add more information on `merge_and_unload` docs (#1805)

* put back lora merging diagram

* push

* Update docs/source/developer_guides/lora.md

Co-authored-by: Benjamin Bossan

---------

Co-authored-by: Benjamin Bossan
---
 docs/source/developer_guides/lora.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/source/developer_guides/lora.md b/docs/source/developer_guides/lora.md
index 680804f0d0..036fc2a4ca 100644
--- a/docs/source/developer_guides/lora.md
+++ b/docs/source/developer_guides/lora.md
@@ -140,10 +140,18 @@ Assuming the original model had 5 layers `[0, 1, 2 ,3, 4]`, this would create a
 
 [Fewshot-Metamath-OrcaVicuna-Mistral-10B](https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B) is an example of a model trained using this method on Mistral-7B expanded to 10B. The [adapter_config.json](https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B/blob/main/adapter_config.json) shows a sample LoRA adapter config applying this method for fine-tuning.
 
-## Merge adapters
+## Merge LoRA weights into the base model
 
 While LoRA is significantly smaller and faster to train, you may encounter latency issues during inference due to separately loading the base model and the LoRA adapter. To eliminate latency, use the [`~LoraModel.merge_and_unload`] function to merge the adapter weights with the base model. This allows you to use the newly merged model as a standalone model. The [`~LoraModel.merge_and_unload`] function doesn't keep the adapter weights in memory.
 
+Below is a diagram that explains the intuition of LoRA adapter merging:
+
+<div …>
+    <img src="…"/>
+</div>
+
+We show in the snippets below how to run that using PEFT.
+
 ```py
 from transformers import AutoModelForCausalLM
 from peft import PeftModel