Allow Peft models to share their base model #1905
Conversation
This is pretty cool! I tested this and it seems to work well. I served multiple LoRAs using the following script:
Looking at the GPU utilization, we only load the base model once, so in my case, we only load …
Thank goodness it works for someone else. I've submitted a few too many things that didn't 100% work. I have it running as well on GCP with two models, and they give distinct results and fit within VRAM as expected.
LGTM. Thanks!
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Why are these changes needed?
This adds a special environment variable that activates shared Peft model base weights. Currently, when loading two Peft models that have the same base model, the base weights are loaded separately for each of them. With this flag activated, all Peft models will share the same base model.
To make this work requires a few workarounds due to how Huggingface's Peft library implements LoRA adapters, the most popular variant. LoRA adapters modify the base model's pytorch modules directly, so adapters sharing the same base model must live within the same model object, and the `set_adapter` method must be called to switch between them.
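As a rough illustration of the pattern described above, the sketch below loads the base model once, attaches several LoRA adapters to that single model object, and calls `set_adapter` before each request. This is not the PR's actual code: the model name, adapter paths, and the environment-variable name are placeholders.

```python
import os

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical flag name; the PR's actual environment variable may differ.
SHARE_BASE = os.environ.get("PEFT_SHARE_BASE_WEIGHTS", "false").lower() == "true"

base_name = "meta-llama/Llama-2-7b-hf"      # placeholder base model
adapters = {
    "adapter-a": "path/to/lora-adapter-a",  # placeholder adapter checkpoints
    "adapter-b": "path/to/lora-adapter-b",
}

tokenizer = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)

if SHARE_BASE:
    # Attach every adapter to the SAME base model object, so the base
    # weights are held in memory only once.
    model = None
    for name, path in adapters.items():
        if model is None:
            model = PeftModel.from_pretrained(base, path, adapter_name=name)
        else:
            model.load_adapter(path, adapter_name=name)

    def generate(adapter_name: str, prompt: str) -> str:
        # LoRA patches the base model's modules in place, so the active
        # adapter has to be switched before serving each request.
        model.set_adapter(adapter_name)
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=32)
        return tokenizer.decode(output[0], skip_special_tokens=True)

    print(generate("adapter-a", "Hello"))
    print(generate("adapter-b", "Hello"))
```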
Related issue number (if applicable)
Expands #1805
Checks
- I've run `format.sh` to lint the changes in this PR.