[Model] Add LoRA support for TransformersModel #13770

Merged: jeejeelee merged 16 commits into vllm-project:main from jeejeelee:transformers-model-support-lora on Mar 2, 2025.
Changes from all commits (16 commits, all by jeejeelee):

- eee6340 Init
- 48be219 Modify hf linear lora logic
- bb47fa3 Merge branch 'vllm-project:main' into transformers-model-support-lora
- 21622e0 Optimize logic
- 6e43167 Merge branch 'vllm-project:main' into transformers-model-support-lora
- fa92ccb Merge branch 'vllm-project:main' into transformers-model-support-lora
- b8aa842 Add unit test
- 2005a3f Fix
- b744f77 fmt
- 55a889a Add doc
- f945195 Done
- 5b734ce Done
- 4951e8b Backup
- 02ac842 Fix
- f0b94cc Fix lora repo
- 18a3056 Fix lora tp
New test file added by this PR (120 lines):

```python
# SPDX-License-Identifier: Apache-2.0

from typing import List

import pytest

import vllm
from tests.utils import fork_new_process_for_each_test
from vllm.lora.request import LoRARequest

from ..utils import multi_gpu_test

MODEL_PATH = "ArthurZ/ilama-3.2-1B"

PROMPT_TEMPLATE = """I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n"\n##Instruction:\nconcert_singer contains tables such as stadium, singer, concert, singer_in_concert. Table stadium has columns such as Stadium_ID, Location, Name, Capacity, Highest, Lowest, Average. Stadium_ID is the primary key.\nTable singer has columns such as Singer_ID, Name, Country, Song_Name, Song_release_year, Age, Is_male. Singer_ID is the primary key.\nTable concert has columns such as concert_ID, concert_Name, Theme, Stadium_ID, Year. concert_ID is the primary key.\nTable singer_in_concert has columns such as concert_ID, Singer_ID. concert_ID is the primary key.\nThe Stadium_ID of concert is the foreign key of Stadium_ID of stadium.\nThe Singer_ID of singer_in_concert is the foreign key of Singer_ID of singer.\nThe concert_ID of singer_in_concert is the foreign key of concert_ID of concert.\n\n###Input:\n{query}\n\n###Response:"""  # noqa: E501

EXPECTED_LORA_OUTPUT = [
    "SELECT count(*) FROM singer",
    "SELECT avg(age) , min(age) , max(age) FROM singer WHERE country = 'France'",  # noqa: E501
    "SELECT DISTINCT Country FROM singer WHERE Age > 20",
]


def do_sample(llm: vllm.LLM, lora_path: str, lora_id: int) -> List[str]:
    prompts = [
        PROMPT_TEMPLATE.format(query="How many singers do we have?"),
        PROMPT_TEMPLATE.format(
            query=
            "What is the average, minimum, and maximum age of all singers from France?"  # noqa: E501
        ),
        PROMPT_TEMPLATE.format(
            query=
            "What are all distinct countries where singers above age 20 are from?"  # noqa: E501
        ),
    ]
    sampling_params = vllm.SamplingParams(temperature=0, max_tokens=32)
    outputs = llm.generate(
        prompts,
        sampling_params,
        lora_request=LoRARequest(str(lora_id), lora_id, lora_path)
        if lora_id else None)
    # Print the outputs.
    generated_texts: List[str] = []
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text.strip()
        generated_texts.append(generated_text)
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
    return generated_texts


@pytest.fixture(autouse=True)
def v1(run_with_both_engines_lora):
    # Simple autouse wrapper to run both engines for each test
    # This can be promoted up to conftest.py to run for every
    # test in a package
    pass


@pytest.mark.skip_v1
@fork_new_process_for_each_test
def test_ilama_lora(ilama_lora_files):
    llm = vllm.LLM(MODEL_PATH,
                   max_model_len=1024,
                   enable_lora=True,
                   max_loras=4,
                   max_lora_rank=16,
                   tensor_parallel_size=1,
                   trust_remote_code=True,
                   enable_chunked_prefill=True)

    output1 = do_sample(llm, ilama_lora_files, lora_id=1)
    for i in range(len(EXPECTED_LORA_OUTPUT)):
        assert output1[i] == EXPECTED_LORA_OUTPUT[i]
    output2 = do_sample(llm, ilama_lora_files, lora_id=2)
    for i in range(len(EXPECTED_LORA_OUTPUT)):
        assert output2[i] == EXPECTED_LORA_OUTPUT[i]


@pytest.mark.skip_v1
@multi_gpu_test(num_gpus=4)
@fork_new_process_for_each_test
def test_ilama_lora_tp4(ilama_lora_files):
    llm = vllm.LLM(MODEL_PATH,
                   max_model_len=1024,
                   enable_lora=True,
                   max_loras=4,
                   max_lora_rank=16,
                   tensor_parallel_size=4,
                   trust_remote_code=True,
                   fully_sharded_loras=False,
                   enable_chunked_prefill=True)

    output1 = do_sample(llm, ilama_lora_files, lora_id=1)
    for i in range(len(EXPECTED_LORA_OUTPUT)):
        assert output1[i] == EXPECTED_LORA_OUTPUT[i]
    output2 = do_sample(llm, ilama_lora_files, lora_id=2)
    for i in range(len(EXPECTED_LORA_OUTPUT)):
        assert output2[i] == EXPECTED_LORA_OUTPUT[i]


@pytest.mark.skip_v1
@multi_gpu_test(num_gpus=4)
@fork_new_process_for_each_test
def test_ilama_lora_tp4_fully_sharded_loras(ilama_lora_files):
    llm = vllm.LLM(MODEL_PATH,
                   max_model_len=1024,
                   enable_lora=True,
                   max_loras=4,
                   max_lora_rank=16,
                   tensor_parallel_size=4,
                   trust_remote_code=True,
                   fully_sharded_loras=True,
                   enable_chunked_prefill=True)
    output1 = do_sample(llm, ilama_lora_files, lora_id=1)
    for i in range(len(EXPECTED_LORA_OUTPUT)):
        assert output1[i] == EXPECTED_LORA_OUTPUT[i]
    output2 = do_sample(llm, ilama_lora_files, lora_id=2)
    for i in range(len(EXPECTED_LORA_OUTPUT)):
        assert output2[i] == EXPECTED_LORA_OUTPUT[i]
```
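For background on what these tests exercise: LoRA leaves the base weight W frozen and adds a scaled low-rank term, so the forward pass x @ W + (alpha / r) * (x @ A) @ B is equivalent to using a merged weight W + (alpha / r) * A @ B. The following is a toy, dependency-free sketch of that identity only — not vLLM's implementation, and the A/B naming here follows a row-vector convention that may be transposed relative to PEFT's:

```python
# Toy illustration of the LoRA identity:
#   x @ W + (alpha/r) * (x @ A) @ B  ==  x @ (W + (alpha/r) * A @ B)
# Matrices are plain lists of rows; names and shapes are illustrative.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def scale(m, s):
    return [[s * v for v in row] for row in m]

def add(m, n):
    return [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(m, n)]

def lora_forward(x, w, lora_a, lora_b, alpha, r):
    """Base path plus scaled low-rank path, without merging weights."""
    base = matmul(x, w)
    low_rank = matmul(matmul(x, lora_a), lora_b)
    return add(base, scale(low_rank, alpha / r))

# 1x2 input, 2x2 identity base weight, rank-1 adapter.
x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0], [0.0]]   # 2x1
lora_b = [[0.5, 0.5]]     # 1x2
r, alpha = 1, 2

# Merging the scaled delta into W gives the same output as the split path.
merged = add(w, scale(matmul(lora_a, lora_b), alpha / r))
assert lora_forward(x, w, lora_a, lora_b, alpha, r) == matmul(x, merged)
```

The `lora_id=0` branch in `do_sample` (passing `lora_request=None`) corresponds to alpha-scaled delta being skipped entirely, i.e. plain x @ W.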
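The two TP=4 tests differ only in `fully_sharded_loras`, which controls whether the LoRA computation itself is distributed across tensor-parallel ranks. The correctness invariant behind column-style sharding can be sketched without vLLM; this is a deliberately simplified toy, not the actual sharding scheme:

```python
# Toy column-parallel sharding: each "rank" holds a contiguous slice of a
# weight's output columns, computes its partial result locally, and the
# partials are concatenated (an all-gather). Illustration only -- vLLM's
# fully_sharded_loras distributes the LoRA matrices across TP ranks in a
# similar spirit, but the real implementation is more involved.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def split_cols(m, parts):
    """Slice a matrix's columns into `parts` contiguous shards."""
    width = len(m[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in m]
            for p in range(parts)]

x = [[1, 2]]
w = [[1, 2, 3, 4], [5, 6, 7, 8]]

# Each rank computes x @ shard; concatenating recovers x @ w exactly.
partials = [matmul(x, shard) for shard in split_cols(w, 4)]
gathered = [sum((p[0] for p in partials), [])]
assert gathered == matmul(x, w)
```

This is why the sharded and unsharded tests can assert against the same `EXPECTED_LORA_OUTPUT`: with temperature=0, sharding changes where the work happens, not the result.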