[Core] Support dynamically loading Lora adapter from HuggingFace #6234
Conversation
Force-pushed 944a65c to bcc22eb.
I think this looks fine, some nits. A little worried about the performance of this, but since it's optional it should be OK.
Let's add a test? There's a LoRA on the HF Hub we use for testing; we can just use that (it's in one of the conftest.py files).
vllm/lora/request.py (outdated)
@@ -18,7 +18,7 @@ class LoRARequest:
     lora_name: str
     lora_int_id: int
-    lora_local_path: str
+    lora_path: str
Can we keep lora_local_path as a deprecated alias for lora_path? Let's warn people when it's used so they update, but ideally we should avoid API breakage here.
Just noticed this is the user-facing API when users use the vLLM server instead of the OpenAI server. Sounds good, I will add a deprecation notice there.
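For context, a minimal sketch of what such a deprecated alias could look like; this is an assumption about the shape of the change, not the exact code merged in the PR:

```python
# Sketch only: a deprecated lora_local_path alias on LoRARequest.
# The actual implementation merged in the PR may differ in details.
import warnings
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoRARequest:
    lora_name: str
    lora_int_id: int
    lora_path: str = ""
    lora_local_path: Optional[str] = field(default=None, repr=False)

    def __post_init__(self):
        if self.lora_local_path is not None:
            warnings.warn(
                "lora_local_path is deprecated, use lora_path instead",
                DeprecationWarning,
                stacklevel=2)
            if not self.lora_path:
                self.lora_path = self.lora_local_path
        if not self.lora_path:
            raise ValueError("lora_path (or lora_local_path) must be set")
```

With this shape, constructing LoRARequest(..., lora_local_path=...) keeps working but emits a DeprecationWarning pointing users at lora_path.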
Force-pushed bcc22eb to 5987d94.
@Yard1 Do you mean adding a unit test, or some other integration or e2e test? I am new to conftest.py and will check its details. Update 2024-07-09: Seems
@Yard1 Do you have a preferred way?
Force-pushed b2792d4 to 89edb62.
@Jeffwan let's add a separate test (or parameterize the existing one) so we test both paths: loading from a local path and loading from a repository.
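A rough sketch of what that parameterization might look like; the sql_lora_files fixture name and the Hub repo id are placeholders, not the actual values from conftest.py:

```python
# Sketch: parameterize an existing LoRA test over both path sources.
# "sql_lora_files" and the Hub repo id below are placeholders, not the actual
# fixture/repo used in vLLM's conftest.py.
import pytest

from vllm.lora.request import LoRARequest


@pytest.mark.parametrize("lora_source", ["local", "hub"])
def test_lora_path_sources(sql_lora_files, lora_source):
    if lora_source == "local":
        lora_path = sql_lora_files            # local directory from a fixture
    else:
        lora_path = "some-org/some-lora"      # hypothetical HF Hub repo id
    request = LoRARequest("test-lora", 1, lora_path)
    assert request.lora_path == lora_path
```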
Force-pushed 89edb62 to 031d7b4.
The tests look good. Let's address https://github.com/vllm-project/vllm/pull/6234/files#r1669808107 and we can merge.
Sounds good. Let me update the PR to include this change.
@Yard1 I added the deprecation notice. It covers the default behavior, explicitly using the deprecated field, and the required field missing.
Force-pushed cb267b8 to 61a6cff.
@Yard1 Should I wait for upstream to fix the documentation build issue?
The error has been fixed, so you can merge.
Force-pushed e082ebc to d8e5a11.
@Yard1 Can you help check the failed test? I am not sure whether it's related to this change. The test suite passed earlier, and I don't see this test being kicked off in other PRs.
Yeah, it's a bit weird. Can you merge master to retry CI?
Force-pushed d8e5a11 to 240e1e6.
Format the code and pass the linter
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Force-pushed 240e1e6 to 0ea0d0d.
Looks like it still has that error. I am running on a 4-GPU node and run
Master branch: tests pass. PR branch: tests fail.
It seems the LoRA has a 16k context but the base model is 4096. In that case it generates nothing and then fails the evaluation. After investigating, I found the problem: I introduced the new field in vllm/tests/lora/test_long_context.py (lines 31 to 33 in e76466d).
Force-pushed 56daa6c to 26a22d5.
The problem comes from the newly added field: the test didn't set the field explicitly but relied on positional arguments.
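An illustration of the pitfall being described, with hypothetical field names rather than the exact vLLM ones:

```python
# Illustration only; field names here are hypothetical, not the exact vLLM ones.
from dataclasses import dataclass


@dataclass
class LoRARequest:
    lora_name: str
    lora_int_id: int
    lora_path: str            # field renamed/added in this PR
    scaling_factor: float = 1.0


# A test that fills the fields positionally against the old layout keeps
# running, but a new or reordered field can silently bind values to the wrong
# slot. Keyword arguments make the intent explicit and survive such changes:
req = LoRARequest(lora_name="long-context-lora", lora_int_id=1,
                  lora_path="/path/to/adapter", scaling_factor=4.0)
```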
Force-pushed 26a22d5 to ba2a04d.
@Yard1 Please have another look. The previous failure has been fixed and all tests pass now.
@Yard1 Do you have further comments or suggestions on this PR? I have some other WIP PRs that need this change; if this can be merged, it would definitely simplify the rebase work. Thanks.
Merged, thanks!
@ZXTFINAL it's supposed to be model-agnostic. Do you mind cutting an issue with more details? Please cc me and I can help debug it.
We are testing this branch, and things seem to work well for us. It would be great to see this merged into main in the near future.
This PR enhances the flexibility of LoRA adapter artifact locations. It allows users to specify the location using either a relative path or a Hugging Face model id.
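A hedged usage sketch of the resulting behavior; the model and adapter ids are placeholders, and the exact API surface may differ from what the PR finally merged:

```python
# Sketch: lora_path can now point at a local (relative) directory or a
# Hugging Face model id. Model and adapter ids below are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

# Local relative path, expanded and resolved to an absolute path.
local_request = LoRARequest("my-adapter", 1, "~/adapters/my-adapter")

# Hugging Face model id, fetched from the Hub on demand.
hub_request = LoRARequest("sql-adapter", 2, "some-org/sql-lora-adapter")

outputs = llm.generate(["Write a SQL query that ..."],
                       SamplingParams(max_tokens=64),
                       lora_request=hub_request)
```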
Part of #6275. Relative paths are expanded with expanduser and resolved to an absolute path. Resolves [Bug]: relative path doesn't work for Lora adapter model #6231.
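A minimal sketch of the kind of resolution logic the description implies, assuming huggingface_hub's snapshot_download for the Hub case; the PR's actual implementation may differ:

```python
# Sketch of the resolution described above: expand/resolve local paths,
# otherwise treat the value as a Hugging Face repo id and download it.
# This is an assumption about the approach, not the PR's exact code.
import os

from huggingface_hub import snapshot_download


def resolve_lora_path(lora_path: str) -> str:
    expanded = os.path.abspath(os.path.expanduser(lora_path))
    if os.path.isdir(expanded):
        return expanded                      # relative/~ paths become absolute
    return snapshot_download(repo_id=lora_path)  # assume a HF Hub model id
```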
PR Checklist
Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
- [Bugfix] for bug fixes.
- [CI/Build] for build or continuous integration improvements.
- [Doc] for documentation fixes and improvements.
- [Model] for adding a new model or improving an existing model. Model name should appear in the title.
- [Frontend] for changes on the vLLM frontend (e.g., OpenAI API server, LLM class, etc.).
- [Kernel] for changes affecting CUDA kernels or other compute kernels.
- [Core] for changes in the core vLLM logic (e.g., LLMEngine, AsyncLLMEngine, Scheduler, etc.).
- [Hardware][Vendor] for hardware-specific changes. Vendor name should appear in the prefix (e.g., [Hardware][AMD]).
- [Misc] for PRs that do not fit the above categories. Please use this sparingly.
Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR needs to meet the following code quality standards:
- Please use format.sh to format your code.
- Please add documentation to docs/source/ if the PR modifies the user-facing behaviors of vLLM. It helps vLLM users understand and utilize the new features or changes.
Notes for Large Changes
Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with rfc-required and might not go through the PR.
What to Expect for the Reviews
The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:
- An action-required label will be put on the PR if there are changes required. The contributor should address the comments and ping the reviewer to re-review the PR.
Thank You
Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!