[Bugfix] Fix missing return value in load_weights method of adapters.py #15542
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger a full CI run by default. Instead, it would only run fastcheck CI, which runs only a small and essential subset of CI tests to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Signed-off-by: noc-turne <2270929247@qq.com>
This is intentional: if we return the loaded weights, there may be a mismatch compared to the expected weights (leading to an error), because we removed some unused modules in the adapter class.
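To illustrate the concern, here is a minimal sketch (not vLLM's actual code; the class and attribute names are hypothetical) of an adapter that deletes an unused `lm_head` module and filters the corresponding checkpoint entries, so the set of parameter names it loads is smaller than what a caller comparing against the full checkpoint would expect:

```python
from typing import Iterable, Set, Tuple

import torch
import torch.nn as nn


class PoolingAdapter(nn.Module):
    """Hypothetical adapter that only needs hidden states, not logits."""

    def __init__(self, inner: nn.Module):
        super().__init__()
        self.inner = inner
        # The LM head is unused for embedding, so the adapter drops it.
        if hasattr(self.inner, "lm_head"):
            del self.inner.lm_head

    def load_weights(
        self, weights: Iterable[Tuple[str, torch.Tensor]]
    ) -> Set[str]:
        # Skip checkpoint entries for the deleted module. The returned
        # name set is therefore smaller than the checkpoint's name set,
        # which a strict caller could flag as a mismatch.
        filtered = ((name, w) for name, w in weights
                    if not name.startswith("lm_head."))
        return self.inner.load_weights(filtered)
```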
But now, when deploying a multimodal model (in my case, InternVLChatModel), the missing return value causes `loaded_params` to be `None`, and weight loading fails.
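A hypothetical sketch of the failure mode (variable names are illustrative; the real check lives around vllm/models/utils.py#L171-L177):

```python
# The adapter's load_weights fell through without a return statement,
# so the caller receives None instead of the set of loaded names.
loaded_params = model.load_weights(weights)  # -> None

expected_params = {name for name, _ in model.named_parameters()}
unloaded = expected_params - loaded_params
# TypeError: unsupported operand type(s) for -: 'set' and 'NoneType'
```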
Hmm, to work in all cases I think we may need to keep the unnecessary modules then. Can you remove the code where `lm_head` is deleted and see if that fixes it?
No, it doesn't work. If I remove the code about `lm_head`, an error is still reported, and the original code (not returning the weights) also throws an error. However, when returning the weights and adapting their prefix, `Qwen2ForEmbedding` can be deployed normally.
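The prefix adaptation mentioned here could look roughly like the following sketch, under the assumption that names loaded inside the wrapped language model need a `model.` prefix to line up with the wrapper's own parameter names (the wrapper class and exact prefix are hypothetical; vLLM's actual code may differ):

```python
from typing import Iterable, Set, Tuple

import torch
import torch.nn as nn


class EmbeddingWrapper(nn.Module):
    """Hypothetical wrapper around a decoder-only LM (self.model)."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def load_weights(
        self, weights: Iterable[Tuple[str, torch.Tensor]]
    ) -> Set[str]:
        # Load into the wrapped LM and capture the loaded names.
        loaded = self.model.load_weights(weights)
        # Re-prefix so the returned names match this wrapper's parameter
        # names (e.g. "layers.0.*" -> "model.layers.0.*").
        return {f"model.{name}" for name in loaded}
```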
Let me unblock some tests and see if the existing embedding models can still be loaded under your PR. |
It looks like other tests are failing in this CI run. Can you try to run the tests locally to check?
It’s too hard for me to download so many models locally to run this test. If others haven’t encountered a similar issue, then just forget it for now and close the PR. |
I'll run this in the background then and report back with the results.
Thanks! |
OK, it seems the embedding models can indeed be loaded; perhaps you somehow patched the problem from before as well. In that case, thanks!
Previously, the `load_weights` method in `adapters.py` did not return the weights, which caused `loaded_params` to be `None` in some cases, for example when using the `Qwen2ForEmbedding` model (see vllm/models/utils.py#L171-L177). This PR fixes the issue by properly returning the result of `load_weights` and ensures that the prefix matches the expected format for external usage.
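The shape of the fix is roughly the following sketch (not the exact diff; it assumes the adapter delegates to vLLM's `AutoWeightsLoader` helper from `vllm/model_executor/models/utils.py`):

```python
from vllm.model_executor.models.utils import AutoWeightsLoader


# Before (sketch): the names collected by the loader were discarded,
# so callers inspecting the return value received None.
def load_weights(self, weights):
    loader = AutoWeightsLoader(self)
    loader.load_weights(weights)


# After (sketch): propagate the set of loaded parameter names so the
# caller can verify that every expected parameter was actually loaded.
def load_weights(self, weights):  # noqa: F811 (illustrative redefinition)
    loader = AutoWeightsLoader(self)
    return loader.load_weights(weights)
```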