
Conversation

@noc-turne (Contributor) commented Mar 26, 2025

Previously, the load_weights method in adapters.py did not return the weights, which caused loaded_params to be None in some cases — for example, when using the Qwen2ForEmbedding model (see vllm/models/utils.py#L171-L177).

This PR fixes the issue by properly returning the result of load_weights, and ensures that the prefix matches the expected format for external usage.
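To make the intended change concrete, here is a minimal, hedged sketch of the pattern (class and argument names are simplified placeholders, not the exact vLLM diff): the adapter's load_weights now propagates the set of loaded parameter names instead of discarding it.

```python
# Minimal sketch, assuming a wrapper class similar to the one built in
# adapters.py; names here are illustrative placeholders, not vLLM's exact code.
from typing import Iterable, Set, Tuple

import torch


class EmbeddingAdapter:
    """Hypothetical stand-in for the adapter class created in adapters.py."""

    def __init__(self, inner_model) -> None:
        self.model = inner_model

    def load_weights(self,
                     weights: Iterable[Tuple[str, torch.Tensor]]) -> Set[str]:
        # Before this PR the call result was discarded, so callers such as
        # vllm/models/utils.py saw loaded_params == None.
        # After the fix, the inner model's return value is propagated.
        return self.model.load_weights(weights)
```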

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@DarkLight1337 (Member) commented Mar 27, 2025

This is intentional: if we return the loaded weights, there may be a mismatch against the expected weights (leading to an error), because we removed some unused modules in the adapter class.

@noc-turne (Contributor, Author)

But now, when deploying a multimodal model (in my case, InternVLChatModel), the adapter's load_weights not returning a value causes the language model's parameters (Qwen2ForEmbedding) to fail to load, which in turn makes deployment fail.
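As a hedged illustration (not vLLM's actual loader code) of why the missing return value matters for composite models: the outer loader relies on the set of names returned by the sub-model's load_weights when it later checks which parameters were initialized.

```python
# Hedged illustration only; function and variable names are made up and do not
# mirror vLLM's real loader. It shows how a None return from a sub-loader makes
# every sub-model weight look uninitialized.
from typing import Iterable, Optional, Set, Tuple

import torch


def load_language_model_weights(
    sub_load_weights,  # e.g. the adapter's load_weights method
    weights: Iterable[Tuple[str, torch.Tensor]],
    expected_params: Set[str],
) -> None:
    loaded: Optional[Set[str]] = sub_load_weights(weights)
    missing = expected_params - (loaded or set())
    if missing:
        # This mirrors the "weights were not initialized from checkpoint"
        # failure shown in the traceback below.
        raise ValueError(
            f"Following weights were not initialized from checkpoint: {missing}")
```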

@DarkLight1337 (Member)

Hmm, to work in all cases I think we may need to keep the unnecessary modules then. Can you remove the code where lm_head etc. is being removed? Then it should be OK to return the weights.

@noc-turne (Contributor, Author)

No, it doesn't work. If I remove the code that drops lm_head, the following error is reported:

File "/vllm/vllm/model_executor/models/adapters.py", line 105, in load_weights
ERROR 03-27 07:43:20 [engine.py:448]     self.model.load_weights(weights)
ERROR 03-27 07:43:20 [engine.py:448]   File "vllm/vllm/model_executor/models/qwen2.py", line 400, in load_weights
ERROR 03-27 07:43:20 [engine.py:448]     param = params_dict[name]
ERROR 03-27 07:43:20 [engine.py:448] KeyError: 'lm_head.weight'

And the original code (not returning the weights) throws this error:

File "vllm/vllm/model_executor/model_loader/loader.py", line 448, in load_model
    raise ValueError(
ValueError: Following weights were not initialized from checkpoint: {
    'language_model.model.layers.16.mlp.down_proj.weight',
    'language_model.model.layers.22.input_layernorm.weight',
    'language_model.model.layers.27.mlp.down_proj.weight',
    'language_model.model.layers.8.input_layernorm.weight',
    'language_model.model.layers.18.mlp.gate_up_proj.weight',
    ...
}

However, when load_weights returns the loaded weights and the prefix is adapted, Qwen2ForEmbedding deploys normally.

@DarkLight1337 (Member)

Let me unblock some tests and see if the existing embedding models can still be loaded under your PR.

@DarkLight1337 (Member)

It looks like other tests are failing in this CI run. Can you try to run tests/models/embedding/language locally and report the results?
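For reference, a minimal way to run that suite from the repository root, assuming pytest and the vLLM test dependencies (and the required model downloads) are available; the path comes from the comment above.

```python
# Minimal sketch: run the embedding language-model tests programmatically.
# Assumes execution from the vLLM repository root with test deps installed.
import pytest

if __name__ == "__main__":
    raise SystemExit(pytest.main(["tests/models/embedding/language", "-v"]))
```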

@noc-turne (Contributor, Author)

It’s too hard for me to download so many models locally to run this test. If others haven’t encountered a similar issue, then just forget it for now and close the PR.

@DarkLight1337 (Member)

I'll run this in the background then and report back with the results.

@noc-turne (Contributor, Author)

Thanks!

@DarkLight1337 (Member)

OK, it seems the embedding models can indeed be loaded; perhaps your change also patched the earlier problem. In that case, thanks!

@vllm-bot vllm-bot merged commit 037bcd9 into vllm-project:main Mar 31, 2025
15 of 17 checks passed
Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025
…py (vllm-project#15542)

Signed-off-by: noc-turne <2270929247@qq.com>
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…py (vllm-project#15542)

Signed-off-by: noc-turne <2270929247@qq.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…py (vllm-project#15542)

Signed-off-by: noc-turne <2270929247@qq.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>