[Model] use AutoWeightsLoader for solar #18113
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a fast-check subset runs. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Force-pushed from f1c836e to d7e6366.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from d7e6366 to b9c79ad.
Review comment on vllm/model_executor/models/olmo2.py (outdated):
Although this won't affect much, I think rotary_emb.inv_freq, rotary_emb.cos_cached, and rotary_emb.sin_cached are not prefixes.
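To make the distinction concrete: checkpoint weight names carry the full module path, so these strings appear at the end of the qualified names (see the name list later in this thread), not at the start. A minimal illustration:

```python
# Full checkpoint name as it appears in a ColossalAI fine-tune
# (taken from the list quoted later in this thread):
name = "model.layers.1.self_attn.rotary_emb.inv_freq"

print(name.startswith("rotary_emb.inv_freq"))  # False: not a prefix of the full name
print(name.endswith("rotary_emb.inv_freq"))    # True: it is a suffix
```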
So what would you suggest instead?
We can filter out the weights to skip in advance, before calling AutoWeightsLoader, just like Phi-4-MM does:
vllm/model_executor/models/phi4mm.py, lines 1241 to 1246 (at 5418176):

```python
def load_weights(self, weights: Iterable[tuple[str,
                                               torch.Tensor]]) -> None:
    weights = ((name, data) for name, data in weights
               if "lora" not in name)
    loader = AutoWeightsLoader(self)
    return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
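Applied to Solar, the same pattern would drop the RoPE buffer tensors before handing the iterator to AutoWeightsLoader. A minimal sketch, not the code in this PR: the suffix tuple name is illustrative, and the import path assumes vLLM's usual location of AutoWeightsLoader.

```python
from collections.abc import Iterable

import torch

from vllm.model_executor.models.utils import AutoWeightsLoader

# RoPE buffers that some checkpoints (e.g. ColossalAI fine-tunes)
# serialize but vLLM recomputes at runtime; illustrative name.
_ROPE_BUFFER_SUFFIXES = (
    "rotary_emb.inv_freq",
    "rotary_emb.cos_cached",
    "rotary_emb.sin_cached",
)


def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]):
    # Filter the buffers out up front so AutoWeightsLoader never sees them;
    # str.endswith accepts a tuple of suffixes.
    weights = ((name, data) for name, data in weights
               if not name.endswith(_ROPE_BUFFER_SUFFIXES))
    loader = AutoWeightsLoader(self)
    return loader.load_weights(weights)
```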
Does the situation you are describing actually occur? I have tested it and it currently works properly with the prefix approach.
Yeah, this situation seldom occurs, because only some model checkpoints fine-tuned with ColossalAI include these tensors, and the models you tested likely don't have them.
['model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', ... ]
But it's still reasonable to make the weight loading more robust and cover this rare case.
Anyway, since the RoPE buffer tensors in the checkpoint are not a bug, let's merge this PR and handle this case together with the other models' modified loading logic in a follow-up PR.
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Force-pushed from b9c79ad to 922302f.
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Issue: #15697