[Feature][EPLB] Add support for Qwen3 EPLB #21290
base: main
Conversation
Signed-off-by: ycyaw66 <497410282@qq.com>
[Feature][EPLB] Add support for Qwen3 EPLB
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of CI tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add a ready label to the PR or enable auto-merge. 🚀
Code Review
This pull request adds support for Expert Parallelism Load Balancing (EPLB) for the Qwen3-MoE model. The changes involve plumbing EPLB configurations through the model layers, updating weight loading logic to handle distributed experts, and implementing the MixtureOfExperts protocol. To fully implement the MixtureOfExperts protocol, the update_physical_experts_metadata method is required.
```diff
-class Qwen3MoeForCausalLM(nn.Module, SupportsPP, SupportsLoRA):
+class Qwen3MoeForCausalLM(nn.Module, SupportsPP,
+                          SupportsLoRA, MixtureOfExperts):
```
The Qwen3MoeForCausalLM class needs to implement the update_physical_experts_metadata method as part of the MixtureOfExperts protocol. This method is called by the EPLB scheduler during expert rebalancing, and its absence will lead to a runtime AttributeError.
```python
    def set_eplb_state(
        self,
        expert_load_view: Tensor,
        logical_to_physical_map: Tensor,
        logical_replica_count: Tensor,
    ) -> None:
        # Register each MoE layer's expert weights and wire the shared EPLB
        # state tensors into the layer so the load balancer can track usage.
        for layer_idx, layer in enumerate(self.moe_layers):
            self.expert_weights.append(layer.get_expert_weights())
            layer.set_eplb_state(
                moe_layer_idx=layer_idx,
                expert_load_view=expert_load_view,
                logical_to_physical_map=logical_to_physical_map,
                logical_replica_count=logical_replica_count,
            )

    def update_physical_experts_metadata(
        self,
        num_physical_experts: int,
        num_local_physical_experts: int,
    ) -> None:
        # Called by the EPLB scheduler after expert rebalancing; propagate the
        # new physical expert counts to every sparse MoE block.
        self.num_physical_experts = num_physical_experts
        self.num_local_physical_experts = num_local_physical_experts
        for layer in self.model.layers:
            if isinstance(layer, PPMissingLayer):
                continue
            if isinstance(layer.mlp, Qwen3MoeSparseMoeBlock):
                layer.mlp.n_physical_experts = num_physical_experts
                layer.mlp.n_local_physical_experts = num_local_physical_experts
```
Signed-off-by: ycyaw66 <497410282@qq.com>
Qwen3 eplb
Signed-off-by: ycyaw66 <497410282@qq.com>
qwen3 eplb: fix format
Signed-off-by: ycyaw66 <497410282@qq.com>
qwen3 eplb: fix format
@DarkLight1337 @abmfy please review, thanks!
Hi, thank you so much for the contribution! I just returned from traveling and will review this PR soon — thank you for your patience. That said, it appears that support for Qwen3 was already assigned to @aladerran in #20468, and their PR is now open at #20815. I believe it would be great to compare both implementations and look for opportunities to merge them together, so that we can land a unified version into main. Really appreciate your efforts — thank you again for the great work!
abmfy
left a comment
These changes seem similar to #20815. Would it make sense to join forces and merge that PR after testing? Thank you so much!
Signed-off-by: ycyaw66 <497410282@qq.com>
@abmfy could you please double-check it again? The issues from the previous two reviews have been fixed.
We have tested it locally. @abmfy, do you have some time to double-check?
@DarkLight1337 could you help assign another reviewer? Maybe WoosukKwon is too busy. Thanks!
DarkLight1337
left a comment
Can you update tests/distributed/test_expert_parallel.py to test EPLB for each model? I don't have the resources to verify the correctness of this locally.
OK, I think we can test DeepSeek and Qwen3 first, @hsliuustc0106 @ycyaw66.
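For illustration, here is a rough sketch of the kind of parametrized smoke test being discussed. It does not reflect the actual structure of tests/distributed/test_expert_parallel.py, and the model list and the enable_expert_parallel/enable_eplb engine arguments are assumptions:

```python
# Hypothetical sketch only; not the layout of tests/distributed/test_expert_parallel.py.
import pytest
from vllm import LLM, SamplingParams

# Models listed purely for illustration.
EPLB_MODELS = ["deepseek-ai/DeepSeek-V2-Lite-Chat", "Qwen/Qwen3-30B-A3B"]

@pytest.mark.parametrize("model", EPLB_MODELS)
def test_eplb_smoke(model: str):
    # Assumed engine args: enable_expert_parallel / enable_eplb.
    llm = LLM(
        model=model,
        tensor_parallel_size=2,
        enable_expert_parallel=True,
        enable_eplb=True,
    )
    outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=8))
    # Generation should succeed with EPLB enabled.
    assert outputs[0].outputs[0].text
```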
abmfy
left a comment
LGTM. I tested accuracy locally and the results look good.
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|-------|--------:|------------------|-------:|-------------|---|-------:|---|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.9075 | ± | 0.0080 |
| | | strict-match | 5 | exact_match | ↑ | 0.9007 | ± | 0.0082 |
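(For reference, a rough sketch of how a GSM8K check like this can be run through lm-eval's vLLM backend. The exact command and checkpoint used for the numbers above are not shown in this thread, so the model name and the EPLB-related engine arguments below are assumptions.)

```python
# Hypothetical reproduction sketch; model name and EPLB engine args are assumed.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=Qwen/Qwen3-30B-A3B,"
        "tensor_parallel_size=4,"
        "enable_expert_parallel=True,"
        "enable_eplb=True"
    ),
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size="auto",
)
# Prints exact_match under both flexible-extract and strict-match filters.
print(results["results"]["gsm8k"])
```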
As mentioned earlier, please coordinate with @aladerran if you plan to add more tests, so we can avoid duplicating efforts.
Thanks again for the contribution!
This pull request has merge conflicts that must be resolved before it can be merged.
Any update?
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.
Purpose
This pull request adds support for the EPLB feature for the Qwen3 MoE model, which helps improve overall throughput during LLM serving.
#20468
Test Plan
run the following command:
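(The actual command is not captured in this excerpt. Below is a minimal sketch of an offline test with EPLB enabled; the checkpoint name, parallel sizes, and the enable_expert_parallel/enable_eplb/num_redundant_experts engine arguments are assumptions and should be adjusted to your setup.)

```python
# Hypothetical sketch of the test plan; engine args and model name are assumed.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-30B-A3B",
    tensor_parallel_size=4,
    enable_expert_parallel=True,   # shard experts across ranks (EP)
    enable_eplb=True,              # enable expert-parallel load balancing
    num_redundant_experts=16,      # extra physical experts for rebalancing
)

outputs = llm.generate(
    ["What is expert parallelism?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```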
Test Result
(Optional) Documentation Update