[Feature][EPLB] Add support for Qwen3 EPLB #21290

hsliuustc · 2025-07-21T08:31:18Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

This pull request adds EPLB (Expert Parallelism Load Balancing) support for the Qwen3 MoE model, which helps improve overall throughput during LLM serving.

#20468
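
For readers unfamiliar with the idea behind EPLB, here is a toy sketch of why redundant experts can help throughput. It is not vLLM's implementation: the function names, the greedy heuristic, and the assumption that tokens split evenly across the replicas of an expert are all illustrative.

```python
import heapq


def replicate_hot_experts(loads: list[float], num_redundant: int) -> list[int]:
    """Toy heuristic: give each redundant slot to the currently hottest expert.

    Returns the number of replicas assigned to each logical expert.
    Assumes an expert's load is split evenly across its replicas.
    """
    replicas = [1] * len(loads)
    # Max-heap (negated values) keyed on the load carried by one replica.
    heap = [(-load, idx) for idx, load in enumerate(loads)]
    heapq.heapify(heap)
    for _ in range(num_redundant):
        _, idx = heapq.heappop(heap)
        replicas[idx] += 1
        heapq.heappush(heap, (-loads[idx] / replicas[idx], idx))
    return replicas


def max_replica_load(loads: list[float], replicas: list[int]) -> float:
    """Load on the busiest replica, which bounds the step time in this toy model."""
    return max(load / count for load, count in zip(loads, replicas))


# Hypothetical token counts routed to four logical experts.
loads = [90.0, 30.0, 20.0, 10.0]
replicas = replicate_hot_experts(loads, num_redundant=2)

print("replicas per expert:", replicas)
print("busiest replica before:", max_replica_load(loads, [1] * len(loads)))
print("busiest replica after: ", max_replica_load(loads, replicas))
```

In this toy setup both redundant slots go to the first expert, so the busiest replica handles a third of that expert's tokens instead of all of them. A real balancer also has to decide where each replica is placed.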

Test Plan

import json
import os
import argparse
from vllm import LLM, SamplingParams

prompt = "Explain the theory of relativity in simple terms."

RESULT_FILE = "eplb_test_output.json"

sampling_params = SamplingParams(
    temperature=0.0,
    top_p=1.0,
    top_k=1,
    max_tokens=100
)

def run_inference(model_path: str, enable_eplb: bool, num_redundant_experts: int = 0):
    print(f"Running inference with EPLB={enable_eplb}, redundant experts={num_redundant_experts}")
    
    llm = LLM(
        model=model_path,
        tensor_parallel_size=4,
        enable_expert_parallel=True,
        enable_eplb=enable_eplb,
        num_redundant_experts=num_redundant_experts if enable_eplb else 0,
        eplb_window_size=1000,
        eplb_step_interval=100,
        enforce_eager=True,
        trust_remote_code=True
    )
    
    result = llm.generate([prompt], sampling_params)
    output_text = result[0].outputs[0].text.strip()
    
    print("Output:")
    print(output_text)
    print("-" * 50)

    return output_text

def save_result(key: str, value: str):
    if os.path.exists(RESULT_FILE):
        with open(RESULT_FILE, "r") as f:
            results = json.load(f)
    else:
        results = {}

    results[key] = value

    with open(RESULT_FILE, "w") as f:
        json.dump(results, f, indent=2)

    print(f"Output saved to {RESULT_FILE}")

def load_results():
    if os.path.exists(RESULT_FILE):
        with open(RESULT_FILE, "r") as f:
            return json.load(f)
    return {}

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", type=str, choices=["eplb", "normal", "compare"], required=True)
    args = parser.parse_args()

    MODEL_PATH = "/workspace/models/Qwen3-30B-A3B-FP8"

    if args.mode == "eplb":
        outputs = run_inference(MODEL_PATH, enable_eplb=True, num_redundant_experts=32)
        save_result("eplb", outputs)
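    elif args.mode == "normal":
        outputs = run_inference(MODEL_PATH, enable_eplb=False)
        save_result("normal", outputs)
    else:
        # "compare": check the text saved by the two earlier runs.
        results = load_results()
        if "eplb" not in results or "normal" not in results:
            print("Run with --mode eplb and --mode normal first.")
        else:
            print("Outputs match:", results["eplb"] == results["normal"])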

Run the following command:

python eplb_test.py --mode eplb

Test Result

(Optional) Documentation Update

github-actions · 2025-07-21T08:31:25Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request adds support for Expert Parallelism Load Balancing (EPLB) for the Qwen3-MoE model. The changes involve plumbing EPLB configurations through the model layers, updating weight loading logic to handle distributed experts, and implementing the MixtureOfExperts protocol. To fully implement the MixtureOfExperts protocol, the update_physical_experts_metadata method is required.

gemini-code-assist · 2025-07-21T08:32:53Z

vllm/model_executor/models/qwen3_moe.py


-class Qwen3MoeForCausalLM(nn.Module, SupportsPP, SupportsLoRA):
+class Qwen3MoeForCausalLM(nn.Module, SupportsPP, 
+                          SupportsLoRA, MixtureOfExperts):


The Qwen3MoeForCausalLM class needs to implement the update_physical_experts_metadata method as part of the MixtureOfExperts protocol. This method is called by the EPLB scheduler during expert rebalancing, and its absence will lead to a runtime AttributeError.

    def set_eplb_state(
        self,
        expert_load_view: Tensor,
        logical_to_physical_map: Tensor,
        logical_replica_count: Tensor,
    ) -> None:
        for layer_idx, layer in enumerate(self.moe_layers):
            self.expert_weights.append(layer.get_expert_weights())
            layer.set_eplb_state(
                moe_layer_idx=layer_idx,
                expert_load_view=expert_load_view,
                logical_to_physical_map=logical_to_physical_map,
                logical_replica_count=logical_replica_count,
            )

    def update_physical_experts_metadata(
        self,
        num_physical_experts: int,
        num_local_physical_experts: int,
    ) -> None:
        self.num_physical_experts = num_physical_experts
        self.num_local_physical_experts = num_local_physical_experts
        for layer in self.model.layers:
            if isinstance(layer, PPMissingLayer):
                continue
            if isinstance(layer.mlp, Qwen3MoeSparseMoeBlock):
                layer.mlp.n_physical_experts = num_physical_experts
                layer.mlp.n_local_physical_experts = num_local_physical_experts
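
The failure mode described in this review comment can be reproduced in isolation. The sketch below is a minimal stand-in, assuming a protocol with a single method: `ExpertMetadataUpdatable`, `rebalance`, and both model classes are hypothetical names, and the snippet does not show how vLLM itself defines or checks `MixtureOfExperts`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ExpertMetadataUpdatable(Protocol):
    """Hypothetical one-method protocol used only for this illustration."""

    def update_physical_experts_metadata(
        self, num_physical_experts: int, num_local_physical_experts: int
    ) -> None: ...


class IncompleteModel:
    """Lacks the method, so any caller that relies on it will fail."""


class CompleteModel:
    def update_physical_experts_metadata(
        self, num_physical_experts: int, num_local_physical_experts: int
    ) -> None:
        self.num_physical_experts = num_physical_experts
        self.num_local_physical_experts = num_local_physical_experts


def rebalance(model, num_physical_experts: int, num_local_physical_experts: int) -> None:
    # Hypothetical caller that assumes the method exists.
    model.update_physical_experts_metadata(
        num_physical_experts, num_local_physical_experts
    )


complete = CompleteModel()
rebalance(complete, 160, 40)  # illustrative numbers only
print(isinstance(complete, ExpertMetadataUpdatable))
print(isinstance(IncompleteModel(), ExpertMetadataUpdatable))

try:
    rebalance(IncompleteModel(), 160, 40)
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

A structural `isinstance` check like this one only verifies that the method exists, not that its signature matches, so it catches the missing-method case early but is no substitute for a real test.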

david6666666 · 2025-07-23T02:48:19Z

@DarkLight1337 @abmfy please review, thanks.

abmfy · 2025-07-23T15:39:19Z

Hi, thank you so much for the contribution! I just returned from traveling and will review this PR soon — thank you for your patience.

That said, it appears that support for Qwen3 was already assigned to @aladerran in #20468, and their PR is now open at #20815. I believe it would be great to compare both implementations and look for opportunities to merge them together, so that we can land a unified version into main.

Really appreciate your efforts — thank you again for the great work!

abmfy

These changes seem similar to #20815. Would it make sense to join forces and merge that PR after testing? Thank you so much!

vllm/model_executor/models/qwen3_moe.py

hsliuustc0106 · 2025-07-25T04:16:52Z

@abmfy could you please double-check it again? The previous two review comments have been addressed.

hsliuustc0106 · 2025-07-28T03:35:01Z

We have tested locally. @abmfy, do you have some time to double-check?

david6666666 · 2025-07-29T07:20:42Z

@DarkLight1337 could you help assign another reviewer? WoosukKwon may be too busy. Thanks.

DarkLight1337

Can you update tests/distributed/test_expert_parallel.py to test EPLB for each model? I don't have resources to verify the correctness of this locally

david6666666 · 2025-07-29T08:06:10Z

> Can you update tests/distributed/test_expert_parallel.py to test EPLB for each model? I don't have resources to verify the correctness of this locally

OK, I think we should test DeepSeek and Qwen3 first, @hsliuustc0106 @ycyaw66.

abmfy

LGTM. I tested accuracy locally and the results look good.

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.9075 | ± | 0.0080 |
| | | strict-match | 5 | exact_match | ↑ | 0.9007 | ± | 0.0082 |

As mentioned earlier, please coordinate with @aladerran if you plan to add more tests, so we can avoid duplicating efforts.

Thanks again for the contribution!
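
To put the gsm8k numbers above in context, the reported standard errors can be turned into approximate 95% intervals. This is a small sketch, assuming a normal approximation (z = 1.96). The helper name is made up for illustration and the snippet only restates the reported figures; it does not rerun the evaluation.

```python
def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate interval value ± z * stderr (normal approximation)."""
    half_width = z * stderr
    return value - half_width, value + half_width


# (value, stderr) pairs copied from the table in the review comment.
reported = {
    "flexible-extract": (0.9075, 0.0080),
    "strict-match": (0.9007, 0.0082),
}

intervals = {
    name: confidence_interval(value, stderr)
    for name, (value, stderr) in reported.items()
}

for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.4f}, {high:.4f}]")
```

Under that assumption the two filters give overlapping intervals, so the difference between them is within the reported uncertainty.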

mergify · 2025-07-30T14:32:17Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @hsliuustc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

DarkLight1337 · 2025-08-04T16:43:48Z

Any update?

david6666666 · 2025-08-05T01:07:32Z

> Any update?

Sorry, we haven't had time to add the test yet, we are focusing on #22167 and #22179

CarrotShoo and others added 3 commits July 21, 2025 14:58

[Feature][EPLB] Add support for Qwen3 EPLB

aab43ad

Signed-off-by: ycyaw66 <497410282@qq.com>

fix some bugs

cdeb933

Signed-off-by: ycyaw66 <497410282@qq.com>

Merge pull request #5 from ycyaw66/qwen3-eplb

1f19d39

[Feature][EPLB] Add support for Qwen3 EPLB

mergify bot added the qwen Related to Qwen models label Jul 21, 2025

gemini-code-assist bot reviewed Jul 21, 2025

CarrotShoo and others added 7 commits July 21, 2025 16:54

add dummy implement of new feature

cad5801

Signed-off-by: ycyaw66 <497410282@qq.com>

fix format

01d4fbb

Signed-off-by: ycyaw66 <497410282@qq.com>

fix precommit

37ebbbc

Qwen3 eplb

fix format

d01ab86

Signed-off-by: ycyaw66 <497410282@qq.com>

fix precommit

3af492f

qwen3 eplb: fix format

fix format

bd10c8d

Signed-off-by: ycyaw66 <497410282@qq.com>

fix format

3e44b68

qwen3 eplb: fix format

DarkLight1337 requested a review from WoosukKwon July 23, 2025 03:37

abmfy suggested changes Jul 24, 2025

CarrotShoo and others added 3 commits July 24, 2025 21:58

implement update_physical_experts_metadata()

2dce1e3

Signed-off-by: ycyaw66 <497410282@qq.com>

remove assertion

ec7e47e

Signed-off-by: ycyaw66 <497410282@qq.com>

implement update_physical_experts_metadata() & change named_mapped

637686b

qwen3 eplb

DarkLight1337 reviewed Jul 29, 2025

abmfy approved these changes Jul 29, 2025

DarkLight1337 mentioned this pull request Jul 29, 2025

[Feature][EPLB] Add eplb support for Qwen3 #20815

Merged

4 tasks

mergify bot added the needs-rebase label Jul 30, 2025

robertgshaw2-redhat added the eplb label Sep 16, 2025


7 participants
