
Conversation

@whx-sjtu (Collaborator) commented on Jul 1, 2025:

This PR refines the shared-expert multi-stream parallelism of the w8a8-dynamic-quantized MoE stage to achieve better performance.
The current multi-stream parallel schedule for shared experts is shown in the following picture:
[image: multi-stream parallel schedule for shared experts]
Performance change:
Before: [image]
After: [image]
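
For readers unfamiliar with the pattern, here is a minimal sketch of overlapping the shared-expert MLP with the routed-expert computation on a side stream. It assumes the CUDA-like stream API that torch_npu exposes under `torch.npu`; the function name and the callables `routed_experts` / `shared_expert` are hypothetical stand-ins, and the real kernel splits the overlap at a much finer granularity than shown here.

```python
import torch
import torch_npu  # noqa: F401  # patches in the CUDA-like torch.npu stream API

_side_stream = torch.npu.Stream()

def moe_forward_with_shared_overlap(hidden_states, routed_experts, shared_expert):
    """Sketch: run the shared expert on a side stream while the routed
    (w8a8-dynamic-quantized) experts run on the default stream.
    `routed_experts` and `shared_expert` are hypothetical callables."""
    # The side stream must observe all prior writes to hidden_states.
    _side_stream.wait_stream(torch.npu.current_stream())
    with torch.npu.stream(_side_stream):
        shared_out = shared_expert(hidden_states)
    # Routed-expert grouped GEMMs proceed concurrently on the default stream.
    routed_out = routed_experts(hidden_states)
    # Rejoin before the two partial outputs are combined.
    torch.npu.current_stream().wait_stream(_side_stream)
    return routed_out + shared_out
```

The win comes from the shared expert being a dense MLP that would otherwise serialize behind the routed experts' dispatch and grouped-GEMM phases.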

Signed-off-by: whx-sjtu <2952154980@qq.com>
ganyi1996ppo merged commit 65909b2 into vllm-project:v0.9.1-dev on Jul 3, 2025 (16 checks passed).
Yikun added the no-main label on Jul 14, 2025.
A review thread was opened on the following weight post-processing snippet from the diff:

```python
# (Previous line truncated in the excerpt; it closes a format-cast call
#  converting w2_weight to the NZ layout.)
                layer.w2_weight.data, ACL_FORMAT_FRACTAL_NZ)
# Flatten the w13 scale to two dimensions and keep a pre-cast fp32 copy.
layer.w13_weight_scale.data = layer.w13_weight_scale.data.view(
    layer.w13_weight_scale.data.shape[0], -1)
layer.w13_weight_scale_fp32 = layer.w13_weight_scale.data.to(torch.float32)
```
A contributor asked:
Why does w13_weight_scale need to be converted to fp32?
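
The thread records no answer, but one plausible reason (an assumption, not confirmed here) is that the per-channel scale is applied to an int32 GEMM accumulator during dequantization, and pre-casting it once at weight-load time keeps full precision while avoiding a repeated dtype conversion in the hot path. A hypothetical sketch of that dequant step:

```python
import torch

def dequant_w8a8_dynamic(acc_int32: torch.Tensor,
                         w_scale_fp32: torch.Tensor,
                         x_scale_fp32: torch.Tensor) -> torch.Tensor:
    """Hypothetical dequant of an int8 x int8 GEMM output.
    acc_int32:    [tokens, out_dim] int32 accumulator
    w_scale_fp32: [out_dim] per-channel weight scale (pre-cast to fp32)
    x_scale_fp32: [tokens, 1] per-token activation scale from dynamic quant
    """
    # fp32 scales preserve precision when rescaling the int32 accumulator.
    return acc_int32.to(torch.float32) * w_scale_fp32 * x_scale_fp32
```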

whx-sjtu deleted the moe_ms_091 branch on October 20, 2025 at 11:50.
