[Quickfix] Add the missing `apply_router_weight_on_input` in FusedMoE init #2348
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request aims to fix an issue where the `apply_router_weight_on_input` parameter was not being passed to the `FusedMoE` base class constructor. The change correctly adds this parameter to the `super().__init__` call.
However, my review found that this change by itself is not sufficient to enable the feature, as the parameter is not subsequently used in `AscendFusedMoE`'s logic. A follow-up change is required in `AscendUnquantizedFusedMoEMethod.apply` to pass this parameter to the `fused_experts` function. I've added a comment with more details.
Also, please note that the pull request title contains a typo: it mentions `e_score_correction_bias` while the change is for `apply_router_weight_on_input`.
```diff
     scoring_func=scoring_func,
     e_score_correction_bias=e_score_correction_bias,
     activation=activation,
+    apply_router_weight_on_input=apply_router_weight_on_input,
```
Passing `apply_router_weight_on_input` to the superclass constructor is the correct first step. However, for this parameter to have any effect, it needs to be utilized by the MoE implementation.
Currently, the `apply` method in `AscendUnquantizedFusedMoEMethod` does not pass this parameter down to the `fused_experts` function where it is actually used. This means the feature controlled by `apply_router_weight_on_input` will remain inactive despite this change.
A follow-up change will be needed in `vllm_ascend/ops/fused_moe.py` to make this functional. For example, in `AscendUnquantizedFusedMoEMethod.apply`:
```python
# around line 1170
return fused_experts(
    ...
    expert_map=expert_map,
    apply_router_weight_on_input=layer.apply_router_weight_on_input,
)
```
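For background on what the flag controls: when `apply_router_weight_on_input` is set, the router's top-k weights are applied to the hidden states before they are dispatched to the experts, rather than when the expert outputs are combined. A minimal sketch of the two paths, with illustrative names and shapes only (not the actual vLLM implementation):

```python
# Illustrative sketch only -- not vLLM's or vllm-ascend's actual kernel code.
import torch

def apply_routing(hidden: torch.Tensor,        # [num_tokens, hidden_dim]
                  topk_weights: torch.Tensor,  # [num_tokens, top_k]
                  apply_router_weight_on_input: bool):
    """Show where the router weight enters relative to the expert matmuls."""
    if apply_router_weight_on_input:
        # Scale the activations *before* they are dispatched to the experts;
        # the combine step then sums expert outputs with unit weights.
        # (Implementations typically require top_k == 1 for this mode.)
        hidden = hidden * topk_weights[:, :1]
        combine_weights = torch.ones_like(topk_weights)
    else:
        # Default path: experts see unscaled activations, and the router
        # weights are applied when the expert outputs are combined.
        combine_weights = topk_weights
    return hidden, combine_weights
```

Either way the parameter must actually reach the kernel, which is why the reviewer's follow-up in `fused_experts` is needed.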
Title edited: `e_score_correction_bias in FusedMoE init` → `apply_router_weight_on_input in FusedMoE init`
Signed-off-by: MengqingCao <cmq0113@163.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main    #2348   +/-   ##
=======================================
  Coverage   75.74%   75.74%
=======================================
  Files         118      118
  Lines       13525    13525
=======================================
  Hits        10245    10245
  Misses       3280     3280
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
… MoE layers (#3)

- feat(performance): support `GroupedMatmulSwigluQuant` in `W8A8_DYNAMIC` quantized MoE layers
- fix(lint): fix lint
- fix(bug): fix bug
- feat(ops): enable grouped_matmul_swiglu_quant by default
- fix(lint): fix lint
- fix(test): fix broken test
- fix(lint): fix lint
- fix(test): temporarily skip broken test due to OOM
- fix(test): change bias1 to tensor
- fix(bug): update group_list handling and weight scale in dynamic methods
- fix(lint): fix lint
- fix(lint): fix lint
- feat(ops): replace all split gmm and swiglu
- fix(lint): fix lint
- feat(quantization): split w4a8 and w8a8 apply
- fix(test): replace w8a8 function in apply
- feat(cumsum): add cumsum_group_list function for group list processing
- fix(lint): fix lint
- fix(lint): fix lint
- [Doc] Add container image save/load FAQ for offline environments (vllm-project#2347): adds a Docker export/import guide for air-gapped environments. No user-facing change; not tested (NA). vLLM version: v0.10.0; vLLM main: vllm-project/vllm@d16aa3d
- [Bugfix] fix the OOM when chunked prefill runs with long contexts like 64k (vllm-project#2319): the attention mask was declared in mla.py; the splitfuse mask is not needed for MLA chunked prefill, and it causes memory problems with long contexts like 64k or 128k. vLLM version: v0.10.0; vLLM main: vllm-project/vllm@14a5d90
- [Quickfix] Add the missing `apply_router_weight_on_input` in FusedMoE init (vllm-project#2348): quick fix on vllm-project#2268 (comment). No user-facing change; CI passed with existing tests. vLLM version: v0.10.0; vLLM main: vllm-project/vllm@6807af8
- [2/N][Refactor] Refactor V1 attention for better extensibility (vllm-project#1995): refactors V1 attention for better extensibility (in preparation for the torchair attention refactor). Main changes: move the different kinds of forward into their own methods, e.g. `_forward_prefill_no_cache()`, `_forward_prefill_cache_hit()`, `_forward_decode_only()`, `_forward_v1_style()`. No user-facing change. vLLM version: v0.10.0; vLLM main: vllm-project/vllm@14a5d90
- [Misc] Remove redundant imported `envs`, using `envs_ascend` instead (vllm-project#2193):
  ```python
  import vllm.envs as envs_vllm
  import vllm_ascend.envs as envs_ascend
  ```
  vLLM version: v0.10.0; vLLM main: vllm-project/vllm@71683ca
- feat(torchair): consider not using gmmswigluquant when torchair is enabled
- fix(lint): fix lint
- fix(dtype): unify `w1_scale` dtype
- fix(lint): fix lint
- fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Signed-off-by: haojiangzheng <justineric096@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: jack <QwertyJack@users.noreply.github.com>
Co-authored-by: zhenghaojiang <zhjoneson@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
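From the commit list above, the headline change fuses a grouped matmul, the SwiGLU activation, and dynamic quantization into a single `GroupedMatmulSwigluQuant` op, with `cumsum_group_list` preparing cumulative per-expert offsets for the kernel. A rough, unfused reference of that computation under those assumptions (illustrative names and shapes, not the NPU operator's actual API):

```python
# Rough, unfused reference for the fused grouped-matmul + SwiGLU + dynamic
# int8 quantization op; all names and shapes here are assumptions made for
# illustration, not the NPU operator's actual API.
import torch

def cumsum_group_list(group_counts: torch.Tensor) -> torch.Tensor:
    """Per-expert token counts -> cumulative end offsets, e.g. [3, 0, 5] -> [3, 3, 8]."""
    return torch.cumsum(group_counts, dim=0)

def gmm_swiglu_quant_reference(x: torch.Tensor,       # [num_tokens, hidden], sorted by expert
                               w13: torch.Tensor,     # [num_experts, hidden, 2 * intermediate]
                               group_counts: torch.Tensor):  # [num_experts]
    offsets = cumsum_group_list(group_counts)
    quantized, scales, start = [], [], 0
    for g, end in enumerate(offsets.tolist()):
        h = x[start:end] @ w13[g]                     # grouped matmul for expert g
        gate, up = h.chunk(2, dim=-1)
        act = torch.nn.functional.silu(gate) * up     # SwiGLU activation
        # Per-token dynamic quantization to int8.
        scale = act.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
        quantized.append(torch.round(act / scale).clamp(-128, 127).to(torch.int8))
        scales.append(scale.squeeze(-1))
        start = end
    return torch.cat(quantized), torch.cat(scales)
```

Fusing these three steps into one kernel avoids materializing the fp16/bf16 activations between them, which is the stated performance motivation of the commit.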
### What this PR does / why we need it?
Add the missing `apply_router_weight_on_input` in FusedMoE init. Quick fix on #2268 (comment).
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed with existing test.
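For reference, the shape of the fix itself: the Ascend subclass simply forwards the flag to vLLM's `FusedMoE` base class so that `layer.apply_router_weight_on_input` is actually populated. A condensed sketch only; the real `AscendFusedMoE.__init__` takes many more parameters, and the import path may differ between vLLM versions:

```python
# Condensed sketch of the fix, not the full AscendFusedMoE signature.
from vllm.model_executor.layers.fused_moe import FusedMoE

class AscendFusedMoE(FusedMoE):
    def __init__(self, *, apply_router_weight_on_input: bool = False, **kwargs):
        super().__init__(
            # Previously this argument was dropped here, so the base class
            # fell back to its default and the flag never took effect.
            apply_router_weight_on_input=apply_router_weight_on_input,
            **kwargs,
        )
```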