Conversation

@MengqingCao MengqingCao commented Aug 13, 2025

What this PR does / why we need it?

Add the missing apply_router_weight_on_input in FusedMoE init
Quick fix on #2268 (comment)

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

CI passed with existing test.
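
For readers skimming the thread, here is a minimal, self-contained sketch of the failure mode being fixed. The class and keyword names mirror vLLM's FusedMoE and vllm-ascend's AscendFusedMoE, but the bodies are purely illustrative, not the real implementations:

```python
class FusedMoE:
    def __init__(self, apply_router_weight_on_input: bool = False, **kwargs):
        # The base class is where the flag is stored and later consulted.
        self.apply_router_weight_on_input = apply_router_weight_on_input


class AscendFusedMoE(FusedMoE):
    def __init__(self, apply_router_weight_on_input: bool = False, **kwargs):
        # Before this PR the keyword was not forwarded here, so the base
        # class always saw its default (False) regardless of model config.
        super().__init__(
            apply_router_weight_on_input=apply_router_weight_on_input,
            **kwargs)


layer = AscendFusedMoE(apply_router_weight_on_input=True)
print(layer.apply_router_weight_on_input)  # True once the kwarg is forwarded
```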

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix an issue where the apply_router_weight_on_input parameter was not being passed to the FusedMoE base class constructor. The change correctly adds this parameter to the super().__init__ call.

However, this change by itself is not sufficient to enable the feature, as the parameter is not subsequently used in AscendFusedMoE's logic. A follow-up change is required in AscendUnquantizedFusedMoEMethod.apply to pass this parameter to the fused_experts function. I've added a comment with more details.

Also, please note that the pull request title seems to contain a typo; it mentions e_score_correction_bias while the change is for apply_router_weight_on_input.

```python
scoring_func=scoring_func,
e_score_correction_bias=e_score_correction_bias,
activation=activation,
apply_router_weight_on_input=apply_router_weight_on_input,
```
Severity: high

Passing apply_router_weight_on_input to the superclass constructor is the correct first step. However, for this parameter to have any effect, it needs to be utilized by the MoE implementation.

Currently, the apply method in AscendUnquantizedFusedMoEMethod does not pass this parameter down to the fused_experts function where it is actually used. This means the feature controlled by apply_router_weight_on_input will remain inactive despite this change.

A follow-up change will be needed in vllm_ascend/ops/fused_moe.py to make this functional. For example, in AscendUnquantizedFusedMoEMethod.apply:

```python
# around line 1170
return fused_experts(
    ...
    expert_map=expert_map,
    apply_router_weight_on_input=layer.apply_router_weight_on_input
)
```
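
For context on what the flag controls, below is a rough, framework-level illustration of apply_router_weight_on_input semantics in a top-k routed MoE. The function and variable names are mine and this is plain PyTorch, not the Ascend fused_experts path:

```python
import torch

def moe_combine(hidden, experts, topk_ids, topk_weights,
                apply_router_weight_on_input: bool):
    """hidden: [tokens, dim]; experts: list of callables; topk_*: [tokens, k]."""
    if apply_router_weight_on_input:
        # Scale tokens *before* the expert MLPs (typically restricted to
        # top-1 routing, where each token carries a single routing weight).
        assert topk_ids.shape[1] == 1, "input-scaling path assumes top-1 routing"
        hidden = hidden * topk_weights
        topk_weights = torch.ones_like(topk_weights)
    out = torch.zeros_like(hidden)
    for t in range(hidden.shape[0]):
        for k in range(topk_ids.shape[1]):
            expert = experts[int(topk_ids[t, k])]
            # Otherwise the routing weight is applied here, after the experts.
            out[t] += topk_weights[t, k] * expert(hidden[t])
    return out
```

Because expert MLPs are nonlinear, scaling the input is not the same computation as scaling the output; the flag encodes a model-architecture choice, which is why it has to reach the kernel call instead of being silently dropped.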

@ApsarasX

title: e_score_correction_bias -> apply_router_weight_on_input

@MengqingCao MengqingCao changed the title [Quickfix] Add the missing e_score_correction_bias in FusedMoE init [Quickfix] Add the missing apply_router_weight_on_input in FusedMoE init Aug 13, 2025
codecov bot commented Aug 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.74%. Comparing base (992271b) to head (ee6b871).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2348   +/-   ##
=======================================
  Coverage   75.74%   75.74%           
=======================================
  Files         118      118           
  Lines       13525    13525           
=======================================
  Hits        10245    10245           
  Misses       3280     3280           
| Flag | Coverage | Δ |
| --- | --- | --- |
| unittests | 75.74% <ø> | (ø) |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@wangxiyuan wangxiyuan merged commit 8914d5a into vllm-project:main Aug 14, 2025
25 checks passed
@MengqingCao MengqingCao deleted the cifix branch August 14, 2025 03:40
zhoux77899 added a commit to zhoux77899/vllm-ascend that referenced this pull request Aug 14, 2025
… MoE layers (#3)

* feat(performance): support `GroupedMatmulSwigluQuant` in `W8A8_DYNAMIC` quantized MoE layers

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(bug): fix bug

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* feat(ops): enable grouped_matmul_swiglu_quant by default

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(test): fix broken test

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(test): temporarily skip broken test due to oom

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(test): change bias1 to tensor

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(bug): update group_list handling and weight scale in dynamic methods

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* feat(ops): replace all splited gmm and swiglu

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* feat(quantization): split w4a8 and w8a8 apply

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(test): replace w8a8 function in apply

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* feat(cumsum): add cumsum_group_list function for group list processing

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* [Doc] Add container image save/load FAQ for offline environments (vllm-project#2347)

### What this PR does / why we need it?

Add Docker export/import guide for air-gapped environments

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

NA

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@d16aa3d

Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>

* [Bugfix] fix the oom when chunkprefill with long context like 64k (vllm-project#2319)

The attn mask was declared in mla.py; we don't need the splitfuse
mask for MLA chunked prefill, and this mask causes memory problems with
long contexts such as 64k or 128k.
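
As a rough sense of scale (back-of-envelope numbers added here, not taken from the commit), a single dense 64k x 64k fp16 mask is already about 8 GiB:

```python
# Back-of-envelope memory for one dense attention mask (illustrative only).
seq_len = 64 * 1024       # 64k-token context
bytes_per_element = 2     # fp16
mask_gib = seq_len * seq_len * bytes_per_element / 1024**3
print(f"{mask_gib:.1f} GiB")  # -> 8.0 GiB, before any other buffers
```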

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@14a5d90

---------

Signed-off-by: haojiangzheng <justineric096@gmail.com>

* [Quickfix] Add the missing `apply_router_weight_on_input` in FusedMoE init (vllm-project#2348)

### What this PR does / why we need it?
Add the missing `apply_router_weight_on_input` in FusedMoE init
Quick fix on
vllm-project#2268 (comment)

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@6807af8

Signed-off-by: MengqingCao <cmq0113@163.com>

* [2/N][Refactor] Refactor V1 attention for better extensibility (vllm-project#1995)

### What this PR does / why we need it?

Refactor V1 attention for better extensibility (in preparation for the
torchair attention refactor).

**Main changes:**
- Move each kind of forward pass into its own method, e.g.,
`_forward_prefill_no_cache()`, `_forward_prefill_cache_hit()`,
`_forward_decode_only()`, `_forward_v1_style()` (roughly as sketched below).
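
A rough shape of that dispatch, using the method names from the description above (the class name, the state enum, and the signatures are assumptions made for illustration, not the actual vllm-ascend code):

```python
from enum import Enum, auto

class AttnState(Enum):
    PREFILL_NO_CACHE = auto()
    PREFILL_CACHE_HIT = auto()
    DECODE_ONLY = auto()
    V1_STYLE = auto()

class AscendAttentionImpl:  # hypothetical class name
    def forward(self, query, key, value, attn_state: AttnState):
        # One small dispatcher instead of one monolithic forward().
        if attn_state is AttnState.PREFILL_NO_CACHE:
            return self._forward_prefill_no_cache(query, key, value)
        if attn_state is AttnState.PREFILL_CACHE_HIT:
            return self._forward_prefill_cache_hit(query, key, value)
        if attn_state is AttnState.DECODE_ONLY:
            return self._forward_decode_only(query, key, value)
        return self._forward_v1_style(query, key, value)

    # Each path keeps its own kernel selection and mask handling.
    def _forward_prefill_no_cache(self, q, k, v): ...
    def _forward_prefill_cache_hit(self, q, k, v): ...
    def _forward_decode_only(self, q, k, v): ...
    def _forward_v1_style(self, q, k, v): ...
```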

### Does this PR introduce _any_ user-facing change?

No.

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@14a5d90

Signed-off-by: shen-shanshan <467638484@qq.com>

* [Misc] Remove redundant imported `envs`, using `envs_ascend` instead (vllm-project#2193)

### What this PR does / why we need it?
Remove redundant imported `envs`, using `envs_ascend` instead.

```python
import vllm.envs as envs_vllm
import vllm_ascend.envs as envs_ascend
```

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@71683ca

---------

Signed-off-by: shen-shanshan <467638484@qq.com>

* feat(torchair): consider not using gmmswigluquant when torchair enabled

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(dtype): unify `w1_scale` dtype

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

* fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>

---------

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Signed-off-by: haojiangzheng <justineric096@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: jack <QwertyJack@users.noreply.github.com>
Co-authored-by: zhenghaojiang <zhjoneson@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
… init (vllm-project#2348)

Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
… init (vllm-project#2348)

