[Quickfix] Add the missing `apply_router_weight_on_input` in FusedMoE init #2348
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request aims to fix an issue where the `apply_router_weight_on_input` parameter was not being passed to the `FusedMoE` base class constructor. The change correctly adds this parameter to the `super().__init__` call.
However, my review found that this change by itself is not sufficient to enable the feature, as the parameter is not subsequently used in `AscendFusedMoE`'s logic. A follow-up change is required in `AscendUnquantizedFusedMoEMethod.apply` to pass this parameter to the `fused_experts` function. I've added a comment with more details.
Also, please note that the pull request title contains a typo: it mentions `e_score_correction_bias` while the change is for `apply_router_weight_on_input`.
```diff
     scoring_func=scoring_func,
     e_score_correction_bias=e_score_correction_bias,
     activation=activation,
+    apply_router_weight_on_input=apply_router_weight_on_input,
```
Passing `apply_router_weight_on_input` to the superclass constructor is the correct first step. However, for this parameter to have any effect, it needs to be utilized by the MoE implementation.
Currently, the `apply` method in `AscendUnquantizedFusedMoEMethod` does not pass this parameter down to the `fused_experts` function where it is actually used. This means the feature controlled by `apply_router_weight_on_input` will remain inactive despite this change.
A follow-up change will be needed in `vllm_ascend/ops/fused_moe.py` to make this functional. For example, in `AscendUnquantizedFusedMoEMethod.apply`:
```python
# around line 1170
return fused_experts(
    ...
    expert_map=expert_map,
    apply_router_weight_on_input=layer.apply_router_weight_on_input,
)
```
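For background on what the flag controls: when `apply_router_weight_on_input` is set, the router's top-k weights are applied to the hidden states before they are dispatched to the experts, rather than when the expert outputs are combined. A minimal sketch of the two paths, with illustrative names and shapes only (not the actual vLLM implementation):

```python
# Illustrative sketch only -- not vLLM's or vllm-ascend's actual kernel code.
import torch

def apply_routing(hidden: torch.Tensor,        # [num_tokens, hidden_dim]
                  topk_weights: torch.Tensor,  # [num_tokens, top_k]
                  apply_router_weight_on_input: bool):
    """Show where the router weight enters relative to the expert matmuls."""
    if apply_router_weight_on_input:
        # Scale the activations *before* they are dispatched to the experts;
        # the combine step then sums expert outputs with unit weights.
        # (Implementations typically require top_k == 1 for this mode.)
        hidden = hidden * topk_weights[:, :1]
        combine_weights = torch.ones_like(topk_weights)
    else:
        # Default path: experts see unscaled activations, and the router
        # weights are applied when the expert outputs are combined.
        combine_weights = topk_weights
    return hidden, combine_weights
```

Either way the parameter must actually reach the kernel, which is why the reviewer's follow-up in `fused_experts` is needed.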
Title edited: `e_score_correction_bias in FusedMoE init` → `apply_router_weight_on_input in FusedMoE init`
Signed-off-by: MengqingCao <cmq0113@163.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main    #2348   +/-   ##
=======================================
  Coverage   75.74%   75.74%
=======================================
  Files         118      118
  Lines       13525    13525
=======================================
  Hits        10245    10245
  Misses       3280     3280
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
… MoE layers (#3)

- feat(performance): support `GroupedMatmulSwigluQuant` in `W8A8_DYNAMIC` quantized MoE layers
- fix(lint): fix lint
- fix(bug): fix bug
- feat(ops): enable grouped_matmul_swiglu_quant by default
- fix(lint): fix lint
- fix(test): fix broken test
- fix(lint): fix lint
- fix(test): temporarily skip broken test due to OOM
- fix(test): change bias1 to tensor
- fix(bug): update group_list handling and weight scale in dynamic methods
- fix(lint): fix lint
- fix(lint): fix lint
- feat(ops): replace all split gmm and swiglu
- fix(lint): fix lint
- feat(quantization): split w4a8 and w8a8 apply
- fix(test): replace w8a8 function in apply
- feat(cumsum): add cumsum_group_list function for group list processing
- fix(lint): fix lint
- fix(lint): fix lint
- [Doc] Add container image save/load FAQ for offline environments (vllm-project#2347): adds a Docker export/import guide for air-gapped environments. No user-facing change; not tested (NA). vLLM version: v0.10.0; vLLM main: vllm-project/vllm@d16aa3d
- [Bugfix] fix the OOM when chunked prefill runs with long contexts like 64k (vllm-project#2319): the attention mask was declared in mla.py; the splitfuse mask is not needed for MLA chunked prefill, and it causes memory problems with long contexts like 64k or 128k. vLLM version: v0.10.0; vLLM main: vllm-project/vllm@14a5d90
- [Quickfix] Add the missing `apply_router_weight_on_input` in FusedMoE init (vllm-project#2348): quick fix on vllm-project#2268 (comment). No user-facing change; CI passed with existing tests. vLLM version: v0.10.0; vLLM main: vllm-project/vllm@6807af8
- [2/N][Refactor] Refactor V1 attention for better extensibility (vllm-project#1995): refactors V1 attention for better extensibility (in preparation for the torchair attention refactor). Main changes: move the different kinds of forward into their own methods, e.g. `_forward_prefill_no_cache()`, `_forward_prefill_cache_hit()`, `_forward_decode_only()`, `_forward_v1_style()`. No user-facing change. vLLM version: v0.10.0; vLLM main: vllm-project/vllm@14a5d90
- [Misc] Remove redundant imported `envs`, using `envs_ascend` instead (vllm-project#2193):
  ```python
  import vllm.envs as envs_vllm
  import vllm_ascend.envs as envs_ascend
  ```
  vLLM version: v0.10.0; vLLM main: vllm-project/vllm@71683ca
- feat(torchair): consider not using gmmswigluquant when torchair is enabled
- fix(lint): fix lint
- fix(dtype): unify `w1_scale` dtype
- fix(lint): fix lint
- fix(lint): fix lint

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Signed-off-by: haojiangzheng <justineric096@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: jack <QwertyJack@users.noreply.github.com>
Co-authored-by: zhenghaojiang <zhjoneson@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
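From the commit list above, the headline change fuses a grouped matmul, the SwiGLU activation, and dynamic quantization into a single `GroupedMatmulSwigluQuant` op, with `cumsum_group_list` preparing cumulative per-expert offsets for the kernel. A rough, unfused reference of that computation under those assumptions (illustrative names and shapes, not the NPU operator's actual API):

```python
# Rough, unfused reference for the fused grouped-matmul + SwiGLU + dynamic
# int8 quantization op; all names and shapes here are assumptions made for
# illustration, not the NPU operator's actual API.
import torch

def cumsum_group_list(group_counts: torch.Tensor) -> torch.Tensor:
    """Per-expert token counts -> cumulative end offsets, e.g. [3, 0, 5] -> [3, 3, 8]."""
    return torch.cumsum(group_counts, dim=0)

def gmm_swiglu_quant_reference(x: torch.Tensor,       # [num_tokens, hidden], sorted by expert
                               w13: torch.Tensor,     # [num_experts, hidden, 2 * intermediate]
                               group_counts: torch.Tensor):  # [num_experts]
    offsets = cumsum_group_list(group_counts)
    quantized, scales, start = [], [], 0
    for g, end in enumerate(offsets.tolist()):
        h = x[start:end] @ w13[g]                     # grouped matmul for expert g
        gate, up = h.chunk(2, dim=-1)
        act = torch.nn.functional.silu(gate) * up     # SwiGLU activation
        # Per-token dynamic quantization to int8.
        scale = act.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
        quantized.append(torch.round(act / scale).clamp(-128, 127).to(torch.int8))
        scales.append(scale.squeeze(-1))
        start = end
    return torch.cat(quantized), torch.cat(scales)
```

Fusing these three steps into one kernel avoids materializing the fp16/bf16 activations between them, which is the stated performance motivation of the commit.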
### What this PR does / why we need it?
Add the missing `apply_router_weight_on_input` in FusedMoE init. Quick fix on #2268 (comment).
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed with existing test.
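For reference, the shape of the fix itself: the Ascend subclass simply forwards the flag to vLLM's `FusedMoE` base class so that `layer.apply_router_weight_on_input` is actually populated. A condensed sketch only; the real `AscendFusedMoE.__init__` takes many more parameters, and the import path may differ between vLLM versions:

```python
# Condensed sketch of the fix, not the full AscendFusedMoE signature.
from vllm.model_executor.layers.fused_moe import FusedMoE

class AscendFusedMoE(FusedMoE):
    def __init__(self, *, apply_router_weight_on_input: bool = False, **kwargs):
        super().__init__(
            # Previously this argument was dropped here, so the base class
            # fell back to its default and the flag never took effect.
            apply_router_weight_on_input=apply_router_weight_on_input,
            **kwargs,
        )
```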