[Fix] Fix SharedFusedMoE #2817
Conversation
Introduces `AscendSharedFusedMoE` to handle the specific Mixture-of-Experts architecture of DeepSeek-V2, which includes a shared expert alongside the routed experts. This new class processes the shared expert and the regular MoE experts separately, ensuring correct tensor-parallel communication for the shared expert's output. The DeepSeek-V2 model is updated to use this new implementation. Additionally, a redundant communication-reduction method is removed from the base `AscendFusedMoE` class for a cleaner implementation.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
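As a rough illustration of the shared-expert idea described above (a minimal, framework-free sketch, not the actual vllm-ascend code; all names here are hypothetical): the shared expert runs densely on every token, the routed experts run sparsely on their assigned tokens, and the two outputs are summed.

```python
# Hypothetical sketch of a fused MoE layer with a shared expert.
# Experts are plain functions mapping a token vector to a token vector.

def shared_fused_moe(hidden, shared_expert, routed_experts, router):
    """Apply the shared expert to every token, a routed expert to each
    token (chosen by the router), and sum the two results element-wise."""
    shared_out = [shared_expert(tok) for tok in hidden]                # dense path
    routed_out = [routed_experts[router(tok)](tok) for tok in hidden]  # sparse path
    # In a tensor-parallel deployment, shared_out would need its own
    # all-reduce; the routed path is reduced inside the MoE kernel.
    return [[s + r for s, r in zip(so, ro)]
            for so, ro in zip(shared_out, routed_out)]

# Toy usage: the shared expert doubles each value, routed expert 0 adds one.
double = lambda tok: [2.0 * x for x in tok]
add_one = lambda tok: [x + 1.0 for x in tok]
out = shared_fused_moe([[1.0, 2.0]], double, [add_one], router=lambda tok: 0)
print(out)  # [[4.0, 7.0]]
```

The key point the PR addresses is that the two paths have different communication requirements under tensor parallelism, so they must be handled separately rather than through one shared reduction.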
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge: if CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a fix for SharedFusedMoE by providing an Ascend-specific implementation, AscendSharedFusedMoE, and applying it via monkey-patching to support models like DeepSeek-V2 and Llama4. The implementation of AscendSharedFusedMoE correctly handles tensor parallel communication for shared experts. However, the monkey-patch is applied in two different places, leading to code redundancy. My review includes a suggestion to consolidate the patch in a single, more appropriate location to improve maintainability.
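The consolidation the review asks for could be as small as one loop over the modules that consume the class. A sketch with stand-in module objects (the real vLLM import paths aren't shown in this thread, so everything below is illustrative):

```python
import types

# Stand-ins for the two vLLM model modules that each do
# `from ... import SharedFusedMoE`; the real modules are analogous.
deepseek_v2 = types.ModuleType("deepseek_v2")
llama4 = types.ModuleType("llama4")

class AscendSharedFusedMoE:
    """Stand-in for vllm_ascend.ops.common_fused_moe.AscendSharedFusedMoE."""

# One patch site instead of two scattered assignments:
for mod in (deepseek_v2, llama4):
    mod.SharedFusedMoE = AscendSharedFusedMoE
```

Keeping the loop in a single patch module means adding support for another model is a one-line change to the tuple.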
…2 and Llama4

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

Force-pushed from 1a23f9e to ad76e33
    from vllm_ascend.ops.common_fused_moe import AscendSharedFusedMoE

    deepseek_v2.SharedFusedMoE = AscendSharedFusedMoE
    llama4.SharedFusedMoE = AscendSharedFusedMoE
Looking at the latest code, a common class has been extracted, named SharedFusedMoE.
Should this be patched in the worker module, since it'll be used by the worker process?
> Looking at the latest code, a common class has been extracted, named SharedFusedMoE.

Yes, I'm well aware of that, except that patching that class directly (I assume you are suggesting `shared_fused_moe.SharedFusedMoE = AscendSharedFusedMoE`) would have no effect at all.
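The reason patching the defining module has no effect is Python's import binding: `from m import Name` copies the reference into the importer's namespace, so later rebinding `m.Name` doesn't change what the importer already holds. A self-contained demonstration with stand-in modules (all names hypothetical):

```python
import types

shared = types.ModuleType("shared_fused_moe")    # stand-in defining module

class Original:                                  # the upstream class
    pass

shared.SharedFusedMoE = Original

# Simulates `from shared_fused_moe import SharedFusedMoE` inside deepseek_v2:
deepseek_v2 = types.ModuleType("deepseek_v2")
deepseek_v2.SharedFusedMoE = shared.SharedFusedMoE

class Patched:                                   # the Ascend replacement
    pass

shared.SharedFusedMoE = Patched                  # patch the defining module...
print(deepseek_v2.SharedFusedMoE is Original)    # True: consumer is unaffected

deepseek_v2.SharedFusedMoE = Patched             # patching the *consumer* works
print(deepseek_v2.SharedFusedMoE is Patched)     # True
```

This is why the patch in this PR targets `deepseek_v2` and `llama4` (the consumers) rather than the module that defines the class.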
> Should this be patched in the worker module, since it'll be used by the worker process?

Tested in both locations; both work fine, so we can stick with this for now.
The UT is broken by an unrelated change.
### What this PR does / why we need it?

Really strange that `register_oot` doesn't work with `SharedFusedMoE`, so we have to add this patch, for now.

### Does this PR introduce _any_ user-facing change?

None.

### How was this patch tested?

This PR won't have any effect in DeepSeek since we currently still stick with the old `CustomDeepseekV2`.

- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@0cdd213

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
…nalinaly (#3406)

I'd like to nominate 4 new maintainers for vllm-ascend:

---- Yizhou Liu [@yiz-liu](https://github.com/yiz-liu) ----

**Review Quality:** He has completed [40+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Ayiz-liu) and provided solutions or guides for [10+ issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3Ayiz-liu), including many quality reviews such as [#issue-3428408401](#3002 (comment)), [#discussion_r2224572309](#1803 (comment)), [#issuecomment-2982470226](#1261 (comment)), [#issuecomment-2903621197](#836 (comment)), and [#issuecomment-2857678691](#778 (comment)).

**Sustained and High-Quality Contributions:** He has contributed [30+ commits](https://github.com/vllm-project/vllm-ascend/commits?author=yiz-liu) since Mar. 2025; his aclgraph-, DP-, and EP-related contributions are the main reason I nominated him. As the owner of aclgraph support, he continuously improves aclgraph stability and performance and fixes key bugs. He also laid the groundwork for EP-related functionality and delivered multiple foundational improvements.

**Community Involvement:** He has a very good habit of logging issues (#1649) and is also very active in [many issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aopen%20commenter%3Ayiz-liu%20-author%3Ayiz-liu), helping users resolve their problems.

---- Peng Yu [@paulyu12](https://github.com/paulyu12) ----

The main reasons for his nomination are his expertise and key contributions to LoRA, with sustained and major contributions (initial support, docs, bugfixes) around LoRA.

**Sustained and Major Contributions:** @paulyu12 started contributing with [LoRA and Multi-LoRA support](697908f) in Apr. 2025 and has contributed about [10+ commits and bugfixes](697908f) to vllm-ascend.

**Review Quality and Community Involvement:** He has also helped more than 10 users address [LoRA-related issues](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Apaulyu12+-author%3Apaulyu12+is%3Aclosed). I believe his addition will further improve vLLM Ascend LoRA support.

---- Jinqian Wei [@weijinqian0](https://github.com/weijinqian0) ----

The main reasons for his nomination are his key contributions to the RL scene and the high quality of his code reviews.

**Review Quality:** He has completed [60+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Aweijinqian0+is%3Aopen+-author%3Aweijinqian0) since June 2025, including high-quality reviews such as [#comment-3284055430](#2791 (comment)), [discussion_r2332166704](#2817 (comment)), and [discussion_r2343289692](#2846 (comment)).

**Sustained and Quality Contributions:** He has a deep understanding of the vLLM and vLLM Ascend codebases and solid contributions in the RL scene (about [10+ PRs merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Aweijinqian0+is%3Amerged+) and 10+ PRs merged as co-author).
- Code Refactor: As a co-author, he participated in the refactoring of the MoE module: #2150 #2706 #2867
- Performance Enhancement for RL: He participated as a co-author in the design and development of the solution, contributing to the planning of core capabilities: #1547 #2120, and so on.

So I think he's a great addition to the vLLM Ascend maintainer team.

---- Chuanyu Qin [@nalinaly](https://github.com/nalinaly) ----

The main reason I nominated Chuanyu is that he is the initial designer of aclgraph and torch-npu, two key components of vllm-ascend. Considering aclgraph will eventually become the main path for vllm-ascend's graph mode, I propose to nominate him.

**Sustained and Major Contributions:** Chuanyu has actively helped users/developers of vllm-ascend since Mar. 2025 ([vllm-discuss#162](https://discuss.vllm.ai/t/can-ascend-officially-draft-a-documentation-on-the-vllm-ascend-adaptation-for-graph-mode/162/5)), and he also helped early users of vllm-ascend understand aclgraph. He provided lots of help in the process of integrating aclgraph with vllm-ascend.

**Community Involvement:** As a speaker, he also helps users understand aclgraph and torch_npu: [《The design philosophy of torch_npu and the high performance principle of aclGraph》](https://github.com/PyTorch-China/pytorch-meetup/blob/main/beijing-2025/%E3%80%905%E3%80%91torch_npu%20%E7%9A%84%E8%AE%BE%E8%AE%A1%E5%93%B2%E5%AD%A6%E4%B8%8E%20aclGraph%20%E9%AB%98%E6%80%A7%E8%83%BD%E5%8E%9F%E7%90%86-%E7%A7%A6%E4%BC%A0%E7%91%9C-0920.pdf)

----

They all have active contributions to vllm-ascend or rich experience with Ascend AI. Welcome!

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>