[main] [refactor] refactor common_fused_moe.py #2706
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the MoE communication methods by introducing a TokenDispatcher abstraction, which is a good move to centralize logic and reduce code duplication. The changes are well-structured and improve the overall design. I've identified a couple of areas for improvement: making a base class method abstract to enforce implementation by subclasses, and removing some redundant code in an __init__ method. Addressing these points will make the implementation more robust and maintainable.
```python
def get_token_dispatcher(self):
    pass
```
The get_token_dispatcher method in the base class MoECommMethod is implemented with pass, which means it returns None. The __init__ method calls this and assigns the result to self.token_dispatcher. Later, permute and unpermute methods attempt to use self.token_dispatcher, which will result in an AttributeError if a subclass does not override get_token_dispatcher. To prevent this and enforce the correct implementation in subclasses, this method should be decorated with @abstractmethod.
Suggested change:

```python
@abstractmethod
def get_token_dispatcher(self):
    pass
```
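As a hedged illustration of why the decorator matters (class and method names follow the snippet above; the dispatcher stand-in is hypothetical): with `@abstractmethod`, Python refuses to instantiate a subclass that forgets to override the hook, turning the late `AttributeError` described above into an early `TypeError`:

```python
from abc import ABC, abstractmethod


class MoECommMethod(ABC):
    """Sketch of the base class with the hook made abstract."""

    def __init__(self):
        # The base __init__ relies on the subclass-provided hook.
        self.token_dispatcher = self.get_token_dispatcher()

    @abstractmethod
    def get_token_dispatcher(self):
        """Subclasses must return a concrete token dispatcher."""


class GoodImpl(MoECommMethod):
    def get_token_dispatcher(self):
        return object()  # stand-in for a real dispatcher


class BadImpl(MoECommMethod):
    pass  # forgot to override the hook


good = GoodImpl()  # fine: token_dispatcher is set
try:
    BadImpl()  # rejected at instantiation time
except TypeError as e:
    print(f"instantiation rejected: {e}")
```

The failure moves from "whenever `permute` happens to run" to the moment the subclass is instantiated, which is much easier to debug.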
```python
num_experts = kwargs["num_experts"]
num_local_experts = kwargs["num_local_experts"]
from vllm_ascend.ops.moe_dispatcher.token_dispatcher import \
    TokenDispatcherWithAll2AllV
# from vllm_ascend.ops.moe_dispatcher.token_dispatcher import \
#     get_token_dispatcher
# self.token_dispatcher = get_token_dispatcher(
#     "TokenDispatcherWithAll2AllV")
self.token_dispatcher = TokenDispatcherWithAll2AllV(
    top_k=self.moe_config.experts_per_token,
    num_experts=self.num_experts,
    num_local_experts=self.num_local_experts)
```
The initialization of self.token_dispatcher and the extraction of num_experts and num_local_experts in AlltoAllCommImpl.__init__ are redundant. The call to super().__init__ already handles this logic by calling the get_token_dispatcher method, which is implemented in this class. This redundant code makes the class harder to maintain and can lead to inconsistencies. It should be removed for clarity and correctness.
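A minimal sketch of the template-method flow this review describes (class names mirror the PR; the bodies are hypothetical stand-ins): because the base `__init__` already invokes the subclass hook, repeating the dispatcher construction in the subclass `__init__` would build a second, redundant dispatcher.

```python
class BaseComm:
    """Stand-in for MoECommMethod: base __init__ calls the hook."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        # Template method: construction is delegated to the subclass.
        self.token_dispatcher = self.get_token_dispatcher()

    def get_token_dispatcher(self):
        raise NotImplementedError


class AlltoAllComm(BaseComm):
    """Only the hook is overridden; no dispatcher code in __init__."""

    def get_token_dispatcher(self):
        # Hypothetical dispatcher construction using the base's kwargs
        return {"num_experts": self.kwargs["num_experts"]}


comm = AlltoAllComm(num_experts=8)
```

With this shape, removing the duplicated lines from the subclass `__init__` changes nothing observable: `comm.token_dispatcher` is already populated by the base class.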
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from 36570df to 4f0f1e4
Force-pushed from 03e135f to 885d9ce
Force-pushed from 727453f to a72549b
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from 4c4f55a to 8ceca57
Force-pushed from 01e4390 to c13b334
Co-Authored-By: weijinqian0 <12153182+weijinqian0@users.noreply.github.com> Signed-off-by: weichen <calvin_zhu0210@outlook.com> Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Force-pushed from c13b334 to 432b64a
Codecov Report

❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2706      +/-   ##
==========================================
+ Coverage   72.61%   73.48%   +0.87%
==========================================
  Files         154      157       +3
  Lines       21319    21519     +200
==========================================
+ Hits        15480    15813     +333
+ Misses       5839     5706     -133

Flags with carried forward coverage won't be shown.
it's very clear now.
```python
    TokenDispatcherWithMC2)


class MoECommMethod(ABC):
```
nits: I think an abstract class should just contain the unified API, without the detailed implementation, and carry more comments like the original MoECommMethod class. Maybe we could add a MoECommMethodBase class that inherits from MoECommMethod and contains the detailed prepare and other member functions. We could do this in the next PR.
good point, will be fixed in my next PR.
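To illustrate the split proposed above (only `MoECommMethod` comes from the PR; `MoECommMethodBase` and the method bodies are hypothetical sketches of the suggestion, not the actual implementation): the abstract class keeps just the unified API, while a base class carries the shared detail that concrete communication methods inherit.

```python
from abc import ABC, abstractmethod


class MoECommMethod(ABC):
    """Pure interface: unified API only, no implementation details."""

    @abstractmethod
    def prepare(self, hidden_states):
        """Pre-process hidden states before expert computation."""

    @abstractmethod
    def finalize(self, hidden_states):
        """Post-process hidden states after expert computation."""


class MoECommMethodBase(MoECommMethod):
    """Shared default implementation, as the review suggests."""

    def prepare(self, hidden_states):
        # Hypothetical common pre-processing (e.g. padding/gather)
        return hidden_states

    def finalize(self, hidden_states):
        # Hypothetical common post-processing (e.g. reduce/scatter)
        return hidden_states


class MC2CommImpl(MoECommMethodBase):
    """Concrete methods override only what differs from the base."""


impl = MC2CommImpl()
out = impl.finalize(impl.prepare([1.0, 2.0]))
```

The interface stays small and heavily commented, and concrete classes depend on the base only for the behavior they actually share.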
```python
class FusedMoEPrepareAndFinalize(ABC):

    def __init__(self, moe_config: FusedMoEConfig):
```
Let's add more comments to FusedMoEPrepareAndFinalize and all of its member functions.
same, comments will be added in my next PR.
### What this PR does / why we need it?

1. Move prepare/finalize operation from moe_comm_method to /ops/moe/fused_moe_prepare_and_finalize
2. Adapt to token_dispatcher in moe_comm_method
3. Move moe_comm_method/experts_selector/token_dispatcher/fused_moe_prepare_and_finalize to /ops/moe

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

e2e & ut

- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@f4962a6

Signed-off-by: weichen <calvin_zhu0210@outlook.com>
Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Co-authored-by: weijinqian0 <12153182+weijinqian0@users.noreply.github.com>
Signed-off-by: 1Fire4 <wangdingyi2@huawei.com>
…nalinaly (#3406) I'd like to nominate 4 new maintainers for vllm-ascend: ---- Yizhou Liu [@yiz-liu](https://github.com/yiz-liu) ---- **Review Quality**: He has completed [40+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Ayiz-liu) and provided solutions or guides for [10+ issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3Ayiz-liu), which includes many quality review like [#issue-3428408401](#3002 (comment)), [#discussion_r2224572309](#1803 (comment)), [#issuecomment-2982470226](#1261 (comment)), [#issuecomment-2903621197](#836 (comment)), [#issuecomment-2857678691](#778 (comment)). **Sustained and High-Quality Contributions:** He has contributed more than [30+ commits](https://github.com/vllm-project/vllm-ascend/commits?author=yiz-liu) since Mar.2025, especially, aclgraph, DP, and EP related contributions are the main reason why I nominated him. As the owner of aclgraph support, he continuously improves aclgraph stability and performance as well as fixes key bugs. he laid the groundwork for EP-related functionality and delivered multiple foundational improvements **Community involvement:** He has a very good habit of logging issues:#1649 and is also very active and involved in [many issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aopen%20commenter%3Ayiz-liu%20-author%3Ayiz-liu) to help users resolve issues. ---- Peng Yu [@paulyu12](https://github.com/paulyu12) --- The main reasons for his nomination are his expertise and key contributions to the LORA and sustained and major contributions (initial support/doc/bugfix) around Lora. **Sustained and Major Contributions:** @paulyu12 starts his contribution with [Lora and Mulit-Lora support](697908f) since Apr 2025, he contributed about [10+ commits and bugfixes](697908f) on vllm-ascend. 
**Review Quality and Community Involvement:** He also helped more than 10+ users address [Lora related issues](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Apaulyu12+-author%3Apaulyu12+is%3Aclosed). I believe his addition will further improve vLLM Ascend Lora support. ---- Jinqian Wei [@weijinqian0](https://github.com/weijinqian0) --- The main reasons for his nomination are his key contributions to the RL scene and the high quality of his code reviews. **Review Quality:** He has completed [60+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Aweijinqian0+is%3Aopen+-author%3Aweijinqian0) since June. 2025, include [#comment-3284055430](#2791 (comment)), [discussion_r2332166704](#2817 (comment)), [discussion_r2343289692](#2846 (comment)) high quality review. **Sustained and Quality Contributions:** He has Deep understanding of vLLM and vLLM Ascend codebases and solid contributions in RL scene (about [10+ PR merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Aweijinqian0+is%3Amerged+) and 10+ PRs merged as co-author. - Code Refactor: As a co-author, he participated in the refactoring of the MOE module #2150 #2706 #2867 - Performance Enhancement for RL: Participated as a co-author in the design and development of the solution, contributing to the planning of core capabilities. #1547 #2120 and so on. So I think he's a great addition to the vLLM Ascend Maintainer team. ---- Chuanyu Qin [@nalinaly](https://github.com/nalinaly) --- The main reason I nominated Qinchuanyu is because he is the initial designer of aclgraph and torch-npu, two key components of vllm-ascend. Considering aclgraph will eventually become the main path for vllm-ascend's graph model, I propose to nominate him. 
**Sustained and Major Contributions:** In fact, chuanyu actively helped the users/developers of vllm-ascend since Mar 2025 ([vllm-discuss#162](https://discuss.vllm.ai/t/can-ascend-officially-draft-a-documentation-on-the-vllm-ascend-adaptation-for-graph-mode/162/5)), and also helped early users of vllm-ascend understand aclgraph. He provided lots of help in the process of integrating aclgraph with vllm-ascend. **Community Involvement:** As speaker, he also presents help users understand aclgraph and torch_npu [《The design philosophy of torch_npu and the high performance principle of aclGraph》](https://github.com/PyTorch-China/pytorch-meetup/blob/main/beijing-2025/%E3%80%905%E3%80%91torch_npu%20%E7%9A%84%E8%AE%BE%E8%AE%A1%E5%93%B2%E5%AD%A6%E4%B8%8E%20aclGraph%20%E9%AB%98%E6%80%A7%E8%83%BD%E5%8E%9F%E7%90%86-%E7%A7%A6%E4%BC%A0%E7%91%9C-0920.pdf) ---- They have activate contribution to vllm-ascend or have rich experience for ascend AI. Welcome! - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>