
Conversation


@Angazenn Angazenn commented Jun 25, 2025

What this PR does / why we need it?

This PR introduces an expert rearrange algorithm for the PanguProMoE model. Unlike the original grouped top-k, it keeps only the top experts that are allocated the most tokens, so fewer expert weights need to be loaded when computing the grouped matrix multiplication (gmm).
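
A minimal sketch of the voting idea, assuming a PyTorch-style router (the function name `voted_grouped_topk`, its signature, and the use of a plain rather than grouped top-k are illustrative assumptions, not the PR's actual implementation):

```python
import torch


def voted_grouped_topk(router_logits: torch.Tensor,
                       top_k: int,
                       num_voted_experts: int):
    """Illustrative only: keep the most-voted experts after routing."""
    # Per-token routing scores and the usual top-k expert selection.
    scores = torch.softmax(router_logits, dim=-1)   # [num_tokens, num_experts]
    topk_weights, topk_ids = torch.topk(scores, top_k, dim=-1)

    # "Vote": count how many tokens each expert was assigned.
    num_experts = router_logits.shape[-1]
    votes = torch.bincount(topk_ids.flatten(), minlength=num_experts)

    # Keep only the num_voted_experts experts that received the most tokens.
    kept = torch.topk(votes, num_voted_experts).indices
    keep_mask = torch.zeros(num_experts, dtype=torch.bool,
                            device=router_logits.device)
    keep_mask[kept] = True

    # Zero out routing weights that point at filtered-out experts and
    # renormalize, so gmm only needs the surviving experts' weights.
    selected = keep_mask[topk_ids]                  # [num_tokens, top_k]
    topk_weights = topk_weights * selected
    topk_weights = topk_weights / topk_weights.sum(-1, keepdim=True).clamp(min=1e-9)
    return topk_weights, topk_ids
```

Setting `num_voted_experts` large enough that no expert is filtered out reduces this sketch to plain top-k routing, mirroring how a value of 8 falls back to the original Pangu grouped top-k as described below.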

We have tested this algorithm with PanguProMoE-72B on the 300I Duo and 800I A2 platforms. On 300I Duo, setting `num_voted_experts` to 5 achieves both good performance and accuracy, while on 800I A2 we keep it at 8, which falls back to the original Pangu grouped top-k.

Does this PR introduce any user-facing change?

No.

How was this patch tested?


codecov bot commented Jun 26, 2025

Codecov Report

❌ Patch coverage is 25.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 31.64%. Comparing base (c30ddb8) to head (9321e0c).
⚠️ Report is 581 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vllm_ascend/ops/fused_moe.py | 20.00% | 4 Missing ⚠️ |
| vllm_ascend/ops/common_fused_moe.py | 33.33% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1421      +/-   ##
==========================================
+ Coverage   27.39%   31.64%   +4.24%     
==========================================
  Files          56       60       +4     
  Lines        6191     6640     +449     
==========================================
+ Hits         1696     2101     +405     
- Misses       4495     4539      +44     
| Flag | Coverage Δ |
|---|---|
| unittests | 31.64% <25.00%> (+4.24%) ⬆️ |

Flags with carried forward coverage won't be shown.



@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: angazenn <zengyanjia@huawei.com>
@Angazenn Angazenn changed the title from [WIP]support MERRouter to [PERF]support MERRouter on Jun 28, 2025
@ganyi1996ppo ganyi1996ppo merged commit c59d69d into vllm-project:main Jun 28, 2025
24 checks passed
zhanghw0354 pushed a commit to zhanghw0354/vllm-ascend that referenced this pull request Jun 30, 2025
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Jun 30, 2025
@Angazenn Angazenn deleted the me branch September 8, 2025 03:16
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Angazenn added a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025