Collaborator

@momo609 momo609 commented Sep 4, 2025

What this PR does / why we need it?

Add gatherEP selection: choose the MoE token dispatcher based on SoC version, token count, and expert parallelism size.

Does this PR introduce any user-facing change?

How was this patch tested?

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the token dispatcher selection logic for MoE models on Ascend hardware, making it dependent on the SoC version, token count, and expert parallelism size. My review has identified a critical issue where an unsupported SoC version could lead to a runtime error due to an unhandled case. Additionally, I've noted a high-severity maintainability concern regarding a magic number and code duplication for a key capacity parameter. Addressing these points will enhance the code's robustness and clarity.

Comment on lines 51 to 60
if soc_version in {AscendSocVersion.A2}:
    if num_tokens <= mc2_tokens_capacity and ep_size >= 16:
        return "TokenDispatcherWithMC2"
    else:
        return "TokenDispatcherWithAllGather"
elif soc_version in {AscendSocVersion.A3}:
    if num_tokens <= mc2_tokens_capacity:
        return "TokenDispatcherWithMC2"
    else:
        return "TokenDispatcherWithAll2AllV"
Contributor

critical

The function does not handle cases where soc_version is neither AscendSocVersion.A2 nor AscendSocVersion.A3. Any other value falls through both branches, so the function implicitly returns None. That violates its str return type annotation and will likely cause a runtime error (for example a TypeError) when the caller uses the result. To prevent this, add a final branch for unsupported SoC versions, for instance one that raises a ValueError.

Suggested change
if soc_version in {AscendSocVersion.A2}:
    if num_tokens <= mc2_tokens_capacity and ep_size >= 16:
        return "TokenDispatcherWithMC2"
    else:
        return "TokenDispatcherWithAllGather"
elif soc_version in {AscendSocVersion.A3}:
    if num_tokens <= mc2_tokens_capacity:
        return "TokenDispatcherWithMC2"
    else:
        return "TokenDispatcherWithAll2AllV"
else:
    raise ValueError(f"Unsupported soc_version: {soc_version}")
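
For readers who want to try the suggested behaviour in isolation, here is a minimal, self-contained sketch. It is not the project's implementation: the `AscendSocVersion` enum below is a stand-in with a hypothetical `UNSUPPORTED` member, and the function name `select_dispatcher_name` and its parameter list are assumptions made for illustration. Only the branch logic and the returned strings come from the snippet under review.

```python
from enum import Enum


class AscendSocVersion(Enum):
    """Stand-in for the project's enum (assumption, not the real definition)."""
    A2 = 0
    A3 = 1
    UNSUPPORTED = 2  # hypothetical member, used only to exercise the error path


def select_dispatcher_name(soc_version: AscendSocVersion, num_tokens: int,
                           ep_size: int, mc2_tokens_capacity: int) -> str:
    """Pick a token dispatcher name; hypothetical wrapper around the reviewed logic."""
    if soc_version is AscendSocVersion.A2:
        # On A2, MC2 needs both a small enough batch and ep_size >= 16.
        if num_tokens <= mc2_tokens_capacity and ep_size >= 16:
            return "TokenDispatcherWithMC2"
        return "TokenDispatcherWithAllGather"
    if soc_version is AscendSocVersion.A3:
        # On A3, only the token count decides between MC2 and All2AllV.
        if num_tokens <= mc2_tokens_capacity:
            return "TokenDispatcherWithMC2"
        return "TokenDispatcherWithAll2AllV"
    # Fail loudly instead of implicitly returning None.
    raise ValueError(f"Unsupported soc_version: {soc_version}")
```

With this shape, an unsupported SoC version surfaces as a ValueError at the call site rather than as a None that fails somewhere downstream.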

from vllm_ascend.ops.moe_dispatcher.token_dispatcher import \
    get_token_dispatcher
dispatcher_name = get_dispatcher_name(ep_size, with_prefill)
mc2_tokens_capacity = 512 * vllm_config.parallel_config.tensor_parallel_size
Contributor

high

The value 512 used to calculate mc2_tokens_capacity is a magic number, which makes the code harder to understand and maintain. This same calculation is also present in vllm_ascend/worker/model_runner_v1.py at line 369. To improve clarity and avoid potential inconsistencies, this value should be extracted into a named constant and defined in a central location, such as vllm_ascend/ascend_config.py, so it can be reused across the codebase.
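
One way the extraction could look is sketched below. The constant name `MC2_TOKENS_PER_TP_RANK` and the helper `get_mc2_tokens_capacity` are hypothetical and do not exist in the codebase as far as this thread shows. The only facts taken from the PR are the value 512 and the multiplication by the tensor parallel size.

```python
# Hypothetical constant: the factor that the PR currently writes inline as 512.
MC2_TOKENS_PER_TP_RANK = 512


def get_mc2_tokens_capacity(tensor_parallel_size: int) -> int:
    """Return the MC2 token capacity for a given tensor parallel size.

    Hypothetical helper: both call sites mentioned in the review could call
    this instead of repeating `512 * tensor_parallel_size`.
    """
    if tensor_parallel_size < 1:
        raise ValueError(
            f"tensor_parallel_size must be >= 1, got {tensor_parallel_size}")
    return MC2_TOKENS_PER_TP_RANK * tensor_parallel_size
```

A call site would then read `get_mc2_tokens_capacity(vllm_config.parallel_config.tensor_parallel_size)`, so a later change to the factor happens in one place.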

github-actions bot commented Sep 4, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@momo609 momo609 force-pushed the gatherep3 branch 3 times, most recently from 466f84a to 3160093 Compare September 4, 2025 11:28

codecov bot commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 72.95%. Comparing base (4c90fa7) to head (195ca4d).
⚠️ Report is 11 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vllm_ascend/ops/moe_dispatcher/token_dispatcher.py | 66.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2740      +/-   ##
==========================================
- Coverage   72.99%   72.95%   -0.04%     
==========================================
  Files         153      154       +1     
  Lines       21331    21418      +87     
==========================================
+ Hits        15571    15626      +55     
- Misses       5760     5792      +32     
| Flag | Coverage Δ |
|---|---|
| unittests | 72.95% <95.83%> (-0.04%) ⬇️ |
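
The percentages in this report can be reproduced from the hit and line counts in the coverage diff. The sketch below rests on two assumptions that the report itself does not state: that the figures are truncated to two decimals rather than rounded (the reported numbers are consistent with this), and that the 95.83% patch figure with 1 missed line corresponds to 23 of 24 lines. `truncated_percent` is a hypothetical helper, not part of any Codecov API.

```python
def truncated_percent(hits: int, total: int) -> float:
    """Return hits/total as a percentage truncated (not rounded) to 2 decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Integer arithmetic avoids floating-point surprises before truncation.
    return (hits * 10000 // total) / 100


base = truncated_percent(15571, 21331)   # main: Hits / Lines
head = truncated_percent(15626, 21418)   # this PR: Hits / Lines
patch = truncated_percent(23, 24)        # assumed: 24 changed lines, 1 missed
per_file = truncated_percent(2, 3)       # assumed: 3 lines in the file, 1 missed

print(base, head, round(head - base, 2), patch, per_file)
```

The project-level drop of 0.04 points arises because misses grew by 32 while lines grew by 87, which is a lower hit ratio than the existing base.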


@momo609 momo609 force-pushed the gatherep3 branch 2 times, most recently from ee26a93 to dd66e51 Compare September 5, 2025 02:19
Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
@wangxiyuan wangxiyuan merged commit 2693196 into vllm-project:main Sep 8, 2025
25 of 28 checks passed
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Sep 10, 2025
### What this PR does / why we need it?
add gatherep select.

- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@e599e2c

Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
return "TokenDispatcherWithAll2AllV"

if with_prefill:
elif envs_ascend.VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP and ep_size > 1:
Collaborator Author

TODO: this logic will be consolidated with moe_common_method, without relying on environment variable checks.

offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 16, 2025
Signed-off-by: offline0806 <z00858301@china.huawei.com>
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
2 participants