-
Notifications
You must be signed in to change notification settings - Fork 535
add gatherep select. #2740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add gatherep select. #2740
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request updates the token dispatcher selection logic for MoE models on Ascend hardware, making it dependent on the SoC version, token count, and expert parallelism size. My review has identified a critical issue where an unsupported SoC version could lead to a runtime error due to an unhandled case. Additionally, I've noted a high-severity maintainability concern regarding a magic number and code duplication for a key capacity parameter. Addressing these points will enhance the code's robustness and clarity.
| if soc_version in {AscendSocVersion.A2}: | ||
| if num_tokens <= mc2_tokens_capacity and ep_size >=16: | ||
| return "TokenDispatcherWithMC2" | ||
| else: | ||
| return "TokenDispatcherWithAllGather" | ||
| elif soc_version in {AscendSocVersion.A3}: | ||
| if num_tokens <= mc2_tokens_capacity: | ||
| return "TokenDispatcherWithMC2" | ||
| else: | ||
| return "TokenDispatcherWithAll2AllV" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function does not handle cases where soc_version is not AscendSocVersion.A2 or AscendSocVersion.A3. In such a scenario, the function will implicitly return None, which violates its str return type annotation and will likely cause a TypeError at runtime. To prevent this, you should add a case to handle unsupported SoC versions, for instance by raising a ValueError.
| if soc_version in {AscendSocVersion.A2}: | |
| if num_tokens <= mc2_tokens_capacity and ep_size >=16: | |
| return "TokenDispatcherWithMC2" | |
| else: | |
| return "TokenDispatcherWithAllGather" | |
| elif soc_version in {AscendSocVersion.A3}: | |
| if num_tokens <= mc2_tokens_capacity: | |
| return "TokenDispatcherWithMC2" | |
| else: | |
| return "TokenDispatcherWithAll2AllV" | |
| if soc_version in {AscendSocVersion.A2}: | |
| if num_tokens <= mc2_tokens_capacity and ep_size >=16: | |
| return "TokenDispatcherWithMC2" | |
| else: | |
| return "TokenDispatcherWithAllGather" | |
| elif soc_version in {AscendSocVersion.A3}: | |
| if num_tokens <= mc2_tokens_capacity: | |
| return "TokenDispatcherWithMC2" | |
| else: | |
| return "TokenDispatcherWithAll2AllV" | |
| else: | |
| raise ValueError(f"Unsupported soc_version: {soc_version}") |
| from vllm_ascend.ops.moe_dispatcher.token_dispatcher import \ | ||
| get_token_dispatcher | ||
| dispatcher_name = get_dispatcher_name(ep_size, with_prefill) | ||
| mc2_tokens_capacity = 512 * vllm_config.parallel_config.tensor_parallel_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value 512 used to calculate mc2_tokens_capacity is a magic number, which makes the code harder to understand and maintain. This same calculation is also present in vllm_ascend/worker/model_runner_v1.py at line 369. To improve clarity and avoid potential inconsistencies, this value should be extracted into a named constant and defined in a central location, such as vllm_ascend/ascend_config.py, so it can be reused across the codebase.
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
466f84a to
3160093
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2740 +/- ##
==========================================
- Coverage 72.99% 72.95% -0.04%
==========================================
Files 153 154 +1
Lines 21331 21418 +87
==========================================
+ Hits 15571 15626 +55
- Misses 5760 5792 +32
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ee26a93 to
dd66e51
Compare
Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
### What this PR does / why we need it? add gatherep select. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@e599e2c Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
| return "TokenDispatcherWithAll2AllV" | ||
|
|
||
| if with_prefill: | ||
| elif envs_ascend.VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP and ep_size > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: this logic will be consolidated with moe_common_method, without relying on environment variable checks.
### What this PR does / why we need it? add gatherep select. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@e599e2c Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Signed-off-by: offline0806 <z00858301@china.huawei.com>
### What this PR does / why we need it? add gatherep select. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@e599e2c Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
### What this PR does / why we need it? add gatherep select. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@e599e2c Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
### What this PR does / why we need it? add gatherep select. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@e599e2c Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com> Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
What this PR does / why we need it?
add gatherep select.
Does this PR introduce any user-facing change?
How was this patch tested?