[refactor] Refactoring AscendFusedMoE #1169
Conversation
a79efd7 to 20e3bb6
I really like this change. Let's merge this first. @Yikun @ganyi1996ppo @jianzs please take this as high priority. Thanks.
vllm_ascend/utils.py
Outdated
We can add a submodule named fused_moe in ops, then move FusedMoEState and get_fused_moe_state to that module's utils file.
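A minimal sketch of what the suggested module could contain (the module path vllm_ascend/ops/fused_moe/utils.py and the enum members are assumptions based on this comment, not the PR's actual code):

```python
# vllm_ascend/ops/fused_moe/utils.py (proposed location)
from enum import Enum


class FusedMoEState(Enum):
    """Communication strategy selected for the fused MoE layer."""
    AllGather = 0
    All2All = 1
    MC2 = 2
```

Callers would then import both helpers from the new location, e.g. `from vllm_ascend.ops.fused_moe.utils import FusedMoEState, get_fused_moe_state`.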
Make 8 a named constant.
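If the flagged literal is the MC2 token-count alignment, a named constant could look like the sketch below (the constant's name and its use in a round-up helper are hypothetical):

```python
# Hypothetical name for the bare literal 8 flagged above; the real meaning
# of "8" in the PR may differ.
MC2_TOKEN_ALIGNMENT = 8


def round_up_to_alignment(num_tokens: int) -> int:
    # Round the token count up to the next multiple of the alignment.
    return ((num_tokens + MC2_TOKEN_ALIGNMENT - 1)
            // MC2_TOKEN_ALIGNMENT) * MC2_TOKEN_ALIGNMENT
```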
Is any change in this PR related to this commit message?
It's better to describe which communication kernel is chosen for different configurations.
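One way to make that explicit is to encode the mapping directly in get_fused_moe_state. The sketch below is a guess at the shape of that logic (the ep_size threshold and the with_prefill condition are assumptions, not necessarily what this PR implements):

```python
from enum import Enum


class FusedMoEState(Enum):  # same enum as in the sketch above
    AllGather = 0
    All2All = 1
    MC2 = 2


def get_fused_moe_state(ep_size: int, with_prefill: bool) -> FusedMoEState:
    # No expert parallelism: the plain all-gather path is enough.
    if ep_size == 1:
        return FusedMoEState.AllGather
    # Assumed constraints: MC2 wants a reasonably large EP group and a
    # decode-only batch; otherwise fall back to all-to-all.
    if ep_size < 16 or with_prefill:
        return FusedMoEState.All2All
    return FusedMoEState.MC2
```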
The current usage of MC2 kernel does not support non-uniform inputs, so padding is still required.
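A hedged sketch of what that padding can look like before calling the MC2 kernel (the helper name and shapes are illustrative; max_num_tokens would be the uniform per-rank token count agreed across the EP group):

```python
import torch
import torch.nn.functional as F


def pad_for_mc2(hidden_states: torch.Tensor, max_num_tokens: int) -> torch.Tensor:
    """Pad [num_tokens, hidden_dim] inputs up to a uniform token count."""
    pad_len = max_num_tokens - hidden_states.shape[0]
    if pad_len <= 0:
        return hidden_states
    # Append zero rows so every rank hands MC2 the same number of tokens.
    return F.pad(hidden_states, (0, 0, 0, pad_len))
```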
Please rebase to main to make sure the torchair CI passes.
What this PR does / why we need it?
This PR resolves issue 1147. The fused-MoE state selection is now handled in fused_moe.py's get_fused_moe_state.

Does this PR introduce any user-facing change?
- Removed VLLM_ENABLE_MC2: this env is useless; we can make the judgment based on the current scenario without it, and it only increases complexity.
- Removed USING_LCCL_COM: this env has already expired.
- additional_config.expert_tensor_parallel_size has already expired; we now use the enable_expert_parallel parameter, consistent with vLLM (a usage sketch follows at the end of this description).

How was this patch tested?
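A minimal usage sketch of the enable_expert_parallel parameter mentioned above (the model name and parallel sizes are placeholders, not this PR's test setup):

```python
from vllm import LLM

# With additional_config.expert_tensor_parallel_size gone, expert parallelism
# follows vLLM's standard engine argument.
llm = LLM(
    model="Qwen/Qwen1.5-MoE-A2.7B",  # placeholder MoE model
    tensor_parallel_size=4,
    enable_expert_parallel=True,
)
outputs = llm.generate("Hello, my name is")
print(outputs[0].outputs[0].text)
```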