
Conversation

@zzzzwwjj (Collaborator) commented Jun 11, 2025:

What this PR does / why we need it?

This PR resolves issue 1147:

  1. Move the fused_moe code into a single file, fused_moe.py.
  2. Integrate the branch conditions into one function, get_fused_moe_state (see the sketch below).
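
A minimal sketch of what the consolidated selection could look like. FusedMoEState, MC2, and get_fused_moe_state appear in this thread; the other state names, branch conditions, and thresholds are illustrative assumptions, not the exact implementation:

```python
from enum import Enum


class FusedMoEState(Enum):
    AllGather = 0
    All2All = 1
    MC2 = 2


def get_fused_moe_state(ep_size: int, with_prefill: bool) -> FusedMoEState:
    """Pick the MoE communication strategy from the runtime scenario
    (illustrative conditions only; no env var is consulted)."""
    if ep_size == 1:
        # No expert parallelism: gather tokens and compute locally.
        return FusedMoEState.AllGather
    if with_prefill or ep_size < 16:
        # Prefill steps and small EP groups use all-to-all dispatch.
        return FusedMoEState.All2All
    # Decode-only steps on large EP groups can use the MC2 kernel.
    return FusedMoEState.MC2
```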

Does this PR introduce any user-facing change?

  1. This PR removes the env VLLM_ENABLE_MC2. This env is unnecessary: the right path can be chosen from the current scenario without it, so keeping it only adds complexity.
  2. This PR removes the env USING_LCCL_COM, which is already obsolete.
  3. additional_config.expert_tensor_parallel_size is deprecated; we now use the parameter enable_expert_parallel, consistent with vLLM (see the example below).
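
For illustration, expert parallelism is now enabled the same way as in upstream vLLM; the model name and parallel size below are placeholders:

```python
from vllm import LLM

# enable_expert_parallel is the upstream vLLM engine argument; the model
# and tensor_parallel_size values are placeholders for illustration.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",
    tensor_parallel_size=4,
    enable_expert_parallel=True,
)
```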

How was this patch tested?

@zzzzwwjj force-pushed the main branch 6 times, most recently from a79efd7 to 20e3bb6 on Jun 11, 2025 10:18.
@Yikun added the long-term-test and ready-for-test labels on Jun 11, 2025.
@wangxiyuan (Collaborator) commented:

I really like this change. Let's merge this first. @Yikun @ganyi1996ppo @jianzs please treat this as high priority. Thanks.

A collaborator commented on the diff:

We can add a submodule named fused_moe under ops, then move FusedMoEState and get_fused_moe_state into that module's utils file.
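
If the submodule is adopted, the layout might look like this (file names are hypothetical):

```
vllm_ascend/ops/fused_moe/
├── __init__.py
├── fused_moe.py   # consolidated fused-MoE layer code
└── utils.py       # FusedMoEState, get_fused_moe_state
```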

A collaborator commented on the diff:

Make 8 a named constant.
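
This is the usual magic-number fix; a minimal sketch, with an invented name since the thread does not say what 8 denotes:

```python
# Hypothetical constant name: the thread does not say what the literal 8 means.
DEFAULT_GROUP_SIZE = 8


def split_into_groups(items: list, group_size: int = DEFAULT_GROUP_SIZE) -> list:
    # Reference the named constant instead of repeating a bare literal 8.
    return [items[i:i + group_size] for i in range(0, len(items), group_size)]
```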

@wangxiyuan (Collaborator) commented Jun 11, 2025:

> additional_config.expert_tensor_parallel_size is deprecated; we now use the parameter enable_expert_parallel, consistent with vLLM.

Is any change in this PR actually related to this item in the commit message?

@wangxiyuan (Collaborator) commented:

The VLLM_ENABLE_MC2 references in the example files can be removed as well.

@jianzs (Collaborator) commented Jun 11, 2025:

It's better to describe which communication kernel is chosen for different configurations.
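
For reference, a hedged summary of the mapping this thread and the RFC imply; the scenarios and thresholds are assumptions, not taken from the PR:

| Scenario (illustrative) | Communication kernel |
| --- | --- |
| ep_size == 1 (no expert parallelism) | AllGather |
| small EP group, or prefill steps | All2All |
| large EP group, decode-only steps | MC2 |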

@wangxiyuan (Collaborator) commented:

@jianzs it's described clearly in the RFC. @zzzzwwjj maybe you can add more notes in the code logic.

A collaborator commented on the diff:

The current usage of the MC2 kernel does not support non-uniform inputs, so padding is still required.
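
A minimal sketch of the kind of padding this implies, assuming a per-rank token cap; the helper name and signature are hypothetical, and only the need for padding comes from this comment:

```python
import torch


def pad_to_uniform(hidden_states: torch.Tensor, max_num_tokens: int) -> torch.Tensor:
    """Pad the token dimension so every rank hands the MC2 kernel an
    equal-sized input, since the kernel rejects non-uniform batches."""
    num_tokens = hidden_states.shape[0]
    if num_tokens >= max_num_tokens:
        return hidden_states
    padding = hidden_states.new_zeros(max_num_tokens - num_tokens,
                                      hidden_states.shape[1])
    return torch.cat([hidden_states, padding], dim=0)
```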

@wangxiyuan (Collaborator) commented:

Please rebase onto main to make sure the torchair CI passes.


Labels

long-term-test, module:core, module:ops, module:quantization, ready-for-test


Development

Successfully merging this pull request may close this issue:

[RFC]: Refactoring AscendFusedMoE

5 participants