Unified fused moe #2923
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a significant and valuable refactoring of the Mixture of Experts (MoE) implementation, unifying various approaches under a common interface. This improves modularity and maintainability by encapsulating communication logic within dedicated MoECommMethod classes. However, this refactoring appears to have left behind a duplicated file (vllm_ascend/ops/fused_moe.py), which is very similar to vllm_ascend/ops/common_fused_moe.py and could cause confusion. Additionally, I've identified a critical bug in the MoE communication method selection logic and the removal of some important safety assertions. Please address these points to ensure the stability and correctness of the new implementation.
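For readers skimming the review, the unified design it describes could take roughly the shape sketched below. Only the MoECommMethod name comes from the review text; every other class, method, and parameter name here is a hypothetical placeholder, not the PR's actual API.

```python
# Hypothetical sketch of a unified MoE communication interface, assuming the
# design described in the review. Only "MoECommMethod" is taken from the
# review; all other names are illustrative placeholders.
from abc import ABC, abstractmethod

import torch


class MoECommMethod(ABC):
    """Encapsulates the communication pattern wrapped around the fused MoE kernel."""

    @abstractmethod
    def prepare(self, hidden_states: torch.Tensor) -> torch.Tensor:
        """Redistribute tokens across ranks before expert computation."""

    @abstractmethod
    def finalize(self, expert_output: torch.Tensor) -> torch.Tensor:
        """Combine per-rank expert outputs back into the caller's layout."""


class AllGatherCommSketch(MoECommMethod):
    """Illustrative all-gather based strategy (placeholder bodies only)."""

    def prepare(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # e.g. all-gather tokens so every rank sees the full batch
        raise NotImplementedError

    def finalize(self, expert_output: torch.Tensor) -> torch.Tensor:
        # e.g. reduce-scatter expert outputs back to per-rank shards
        raise NotImplementedError
```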
moe_comm_method = "naivemulticast"

if model_type == "PanguProMoE":
    moe_comm_method == "allgather"
There is a typo here. moe_comm_method == "allgather" is a comparison and has no effect. It should be an assignment: moe_comm_method = "allgather". This bug will cause PanguProMoE models to use an incorrect MoE communication method, potentially leading to runtime errors or incorrect behavior.
Suggested change:
-    moe_comm_method == "allgather"
+    moe_comm_method = "allgather"
There is a bug here: this line should be an assignment (=), not a comparison (==).
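To make the intended behavior explicit, the corrected branch would read as in the sketch below; the surrounding variables (model_type and the default value) are assumed from the diff context above rather than from the full file.

```python
# Sketch of the corrected selection logic, assuming the context shown in the
# diff above; names outside that snippet are not verified here.
moe_comm_method = "naivemulticast"

if model_type == "PanguProMoE":
    # Assignment, not comparison: the original "==" evaluated a boolean and
    # discarded it, so "naivemulticast" was silently kept for PanguProMoE.
    moe_comm_method = "allgather"
```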
                     global_redundant_expert_num: int = 0,
                     need_trans: bool = False) -> torch.Tensor:
    # Check constraints
    assert hidden_states.shape[1] == w1.shape[1], (
        f"Hidden size mismatch {hidden_states.shape[1]} != {w1.shape[1]}")
    assert topk_weights.shape == topk_ids.shape, "topk shape mismatch"
    assert hidden_states.is_contiguous(), "Hidden_states must be contiguous"
    assert w1.stride(-1) == 1, "Stride of last dimension must be 1"
    assert w2.stride(-1) == 1, "Stride of last dimension must be 1"
    assert hidden_states.dtype in [
Several important assertions checking for tensor shape matching and contiguity have been removed. While the first assertion for hidden size mismatch was likely incorrect, the others (e.g., topk_weights.shape == topk_ids.shape, hidden_states.is_contiguous()) are valuable for ensuring correctness and preventing hard-to-debug errors from underlying NPU operations, which may have strict input requirements. Please consider restoring the valid assertions to maintain input validation and prevent potential silent failures.
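Concretely, the checks worth restoring could be reinstated as in the sketch below, which reuses the tensor names from the diff above and deliberately omits the hidden-size assertion that the review flags as likely incorrect.

```python
# Sketch of the input checks worth restoring, using the tensor names from the
# diff above; the hidden-size assertion is omitted since the review notes it
# was likely incorrect for this kernel.
assert topk_weights.shape == topk_ids.shape, "topk shape mismatch"
assert hidden_states.is_contiguous(), "Hidden_states must be contiguous"
assert w1.stride(-1) == 1, "Stride of last dimension must be 1"
assert w2.stride(-1) == 1, "Stride of last dimension must be 1"
```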
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Force-pushed from d20bd83 to aded6ea
This pull request has conflicts, please resolve those before we can evaluate the pull request.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?