[Bugfix] Fix EPLB enable when using MTP float weights. #4576
base: v0.11.0-dev
Conversation
Signed-off-by: offline0806 <3337230449@qq.com>
Code Review
This pull request fixes an issue where expert placement and load balancing (EPLB) was incorrectly disabled when using float weights. The change correctly scopes the quantization check to only apply when static EPLB (using expert_map_path) is enabled, allowing dynamic EPLB to function with float weights. The logic of the fix is sound. My review includes suggestions to improve the maintainability and readability of the implementation by refactoring the newly introduced state-carrying variable into an instance attribute with a more descriptive name and clearer comments.
```python
            self.moe_instance_id, self.ep_rank))
        self.log2phy = self.expert_load_balancer.get_rank_log2phy_map(
            self.moe_instance_id, self.ep_rank).npu()
        init_eplb_enable = True
```
```diff
-        if eplb_enable and (not hasattr(self.quant_method, "quant_method") or
-                            not isinstance(self.quant_method.quant_method,
-                                           AscendW8A8DynamicFusedMoEMethod)):
+        if init_eplb_enable and (
```
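To illustrate the logic the review describes, here is a minimal, self-contained sketch of the scoped check, following the reviewer's suggestion to use a descriptively named instance attribute. The class name, `expert_map_path` constructor argument, and `_is_w8a8_dynamic_quant` helper are hypothetical illustrations, not the actual vLLM Ascend code:

```python
class FusedMoELayerSketch:
    """Hypothetical stand-in showing the scoped EPLB quantization check."""

    def __init__(self, expert_map_path=None, quant_method=None):
        self.quant_method = quant_method
        # Static EPLB is in use only when an expert map path was supplied.
        # Descriptive instance attribute (per the review) instead of a
        # loosely named local variable.
        self.uses_static_eplb = expert_map_path is not None

    def _is_w8a8_dynamic_quant(self) -> bool:
        # Placeholder for the real isinstance() check against
        # AscendW8A8DynamicFusedMoEMethod.
        inner = getattr(self.quant_method, "quant_method", None)
        return type(inner).__name__ == "AscendW8A8DynamicFusedMoEMethod"

    def eplb_enabled(self, eplb_enable: bool) -> bool:
        if not eplb_enable:
            return False
        if not self.uses_static_eplb:
            # Dynamic EPLB: works with float weights, so no quant check
            # (this is the behavior the fix restores).
            return True
        # Static EPLB: keep the original quantized-weights requirement.
        return self._is_w8a8_dynamic_quant()
```

With dynamic EPLB (no `expert_map_path`), float weights no longer disable EPLB; with static EPLB, the quantization check still applies.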
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: offline893 <158537145+offline893@users.noreply.github.com>
This is a backport of #4571
What this PR does / why we need it?
Fixes EPLB enablement when using MTP float weights. This workaround will be removed once EPLB supports MTP and float weights.
Does this PR introduce any user-facing change?
How was this patch tested?
Deepseek-V3 + MTP + EPLB in A3.