Skip to content

Conversation

@offline893
Copy link
Contributor

@offline893 offline893 commented Nov 29, 2025

What this PR does / why we need it?

Fix eplb enable when using mtp float weights. It will be remove when eplb supporting mtp and float weights.

Does this PR introduce any user-facing change?

How was this patch tested?

Deepseek-V3 + MTP + EPLB in A3.
Uploading image.png…

Signed-off-by: offline0806 <3337230449@qq.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes an issue where expert placement and load balancing (EPLB) was incorrectly disabled when using float weights. The change correctly scopes the quantization check to only apply when static EPLB (using expert_map_path) is enabled, allowing dynamic EPLB to function with float weights. The logic of the fix is sound. My review includes suggestions to improve the maintainability and readability of the implementation by refactoring the newly introduced state-carrying variable into an instance attribute with a more descriptive name and clearer comments.

self.moe_instance_id, self.ep_rank))
self.log2phy = self.expert_load_balancer.get_rank_log2phy_map(
self.moe_instance_id, self.ep_rank).npu()
init_eplb_enable = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

To be consistent with the suggested change of using an instance attribute, this should be updated to set self.static_eplb_enabled.

Suggested change
init_eplb_enable = True
self.static_eplb_enabled = True

if eplb_enable and (not hasattr(self.quant_method, "quant_method") or
not isinstance(self.quant_method.quant_method,
AscendW8A8DynamicFusedMoEMethod)):
if init_eplb_enable and (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

To be consistent with the suggested change, this condition should use the self.static_eplb_enabled instance attribute.

Suggested change
if init_eplb_enable and (
if self.static_eplb_enabled and (

@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: offline893 <158537145+offline893@users.noreply.github.com>
@MengqingCao
Copy link
Collaborator

This is a backport of #4571

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants