Conversation

@CSWYF3634076 (Contributor) commented on Sep 30, 2025

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
@gemini-code-assist (bot) left a comment


Code Review

This pull request correctly changes the data type of the MoE gate and bias parameters to float32 for the ernie45_moe model, aligning it with the original implementation. However, a similar change in ernie45_vl_moe.py is incomplete. While the params_dtype is set to float32, the model's general quantization configuration is still passed to the gate layers. This could lead to the gates being quantized, which would negate the intended fix and potentially cause correctness issues. I've added comments to explicitly set quant_config=None for the gate layers in ernie45_vl_moe.py to ensure they remain unquantized.
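
For context on what the PR actually changes: the MoE router gate (and the associated bias parameter) is kept in float32 even when the rest of the model runs in bfloat16, matching the original ERNIE 4.5 implementation so routing logits are not perturbed by low-precision rounding. Below is a minimal, self-contained PyTorch sketch of that idea; it is not the vLLM Ernie 4.5 code, and the class name Fp32Gate is invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Fp32Gate(nn.Module):
    """Toy MoE router whose parameters and logits stay in float32."""

    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        # The gate weight is created in float32 regardless of the model dtype,
        # analogous to passing params_dtype=torch.float32 in the PR.
        self.weight = nn.Parameter(
            torch.empty(num_experts, hidden_size, dtype=torch.float32))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Upcast activations so the routing logits (and any softmax/top-k on
        # them) are computed in fp32, avoiding bf16 rounding flipping the
        # selected experts.
        return F.linear(hidden_states.float(), self.weight)


# Usage: model activations in bf16, router logits in fp32.
gate = Fp32Gate(hidden_size=64, num_experts=8)
x = torch.randn(4, 64, dtype=torch.bfloat16)
logits = gate(x)                      # dtype: torch.float32
topk_vals, topk_idx = logits.topk(2, dim=-1)
```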

config.moe_num_experts[0],
params_dtype=torch.float32,
bias=False,
quant_config=quant_config,


critical

To ensure the gate layer is not quantized, quant_config should be explicitly set to None. Currently, it's passing the general quant_config from the model, which could lead to the gate being quantized if any quantization method is enabled for the model. This would contradict the purpose of setting params_dtype=torch.float32.

Suggested change
quant_config=quant_config,
quant_config=None,

config.moe_num_experts[1],
bias=False,
params_dtype=torch.float32,
quant_config=quant_config,


critical

Similar to the text_experts_gate, the quant_config for vision_experts_gate should also be set to None to prevent it from being quantized. This ensures the gate operates in float32 as intended.

Suggested change
quant_config=quant_config,
quant_config=None,
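
Both inline suggestions above make the same point for text_experts_gate and vision_experts_gate: quant_config=None guarantees the layer stays unquantized, while passing the model-wide quant_config lets whatever quantization method is configured take over the layer's weights. The toy, self-contained sketch below illustrates that distinction; ToyQuantConfig and build_gate are hypothetical stand-ins, not vLLM APIs.

```python
from typing import Optional

import torch
import torch.nn as nn


class ToyQuantConfig:
    """Stand-in for a model-wide quantization config (e.g. GPTQ or AWQ)."""
    name = "gptq"


def build_gate(hidden_size: int, num_experts: int,
               quant_config: Optional[ToyQuantConfig]) -> nn.Module:
    if quant_config is None:
        # Always a plain dense float32 linear: routing logits stay exact,
        # which is what the bot's suggestion aims for.
        return nn.Linear(hidden_size, num_experts, bias=False,
                         dtype=torch.float32)
    # In a real quantized model this branch would create packed low-bit
    # weights plus scales; here we only tag the module to show the split.
    gate = nn.Linear(hidden_size, num_experts, bias=False,
                     dtype=torch.float32)
    gate.quant_method = quant_config.name
    return gate


unquantized = build_gate(64, 8, quant_config=None)       # bot's suggestion
maybe_quantized = build_gate(64, 8, ToyQuantConfig())    # model-wide config
print(hasattr(unquantized, "quant_method"), maybe_quantized.quant_method)
```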

@CSWYF3634076 (PR author) replied


Fixed.

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
@jeejeelee (Collaborator) left a comment


LGTM

  bias=False,
- quant_config=quant_config,
+ params_dtype=torch.float32,
+ quant_config=None,
@jeejeelee (Collaborator) commented


QQ: Why is it set to None? Is it because of GPTQ quantization?

@CSWYF3634076 (PR author) replied on Sep 30, 2025


> QQ: Why is it set to None? Is it because of GPTQ quantization?

Based on the AI assistant's advice, and by reference to how the gate is handled in other models.

@jeejeelee (Collaborator) replied


Then please don't modify it. Setting quant_config directly to None may affect quantized models.

@CSWYF3634076 (PR author) replied


> Then please don't modify it. Setting quant_config directly to None may affect quantized models.

Okay, it's done (reverted).
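
For readers following the thread, one way to picture jeejeelee's concern about hard-coding quant_config=None is checkpoint loading: if a quantized checkpoint happened to be produced with the gate quantized, it stores packed tensors rather than a plain weight, and a gate forced to be unquantized has nothing matching to load. The snippet below is a hypothetical illustration only; the key names mimic common GPTQ-style checkpoints and are not taken from this PR or from vLLM's loader.

```python
# Hypothetical key-set comparison, not actual vLLM loader output.
quantized_gate_keys = {"gate.qweight", "gate.qzeros", "gate.scales"}  # GPTQ-style
unquantized_gate_keys = {"gate.weight"}                               # plain fp32 gate

missing = unquantized_gate_keys - quantized_gate_keys      # gate.weight has no source
unexpected = quantized_gate_keys - unquantized_gate_keys   # packed tensors with no home
print(f"missing: {missing}")
print(f"unexpected: {unexpected}")
```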

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
@jeejeelee added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Sep 30, 2025
@jeejeelee merged commit ef6e0e7 into vllm-project:main on Sep 30, 2025
50 checks passed
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
…ct#25936)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…ct#25936)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…ct#25936)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>