Conversation

@CSWYF3634076 (Contributor) commented on Sep 30, 2025

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
@gemini-code-assist (bot) left a comment


Code Review

This pull request correctly changes the data type of the MoE gate and bias parameters to float32 for the ernie45_moe model, aligning it with the original implementation. However, a similar change in ernie45_vl_moe.py is incomplete. While the params_dtype is set to float32, the model's general quantization configuration is still passed to the gate layers. This could lead to the gates being quantized, which would negate the intended fix and potentially cause correctness issues. I've added comments to explicitly set quant_config=None for the gate layers in ernie45_vl_moe.py to ensure they remain unquantized.
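
For context on what the PR actually changes: the MoE router gate (and the associated bias parameter) is kept in float32 even when the rest of the model runs in bfloat16, matching the original ERNIE 4.5 implementation so routing logits are not perturbed by low-precision rounding. Below is a minimal, self-contained PyTorch sketch of that idea; it is not the vLLM Ernie 4.5 code, and the class name Fp32Gate is invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Fp32Gate(nn.Module):
    """Toy MoE router whose parameters and logits stay in float32."""

    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        # The gate weight is created in float32 regardless of the model dtype,
        # analogous to passing params_dtype=torch.float32 in the PR.
        self.weight = nn.Parameter(
            torch.empty(num_experts, hidden_size, dtype=torch.float32))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Upcast activations so the routing logits (and any softmax/top-k on
        # them) are computed in fp32, avoiding bf16 rounding flipping the
        # selected experts.
        return F.linear(hidden_states.float(), self.weight)


# Usage: model activations in bf16, router logits in fp32.
gate = Fp32Gate(hidden_size=64, num_experts=8)
x = torch.randn(4, 64, dtype=torch.bfloat16)
logits = gate(x)                      # dtype: torch.float32
topk_vals, topk_idx = logits.topk(2, dim=-1)
```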

config.moe_num_experts[0],
params_dtype=torch.float32,
bias=False,
quant_config=quant_config,


critical

To ensure the gate layer is not quantized, quant_config should be explicitly set to None. Currently, it's passing the general quant_config from the model, which could lead to the gate being quantized if any quantization method is enabled for the model. This would contradict the purpose of setting params_dtype=torch.float32.

Suggested change
quant_config=quant_config,
quant_config=None,

config.moe_num_experts[1],
bias=False,
params_dtype=torch.float32,
quant_config=quant_config,


critical

Similar to the text_experts_gate, the quant_config for vision_experts_gate should also be set to None to prevent it from being quantized. This ensures the gate operates in float32 as intended.

Suggested change
quant_config=quant_config,
quant_config=None,
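
Both inline suggestions above make the same point for text_experts_gate and vision_experts_gate: quant_config=None guarantees the layer stays unquantized, while passing the model-wide quant_config lets whatever quantization method is configured take over the layer's weights. The toy, self-contained sketch below illustrates that distinction; ToyQuantConfig and build_gate are hypothetical stand-ins, not vLLM APIs.

```python
from typing import Optional

import torch
import torch.nn as nn


class ToyQuantConfig:
    """Stand-in for a model-wide quantization config (e.g. GPTQ or AWQ)."""
    name = "gptq"


def build_gate(hidden_size: int, num_experts: int,
               quant_config: Optional[ToyQuantConfig]) -> nn.Module:
    if quant_config is None:
        # Always a plain dense float32 linear: routing logits stay exact,
        # which is what the bot's suggestion aims for.
        return nn.Linear(hidden_size, num_experts, bias=False,
                         dtype=torch.float32)
    # In a real quantized model this branch would create packed low-bit
    # weights plus scales; here we only tag the module to show the split.
    gate = nn.Linear(hidden_size, num_experts, bias=False,
                     dtype=torch.float32)
    gate.quant_method = quant_config.name
    return gate


unquantized = build_gate(64, 8, quant_config=None)       # bot's suggestion
maybe_quantized = build_gate(64, 8, ToyQuantConfig())    # model-wide config
print(hasattr(unquantized, "quant_method"), maybe_quantized.quant_method)
```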

@CSWYF3634076 (PR author) replied


Fixed.

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
@jeejeelee (Collaborator) left a comment


LGTM

  bias=False,
- quant_config=quant_config,
+ params_dtype=torch.float32,
+ quant_config=None,
@jeejeelee (Collaborator) commented


QQ: Why is it set to None? Is it because of GPTQ quantization?

@CSWYF3634076 (PR author) replied on Sep 30, 2025


> QQ: Why is it set to None? Is it because of GPTQ quantization?

Based on the AI assistant's advice, and by reference to how the gate is handled in other models.

@jeejeelee (Collaborator) replied


Then please don't modify it. Setting quant_config directly to None may affect quantized models.

@CSWYF3634076 (PR author) replied


> Then please don't modify it. Setting quant_config directly to None may affect quantized models.

Okay, it's done (reverted).
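
For readers following the thread, one way to picture jeejeelee's concern about hard-coding quant_config=None is checkpoint loading: if a quantized checkpoint happened to be produced with the gate quantized, it stores packed tensors rather than a plain weight, and a gate forced to be unquantized has nothing matching to load. The snippet below is a hypothetical illustration only; the key names mimic common GPTQ-style checkpoints and are not taken from this PR or from vLLM's loader.

```python
# Hypothetical key-set comparison, not actual vLLM loader output.
quantized_gate_keys = {"gate.qweight", "gate.qzeros", "gate.scales"}  # GPTQ-style
unquantized_gate_keys = {"gate.weight"}                               # plain fp32 gate

missing = unquantized_gate_keys - quantized_gate_keys      # gate.weight has no source
unexpected = quantized_gate_keys - unquantized_gate_keys   # packed tensors with no home
print(f"missing: {missing}")
print(f"unexpected: {unexpected}")
```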

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
@jeejeelee added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Sep 30, 2025
@jeejeelee merged commit ef6e0e7 into vllm-project:main on Sep 30, 2025
50 checks passed
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
…ct#25936)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…ct#25936)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…ct#25936)

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>