Conversation

@yewentao256
Member

@yewentao256 yewentao256 commented Sep 29, 2025

Purpose

Fix Weight Loading for Cutlass SM90.

Test Plan

vllm serve deepseek-ai/DeepSeek-V3.2-Exp -tp 8 --max-num-seqs 128 --load-format dummy --enforce_eager

Original

(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671]     output = w8a8_blockscale_func(q_input, weight, x_scale, weight_scale,
(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671]   File "/opt/vllm-source/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 46, in cutlass_scaled_mm
(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671]     return ops.cutlass_scaled_mm(
(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671]   File "/opt/vllm-source/vllm/_custom_ops.py", line 667, in cutlass_scaled_mm
(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671]     torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671]   File "/opt/vllm/lib64/python3.12/site-packages/torch/_ops.py", line 1243, in __call__
(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671]     return self._op(*args, **kwargs)
(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671] RuntimeError: b_scale_group_shape must be [128, 128].
(APIServer pid=1) (EngineCore_DP0 pid=293) (Worker_TP3 pid=305) ERROR 09-29 16:37:34 [multiproc_executor.py:671] 
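The error above comes from the CUTLASS SM90 block-FP8 path expecting one scale value per [128, 128] weight tile. As a rough sketch (the helper name and shapes here are illustrative, not vLLM's actual API), the weight-scale tensor a loader must produce for an (n, k) weight looks like:

```python
import math

def expected_block_scale_shape(n: int, k: int, block: int = 128) -> tuple[int, int]:
    """Hypothetical helper: with a [128, 128] scale group shape, an
    (n, k) weight needs one scale per tile, i.e. a
    (ceil(n/128), ceil(k/128)) scale tensor."""
    return (math.ceil(n / block), math.ceil(k / block))

# e.g. an illustrative 7168 x 2048 projection weight
print(expected_block_scale_shape(7168, 2048))  # (56, 16)
```

If weight loading hands the kernel scales in any other layout, the `b_scale_group_shape must be [128, 128]` check fires at the first matmul.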

Now

(APIServer pid=3511336) INFO 09-29 22:03:11 [launcher.py:42] Route: /invocations, Methods: POST
(APIServer pid=3511336) INFO 09-29 22:03:11 [launcher.py:42] Route: /metrics, Methods: GET
(APIServer pid=3511336) INFO:     Started server process [3511336]
(APIServer pid=3511336) INFO:     Waiting for application startup.
(APIServer pid=3511336) INFO:     Application startup complete.

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a bug in the weight loading logic for FP8 block-quantized models on SM90 GPUs, which could cause a runtime error. The refactoring also improves code clarity. However, I've identified a related issue where torch.bfloat16 is hardcoded when checking if DeepGEMM should be used. This could lead to incorrect behavior for models using other data types like float16. I've suggested a fix to use the layer's original data type instead.
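The reviewer's dtype concern reads roughly as follows; this is a hypothetical sketch (the function and parameter names are invented for illustration, not vLLM's actual code), using a plain string for the dtype to keep it self-contained:

```python
def should_use_deepgemm(deepgemm_supported: bool, layer_orig_dtype: str) -> bool:
    """Hypothetical sketch of the suggested fix: gate the DeepGEMM path
    on the layer's own original dtype instead of a hardcoded
    torch.bfloat16 constant, so float16 models take the correct path."""
    return deepgemm_supported and layer_orig_dtype == "bfloat16"

print(should_use_deepgemm(True, "bfloat16"))  # True
print(should_use_deepgemm(True, "float16"))   # False
```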

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
@heheda12345 heheda12345 added this to the v0.11.0 Cherry Picks milestone Sep 29, 2025
Member

@mgoin mgoin left a comment


Thanks for fixing this, makes sense why it failed

@mgoin mgoin added bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed deepseek Related to DeepSeek models labels Sep 29, 2025
Member

@youkaichao youkaichao left a comment


thanks for fixing it!

@mgoin mgoin changed the title [Bug] Fix Weight Loading for Cutlass SM90 [Bug] Fix Weight Loading for Block FP8 Cutlass SM90 Sep 29, 2025
@youkaichao youkaichao merged commit 89e4050 into main Sep 30, 2025
54 checks passed
@youkaichao youkaichao deleted the wentao-fix-weight-loading branch September 30, 2025 01:15
simon-mo pushed a commit that referenced this pull request Oct 1, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
yewentao256 added a commit that referenced this pull request Oct 3, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
shyeh25 pushed a commit to shyeh25/vllm that referenced this pull request Oct 14, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>