[Feat] support fp8 quantization in update weights #24488
Conversation
Code Review
This pull request refactors weight processing for FP8 quantization to support weight updates, primarily by introducing a _wrap_parameter_or_copy helper function. This is a good change for compatibility with CUDA graphs. The change in kv_cache.py also improves robustness by ensuring quantization scales are always present. However, I've found a critical issue in Fp8MoEMethod.process_weights_after_loading where a parameter is not correctly unwrapped, leading to a no-op update and incorrect behavior in certain code paths. I've also suggested an improvement in kv_cache.py to make the code more robust by removing some overly strict assertions.
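For context, here is a minimal sketch of what a helper like `_wrap_parameter_or_copy` might look like, based only on the description above; the actual implementation in the PR may differ.

```python
import torch


def _wrap_parameter_or_copy(layer: torch.nn.Module, name: str,
                            tensor: torch.Tensor) -> None:
    """Illustrative only: reuse existing parameter storage when possible."""
    existing = getattr(layer, name, None)
    if isinstance(existing, torch.nn.Parameter) and existing.shape == tensor.shape:
        # In-place copy keeps the same storage, so previously captured
        # CUDA graphs remain valid after a weight update.
        existing.data.copy_(tensor)
    else:
        # First load: register the processed tensor as a non-trainable parameter.
        setattr(layer, name, torch.nn.Parameter(tensor, requires_grad=False))
```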
In the else branch of the conditional starting at line 733, the variables w2_weight and w2_weight_scale_inv are assigned torch.nn.Parameter objects on lines 755-756, instead of their underlying tensor data. Consequently, these calls to _wrap_parameter_or_copy become no-ops due to self-copying, which is likely not the intended behavior and can lead to incorrect weight updates.
This is inconsistent with how w13_weight is handled in the same block, which correctly uses .data. To fix this, you should modify lines 755-756 to extract the tensor data, like so:
# In vllm/model_executor/layers/quantization/fp8.py, lines 755-756
w2_weight = layer.w2_weight.data
w2_weight_scale_inv = layer.w2_weight_scale_inv.data

Since the fix is outside the diff, I'm placing this comment here to highlight this critical issue.
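To make the no-op concrete, using the hypothetical helper sketched earlier (not the PR's actual implementation):

```python
# Before the fix: the variable is the registered nn.Parameter itself, so the
# later call copies the parameter's data onto itself, changing nothing.
w2_weight = layer.w2_weight
_wrap_parameter_or_copy(layer, "w2_weight", w2_weight)

# After the fix (mirroring how w13_weight is handled): operate on the raw
# tensor data, so any processing applied before the call actually takes effect.
w2_weight = layer.w2_weight.data
```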
These assertions could make the code brittle. If another part of the codebase modifies these attributes partially (e.g., removes q_scale but not k_scale), these assertions will fail. The main goal here is to ensure all weights are present if any are missing. Simply checking for q_scale and then creating all weights is sufficient and more robust against unforeseen state changes.
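A minimal sketch of the suggested approach, assuming the scale attribute names used in kv_cache.py and a placeholder default value (both are assumptions, not taken from the PR):

```python
import torch

_SCALE_NAMES = ("q_scale", "k_scale", "v_scale", "prob_scale")  # assumed names


def _ensure_scales(layer: torch.nn.Module) -> None:
    # If q_scale is missing, (re)create the full set of scales instead of
    # asserting that all of them are jointly present or jointly absent.
    if not hasattr(layer, "q_scale"):
        for name in _SCALE_NAMES:
            setattr(layer, name,
                    torch.nn.Parameter(torch.tensor(1.0),  # placeholder default
                                       requires_grad=False))
```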
…dd missing scale attributes Signed-off-by: huangweixiao <huangweixiao@msh.team>
Update?
This PR makes `process_weights_after_loading` reusable for FP8 quantization when updating weights.
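A rough sketch of the usage pattern this enables; the orchestration code below is illustrative, with only `load_weights` and `process_weights_after_loading` assumed from vLLM's existing interfaces.

```python
def update_weights(model, new_state_dict):
    # Overwrite the raw weights in place.
    model.load_weights(new_state_dict.items())
    # Re-run FP8 post-processing; with this PR it copies into the existing
    # parameters instead of re-registering them, keeping CUDA graphs valid.
    for module in model.modules():
        quant_method = getattr(module, "quant_method", None)
        if quant_method is not None:
            quant_method.process_weights_after_loading(module)
```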