Commit 8b8c209

static_scaled_fp8_quant should not run when scale.numel is not 1 (#20076)
1 parent: 23a04e0

1 file changed: +1 −1 lines

vllm/_custom_ops.py (1 addition & 1 deletion)
@@ -1276,7 +1276,7 @@ def scaled_fp8_quant(
         torch.ops._C.dynamic_scaled_fp8_quant(output, input, scale)
     else:
         # num_token_padding not implemented for this case
-        assert (scale.numel() == 1 or num_token_padding is None)
+        assert (scale.numel() == 1 and num_token_padding is None)
         torch.ops._C.static_scaled_fp8_quant(output, input, scale)

     return output, scale
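The intent of the fix can be illustrated without torch. The static FP8 quantization kernel supports neither multi-element scales nor `num_token_padding`, but under the old `or` guard a multi-element `scale` slipped past the assertion whenever `num_token_padding` happened to be `None`. A minimal sketch of the boolean logic (the `static_path_allowed` helper is hypothetical, standing in for the assertion guarding the static branch):

```python
def static_path_allowed(scale_numel: int, num_token_padding, use_and: bool) -> bool:
    """Return True if the guard admits this call to the static kernel.

    Mirrors the assertion in vllm/_custom_ops.py: the static path
    requires a single-element scale AND no token padding.
    """
    if use_and:
        # fixed guard: both conditions must hold
        return scale_numel == 1 and num_token_padding is None
    # old, buggy guard: either condition alone was enough
    return scale_numel == 1 or num_token_padding is None

# A per-channel scale (numel > 1) with no padding requested:
old = static_path_allowed(scale_numel=128, num_token_padding=None, use_and=False)
new = static_path_allowed(scale_numel=128, num_token_padding=None, use_and=True)
print(old, new)  # old guard admits the unsupported call; fixed guard rejects it
```

With the fix, such a call trips the assertion instead of reaching `torch.ops._C.static_scaled_fp8_quant` with a scale tensor the kernel cannot handle.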
