Fix Float8Tensor quantize op kernel preference dispatch
Summary:
Previously we didn't handle kernel_preference == "fbgemm" properly for the quantize op;
this PR makes sure we dispatch to fbgemm kernels when kernel_preference is fbgemm.
This has little impact on BC: serialized checkpoints use AUTO, which still dispatches
to the triton op for quantize. The only change is fixing the kernel choice for the fbgemm
kernel preference, which is a developer-facing API (we expect most users to just use AUTO
without worrying about the details).
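To illustrate the intended behavior, here is a minimal sketch of how a quantize-op dispatch can branch on KernelPreference; the helper name `_pick_quantize_impl` and the string return values are illustrative assumptions, not the actual torchao dispatch code:

```python
# Illustrative sketch only -- the helper name and return values are assumptions,
# not the actual torchao dispatch implementation.
from torchao.quantization.quantize_.common.kernel_preference import KernelPreference


def _pick_quantize_impl(kernel_preference: KernelPreference) -> str:
    """Choose which quantize kernel to use based on the kernel preference."""
    if kernel_preference == KernelPreference.FBGEMM:
        # After this PR: the fbgemm preference actually dispatches to the
        # fbgemm_gpu_genai quantize kernels instead of falling through to the
        # default path.
        return "fbgemm_quantize"
    elif kernel_preference == KernelPreference.AUTO:
        # AUTO (what serialized checkpoints use) keeps dispatching to the
        # triton quantize op, so existing checkpoints behave the same.
        return "triton_quantize"
    else:
        # TORCH preference falls back to plain torch ops.
        return "torch_quantize"
```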
Test Plan:
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py -k test_kernel_preference_numerical_equivalence
Reviewers:
Subscribers:
Tasks:
Tags:
stack-info: PR: #2883, branch: jerryzh168/stack/59
torchao/quantization/quantize_/common/kernel_preference.py (4 additions, 0 deletions)
@@ -30,5 +30,9 @@ class KernelPreference(str, Enum):
     """

     FBGEMM = "fbgemm"
+    """Use triton quantize and quantized mm kernels (if available), requires fbgemm_gpu_genai library, if no triton kernel for the quantize op or mm kernel is available, we'll fallback to torch ops
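For reference, a hypothetical usage sketch of opting into the fbgemm kernel preference from a workflow config; the `kernel_preference` argument on Float8DynamicActivationFloat8WeightConfig is an assumption based on the workflow this PR touches, so check the actual torchao API before copying:

```python
# Hypothetical usage sketch -- the kernel_preference argument on the config is
# an assumption, not confirmed from this PR's diff; requires a CUDA device and
# the fbgemm_gpu_genai library.
import torch
from torchao.quantization import Float8DynamicActivationFloat8WeightConfig, quantize_
from torchao.quantization.quantize_.common.kernel_preference import KernelPreference

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()

# Most users should keep the default (AUTO); FBGEMM is the developer-facing
# knob whose dispatch this PR fixes.
quantize_(
    model,
    Float8DynamicActivationFloat8WeightConfig(
        kernel_preference=KernelPreference.FBGEMM,
    ),
)
```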