
[Experimental] Float8 support in AQT #671

Merged · 35 commits · Aug 28, 2024

Conversation

@jainapurva (Contributor) commented Aug 14, 2024:

Add float8 inference support to the current Affine Quantized Tensor.

Test Plan: test/dtypes/test_affine_quantized.py

pytorch-bot commented Aug 14, 2024:

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/671

✅ No failures as of commit 482f537 with merge base 9a56e80.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Aug 14, 2024
@vkuzo (Contributor) commented Aug 14, 2024:

I think a good next step would be to add numerical tests, and ensure that this new object matches the numerical behavior of Float8Tensor.
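As an illustration of the kind of check this implies (not the tests added in test/dtypes/test_affine_quantized.py; the helper name and tolerances below are made up, and only public torch ops are used):

import torch

def _float8_roundtrip(x: torch.Tensor, float8_dtype=torch.float8_e4m3fn) -> torch.Tensor:
    # Per-tensor symmetric scaling: map the max magnitude onto the largest
    # representable float8 value, cast down, then dequantize back.
    f8_max = torch.finfo(float8_dtype).max
    scale = x.abs().amax().clamp(min=1e-12) / f8_max
    return (x / scale).to(float8_dtype).to(x.dtype) * scale

def test_float8_roundtrip_close():
    x = torch.randn(64, 128)
    x_dq = _float8_roundtrip(x)
    # e4m3 keeps ~3 mantissa bits, so allow a few percent relative error.
    assert torch.allclose(x, x_dq, rtol=0.1, atol=1e-3)

A real comparison test would quantize the same input through both Float8Tensor and the new AQT path and compare the outputs, rather than relying on a standalone round-trip like this.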

@HDCharles (Contributor) commented:

Can I ask whether we're sure we should include float8 tensors in AQT instead of another paradigm?

@HDCharles closed this Aug 20, 2024
@jainapurva reopened this Aug 20, 2024
@jainapurva (Contributor, Author) commented:

> Can I ask whether we're sure we should include float8 tensors in AQT instead of another paradigm?

AQT conceptually aligns closely with fp8/fpx. Instead of writing a separate tensor subclass, it's more efficient to add float support to AQT: the concept of AQT is shared, and the major difference is the dtype. This is an experimental PR to test feasibility; the design will be modified.
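For context on the overlap: float8 conversion is the same affine scheme with the zero point fixed at 0 and the dtype cast doing the rounding. A minimal, generic sketch (the helper below is illustrative, not AQT code):

import torch

def affine_quantize(x: torch.Tensor, scale: torch.Tensor, target_dtype: torch.dtype):
    # Shared scheme: q = round(x / scale) + zero_point.
    if target_dtype.is_floating_point:
        # fp8 path: zero_point is implicitly 0 and the cast performs the rounding.
        return (x / scale).to(target_dtype)
    # intx path: explicit round and clamp to the integer range (symmetric case shown).
    qmin, qmax = torch.iinfo(target_dtype).min, torch.iinfo(target_dtype).max
    return torch.clamp(torch.round(x / scale), qmin, qmax).to(target_dtype)

x = torch.randn(16, 32)
x_f8 = affine_quantize(x, x.abs().amax() / torch.finfo(torch.float8_e4m3fn).max, torch.float8_e4m3fn)
x_i8 = affine_quantize(x, x.abs().amax() / 127, torch.int8)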

@jainapurva marked this pull request as ready for review August 20, 2024 22:04
@HDCharles (Contributor) commented:

(Sorry, didn't mean to close this; misclick.)

@@ -269,6 +270,42 @@ def from_float_static(
            dtype=input_float.dtype,
        )

    @classmethod
    def from_float_float8(
Contributor:

@jerryzh168 can you help with a design on how to have from_float, from_float_static, etc. extend to this use case? Ideally we shouldn't special-case a set of dtypes (float8) to have their own function.

Author (@jainapurva):

A combined function is better; I'll refactor it after testing float8.

Contributor:

Yeah, sure. I think we could have the two following final states:

  • have separate from_float_fpx and from_float_intx, since they have slightly different arg lists
  • if we manage to generalize the arg list enough that merging the two is reasonable, then we can merge them. I will discuss the args with Apurva and Driss, but at first glance preserve_zero is probably always going to be true and zero_point_domain may not apply here (a rough sketch of such guards follows this list)
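A rough sketch of what those guards could look like if the two paths are merged (the helper and argument names are illustrative, not code from this PR):

import torch

_FLOAT8_DTYPES = {torch.float8_e4m3fn, torch.float8_e5m2}

def _check_floatx_args(target_dtype, preserve_zero, zero_point_domain, zero_point_dtype):
    # Reject argument combinations that only make sense for integer targets.
    if target_dtype not in _FLOAT8_DTYPES:
        return
    if not preserve_zero:
        raise ValueError("float8 always preserves zero; preserve_zero=False is invalid")
    if zero_point_domain is not None:
        raise ValueError("zero_point_domain does not apply to float8 targets")
    if zero_point_dtype is not None:
        raise ValueError("zero_point_dtype does not apply to float8 targets")

# A merged from_float would call this before dispatching to the fpx or intx branch:
_check_floatx_args(torch.float8_e4m3fn, True, None, None)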

Contributor:

Makes sense.

One thought: the from_float name will become more confusing if both the source and the target can be various floating-point bitwidths. To clarify this in torchao.float8, I went with the high_precision|hp and low_precision|lp naming scheme.

@cpuhrsch (Contributor) commented Aug 21, 2024:

This is also why I like the idea of extending .to() and having our own factory functions that we can pass dtype enums to. For example:

"to_nf4",

Contributor:

@vkuzo yeah, makes sense, we can rename from_float to from_high_precision as well. As @cpuhrsch mentioned, this is not a user-facing API; we'll have factory functions for the various dtypes as the user-facing API:

to_affine_quantized = AffineQuantizedTensor.from_float
to_affine_quantized_static = AffineQuantizedTensor.from_float_static
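Following that aliasing pattern, a float8 factory could be exposed the same way. The sketch below is purely illustrative: the import path and the partial-bound argument are assumptions, and the factory that eventually landed (to_affine_quantized_floatx) defines its own argument list.

import torch
from functools import partial

from torchao.dtypes import AffineQuantizedTensor  # assumed import path

# Hypothetical float8 alias, pinning the target dtype the same way the existing
# aliases pin the constructor; assumes from_float accepts float8 target dtypes.
to_affine_quantized_float8 = partial(
    AffineQuantizedTensor.from_float, target_dtype=torch.float8_e4m3fn
)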

Contributor:

@jainapurva as discussed in the meeting, let's merge this into from_float and add some guards on arguments.

@jainapurva (Contributor, Author) commented:

> @jainapurva can you add this to https://github.com/pytorch/ao/blob/main/torchao/_models/llama/generate.py and https://github.com/pytorch/ao/blob/main/torchao/_models/llama/eval.py as well and get some e2e accuracy and perf numbers for them

Testing perf in this PR: #732

def validate_float8_params(
    input_float, mapping_type, target_dtype, quant_min, quant_max, eps, scale_dtype, zero_point_dtype, preserve_zero, zero_point_domain, layout_type, use_hqq
):
    assert input_float.is_floating_point(), "input_float must be a floating point tensor"
@jerryzh168 (Contributor) commented Aug 24, 2024:

Actually, based on this it seems most of the args are irrelevant for float8; maybe just splitting the op makes more sense?

Author (@jainapurva):

Yes, I created a new op, from_float_to_floatx. It calls the existing from_float with some pre-defined param values. In the future we'll also need to rename and refactor these methods.
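A sketch of the delegation described here, with the int-only knobs pinned up front. The argument names loosely follow the parameter list seen earlier in this thread, but the signature, defaults, and import paths are assumptions rather than the code that landed:

import torch
from torchao.dtypes import AffineQuantizedTensor                # assumed import path
from torchao.quantization.quant_primitives import MappingType   # assumed import path

def from_float_to_floatx(input_float, block_size, target_dtype=torch.float8_e4m3fn):
    # Float8 pins the int-specific parameters and reuses the shared from_float path.
    return AffineQuantizedTensor.from_float(
        input_float,
        MappingType.SYMMETRIC,    # float8 here is symmetric, with no zero point
        block_size,
        target_dtype,
        quant_min=None,           # range comes from torch.finfo(target_dtype)
        quant_max=None,
        zero_point_dtype=None,
        preserve_zero=True,
        zero_point_domain=None,
    )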

    Applies float8 weight-only symmetric per-channel quantization to linear layers.
    """
    def apply_float8wo_quant(weight):
        # avoid circular dep
Contributor:

Maybe you want to import to_affine_quantized_floatx here; I'm also refactoring this file to change the import so as to avoid the circular dep as well.
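A sketch of the deferred import being suggested (assuming to_affine_quantized_floatx is exported from torchao.dtypes; the construction of the quantized weight is left elided):

import torch

def apply_float8wo_quant(weight: torch.Tensor):
    # Deferred import: resolving to_affine_quantized_floatx at call time keeps this
    # module importable even though the dtypes module also imports from it.
    from torchao.dtypes import to_affine_quantized_floatx  # assumed export

    ...  # build the float8 weight-only quantized weight here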

@jainapurva merged commit 0916b5b into main Aug 28, 2024
16 checks passed