Add float8 FakeQuantizeConfig and FakeQuantizer #2735
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2735
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 2 Pending (as of commit 6b85230 with merge base a1a9632).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
(force-pushed from df874d5 to 481ac90)
(force-pushed from 73bca60 to 7460a2d)
```
        dtype=base_config.weight_dtype,
        granularity=weight_granularity,
    )
elif isinstance(base_config, Float8ActivationInt4WeightConfig):
```
@jerryzh168 can you confirm these config settings?
```
# TODO: don't register as custom op?
@_register_custom_op(quant_lib, False)
def _dequantize_affine_float8(
```
@jerryzh168 I'm seeing this warning. Maybe should also skip registering this custom op?
```
/home/andrewor/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/torch/autograd/graph.py:824:
UserWarning: torchao::dequantize_affine_float8: an autograd kernel was not registered to the
Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior.
This behavior is deprecated and will be removed in a future version of PyTorch.
If your operator is differentiable, please ensure you have registered an autograd kernel to
the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd).
If your operator is not differentiable, or to squash this warning and use the previous behavior,
please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd.
(Triggered internally at /pytorch/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:62.)
```
oh, we didn't know this would be a problem, we can do

ao/torchao/quantization/quant_primitives.py, line 361 (at commit 1dca638):
```
@register_custom_op
```
ok, will relax in a separate PR
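For reference, a minimal sketch of the workaround the warning itself suggests (registering an Autograd fallthrough for the existing op); this is not necessarily the fix that will land, and the `torch.library.Library` handle below is a local stand-in for torchao's own `quant_lib`:

```
import torch

# Register a fallthrough on the Autograd key for the already-registered custom op.
# This restores the previous (non-differentiable) behavior and squashes the warning,
# as suggested by the UserWarning above. Namespace and op name are taken from the warning.
_lib = torch.library.Library("torchao", "IMPL")  # stand-in handle, not torchao's quant_lib
_lib.impl(
    "dequantize_affine_float8",
    torch.library.fallthrough_kernel,
    "Autograd",
)
```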
```
    )
else:
    # targeting tinygemm kernel
    assert base_config.VERSION == 1
```
just support version 2, to minimize complexity?
Can the test plan include training a real model and verifying that the loss converges?
(force-pushed from 7f19d27 to 1162ac3)
Yes, this is in progress.
(force-pushed from 3129101 to 204e99b)
test/quantization/test_qat.py (outdated)
```
    _get_qmin_qmax,
)
from torchao.quantization.quant_api import (
    Float8ActivationInt4WeightConfig,
```
nit: we just renamed this one
test/quantization/test_qat.py (outdated)
```
        sqnr = compute_error(out, out_expected)
        self.assertGreater(sqnr, 16)

    @parameterized.expand([(PerRow(),), (PerTensor(),)])
```
nit: why not use
```
@common_utils.parametrize("granularity", [PerTensor(), PerRow()])
```
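For example, a minimal sketch of how the suggested decorator is usually wired up with `instantiate_parametrized_tests` (the class name and test body here are placeholders, not the actual test):

```
from torch.testing._internal import common_utils
from torchao.quantization.granularity import PerRow, PerTensor


class TestFloat8QAT(common_utils.TestCase):  # placeholder class name
    @common_utils.parametrize("granularity", [PerTensor(), PerRow()])
    def test_quantize_api_fp8_fp8(self, granularity):
        # Placeholder body; the real test builds a model and compares against PTQ.
        self.assertIn(type(granularity), (PerTensor, PerRow))


# Generates the parametrized variants as real test methods on the class.
common_utils.instantiate_parametrized_tests(TestFloat8QAT)

if __name__ == "__main__":
    common_utils.run_tests()
```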
test/quantization/test_qat.py (outdated)
| if "fbgemm-gpu-genai" in str(e): | ||
| self.skipTest("fbgemm-gpu-genai not available") |
nit: we can skip the test when fbgemm-gpu-genai is not installed:
```
@unittest.skipIf(
```
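A minimal sketch of that skip, assuming the installed package exposes the `fbgemm_gpu` module (the module name, availability check, and test class name are assumptions; torchao may already provide its own availability helper):

```
import importlib.util
import unittest

# Skip at collection time instead of catching the ImportError inside the test body.
_HAS_FBGEMM_GPU_GENAI = importlib.util.find_spec("fbgemm_gpu") is not None


class TestQATFloat8Int4(unittest.TestCase):  # placeholder class name
    @unittest.skipIf(not _HAS_FBGEMM_GPU_GENAI, "fbgemm-gpu-genai not available")
    def test_quantize_api_fp8_int4(self):
        ...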
test/quantization/test_qat.py (outdated)
```
        try:
            quantize_(m, QATConfig(base_config, step="prepare"))
            quantize_(m, QATConfig(base_config, step="convert"))
            m(*example_inputs)
```
Should this happen between prepare and convert as well?
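Something like the following, reusing the names from the test above (`m`, `base_config`, and `example_inputs` come from the surrounding test); a sketch of the suggestion, not the final test code:

```
from torchao.quantization import quantize_
from torchao.quantization.qat import QATConfig

quantize_(m, QATConfig(base_config, step="prepare"))
m(*example_inputs)  # also exercise the fake-quantized model between prepare and convert
quantize_(m, QATConfig(base_config, step="convert"))
m(*example_inputs)  # and the converted model, as in the original test
```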
```
        granularity=weight_granularity,
    )
elif isinstance(base_config, Float8DynamicActivationInt4WeightConfig):
    act_config = Float8FakeQuantizeConfig(
```
One thing we should pay extra attention to here is whether the simulation also works for the int4 preshuffled tensor; I think we need some numerics testing to make sure.
added an fp8-int4 numerics test
```
        max_abs = torch.clamp(max_abs, min=hp_value_lb, max=hp_value_ub)
        scale = max_abs / quant_max
    else:
        # rowwise
```
This is not necessarily rowwise, I think; I believe this branch covers all granularities, and the `len(block_size) == 0` case is more of a special case for tensorwise quant. I'm not sure where it comes from or whether it's needed; we could try to trace it and see if it can be removed as well, to reduce complexity.
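For reference, an illustrative sketch of the two scale paths being discussed (amax-based float8 scaling; the function name, the omission of the `hp_value_lb`/`hp_value_ub` clamp, and the hard-coded e4m3 max of 448 are simplifications, not the PR's code):

```
import torch

def _illustrative_float8_scale(
    x: torch.Tensor,
    per_row: bool,
    quant_max: float = 448.0,  # finite max of torch.float8_e4m3fn
) -> torch.Tensor:
    """Compute an amax-based float8 scale, either tensorwise or rowwise."""
    if per_row:
        # One scale per row: reduce over the last dimension only.
        max_abs = x.abs().amax(dim=-1, keepdim=True)
    else:
        # A single tensorwise scale: reduce over all elements.
        max_abs = x.abs().amax()
    return max_abs.to(torch.float32) / quant_max
```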
(force-pushed from 204e99b to d669f71)
(force-pushed from f002586 to e1823c6)
```
        Float8FakeQuantizeConfig(granularity=PerToken())

    @parametrize("granularity", [PerTensor(), PerRow()])
    def test_float8_fake_quantize(self, granularity: Granularity):
```
Can you add the same test for fp8_int4?
Added an SQNR comparison against PTQ fp8_int4.
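Roughly along these lines, as a sketch of such a numerics check (`m`, `base_config`, and `example_inputs` stand in for the test's model, fp8-int4 PTQ config, and inputs; the SQNR threshold of 16 mirrors the existing test's bound and is not the PR's exact code):

```
import copy

from torchao.quantization import quantize_
from torchao.quantization.qat import QATConfig
from torchao.quantization.utils import compute_error

# Compare the fake-quantized (QAT prepare) model against the actually quantized (PTQ) model.
m_qat = copy.deepcopy(m)
quantize_(m_qat, QATConfig(base_config, step="prepare"))
m_ptq = copy.deepcopy(m)
quantize_(m_ptq, base_config)

out_qat = m_qat(*example_inputs)
out_ptq = m_ptq(*example_inputs)
sqnr = compute_error(out_qat, out_ptq)
assert sqnr > 16, f"QAT vs PTQ SQNR too low: {sqnr}"
```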
(force-pushed from e1823c6 to bd0bd9d)
**Summary:** This commit adds a QAT path for float8, using the
same primitives as `torchao.quantization.Float8Tensor`, targeting
the following PTQ configs:
- `Float8DynamicActivationFloat8WeightConfig`
- `Float8DynamicActivationInt4WeightConfig`
Usage:
```
import torch

from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    quantize_,
)
from torchao.quantization.granularity import PerRow
from torchao.quantization.qat import QATConfig

base_config = Float8DynamicActivationFloat8WeightConfig(
    activation_dtype=torch.float8_e4m3fn,
    granularity=PerRow(),
)
quantize_(model, QATConfig(base_config, step="prepare"))
quantize_(model, QATConfig(base_config, step="convert"))
```
OR
```
import torch

from torchao.quantization import quantize_
from torchao.quantization.granularity import PerRow
from torchao.quantization.qat import (
    Float8FakeQuantizeConfig,
    QATConfig,
)

dtype = torch.float8_e4m3fn
granularity = PerRow()
quantize_(model, QATConfig(
activation_config=Float8FakeQuantizeConfig(dtype, granularity),
weight_config=Float8FakeQuantizeConfig(dtype, granularity),
step="prepare",
))
# convert (same as above, not shown)
```
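In both variants the usual QAT flow applies: `step="prepare"` inserts high-precision fake quantization so fine-tuning sees the float8 quantization error, and `step="convert"` then swaps the fake-quantized modules for the real quantized model.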
**Test Plan:**
```
python test/quantization/test_qat.py -k test_float8_fake_quantize_config
python test/quantization/test_qat.py -k test_float8_fake_quantize
python test/quantization/test_qat.py -k test_quantize_api_fp8_fp8
python test/quantization/test_qat.py -k test_quantize_api_fp8_int4
```
(force-pushed from bd0bd9d to 6b85230)
**Test Plan:**
Identical outputs between normal bf16 and QAT fine-tuning for both fp8-fp8 and fp8-int4, reproduced on Llama3.1 using this unsloth notebook. Loss curves also overlap almost exactly (not shown).