Static quant support for SmoothQuant #3089
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3089
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures — As of commit abfce41 with merge base 4013764, the following jobs have failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Xia-Weiwen I think it's better to wait until the Int8Tensor migration is done.
Thanks for the info.
```diff
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = (
-     AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16)
+     AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
```
nit: torch_dtype is deprecated; please check #2982 for more info
```diff
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = (
-     AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16)
+     AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
```
Same here.
```diff
  torch.manual_seed(34)
  w8a8_model = (
-     AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16)
+     AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
```
and here :)
```python
    model_save_path: str,
    model_save_hf_hub_path: str,
    static_quant_act: bool,
    compile: bool,
```
Could you share results for torch.compile with static quant? I'm not sure of the reason, but it decreased tokens/sec with dynamic quant, and we discussed removing it in #2728 (comment).
This PR is out of date. We need to use the new int8 tensor API.
Summary

This PR adds static quant support for SmoothQuant by adding a new `Int8StaticActivationInt8WeightConfig` configuration. Static quantization will generally have better latency and throughput than dynamic quant, as it saves the overhead of runtime qparam selection.

In the implementation:
- `SmoothQuantObserver` returns the activation scale along with the smoothing factor.
- `Int8StaticActivationInt8WeightConfig` is used for the transformation of each linear layer.
- `Int8StaticActivationInt8WeightConfig` is not suitable for general static quantization (although it works); users should use PT2E in that case. This is because the activation scale for the config is global instead of per-linear-layer, the same as `Float8StaticActivationFloat8WeightConfig`.

Test plan

This PR also updates the test cases for SmoothQuant:
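As background to the summary above, the smoothing step that `SmoothQuantObserver` supports can be sketched in a few lines. This is an illustrative, self-contained sketch only, not the torchao implementation; all function names here are hypothetical. The idea is that quantization difficulty is shifted from activations to weights by a per-input-channel factor s, using the identity (x / s) @ diag(s) W = x @ W.

```python
# Hedged sketch of the SmoothQuant smoothing idea (hypothetical names,
# not torchao code): per-input-channel factors move outlier magnitude
# from activations into the weights without changing the matmul result.

def smoothing_factors(act_absmax, weight_absmax, alpha=0.5):
    # s_j = max|x_j|^alpha / max|w_j|^(1 - alpha) for input channel j.
    return [a ** alpha / w ** (1.0 - alpha)
            for a, w in zip(act_absmax, weight_absmax)]

def smooth(x, w_rows, s):
    # Scale each activation channel down and the matching weight row up.
    x_s = [xj / sj for xj, sj in zip(x, s)]
    w_s = [[sj * wij for wij in row] for sj, row in zip(s, w_rows)]
    return x_s, w_s

def matvec(x, w_rows):
    # y_i = sum_j x_j * w[j][i]; w_rows is indexed by input channel.
    n = len(w_rows[0])
    return [sum(x[j] * w_rows[j][i] for j in range(len(x))) for i in range(n)]

x = [8.0, 0.5]                     # channel 0 is an activation outlier
w = [[0.2, -0.1], [1.0, 0.5]]      # weight rows, indexed by input channel
s = smoothing_factors([8.0, 0.5], [0.2, 1.0])
x_s, w_s = smooth(x, w, s)
# After smoothing, max|x_s| is much smaller than max|x| (easier to
# quantize), while matvec(x_s, w_s) equals matvec(x, w) up to rounding.
```

The per-channel factors are folded into the weights offline, so only the cheap activation division remains at runtime; with static quant, the activation scale observed during calibration is reused as well.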