
Conversation

@lisjin (Contributor) commented Sep 18, 2025

This is a follow-up to #3015. I found that setting model.config.quantization_config with a "_default" key will quantize not just linear layer weights, but all weights, including normalization layers.

The culprit is still this line from TorchAoHfQuantizer.create_quantized_param:

quantize_(module, c, filter_fn=lambda x, fqn: True)  # this filter matches every module, not just nn.Linear

In order to avoid quantizing non-linear weights, I had to manually add all module names for normalization layers to modules_to_not_convert, which is an argument to TorchAoConfig.
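
For concreteness, a minimal sketch of that workaround, assuming a recent transformers + torchao pairing in which TorchAoConfig accepts a torchao config object for quant_type; the module names are illustrative and depend on the actual model architecture:

from transformers import TorchAoConfig
from torchao.quantization import Int8WeightOnlyConfig

quantization_config = TorchAoConfig(
    quant_type=Int8WeightOnlyConfig(),
    modules_to_not_convert=[
        "model.norm",                              # final norm layer
        "model.layers.0.input_layernorm",          # one entry per per-layer norm
        "model.layers.0.post_attention_layernorm",
    ],
)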

@andrewor14 Do you know if this is the intended behavior? I don't see why they aren't using the default filter_fn for quantize_.

@pytorch-bot (bot) commented Sep 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3030

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1572b34 with merge base 8525185:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 18, 2025
@lisjin lisjin added the topic: bug fix Use this tag for PRs that fix bugs label Sep 18, 2025
@andrewor14 (Contributor)

I think that's the intended behavior but just wanted to cc @jerryzh168 to confirm.

There are two ways to skip quantizing certain layers today:

  1. ModuleFqnToConfig, more flexible but also a bit more verbose
  2. modules_to_not_convert, just specify a list of module names to not convert

However, neither of these accepts regexes today, so you'll have to specify all the modules you want to skip manually, which may be a bit brittle.
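
A hedged sketch of option 1 (ModuleFqnToConfig); the FQNs are illustrative, and the constructor keyword module_fqn_to_config plus the use of None to mean "do not convert" are assumptions about the current torchao API:

from torchao.quantization import Int8WeightOnlyConfig, ModuleFqnToConfig

quant_config = ModuleFqnToConfig(
    module_fqn_to_config={
        "_default": Int8WeightOnlyConfig(),       # applied to modules not listed explicitly
        "model.norm": None,                       # None: leave this module unconverted
        "model.layers.0.input_layernorm": None,
    }
)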

@lisjin lisjin force-pushed the lvj/hf-quant-config branch 3 times, most recently from 80e9c51 to 9b781c6 Compare September 21, 2025 17:01
@lisjin lisjin force-pushed the lvj/hf-quant-config branch 4 times, most recently from e6c994c to 1697fee Compare September 21, 2025 18:41
Review thread on this hunk of the PR's test code:

        self.embedding = embedding
        self.tied_weights = tied_weights

class PreTrainedM(M, PreTrainedModel):
@jerryzh168 (Contributor) commented Sep 22, 2025

Oh, I meant just using some specific model defined in transformers and using the public APIs. Just making sure: would the tests work for existing models in transformers?

@lisjin (Contributor, Author) replied:
Yes, existing transformers models also inherit from PreTrainedModel. AutoModelForCausalLM.from_pretrained(..., quantization_config=quantization_config) can be tested in the same way.
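
For illustration, a hedged sketch of that kind of end-to-end check; the checkpoint name is a placeholder, and the loop is only a spot-check rather than the exact test added in this PR:

from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import Int8WeightOnlyConfig

quantization_config = TorchAoConfig(quant_type=Int8WeightOnlyConfig())
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # placeholder checkpoint
    quantization_config=quantization_config,
)

# Inspect weight types to see which modules were actually converted.
for name, module in model.named_modules():
    if hasattr(module, "weight") and module.weight is not None:
        print(name, type(module.weight))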

@jerryzh168 (Contributor) left a comment

lg, see comments inline; just want to make sure that the APIs we are using are applicable to all Hugging Face models.

@lisjin lisjin merged commit fb7c837 into main Sep 22, 2025
18 checks passed
@lisjin lisjin deleted the lvj/hf-quant-config branch September 22, 2025 18:41
@metascroy (Contributor) commented Sep 26, 2025

I think that's the intended behavior but just wanted to cc @jerryzh168 to confirm.

This is not at all how I would expect "_default" to behave, since the default in quantize_ is to apply only to linear layers.

@jerryzh168 does this behavior mean we're quantizing more than we expect in our hummingbird models?

@jerryzh168 (Contributor) commented Sep 26, 2025

This is not at all how I would expect "_default" to behave, since the default in quantize_ is to apply only to linear layers.

Yeah, this is the intended behavior, since now the user can do the filtering manually when they generate ModuleFqnToConfig.

It will still fail if the module does not have .weight, though, due to the implementation details of quantize_.

For the models we quantize, I think it's still just linears that are quantized; otherwise, lines like these are likely to fail:

new_weight, new_bias = _int8_dynamic_activation_intx_weight_quantize_tensor(
    module.weight,
    module.bias,
    config,
    custom_scale=custom_scale,
    custom_zero_point=custom_zero_point,
)
module.weight = torch.nn.Parameter(new_weight, requires_grad=False)
if new_bias is None:
    module.bias = None
if isinstance(module, nn.Linear):
    module.extra_repr = types.MethodType(_linear_extra_repr, module)
return module

Although it does make sense to have better error messages when it's not what we expect.
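
For reference, a minimal sketch of the default behavior metascroy is describing, assuming a recent torchao release that exports Int8WeightOnlyConfig: with no filter_fn, quantize_ matches only nn.Linear modules, so norm weights are left alone.

import torch
from torchao.quantization import Int8WeightOnlyConfig, quantize_

model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.LayerNorm(64),
)
quantize_(model, Int8WeightOnlyConfig())  # default filter matches only linear layers
print(model[0].weight)  # expected: a torchao quantized tensor subclass
print(model[1].weight)  # expected: a plain float Parameter, untouched

By contrast, the Hugging Face integration discussed above passes filter_fn=lambda x, fqn: True, which is why non-linear weights can get converted there.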
