
Conversation

@lisjin (Contributor) commented Sep 18, 2025

This is a follow-up to #3015. I found that setting model.config.quantization_config with a "_default" key will quantize not just linear layer weights, but all weights, including normalization layers.

The culprit is still this line from TorchAoHfQuantizer.create_quantized_param:

quantize_(module, c, filter_fn=lambda x, fqn: True)  # this filter matches every module, not just nn.Linear

In order to avoid quantizing non-linear weights, I had to manually add all module names for normalization layers to modules_to_not_convert, which is an argument to TorchAoConfig.
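
For concreteness, a minimal sketch of that workaround, assuming a recent transformers + torchao pairing in which TorchAoConfig accepts a torchao config object for quant_type; the module names are illustrative and depend on the actual model architecture:

from transformers import TorchAoConfig
from torchao.quantization import Int8WeightOnlyConfig

quantization_config = TorchAoConfig(
    quant_type=Int8WeightOnlyConfig(),
    modules_to_not_convert=[
        "model.norm",                              # final norm layer
        "model.layers.0.input_layernorm",          # one entry per per-layer norm
        "model.layers.0.post_attention_layernorm",
    ],
)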

@andrewor14 Do you know if this is the intended behavior? I don't see why they aren't using the default filter_fn for quantize_.

@pytorch-bot (bot) commented Sep 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3030

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1572b34 with merge base 8525185:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 18, 2025
@lisjin lisjin added the topic: bug fix Use this tag for PRs that fix bugs label Sep 18, 2025
@andrewor14 (Contributor)

I think that's the intended behavior but just wanted to cc @jerryzh168 to confirm.

There are two ways to skip quantizing certain layers today:

  1. ModuleFqnToConfig, more flexible but also a bit more verbose
  2. modules_to_not_convert, just specify a list of module names to not convert

However, neither of these accepts regexes today, so you'll have to specify all the modules you want to skip manually, which may be a bit brittle.
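
A hedged sketch of option 1 (ModuleFqnToConfig); the FQNs are illustrative, and the constructor keyword module_fqn_to_config plus the use of None to mean "do not convert" are assumptions about the current torchao API:

from torchao.quantization import Int8WeightOnlyConfig, ModuleFqnToConfig

quant_config = ModuleFqnToConfig(
    module_fqn_to_config={
        "_default": Int8WeightOnlyConfig(),       # applied to modules not listed explicitly
        "model.norm": None,                       # None: leave this module unconverted
        "model.layers.0.input_layernorm": None,
    }
)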

@lisjin lisjin force-pushed the lvj/hf-quant-config branch 3 times, most recently from 80e9c51 to 9b781c6 Compare September 21, 2025 17:01
@lisjin lisjin force-pushed the lvj/hf-quant-config branch 4 times, most recently from e6c994c to 1697fee Compare September 21, 2025 18:41
Review thread on this hunk of the PR's test code:

        self.embedding = embedding
        self.tied_weights = tied_weights

class PreTrainedM(M, PreTrainedModel):
@jerryzh168 (Contributor) commented Sep 22, 2025

Oh, I meant just using some specific model defined in transformers and using the public APIs. Just making sure: would the tests work for existing models in transformers?

@lisjin (Contributor, Author) replied:
Yes, existing transformers models also inherit from PreTrainedModel. AutoModelForCausalLM.from_pretrained(..., quantization_config=quantization_config) can be tested in the same way.
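
For illustration, a hedged sketch of that kind of end-to-end check; the checkpoint name is a placeholder, and the loop is only a spot-check rather than the exact test added in this PR:

from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import Int8WeightOnlyConfig

quantization_config = TorchAoConfig(quant_type=Int8WeightOnlyConfig())
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # placeholder checkpoint
    quantization_config=quantization_config,
)

# Inspect weight types to see which modules were actually converted.
for name, module in model.named_modules():
    if hasattr(module, "weight") and module.weight is not None:
        print(name, type(module.weight))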

@jerryzh168 (Contributor) left a comment

lg, see comments inline; just want to make sure that the APIs we are using are applicable to all Hugging Face models.

@lisjin lisjin merged commit fb7c837 into main Sep 22, 2025
18 checks passed
@lisjin lisjin deleted the lvj/hf-quant-config branch September 22, 2025 18:41
@metascroy (Contributor) commented Sep 26, 2025

I think that's the intended behavior but just wanted to cc @jerryzh168 to confirm.

This is not at all how I would expect "_default" to behave, since the default in quantize_ is to apply only to linear layers.

@jerryzh168 does this behavior mean we're quantizing more than we expect in our hummingbird models?

@jerryzh168 (Contributor) commented Sep 26, 2025

This is not at all how I would expect "_default" to behave, since the default in quantize_ is to apply only to linear layers.

Yeah, this is the intended behavior, since now the user can do the filtering manually when they generate ModuleFqnToConfig.

It will still fail if the module does not have .weight, though, due to the implementation details of quantize_.

For the models we quantize, I think it's still just linears that are quantized; otherwise, lines like these are likely to fail:

new_weight, new_bias = _int8_dynamic_activation_intx_weight_quantize_tensor(
    module.weight,
    module.bias,
    config,
    custom_scale=custom_scale,
    custom_zero_point=custom_zero_point,
)
module.weight = torch.nn.Parameter(new_weight, requires_grad=False)
if new_bias is None:
    module.bias = None
if isinstance(module, nn.Linear):
    module.extra_repr = types.MethodType(_linear_extra_repr, module)
return module

Although it does make sense to have better error messages when it's not what we expect.
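
For reference, a minimal sketch of the default behavior metascroy is describing, assuming a recent torchao release that exports Int8WeightOnlyConfig: with no filter_fn, quantize_ matches only nn.Linear modules, so norm weights are left alone.

import torch
from torchao.quantization import Int8WeightOnlyConfig, quantize_

model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.LayerNorm(64),
)
quantize_(model, Int8WeightOnlyConfig())  # default filter matches only linear layers
print(model[0].weight)  # expected: a torchao quantized tensor subclass
print(model[1].weight)  # expected: a plain float Parameter, untouched

By contrast, the Hugging Face integration discussed above passes filter_fn=lambda x, fqn: True, which is why non-linear weights can get converted there.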
