Avoid normalization layers in HF's quantization_config #3030
Conversation
I think that's the intended behavior, but just wanted to cc @jerryzh168 to confirm. There are two ways to skip quantizing certain layers today: `modules_to_not_convert` in `TorchAoConfig`, or mapping the module FQNs to `None` in `ModuleFqnToConfig`.
However, neither of these accepts regex today, so you'll have to specify all the modules you want to skip manually, which may be a bit brittle.
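A minimal sketch of both options, assuming a recent transformers/torchao pairing; the module FQNs below (e.g. `model.norm`) are placeholders for whatever the target architecture actually uses:

```python
# Sketch only, not from this PR: two ways to leave specific modules unquantized.
# Config classes and import paths may differ slightly across torchao versions.
from transformers import TorchAoConfig
from torchao.quantization import Int8WeightOnlyConfig, ModuleFqnToConfig

# Option 1: list the modules to skip directly on the HF-side config.
hf_config = TorchAoConfig(
    quant_type=Int8WeightOnlyConfig(),
    modules_to_not_convert=["model.norm", "model.layers.0.input_layernorm"],
)

# Option 2: map each FQN to a config; None means "leave this module alone".
per_module = ModuleFqnToConfig({
    "_default": Int8WeightOnlyConfig(),  # applied to everything not listed
    "model.norm": None,                  # explicitly skipped
})
hf_config_per_module = TorchAoConfig(quant_type=per_module)
```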
Code under review:

```python
        self.embedding = embedding
        self.tied_weights = tied_weights


class PreTrainedM(M, PreTrainedModel):
```
Oh, I meant just using some specific model defined in transformers and going through the public APIs. Just making sure: would the tests work for existing models in transformers?
Yes, existing transformers models also inherit from `PreTrainedModel`, so `AutoModelForCausalLM.from_pretrained(..., quantization_config=quantization_config)` can be tested in the same way.
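For example, a sketch of that kind of end-to-end test; the checkpoint name is illustrative rather than the one used in the PR, and it assumes a recent transformers/torchao install:

```python
# Sketch: exercising the same quantization_config through the public HF API.
from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import Int8WeightOnlyConfig

quantization_config = TorchAoConfig(quant_type=Int8WeightOnlyConfig())
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",              # illustrative checkpoint
    quantization_config=quantization_config,
    device_map="auto",                # requires accelerate
)
```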
LG, see comments inline. Just want to make sure that the APIs we are using are applicable to all Hugging Face models.
This is not at all what I would expect for how `_default` behaves, since the "default" in `quantize_` is to only apply to linear layers. @jerryzh168, does this behavior mean we're quantizing more than we expect in our hummingbird models?
Yeah, this is the intended behavior, since the user can now do the filtering manually when they generate `ModuleFqnToConfig`. It will still fail if the module does not have a weight. For the models we quantize, I think it's still just linears that are quantized; otherwise these are likely to fail: ao/torchao/quantization/quant_api.py, lines 965 to 977 in 3d48174.
Although it does make sense to have better error messages when it's not what we expect.
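A quick way to check this on a given model, as a sketch (the helper name is made up here): print each module's class and its weight's runtime type, so any non-linear module whose weight turned into a quantized tensor subclass stands out.

```python
# Sketch: sanity-check which modules ended up with quantized weights.
import torch.nn as nn

def report_weight_types(model: nn.Module) -> None:
    """Print module name, module class, and weight type for every module with a weight."""
    for name, module in model.named_modules():
        weight = getattr(module, "weight", None)
        if weight is None:
            continue
        # After torchao quantization, the weight's type is typically a tensor
        # subclass instead of a plain Parameter, which makes surprises easy to spot.
        print(f"{name:60s} {type(module).__name__:20s} {type(weight).__name__}")
```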
This is a follow-up to #3015. I found that setting `model.config.quantization_config` with a `"_default"` key will quantize not just linear layer weights, but all weights, including normalization layers. The culprit is still this line from `TorchAoHfQuantizer.create_quantized_param`.

In order to avoid quantizing non-linear weights, I had to manually add all module names for normalization layers to `modules_to_not_convert`, which is an argument to `TorchAoConfig`.

@andrewor14 Do you know if this is the intended behavior? I don't see why they aren't using the default `filter_fn` for `quantize_`.
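For context on that last point, here is a small sketch of the default behavior in question: when `quantize_` is called without a `filter_fn`, torchao only converts `nn.Linear` modules, so a norm layer's weight is left untouched (exact config class and printed types depend on the torchao version):

```python
# Sketch: quantize_'s default filter_fn only matches nn.Linear, so the LayerNorm
# below should keep its plain weight. Tensor subclass names vary by torchao version.
import torch.nn as nn
from torchao.quantization import quantize_, Int8WeightOnlyConfig

model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16))
quantize_(model, Int8WeightOnlyConfig())  # no filter_fn passed: linear-only default

print(type(model[0].weight))  # expected: a quantized tensor subclass
print(type(model[1].weight))  # expected: a plain nn.Parameter
```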