Add `quantize` #256

jerryzh168 · 2024-05-18T00:45:48Z

Summary:
This exposes the AffineQuantizedTensor and LinearActQuantizedTensor subclass as a model level API that will replace the weights of linear layers This is in preparation to replace existing tensor subclass APIs such as change_linear_weights_to_int4_woqtensors

Test Plan:
regression tests:

python test/quantization/test_quant_api.py

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot · 2024-05-18T00:45:50Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/256

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 82ec155 with merge base 163cb93 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

torchao/quantization/quant_api.py

torchao/quantization/subclass.py

cpuhrsch

Please seen inline comment

test/quantization/test_quant_api.py

torchao/quantization/quant_api.py

Summary: This exposes the AffineQuantizedTensor and LinearActQuantizedTensor subclass as a model level API that will replace the weights of linear layers This is in preparation to replace existing tensor subclass APIs such as `change_linear_weights_to_int4_woqtensors` but currently we can't combine the two quantizers due to some problem with parametrization/nn.Parameter the error is: raise KeyError(f"attribute '{name}' already exists") KeyError: "attribute 'weight' already exists" happens in ``` lin.weight = torch.nn.Parameter(constructor(lin.weight, **copied_kwargs), requires_grad=False) ``` Test Plan: regression tests: ``` python test/quantization/test_quant_api.py ``` Reviewers: Subscribers: Tasks: Tags:

…antize` Summary: Previously we added `quantize` as a general API (pytorch#256) for Affine Quantized tensor subclass, and also tensor subclass based dtype conversion in general. The plan is to use this to replace existing quant APIs including int4 weight only, int8 weight only, int8 dynamic quant and 8da4w (for executorch). This PR we started replacing the implementation of int8 dynamic quant API with `quantize` API with affine quantized tensor subclass. We'll make sure the performance does not regress for vit model. Test Plan: TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py reference: elapsed_time: 1.4821058654785155 milliseconds after refactor: elapsed_time: 1.4804757690429688 milliseconds generated code diff: https://gist.github.com/jerryzh168/90c71107a5aaaa5d8dd2170c573e076d Reviewers: Subscribers: Tasks: Tags:

Summary: Previously we added `quantize` as a general API (#256) for Affine Quantized tensor subclass, and also tensor subclass based dtype conversion in general. The plan is to use this to replace existing quant APIs including int4 weight only, int8 weight only, int8 dynamic quant and 8da4w (for executorch). This PR we started replacing the implementation of int8 dynamic quant API with `quantize` API with affine quantized tensor subclass. We'll make sure the performance does not regress for vit model. Test Plan: TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py reference: elapsed_time: 1.4821058654785155 milliseconds after refactor: elapsed_time: 1.4804757690429688 milliseconds generated code diff: https://gist.github.com/jerryzh168/90c71107a5aaaa5d8dd2170c573e076d Reviewers: Subscribers: Tasks: Tags:

…antize` Summary: Previously we added `quantize` as a general API (pytorch#256) for Affine Quantized tensor subclass, and also tensor subclass based dtype conversion in general. The plan is to use this to replace existing quant APIs including int4 weight only, int8 weight only, int8 dynamic quant and 8da4w (for executorch). This PR we started replacing the implementation of int8 dynamic quant API with `quantize` API with affine quantized tensor subclass. We'll make sure the performance does not regress for vit model. Test Plan: TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py reference: elapsed_time: 1.4821058654785155 milliseconds after refactor: elapsed_time: 1.4804757690429688 milliseconds generated code diff: https://gist.github.com/jerryzh168/90c71107a5aaaa5d8dd2170c573e076d Reviewers: Subscribers: Tasks: Tags:

* Replace implementation for int8 dynamic quantization with call to `quantize` Summary: Previously we added `quantize` as a general API (#256) for Affine Quantized tensor subclass, and also tensor subclass based dtype conversion in general. The plan is to use this to replace existing quant APIs including int4 weight only, int8 weight only, int8 dynamic quant and 8da4w (for executorch). This PR we started replacing the implementation of int8 dynamic quant API with `quantize` API with affine quantized tensor subclass. We'll make sure the performance does not regress for vit model. Test Plan: TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py reference: elapsed_time: 1.4821058654785155 milliseconds after refactor: elapsed_time: 1.4804757690429688 milliseconds generated code diff: https://gist.github.com/jerryzh168/90c71107a5aaaa5d8dd2170c573e076d Reviewers: Subscribers: Tasks: Tags: * Refactor int8 weight only quant to use `quantize` Summary: Similar to #294 we replaced the implementation of int8 weight only quant to used the newly added `quantize` function, as a part of the unification effort for affine quantization Test Plan: 1. unit perf test: python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int8_wo_quant_perf elapsed time: 0.23909856796264647, ref elapsed time: 0.25150911331176756 elapsed time: 0.24894208908081056, ref elapsed time: 0.2570047950744629 elapsed time: 0.21607391357421876, ref elapsed time: 0.22809568405151368 2. integration test: TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py Reference: elapsed_time: 1.355208740234375 milliseconds After refactor: elapsed_time: 1.32778857421875 milliseconds code diff (gist): https://gist.github.com/jerryzh168/921a722cf20d476c8fc5888482e722dc code diff (meta-only paste): https://www.internalfb.com/phabricator/paste/view/P1387333845 Reviewers: Subscribers: Tasks: Tags: * Replace implementation for int8 dynamic quantization with call to `quantize` Summary: Previously we added `quantize` as a general API (#256) for Affine Quantized tensor subclass, and also tensor subclass based dtype conversion in general. The plan is to use this to replace existing quant APIs including int4 weight only, int8 weight only, int8 dynamic quant and 8da4w (for executorch). This PR we started replacing the implementation of int8 dynamic quant API with `quantize` API with affine quantized tensor subclass. We'll make sure the performance does not regress for vit model. Test Plan: TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py reference: elapsed_time: 1.4821058654785155 milliseconds after refactor: elapsed_time: 1.4804757690429688 milliseconds generated code diff: https://gist.github.com/jerryzh168/90c71107a5aaaa5d8dd2170c573e076d Reviewers: Subscribers: Tasks: Tags: * Refactor int4 weight only quantization with call to `quantize` Summary: This is similar to #294 but applied for int4 weight only quantization Test Plan: unit perf test: python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int4_wo_quant_perf elapsed time: 0.2166275215148926, ref elapsed time: 0.2191881561279297 elapsed time: 0.2376406478881836, ref elapsed time: 0.22721023559570314 elapsed time: 0.21919679641723633, ref elapsed time: 0.2154969596862793 integration perf test: reference: elapsed_time: 2.5900126953125 milliseconds after refactor: elapsed_time: 2.56680078125 milliseconds diff: no diff TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py Before: After: generated code diff: Reviewers: Subscribers: Tasks: Tags: --------- Co-authored-by: Mark Saroufim <marksaroufim@meta.com>

Summary: This exposes the AffineQuantizedTensor and LinearActQuantizedTensor subclass as a model level API that will replace the weights of linear layers This is in preparation to replace existing tensor subclass APIs such as `change_linear_weights_to_int4_woqtensors` but currently we can't combine the two quantizers due to some problem with parametrization/nn.Parameter the error is: raise KeyError(f"attribute '{name}' already exists") KeyError: "attribute 'weight' already exists" happens in ``` lin.weight = torch.nn.Parameter(constructor(lin.weight, **copied_kwargs), requires_grad=False) ``` Test Plan: regression tests: ``` python test/quantization/test_quant_api.py ``` Reviewers: Subscribers: Tasks: Tags:

Summary: Previously we added `quantize` as a general API (pytorch#256) for Affine Quantized tensor subclass, and also tensor subclass based dtype conversion in general. The plan is to use this to replace existing quant APIs including int4 weight only, int8 weight only, int8 dynamic quant and 8da4w (for executorch). This PR we started replacing the implementation of int8 dynamic quant API with `quantize` API with affine quantized tensor subclass. We'll make sure the performance does not regress for vit model. Test Plan: TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py reference: elapsed_time: 1.4821058654785155 milliseconds after refactor: elapsed_time: 1.4804757690429688 milliseconds generated code diff: https://gist.github.com/jerryzh168/90c71107a5aaaa5d8dd2170c573e076d Reviewers: Subscribers: Tasks: Tags:

…torch#301) * Replace implementation for int8 dynamic quantization with call to `quantize` Summary: Previously we added `quantize` as a general API (pytorch#256) for Affine Quantized tensor subclass, and also tensor subclass based dtype conversion in general. The plan is to use this to replace existing quant APIs including int4 weight only, int8 weight only, int8 dynamic quant and 8da4w (for executorch). This PR we started replacing the implementation of int8 dynamic quant API with `quantize` API with affine quantized tensor subclass. We'll make sure the performance does not regress for vit model. Test Plan: TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py reference: elapsed_time: 1.4821058654785155 milliseconds after refactor: elapsed_time: 1.4804757690429688 milliseconds generated code diff: https://gist.github.com/jerryzh168/90c71107a5aaaa5d8dd2170c573e076d Reviewers: Subscribers: Tasks: Tags: * Refactor int8 weight only quant to use `quantize` Summary: Similar to pytorch#294 we replaced the implementation of int8 weight only quant to used the newly added `quantize` function, as a part of the unification effort for affine quantization Test Plan: 1. unit perf test: python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int8_wo_quant_perf elapsed time: 0.23909856796264647, ref elapsed time: 0.25150911331176756 elapsed time: 0.24894208908081056, ref elapsed time: 0.2570047950744629 elapsed time: 0.21607391357421876, ref elapsed time: 0.22809568405151368 2. integration test: TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py Reference: elapsed_time: 1.355208740234375 milliseconds After refactor: elapsed_time: 1.32778857421875 milliseconds code diff (gist): https://gist.github.com/jerryzh168/921a722cf20d476c8fc5888482e722dc code diff (meta-only paste): https://www.internalfb.com/phabricator/paste/view/P1387333845 Reviewers: Subscribers: Tasks: Tags: * Replace implementation for int8 dynamic quantization with call to `quantize` Summary: Previously we added `quantize` as a general API (pytorch#256) for Affine Quantized tensor subclass, and also tensor subclass based dtype conversion in general. The plan is to use this to replace existing quant APIs including int4 weight only, int8 weight only, int8 dynamic quant and 8da4w (for executorch). This PR we started replacing the implementation of int8 dynamic quant API with `quantize` API with affine quantized tensor subclass. We'll make sure the performance does not regress for vit model. Test Plan: TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py reference: elapsed_time: 1.4821058654785155 milliseconds after refactor: elapsed_time: 1.4804757690429688 milliseconds generated code diff: https://gist.github.com/jerryzh168/90c71107a5aaaa5d8dd2170c573e076d Reviewers: Subscribers: Tasks: Tags: * Refactor int4 weight only quantization with call to `quantize` Summary: This is similar to pytorch#294 but applied for int4 weight only quantization Test Plan: unit perf test: python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int4_wo_quant_perf elapsed time: 0.2166275215148926, ref elapsed time: 0.2191881561279297 elapsed time: 0.2376406478881836, ref elapsed time: 0.22721023559570314 elapsed time: 0.21919679641723633, ref elapsed time: 0.2154969596862793 integration perf test: reference: elapsed_time: 2.5900126953125 milliseconds after refactor: elapsed_time: 2.56680078125 milliseconds diff: no diff TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py Before: After: generated code diff: Reviewers: Subscribers: Tasks: Tags: --------- Co-authored-by: Mark Saroufim <marksaroufim@meta.com>

* fix int4pack for AOTI on x86 * fixes * fixes * handle cuda in4pack_mm * typo

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 18, 2024

jerryzh168 force-pushed the add-api branch from 1084cf2 to ec109ef Compare May 18, 2024 00:47

jerryzh168 requested review from cpuhrsch and HDCharles May 18, 2024 00:52

jerryzh168 force-pushed the add-api branch 2 times, most recently from 5541f43 to 19d8d68 Compare May 20, 2024 22:09

cpuhrsch reviewed May 21, 2024

View reviewed changes

torchao/quantization/quant_api.py Outdated Show resolved Hide resolved

cpuhrsch reviewed May 21, 2024

View reviewed changes

torchao/quantization/quant_api.py Outdated Show resolved Hide resolved

cpuhrsch approved these changes May 21, 2024

View reviewed changes

jerryzh168 force-pushed the add-api branch from 21d94fb to dff6a45 Compare May 22, 2024 23:44

jerryzh168 changed the title ~~Add WeightQuantizer and DynamicActQuantizer~~ Add TensorSubclassQuantizer May 22, 2024

jerryzh168 requested a review from albanD May 22, 2024 23:46

jerryzh168 force-pushed the add-api branch 3 times, most recently from bd75fb2 to e325d4c Compare May 23, 2024 00:13

cpuhrsch reviewed May 23, 2024

View reviewed changes

torchao/quantization/subclass.py Show resolved Hide resolved

cpuhrsch requested changes May 23, 2024

View reviewed changes

cpuhrsch reviewed May 23, 2024

View reviewed changes

test/quantization/test_quant_api.py Outdated Show resolved Hide resolved

cpuhrsch reviewed May 23, 2024

View reviewed changes

torchao/quantization/quant_api.py Outdated Show resolved Hide resolved

jerryzh168 force-pushed the add-api branch 4 times, most recently from 1bc8d18 to 049cf34 Compare May 24, 2024 02:15

jerryzh168 requested a review from cpuhrsch May 24, 2024 02:15

jerryzh168 force-pushed the add-api branch 2 times, most recently from e0db22b to 8b4e88e Compare May 24, 2024 02:24

jerryzh168 changed the title ~~Add TensorSubclassQuantizer~~ Add quantize May 24, 2024

jerryzh168 force-pushed the add-api branch from 8b4e88e to ac57a0c Compare May 24, 2024 02:53

jerryzh168 mentioned this pull request May 30, 2024

Refactor int8 dynamic quantization with call to quantize #294

Merged

jerryzh168 mentioned this pull request May 31, 2024

Refactor int8 weight only quant to use quantize #299

Closed

jerryzh168 mentioned this pull request Jun 1, 2024

Refactor int4 and int8 weight only quantization to use quantize #301

Merged

jerryzh168 deleted the add-api branch July 2, 2024 20:31

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

fix int4pack for AOTI on x86 (pytorch#256)

2d78ce5

* fix int4pack for AOTI on x86 * fixes * fixes * handle cuda in4pack_mm * typo

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add `quantize` #256

Add `quantize` #256

Uh oh!

jerryzh168 commented May 18, 2024 •

edited

Loading

Uh oh!

pytorch-bot bot commented May 18, 2024 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cpuhrsch left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Add quantize #256

Add quantize #256

Uh oh!

Conversation

jerryzh168 commented May 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented May 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/256

✅ No Failures

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cpuhrsch left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Add `quantize` #256

Add `quantize` #256

jerryzh168 commented May 18, 2024 •

edited

Loading

pytorch-bot bot commented May 18, 2024 •

edited

Loading