
Refactor int8 dynamic quantization with call to quantize #294

Merged 1 commit on May 31, 2024

Commits on May 31, 2024

  1. Replace implementation for int8 dynamic quantization with call to `quantize`
    
    Summary:
    Previously we added `quantize` as a general API (pytorch#256) for the
    affine quantized tensor subclass, and for tensor-subclass-based dtype conversion in general.
    
    The plan is to use it to replace the existing quant APIs, including int4 weight-only, int8 weight-only, int8 dynamic quant,
    and 8da4w (for executorch).
    
    In this PR we replace the implementation of the int8 dynamic quant API with a call to the `quantize` API using the affine quantized tensor
    subclass. We verify that performance does not regress on the ViT model.
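To make the change concrete, here is a minimal pure-Python sketch of the math behind int8 dynamic affine quantization. This is not torchao's implementation; the helper names are hypothetical. "Dynamic" means the scale and zero point for activations are chosen at runtime from the observed tensor range, rather than calibrated ahead of time.

```python
# Illustrative sketch (not torchao code): int8 dynamic affine quantization.
# The affine mapping is q = clamp(round(x / scale) + zero_point, qmin, qmax).

def choose_qparams(values, qmin=-128, qmax=127):
    """Pick scale/zero_point at runtime from the observed range (must include 0)."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize_affine(values, scale, zero_point, qmin=-128, qmax=127):
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize_affine(qvalues, scale, zero_point):
    return [(q - zero_point) * scale for q in qvalues]

acts = [-1.5, 0.0, 0.25, 3.0]          # activations observed at runtime
s, zp = choose_qparams(acts)
q = quantize_affine(acts, s, zp)
deq = dequantize_affine(q, s, zp)
# Round-trip error is bounded by half a quantization step (scale / 2).
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(acts, deq))
```

The tensor-subclass approach wraps exactly this kind of affine mapping behind a `torch.Tensor` subtype, so quantized tensors flow through existing model code unchanged.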
    
    Test Plan:
    TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py
    
    reference: elapsed_time:  1.4821058654785155  milliseconds
    after refactor: elapsed_time:  1.4804757690429688  milliseconds
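The two timings above can be compared directly; the refactor is within measurement noise of the reference:

```python
# Relative latency change, using the benchmark numbers reported above.
reference = 1.4821058654785155   # milliseconds, before refactor
refactored = 1.4804757690429688  # milliseconds, after refactor
rel_change = (refactored - reference) / reference
print(f"relative change: {rel_change:.4%}")  # ~ -0.11%, i.e. no regression
```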
    
    generated code diff: https://gist.github.com/jerryzh168/90c71107a5aaaa5d8dd2170c573e076d
    
    jerryzh168 committed May 31, 2024
    Commit 4ec820d