
Refactor int8 dynamic quantization with call to quantize #294

Merged 1 commit on May 31, 2024

Commits on May 31, 2024

  1. Replace implementation for int8 dynamic quantization with call to `quantize`
    
    Summary:
    Previously we added `quantize` as a general API (pytorch#256) for the
    affine quantized tensor subclass, and for tensor-subclass-based dtype conversion in general.
    
    The plan is to use it to replace the existing quant APIs, including int4 weight-only, int8 weight-only, int8 dynamic quant,
    and 8da4w (for executorch).
    
    In this PR we replace the implementation of the int8 dynamic quant API with a call to the `quantize` API using the affine quantized tensor
    subclass. We verify that performance does not regress on the ViT model.
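To make the change concrete, here is a minimal pure-Python sketch of the math behind int8 dynamic affine quantization. This is not torchao's implementation; the helper names are hypothetical. "Dynamic" means the scale and zero point for activations are chosen at runtime from the observed tensor range, rather than calibrated ahead of time.

```python
# Illustrative sketch (not torchao code): int8 dynamic affine quantization.
# The affine mapping is q = clamp(round(x / scale) + zero_point, qmin, qmax).

def choose_qparams(values, qmin=-128, qmax=127):
    """Pick scale/zero_point at runtime from the observed range (must include 0)."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize_affine(values, scale, zero_point, qmin=-128, qmax=127):
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize_affine(qvalues, scale, zero_point):
    return [(q - zero_point) * scale for q in qvalues]

acts = [-1.5, 0.0, 0.25, 3.0]          # activations observed at runtime
s, zp = choose_qparams(acts)
q = quantize_affine(acts, s, zp)
deq = dequantize_affine(q, s, zp)
# Round-trip error is bounded by half a quantization step (scale / 2).
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(acts, deq))
```

The tensor-subclass approach wraps exactly this kind of affine mapping behind a `torch.Tensor` subtype, so quantized tensors flow through existing model code unchanged.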
    
    Test Plan:
    TORCH_LOGS='output_code' python tutorials/quantize_vit/run_vit_b_quant.py
    
    reference: elapsed_time:  1.4821058654785155  milliseconds
    after refactor: elapsed_time:  1.4804757690429688  milliseconds
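The two timings above can be compared directly; the refactor is within measurement noise of the reference:

```python
# Relative latency change, using the benchmark numbers reported above.
reference = 1.4821058654785155   # milliseconds, before refactor
refactored = 1.4804757690429688  # milliseconds, after refactor
rel_change = (refactored - reference) / reference
print(f"relative change: {rel_change:.4%}")  # ~ -0.11%, i.e. no regression
```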
    
    generated code diff: https://gist.github.com/jerryzh168/90c71107a5aaaa5d8dd2170c573e076d
    
    jerryzh168 committed May 31, 2024
    Commit 4ec820d