
Autoquant v2 initial version #1240

Draft

wants to merge 1 commit into base: main
Commits on Nov 8, 2024

  1. Autoquant v2 initial version

    Summary:
    We refactored v1 to benchmark subgraphs of (prev_op -> linear -> post_op) in order to get a more accurate timing estimate.
    One issue is that we now need to account for the batch size of the subgraph, so the batch size dimension needs to use a symbolic
    shape; symbolic shapes do not seem to be well supported by torch.compile right now.
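
    A minimal sketch (not part of this PR) of how the batch dimension of a benchmarked subgraph could be marked as dynamic for torch.compile; the Subgraph module, the relu/gelu prev/post ops, and all shapes are illustrative assumptions:

    ```
    import torch
    import torch.nn as nn

    # Illustrative (prev_op -> linear -> post_op) subgraph; ops and shapes are assumptions.
    class Subgraph(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features)

        def forward(self, x):
            x = torch.relu(x)                       # prev_op
            x = self.linear(x)                      # linear op under autoquant
            return torch.nn.functional.gelu(x)      # post_op

    subgraph = Subgraph(4096, 4096)
    example_input = torch.randn(8, 4096)

    # Mark dim 0 (batch) as dynamic so the compiled subgraph can be reused
    # across batch sizes while benchmarking.
    torch._dynamo.mark_dynamic(example_input, 0)
    compiled = torch.compile(subgraph)
    compiled(example_input)
    ```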
    
    More improvements:
    * the current batch size adjustment code is hardcoded for the llama model; we need to find a way to generalize it
    * use a canonicalized subgraph as the cache key to reduce the number of benchmarking runs (a sketch follows this list)
    * add accuracy sanity checks
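
    A hedged illustration of the canonicalized-subgraph cache key idea above (the function name and key fields are assumptions, not this PR's implementation): subgraphs that share the same op sequence, shapes, dtype, and batch size map to one key and reuse a single benchmark result.

    ```
    import torch

    # Hypothetical cache key built from structural properties of a subgraph.
    def subgraph_cache_key(prev_op, linear_weight_shape, post_op, dtype, batch_size):
        return (prev_op, tuple(linear_weight_shape), post_op, str(dtype), batch_size)

    benchmark_cache = {}

    def benchmark_or_reuse(key, run_benchmark):
        # Only benchmark a canonical subgraph once; later occurrences reuse the timing.
        if key not in benchmark_cache:
            benchmark_cache[key] = run_benchmark()
        return benchmark_cache[key]

    key = subgraph_cache_key("relu", (4096, 4096), "gelu", torch.bfloat16, 8)
    timing = benchmark_or_reuse(key, lambda: 0.123)  # placeholder benchmark result
    ```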
    
    Test Plan:
    Testing with torchao/_models/llama/generate.py:
    
    ```
    python generate.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --compile --compile_prefill --quantization autoquant_v2-int4
    ```
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    jerryzh168 committed Nov 8, 2024
    7b106d7