Focus: benchmarking, documentation, tutorials, moving prototype features to beta
Due date: June 13, 2024
Spillover from 0.2.0
- Consolidating workflows to use tensor subclass @jerryzh168
- Fast sparse training @jcaip
- Explore adding HQQ 4/3/2-bit quant to torchao @HDCharles @mobicham
- Don't raise an import error when someone uses a feature unsupported by their installed torch version (e.g. torch 2.1.2). For example, https://github.com/pytorch-labs/ao/blob/046dc985de6d5eac05c6575cc71505687e3aadf1/torchao/quantization/quant_primitives.py#L42 will cause an import error if someone tries to use torchao.quantization.quant_primitives.per_token_dynamic_quant on 2.2.2
Benchmarking
- Set up model-level benchmarks for accuracy and performance in torchbench for the single quantization API, so that we can start deprecating quant primitives and quant APIs after confirming there are no regressions (@HDCharles)
- Benchmarks for auto quant on pytorch benchmark's inference quant pane (@HDCharles)
Documentation
- Docs Revamp #181
- Make our website sparkle https://pytorch.org/ao (everyone)
- Unify on README or website, but don't duplicate content
- Get started on a design to slot in the prototype features - HQQ (@msaroufim)
Tutorials
- Tutorial for affine quantization dtype and unified quant primitives - Found lots of subtle differences, especially w.r.t. preserving zeros and tinygemm (@jerryzh168)
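For context on the "preserving zeros" subtlety mentioned above: affine quantization maps `x` to `q = clamp(round(x / scale) + zero_point)`, and choosing an integer `zero_point` guarantees that 0.0 round-trips exactly. A minimal self-contained sketch (illustrative helper names, not torchao's quant primitives):

```python
def choose_qparams(xs, qmin=-128, qmax=127):
    # Include 0.0 in the observed range and round zero_point to an int,
    # so that 0.0 is representable exactly ("preserving zeros").
    min_val = min(min(xs), 0.0)
    max_val = max(max(xs), 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = round(qmin - min_val / scale)
    return scale, zero_point

def quantize(xs, scale, zero_point, qmin=-128, qmax=127):
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]

def dequantize(qs, scale, zero_point):
    return [(q - zero_point) * scale for q in qs]
```

Because `zero_point` is an integer, `dequantize(quantize([0.0], ...))` returns exactly 0.0; schemes like tinygemm that treat the zero point differently are where the subtle differences arise.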
Core
- QAT workflow (@andrewor14)
- dedup the implementations of quant primitives (@jerryzh168)
- dedup the implementations of quant APIs (@jerryzh168)
- Deduplicate int4 workflows
- Factory function and `implements` decorator for affine quantization dtype
- Bit packing interfaces @msaroufim
- float6 kernels @gau-nernst
- int 3/5 kernel @msaroufim
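The bit packing interfaces and sub-byte kernels (float6, int 3/5) above all rest on packing sub-byte values into whole bytes. A minimal sketch for the 4-bit case, with hypothetical helper names rather than torchao's actual interface:

```python
def pack_uint4(vals):
    # Pack pairs of 4-bit values (0..15) into single bytes, low nibble first.
    assert len(vals) % 2 == 0 and all(0 <= v < 16 for v in vals)
    return bytes(vals[i] | (vals[i + 1] << 4) for i in range(0, len(vals), 2))

def unpack_uint4(packed):
    # Recover the original 4-bit values, low nibble first.
    out = []
    for b in packed:
        out.append(b & 0x0F)
        out.append(b >> 4)
    return out
```

Odd bit widths like 3 and 5 don't divide a byte evenly, which is why they need dedicated packing layouts and kernels rather than this simple nibble scheme.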