TorchAO 0.1.0: First Release

@jerryzh168 jerryzh168 released this 04 Apr 23:18

Highlights

We’re excited to announce the release of TorchAO v0.1.0! TorchAO is a repository that hosts architecture optimization techniques, such as quantization and sparsity, along with performance kernels for different backends such as CUDA and CPU. This release adds support for several quantization techniques, including int4 weight-only GPTQ quantization, nf4 dtype support for QLoRA, and sparsity features such as WandaSparsifier. It also adds an autotuner that can tune Triton integer matrix multiplication kernels on CUDA.

Note: TorchAO is currently in a pre-release state and under extensive development. The public APIs should not be considered stable, but we welcome you to try out our APIs and offerings and to share feedback on your experience.

torchao 0.1.0 will be compatible with PyTorch 2.2.2 and 2.3.0, ExecuTorch 0.2.0 and TorchTune 0.1.0.

New Features

Quantization

  • Added tensor subclass based quantization APIs: change_linear_weights_to_int8_dqtensors, change_linear_weights_to_int8_woqtensors and change_linear_weights_to_int4_woqtensors (#1)
  • Added module based quantization APIs for int8 dynamic and weight only quantization: apply_weight_only_int8_quant and apply_dynamic_quant (#1)
  • Added module swap versions of int4 weight only quantization, Int4WeightOnlyQuantizer and Int4WeightOnlyGPTQQuantizer, used in TorchTune (#119, #116)
  • Added int8 dynamic activation and int4 weight quantization, Int8DynActInt4WeightQuantizer and Int8DynActInt4WeightGPTQQuantizer, used in ExecuTorch (#74) (available with torch 2.3.0 and later)
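
To make the idea behind int8 weight-only quantization concrete, here is a minimal sketch in plain Python. It assumes symmetric quantization with one scale per output row; the function names are hypothetical and this is not how the torchao APIs above are implemented, only an illustration of the arithmetic they rely on.

```python
# Conceptual sketch only: symmetric per-row int8 weight quantization in plain
# Python. Function names are hypothetical and are not part of torchao.

def quantize_rows_int8(weight):
    """Quantize each row of a 2-D list of floats to int8 with one scale per row."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Map the largest magnitude in the row to 127; guard all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_rows(q_rows, scales):
    """Recover approximate float weights from int8 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def linear_weight_only(x, q_rows, scales):
    """Compute y = W x, dequantizing the int8 weight on the fly."""
    w = dequantize_rows(q_rows, scales)
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


weight = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.4]]
q_rows, scales = quantize_rows_int8(weight)
y = linear_weight_only([1.0, 2.0, 3.0], q_rows, scales)
```

Only the weights are stored in int8 here; the activations stay in floating point, which is what distinguishes weight-only from dynamic quantization, where activations are also quantized at run time.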

Sparsity

  • Added WandaSparsifier, which prunes weights based on both weight magnitudes and input activation norms (#22)
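
As a rough illustration of the Wanda criterion, the sketch below scores each weight by its magnitude times the L2 norm of the matching input feature over some calibration samples, then zeroes the lowest-scoring weights in each output row. The function name, the per-row grouping, and the 50% default are assumptions made for this example; it does not reproduce the WandaSparsifier interface.

```python
import math

# Conceptual sketch of the Wanda pruning metric in plain Python. The function
# name and the per-row 50% sparsity default are illustrative assumptions, not
# the WandaSparsifier interface.

def wanda_prune(weight, activations, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row.

    weight: list of rows, shape [out_features][in_features]
    activations: calibration inputs, shape [n_samples][in_features]
    """
    n_in = len(weight[0])
    # L2 norm of each input feature across the calibration samples.
    act_norm = [math.sqrt(sum(sample[j] ** 2 for sample in activations))
                for j in range(n_in)]
    n_prune = int(n_in * sparsity)
    pruned = []
    for row in weight:
        scores = [abs(w) * act_norm[j] for j, w in enumerate(row)]
        # Indices of the n_prune smallest scores in this row.
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


weight = [[0.9, -0.1, 0.5, 0.2]]
activations = [[0.1, 5.0, 0.1, 1.0], [0.1, 5.0, 0.2, 1.0]]
pruned = wanda_prune(weight, activations)
```

In this toy input the small weight -0.1 survives because its input feature has large activations, while the larger weights 0.9 and 0.5 are dropped; plain magnitude pruning would have made the opposite choice.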

Kernels

  • Added autotuner for int mm Triton kernels (#41)
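
The general pattern behind a kernel autotuner can be sketched as follows: benchmark each candidate configuration for a given problem shape, keep the fastest, and cache that choice for later calls. Everything here is hypothetical, including the class name, the config tuples, and the stand-in cost function; a real tuner would time actual Triton kernels on the GPU.

```python
# Conceptual sketch of kernel autotuning: try each candidate configuration for
# a given problem shape, keep the fastest, and cache the choice. All names are
# hypothetical; this is not torchao's autotuner interface.

class Autotuner:
    def __init__(self, configs, benchmark):
        self.configs = configs        # candidate kernel configurations
        self.benchmark = benchmark    # callable(shape, config) -> cost
        self.cache = {}               # shape -> best config found so far

    def best_config(self, shape):
        """Return the lowest-cost config for shape, benchmarking only once."""
        if shape not in self.cache:
            self.cache[shape] = min(
                self.configs, key=lambda cfg: self.benchmark(shape, cfg)
            )
        return self.cache[shape]


calls = []

def fake_benchmark(shape, config):
    """Stand-in cost model; a real tuner would time the kernel on the GPU."""
    calls.append((shape, config))
    m, n, k = shape
    block_m, block_n = config
    # Pretend tiles closest to the output dimensions run fastest.
    return abs(m - block_m) + abs(n - block_n)


tuner = Autotuner(configs=[(32, 32), (64, 64), (128, 128)], benchmark=fake_benchmark)
choice = tuner.best_config((64, 64, 256))
again = tuner.best_config((64, 64, 256))  # served from the cache
```

Caching by shape matters because benchmarking is expensive relative to a single kernel launch, so the cost is paid once per distinct problem size.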

dtypes

  • Added nf4 tensor subclass and nf4 linear (#37, #40, #62)
  • Added uint4 dtype tensor subclass (#13)
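
For readers unfamiliar with nf4, the following plain-Python sketch shows the basic idea of blockwise NormalFloat4 quantization as described in the QLoRA paper: each block is scaled by its absolute maximum, and every value is replaced by the index of the nearest of 16 fixed levels. The level values are rounded to four decimals, and the helper names and the block size of 4 are assumptions chosen for readability; this is not the nf4 tensor subclass itself.

```python
# Conceptual sketch of blockwise NF4 quantization in plain Python. The 16 code
# values are the NormalFloat4 levels from the QLoRA paper, rounded to 4
# decimals; helper names and the block size are illustrative assumptions.

NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]


def nf4_quantize(values, block_size=4):
    """Return (codes, scales): one 4-bit code per value, one scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
        scales.append(absmax)
        for v in block:
            normalized = v / absmax  # now in [-1, 1]
            # Pick the index of the nearest NF4 level.
            codes.append(min(range(16), key=lambda i: abs(NF4_LEVELS[i] - normalized)))
    return codes, scales


def nf4_dequantize(codes, scales, block_size=4):
    """Map codes back to floats using each block's scale."""
    return [NF4_LEVELS[c] * scales[i // block_size] for i, c in enumerate(codes)]


values = [0.1, -0.4, 0.25, 0.8, -2.0, 0.0, 1.0, 0.5]
codes, scales = nf4_quantize(values)
restored = nf4_dequantize(codes, scales)
```

The levels are spaced more densely near zero than near ±1, which suits weights that are roughly normally distributed.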

Improvements

  • Set up GitHub workflow for regression testing (#50)
  • Set up GitHub workflow for torchao-nightly release (#54)

Documentation

  • Added tutorials for quantizing a vision transformer model (#60)
  • Added tutorials on how to add an op for the nf4 tensor (#54)

Notes

  • We are still debugging the accuracy problem for Int8DynActInt4WeightGPTQQuantizer
  • Save and load does not work well for tensor subclass based APIs yet
  • We will consolidate tensor subclass and module swap based quantization APIs later
  • The uint4 tensor subclass is going to be merged into PyTorch core in the future
  • Quantization ops in quant_primitives.py will be deduplicated with similar quantize/dequantize ops in PyTorch later