Releases · huggingface/optimum-quanto
release: 0.2.6
What's Changed
- Add HIP support by @Disty0 in #330
- Switched linters, black -> ruff by @ishandeva in #334
- Add marlin int4 kernel by @dacorvo and @shcho1118 in #333
- fix: use reshape instead of view by @dacorvo in #338
- Support QLayerNorm without weights by @dacorvo in #341
New Contributors
- @ishandeva made their first contribution in #334
- @Disty0 made their first contribution in #330
- @shcho1118 made their first contribution in #333
Full Changelog: v0.2.5...v0.2.6
release: 0.2.5
New features
- Load and save models from the Hugging Face hub #263 by @sayakpaul (see the sketch after this list)
- Add support for float8 e4m3fnuz #310 (from #281) by @maktukmak
- Faster and less memory-intensive requantization #290 by @latentCall145
- Support torch.equal for QTensor #294 by @dacorvo
- Add Marlin Float8 kernel #296 (from #241) by @fxmarty
- Add Whisper speech recognition example #298 (from #242) by @mattiadg
- Add ViT classification example #308 by @shovan777
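A minimal sketch of the hub round-trip from #263, assuming the `QuantizedModelForCausalLM` API introduced in 0.2.3; the model id, save path, and hub repo id are placeholders, and hub push support is assumed:

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4)

# Save locally, then reload; from_pretrained also accepts a hub repo id.
qmodel.save_pretrained("./opt-125m-qint4")
reloaded = QuantizedModelForCausalLM.from_pretrained("./opt-125m-qint4")

# Pushing to the hub (repo id is hypothetical).
qmodel.push_to_hub("my-user/opt-125m-qint4")
```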
Bug fixes
- Fix include patterns in quantize #271 by @kaibioinfo
- Enable non-strict loading of state dicts #295 by @BenjaminBossan
- Fix transformers forward error #303 by @dacorvo
- Fix missing call in transformers models #325 by @dacorvo
- Fix 8-bit mm calls for 4D inputs #326 by @dacorvo
Full Changelog: v0.2.4...v0.2.5
release: 0.2.4
Bug fixes
- Fix import error in `optimum-cli` when diffusers is not installed by @dacorvo
Full Changelog: v0.2.3...v0.2.4
release: 0.2.3
What's Changed
- Use new int8 torch kernels by @dacorvo in #222
- Rebuild extension when pytorch is updated by @dacorvo in #223
- Use tinygemm bfloat16 / int4 kernel whenever possible by @dacorvo in #234
- Add HQQ optimizer by @dacorvo in #235
- Add QuantizedModelForCausalLM by @dacorvo in #243 (see the sketch after this list)
- Integrate quanto commands to optimum-cli by @dacorvo in #244
- Add pixart-sigma test to image example by @dacorvo in #247
- Support diffusion models by @sayakpaul in #255
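A minimal sketch of the new high-level API from #243, which quantizes a transformers causal LM in one call; the model id is a placeholder, and bfloat16 is chosen because the tinygemm kernel from #234 targets bfloat16 activations with int4 weights (actual kernel dispatch depends on the hardware):

```python
import torch
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4

# Load in bfloat16 so int4 weights can hit the tinygemm kernel (#234).
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", torch_dtype=torch.bfloat16
)
# Quantize weights to int4; activations stay in bfloat16.
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4)
```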
Bug fixes
- Fix: align extension on max arch by @dacorvo in #227
- Fix TinyGemmQBitsTensor move by @dacorvo in #246
- Fix streamlining bug by @dacorvo in #249
- Fix float/int8 matrix multiplication latency regression by @dacorvo in #250
- Fix serialization issues by @dacorvo in #258
New Contributors
- @sayakpaul made their first contribution in #255
Full Changelog: v0.2.2...v0.2.3
release: 0.2.2
release: 0.2.1
This release does not contain any new features, but it is the first one published under the new package name.
release: 0.2.0
New features
- requantize helper by @calmitchell617 (see the sketch after this list)
- StableDiffusion example by @thliang01
- improved linear backward path by @dacorvo
- AWQ int4 kernels by @dacorvo
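A minimal sketch of the requantize helper, following the serialization pattern from the project README; file names are placeholders, and the current `optimum.quanto` import path and API shape are shown (at the 0.2.0 release the package was still named `quanto`):

```python
import json
import torch
from safetensors.torch import load_file, save_file
from optimum.quanto import freeze, qint8, quantization_map, quantize, requantize

def make_model():
    return torch.nn.Sequential(torch.nn.Linear(16, 16))

model = make_model()
quantize(model, weights=qint8)
freeze(model)

# Persist the quantized state dict plus the quantization map.
save_file(model.state_dict(), "model.safetensors")
with open("qmap.json", "w") as f:
    json.dump(quantization_map(model), f)

# Later: rebuild the float model and requantize it in one step,
# instead of re-running quantize/freeze on full-precision weights.
new_model = make_model()
state_dict = load_file("model.safetensors")
with open("qmap.json") as f:
    qmap = json.load(f)
requantize(new_model, state_dict, qmap, device=torch.device("cpu"))
```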
release: 0.1.0
New features
- group-wise quantization
- safe serialization
release: 0.0.13
New features
- new `QConv2d` quantized module
- official support for `float8` weights (see the sketch below)
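A minimal sketch combining the two features, using the current `optimum.quanto` import path (the package was still named `quanto` at this release); quantize() swaps eligible Conv2d layers for the new `QConv2d` module:

```python
import torch
from optimum.quanto import freeze, qfloat8, quantize

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, kernel_size=3))
# Conv2d layers are replaced by QConv2d; weights are quantized to float8.
quantize(model, weights=qfloat8)
freeze(model)
```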
Bug fixes
- fix `QBitsTensor.to()` that was not moving the inner tensors
- prevent shallow `QTensor` copies that do not move inner tensors when loading weights
release: 0.0.12
New features
- quanto kernels library (not yet used by quantize)
Breaking changes
- quantization types are now all `quanto.dtype` objects (as sketched below)
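A minimal sketch of the dtype-based API after this change, shown with the current `optimum.quanto` import path and quantize signature (the package was named `quanto` at this release, so the exact call shape then is an assumption):

```python
import torch
from optimum.quanto import qint8, quantize

model = torch.nn.Sequential(torch.nn.Linear(8, 8))
# Quantization targets are dtype objects such as qint8, not strings.
quantize(model, weights=qint8)
```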