# Compressed Tensors v0.6.0
## What's Changed
- Add simple GHA workflow to run tests by @dbogunowicz in #2
- Define BaseModels for Quantization by @Satrat in #3
- Quantization refactor by @horheynm in #5
- Apply quantization config implementation by @bfineran in #4
- decorate fake quant with torch.no_grad by @bfineran in #8
- fix observer bugs by @bfineran in #9
- [lifecycle] docstrings + ux update to work with torch.apply by @bfineran in #11
- Fix Device Mismatch by @Satrat in #12
- Serialize Config from Model by @Satrat in #7
- [Observers] pull shared logic into a helper function by @bfineran in #13
- Rename the repo to `compressed-tensors` by @dbogunowicz in #14
- fix style post rename PR by @bfineran in #25
- Quantization Examples and Correctness Fixes by @Satrat in #26
- Fix failing GHA by @dbogunowicz in #29
- Pretrained Model Reload + SparseGPT Support by @Satrat in #31
- [Release 0.3.0] Basic Readme and user-facing pathways by @dbogunowicz in #30
- Quantization Fixes by @Satrat in #35
- Final details for package by @mgoin in #36
- bump version to 0.3.1, license and packaging updates by @bfineran in #37
- Dynamic Quantization by @bfineran in #15
- [Release 0.3.2] Additional patches to enable compatibility with SparseML, UX changes by @Satrat in #43
- Update target match conditions; make public by @dsikka in #44
- [Lifecycle][Tests] Feature Branch by @horheynm in #38
- [Observers] group size + channel wise + per token by @horheynm in #32
- [BugFix] Update code to be compatible with py38 by @rahul-tuli in #48
- [Fix] Fix the messed-up test structure by @dbogunowicz in #49
- Bump the version before the release by @dbogunowicz in #50
- Compressed lifecycle implementation (INT8 only) by @bfineran in #33
- group size speedups + fixes by @bfineran in #51
- Group and Channelwise Compression Support by @Satrat in #52
- Int4 Packed Compressor by @Satrat in #47
- Fix for auto device map quantization by @Satrat in #54
- Enable generating `compressed-tensors-nightly` by @dbogunowicz in #53
- [BugFix][Again] Update code to be compatible with py38 by @dbogunowicz in #56
- Fix per_token slowdown by @Satrat in #57
- [GPTQ Modifier UX] Add default scheme by @rahul-tuli in #61
- fix group size min max tracking by adding tensor ids by @bfineran in #60
- Support for aliased scheme settings in quant config by @bfineran in #40
- Remove Symmetric Zero Point in Compressed Outputs by @Satrat in #59
- Misc Fixes by @Satrat in #55
- Fix for Symmetric Zero Point Reloading by @Satrat in #64
- Additional Symmetric ZP Fix by @Satrat in #65
- Make ZP int8 instead of int64 by @Satrat in #67
- Add a function to check if a string is a preset scheme by @rahul-tuli in #66
- Rename Packed Weights by @Satrat in #63
- Fixed Grouped Quantization Reload by @Satrat in #68
- Fix incorrect loading of dtype by @eldarkurtic in #70
- Fix Python 3.8 Compatibility by @Satrat in #71
- Update nightly build to run at 6pm by @dsikka in #72
- Update time for the runner by @dsikka in #74
- Fixes to enable FSDP one-shot by @dbogunowicz in #58
- Update Compression Config for HfQuantizer Compatibility by @Satrat in #73
- Remove version restriction on transformers by @mgoin in #76
- remove pydantic version cap by @bfineran in #80
- reduce appropriate dim by @horheynm in #75
- Marlin24 Compressor by @Satrat in #77
- Fix GPTQ Aliases by @Satrat in #81
- initial fixes for compatibility with HFQuantizer by @bfineran in #79
- bump version to 0.4.0 by @bfineran in #83
- import is_release from version.py by @horheynm in #85
- Add release build workflow by @dhuangnm in #89
- Assert correct device when dequantizing (like we do for quantizing) by @dbogunowicz in #90
- update default symmetry to True on presets by @bfineran in #92
- Fp8 Quantization Support by @Satrat in #62
- default W4A16 alias to use group_size=128 by @bfineran in #94
- [compressor] Add packed int8 support by @dsikka in #91
- Fix Decompress kwargs by @Satrat in #100
- [Quant KV Cache] Implementation by @dbogunowicz in #86
- Fix Transient Tests by @Satrat in #101
- Speed Up Packed Compression by @Satrat in #103
- [Fix] remove `tests/quantization` by @dbogunowicz in #99
- Allow creating compressor when `trust_remote_code=True` by @dbogunowicz in #104
- Update Quantization Presets by @Satrat in #105
- [MOE] Add a set of functionalities to support quantization of MOE models by @dbogunowicz in #46
- [BugFix] Fix Name Mangling Issue in `compressed_tensors.utils` by @rahul-tuli in #102
- Update Quantization Scheme Standards for better readability by @markurtz in #106
- quantization lifecycle - disable forward pass override + helper for weight quant param updates by @bfineran in #111
- Add FP8 Dynamic Scheme for Latest Llama3.1 Meta Models and Fix W4A8 Representation by @markurtz in #114
- Model Offloading Support by @Satrat in #113
- Fix Test to Account for Model Change by @Satrat in #116
- Make publish workflow manually triggerable by @rahul-tuli in #117
- bump version to 0.5.0 by @bfineran in #119
- Fix Execution Device Helper Fn by @Satrat in #120
- Do not mutate config by `apply_quantization_config` by @dbogunowicz in #107
- Rename Quant Method by @Satrat in #122
- Revert Config Change by @Satrat in #124
- Bug Fix for Calibration Setup by @Satrat in #123
- Fix Issue #112 by @horheynm in #126
- follow up, better tests by @horheynm in #128
- Update README.md by @mgoin in #121
- Adding MSE Clipping Support by @abhinavnmagic in #115
- Move kv cache scales from k/v_proj.output_scale to self_attn.k/v_scale by @mgoin in #133
- Make Accelerate Dependency Optional by @Satrat in #131
- Group Index Quantization Support by @kylesayrs in #134
- Move safe_permute, update layer helper arguments by @kylesayrs in #137
- Remove g_idx arg on observer by @kylesayrs in #139
- Activation Ordering by @horheynm in #97
- Warning Raise Bugfix by @kylesayrs in #141
- LLM compressor GHA fix by @horheynm in #140
- fix nightly by @dsikka in #144
- Naive Run Compressed Support by @Satrat in #109
- Run Compressed Fixes by @Satrat in #147
- Support quantizing only the kv cache by @mgoin in #135
- [Forward Call] fake quant fix by @horheynm in #145
- Activation Ordering Strategies by @kylesayrs in #146
- Support bool arguments for actorder by @kylesayrs in #150
- Skip writing empty g_idx to disk, fix compress_quantized_weights by @kylesayrs in #143
- Raise error if group_size is passed but wrong strategy by @kylesayrs in #149
- bump up main to 0.6.0 by @dhuangnm in #154
- Add workflows to build and test nightly and release by @dhuangnm in #155
- Update `QuantizationScheme` defaults by @dsikka in #157
- Add: targets and ignore to sparsity compression config by @rahul-tuli in #159
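Many of the entries above touch the quantization-config pathway (e.g. "Apply quantization config implementation" in #4, "Fp8 Quantization Support" in #62, and the `ignore`/`targets` additions in #159). For orientation, here is a minimal sketch of that flow, assuming the `compressed_tensors.quantization` API around v0.6.0; the model choice and exact field defaults are illustrative and may differ slightly between versions:

```python
# Minimal sketch: build a QuantizationConfig and apply it to a model.
# Assumes compressed-tensors ~0.6.0; field names/defaults may vary by version.
from transformers import AutoModelForCausalLM

from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationConfig,
    QuantizationScheme,
    apply_quantization_config,
)

# Illustrative model choice; any torch.nn.Module-based HF model works.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# One config group: symmetric 8-bit integer weight quantization for all
# Linear layers, skipping the output head via the `ignore` field.
config = QuantizationConfig(
    config_groups={
        "group_0": QuantizationScheme(
            targets=["Linear"],
            weights=QuantizationArgs(num_bits=8, type="int", symmetric=True),
        )
    },
    ignore=["lm_head"],
)

# Attaches quantization parameters and hooks to matching modules in place.
apply_quantization_config(model, config)
```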
## New Contributors
- @dbogunowicz made their first contribution in #2
- @mgoin made their first contribution in #36
- @dsikka made their first contribution in #44
- @rahul-tuli made their first contribution in #48
- @eldarkurtic made their first contribution in #70
- @dhuangnm made their first contribution in #89
- @markurtz made their first contribution in #106
- @abhinavnmagic made their first contribution in #115
- @kylesayrs made their first contribution in #134
**Full Changelog**: https://github.com/neuralmagic/compressed-tensors/commits/0.6.0