# Compressed Tensors v0.6.0
## What's Changed
- Add simple GHA workflow to run tests by @dbogunowicz in #2
- Define BaseModels for Quantization by @Satrat in #3
- Quantization refactor by @horheynm in #5
- Apply quantization config implementation by @bfineran in #4
- decorate fake quant with torch.no_grad by @bfineran in #8
- fix observer bugs by @bfineran in #9
- [lifecycle] docstrings + ux update to work with torch.apply by @bfineran in #11
- Fix Device Mismatch by @Satrat in #12
- Serialize Config from Model by @Satrat in #7
- [Observers] pull shared logic into a helper function by @bfineran in #13
- Rename the repo to `compressed-tensors` by @dbogunowicz in #14
- fix style post rename PR by @bfineran in #25
- Quantization Examples and Correctness Fixes by @Satrat in #26
- Fix failing GHA by @dbogunowicz in #29
- Pretrained Model Reload + SparseGPT Support by @Satrat in #31
- [Release 0.3.0] Basic Readme and user-facing pathways by @dbogunowicz in #30
- Quantization Fixes by @Satrat in #35
- Final details for package by @mgoin in #36
- bump version to 0.3.1, license and packaging updates by @bfineran in #37
- Dynamic Quantization by @bfineran in #15
- [Release 0.3.2] Additional patches to enable compatibility with SparseML, UX changes by @Satrat in #43
- Update target match conditions; make public by @dsikka in #44
- [Lifecycle][Tests] Feature Branch by @horheynm in #38
- [Observers] group size + channel wise + per token by @horheynm in #32
- [BugFix] Update code to be compatible with py38 by @rahul-tuli in #48
- [Fix] Fix the messed-up test structure by @dbogunowicz in #49
- Bump the version before the release by @dbogunowicz in #50
- Compressed lifecycle implementation (INT8 only) by @bfineran in #33
- group size speedups + fixes by @bfineran in #51
- Group and Channelwise Compression Support by @Satrat in #52
- Int4 Packed Compressor by @Satrat in #47
- Fix for auto device map quantization by @Satrat in #54
- Enable generating `compressed-tensors-nightly` by @dbogunowicz in #53
- [BugFix][Again] Update code to be compatible with py38 by @dbogunowicz in #56
- Fix per_token slowdown by @Satrat in #57
- [GPTQ Modifier UX] Add default scheme by @rahul-tuli in #61
- fix group size min max tracking by adding tensor ids by @bfineran in #60
- Support for aliased scheme settings in quant config by @bfineran in #40
- Remove Symmetric Zero Point in Compressed Outputs by @Satrat in #59
- Misc Fixes by @Satrat in #55
- Fix for Symmetric Zero Point Reloading by @Satrat in #64
- Additional Symmetric ZP Fix by @Satrat in #65
- Make ZP int8 instead of int64 by @Satrat in #67
- Add a function to check if a string is a preset scheme by @rahul-tuli in #66
- Rename Packed Weights by @Satrat in #63
- Fixed Grouped Quantization Reload by @Satrat in #68
- Fix incorrect loading of dtype by @eldarkurtic in #70
- Fix Python 3.8 Compatibility by @Satrat in #71
- Update nightly build to run at 6pm by @dsikka in #72
- Update time for the runner by @dsikka in #74
- Fixes to enable FSDP one-shot by @dbogunowicz in #58
- Update Compression Config for HfQuantizer Compatibility by @Satrat in #73
- Remove version restriction on transformers by @mgoin in #76
- remove pydantic version cap by @bfineran in #80
- reduce appropriate dim by @horheynm in #75
- Marlin24 Compressor by @Satrat in #77
- Fix GPTQ Aliases by @Satrat in #81
- initial fixes for compatibility with HFQuantizer by @bfineran in #79
- bump version to 0.4.0 by @bfineran in #83
- import is_release from version.py by @horheynm in #85
- Add release build workflow by @dhuangnm in #89
- Assert correct device when dequantizing (like we do for quantizing) by @dbogunowicz in #90
- update default symmetry to True on presets by @bfineran in #92
- Fp8 Quantization Support by @Satrat in #62
- default W4A16 alias to use group_size=128 by @bfineran in #94
- [compressor] Add packed int8 support by @dsikka in #91
- Fix Decompress kwargs by @Satrat in #100
- [Quant KV Cache] Implementation by @dbogunowicz in #86
- Fix Transient Tests by @Satrat in #101
- Speed Up Packed Compression by @Satrat in #103
- [Fix] remove `tests/quantization` by @dbogunowicz in #99
- Allow creating compressor when `trust_remote_code=True` by @dbogunowicz in #104
- Update Quantization Presets by @Satrat in #105
- [MOE] Add a set of functionalities to support quantization of MOE models by @dbogunowicz in #46
- [BugFix] Fix Name Mangling Issue in `compressed_tensors.utils` by @rahul-tuli in #102
- Update Quantization Scheme Standards for better readability by @markurtz in #106
- quantization lifecycle - disable forward pass override + helper for weight quant param updates by @bfineran in #111
- Add FP8 Dynamic Scheme for Latest Llama3.1 Meta Models and Fix W4A8 Representation by @markurtz in #114
- Model Offloading Support by @Satrat in #113
- Fix Test to Account for Model Change by @Satrat in #116
- Make publish workflow manually triggerable by @rahul-tuli in #117
- bump version to 0.5.0 by @bfineran in #119
- Fix Execution Device Helper Fn by @Satrat in #120
- Do not mutate config by `apply_quantization_config` by @dbogunowicz in #107
- Rename Quant Method by @Satrat in #122
- Revert Config Change by @Satrat in #124
- Bug Fix for Calibration Setup by @Satrat in #123
- Fix Issue #112 by @horheynm in #126
- follow up, better tests by @horheynm in #128
- Update README.md by @mgoin in #121
- Adding MSE Clipping Support by @abhinavnmagic in #115
- Move kv cache scales from k/v_proj.output_scale to self_attn.k/v_scale by @mgoin in #133
- Make Accelerate Dependency Optional by @Satrat in #131
- Group Index Quantization Support by @kylesayrs in #134
- Move safe_permute, update layer helper arguments by @kylesayrs in #137
- Remove g_idx arg on observer by @kylesayrs in #139
- Activation Ordering by @horheynm in #97
- Warning Raise Bugfix by @kylesayrs in #141
- LLM compressor GHA fix by @horheynm in #140
- fix nightly by @dsikka in #144
- Naive Run Compressed Support by @Satrat in #109
- Run Compressed Fixes by @Satrat in #147
- Support quantizing only the kv cache by @mgoin in #135
- [Forward Call] fake quant fix by @horheynm in #145
- Activation Ordering Strategies by @kylesayrs in #146
- Support bool arguments for actorder by @kylesayrs in #150
- Skip writing empty g_idx to disk, fix compress_quantized_weights by @kylesayrs in #143
- Raise error if group_size is passed but wrong strategy by @kylesayrs in #149
- bump up main to 0.6.0 by @dhuangnm in #154
- Add workflows to build and test nightly and release by @dhuangnm in #155
- Update `QuantizationScheme` defaults by @dsikka in #157
- Add: targets and ignore to sparsity compression config by @rahul-tuli in #159
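Many of the entries above touch the quantization-config pathway (e.g. "Apply quantization config implementation" in #4, "Fp8 Quantization Support" in #62, and the `ignore`/`targets` additions in #159). For orientation, here is a minimal sketch of that flow, assuming the `compressed_tensors.quantization` API around v0.6.0; the model choice and exact field defaults are illustrative and may differ slightly between versions:

```python
# Minimal sketch: build a QuantizationConfig and apply it to a model.
# Assumes compressed-tensors ~0.6.0; field names/defaults may vary by version.
from transformers import AutoModelForCausalLM

from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationConfig,
    QuantizationScheme,
    apply_quantization_config,
)

# Illustrative model choice; any torch.nn.Module-based HF model works.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# One config group: symmetric 8-bit integer weight quantization for all
# Linear layers, skipping the output head via the `ignore` field.
config = QuantizationConfig(
    config_groups={
        "group_0": QuantizationScheme(
            targets=["Linear"],
            weights=QuantizationArgs(num_bits=8, type="int", symmetric=True),
        )
    },
    ignore=["lm_head"],
)

# Attaches quantization parameters and hooks to matching modules in place.
apply_quantization_config(model, config)
```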
## New Contributors
- @dbogunowicz made their first contribution in #2
- @mgoin made their first contribution in #36
- @dsikka made their first contribution in #44
- @rahul-tuli made their first contribution in #48
- @eldarkurtic made their first contribution in #70
- @dhuangnm made their first contribution in #89
- @markurtz made their first contribution in #106
- @abhinavnmagic made their first contribution in #115
- @kylesayrs made their first contribution in #134
**Full Changelog**: https://github.com/neuralmagic/compressed-tensors/commits/0.6.0