Releases · neuralmagic/compressed-tensors · GitHub

11 Dec 19:29

dhuangnm

Compressed Tensors v0.8.1 Latest

What's Changed

Skip accelerate tests by @kylesayrs in #208
Remove QuantizationScheme.default_scheme by @kylesayrs in #202
Allow ModelCompressor.from_pretrained to load from quantization_config, not compression config by @horheynm in #207
Quantization Scheme Validation by @kylesayrs in #209
Fix uninitialized variable in quantized compressors by @markmc in #205
Implement aliasable mixin and alias activation ordering by @kylesayrs in #213
Revert "Implement aliasable mixin and alias activation ordering (#213)" by @dsikka in #217
Implement aliasable mixin and alias activation ordering (python3.9 fix) by @kylesayrs in #218
bump by @dsikka in #226
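
The `ModelCompressor.from_pretrained` change (#207) concerns which key of a model config holds the compression settings. The snippet below is a minimal sketch of that kind of lookup, not the library's implementation: the helper name `find_compression_settings`, the fallback order, and the exact key spellings are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

# Assumed key names, taken from the changelog wording. The newer key is
# checked first and the older one is kept as a fallback (an assumption).
CONFIG_KEYS = ("quantization_config", "compression_config")


def find_compression_settings(
    model_config: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return the settings stored under the first known key, else None."""
    for key in CONFIG_KEYS:
        settings = model_config.get(key)
        if settings is not None:
            return settings
    return None


# Hypothetical configs: one uses the newer key, one the older key.
new_style = {"quantization_config": {"format": "example-a"}}
old_style = {"compression_config": {"format": "example-b"}}

print(find_compression_settings(new_style))
print(find_compression_settings(old_style))
print(find_compression_settings({}))
```

Checking the keys in a fixed order keeps older checkpoints loadable while newer ones take precedence when both keys are present.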

New Contributors

@markmc made their first contribution in #205

Full Changelog: 0.8.0...0.8.1

Contributors

markmc, kylesayrs, and 2 other contributors


12 Nov 14:53

dhuangnm

Compressed Tensors v0.8.0

What's Changed

[Observer Restructure]: Separate out scale/zp and observer init; separate out calibration from forward pass by @dsikka in #188
Fix device allocation for MSE observer by @anmarques in #190
drop 3.8 and add 3.12 to testing by @dhuangnm in #196
Fix test which required accelerate, apply style by @kylesayrs in #194
[Bugfix] Move observer and g_idx until after module in onloaded by @kylesayrs in #195
Add sparsity structure enum by @rahul-tuli in #197
Observer Restructure: Remove Observers, calibration, and applying frozen steps from lifecycle by @dsikka in #189
Clean up observer defaulting logic, better error message by @kylesayrs in #200
apply style and quality by @kylesayrs in #201
[BugFix] Fix Marlin24 Bug by @dsikka in #203
Bump version to v0.8.0 by @dsikka in #204

New Contributors

@anmarques made their first contribution in #190

Full Changelog: 0.7.1...0.8.0

Contributors

kylesayrs, anmarques, and 3 other contributors


17 Oct 18:13

dhuangnm

Compressed Tensors v0.7.1

What's Changed

[Observer Restructure]: Remove MemoryLess Observer; use helper function for dynamic quantization by @dsikka in #187
bump up to 0.7.1 for patch release by @dhuangnm in #192
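
For readers unfamiliar with the term in #187: dynamic quantization derives its parameters from the values seen at call time rather than from statistics stored by an observer. The following is a generic sketch of that idea in plain Python, assuming an asymmetric 8-bit scheme. The function names are hypothetical and do not correspond to the helper added in the PR.

```python
from typing import List, Tuple


def dynamic_qparams(values: List[float], bits: int = 8) -> Tuple[float, int]:
    """Derive a scale and zero point from the values passed in right now."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values: List[float], scale: float, zero_point: int,
             bits: int = 8) -> List[int]:
    """Map floats to integers, clamping to the signed range for `bits`."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale) + zero_point))
            for v in values]


activations = [0.0, 1.0, 2.55]
scale, zero_point = dynamic_qparams(activations)
print(scale, zero_point, quantize(activations, scale, zero_point))
```

Because nothing is cached between calls, each new batch of values gets its own scale and zero point.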

Full Changelog: 0.7.0...0.7.1

Contributors

dsikka and dhuangnm


09 Oct 11:29

dhuangnm

Compressed Tensors v0.7.0

What's Changed

Make INT8 activation PRESET_SCHEMES explicit by @mgoin in #158
Write the current version into model configs by @mgoin in #160
[KV-Cache] Make k_scale, v_scale as attributes of self_attn using HFCache by @horheynm in #148
[Bugfix] Fix quant config parsing by @kylesayrs in #162
Ignore Dense sparsity config by @rahul-tuli in #169
fix bug by @horheynm in #170
Replace compression_config to be quantization_config for HFQuantizer support by @dsikka in #164
ignore list by @horheynm in #171
switch default to release and disable pushing to pypi for now by @dhuangnm in #175
Fix missing quant_method value by @kylesayrs in #174
Fix ModelCompressor parsing in HF Quantizer case by @kylesayrs in #176
Calibration Code Clarity by @kylesayrs in #168
Add: base sparsity/quantization compressors by @rahul-tuli in #165
Update compressors folder structure by @rahul-tuli in #166
Update number of groups by @dsikka in #178
Bring nightly build/test back by @dhuangnm in #179
Remove unused function by @kylesayrs in #156
Revert "Ignore Dense sparsity config (#169)" by @rahul-tuli in #181
Workaround HF Quantizer apply_quantization_config misuse by @kylesayrs in #180
bump up version to 0.7.0 by @dhuangnm in #186

Full Changelog: 0.6.0...0.7.0

Contributors

mgoin, kylesayrs, and 4 other contributors


23 Sep 18:44

dhuangnm

Compressed Tensors v0.6.0

What's Changed

Add simple GHA workflow to run tests by @dbogunowicz in #2
Define BaseModels for Quantization by @Satrat in #3
Quantization refactor by @horheynm in #5
Apply quantization config implementation by @bfineran in #4
decorate fake quant with torch.no_grad by @bfineran in #8
fix observer bugs by @bfineran in #9
[lifecycle] docstrings + ux update to work with torch.apply by @bfineran in #11
Fix Device Mismatch by @Satrat in #12
Serialize Config from Model by @Satrat in #7
[Observers] pull shared logic into a helper function by @bfineran in #13
Rename the repo to compressed-tensors by @dbogunowicz in #14
fix style post rename PR by @bfineran in #25
Quantization Examples and Correctness Fixes by @Satrat in #26
Fix failing GHA by @dbogunowicz in #29
Pretrained Model Reload + SparseGPT Support by @Satrat in #31
[Release 0.3.0] Basic Readme and user-facing pathways by @dbogunowicz in #30
Quantization Fixes by @Satrat in #35
Final details for package by @mgoin in #36
bump version to 0.3.1 license and packaging updates by @bfineran in #37
Dynamic Quantization by @bfineran in #15
[Release 0.3.2] Additional patches to enable compatibility with SparseML, UX changes by @Satrat in #43
Update target match conditions; make public by @dsikka in #44
[Lifecycle][Tests] Feature Branch by @horheynm in #38
[Observers] group size + channel wise + per token by @horheynm in #32
[BugFix] Update code to be compatible with py38 by @rahul-tuli in #48
[Fix] Fix the messed-up test structure by @dbogunowicz in #49
Bump the version before the release by @dbogunowicz in #50
Compressed lifecycle implementation (INT8 only) by @bfineran in #33
group size speedups + fixes by @bfineran in #51
Group and Channelwise Compression Support by @Satrat in #52
Int4 Packed Compressor by @Satrat in #47
Fix for auto device map quantization by @Satrat in #54
Enable generating compressed-tensors-nightly by @dbogunowicz in #53
[BugFix][Again] Update code to be compatible with py38 by @dbogunowicz in #56
Fix per_token slowdown by @Satrat in #57
[GPTQ Modifier UX] Add default scheme by @rahul-tuli in #61
fix group size min max tracking by adding tensor ids by @bfineran in #60
Support for aliased scheme settings in quant config by @bfineran in #40
Remove Symmetric Zero Point in Compressed Outputs by @Satrat in #59
Misc Fixes by @Satrat in #55
Fix for Symmetric Zero Point Reloading by @Satrat in #64
Additional Symmetric ZP Fix by @Satrat in #65
Make ZP int8 instead of int64 by @Satrat in #67
Add a function to check if a string is a preset scheme by @rahul-tuli in #66
Rename Packed Weights by @Satrat in #63
Fixed Grouped Quantization Reload by @Satrat in #68
Fix incorrect loading of dtype by @eldarkurtic in #70
Fix Python 3.8 Compatibility by @Satrat in #71
Update nightly build to run at 6pm by @dsikka in #72
Update time for the runner by @dsikka in #74
Fixes to enable FSDP one-shot by @dbogunowicz in #58
Update Compression Config for HfQuantizer Compatibility by @Satrat in #73
Remove version restriction on transformers by @mgoin in #76
remove pydantic version cap by @bfineran in #80
reduce appropriate dim by @horheynm in #75
Marlin24 Compressor by @Satrat in #77
Fix GPTQ Aliases by @Satrat in #81
initial fixes for compatibility with HFQuantizer by @bfineran in #79
bump version to 0.4.0 by @bfineran in #83
import is_release from version.py by @horheynm in #85
Add release build workflow by @dhuangnm in #89
Assert correct device when dequantizing (like we do for quantizing) by @dbogunowicz in #90
update default symmetry to True on presets by @bfineran in #92
Fp8 Quantization Support by @Satrat in #62
default W4A16 alias to use group_size=128 by @bfineran in #94
[compressor] Add packed int8 support by @dsikka in #91
Fix Decompress kwargs by @Satrat in #100
[Quant KV Cache] Implementation by @dbogunowicz in #86
Fix Transient Tests by @Satrat in #101
Speed Up Packed Compression by @Satrat in #103
[Fix] remove tests/quantization by @dbogunowicz in #99
Allow creating compressor when trust_remote_code=True by @dbogunowicz in #104
Update Quantization Presets by @Satrat in #105
[MOE] Add a set of functionalities to support quantization of MOE models by @dbogunowicz in #46
[BugFix]Fix Name Mangling Issue in compressed_tensors.utils by @rahul-tuli in #102
Update Quantization Scheme Standards for better readability by @markurtz in #106
quantization lifecycle - disable forward pass override + helper for weight quant param updates by @bfineran in #111
Add FP8 Dynamic Scheme for Latest Llama3.1 Meta Models and Fix W4A8 Representation by @markurtz in #114
Model Offloading Support by @Satrat in #113
Fix Test to Account for Model Change by @Satrat in #116
Make publish workflow manually triggerable by @rahul-tuli in #117
bump version to 0.5.0 by @bfineran in #119
Fix Execution Device Helper Fn by @Satrat in #120
Do not mutate config by apply_quantization_config by @dbogunowicz in #107
Rename Quant Method by @Satrat in #122
Revert Config Change by @Satrat in #124
Bug Fix for Calibration Setup by @Satrat in https://github.com/neuralmagic/compressed-tenso...


Contributors

mgoin, Satrat, and 10 other contributors


14 Aug 20:36

dhuangnm

Compressed Tensors v0.5.0

What's Changed

Add simple GHA workflow to run tests by @dbogunowicz in #2
Define BaseModels for Quantization by @Satrat in #3
Quantization refactor by @horheynm in #5
Apply quantization config implementation by @bfineran in #4
decorate fake quant with torch.no_grad by @bfineran in #8
fix observer bugs by @bfineran in #9
[lifecycle] docstrings + ux update to work with torch.apply by @bfineran in #11
Fix Device Mismatch by @Satrat in #12
Serialize Config from Model by @Satrat in #7
[Observers] pull shared logic into a helper function by @bfineran in #13
Rename the repo to compressed-tensors by @dbogunowicz in #14
fix style post rename PR by @bfineran in #25
Quantization Examples and Correctness Fixes by @Satrat in #26
Fix failing GHA by @dbogunowicz in #29
Pretrained Model Reload + SparseGPT Support by @Satrat in #31
[Release 0.3.0] Basic Readme and user-facing pathways by @dbogunowicz in #30
Quantization Fixes by @Satrat in #35
Final details for package by @mgoin in #36
bump version to 0.3.1 license and packaging updates by @bfineran in #37
Dynamic Quantization by @bfineran in #15
[Release 0.3.2] Additional patches to enable compatibility with SparseML, UX changes by @Satrat in #43
Update target match conditions; make public by @dsikka in #44
[Lifecycle][Tests] Feature Branch by @horheynm in #38
[Observers] group size + channel wise + per token by @horheynm in #32
[BugFix] Update code to be compatible with py38 by @rahul-tuli in #48
[Fix] Fix the messed-up test structure by @dbogunowicz in #49
Bump the version before the release by @dbogunowicz in #50
Compressed lifecycle implementation (INT8 only) by @bfineran in #33
group size speedups + fixes by @bfineran in #51
Group and Channelwise Compression Support by @Satrat in #52
Int4 Packed Compressor by @Satrat in #47
Fix for auto device map quantization by @Satrat in #54
Enable generating compressed-tensors-nightly by @dbogunowicz in #53
[BugFix][Again] Update code to be compatible with py38 by @dbogunowicz in #56
Fix per_token slowdown by @Satrat in #57
[GPTQ Modifier UX] Add default scheme by @rahul-tuli in #61
fix group size min max tracking by adding tensor ids by @bfineran in #60
Support for aliased scheme settings in quant config by @bfineran in #40
Remove Symmetric Zero Point in Compressed Outputs by @Satrat in #59
Misc Fixes by @Satrat in #55
Fix for Symmetric Zero Point Reloading by @Satrat in #64
Additional Symmetric ZP Fix by @Satrat in #65
Make ZP int8 instead of int64 by @Satrat in #67
Add a function to check if a string is a preset scheme by @rahul-tuli in #66
Rename Packed Weights by @Satrat in #63
Fixed Grouped Quantization Reload by @Satrat in #68
Fix incorrect loading of dtype by @eldarkurtic in #70
Fix Python 3.8 Compatibility by @Satrat in #71
Update nightly build to run at 6pm by @dsikka in #72
Update time for the runner by @dsikka in #74
Fixes to enable FSDP one-shot by @dbogunowicz in #58
Update Compression Config for HfQuantizer Compatibility by @Satrat in #73
Remove version restriction on transformers by @mgoin in #76
remove pydantic version cap by @bfineran in #80
reduce appropriate dim by @horheynm in #75
Marlin24 Compressor by @Satrat in #77
Fix GPTQ Aliases by @Satrat in #81
initial fixes for compatibility with HFQuantizer by @bfineran in #79
bump version to 0.4.0 by @bfineran in #83
import is_release from version.py by @horheynm in #85
Add release build workflow by @dhuangnm in #89
Assert correct device when dequantizing (like we do for quantizing) by @dbogunowicz in #90
update default symmetry to True on presets by @bfineran in #92
Fp8 Quantization Support by @Satrat in #62
default W4A16 alias to use group_size=128 by @bfineran in #94
[compressor] Add packed int8 support by @dsikka in #91
Fix Decompress kwargs by @Satrat in #100
[Quant KV Cache] Implementation by @dbogunowicz in #86
Fix Transient Tests by @Satrat in #101
Speed Up Packed Compression by @Satrat in #103
[Fix] remove tests/quantization by @dbogunowicz in #99
Allow creating compressor when trust_remote_code=True by @dbogunowicz in #104
Update Quantization Presets by @Satrat in #105
[MOE] Add a set of functionalities to support quantization of MOE models by @dbogunowicz in #46
[BugFix]Fix Name Mangling Issue in compressed_tensors.utils by @rahul-tuli in #102
Update Quantization Scheme Standards for better readability by @markurtz in #106
quantization lifecycle - disable forward pass override + helper for weight quant param updates by @bfineran in #111
Add FP8 Dynamic Scheme for Latest Llama3.1 Meta Models and Fix W4A8 Representation by @markurtz in #114
Model Offloading Support by @Satrat in #113
Fix Test to Account for Model Change by @Satrat in #116
Make publish workflow manually triggerable by @rahul-tuli in #117
bump version to 0.5.0 by @bfineran in #119
[Cherry Pick] dont set quantization data on reload (#123) by @Satrat in #125

New Contributors

@mgoin made their first contribution in #36
@dsikka made their first contribution in #44
@rahul-tuli made their first contribution in #48
@eldarkurtic made their first contribution in https://gith...


Contributors

mgoin, Satrat, and 8 other contributors


03 Jul 20:25

jeanniefinks

Compressed Tensors v0.4.0

New Features:

Scheme alias support in quant config (#40)
New compressors: packed int4 (#47), Marlin 2:4 (#77)
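
To illustrate what a packed int4 format involves, here is a small sketch that stores eight signed 4-bit values in each 32-bit word. The bit layout (lowest nibble first, two's complement) and the function names are assumptions for illustration. The release notes do not specify the layout the packed int4 compressor from #47 actually uses.

```python
from typing import List

BITS = 4
PER_WORD = 32 // BITS  # eight 4-bit values fit in one 32-bit word


def pack_int4(values: List[int]) -> List[int]:
    """Pack signed 4-bit integers (-8..7) into unsigned 32-bit words."""
    if any(v < -8 or v > 7 for v in values):
        raise ValueError("values must be in the signed 4-bit range [-8, 7]")
    # Pad with zeros so the length is a multiple of PER_WORD.
    padded = values + [0] * (-len(values) % PER_WORD)
    words = []
    for start in range(0, len(padded), PER_WORD):
        word = 0
        for i, v in enumerate(padded[start:start + PER_WORD]):
            word |= (v & 0xF) << (BITS * i)  # keep the two's-complement nibble
        words.append(word)
    return words


def unpack_int4(words: List[int], count: int) -> List[int]:
    """Recover the first `count` signed 4-bit values from packed words."""
    values = []
    for word in words:
        for i in range(PER_WORD):
            nibble = (word >> (BITS * i)) & 0xF
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


original = [-8, -1, 0, 7, 3, -4, 5, 2, 1]
packed = pack_int4(original)
assert unpack_int4(packed, len(original)) == original
print(len(original), "values stored in", len(packed), "words")
```

The original element count has to be kept alongside the packed words, since padding makes the packed length ambiguous on its own.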

Changes:

None

Resolved Issues:

Corrected the group-size quantization implementation. (#60)
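
As background for the group-size fix (#60), the sketch below shows what group-wise quantization means in general: each contiguous group of `group_size` values gets its own scale, computed from that group's own range. This is a generic symmetric scheme written for illustration with hypothetical function names. It is not the library's code and does not reproduce the specific bug that was fixed.

```python
from typing import List, Tuple


def quantize_groups(row: List[float], group_size: int,
                    bits: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric quantization with one scale per group of values."""
    if group_size <= 0 or len(row) % group_size != 0:
        raise ValueError("row length must be a positive multiple of group_size")
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits
    quantized: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(x) for x in group)
        # Each group tracks its own range, so its scale is independent.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        quantized.extend(
            max(-qmax - 1, min(qmax, round(x / scale))) for x in group
        )
    return quantized, scales


def dequantize_groups(quantized: List[int], scales: List[float],
                      group_size: int) -> List[float]:
    """Multiply each integer by the scale of the group it belongs to."""
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]


row = [0.7, -0.3, 0.0, 0.1, 70.0, -30.0, 0.0, 10.0]
q, scales = quantize_groups(row, group_size=4)
print(q, scales)
print(dequantize_groups(q, scales, group_size=4))
```

With a single scale for the whole row, the first four values would all round to zero. Per-group scales preserve them, which is why the min/max statistics must be tracked separately for each group.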

Known Issues:

None
