Quantization Compressor Support #45

Satrat · 2024-04-30T17:58:12Z

Renamed CompressionConfig to SparsityCompressionConfig; QuantizationConfig is its quantization counterpart
Renamed ModelCompressor -> Compressor. This class still represents a single compression algorithm, for either sparsity or quantization. The "dense" compressor also applies to a model that has been quantized but not compressed
ModelCompressor is now a new class that handles overall compression and decompression for both sparsity and quantization. It's responsible for model loading/saving as well as config loading/saving
Added a new IntCompressor to represent a quantized model that has been compressed to its quantized type (i.e., float32 -> int8)
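
A minimal sketch of how the roles described above could fit together, assuming a toy state dict of plain Python lists instead of torch tensors. The class names mirror the PR description, but every method body is a hypothetical illustration, not the compressed_tensors API. Also hypothetical: the `<name>.scale` key convention, the `NonzeroCompressor` sparsity format, and the quantize-then-sparsify ordering. `IntCompressor` assumes the model was already quantized (scales exist), so it only converts the float weights to int8 codes.

```python
from typing import Dict, Optional

# Hypothetical stand-in for a state dict: weight names map to flat lists of
# floats, and "<name>.scale" entries hold an already-computed per-tensor scale.
StateDict = Dict[str, object]


class Compressor:
    """One compression algorithm, for either sparsity or quantization."""

    def compress(self, state: StateDict) -> StateDict:
        raise NotImplementedError

    def decompress(self, state: StateDict) -> StateDict:
        raise NotImplementedError


class DenseCompressor(Compressor):
    """Identity: nothing is compressed (also covers quantized-but-uncompressed weights)."""

    def compress(self, state: StateDict) -> StateDict:
        return dict(state)

    def decompress(self, state: StateDict) -> StateDict:
        return dict(state)


class IntCompressor(Compressor):
    """Stores quantized float weights as int8 codes, reusing the existing scales."""

    def compress(self, state: StateDict) -> StateDict:
        out = dict(state)
        for name, values in state.items():
            scale = state.get(name + ".scale")
            if scale is not None:
                # round to the nearest code and clamp to the int8 range [-128, 127]
                out[name] = [max(-128, min(127, round(v / scale))) for v in values]
        return out

    def decompress(self, state: StateDict) -> StateDict:
        out = dict(state)
        for name, values in state.items():
            scale = state.get(name + ".scale")
            if scale is not None:
                out[name] = [float(q) * scale for q in values]
        return out


class NonzeroCompressor(Compressor):
    """Hypothetical sparsity compressor: keeps only the nonzero entries of each weight."""

    def compress(self, state: StateDict) -> StateDict:
        out = dict(state)
        for name, values in state.items():
            if isinstance(values, list):
                out[name] = (len(values), {i: v for i, v in enumerate(values) if v != 0})
        return out

    def decompress(self, state: StateDict) -> StateDict:
        out = dict(state)
        for name, values in state.items():
            if isinstance(values, tuple):
                length, nonzero = values
                out[name] = [nonzero.get(i, 0) for i in range(length)]
        return out


class ModelCompressor:
    """Runs one quantization compressor and one sparsity compressor over a state dict."""

    def __init__(self, quantization: Optional[Compressor] = None,
                 sparsity: Optional[Compressor] = None):
        self.quantization = quantization or DenseCompressor()
        self.sparsity = sparsity or DenseCompressor()

    def compress(self, state: StateDict) -> StateDict:
        # assumed order: convert to int codes first, then drop zeros
        return self.sparsity.compress(self.quantization.compress(state))

    def decompress(self, state: StateDict) -> StateDict:
        # undo the two steps in reverse order
        return self.quantization.decompress(self.sparsity.decompress(state))


state = {"layer.weight": [0.0, 0.5, -0.25, 0.0], "layer.weight.scale": 0.25}
mc = ModelCompressor(quantization=IntCompressor(), sparsity=NonzeroCompressor())
packed = mc.compress(state)       # packed["layer.weight"] == (4, {1: 2, 2: -1})
restored = mc.decompress(packed)  # restored["layer.weight"] == [0.0, 0.5, -0.25, 0.0]
```

Keeping each `Compressor` limited to one algorithm and letting `ModelCompressor` compose them means a quantized-only or sparse-only model just swaps in `DenseCompressor` for the missing half.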

Works with this SparseML PR: neuralmagic/sparseml#2260

tests/test_utils/test_helpers.py

dbogunowicz · 2024-05-02T14:10:34Z

src/compressed_tensors/compressors/base.py

-from compressed_tensors.base import SPARSITY_CONFIG_NAME
-from compressed_tensors.config import CompressionConfig
+from compressed_tensors.config import SparsityCompressionConfig
+from compressed_tensors.quantization import QuantizationConfig


nit: shouldn't all configs live side by side? Principle of least surprise?

src/compressed_tensors/compressors/model_compressor.py

* Compressed lifecycle implementation (INT8 only)
* Apply suggestions from code review
* small fixes for runtime
* Quantization Compressor Support (#45)
* add classes
* WIP
* moving around classes
* code complete
* tests passing
* unit test bugs
* fill out int decompression
* docstrings
* allow repeat frozens
* int compressor unit tests
* PR comments
* fix device issue
* fixing leaf checker
* initial commit
* Revert "Merge branch 'main' into compressed-lifecycle"
  This reverts commit 8dcdde5, reversing changes made to bb36936.
* update version
* fix test

---------

Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: dbogunowicz <damian@neuralmagic.com>

* group size
* add logic in base observer
* Compressed lifecycle implementation (INT8 only)
* group size full lifecycle run
* Apply suggestions from code review
* before vectorize the for loop
* comments, todo add channelwise
* chan wise impl
* comments
* fix channel wise
* comments, validators
* fix typo
* small fixes for runtime
* add classes
* tensor return error fix
* WIP
* moving around classes
* fix sparseml-side of code and add per channel
* pyndatic defaults
* token wise quant
* Update src/compressed_tensors/quantization/quant_args.py
  Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
* comments'
* code complete
* tests passing
* unit test bugs
* fill out int decompression
* docstrings
* allow repeat frozens
* update dim
* int compressor unit tests
* move helper
* shape consistency
* initial commit
* first unit test passing
* Update src/compressed_tensors/quantization/lifecycle/forward.py
  Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
* comments
* tests passing
* one more test
* cleanup
* pass test_quant_args
* Quantization Compressor Support (#45)
* add classes
* WIP
* moving around classes
* code complete
* tests passing
* unit test bugs
* fill out int decompression
* docstrings
* allow repeat frozens
* int compressor unit tests
* PR comments
* fix device issue
* fixing leaf checker
* updating tests
* docstrings
* updating examples
* update examples
* fix channelwise
* new tests, some fail
* WIP
* new helper fn
* actually just a warning
* group size speedups + fixes
* group compression
* fix output type on decompress
* fix channelwise
* revert
* more tests
* move tests
* example notebook
* add example notebook
* update README
* cleanup

---------

Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Benjamin <ben@neuralmagic.com>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>

Sara Adkins added 11 commits April 29, 2024 14:19

* add classes (f68f8f6)
* WIP (51181d1)
* Merge branch 'main' into quant_compressor (b380181)
* moving around classes (1c3cd7f)
* code complete (687bb71)
* tests passing (037d0f8)
* unit test bugs (6a40eb6)
* fill out int decompression (b99ea01)
* docstrings (0d02901)
* Merge branch 'compressed-lifecycle' into quant_compressor (4901aa6)
* allow repeat frozens (d6040ac)

Satrat requested review from bfineran, dbogunowicz, rahul-tuli and horheynm April 30, 2024 17:58

Satrat mentioned this pull request Apr 30, 2024 in neuralmagic/sparseml#2260, "Quantization Compressor Support" (merged)

* int compressor unit tests (50de30a)

Satrat marked this pull request as ready for review May 1, 2024 14:14

dbogunowicz approved these changes May 2, 2024

Sara Adkins added 2 commits May 2, 2024 20:59

* Merge branch 'compressed-lifecycle' into quant_compressor (c41dd48)
* PR comments (5b8065b)

Satrat merged commit df94b5e into compressed-lifecycle May 3, 2024

Satrat deleted the quant_compressor branch May 3, 2024 18:07
