Quantization Compressor Support #2260

Satrat · 2024-04-30T18:02:16Z

Requires this compressed-tensors branch: neuralmagic/compressed-tensors#45

Adds support for saving compressed quantized models through the SparseAutoModel save path. The compression type can be passed in via quantization_format or inferred from the model itself.
Simplifies much of the save/load logic by moving it to helper classes in compressed-tensors.

Examples

Very little UX change: as with sparsity, we just pass save_compressed=True to enable compression. By default, weights are saved in the fake_quant format if save_compressed isn't set.

from sparseml.transformers import SparseAutoModelForCausalLM, oneshot

recipe="tests/sparseml/transformers/compression/recipes/new_quant_full.yaml"
model_stub = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
dataset = "open_platypus"
max_seq_length = 512
num_calibration_samples = 512
output_dir = "./test_updated_llama1.1b_quant_compressed"

model = SparseAutoModelForCausalLM.from_pretrained(model_stub, device_map="cuda:0")
oneshot(
    model=model,
    dataset=dataset,
    overwrite_output_dir=True,
    output_dir=output_dir,
    max_seq_length=max_seq_length,
    num_calibration_samples=num_calibration_samples,
    recipe=recipe,
    pad_to_max_length=False,
    save_compressed=True
)

Reloading a fake_quant model and then compressing it:

from sparseml.transformers import SparseAutoModelForCausalLM

output_dir_fake = "./test_updated_llama1.1b_quant"
output_dir_compressed = "./test_updated_llama1.1b_quant_compressed"

model_reloaded = SparseAutoModelForCausalLM.from_pretrained(output_dir_fake)
model_reloaded.save_pretrained(output_dir_compressed, save_compressed=True)
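
One rough way to sanity-check the reload-and-compress flow above is to compare the on-disk size of the two output directories. The sketch below uses only the standard library; `directory_size_bytes` is a hypothetical helper, not part of sparseml or compressed-tensors. It assumes both directories from the example have already been written, and a missing directory is reported as 0 bytes.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path` (hypothetical helper)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


if __name__ == "__main__":
    # Directory names taken from the example above; adjust as needed.
    for out_dir in (
        "./test_updated_llama1.1b_quant",
        "./test_updated_llama1.1b_quant_compressed",
    ):
        size_mb = directory_size_bytes(out_dir) / 1e6
        print(f"{out_dir}: {size_mb:.1f} MB")
```

The compressed directory would be expected to be smaller than the fake_quant one, though the exact ratio depends on the recipe and on which layers are quantized.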

You can also specify a quantization compression format by name. Right now we only support unpacked int quantization, but this will become more relevant as we add more compression formats for quantization.

from sparseml.transformers import SparseAutoModelForCausalLM

output_dir_fake = "./test_updated_llama1.1b_quant"
output_dir_compressed = "./test_updated_llama1.1b_quant_compressed"

model_reloaded = SparseAutoModelForCausalLM.from_pretrained(output_dir_fake)
model_reloaded.save_pretrained(output_dir_compressed, quantization_format="int_quantized")
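
To illustrate the difference between the fake_quant format and an unpacked int format, here is a minimal pure-Python sketch. It is not the compressed-tensors implementation: the function names are hypothetical, and symmetric per-tensor int8 quantization is an assumption made for brevity (the actual scheme comes from the recipe). The idea is that fake_quant keeps floating-point weights that have been rounded to the quantization grid, while a compressed format stores the integer values plus the scale needed to recover them.

```python
from typing import List


def symmetric_scale(weights: List[float], qmax: int = 127) -> float:
    """Per-tensor scale for symmetric int8 quantization (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(weights: List[float], scale: float) -> List[int]:
    """Compressed view: integer values that would be stored on disk."""
    return [max(-128, min(127, round(w / scale))) for w in weights]


def dequantize(q_weights: List[int], scale: float) -> List[float]:
    """Recover floating-point weights from the stored integers and scale."""
    return [q * scale for q in q_weights]


def fake_quantize(weights: List[float], scale: float) -> List[float]:
    """fake_quant view: floats that already sit on the quantization grid."""
    return dequantize(quantize(weights, scale), scale)


if __name__ == "__main__":
    w = [-1.0, -0.25, 0.0, 0.5, 1.0]
    s = symmetric_scale(w)
    print("int values:", quantize(w, s))
    print("fake_quant:", fake_quantize(w, s))
```

Both views carry the same information; the int form simply needs fewer bytes per weight than storing the dequantized floats.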

dbogunowicz

🥇

rahul-tuli

LGTM after proposed fix

dbogunowicz and others added 30 commits April 8, 2024 11:38

initial commit

097bd79

update setup.py

76970e3

Update setup.py

bbf4b39

fix setup.py

a272a30

move all config to sparsetensors

c0d3ead

Merge branch 'main' into feature/damian/sparsetensors

b3f7ff3

cleanup class name and comments

a75f8da

Merge branch 'main' into feature/damian/sparsetensors

c5b897e

initial implementation untested

2c72ab1

fixing issues

9174c1d

add test script

aa17e77

update perplexity test

f1f114c

refactor to compressed-tensors

bbbdcb9

Merge branch 'main' into feature/damian/sparsetensors

5d9c7dd

rename sparsetensors

7a9f9e5

update setup

fa43088

Sa/model reload (#2250)

63266d8

* working reload * sparsegpt

Merge branch 'main' into sa/quant_mod_refactor

b0f0fc9

Merge branch 'main' into feature/damian/sparsetensors

dfa41fb

Merge branch 'feature/damian/sparsetensors' into sa/quant_mod_refactor

4af4852

cleanup

55976c5

refactor tests

38f4f77

only run oneshot once

6574874

all tests passing

7f5babf

remove unused config

c0d6cb9

reset models on each parameterize

a59e2af

Merge branch 'feature/damian/sparsetensors' into sa/quant_mod_refactor

cba7c27

style

2a6b0f2

Merge branch 'main' into feature/damian/sparsetensors

1e7ee94

bring back SparsityConfigMetadata

a4e0575

dbogunowicz reviewed May 2, 2024

tests/sparseml/transformers/compression/test_compress_tensor_utils.py (outdated, resolved)

dbogunowicz previously approved these changes May 2, 2024

Sara Adkins added 4 commits May 2, 2024 18:50

address PR comments

2432cf4

PR comments

24437c7

Merge branch 'sa/quant_mod_refactor' into sa/compressors

399087f

fixing some things

e8bc021

bfineran previously approved these changes May 6, 2024

Base automatically changed from sa/quant_mod_refactor to main May 6, 2024 20:02

Merge branch 'main' into sa/compressors

3ca0298

Satrat requested review from bfineran and dbogunowicz May 7, 2024 14:31

Sara Adkins added 2 commits May 7, 2024 14:35

style

061de67

Merge branch 'main' into sa/compressors

633d5a5

Satrat requested a review from rahul-tuli May 8, 2024 19:47

Sara Adkins added 4 commits May 8, 2024 20:23

pull from cp main

6e0f1bc

postmerge too

3ff4dc8

Merge branch 'main' into sa/compressors

6f4379c

export needs it too

29a2186

dbogunowicz previously approved these changes May 9, 2024

rahul-tuli reviewed May 9, 2024

src/sparseml/modifiers/obcq/utils/sgpt_wrapper.py (outdated, resolved)

Satrat dismissed dbogunowicz’s stale review via e93257f May 9, 2024 14:14

Update src/sparseml/modifiers/obcq/utils/sgpt_wrapper.py

e93257f

Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>

Satrat requested review from rahul-tuli and dbogunowicz May 9, 2024 14:14

dbogunowicz approved these changes May 9, 2024

rahul-tuli approved these changes May 9, 2024

Satrat merged commit 8a7fc99 into main May 9, 2024
16 of 17 checks passed

Satrat deleted the sa/compressors branch May 9, 2024 14:39
