Quantization Compressor Support #2260

Merged
merged 70 commits into from
May 9, 2024

@Satrat Satrat commented Apr 30, 2024

Requires this compressed-tensors branch: neuralmagic/compressed-tensors#45

  • Adds support for saving compressed quantized models within SparseAutoModel saving. Compression type can be passed in via quantization_format or inferred from the model itself
  • Simplified a lot of the save/load logic by moving it to helper classes in compressed-tensors
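The "passed in or inferred" behavior from the first bullet can be pictured with a rough sketch. This is not the sparseml or compressed-tensors implementation: `resolve_quantization_format`, `KNOWN_FORMATS`, and the `model_is_quantized` flag are hypothetical names, and the precedence shown (an explicit format wins, otherwise infer only when compression is requested) is an assumption.

```python
from typing import Optional

# Hypothetical names for illustration only; not the sparseml or
# compressed-tensors API. "int_quantized" is the only format named in this PR.
KNOWN_FORMATS = {"int_quantized"}


def resolve_quantization_format(
    explicit_format: Optional[str],
    model_is_quantized: bool,
    save_compressed: bool,
) -> Optional[str]:
    """Return the compression format to use, or None to keep fake_quant weights."""
    # Assumption: a format passed by name takes priority over inference.
    if explicit_format is not None:
        if explicit_format not in KNOWN_FORMATS:
            raise ValueError(f"Unknown quantization format: {explicit_format!r}")
        return explicit_format
    # Assumption: with no explicit format, infer one from the model itself,
    # but only when the caller asked for a compressed save.
    if save_compressed and model_is_quantized:
        return "int_quantized"
    # Otherwise fall back to the uncompressed fake_quant default.
    return None
```

The real logic lives in the compressed-tensors helper classes and may differ in both naming and precedence.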

Examples

Very little UX change: as with sparsity, we just pass save_compressed=True to enable compression. By default, we save weights in the fake_quant format if save_compressed isn't set.

from sparseml.transformers import SparseAutoModelForCausalLM, oneshot

recipe="tests/sparseml/transformers/compression/recipes/new_quant_full.yaml"
model_stub = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
dataset = "open_platypus"
max_seq_length = 512
num_calibration_samples = 512
output_dir = "./test_updated_llama1.1b_quant_compressed"

model = SparseAutoModelForCausalLM.from_pretrained(model_stub, device_map="cuda:0")
oneshot(
    model=model,
    dataset=dataset,
    overwrite_output_dir=True,
    output_dir=output_dir,
    max_seq_length=max_seq_length,
    num_calibration_samples=num_calibration_samples,
    recipe=recipe,
    pad_to_max_length=False,
    save_compressed=True
)
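For readers unfamiliar with the two storage styles, here is a minimal sketch of the difference between fake_quant weights and integer-compressed weights. It assumes symmetric per-tensor int8 quantization, which may not match what the recipe above configures, and the helper names `quantize_int8` and `dequantize` are illustrative rather than part of sparseml or compressed-tensors.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization (an assumed scheme).

    Returns the integer codes and the scale needed to map them back to floats.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range.
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to floating-point values."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.031, 1.0]
codes, scale = quantize_int8(weights)

# fake_quant style: values stay in floating point, snapped to the int8 grid.
fake_quant = dequantize(codes, scale)

# Compressed style: keep only the integer codes plus the scale, and
# dequantize when the model is loaded.
compressed = {"codes": codes, "scale": scale}
```

Both representations describe the same values; the compressed one simply needs fewer bits per weight on disk.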

Reloading a fake_quant model and then compressing it:

from sparseml.transformers import SparseAutoModelForCausalLM

output_dir_fake = "./test_updated_llama1.1b_quant"
output_dir_compressed = "./test_updated_llama1.1b_quant_compressed"

model_reloaded = SparseAutoModelForCausalLM.from_pretrained(output_dir_fake)
model_reloaded.save_pretrained(output_dir_compressed, save_compressed=True)

You can also specify a quantization compression format by name. Right now we only support unpacked int quantization, but this will become more relevant as we add more compression formats for quantization.

from sparseml.transformers import SparseAutoModelForCausalLM

output_dir_fake = "./test_updated_llama1.1b_quant"
output_dir_compressed = "./test_updated_llama1.1b_quant_compressed"

model_reloaded = SparseAutoModelForCausalLM.from_pretrained(output_dir_fake)
model_reloaded.save_pretrained(output_dir_compressed, quantization_format="int_quantized")
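A quick way to sanity-check the result is to compare the on-disk size of the two output directories. The helper below is a generic standard-library sketch, not something this PR adds; `dir_size_bytes` is a hypothetical name.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path`, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

Calling it on output_dir_fake and output_dir_compressed after the saves above should show the compressed checkpoint taking less space, assuming the integer format stores fewer bits per weight than the fake_quant floats.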

dbogunowicz previously approved these changes May 2, 2024
Contributor

@dbogunowicz dbogunowicz left a comment

🥇

bfineran previously approved these changes May 6, 2024
Base automatically changed from sa/quant_mod_refactor to main May 6, 2024 20:02
@bfineran bfineran dismissed stale reviews from dbogunowicz and themself May 6, 2024 20:02

The base branch was changed.

@Satrat Satrat requested review from bfineran and dbogunowicz May 7, 2024 14:31
@Satrat Satrat requested a review from rahul-tuli May 8, 2024 19:47
dbogunowicz previously approved these changes May 9, 2024
Member

@rahul-tuli rahul-tuli left a comment

LGTM after proposed fix

src/sparseml/modifiers/obcq/utils/sgpt_wrapper.py (outdated, resolved)
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
@Satrat Satrat requested review from rahul-tuli and dbogunowicz May 9, 2024 14:14
@Satrat Satrat merged commit 8a7fc99 into main May 9, 2024
16 of 17 checks passed
@Satrat Satrat deleted the sa/compressors branch May 9, 2024 14:39
6 participants