Marlin24 Compressor #77

Satrat · 2024-06-03T20:32:20Z

Implements Marlin24 compression in SparseML. Note that this compressor does not have a decompress method; it is meant only for feeding directly into vLLM.

Example Usage

Since this is a one-way conversion for vLLM, we won't automatically convert to this format when save_compressed is set in SparseML. Instead, the save has to be done explicitly, like this:

from sparseml.transformers import SparseAutoModelForCausalLM, SparseAutoTokenizer
import torch

model_dir = "/network/sadkins/llama7b_sparse_24_w4a16_group128"
output_dir = "llama7b_marlin24"
model = SparseAutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16, device_map="cuda")
tokenizer = SparseAutoTokenizer.from_pretrained(model_dir)
model.save_pretrained(output_dir, quantization_format="marlin-24")
tokenizer.save_pretrained(output_dir)

Testing

Tested with a 7B Llama 2 model in vLLM and confirmed that the outputs for the marlin-24 format match the outputs for the equivalent pack-quantized model. Also added some unit tests to compressed-tensors.

Prompt: 'Hello, my name is', Generated text: '… Business: And I'm here to tell you about my new book!'
Prompt: 'The capital of France is', Generated text: ' Paris, but did you know that Parisians call themselves Parisins? True enough'
Prompt: 'The president of the United States is', Generated text: ', by his or her position, obliged to represent the United States internation,'
Prompt: 'The Boston Bruins are', Generated text: ' a hockey team based in Boston, Massachusetts that compete in the National Hockey League'
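A comparison like the one described above could be scripted roughly as follows. This is a minimal sketch, not the script used for this PR: it assumes greedy decoding (`temperature=0.0`) so the two runs are comparable, `generate_texts` and `first_mismatch` are hypothetical helper names, and the two model directories would have to be supplied by the reader. Only `LLM`, `SamplingParams`, and `LLM.generate` come from vLLM.

```python
from typing import List, Optional

PROMPTS = [
    "Hello, my name is",
    "The capital of France is",
    "The president of the United States is",
    "The Boston Bruins are",
]


def generate_texts(model_path: str, prompts: List[str], max_tokens: int = 16) -> List[str]:
    """Greedy-decode each prompt with vLLM and return the generated strings."""
    # Imported lazily so the comparison helper below works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_path)
    # Assumption: temperature 0.0 (greedy) so repeated runs are deterministic.
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    return [out.outputs[0].text for out in llm.generate(prompts, params)]


def first_mismatch(a: List[str], b: List[str]) -> Optional[int]:
    """Return the index of the first differing output, or None if all match."""
    if len(a) != len(b):
        return min(len(a), len(b))
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None
```

One would call `generate_texts` once for the marlin-24 directory and once for the pack-quantized directory (possibly in separate processes, to avoid holding both models in GPU memory), then pass the two lists to `first_mismatch`.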

dbogunowicz · 2024-06-10T11:18:26Z

src/compressed_tensors/compressors/marlin_24.py

+
+        for name, value in tqdm(model_state.items(), desc="Compressing model"):
+            if name.endswith(weight_suffix):
+                prefix = name[: -(len(weight_suffix))]


Suggested change

prefix = name[: -(len(weight_suffix))]

prefix = name.replace(weight_suffix, "")

More readable :)

Yes, but the replace code runs the risk of replacing ".weight" if it exists elsewhere in the string. Not likely, but possible.
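To make the trade-off in this thread concrete, here is a small sketch comparing the two approaches. The parameter name `model.weight_proj.weight` is a hypothetical example chosen to trigger the edge case, not a name taken from the PR, and the helper function names are illustrative. `str.removesuffix` (Python 3.9+) is shown as a third option that was not discussed in the review.

```python
weight_suffix = ".weight"


def strip_by_slice(name: str, suffix: str = weight_suffix) -> str:
    # Drops exactly len(suffix) trailing characters; the caller is expected
    # to have checked name.endswith(suffix) first, as the PR code does.
    return name[: -len(suffix)]


def strip_by_replace(name: str, suffix: str = weight_suffix) -> str:
    # Removes every occurrence of the suffix, not only the trailing one.
    return name.replace(suffix, "")


def strip_by_removesuffix(name: str, suffix: str = weight_suffix) -> str:
    # Python 3.9+: removes the suffix only if the string ends with it.
    return name.removesuffix(suffix)


name = "model.weight_proj.weight"  # hypothetical parameter name
print(strip_by_slice(name))         # model.weight_proj
print(strip_by_replace(name))       # model_proj
print(strip_by_removesuffix(name))  # model.weight_proj
```

For names where ".weight" appears only at the end, all three return the same prefix; they differ only when the substring also occurs earlier in the name.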

dbogunowicz · 2024-06-10T11:22:48Z

src/compressed_tensors/compressors/utils/semi_structured_conversions.py

+
+# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");


@bfineran can we have a license attached to this code if it's largely copied over from an open-source project (vllm), even though "our" Alex is the author?

Good call, I think this got added in automatically when I ran make style. Is there a way to turn that off for a certain file?

You can skip it with `neuralmagic: no copyright`. It's an Apache license though, so I think usage should be pretty permissible.

horheynm · 2024-06-10T14:24:13Z

I like the design and how simple the implementation is. I didn't check the math and bit operations, but other than that, green from me.

bfineran · 2024-06-11T13:08:41Z

You can skip it with `neuralmagic: no copyright`. It's an Apache license though, so I think usage should be pretty permissible.

Sara Adkins added 18 commits June 3, 2024 14:40

add marlin compressor

2f99095

full implementation running

36703eb

Merge branch 'main' into sa/marlin_24

0aac5ed

save meta matrix

0f77137

Merge branch 'main' into sa/marlin_24

a312a86

fix type

68b4282

change axis we pack on

a45c7ca

debug

6e085b8

fix offset

049ee26

fix scale shape

9c816ee

cast dtype

74923f7

cleanup

25016f5

fix

792d5c0

avoid extra transpose

11f9312

add uncompressed weights

b72abed

Merge branch 'main' into sa/marlin_24

712235d

unit tests

6e98f42

cleanup

679e138

Satrat changed the title ~~[WIP] Marlin24 Compressor~~ Marlin24 Compressor Jun 7, 2024

Satrat requested review from bfineran, dsikka, rahul-tuli, horheynm, dbogunowicz and robertgshaw2-neuralmagic June 7, 2024 21:32

Satrat marked this pull request as ready for review June 7, 2024 21:38

dbogunowicz previously approved these changes Jun 10, 2024

horheynm reviewed Jun 10, 2024

horheynm reviewed Jun 10, 2024

Update src/compressed_tensors/compressors/marlin_24.py

beff90b

Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>

Satrat dismissed dbogunowicz’s stale review via beff90b June 10, 2024 14:19

horheynm reviewed Jun 10, 2024

horheynm reviewed Jun 10, 2024

horheynm previously approved these changes Jun 10, 2024

Satrat dismissed horheynm’s stale review via a843f92 June 10, 2024 14:58

Sara Adkins and others added 3 commits June 10, 2024 10:58

Update src/compressed_tensors/compressors/marlin_24.py

a843f92

Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>

style

0dc7a1b

move helper fn

b3f6fb7

Satrat requested review from dbogunowicz and horheynm June 10, 2024 15:32

horheynm approved these changes Jun 10, 2024

bfineran approved these changes Jun 11, 2024

Satrat merged commit 14b1db1 into main Jun 11, 2024
1 check passed

Satrat deleted the sa/marlin_24 branch June 11, 2024 14:03
