Implement sparsity as an AQT Layout #498
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/498
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit ce567c3 with merge base e5df48e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```
@@ -460,3 +460,47 @@ def get_per_token_block_size(x):
        return weight

    return _get_linear_subclass_inserter(apply_int8_dynamic_activation_int8_weight_quant)


def int8_dynamic_activation_int8_semi_sparse_weight():
```
Does this have a similar config to `int8_dynamic_activation_int8_weight`? If so, we can add a `layout_type` arg to that function directly.
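A minimal sketch of the API shape being suggested here: one `int8_dynamic_activation_int8_weight` entry point parameterized by a layout argument, rather than a separate sparse variant. The layout classes and the returned tuple below are simplified stand-ins for the real torchao machinery, not its actual implementation.

```python
# Hypothetical sketch: a single quantization entry point that takes a
# layout_type argument. PlainLayoutType / SemiSparseLayoutType are
# stand-ins for the real torchao.dtypes classes; the actual
# quantization is stubbed out to just record the requested layout.
from dataclasses import dataclass

@dataclass(frozen=True)
class PlainLayoutType:
    """Default dense layout (stand-in)."""

@dataclass(frozen=True)
class SemiSparseLayoutType:
    """2:4 semi-structured sparse layout (stand-in)."""

def int8_dynamic_activation_int8_weight(layout_type=PlainLayoutType()):
    """Return a weight-transform callable parameterized by layout."""
    def apply(weight):
        # Real code would call to_affine_quantized(..., layout_type=layout_type);
        # here we only record which layout was requested.
        return ("int8/int8", layout_type)
    return apply

# The sparse variant becomes a different argument, not a new API:
dense = int8_dynamic_activation_int8_weight()
sparse = int8_dynamic_activation_int8_weight(SemiSparseLayoutType())
```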
```diff
-m = sparsify(m, to_sparse_semi_structured)
+m = sparsify_(m, semi_sparse_weight())
```
Does `semi_sparse_weight` have to talk about dtype as well?
It will work for bf16, fp16, and fp32, so I don't think specifying the dtype makes sense. Maybe `dense_activation_semi_sparse_weight` to keep it consistent?
I see, then it's fine. We have `int4_weight_only()` as well, so I feel it's fine that we don't mention activation (we could remove `only` as well).
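The reason dtype doesn't need to appear in the name is that the 2:4 semi-structured pattern is defined purely on positions, not on element type: in every contiguous group of 4 weights, only 2 values survive. A small illustrative sketch (not torchao code) of that pruning rule:

```python
# Illustrative sketch of the 2:4 semi-structured pattern that
# semi_sparse_weight() targets: in each contiguous group of 4 weights,
# keep the 2 largest-magnitude values and zero the rest. The pattern is
# independent of dtype, which is why bf16/fp16/fp32 all work.

def prune_2_4(row):
    """Zero out the 2 smallest-magnitude values in each group of 4."""
    assert len(row) % 4 == 0
    out = list(row)
    for g in range(0, len(row), 4):
        group = list(range(g, g + 4))
        # sort this group's indices by magnitude, ascending
        group.sort(key=lambda i: abs(out[i]))
        for i in group[:2]:  # drop the two smallest
            out[i] = 0.0
    return out

row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
print(prune_2_4(row))  # exactly two nonzeros survive per group of 4
```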
torchao/quantization/quant_api.py (Outdated)
```
eps = torch.finfo(torch.float32).eps
zero_point_dtype = torch.int64
from torchao.dtypes import PlainLayoutType
_apply_int8_dynamic_activation_int8_weight_quant_layout = partial(_apply_int8_dynamic_activation_int8_weight_quant, layout_type=PlainLayoutType())
```
Why not just expose `layout_type` as an argument for `int8_dynamic_activation_int8_weight`?
Sure, I can refactor.
```
        weight = to_linear_act_quantized(weight, input_quant_func)
        return weight

    return _get_linear_subclass_inserter(apply_int8_dynamic_activation_int8_weight_quant)


def int8_dynamic_activation_int8_semi_sparse_weight():
```
Sorry, I meant that we could remove this and just use `int8_dynamic_activation_int8_weight` for sparsity as well.
Oh I see, yeah that sounds good to me too.
Summary: This PR adds sparsity as an AQTLayout; previously it was implemented using the QuantizedLinearBase subclass, which will be deprecated shortly. I also renamed `sparsify` to `sparsify_` and added a `semi_sparse_weight()` function to be in line with our other APIs. The main code changes are in `torchao/dtypes/affine_quantized_tensor.py`; for the semi-structured cuSPARSELt representation, we can reuse a lot of the existing PlainLayout implementation, since the compressed representation is stored in a single tensor like `int_data`.

Test Plan:
```
python test/sparsity/test_sparse_api
```
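The "reuse PlainLayout" point can be sketched with toy classes: both layouts carry their payload in one buffer, so the sparse layout only changes what that buffer means. These classes are simplified stand-ins mirroring the attribute name `int_data` from the description, not the real AQTLayout subclasses:

```python
# Toy sketch of why the cuSPARSELt representation can reuse the
# PlainLayout machinery: both store their payload in a single buffer.
# Classes here are simplified stand-ins, not torchao's actual layouts.

class PlainAQTLayout:
    """Dense layout: quantized values live in a single buffer `int_data`."""
    def __init__(self, int_data, scale, zero_point):
        self.int_data = int_data
        self.scale = scale
        self.zero_point = zero_point

class SemiSparseAQTLayout(PlainAQTLayout):
    """Semi-structured layout: `int_data` now holds the cuSPARSELt
    compressed representation; scale/zero_point handling is inherited."""
    def __init__(self, compressed, scale, zero_point):
        super().__init__(compressed, scale, zero_point)

layout = SemiSparseAQTLayout([1, 2, 3], scale=0.1, zero_point=0)
```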