
Implement sparsity as an AQT Layout #498

Merged
merged 23 commits into main
Jul 26, 2024
Conversation

@jcaip (Contributor) commented Jul 11, 2024

Summary:

This PR adds sparsity as an AQTLayout; previously it was implemented using the QuantizedLinearBase subclass, which will be deprecated shortly.

I also renamed `sparsify` to `sparsify_` and added a `semi_sparse_weight()` function to be in line with our other APIs.

The main code changes are in `torchao/dtypes/affine_quantized_tensor.py`. For the semi-structured cuSPARSELt representation, we can reuse a lot of the existing PlainLayout implementation, since the compressed representation is stored in a single tensor like `int_data`.
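
For context, a minimal usage sketch of the renamed API is shown below. The import path and the toy model are illustrative assumptions, not part of this PR.

```python
import torch
from torch import nn

# Assumed import path for the APIs described above; the actual module layout may differ.
from torchao.sparsity import sparsify_, semi_sparse_weight

# Toy model; in a real flow the Linear weights would first be pruned to a 2:4 pattern.
model = nn.Sequential(nn.Linear(1024, 1024)).half().cuda()

# sparsify_ mutates the model in place (hence the trailing underscore),
# swapping eligible Linear weights for the semi-structured (cuSPARSELt) representation.
sparsify_(model, semi_sparse_weight())

out = model(torch.randn(8, 1024, dtype=torch.float16, device="cuda"))
```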

Test Plan:

python test/sparsity/test_sparse_api


pytorch-bot bot commented Jul 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/498

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ce567c3 with merge base e5df48e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Jul 11, 2024 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).
@@ -460,3 +460,47 @@ def get_per_token_block_size(x):
return weight

return _get_linear_subclass_inserter(apply_int8_dynamic_activation_int8_weight_quant)


def int8_dynamic_activation_int8_semi_sparse_weight():
Contributor

Does this have a similar config to int8_dynamic_activation_int8_weight? If so, we can add a layout_type arg to that function directly.

@jcaip marked this pull request as ready for review July 24, 2024 15:22

m = sparsify(m, to_sparse_semi_structured)
m = sparsify_(m, semi_sparse_weight())
Contributor

does semi_sparse_weight have to talk about dtype as well?

Contributor Author

It will work for bf16, fp16, and fp32, so I don't think specifying the dtype makes sense. Maybe dense_activation_semi_sparse_weight to keep it consistent?

Contributor

I see, then it's fine. We have int4_weight_only() as well, so I feel it's fine that we don't mention activation.

(we could remove only as well)

eps = torch.finfo(torch.float32).eps
zero_point_dtype = torch.int64
from torchao.dtypes import PlainLayoutType
_apply_int8_dynamic_activation_int8_weight_quant_layout = partial(_apply_int8_dynamic_activation_int8_weight_quant, layout_type=PlainLayoutType())
Contributor

Why not just expose layout_type as an argument for int8_dynamic_activation_int8_weight?

Contributor Author

Sure, I can refactor.
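
For illustration, a rough sketch of what that refactor might look like, reusing the helpers visible in the diff above (assumed to be in scope in torchao/quantization/quant_api.py). This is a hypothetical signature, not necessarily the final shape of the API:

```python
from torchao.dtypes import PlainLayoutType  # layout types assumed importable from torchao.dtypes

def int8_dynamic_activation_int8_weight(layout_type=PlainLayoutType()):
    # Sketch: instead of a partial() hard-coding PlainLayoutType, the layout is
    # threaded through as an argument, so a semi-sparse layout can be passed in
    # without a separate int8_dynamic_activation_int8_semi_sparse_weight API.
    def apply_int8_dynamic_activation_int8_weight_quant(weight):
        # _apply_int8_dynamic_activation_int8_weight_quant is the existing helper
        # shown in the diff above, now receiving the layout explicitly.
        return _apply_int8_dynamic_activation_int8_weight_quant(weight, layout_type=layout_type)

    return _get_linear_subclass_inserter(apply_int8_dynamic_activation_int8_weight_quant)
```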

@jcaip merged commit c9f79be into main Jul 26, 2024
13 checks passed
weight = to_linear_act_quantized(weight, input_quant_func)
return weight

return _get_linear_subclass_inserter(apply_int8_dynamic_activation_int8_weight_quant)


def int8_dynamic_activation_int8_semi_sparse_weight():
Contributor

Sorry, I meant that we could remove this and just use int8_dynamic_activation_int8_weight for sparsity as well.

Contributor Author

Oh I see, yeah that sounds good to me too.
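
Sketching what that could look like from the caller's side; `SemiSparseLayoutType` and the `quantize_` entry point are assumed names here and may not match the merged code exactly:

```python
import torch
from torch import nn

# Assumed imports; exact names and locations may differ from the final API.
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight
from torchao.dtypes import SemiSparseLayoutType

model = nn.Sequential(nn.Linear(1024, 1024)).half().cuda()

# The semi-sparse variant becomes a layout choice on the existing int8 API
# rather than a separate int8_dynamic_activation_int8_semi_sparse_weight() function.
quantize_(model, int8_dynamic_activation_int8_weight(layout_type=SemiSparseLayoutType()))
```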

Hanxian97 pushed a commit that referenced this pull request Jul 29, 2024
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024