Implement sparsity as an AQT Layout #498
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/498
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit ce567c3 with merge base e5df48e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```
@@ -460,3 +460,47 @@ def get_per_token_block_size(x):
        return weight

    return _get_linear_subclass_inserter(apply_int8_dynamic_activation_int8_weight_quant)


def int8_dynamic_activation_int8_semi_sparse_weight():
```
Does this have a similar config to `int8_dynamic_activation_int8_weight`? If so, we can add a `layout_type` arg to that function directly.
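A minimal sketch of the API shape being suggested here: one `int8_dynamic_activation_int8_weight` entry point parameterized by a layout argument, rather than a separate sparse variant. The layout classes and the returned tuple below are simplified stand-ins for the real torchao machinery, not its actual implementation.

```python
# Hypothetical sketch: a single quantization entry point that takes a
# layout_type argument. PlainLayoutType / SemiSparseLayoutType are
# stand-ins for the real torchao.dtypes classes; the actual
# quantization is stubbed out to just record the requested layout.
from dataclasses import dataclass

@dataclass(frozen=True)
class PlainLayoutType:
    """Default dense layout (stand-in)."""

@dataclass(frozen=True)
class SemiSparseLayoutType:
    """2:4 semi-structured sparse layout (stand-in)."""

def int8_dynamic_activation_int8_weight(layout_type=PlainLayoutType()):
    """Return a weight-transform callable parameterized by layout."""
    def apply(weight):
        # Real code would call to_affine_quantized(..., layout_type=layout_type);
        # here we only record which layout was requested.
        return ("int8/int8", layout_type)
    return apply

# The sparse variant becomes a different argument, not a new API:
dense = int8_dynamic_activation_int8_weight()
sparse = int8_dynamic_activation_int8_weight(SemiSparseLayoutType())
```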
```diff
-m = sparsify(m, to_sparse_semi_structured)
+m = sparsify_(m, semi_sparse_weight())
```
Does `semi_sparse_weight` have to talk about dtype as well?
It will work for bf16, fp16, and fp32, so I don't think specifying the dtype makes sense. Maybe `dense_activation_semi_sparse_weight` to keep it consistent?
I see, then it's fine. We have `int4_weight_only()` as well, so I feel it's fine that we don't mention activation (we could remove `only` as well).
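The reason dtype doesn't need to appear in the name is that the 2:4 semi-structured pattern is defined purely on positions, not on element type: in every contiguous group of 4 weights, only 2 values survive. A small illustrative sketch (not torchao code) of that pruning rule:

```python
# Illustrative sketch of the 2:4 semi-structured pattern that
# semi_sparse_weight() targets: in each contiguous group of 4 weights,
# keep the 2 largest-magnitude values and zero the rest. The pattern is
# independent of dtype, which is why bf16/fp16/fp32 all work.

def prune_2_4(row):
    """Zero out the 2 smallest-magnitude values in each group of 4."""
    assert len(row) % 4 == 0
    out = list(row)
    for g in range(0, len(row), 4):
        group = list(range(g, g + 4))
        # sort this group's indices by magnitude, ascending
        group.sort(key=lambda i: abs(out[i]))
        for i in group[:2]:  # drop the two smallest
            out[i] = 0.0
    return out

row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
print(prune_2_4(row))  # exactly two nonzeros survive per group of 4
```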
torchao/quantization/quant_api.py (Outdated)
```
eps = torch.finfo(torch.float32).eps
zero_point_dtype = torch.int64
from torchao.dtypes import PlainLayoutType
_apply_int8_dynamic_activation_int8_weight_quant_layout = partial(_apply_int8_dynamic_activation_int8_weight_quant, layout_type=PlainLayoutType())
```
Why not just expose `layout_type` as an argument for `int8_dynamic_activation_int8_weight`?
Sure, I can refactor.
```
        weight = to_linear_act_quantized(weight, input_quant_func)
        return weight

    return _get_linear_subclass_inserter(apply_int8_dynamic_activation_int8_weight_quant)


def int8_dynamic_activation_int8_semi_sparse_weight():
```
Sorry, I meant that we could remove this and just use `int8_dynamic_activation_int8_weight` for sparsity as well.
Oh I see, yeah that sounds good to me too.
Summary: This PR adds sparsity as an AQTLayout; previously it was implemented using the QuantizedLinearBase subclass, which will be deprecated shortly. I also renamed `sparsify` to `sparsify_` and added a `semi_sparse_weight()` function to be in line with our other APIs. The main code changes are in `torchao/dtypes/affine_quantized_tensor.py`; for the semi-structured cuSPARSELt representation, we can reuse a lot of the existing PlainLayout implementation, since the compressed representation is stored in a single tensor like `int_data`.

Test Plan:
```
python test/sparsity/test_sparse_api
```
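The "reuse PlainLayout" point can be sketched with toy classes: both layouts carry their payload in one buffer, so the sparse layout only changes what that buffer means. These classes are simplified stand-ins mirroring the attribute name `int_data` from the description, not the real AQTLayout subclasses:

```python
# Toy sketch of why the cuSPARSELt representation can reuse the
# PlainLayout machinery: both store their payload in a single buffer.
# Classes here are simplified stand-ins, not torchao's actual layouts.

class PlainAQTLayout:
    """Dense layout: quantized values live in a single buffer `int_data`."""
    def __init__(self, int_data, scale, zero_point):
        self.int_data = int_data
        self.scale = scale
        self.zero_point = zero_point

class SemiSparseAQTLayout(PlainAQTLayout):
    """Semi-structured layout: `int_data` now holds the cuSPARSELt
    compressed representation; scale/zero_point handling is inherited."""
    def __init__(self, compressed, scale, zero_point):
        super().__init__(compressed, scale, zero_point)

layout = SemiSparseAQTLayout([1, 2, 3], scale=0.1, zero_point=0)
```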