Fix int4pack_mm error #517

Merged: 6 commits, merged on Jul 29, 2024

yanbing-j (Contributor)

The meta shape needs to be updated in PyTorch first: pytorch/pytorch#130915.

pytorch-bot bot commented Jul 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/517

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8aadb7d with merge base afde175:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jul 17, 2024
svekars requested review from msaroufim, jerryzh168 and andrewor14, and removed the request for andrewor14, msaroufim and jerryzh168, on July 17, 2024 15:26
andrewor14 added a commit that referenced this pull request Jul 17, 2024
int4 tinygemm quantization is currently broken in master and
being fixed in #517. Let's
skip these tests for now until that is fixed.
yanbing-j force-pushed the yanbing/fix_int4_woq branch 2 times, most recently from 49b47a2 to a11e455, on July 19, 2024 03:48
@@ -349,6 +350,8 @@ def groupwise_affine_quantize_tensor_from_qparams(
    quant_max = 2 ** n_bit - 1

    int_data = quantize_affine(w, block_size, scales, zeros, output_dtype, quant_min, quant_max, zero_point_domain = ZeroPointDomain.FLOAT)
    if TORCH_VERSION_AFTER_2_5:
        int_data = (int_data[::, ::2] << 4 | int_data[::, 1::2]).to(torch.uint8)
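
For readers following the diff: the added line packs two 4-bit values into one byte, with even columns going to the high nibble and odd columns to the low nibble, which halves the last dimension. (The hunk's int_data[::, ::2] is the same slice as int_data[:, ::2].) The sketch below reproduces that expression on CPU with made-up data. pack_int4_pairs and unpack_int4_pairs are hypothetical helper names, not torchao APIs, and the unpacking side exists only to check the round trip.

```python
import torch


def pack_int4_pairs(int_data: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: pack adjacent 4-bit values (0..15) along dim 1 into uint8.

    Even columns become the high nibble, odd columns the low nibble, so a
    [n, k] input yields a [n, k // 2] uint8 output.
    """
    assert int_data.shape[1] % 2 == 0, "number of columns must be even"
    # `<<` binds tighter than `|` in Python, so this is (high << 4) | low.
    return (int_data[:, ::2] << 4 | int_data[:, 1::2]).to(torch.uint8)


def unpack_int4_pairs(packed: torch.Tensor) -> torch.Tensor:
    """Hypothetical inverse of pack_int4_pairs, used only to check the round trip."""
    as_int = packed.to(torch.int32)
    high = (as_int >> 4) & 0xF
    low = as_int & 0xF
    # Interleave high/low nibbles back into their original column order.
    return torch.stack([high, low], dim=-1).reshape(packed.shape[0], -1)


w = torch.tensor([[1, 2, 15, 0], [7, 8, 3, 4]], dtype=torch.int32)
packed = pack_int4_pairs(w)
print(packed.tolist())  # [[18, 240], [120, 52]]
assert torch.equal(unpack_int4_pairs(packed), w)
```

For example, the pair (1, 2) becomes 1 << 4 | 2 = 18, and (15, 0) becomes 240.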

manuelcandales (Contributor), Jul 25, 2024

This would break on the MPS backend, since __lshift__.Scalar is not currently implemented for MPS.

yanbing-j (Contributor Author)

Is int_data on the MPS device in this function? If so, we can move int_data to the CPU, do the shift there, and then move the result back to the MPS device.
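
A minimal sketch of the CPU round trip proposed above, assuming int_data is 2-D, holds values in 0..15, and has an even number of columns. shift_pack_via_cpu is a hypothetical name; this is illustrative only, and the next comment notes that pytorch/pytorch#131813 makes the workaround unnecessary.

```python
import torch


def shift_pack_via_cpu(int_data: torch.Tensor) -> torch.Tensor:
    """Hypothetical workaround: pack nibbles on CPU, then return to the input's device."""
    device = int_data.device
    # .cpu() returns the same tensor (no copy) if it is already on the CPU.
    cpu_data = int_data.cpu()
    # Same packing expression as in the hunk, evaluated where `<<` has a kernel.
    packed = (cpu_data[:, ::2] << 4 | cpu_data[:, 1::2]).to(torch.uint8)
    # Move the packed result back to wherever the input lived (e.g. "mps").
    return packed.to(device)


x = torch.tensor([[1, 2, 3, 4]], dtype=torch.int32)
if torch.backends.mps.is_available():
    x = x.to("mps")
print(shift_pack_via_cpu(x).cpu().tolist())  # [[18, 52]]
```

The trade-off is two device transfers per call, which is why a native MPS kernel for the shift is preferable.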

Contributor

@malfet landed pytorch/pytorch#131813, so this won't be a problem anymore

manuelcandales (Contributor), Jul 26, 2024

In any case, I learned from @malfet today (see his suggestion on line 203) that if we use torch.bitwise_left_shift(x, 4) here instead of <<, the operation falls back to the CPU. So things would work even before his PR landed, as long as torch.bitwise_left_shift is used instead of <<.
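
The suggestion only swaps the operator spelling for the function form, so the packed result should be identical. The snippet below checks that on CPU with made-up values; it does not exercise the MPS fallback behavior described in the comment, which depends on the PyTorch build. pack_with_operator and pack_with_function are hypothetical names used only for this comparison.

```python
import torch


def pack_with_operator(x: torch.Tensor) -> torch.Tensor:
    # Original spelling: `<<` with a Python int dispatches to __lshift__.Scalar.
    return (x[:, ::2] << 4 | x[:, 1::2]).to(torch.uint8)


def pack_with_function(x: torch.Tensor) -> torch.Tensor:
    # Suggested spelling: torch.bitwise_left_shift(tensor, scalar).
    return (torch.bitwise_left_shift(x[:, ::2], 4) | x[:, 1::2]).to(torch.uint8)


x = torch.tensor([[1, 2, 15, 0], [7, 8, 3, 4]], dtype=torch.int32)
assert torch.equal(pack_with_operator(x), pack_with_function(x))
print(pack_with_function(x).tolist())  # [[18, 240], [120, 52]]
```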

yanbing-j (Contributor Author)

Thanks for the clarification. With pytorch/pytorch#131813, __lshift__.Scalar now has an MPS dispatch.

msaroufim (Member) commented Jul 25, 2024

@yanbing-j what's the status of this PR? If a breaking change requires more than one week of work on our end to figure out, the right solution is to revert the offending PR.

yanbing-j (Contributor Author)

@msaroufim This PR depends on pytorch/pytorch#130915, which is blocked by "RuntimeError: CUDA error: invalid device function" when using OpInfo.
Once pytorch/pytorch#130915 is merged into PyTorch, this PR will fix the int4 error in torchao.

yanbing-j (Contributor Author)

@msaroufim I have updated pytorch/pytorch#130915 so that it no longer uses OpInfo.

malfet left a comment

[EDIT] Please ignore; both the CUDA and MPS changes will land at the same time.

@@ -349,6 +350,8 @@ def groupwise_affine_quantize_tensor_from_qparams(

Suggested change:
-        int_data = (int_data[::, ::2] << 4 | int_data[::, 1::2]).to(torch.uint8)
+        int_data = (torch.bitwise_left_shift(int_data[::, ::2], 4) | int_data[::, 1::2]).to(torch.uint8)

@@ -198,6 +199,8 @@ def hqq_quants_to_torch_quants(
        .reshape(shape)
        .contiguous()
    )
    if TORCH_VERSION_AFTER_2_5:
        W_q = (W_q[::, ::2] << 4 | W_q[::, 1::2]).to(torch.uint8)

Suggested change:
-        W_q = (W_q[::, ::2] << 4 | W_q[::, 1::2]).to(torch.uint8)
+        W_q = (torch.bitwise_left_shift(W_q[::, ::2], 4) | W_q[::, 1::2]).to(torch.uint8)

msaroufim (Member)

Hi @yanbing-j, just a heads-up since I haven't seen CI go green: we're planning a release on Friday Aug 8 and a code freeze on Friday Aug 2. If this PR can't land by this Wednesday, I will have no choice but to revert your changes in core, since customers depend on this feature (for example, https://github.com/mobiusml/hqq).

yanbing-j (Contributor Author)

@msaroufim Thanks for the information. Could you please rerun CI? Thanks!

yanbing-j (Contributor Author)

@msaroufim @jerryzh168 I found that pytorch/pytorch@6de65d5 breaks test_int8_weight_only_quant_subclass and test_int4_weight_only_quant_subclass_api. Today's nightly works, but tomorrow's will not.
Test plan:
python test/integration/test_integration.py -k test_int8_weight_only_quant_subclass_api
python test/integration/test_integration.py -k test_int4_weight_only_quant_subclass_api

msaroufim (Member)

Thanks @yanbing-j!

pytorch/pytorch@6de65d5 was reverted, so we should indeed only see breakages for one day.

msaroufim merged commit 8fa11a6 into pytorch:main on Jul 29, 2024
13 checks passed
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
int4 tinygemm quantization is currently broken in master and
being fixed in pytorch#517. Let's
skip these tests for now until that is fixed.
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
* Fix int4pack_mm error

* fix CI

* Fix CI

* Fix CI

* Fix CI

* Fix CI