Fix int4pack_mm error #517

yanbing-j · 2024-07-17T08:30:35Z

The meta shape needs to be updated in PyTorch first: pytorch/pytorch#130915.

pytorch-bot · 2024-07-17T08:30:39Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/517

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8aadb7d with merge base afde175:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

int4 tinygemm quantization is currently broken in master and being fixed in #517. Let's skip these tests for now until that is fixed.

manuelcandales · 2024-07-25T19:56:39Z

torchao/quantization/utils.py

@@ -349,6 +350,8 @@ def groupwise_affine_quantize_tensor_from_qparams(
    quant_max = 2 ** n_bit - 1

    int_data = quantize_affine(w, block_size, scales, zeros, output_dtype, quant_min, quant_max, zero_point_domain = ZeroPointDomain.FLOAT)
+    if TORCH_VERSION_AFTER_2_5:
+        int_data = (int_data[::, ::2] << 4 | int_data[::, 1::2]).to(torch.uint8)


This should break on the MPS backend, since __lshift__.Scalar is not currently implemented for MPS.

Is int_data on the MPS device in this function? If so, we can move int_data to the CPU, do the shift there, and then move the result back to the MPS device.

@malfet landed pytorch/pytorch#131813, so this won't be a problem anymore

In any case, I learned from @malfet today (see his suggestion on line 203) that if we use torch.bitwise_left_shift(x, 4) here instead of <<, the op falls back to CPU. So things would work even before his PR landed, as long as torch.bitwise_left_shift is used instead of <<.

Thanks for the clarification. With pytorch/pytorch#131813, __lshift__.Scalar has MPS dispatch now.
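
For readers following the diff above, the packing expression puts each pair of adjacent 4-bit values in a row into a single byte: even columns go into the high nibble and odd columns into the low nibble, so a row of k values becomes k / 2 bytes. The sketch below illustrates this with plain Python lists rather than tensors; `pack_int4_rows` and `unpack_int4_rows` are hypothetical names used only for illustration and are not part of torchao or PyTorch.

```python
def pack_int4_rows(rows):
    """Pack pairs of 4-bit values (0..15) into single bytes, row by row.

    Mirrors the shape of `x[::, ::2] << 4 | x[::, 1::2]` from the diff:
    even columns land in the high nibble, odd columns in the low nibble.
    """
    packed = []
    for row in rows:
        if len(row) % 2 != 0:
            raise ValueError("each row needs an even number of columns")
        if any(not 0 <= value <= 15 for value in row):
            raise ValueError("values must fit in 4 bits (0..15)")
        # `<<` binds tighter than `|`, so the parentheses are only for clarity.
        packed.append([(hi << 4) | lo for hi, lo in zip(row[::2], row[1::2])])
    return packed


def unpack_int4_rows(packed):
    """Inverse of pack_int4_rows: split each byte back into two nibbles."""
    return [
        [nibble for byte in row for nibble in (byte >> 4, byte & 0xF)]
        for row in packed
    ]


rows = [[1, 2, 15, 0], [7, 7, 0, 9]]
packed = pack_int4_rows(rows)      # each row shrinks from 4 columns to 2
restored = unpack_int4_rows(packed)
```

Because every packed byte stays in the range 0..255, the result fits in an unsigned 8-bit type, which is consistent with the `.to(torch.uint8)` cast in the diff.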

msaroufim · 2024-07-25T20:28:24Z

@yanbing-j what's the status on this PR? If a breaking change requires more than 1 week of work to figure out on our end, the right solution is to revert the offending PR.

yanbing-j · 2024-07-26T01:28:13Z

@msaroufim This PR is pending on pytorch/pytorch#130915, which is blocked by the error RuntimeError: CUDA error: invalid device function when using OpInfo.
Once pytorch/pytorch#130915 is merged into PyTorch, the current PR can fix the int4 error in torchao.

yanbing-j · 2024-07-26T02:02:13Z

@msaroufim I updated pytorch/pytorch#130915 so that it no longer uses OpInfo.

malfet

[EDIT] Please ignore, both the CUDA and MPS changes will land at the same time.

malfet · 2024-07-26T15:35:01Z

torchao/quantization/utils.py

@@ -349,6 +350,8 @@ def groupwise_affine_quantize_tensor_from_qparams(
    quant_max = 2 ** n_bit - 1

    int_data = quantize_affine(w, block_size, scales, zeros, output_dtype, quant_min, quant_max, zero_point_domain = ZeroPointDomain.FLOAT)
+    if TORCH_VERSION_AFTER_2_5:
+        int_data = (int_data[::, ::2] << 4 | int_data[::, 1::2]).to(torch.uint8)


Suggested change

int_data = (int_data[::, ::2] << 4 | int_data[::, 1::2]).to(torch.uint8)

int_data = (torch.bitwise_left_shift(int_data[::, ::2], 4) | int_data[::, 1::2]).to(torch.uint8)

malfet · 2024-07-26T15:35:17Z

torchao/prototype/hqq/hqq_tinygemm_linear.py

@@ -198,6 +199,8 @@ def hqq_quants_to_torch_quants(
            .reshape(shape)
            .contiguous()
        )
+        if TORCH_VERSION_AFTER_2_5:
+            W_q = (W_q[::, ::2] << 4 | W_q[::, 1::2]).to(torch.uint8)


Suggested change

W_q = (W_q[::, ::2] << 4 | W_q[::, 1::2]).to(torch.uint8)

W_q = (torch.bitwise_left_shift(W_q[::, ::2], 4) | W_q[::, 1::2]).to(torch.uint8)
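
Both hunks gate the packing on `TORCH_VERSION_AFTER_2_5`. How that flag is computed is not shown in this thread, so the following is only a sketch of one way such a flag could be derived from a version string using the standard library. `version_at_least` is a hypothetical helper, and treating a `2.5.0.dev...` nightly as satisfying the check is an assumption, not a statement about torchao's behavior.

```python
import re


def version_at_least(version: str, minimum: tuple) -> bool:
    """Return True if the major.minor prefix of `version` is >= `minimum`.

    Hypothetical helper for illustration. Suffixes such as ".dev20240729"
    or "+cu121" are ignored, so nightly builds count as their base version.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison is numeric, so (2, 10) correctly sorts after (2, 5).
    return major_minor >= minimum


# Example: decide whether to apply the int4 packing step.
needs_packing = version_at_least("2.5.0.dev20240729+cu121", (2, 5))
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where "2.10" sorts before "2.5" lexicographically.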

msaroufim · 2024-07-28T18:42:56Z

Hi @yanbing-j, just a heads up since I haven't seen CI be green: we're planning a release on Friday Aug 8 and doing a code freeze on Friday Aug 2. If this PR can't be landed by this Wednesday, I will have no choice but to revert your changes in core, since this is a feature that customers such as https://github.com/mobiusml/hqq depend on.

yanbing-j · 2024-07-29T02:18:08Z

@msaroufim Thanks for the information. Could you please start this CI again? Thanks!

yanbing-j · 2024-07-29T02:47:06Z

@msaroufim @jerryzh168 I found that pytorch/pytorch@6de65d5 will break test_int8_weight_only_quant_subclass and test_int4_weight_only_quant_subclass_api. Today's nightly works, but tomorrow's will not.
Test plan:
python test/integration/test_integration.py -k test_int8_weight_only_quant_subclass_api
python test/integration/test_integration.py -k test_int4_weight_only_quant_subclass_api

msaroufim · 2024-07-29T16:40:07Z

Thanks @yanbing-j!

pytorch/pytorch@6de65d5 was reverted, so we should indeed only see breakages for 1 day.

facebook-github-bot added the CLA Signed label Jul 17, 2024

svekars requested review from msaroufim, jerryzh168 and andrewor14 and removed request for andrewor14, msaroufim and jerryzh168 July 17, 2024 15:26

andrewor14 added a commit that referenced this pull request Jul 17, 2024

Skip int4 QAT tests for nightly for now

fa233ec

int4 tinygemm quantization is currently broken in master and being fixed in #517. Let's skip these tests for now until that is fixed.

andrewor14 mentioned this pull request Jul 17, 2024

Skip int4 QAT tests for nightly for now #521

Merged

andrewor14 added a commit that referenced this pull request Jul 17, 2024

Skip int4 QAT tests for nightly for now

3a75936

int4 tinygemm quantization is currently broken in master and being fixed in #517. Let's skip these tests for now until that is fixed.

andrewor14 added a commit that referenced this pull request Jul 17, 2024

Skip int4 QAT tests for nightly for now (#521)

ec95afd

int4 tinygemm quantization is currently broken in master and being fixed in #517. Let's skip these tests for now until that is fixed.

andrewor14 requested review from jerryzh168 and msaroufim July 18, 2024 15:11

yanbing-j force-pushed the yanbing/fix_int4_woq branch 2 times, most recently from 49b47a2 to a11e455 Compare July 19, 2024 03:48

andrewor14 requested a review from HDCharles July 19, 2024 14:25

This was referenced Jul 25, 2024

Adapt to _convert_weight_to_int4pack new behavior #541

Closed

update the input weight of _convert_weight_to_int4pack to [n][k / 2] uint8 pytorch/pytorch#129940

Closed

manuelcandales reviewed Jul 25, 2024

yanbing-j mentioned this pull request Jul 26, 2024

Fix meta error in _convert_weight_to_int4pack pytorch/pytorch#130915

Closed

malfet reviewed Jul 26, 2024

yanbing-j added 4 commits July 27, 2024 19:56

Fix int4pack_mm error

283fa73

fix CI

cb4a5e4

Fix CI

b30667e

Fix CI

5f41c1e

yanbing-j force-pushed the yanbing/fix_int4_woq branch from ecd2a86 to 5f41c1e Compare July 28, 2024 02:57

Fix CI

bd3b79a

Fix CI

8aadb7d

yanbing-j force-pushed the yanbing/fix_int4_woq branch from f03a014 to 8aadb7d Compare July 29, 2024 06:37

msaroufim approved these changes Jul 29, 2024

msaroufim merged commit 8fa11a6 into pytorch:main Jul 29, 2024
13 checks passed

dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024

Skip int4 QAT tests for nightly for now (pytorch#521)

53562fe

int4 tinygemm quantization is currently broken in master and being fixed in pytorch#517. Let's skip these tests for now until that is fixed.

dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024

Fix int4pack_mm error (pytorch#517)

4563492

* Fix int4pack_mm error * fix CI * Fix CI * Fix CI * Fix CI * Fix CI

yanbing-j mentioned this pull request Nov 14, 2024

Add Int4CPULayout and update int4 woq #1278

Merged

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

Update iOS.md (pytorch#517)

da07eea

* Update iOS.md * Update iOS.md
