-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Int4CPULayout and update int4 woq #1278
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1278
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes. |
1b26f26
to
104d1f3
Compare
we are doing a refactor for file structure btw: #1234 might be good to rebase after that is landed |
|
||
__torch_function__ = torch._C._disabled_torch_function_impl | ||
|
||
def get_plain(self) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have an unpack op for tensor core tiled layout now, so this can actually be replaced with a call to the op:
ao/torchao/csrc/cuda/tensor_core_tiled_layout/tensor_core_tiled_layout.cu
Lines 311 to 312 in 39f16f4
m.impl("torchao::unpack_tensor_core_tiled_layout", &_unpack_tensor_core_tiled_layout); | |
m.impl("torchao::dequantize_tensor_core_tiled_layout", &_dequantize_tensor_core_tiled_layout); |
do you plan to write similar ops for cpu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have noticed this, but I have no bandwidth to do so these days. If you are not urgent for this feature, I can take this task.
cc @mingfeima
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be great, thanks @yanbing-j , this is not urgent
torchao/quantization/utils.py
Outdated
# if int_data_device_type == "mps": | ||
# int_data = int_data.cpu() | ||
if int_data_device_type != "cpu": | ||
int_data = (int_data[::, ::2] << 4 | int_data[::, 1::2]).to(torch.uint8) | ||
# if int_data_device_type == "mps": | ||
# int_data = int_data.to(device="mps") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove the code that's commented out
is this equivalent to previous code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to #517 (comment), <<
can be used in MPS backend, don't need to convert to CPU
and use CPU
backend. Since I don't have mps machine, I want to use CI to check if this can work. Otherwise, I can update to int_data = (torch.bitwise_left_shift(int_data[::, ::2], 4) | int_data[::, 1::2]).to(torch.uint8)
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I see, makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be a separate PR, but can you also help add support for conversion between int4 tensor core tiled layout and int4 cpu layout, we may need a separate util for this, like we discussed in the issue: #1117 (comment)
right now we error out when converting between different devices
ao/torchao/dtypes/affine_quantized_tensor.py
Lines 1486 to 1489 in 39f16f4
if not is_device(torch.device(self.device).type, device): | |
raise ValueError( | |
f"TensorCoreTiledAQTTensorImpl does not support conversion from {self.device} to {device}" | |
) |
Test can be added in
ao/test/dtypes/test_affine_quantized.py
Line 44 in 39f16f4
class TestAffineQuantized(TestCase): |
512eb75
to
98b8f8c
Compare
I think you should also unpin pytorch version to get the latest op changes: #1283 |
98b8f8c
to
a870ed0
Compare
Hi @jerryzh168 , I have updated to fix CI and involve PyTorch nightly in. Could you please take a look? I tested 2.3.0, 2.4.1, 2.5.1 and 2.6 in CPU in my local. |
@yanbing-j we just landed a large refactor PR, can you rebase? |
d3ecc01
to
b1c40ad
Compare
@jerryzh168 I have rebased, could you please take a look? |
Thanks for looking into this @yanbing-j Eagerly awaiting to pick it up in pytorch/torchchat#1367 |
@yanbing-j there is some failures: https://github.com/pytorch/ao/actions/runs/11886684924/job/33154937493?pr=1278 can you fix them? |
@jerryzh168 Please review again. |
5204a85
to
1f6d59d
Compare
@jerryzh168 Please review again. |
Hi @jerryzh168 , 2 failures in CUDA nightly cannot be reproduced in A100 with |
@@ -70,6 +70,12 @@ jobs: | |||
torch-spec: 'torch==2.5.1 --index-url https://download.pytorch.org/whl/cu121' | |||
gpu-arch-type: "cuda" | |||
gpu-arch-version: "12.1" | |||
- name: CUDA Nightly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these tests added? can you rebase on main? I think we have some recent changes to the CI jobs:
ao/.github/workflows/regression_test.yml
Line 21 in 72fb597
test-nightly: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed https://github.com/pytorch/ao/pull/1283/files#diff-87efadde54d371bf3d7330fff24599a85d32c4982530dd4c7d7d7855e76489bbL28-L33 to update nightly. I have removed CUDA nightly and CPU nightly already.
also do you know the issue with xpu job errors in current main: https://github.com/pytorch/ao/actions/runs/11942397686/job/33289365532 |
40cabd1
to
25b9460
Compare
@jerryzh168 I saw these XPU failures are related to Windows. Please involve @EikanWang inside. |
the error still seems valid: https://github.com/pytorch/ao/actions/runs/11945348130/job/33299816407?pr=1278 |
It cannot be reproduced in A100 with |
@jerryzh168 Sorry, I still cannot reproduce in A100. Could you please help make a try? Thanks!
torch 2.6.0.dev20241120+cu124 |
there is some issue with pytorch nightly version I think, I saw: Downloading https://download.pytorch.org/whl/nightly/cu121/torch-2.6.0.dev20241112%2Bcu121-cp39-cp39-linux_x86_64.whl (767.9 MB) when I'm installing locally, I also installed: looks like the latest cu121 is: 1112+cu121 in https://download.pytorch.org/whl/nightly/torch/ right now |
Oh, you are right. For cu121, the latest is 1112 nightly, which does not include pytorch/pytorch#139611 (20241112 merged into PyTorch). And for cu124, the latest is 1121, that's why I cannot reproduce. So, can this PR be merged since this is a platform related issue, and can be regarded as a known issue before CI upgrades to cu124? @jerryzh168 |
let's upgrade CI to use 12.4 first, I heard 12.1 is deprecated in newer pytorch versions: pytorch/pytorch#138609 |
* Update nightly job to use 12.4 since 12.1 is deprecated #1278 (comment) * skip failed tests
* Update nightly job to use 12.4 since 12.1 is deprecated pytorch#1278 (comment) * skip failed tests
25b9460
to
67dcc43
Compare
…ed values (pytorch#1359) * Update cli.py to make --device/--dtype pre-empt quantize dict-specified values Users may expect that cli parameters override the JSON, as per pytorch#1278. Invert logic - case split: 1 - if none (no value) is specified, use value specified in quantize dict, if present; else 2 - if value is specified, override the respective handler if present. * Fix typo in cli.py fix typo --------- Co-authored-by: Jack-Khuu <jack.khuu.7@gmail.com>
@@ -383,3 +393,251 @@ def get_plain(self) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: | |||
|
|||
def get_layout(self) -> Layout: | |||
return self._layout | |||
|
|||
|
|||
@dataclass(frozen=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh sorry missed this one, it should have a separate file since it's a different layout, cc @yanbing-j can you help move this to a separate file under the same directly? (int4_cpu_layout.py)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jerryzh168 Okay, here is the PR #1419.
pytorch/pytorch#139611 is merged into PyTorch main branch.