Fix WOQ int8 failures #884
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/884
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure. As of commit 26e9c44 with merge base 8236a87, the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
test/integration/test_integration.py (Outdated)
```
-if not TORCH_VERSION_AT_LEAST_2_5:
+if (
+    not TORCH_VERSION_AT_LEAST_2_5
+) or torch._inductor.config.freezing:
```
When is `freezing` set? How do users know that they need to call `unwrap_tensor_subclass` in that case?
Hi Jerry, `freezing` is set in ao/test/integration/test_integration.py (line 829 at 8236a87) via `@torch._inductor.config.patch({"freezing": True})`. Users can also enable it with `torch._inductor.config.freezing = True` in a script, or with the environment variable `TORCHINDUCTOR_FREEZING=1`.
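For reference, a minimal sketch of the three ways to enable freezing mentioned above (the function name is illustrative, not from the PR):

```python
# Sketch: ways to enable TorchInductor freezing (flag names from the thread).
import torch

# 1) Set the config flag directly in a script, before torch.compile runs.
torch._inductor.config.freezing = True

# 2) Patch it temporarily, e.g. as a test decorator (as in test_integration.py).
@torch._inductor.config.patch({"freezing": True})
def compiled_forward(model, x):
    return torch.compile(model)(x)

# 3) Or set the environment variable before launching the process:
#    TORCHINDUCTOR_FREEZING=1 python my_script.py
```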
> How do users know that they need to call `unwrap_tensor_subclass` in that case?

That's a good question. Did you mean we should document it somewhere? The root cause of why freezing needs `unwrap_tensor_subclass` is explained in pytorch/pytorch#123522.
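For context, a minimal sketch of the workaround being discussed, assuming the torchao `quantize_`/`int8_weight_only` API (the model and shapes are illustrative):

```python
# Sketch: int8 weight-only quantization with inductor freezing enabled.
# unwrap_tensor_subclass converts tensor-subclass weights into plain tensors
# plus parametrizations so that freezing can handle the quantized model.
import torch
from torchao.quantization import quantize_, int8_weight_only
from torchao.utils import unwrap_tensor_subclass

model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()
quantize_(model, int8_weight_only())

# Needed when torch._inductor.config.freezing is on (or on torch < 2.5).
unwrap_tensor_subclass(model)

torch._inductor.config.freezing = True
compiled = torch.compile(model)
out = compiled(torch.randn(1, 64))
```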
I see, thanks for the context! Yeah, I think it would be good to add this to the README: https://github.com/pytorch/ao/tree/main/torchao/quantization#workaround-with-unwrap_tensor_subclass-for-export-aoti-and-torchcompile-pytorch-24-and-before-only and link the issue as well.
Thanks. I've updated the README with the issue link in this PR; please take another look.
LGTM.
Please fix CI before landing.
Thanks @jerryzh168, we should land the PyTorch PR first.
Force-pushed from 5fa2288 to 828571c.
Hi @jerryzh168, since pytorch/pytorch#135928 has been merged into PyTorch, could you approve the CI run for this PR to check if it passes all tests?
It seems …
Creating another PR: pytorch/pytorch#136353 to fix the …
Force-pushed from 828571c to 7d7441a.
Force-pushed from 7d7441a to 053a97d.
Fix the correctness issue of pytorch/ao#884. The current implementation for converting between `Half/BFloat16` and `int8/uint8` incorrectly assumes that 1/4 of the int8/uint8 vector lane maps to 1/2 of the Half/BFloat16 vector lane. This assumption leads to accuracy issues after the full bit-width vectorization of the Half data type was introduced. When converting between int8 weights and the half data type, the generated code is as follows:

```
#include "/tmp/torchinductor_leslie/xw/cxww3s7wxrujoyxna7mlcjktid2uu6nntixqwm542xfkd756gl3x.h"
extern "C" void kernel(const int8_t* in_ptr0, half* out_ptr0)
{
    {
        for(int64_t x0=static_cast<int64_t>(0L); x0<static_cast<int64_t>(2048L); x0+=static_cast<int64_t>(32L))
        {
            auto tmp0 = at::vec::Vectorized<int8_t>::loadu(in_ptr0 + static_cast<int64_t>(x0), static_cast<int64_t>(32));
            auto tmp1 = at::vec::convert<half>(tmp0);
            tmp1.store(out_ptr0 + static_cast<int64_t>(x0), static_cast<int64_t>(32));
        }
    }
}
```

In this PR, we address the issue by changing the implementation to convert 1/2 of the int8/uint8 vector lane into a full vector lane of Half/BFloat16.

**Test Plan**

* AO: `python test/integration/test_integration.py -k test_int8_weight_only_quant_subclass_api`
* `python -u -m pytest -s -v test/inductor/test_cpu_repro.py -k test_convert_int8_to_half_vec`
* Due to the CPP backend legalization pass, we are unable to create a unit test to simulate the conversion from `Half` to `int8`. Instead, we rely on a C++ test case:
  * `./build/bin/vec_test_all_types_AVX512 --gtest_filter="VecConvertTestsReducedFloat/*.ConvertReduced"`
  * `./build/bin/vec_test_all_types_AVX2 --gtest_filter="VecConvertTestsReducedFloat/*.ConvertReduced"`

Pull Request resolved: pytorch/pytorch#136353
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
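The lane-mapping fix is easiest to see end to end from Python. A minimal repro sketch (the shape and the use of `torch.compile` here are illustrative assumptions, not the test from the PR):

```python
# Sketch: the int8 -> Half conversion that the fixed kernel vectorizes.
# Every int8 element must map to exactly one Half element, so half of an
# int8 vector fills a full Half vector (not a quarter, as the bug assumed).
import torch

def to_half(w_int8: torch.Tensor) -> torch.Tensor:
    return w_int8.to(torch.half)

compiled = torch.compile(to_half)
w = torch.randint(-128, 127, (2048,), dtype=torch.int8)
# With the fix, the compiled (vectorized) result matches eager exactly.
torch.testing.assert_close(compiled(w), to_half(w))
```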
Fix WOQ int8 failures as discussed in #843 (comment)