[TIR] Fix int32 vs int64 mismatch in For construct. #10595

lazycal · 2022-03-14T05:38:50Z

In the VectorizeLoop pass, when scalarizing a vectorized loop into serialized loop (for reasons like it's not vectorizable), the for-loop is not properly constructed to respect the loop variable's dtype. For example, the second loop in the following code

  for (ax0.ax1.fused: int32, 0, 169) "parallel" {
    for (ax2.inner: int64, 0i64, 4i64) "vectorized" {
      T_transpose[((cast(int64, ax0.ax1.fused)*4i64) + ax2.inner)] = @tir.if_then_else((3i64 <= ax2.inner), placeholder_1[(((ax2.inner*13i64) + cast(int64, ax0.ax1.fused)) - 39i64)], @tir.if_then_else((2i64 <= ax2.inner), placeholder_1[(((ax2.inner*13i64) + cast(int64, ax0.ax1.fused)) - 26i64)], @tir.if_then_else((1i64 <= ax2.inner), placeholder_1[(((ax2.inner*13i64) + cast(int64, ax0.ax1.fused)) - 13i64)], placeholder[((ax2.inner*13i64) + cast(int64, ax0.ax1.fused))], dtype=int64), dtype=int64), dtype=int64)
    }
  }

is scalarized into

  for (ax0.ax1.fused: int32, 0, 169) "parallel" {
    for (ax2.inner.s: int64, 0, 4) {
      T_transpose[((cast(int64, ax0.ax1.fused)*4i64) + ax2.inner.s)] = @tir.if_then_else((3i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 39i64)], @tir.if_then_else((2i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 26i64)], @tir.if_then_else((1i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 13i64)], placeholder[((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused))], dtype=int64), dtype=int64), dtype=int64)
    }
  }

where ax2.inner.s's start and extent are changed silently to int32. Later it caused SegFault. After this PR, the above code will handle the dtype correctly and output the following

  for (ax0.ax1.fused: int32, 0, 169) "parallel" {
    for (ax2.inner.s: int64, 0i64, 4i64) {
      T_transpose[((cast(int64, ax0.ax1.fused)*4i64) + ax2.inner.s)] = @tir.if_then_else((3i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 39i64)], @tir.if_then_else((2i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 26i64)], @tir.if_then_else((1i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 13i64)], placeholder[((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused))], dtype=int64), dtype=int64), dtype=int64)
    }
  }

An onnx model to trigger this issue: model.onnx.zip.

lazycal · 2022-03-14T05:44:39Z

cc @junrushao1994 @masahi

tkonolige · 2022-03-14T23:49:42Z

This seems related to #10571. I wonder if it introduced the error?

lazycal · 2022-03-15T02:03:27Z

~~@tkonolige In an old TVM version several weeks ago, this model still cannot be compiled, though the error message is different (some llvm generated error message).~~

lazycal · 2022-03-15T05:51:42Z

@tkonolige You were right. After reverting your PR, this error disappeared (a green check on unittest: GPU on the CI page now :-)). Indeed after reading the specific test and your PR more carefully, I realize that the test (

tvm/tests/python/unittest/test_target_codegen_vulkan.py

Line 112 in 0b61256

def test_vulkan_stress(target, dev):

) does not even use vectorize, and your PR does affect bind thread/block axis. So it seems more related to your change. Do you have any thought as how to fix this?

masahi · 2022-03-15T05:57:10Z

oh @lazycal can you send a revert PR? We can ask @tkonolige for a follow-up later.

tkonolige · 2022-03-15T17:51:01Z

@lazycal I don't think reverting #10571 is the correct way to go. I think that just uncovered more issues that we have. Specifically we sometime mismatch the dtypes between the var and the extent of a For loop. This PR fixes one such case so we should merge it.

To prevent issues like this in the future, we should add a dtype check to the For constructor.

lazycal · 2022-03-15T19:05:15Z

@masahi @tkonolige Just saw your comments. I agree reverting is not the right way to go, and there might be other issues uncovered by our combined change. I did that just for testing (I don't have VK on my end). I'll revert the revert commit and let you decide whether to merge this or not.

Good point about adding the dtype check to For constructor. I'll try to do that.

lazycal · 2022-04-03T23:55:21Z

The CI seems broken. I got error like urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)> everywhere, and it doesn't seem to be related to this PR. @masahi

masahi · 2022-04-04T00:37:23Z

ooh, I also keep getting the same error, but only from PyTorch tests involving large model download like https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-10876/6/pipeline. So I thought it is PyTorch problem (we also updated our PT install to v1.11 last week).

I'll talk to people about this tomorrow. In the mean time, since it is likely a flaky problem, you may try pushing new jobs.

lazycal · 2022-04-05T02:20:40Z

It's finally finished. This PR now aims to fix the dtype mismatch issue brought by constructing TIR's For statement. The fix is by limiting For constructor to only accept loop_var, min and extent of the same type (as pointed out by @tkonolige). This change may help uncover dtype inconsistency issues in other places, but at a cost of stronger "type-checking" (so might slightly make it harder to construct For) and disallowing any special use case requiring different dtypes among loop_var, min and extent, such as float32 as the extent.

Not sure about the impact of this restriction. If not desired, there might be other solutions to this problem, e.g.:

We may adapt the codegen to accept such for loop of different dtypes (at least currently LLVM codegen doesn't seem to be compatible).
We may also add one more pass before codegen that "legalizes" such For loop.

I am not familiar with various codegen pass, and not sure exactly where to insert this additional legalization pass, so I didn't go down those paths, but happy to discusss more.

P.S. I added one exception to the For constructor for int32 in min or extent when it is IntImm but int64 in the loop_var, in which it will automatically promotes the IntImm to int64. With this it would be more compatible to the old code base (and more friendly to TIR programmer) I think, and it also sounds safe to me.

cc: @masahi @tkonolige

masahi · 2022-04-05T06:08:14Z

Not sure about the impact of this restriction

I think this is not a big restriction, it's always better to be precise about data types. I'm ok with additional casting I may need to do.

* Respect dtype in Scalarize. * Add unittest. * Fix lint. * Promote dtype of IntImm to match loop_var in For. * Fix dtype mismatches. * Lint * Lint. * jostle ci * Match dtype in hybrid parser.

This PR fixes a bug uncovered in end-to-end tests, where the dtype of loop variable could has fewer bits than the loop min/extent, which leads to a fatal error introduced by the recent #10595 which enforces more restrictive checks.

* Respect dtype in Scalarize. * Add unittest. * Fix lint. * Promote dtype of IntImm to match loop_var in For. * Fix dtype mismatches. * Lint * Lint. * jostle ci * Match dtype in hybrid parser.

…he#11030) This PR fixes a bug uncovered in end-to-end tests, where the dtype of loop variable could has fewer bits than the loop min/extent, which leads to a fatal error introduced by the recent apache#10595 which enforces more restrictive checks.

lazycal marked this pull request as draft March 15, 2022 03:11

masahi mentioned this pull request Mar 15, 2022

Revert "[TE] Promote substituted variable to iter_var's dtype (#10571)" to fix CI breakage #10610

Closed

masahi mentioned this pull request Mar 25, 2022

[Flaky Test] tests/python/unittest/test_target_codegen_vulkan.py::test_vulkan_stress #10779

Closed

lazycal added 6 commits April 3, 2022 01:14

Respect dtype in Scalarize.

d0ffb66

Add unittest.

a7d9687

Fix lint.

7ac346d

Promote dtype of IntImm to match loop_var in For.

c90026e

Fix dtype mismatches.

081f27a

Lint

1a31d96

lazycal force-pushed the fix-vectorizeloop branch from 7b8c703 to 1a31d96 Compare April 3, 2022 19:51

lazycal changed the title ~~[TIR] Fix Segfault caused by int32~64 mismatch in Scalarize of VectorizeLoop~~ [TIR] Fix int32 vs int64 mismatch in For construct. Apr 3, 2022

Lint.

b769511

lazycal added 2 commits April 4, 2022 01:48

jostle ci

eca3966

Match dtype in hybrid parser.

1cfd9c9

lazycal marked this pull request as ready for review April 5, 2022 01:49

masahi approved these changes Apr 5, 2022

View reviewed changes

masahi merged commit 41cfd3d into apache:main Apr 5, 2022

junrushao mentioned this pull request Apr 17, 2022

[BugFix][TIR] Fix narrower dtype of loop vars in CreatePrimFunc #11030

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TIR] Fix int32 vs int64 mismatch in For construct. #10595

[TIR] Fix int32 vs int64 mismatch in For construct. #10595

lazycal commented Mar 14, 2022

lazycal commented Mar 14, 2022

tkonolige commented Mar 14, 2022

lazycal commented Mar 15, 2022 •

edited

Loading

lazycal commented Mar 15, 2022 •

edited

Loading

masahi commented Mar 15, 2022

tkonolige commented Mar 15, 2022

lazycal commented Mar 15, 2022

lazycal commented Apr 3, 2022

masahi commented Apr 4, 2022

lazycal commented Apr 5, 2022

masahi commented Apr 5, 2022

[TIR] Fix int32 vs int64 mismatch in For construct. #10595

[TIR] Fix int32 vs int64 mismatch in For construct. #10595

Conversation

lazycal commented Mar 14, 2022

lazycal commented Mar 14, 2022

tkonolige commented Mar 14, 2022

lazycal commented Mar 15, 2022 • edited Loading

lazycal commented Mar 15, 2022 • edited Loading

masahi commented Mar 15, 2022

tkonolige commented Mar 15, 2022

lazycal commented Mar 15, 2022

lazycal commented Apr 3, 2022

masahi commented Apr 4, 2022

lazycal commented Apr 5, 2022

masahi commented Apr 5, 2022

lazycal commented Mar 15, 2022 •

edited

Loading

lazycal commented Mar 15, 2022 •

edited

Loading