Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TIR] Fix int32 vs int64 mismatch in For construct. #10595

Merged
merged 9 commits into from
Apr 5, 2022

Conversation

lazycal
Copy link
Contributor

@lazycal lazycal commented Mar 14, 2022

In the VectorizeLoop pass, when scalarizing a vectorized loop into serialized loop (for reasons like it's not vectorizable), the for-loop is not properly constructed to respect the loop variable's dtype. For example, the second loop in the following code

  for (ax0.ax1.fused: int32, 0, 169) "parallel" {
    for (ax2.inner: int64, 0i64, 4i64) "vectorized" {
      T_transpose[((cast(int64, ax0.ax1.fused)*4i64) + ax2.inner)] = @tir.if_then_else((3i64 <= ax2.inner), placeholder_1[(((ax2.inner*13i64) + cast(int64, ax0.ax1.fused)) - 39i64)], @tir.if_then_else((2i64 <= ax2.inner), placeholder_1[(((ax2.inner*13i64) + cast(int64, ax0.ax1.fused)) - 26i64)], @tir.if_then_else((1i64 <= ax2.inner), placeholder_1[(((ax2.inner*13i64) + cast(int64, ax0.ax1.fused)) - 13i64)], placeholder[((ax2.inner*13i64) + cast(int64, ax0.ax1.fused))], dtype=int64), dtype=int64), dtype=int64)
    }
  }

is scalarized into

  for (ax0.ax1.fused: int32, 0, 169) "parallel" {
    for (ax2.inner.s: int64, 0, 4) {
      T_transpose[((cast(int64, ax0.ax1.fused)*4i64) + ax2.inner.s)] = @tir.if_then_else((3i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 39i64)], @tir.if_then_else((2i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 26i64)], @tir.if_then_else((1i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 13i64)], placeholder[((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused))], dtype=int64), dtype=int64), dtype=int64)
    }
  }

where ax2.inner.s's start and extent are changed silently to int32. Later it caused SegFault. After this PR, the above code will handle the dtype correctly and output the following

  for (ax0.ax1.fused: int32, 0, 169) "parallel" {
    for (ax2.inner.s: int64, 0i64, 4i64) {
      T_transpose[((cast(int64, ax0.ax1.fused)*4i64) + ax2.inner.s)] = @tir.if_then_else((3i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 39i64)], @tir.if_then_else((2i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 26i64)], @tir.if_then_else((1i64 <= ax2.inner.s), placeholder_1[(((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused)) - 13i64)], placeholder[((ax2.inner.s*13i64) + cast(int64, ax0.ax1.fused))], dtype=int64), dtype=int64), dtype=int64)
    }
  }

An onnx model to trigger this issue: model.onnx.zip.

@lazycal
Copy link
Contributor Author

lazycal commented Mar 14, 2022

cc @junrushao1994 @masahi

@tkonolige
Copy link
Contributor

This seems related to #10571. I wonder if it introduced the error?

@lazycal
Copy link
Contributor Author

lazycal commented Mar 15, 2022

@tkonolige In an old TVM version several weeks ago, this model still cannot be compiled, though the error message is different (some llvm generated error message).

@lazycal lazycal marked this pull request as draft March 15, 2022 03:11
@lazycal
Copy link
Contributor Author

lazycal commented Mar 15, 2022

@tkonolige You were right. After reverting your PR, this error disappeared (a green check on unittest: GPU on the CI page now :-)). Indeed after reading the specific test and your PR more carefully, I realize that the test (

def test_vulkan_stress(target, dev):
) does not even use vectorize, and your PR does affect bind thread/block axis. So it seems more related to your change. Do you have any thought as how to fix this?

@masahi
Copy link
Member

masahi commented Mar 15, 2022

oh @lazycal can you send a revert PR? We can ask @tkonolige for a follow-up later.

@tkonolige
Copy link
Contributor

@lazycal I don't think reverting #10571 is the correct way to go. I think that just uncovered more issues that we have. Specifically we sometime mismatch the dtypes between the var and the extent of a For loop. This PR fixes one such case so we should merge it.

To prevent issues like this in the future, we should add a dtype check to the For constructor.

@lazycal
Copy link
Contributor Author

lazycal commented Mar 15, 2022

@masahi @tkonolige Just saw your comments. I agree reverting is not the right way to go, and there might be other issues uncovered by our combined change. I did that just for testing (I don't have VK on my end). I'll revert the revert commit and let you decide whether to merge this or not.

Good point about adding the dtype check to For constructor. I'll try to do that.

@lazycal lazycal changed the title [TIR] Fix Segfault caused by int32~64 mismatch in Scalarize of VectorizeLoop [TIR] Fix int32 vs int64 mismatch in For construct. Apr 3, 2022
@lazycal
Copy link
Contributor Author

lazycal commented Apr 3, 2022

The CI seems broken. I got error like urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)> everywhere, and it doesn't seem to be related to this PR. @masahi

@masahi
Copy link
Member

masahi commented Apr 4, 2022

ooh, I also keep getting the same error, but only from PyTorch tests involving large model download like https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-10876/6/pipeline. So I thought it is PyTorch problem (we also updated our PT install to v1.11 last week).

I'll talk to people about this tomorrow. In the mean time, since it is likely a flaky problem, you may try pushing new jobs.

@lazycal lazycal marked this pull request as ready for review April 5, 2022 01:49
@lazycal
Copy link
Contributor Author

lazycal commented Apr 5, 2022

It's finally finished. This PR now aims to fix the dtype mismatch issue brought by constructing TIR's For statement. The fix is by limiting For constructor to only accept loop_var, min and extent of the same type (as pointed out by @tkonolige). This change may help uncover dtype inconsistency issues in other places, but at a cost of stronger "type-checking" (so might slightly make it harder to construct For) and disallowing any special use case requiring different dtypes among loop_var, min and extent, such as float32 as the extent.

Not sure about the impact of this restriction. If not desired, there might be other solutions to this problem, e.g.:

  • We may adapt the codegen to accept such for loop of different dtypes (at least currently LLVM codegen doesn't seem to be compatible).
  • We may also add one more pass before codegen that "legalizes" such For loop.

I am not familiar with various codegen pass, and not sure exactly where to insert this additional legalization pass, so I didn't go down those paths, but happy to discusss more.

P.S. I added one exception to the For constructor for int32 in min or extent when it is IntImm but int64 in the loop_var, in which it will automatically promotes the IntImm to int64. With this it would be more compatible to the old code base (and more friendly to TIR programmer) I think, and it also sounds safe to me.

cc: @masahi @tkonolige

@masahi
Copy link
Member

masahi commented Apr 5, 2022

Not sure about the impact of this restriction

I think this is not a big restriction, it's always better to be precise about data types. I'm ok with additional casting I may need to do.

@masahi masahi merged commit 41cfd3d into apache:main Apr 5, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
* Respect dtype in Scalarize.

* Add unittest.

* Fix lint.

* Promote dtype of IntImm to match loop_var in For.

* Fix dtype mismatches.

* Lint

* Lint.

* jostle ci

* Match dtype in hybrid parser.
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Apr 11, 2022
* Respect dtype in Scalarize.

* Add unittest.

* Fix lint.

* Promote dtype of IntImm to match loop_var in For.

* Fix dtype mismatches.

* Lint

* Lint.

* jostle ci

* Match dtype in hybrid parser.
junrushao added a commit that referenced this pull request Apr 17, 2022
This PR fixes a bug uncovered in end-to-end tests, where the dtype of loop variable could has fewer bits than the loop min/extent, which leads to a fatal error introduced by the recent #10595 which enforces more restrictive checks.
Lucien0 pushed a commit to Lucien0/tvm that referenced this pull request Apr 19, 2022
* Respect dtype in Scalarize.

* Add unittest.

* Fix lint.

* Promote dtype of IntImm to match loop_var in For.

* Fix dtype mismatches.

* Lint

* Lint.

* jostle ci

* Match dtype in hybrid parser.
Lucien0 pushed a commit to Lucien0/tvm that referenced this pull request Apr 19, 2022
…he#11030)

This PR fixes a bug uncovered in end-to-end tests, where the dtype of loop variable could has fewer bits than the loop min/extent, which leads to a fatal error introduced by the recent apache#10595 which enforces more restrictive checks.
altanh pushed a commit to altanh/tvm that referenced this pull request Apr 28, 2022
…he#11030)

This PR fixes a bug uncovered in end-to-end tests, where the dtype of loop variable could has fewer bits than the loop min/extent, which leads to a fatal error introduced by the recent apache#10595 which enforces more restrictive checks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants