Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken loop partitioning due to recent changes. #4243

Merged
merged 1 commit into from
Nov 15, 2019

Conversation

kimishpatel
Copy link
Contributor

factors and resulting nested loop is broken.
This is due to the fact that we are creating zero extent loops which
are fixed afterwards. However unroll pass breaks due to the zero extent
loop.

Specifically this is what happens:
Recent changes in PR, #3734 , broke some of the loop partitioning logic. Particularly take for example:

  def test_multilevel_splitting_with_indivisble_factors():
      import topi
      A = tvm.placeholder((130,), dtype="float32")
      B = topi.nn.relu(A)
      s = tvm.create_schedule(B.op)
      (y,) = s[B].op.axis
      (yo, yi) = s[B].split(y, factor=8)
      (yoo, yoi) = s[B].split(yo, factor=16)
      s[B].reorder(yoo, yoi, yi)
      #s[B].reorder(yo, yi)
      s[B].unroll(yi)
      with tvm.build_config(partition_const_loop=True):
          lowered_body = tvm.lower(s, [A, B]).body
          print("-----------lowerd body----------")
          print(lowered_body)
          print("-----------lowerd body----------")
          def visit_stmt(op):
              return(isinstance(op, tvm.expr.Max))
          num_max = collect_visit(lowered_body, visit_stmt)
          assert num_max.count(True) == 10

This fails mainly due to removal of the conditions such as:
https://github.com/dmlc/tvm/pull/3734/files#diff-4d2032b94673aac6728b304262c5b4a6L516
The issue is that, now we can potentially create zero extent post and pre loop bodies. If we have nested loops as a result of loop partitioning then, the zero extent loops can have nested loop which are further partitioned. In this particular example the nested loop is generated with outer loop's extent of 0.
This results in inner loop: unrolled (i0.inner, 0, min(((0 - (i0.outer.inner*8)) - 6), 8)). With this unroll pass breaks with the following error:
TVMError: Check failed: value >= 0 (-1 vs. 0) : Cannot unroll non-constant loop

However if we factor the loop only once things work fine. So the following example works.

  def test_multilevel_splitting_with_indivisble_factors():
      import topi
      A = tvm.placeholder((130,), dtype="float32")
      B = topi.nn.relu(A)
      s = tvm.create_schedule(B.op)
      (y,) = s[B].op.axis
      (yo, yi) = s[B].split(y, factor=8)
      #(yoo, yoi) = s[B].split(yo, factor=16)
      #s[B].reorder(yoo, yoi, yi)
      s[B].reorder(yo, yi)
      s[B].unroll(yi)
      with tvm.build_config(partition_const_loop=True):
          lowered_body = tvm.lower(s, [A, B]).body
          print("-----------lowerd body----------")
          print(lowered_body)
          print("-----------lowerd body----------")
          def visit_stmt(op):
              return(isinstance(op, tvm.expr.Max))
          num_max = collect_visit(lowered_body, visit_stmt)
          assert num_max.count(True) == 10

I suggest we revert changes such that we no longer generate post and pre loops of zero extent.
This PR does that.

@kimishpatel
Copy link
Contributor Author

CC: @umangyadav @ZihengJiang @tqchen.

@umangyadav
Copy link
Contributor

umangyadav commented Nov 1, 2019

@kimishpatel I understand the point and it is completely valid.

I suggest an alternate way, that is how about adding a call to RemoveNoOp pass in the VisitAndMutate here? That is before recursing, we clean up the zero loops?
Or other way maybe we can do both, keeping the guard conditions and having RemoveNoOp?

https://github.com/dmlc/tvm/blob/a3ca1a4da071f3aadffeafa6d22b43d5addde191/src/pass/loop_partition.cc#L315

If we use CanProve(body_begin == min) and if it fails, then we failed to "prove" the equality and we have not eliminated the possibility of zero extent loops. So, there is still the possibility of recursing on zero extent loops.

@@ -513,17 +513,19 @@ Stmt LoopPartitioner::TryPartition(const Node* node,
bool pre_stmt_recurse = true;
if (middle_interval_i->HasLowerBound()) {
body_begin = ir::Simplify(middle_interval.min());
Expr cond = (body_begin - min >= 0);
Copy link
Contributor

@umangyadav umangyadav Nov 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimishpatel
we can also think about changing this condition to body_begin - min > 0 instead of using >= that can also prevent recursing on zero extent loops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@umangyadav, we can do this one, however we still generate zero extent loop. Just that we dont recursively partition anything generated for zero extent loop.
I still prefer to generate cleaner output that has no such zero extent loops that need fixing afterwards, unless somehow we fix it in the loop partition itself. This can be done as you suggested via RemoveNoOp however it seems little convoluted way to go about this. I would rather have loop partition generate clean IR with valid loop iterations.
Is there a reason why you would prefer to generate zero extent loop?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimishpatel There is no specific reason or preference for generating the zero extent loops.

My point is that there is always possibility of having the zero extent loops.

We need to make sure not to recurse on those loops by having some mechanism e.g either guard conditions, RemoveNoOp etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@umangyadav, I understand. Do you have specific example where we can still generate zero extent loops despite this PR? Then it will be easier to see what would be a better fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@umangyadav I agree we might need a cleanup pass. But if we can avoid generating zero extent loops, it's desirable.

@umangyadav
Copy link
Contributor

umangyadav commented Nov 1, 2019

Another thing we can add is a call to Simplify with range info in the VisitAndMutate to before RemoveNoOp to make the job of later one easier.
e.g

Stmt VisitAndMutate(Stmt& stmt) {
 stmt = Simplify(stmt);
 stmt = RemoveNoOp(stmt);
 selector.visit(stmt);
return Mutate(stmt);
}

@tqchen tqchen assigned ZihengJiang and unassigned ZihengJiang Nov 1, 2019
@kimishpatel
Copy link
Contributor Author

@ZihengJiang ping.

@ZihengJiang
Copy link
Contributor

Hi @kimishpatel , the #3734 is for avoiding generating the tail loop with extent one which is mentioned in #3733. Do you think whether we have way to solve the zero-extent loop problem while keeping avoiding the extent one tail loop issue? Also, it would be great if you can post the printing result of your example.

@kimishpatel
Copy link
Contributor Author

@ZihengJiang, this PR keeps the fixes for tails loop from #3734. The changes here https://github.com/apache/incubator-tvm/pull/3734/files#diff-4d2032b94673aac6728b304262c5b4a6R544, for post_doubt_begin is still preserved. The only thing changes is about generating zero extent loop or not.
I will post the output of the script I mentioned. Without the fix of this PR it fails in the unroll_loop pass where the unroll extent is determined to be -1. I will post the output with the fix.

@kimishpatel
Copy link
Contributor Author

@ZihengJiang
Output without fix:

F1107 22:50:11.396204 3352620 unroll_loop.cc:85] Check failed: value >= 0 (-1 vs. 0) Cannot unroll non-constant loop
*** Check failure stack trace: ***

Output with this PR:

produce compute {
  for (i0.outer.inner, 0, 16) {
    compute[(i0.outer.inner*8)] = max(placeholder[(i0.outer.inner*8)], 0f)
    compute[((i0.outer.inner*8) + 1)] = max(placeholder[((i0.outer.inner*8) + 1)], 0f)
    compute[((i0.outer.inner*8) + 2)] = max(placeholder[((i0.outer.inner*8) + 2)], 0f)
    compute[((i0.outer.inner*8) + 3)] = max(placeholder[((i0.outer.inner*8) + 3)], 0f)
    compute[((i0.outer.inner*8) + 4)] = max(placeholder[((i0.outer.inner*8) + 4)], 0f)
    compute[((i0.outer.inner*8) + 5)] = max(placeholder[((i0.outer.inner*8) + 5)], 0f)
    compute[((i0.outer.inner*8) + 6)] = max(placeholder[((i0.outer.inner*8) + 6)], 0f)
    compute[((i0.outer.inner*8) + 7)] = max(placeholder[((i0.outer.inner*8) + 7)], 0f)
  }
  compute[128] = max(placeholder[128], 0f)
  compute[129] = max(placeholder[129], 0f)
}

@ZihengJiang
Copy link
Contributor

@kimishpatel Could you update the PR to re-trigger the CI?

factors and resulting nested loop is broken.
This is due to the fact that we are creating zero extent loops which
are fixed afterwards. However unroll pass breaks due to the zero extent
loop.
@ZihengJiang
Copy link
Contributor

Approved. Thanks for contributing! @kimishpatel

@ZihengJiang ZihengJiang merged commit d58b733 into apache:master Nov 15, 2019
zxy844288792 pushed a commit to zxy844288792/tvm that referenced this pull request Nov 26, 2019
…pache#4243)

factors and resulting nested loop is broken.
This is due to the fact that we are creating zero extent loops which
are fixed afterwards. However unroll pass breaks due to the zero extent
loop.
yongwww pushed a commit to neo-ai/tvm that referenced this pull request Nov 26, 2019
…pache#4243)

factors and resulting nested loop is broken.
This is due to the fact that we are creating zero extent loops which
are fixed afterwards. However unroll pass breaks due to the zero extent
loop.
@apivovarov
Copy link
Contributor

apivovarov commented Dec 4, 2019

@kimishpatel Conv2d_transpose kernel 2x2, strides (2,2) fails for CUDA - Cannot prove
https://discuss.tvm.ai/t/conv2d-transpose-kernel-2x2-strides-2-2-fails-for-cuda-cannot-prove/5020/1

@kimishpatel
Copy link
Contributor Author

Thanks @apivovarov for bringing this to attention. Will take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants