Fix broken loop partitioning due to recent changes. #4243
@kimishpatel I understand the point, and it is completely valid. I suggest an alternative: how about adding a call to … If we use …
@@ -513,17 +513,19 @@ Stmt LoopPartitioner::TryPartition(const Node* node,
bool pre_stmt_recurse = true;
if (middle_interval_i->HasLowerBound()) {
  body_begin = ir::Simplify(middle_interval.min());
  Expr cond = (body_begin - min >= 0);
@kimishpatel
We could also consider changing this condition to body_begin - min > 0 instead of using >=. That would also prevent recursing on zero-extent loops.
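A minimal sketch of the difference between the two conditions, using plain integers instead of TVM's symbolic Expr. PreLoop, RecurseWithGe and RecurseWithGt are hypothetical names, not TVM code. When body_begin equals min, the pre-loop [min, body_begin) is empty: the >= condition still accepts it, while > rejects it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the pre-loop [min, body_begin) produced by loop
// partitioning. Not TVM code: only compares the two candidate guards.
struct PreLoop {
  int64_t min;
  int64_t body_begin;
  int64_t extent() const { return body_begin - min; }
};

// Condition in the diff: body_begin - min >= 0 (accepts an empty pre-loop).
bool RecurseWithGe(const PreLoop& p) { return p.body_begin - p.min >= 0; }

// Suggested condition: body_begin - min > 0 (rejects an empty pre-loop).
bool RecurseWithGt(const PreLoop& p) { return p.body_begin - p.min > 0; }
```

As kimishpatel points out in the thread, the stricter condition only stops the recursion; on its own it does not stop the empty loop from being emitted.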
@umangyadav, we can do that; however, we would still generate the zero-extent loop. We just would not recursively partition anything generated for it.
I still prefer to generate clean output with no zero-extent loops that need fixing afterwards, unless we fix them within the loop partition pass itself. That could be done with RemoveNoOp, as you suggested, but it seems a somewhat convoluted way to go about this. I would rather have loop partitioning generate clean IR with valid loop iterations.
Is there a reason why you would prefer to generate zero-extent loops?
@kimishpatel There is no specific reason or preference for generating zero-extent loops.
My point is that zero-extent loops can always occur, so we need some mechanism to make sure we do not recurse on them, e.g. guard conditions, RemoveNoOp, etc.
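A cleanup pass along the lines of RemoveNoOp would drop such loops after partitioning. The sketch below is a hypothetical stand-in, not TVM's RemoveNoOp: it uses a toy loop tree with constant extents (Loop, RemoveEmptyLoops and CountLoops are made-up names) and removes every loop whose extent is not positive, together with everything nested inside it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified loop IR in which every extent is a constant.
// Real TVM loops carry symbolic extents; this only shows the shape of a
// cleanup pass that drops empty loops before later passes (e.g. unroll) run.
struct Loop {
  std::string var;
  int64_t extent;
  std::vector<Loop> body;
};

// Returns a copy of `loops` without any loop whose extent is <= 0.
// A dropped loop never executes, so its whole nested subtree goes with it.
std::vector<Loop> RemoveEmptyLoops(const std::vector<Loop>& loops) {
  std::vector<Loop> kept;
  for (const Loop& l : loops) {
    if (l.extent <= 0) continue;
    kept.push_back(Loop{l.var, l.extent, RemoveEmptyLoops(l.body)});
  }
  return kept;
}

// Counts all loops in the tree, including nested ones.
int CountLoops(const std::vector<Loop>& loops) {
  int n = 0;
  for (const Loop& l : loops) n += 1 + CountLoops(l.body);
  return n;
}
```

The drawback raised in the thread still applies: the empty loops are generated first and only removed afterwards.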
@umangyadav, I understand. Do you have a specific example where we can still generate zero-extent loops despite this PR? That would make it easier to see what the better fix is.
@umangyadav I agree we might need a cleanup pass, but if we can avoid generating zero-extent loops in the first place, that is preferable.
Another thing we can add is a call to …
@ZihengJiang ping.
Hi @kimishpatel, #3734 is for avoiding generating the tail loop with extent one, which is mentioned in #3733. Is there a way to solve the zero-extent loop problem while still avoiding the extent-one tail loop? Also, it would be great if you could post the printed output of your example.
@ZihengJiang, this PR keeps the tail-loop fixes from #3734. The changes here, https://github.com/apache/incubator-tvm/pull/3734/files#diff-4d2032b94673aac6728b304262c5b4a6R544, for …
@ZihengJiang, output with this PR:
@kimishpatel Could you update the PR to re-trigger the CI?
Approved. Thanks for contributing! @kimishpatel
@kimishpatel Conv2d_transpose with a 2x2 kernel and strides (2,2) fails for CUDA: Cannot prove …
Thanks @apivovarov for bringing this to my attention. Will take a look.
factors, and the resulting nested loop is broken.
This is because we create zero-extent loops that are only fixed afterwards. However, the unroll pass breaks on the zero-extent loop.
Specifically, this is what happens:
Recent changes in PR #3734 broke some of the loop partitioning logic. Take, for example:
This fails mainly because of the removal of conditions such as:
https://github.com/dmlc/tvm/pull/3734/files#diff-4d2032b94673aac6728b304262c5b4a6L516
The issue is that we can now create zero-extent pre- and post-loop bodies. If loop partitioning produces nested loops, a zero-extent loop can contain a nested loop that is itself partitioned further. In this particular example, the nested loop is generated with an outer loop extent of 0.
This results in the inner loop:
unrolled (i0.inner, 0, min(((0 - (i0.outer.inner*8)) - 6), 8))
With this, the unroll pass breaks with the following error:
TVMError: Check failed: value >= 0 (-1 vs. 0) : Cannot unroll non-constant loop
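One reading of the error, assuming the unroll pass of that era maps any extent that does not simplify to a constant onto the sentinel -1 before the check: the -1 is not a computed extent, it signals that the inner extent still depends on i0.outer.inner. The sketch below models that with std::optional; InnerExtent, GetExtentForUnroll and CheckUnrollable are hypothetical names, not TVM functions. It also shows that the inner extent is negative for every i0.outer.inner >= 0, so this loop should never have been generated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Extent of the inner loop from the report, as a function of the outer
// variable: min(((0 - (i0.outer.inner*8)) - 6), 8). Always <= -6 here.
int64_t InnerExtent(int64_t i0_outer_inner) {
  return std::min<int64_t>((0 - i0_outer_inner * 8) - 6, 8);
}

// Hypothetical model of the unroller's extent lookup: an extent that folded
// to a constant is returned as-is; a symbolic one (nullopt) becomes -1.
int64_t GetExtentForUnroll(std::optional<int64_t> folded_extent) {
  return folded_extent.value_or(-1);
}

// Models "Check failed: value >= 0 : Cannot unroll non-constant loop".
// Any negative value, sentinel or constant, fails the same check.
void CheckUnrollable(std::optional<int64_t> folded_extent) {
  if (GetExtentForUnroll(folded_extent) < 0) {
    throw std::runtime_error("Cannot unroll non-constant loop");
  }
}
```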
However, if we factor the loop only once, things work fine. So the following example works:
I suggest we revert the changes so that we no longer generate pre- and post-loops of zero extent.
This PR does that.
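A minimal sketch of that idea on constant bounds only: split [min, max) into pre, body and post pieces and emit a piece only when its extent is positive, so no zero-extent loop (and nothing nested in one) is produced. This is not the PR's code; Range and Partition are hypothetical names, and the real LoopPartitioner works on symbolic expressions, so it would have to prove these conditions rather than compare integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Half-open range [begin, end) of loop iterations.
struct Range {
  std::string name;
  int64_t begin;
  int64_t end;
  int64_t extent() const { return end - begin; }
};

// Hypothetical model: partition the loop [min, max) into a pre-loop, a
// guard-free body [body_begin, post_begin), and a post-loop. Each piece is
// emitted only if its extent is strictly positive.
std::vector<Range> Partition(int64_t min, int64_t max,
                             int64_t body_begin, int64_t post_begin) {
  assert(min <= max);
  // Clamp the body interval into the loop bounds so extents are well formed.
  body_begin = std::clamp(body_begin, min, max);
  post_begin = std::clamp(post_begin, body_begin, max);
  std::vector<Range> out;
  if (body_begin - min > 0) out.push_back({"pre", min, body_begin});
  if (post_begin - body_begin > 0) out.push_back({"body", body_begin, post_begin});
  if (max - post_begin > 0) out.push_back({"post", post_begin, max});
  return out;
}
```

For example, Partition(0, 10, 0, 10) yields only the body, because both the pre- and post-loop would be empty.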