Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Arith] DeduceBound produces relaxed bounds because of rounding up #3974

Closed
umangyadav opened this issue Sep 19, 2019 · 15 comments
Closed

[Arith] DeduceBound produces relaxed bounds because of rounding up #3974

umangyadav opened this issue Sep 19, 2019 · 15 comments
Assignees

Comments

@umangyadav
Copy link
Contributor

umangyadav commented Sep 19, 2019

Consider the following example for the DeduceBound

where we want to find bounds for expression b < 1 - 2*a given variables' range, a : [0, 1] and b: [0, 1].

import tvm
def test_deduce_bug():
    a = tvm.var('a')
    b = tvm.var('b')
    a_s = tvm.arith.IntervalSet(0, 1)
    b_s = tvm.arith.IntervalSet(0, 1)
    res1 = tvm.arith.DeduceBound(a, b < 1 - 2*a, {a: a_s, b: b_s}, {b: b_s})
    print(res1)

if __name__ == "__main__":
    test_deduce_bug()

Result from running above exmpale is IntervalSet: [neg_inf, 0]

These bounds from DeduceBound are not correct. E.g. consider a = 0 then expression becomes b < 1 and that is not always true given b: [0, 1].


The error comes from following rounding up stated in the bound_deducer.cc:

https://github.com/dmlc/tvm/blob/b0ddcff6748bddc69881a2ff4216a830407421c9/src/arithmetic/bound_deducer.cc#L175

Is there a good reason behind keeping this rounding up as it is ?


It is mentioned that, user should respect the relaxed bounds.

As far as I know, only place deduce bound is used is in the LoopPartition and relaxed bound gives wrong results.

Take the following example for the LoopPartition:

import tvm

def test_loop_bug():
    ib = tvm.ir_builder.create()
    m = tvm.var("m")
    n = tvm.var("n")
    data = ib.pointer("float32", name="data")
    out = ib.pointer("float32", name="out")
    with ib.for_range(0, 2, "i") as i:
        with ib.for_range(0, 2, "j") as j:
            with ib.if_scope(ib.likely(j < 1 - 2 * i)):
                out[i] = data[i]
            with ib.else_scope():
                out[i] = tvm.const(0, "float32")

    stmt = ib.get()
    print("===================")
    print(stmt)
    stmt = tvm.ir_pass.LoopPartition(stmt, True)
    print("LoopPartition Output: ==============")
    print(stmt)
    stmt = tvm.ir_pass.Simplify(stmt)
    stmt = tvm.ir_pass.RemoveNoOp(stmt)
    print("====================")
    print(stmt)

if __name__ == "__main__":
    test_loop_bug()

###### INPUT ##########
for (i, 0, 2) {
  for (j, 0, 2) {
    if (likely((j < (1 - (2*i))))) {
      out[i] = data[i]
    } else {
      out[i] = 0f
    }
  }
}

Output from LoopPartition after the Simplify and RemoveNoOp is as follows:

for (j, 0, 2) {
  out[0] = data[0]
}
for (j, 0, 2) {
  out[1] = 0f
}

Above output is not correct. With the correct deduce bound the result should be following:

out[0] = data[0]
out[0] = 0.000000f
for (j, 0, 2) {
  out[1] = 0.000000f
}
@umangyadav
Copy link
Contributor Author

umangyadav commented Sep 19, 2019

@tqchen Please let me know your thoughts, this can be fixed with a check for the sign.

@xqdan @derisavi @kaitingwang @jdavies-huawei

@tqchen
Copy link
Member

tqchen commented Sep 19, 2019

it is hard to force deduce bound to produce exact bound for every case, so ideally we should fix loop partition to respect this assumption.

@tqchen
Copy link
Member

tqchen commented Sep 19, 2019

In the previous logic of loop partition, we always keep the same body of the loop, but transforms the loop to two segments.

for (i, 0, 4) {
    if (i < 3) body
} 

In the loop above, imagine if deduce bound returns [0, 2], which is a relaxed set, then the loop will be transformed to

for (i, 0, 2) {
    if (i < 3) body
} 
for (i, 2, 4) {
    if (i < 3) body
} 

Regardless of what value deduce bound returns, such transformation was hold to be always valid. A tight bound will help us to eliminate the conditions.

It could be possible that recent changes to LoopPartition no longer tries to hold that assumption, it would be great if we can look into it and fix the problem. Independently, we could work to make sure DeduceBound produce tighter bound, but we don't want to make the assumption that it is a strict one. Perhaps we should also change the API name to remind users of that

@umangyadav
Copy link
Contributor Author

@tqchen
The current loop partition aims to remove conditions based on the bounds returned from the DeduceBound.

https://github.com/dmlc/tvm/blob/1d00c083cbd77b8523d4952d624f1fab5de13056/src/pass/loop_partition.cc#L568

Hence, it is implicitly making the assumption that bounds are strict.

E.g in the example you posted above:

for (i, 0, 4) {
    if (i < 3) body
} 

Based on the bounds for the i < 3 it makes three loops pre_stmt, mid_stmt and post_stmt.

mid_stmt is the loop where it is assuming (or has proven) that conditions are definitely either true or false.

so, it directly replaces that condition with either bool (1) or (0).

output from first iteration would look like:

for (i, 0, 0) { // pre_stmt i.e. dead loop removed later by the `RemoveNoOp` pass
    if (i < 3) body
} 
for (i, 0, 2) { // mid_stmt here it requires/assumes  that bounds are tights and then replaces the condition
    if ((uint1(1)) body
} 
for (i, 2, 4) { //post_stmt 
    if (i < 3) body 
} 

pre_stmt and post_stmt are further recursed in order to remove the conditions.

I think it is better to tighten the bounds from the deduce_bound itself. deduce_bound is already taking into account rounding up error for the GE op here:

https://github.com/dmlc/tvm/blob/1d00c083cbd77b8523d4952d624f1fab5de13056/src/arithmetic/bound_deducer.cc#L160

However, it doesn't do so for the LE op.

So far from my experience and From the LoopPartition perspective (the one and only user of the deduce_bound I guess), I haven't seen an example where deduce bound succeeded and produced relaxed bounds except for the particular case of the rounding up for the LE.

I think it is more or less safe to assume that bounds are always tight except the case mentioned in this issue.

@xqdan also share your thoughts.

@tqchen
Copy link
Member

tqchen commented Sep 19, 2019

I see, I think we should avoid directly eliminating the conditions but rather keep them. Then the follow-up simplify pass would decide if it is safe to eliminate the conditions.

In the meanwhile, we could improve deduce bound to produce tighter bound, but it is good to not making the assumption when using it. So we won't put pressure on deduce bound. Especially in the cases where we might relax the some of the bounds.

@tqchen
Copy link
Member

tqchen commented Sep 24, 2019

ping @umangyadav would be great if you can followup on this and help to update the loop partition logic.

@ZihengJiang ZihengJiang self-assigned this Sep 24, 2019
@umangyadav
Copy link
Contributor Author

umangyadav commented Sep 25, 2019

@ZihengJiang @tqchen

To summerize,

Problem: Currently for the LE op, deduce bound can produce relaxed bounds because of truncdiv, which then could lead to incorrect results for the loop partition.

We discussed two possible solutions through the discussion.

  1. Fixing the implicit tight bound assumption in the loop partition.
  2. Fixing the deduce_bound to produce tighter bound fo the `LE op.

Instead of updating the loop partition, I am more leaning towards tightening deduce_bound for the LE op as in this issue.

One main reason, I can think of is that, in some cases, Simplifier may not able to remove the conditions. Hence, assuming (and a safe assumption) tight bounds, when loop partition knows if the conditions are either definitely true or false, it is better to go and replace those conditions instead of relying later on the simplifier.

Also, the current loop partition logic is recursive. Hence, if the conditions are still floating around then, it is possible that loop partition keeps partitioning based on those conditions and go into an infinite loop.

With the introduction of floordiv and floormod, I find it better to modify deduce_bound.

So far, I don't see any case where deduce_bound would produce relaxed bounds (given the LE case is fixed).

I can prepare a PR for the deduce_bound for the LE op. But it can take some time.

@tqchen
Copy link
Member

tqchen commented Sep 25, 2019

In this particular case, I still recommend to keep the original bounds, and relies on the simplifier(which provide an additional verification).

This will allows us to safe-guard the bound deduce, especially for cases that it may produce in-exact bound, not due to the deduction, but due to some iter variables being relaxed that already covers more than what it expects. Note that the integer set analysis system we have right now can produce relaxed bounds, which enlarges its capability for covering set of operations, but also means that additional utils need to work together with the relaxed result.

If there is a case when the simplifier cannot simplify the additional condition, we should look into it and fix the simplifier.

@xqdan
Copy link
Contributor

xqdan commented Sep 26, 2019

@umangyadav @tqchen @ZihengJiang
I agree "it is hard to force deduce bound to produce exact bound for every case", but if we've caught some specific cases, we should reslove them at the very beginning, unless we may bring some danger factors.

For this case, fixing the deduce_bound is better IMO, but we can take all actions, since they don't conflict with each other in TVM complier system. deduce_bound should do its best as much as possible, other passes should tolerate bad cases at the same time.

@ZihengJiang
Copy link
Contributor

ZihengJiang commented Sep 26, 2019

We discussed two possible solutions through the discussion.

  • Fixing the implicit tight bound assumption in the loop partition.
  • Fixing the deduce_bound to produce tighter bound fo the LE op.

Hey @umangyadav and @xqdan , actually @tqchen and me believe that these two things are both necessary to do instead of just choosing one of them to fix the problem.
Could you raise a PR to fix the LE condition? If you are interested you can also try to relax the assumption for deduce_bound in loop_partition, or I will do it. @umangyadav

@umangyadav
Copy link
Contributor Author

@ZihengJiang @tqchen I agree with not removing conditions directly in the loop partition now.

I've sent a PR for the deduce_bound.

I'll try to come up with something for the loop partition.

@tqchen
Copy link
Member

tqchen commented Oct 7, 2019

@umangyadav please also comment on whether we have addressed the issue of adding the guard conditions back(besides improving the bound deducer).

@ZihengJiang
Copy link
Contributor

Hi @umangyadav , are you interested in relaxing the loop partition assumption task ?

@umangyadav
Copy link
Contributor Author

umangyadav commented Oct 15, 2019

@tqchen We haven't addressed the issue of adding the guard conditions back yet.

@ZihengJiang

There is one more thing with the loop partition that can be improved. That is right now If the HalideIR has more than one conditions and if the intersection of their deduced bounds are empty then, loop partition will skip the partition altogether.

e.g. something like the following:

For (i, 0, 16)
  if(likely(i == 14))
    if(likely (i == 15))

The issue is stemming from the following line:

https://github.com/dmlc/tvm/blob/11a3a777ec78ad81ce12a940d07521870f546ad1/src/pass/loop_partition.cc#L405

I was thinking about the solution to the task of the removing assumption and it reminded me of this issue as well.

The solution to task1 (removing assumption) could simply be using simplifier instead of directly eliminating the conditions.

The solution to the empty intersection issue (task2) could be that we can consider only one condition at a time for the partition instead of a collection of the conditions. Now Let's say we follow this solution to task2. In that case, we need to somehow tag condition that has already been considered for the partition to avoid considering the same condition repeatedly when simplifier is not able to simplify that.

I wanted to address both the issues at the same time, since solving one without another would potentially require design change each time.

I am busy with some other work at the time moment. So, @ZihengJiang if you want to take up the task, please go ahead.

@tqchen
Copy link
Member

tqchen commented Nov 27, 2019

Seems that using floordiv already solve one of the problem, so i will close this thread, let us follow up with new threads for new problems.
As far as I know, we still need to fix loopPartition to add the guard conditions

@tqchen tqchen closed this as completed Nov 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants