
[Relay][Pass] Fix bug in re-processing call node in MergeComposite pass #4879

Merged

merged 10 commits into apache:master on Feb 17, 2020

Conversation

soiferj (Contributor) commented Feb 14, 2020

This fixes a bug where call nodes are recursively processed more than once, potentially resulting in a composite function containing duplicate nodes. This change introduces a call_map, similar to the var_map, to keep track of call nodes that we've processed.
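
To make the idea concrete, here is a minimal, self-contained Python sketch of the memoization pattern the fix applies (illustrative only: the real pass is the C++ MergeComposite implementation, and the Node class and rewrite helpers below are hypothetical, not its actual API).

class Node:
    def __init__(self, op, args=()):
        self.op, self.args = op, tuple(args)

def rewrite_without_cache(node):
    # Naive recursion: a node reachable via two paths is rebuilt once per path.
    return Node(node.op, [rewrite_without_cache(a) for a in node.args])

def rewrite_with_cache(node, call_map):
    # Memoized recursion: each original node maps to exactly one new node.
    if node in call_map:
        return call_map[node]
    new_node = Node(node.op, [rewrite_with_cache(a, call_map) for a in node.args])
    call_map[node] = new_node
    return new_node

# add1 is shared by two consumers, mirroring the add -> add -> add pattern below.
x, y = Node("x"), Node("y")
add1 = Node("add", [x, y])
add2 = Node("add", [x, add1])
add3 = Node("add", [add2, add1])

buggy = rewrite_without_cache(add3)
assert buggy.args[0].args[1] is not buggy.args[1]   # add1 was duplicated

fixed = rewrite_with_cache(add3, call_map={})
assert fixed.args[0].args[1] is fixed.args[1]       # sharing is preserved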

I found this bug while writing a pattern for a single-layer transformer.

@mbarrett97 @comaniac would you be able to take a look?

comaniac (Contributor) left a comment

The fix looks good to me. Could you add a simple unit test to cover this change?

soiferj (Contributor, Author) commented Feb 14, 2020

Sure, I'll work on adding a unit test.

tqchen added the "status: need test case" and "status: need update" labels on Feb 14, 2020
mbaret (Contributor) commented Feb 14, 2020

Good catch :) Fix looks to be correct, looking forward to the test case.

soiferj (Contributor, Author) commented Feb 14, 2020

@mbarrett97 @comaniac I just pushed a test where "result" creates an incorrect graph, and "expected" is correct. Even though these two graphs are different, and "result" generated an incorrect function, the two functions generated are computationally equivalent. This means that the test actually passes alpha_equal both with and without the bug.

It is still worth fixing this bug, since the problem blows up when matching large patterns, but I am not sure how to make the test fail on output like this. Do you have any suggestions?

Here are the Relay outputs. The pattern I am trying to match is add -> add -> add.

Result (incorrect output: the generated function computes add -> add -> add -> add):

v0.0.4
fn (%a: Tensor[(10, 10), float32], %b: Tensor[(10, 10), float32]) -> Tensor[(10, 10), float32] {
  %0 = subtract(%a, %b) /* ty=Tensor[(10, 10), float32] */;
  %4 = fn (%x: Tensor[(10, 10), float32], %y: Tensor[(10, 10), float32], Primitive=1, Composite="add_add_add") -> Tensor[(10, 10), float32] {
    %1 = add(%x, %y) /* ty=Tensor[(10, 10), float32] */;
    %2 = add(%x, %1) /* ty=Tensor[(10, 10), float32] */;
    %3 = add(%x, %y) /* ty=Tensor[(10, 10), float32] */;
    add(%2, %3) /* ty=Tensor[(10, 10), float32] */
  };
  %4(%0, %b) /* ty=Tensor[(10, 10), float32] */
}

Expected (correct output):

v0.0.4
fn (%a: Tensor[(10, 10), float32], %b: Tensor[(10, 10), float32]) -> Tensor[(10, 10), float32] {
  %0 = subtract(%a, %b) /* ty=Tensor[(10, 10), float32] */;
  %3 = fn (%in_1: Tensor[(10, 10), float32], %in_2: Tensor[(10, 10), float32]) -> Tensor[(10, 10), float32] {
    %1 = add(%in_1, %in_2) /* ty=Tensor[(10, 10), float32] */;
    %2 = add(%in_1, %1) /* ty=Tensor[(10, 10), float32] */;
    add(%2, %1) /* ty=Tensor[(10, 10), float32] */
  };
  %3(%0, %b) /* ty=Tensor[(10, 10), float32] */
}

zhiics (Member) commented Feb 14, 2020

@soiferj Sorry, I don't quite understand the problem. Do you mean that these two expressions pass the alpha_equal check?

soiferj (Contributor, Author) commented Feb 14, 2020

Yes, they pass the alpha_equal check.

zhiics (Member) commented Feb 14, 2020

@soiferj Hmm, this looks a bit weird to me. I will take a look at it. Thanks.

mbaret (Contributor) commented Feb 14, 2020

You could try graph_equal, which claims to check for data-flow equivalence, although I am a bit surprised alpha_equal doesn't catch this. Failing that, a static traversal might be another option.
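
For reference, here is a toy check of the two predicates (not the MergeComposite graphs above; it assumes this TVM version exposes both analysis.alpha_equal and analysis.graph_equal, as suggested here):

from tvm import relay
from tvm.relay import analysis

# Two structurally identical functions with different variable names.
tt = relay.TensorType([10, 10], "float32")
x1, y1 = relay.Var("x", tt), relay.Var("y", tt)
f1 = relay.Function([x1, y1], relay.add(x1, y1))
x2, y2 = relay.Var("p", tt), relay.Var("q", tt)
f2 = relay.Function([x2, y2], relay.add(x2, y2))

print("alpha_equal:", analysis.alpha_equal(f1, f2))   # structural equality
print("graph_equal:", analysis.graph_equal(f1, f2))   # data-flow equality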

soiferj (Contributor, Author) commented Feb 14, 2020

graph_equal also succeeds on the buggy graph :( @zhiics, let me know when you have any findings!

mbaret (Contributor) commented Feb 14, 2020

Unless I'm missing something, those graphs don't appear to me to be 'data flow equivalent', so this may be a bug with graph_equal.

soiferj (Contributor, Author) commented Feb 14, 2020

I'll do some investigation :)

zhiics (Member) commented Feb 14, 2020

I might not have time today. I can spend some time on it over the weekend.

zhiics (Member) commented Feb 14, 2020

hmm, I just tried this:

import numpy as np
import tvm
from tvm import relay
from tvm.relay import analysis
from tvm.relay.testing import run_opt_pass

def test():
    tt = relay.TensorType([10, 10], "float32")
    a = relay.Var("a", tt)
    b = relay.Var("b", tt)
    sub = relay.subtract(a, b)

    x = relay.Var("x", tt)
    y = relay.Var("y", tt)

    add1 = x + y
    add2 = x + add1
    add3 = x + y
    add4 = add2 + add3

    fn = relay.Function([x, y], add4)
    fn = fn.set_attribute("Primitive", tvm.tir.IntImm("int32", 1))
    fn = fn.set_attribute("Composite", tvm.tir.StringImm("add_add_add"))
    fn_call = relay.Call(fn, [sub, b])

    func = relay.Function([a, b], fn_call)
    func = run_opt_pass(func, relay.transform.InferType())
    print(func)

    tt0 = relay.TensorType([10, 10], "float32")
    a0 = relay.Var("a0", tt0)
    b0 = relay.Var("b0", tt0)
    sub0 = relay.subtract(a0, b0)

    x0 = relay.Var("x0", tt0)
    y0 = relay.Var("y0", tt0)

    add01 = x0 + y0
    add02 = x0 + add01
    add03 = add02 + add01

    fn0 = relay.Function([x0, y0], add03)
    fn_call0 = relay.Call(fn0, [sub0, b0])
    func0 = relay.Function([a0, b0], fn_call0)
    func0 = run_opt_pass(func0, relay.transform.InferType())

    print(func0)
    assert analysis.alpha_equal(func, func0)

test()

It could not pass alpha_equal. Are we missing something here? Can you double-check whether the program I provided is identical to yours?

soiferj (Contributor, Author) commented Feb 14, 2020

That's really strange - it looks right. Are you able to pull my branch and give the test a try?

zhiics (Member) commented Feb 14, 2020

I can give it a try over the weekend, but why did you feed "expected" with an unexpected expression?

soiferj (Contributor, Author) commented Feb 14, 2020

Sorry, what exactly do you mean?

zhiics (Member) commented Feb 14, 2020

I thought you provided

v0.0.4
fn (%a: Tensor[(10, 10), float32], %b: Tensor[(10, 10), float32]) -> Tensor[(10, 10), float32] {
  %0 = subtract(%a, %b) /* ty=Tensor[(10, 10), float32] */;
  %3 = fn (%in_1: Tensor[(10, 10), float32], %in_2: Tensor[(10, 10), float32]) -> Tensor[(10, 10), float32] {
    %1 = add(%in_1, %in_2) /* ty=Tensor[(10, 10), float32] */;
    %2 = add(%in_1, %1) /* ty=Tensor[(10, 10), float32] */;
    add(%2, %1) /* ty=Tensor[(10, 10), float32] */
  };
  %3(%0, %b) /* ty=Tensor[(10, 10), float32] */
}

as the expected graph, instead of:

v0.0.4
fn (%a: Tensor[(10, 10), float32], %b: Tensor[(10, 10), float32]) -> Tensor[(10, 10), float32] {
  %0 = subtract(%a, %b) /* ty=Tensor[(10, 10), float32] */;
  %4 = fn (%x: Tensor[(10, 10), float32], %y: Tensor[(10, 10), float32], Primitive=1, Composite="add_add_add") -> Tensor[(10, 10), float32] {
    %1 = add(%x, %y) /* ty=Tensor[(10, 10), float32] */;
    %2 = add(%x, %1) /* ty=Tensor[(10, 10), float32] */;
    %3 = add(%x, %y) /* ty=Tensor[(10, 10), float32] */;
    add(%2, %3) /* ty=Tensor[(10, 10), float32] */
  };
  %4(%0, %b) /* ty=Tensor[(10, 10), float32] */
}

If so, why did you provide that one? The bug you mention (if it is one) is actually a separate issue that doesn't block this PR. So we are good for this PR, right?

soiferj (Contributor, Author) commented Feb 14, 2020

I think so. If everyone else is okay with it, it's best to check this fix in.

mbaret (Contributor) commented Feb 14, 2020

So is the issue that the test case currently passes both pre- and post-fix? If so, I'd say it probably does block the PR. We need to understand whether alpha_equal is an acceptable way to test equality here and, if it's not, use an alternative method.

soiferj (Contributor, Author) commented Feb 14, 2020

Actually, are you sure the second argument is the expected one? It looks like AlphaEqual loops through the LHS args. This has some weird implications: when I run the other tests and flip the order of result and expected in alpha_equal, they fail.

Maybe this is because "expected" doesn't have the Composite and Primitive attributes?

Update: that's the part that's returning false. I think "expected" should be the first argument, and we need to set the attributes on the function.

soiferj (Contributor, Author) commented Feb 15, 2020

If I flip the arguments to alpha_equal and properly add the attributes to the "expected" function, the tests work as expected. If everyone is okay with it, can I go ahead with this change?
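
Concretely, a hedged sketch of those two changes, reusing the set_attribute style from the script above (the variable names and the commented-out result line are illustrative, not the actual test code):

import tvm
from tvm import relay
from tvm.relay.testing import run_opt_pass

tt = relay.TensorType([10, 10], "float32")
in_1, in_2 = relay.Var("in_1", tt), relay.Var("in_2", tt)
add1 = relay.add(in_1, in_2)
add2 = relay.add(in_1, add1)
out = relay.add(add2, add1)

# Give "expected" the same Composite/Primitive attributes the pass produces,
# so the comparison no longer trips over an attribute mismatch.
expected_fn = relay.Function([in_1, in_2], out)
expected_fn = expected_fn.set_attribute("Primitive", tvm.tir.IntImm("int32", 1))
expected_fn = expected_fn.set_attribute("Composite", tvm.tir.StringImm("add_add_add"))

a, b = relay.Var("a", tt), relay.Var("b", tt)
expected = relay.Function([a, b], relay.Call(expected_fn, [relay.subtract(a, b), b]))
expected = run_opt_pass(expected, relay.transform.InferType())

# With the MergeComposite output in `result`, pass "expected" first (LHS):
# assert relay.analysis.alpha_equal(expected, result)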

zhiics (Member) commented Feb 15, 2020

I am okay with it, as the failure should be a separate issue from alpha_equal. But I would suggest we create a minimal example to reproduce the bug and open an issue for it so that people can conveniently look into it.

mbaret (Contributor) commented Feb 15, 2020

So long as the test now correctly fails for the current behaviour, I'm happy. The fact that argument order matters is a concern for all of the other tests, though, which pass "expected" as the second argument.

soiferj (Contributor, Author) commented Feb 15, 2020

This is actually a good exercise, as some other tests are now failing because the graphs are not the same, for example test_branch_merge. @mbarrett97, I'll push the changes; let me know what you think.

Expected:

fn (%a: Tensor[(10, 10), float32], %b: Tensor[(10, 10), float32], %c: Tensor[(10, 10), float32]) -> Tensor[(10, 10), float32] {
  %2 = fn (%in_1: Tensor[(10, 10), float32], %in_2: Tensor[(10, 10), float32], Composite="add_sub_mul", Primitive=1) -> Tensor[(10, 10), float32] {
    %0 = add(%in_1, %in_2) /* ty=Tensor[(10, 10), float32] */;
    %1 = subtract(%in_1, %in_2) /* ty=Tensor[(10, 10), float32] */;
    multiply(%0, %1) /* ty=Tensor[(10, 10), float32] */
  };
  %3 = %2(%a, %b) /* ty=Tensor[(10, 10), float32] */;
  %4 = %2(%c, %3) /* ty=Tensor[(10, 10), float32] */;
  nn.relu(%4) /* ty=Tensor[(10, 10), float32] */
}

Result:

fn (%a: Tensor[(10, 10), float32], %b: Tensor[(10, 10), float32], %c: Tensor[(10, 10), float32]) -> Tensor[(10, 10), float32] {
  %2 = fn (%x: Tensor[(10, 10), float32], %y: Tensor[(10, 10), float32], Primitive=1, Composite="add_sub_mul") -> Tensor[(10, 10), float32] {
    %0 = add(%x, %y) /* ty=Tensor[(10, 10), float32] */;
    %1 = subtract(%x, %y) /* ty=Tensor[(10, 10), float32] */;
    multiply(%0, %1) /* ty=Tensor[(10, 10), float32] */
  };
  %3 = %2(%a, %b) /* ty=Tensor[(10, 10), float32] */;
  %6 = fn (%x1: Tensor[(10, 10), float32], %y1: Tensor[(10, 10), float32], Primitive=1, Composite="add_sub_mul") -> Tensor[(10, 10), float32] {
    %4 = add(%x1, %y1) /* ty=Tensor[(10, 10), float32] */;
    %5 = subtract(%x1, %y1) /* ty=Tensor[(10, 10), float32] */;
    multiply(%4, %5) /* ty=Tensor[(10, 10), float32] */
  };
  %7 = %6(%c, %3) /* ty=Tensor[(10, 10), float32] */;
  nn.relu(%7) /* ty=Tensor[(10, 10), float32] */
}

+1 that we should confirm which argument order is expected for alpha_equal.

soiferj (Contributor, Author) commented Feb 15, 2020

Thanks everyone for working through this with me!

zhiics (Member) commented Feb 15, 2020

@soiferj Can you also try to create a minimal counterexample? Thanks.

soiferj (Contributor, Author) commented Feb 15, 2020

Definitely, I'll do that after fixing this.

soiferj (Contributor, Author) commented Feb 15, 2020

This test (test_branch_merge) is now failing at the de-duplication pass. It's interesting, since the "correct" (result) graph has the exact same function duplicated twice, whereas the "expected" graph just reuses the same function. What should the correct behavior be?

Update: I am still having some trouble with this test. I will look next week. I am wondering if we should check this fix in and update the tests wholesale in another PR?

I was able to fix this test. Will push temporary changes so I can work from another machine this weekend.
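
For what it's worth, here is a hedged sketch (again following the set_attribute style used earlier in this thread; it is not the actual test code) of an "expected" graph that reuses a single composite function for both matches, as in the Expected dump above:

import tvm
from tvm import relay

tt = relay.TensorType([10, 10], "float32")

# One composite function, built once...
in_1, in_2 = relay.Var("in_1", tt), relay.Var("in_2", tt)
body = relay.multiply(relay.add(in_1, in_2), relay.subtract(in_1, in_2))
add_sub_mul = relay.Function([in_1, in_2], body)
add_sub_mul = add_sub_mul.set_attribute("Primitive", tvm.tir.IntImm("int32", 1))
add_sub_mul = add_sub_mul.set_attribute("Composite", tvm.tir.StringImm("add_sub_mul"))

# ...and called for both matched branches, rather than duplicated per match.
a, b, c = relay.Var("a", tt), relay.Var("b", tt), relay.Var("c", tt)
call_1 = relay.Call(add_sub_mul, [a, b])
call_2 = relay.Call(add_sub_mul, [c, call_1])
expected = relay.Function([a, b, c], relay.nn.relu(call_2))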

soiferj (Contributor, Author) commented Feb 15, 2020

Alright, sorry for all of the spam. All tests are now fixed. cc @zhiics @mbarrett97

zhiics (Member) commented Feb 15, 2020

Thanks for the effort. I will take a look later. So, do we still need to flip the args? If so, we still need to create a repro and open an issue, right?

soiferj (Contributor, Author) commented Feb 15, 2020

It seems like we need to flip the args. I’ll open the issue on Monday.

soiferj (Contributor, Author) commented Feb 17, 2020

@zhiics I merged your changes and updated the branch. Would you mind taking another look?

zhiics (Member) left a comment

LGTM

zhiics (Member) commented Feb 17, 2020

@mbaret PTAL. Let's land this if it looks good to you as well.

mbaret (Contributor) commented Feb 17, 2020

Looks good.

zhiics merged commit 27a0284 into apache:master on Feb 17, 2020
zhiics (Member) commented Feb 17, 2020

Thanks @soiferj @mbaret @cbalint13

zhiics added the "status: accepted" label and removed the "status: need test case" and "status: need update" labels on Feb 17, 2020
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 26, 2020
…ss (apache#4879)

* Fix bug in re-processing call node

* Add test

* Add to main

* temp changes to work from another machine

* fix rest of tests

* fix test_reuse_call_merge

* fix merge

Co-authored-by: Jon Soifer <jonso@microsoft.com>
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 28, 2020
…ss (apache#4879)

* Fix bug in re-processing call node

* Add test

* Add to main

* temp changes to work from another machine

* fix rest of tests

* fix test_reuse_call_merge

* fix merge

Co-authored-by: Jon Soifer <jonso@microsoft.com>
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2020
…ss (apache#4879)

* Fix bug in re-processing call node

* Add test

* Add to main

* temp changes to work from another machine

* fix rest of tests

* fix test_reuse_call_merge

* fix merge

Co-authored-by: Jon Soifer <jonso@microsoft.com>