[MetaSchedule] Fix the order of applying AutoInline in ScheduleUsingAnchorTrace #13329

Merged
merged 6 commits into from
Nov 9, 2022

masahi
Member

@masahi masahi commented Nov 9, 2022

Note: the diff is bloated due to the test case.

In anchor-block tuning, we need to manually apply AutoInline to some blocks (those that are not part of the anchor subgraph). Currently, the order in which AutoInline is applied to these blocks is undefined, and I've hit a case where this is problematic.

For example, given these four blocks,

        for i0_7, i1_7, i2_7, i3_7 in T.grid(16, 56, 56, 256):
            with T.block("compute_2"):
                i0_8, i1_8, i2_8, i3_8 = T.axis.remap("SSSS", [i0_7, i1_7, i2_7, i3_7])
                T.reads(T_subtract_1[i0_8, i1_8, i2_8, i3_8])
                T.writes(compute_3[i0_8, i1_8, i2_8, i3_8])
                compute_3[i0_8, i1_8, i2_8, i3_8] = T.q_multiply_shift(T_subtract_1[i0_8, i1_8, i2_8, i3_8], 1457846997, 31, 0, dtype="int32")
        for i0_9, i1_9, i2_9, i3_9 in T.grid(16, 56, 56, 256):
            with T.block("compute_3"):
                i0_10, i1_10, i2_10, i3_10 = T.axis.remap("SSSS", [i0_9, i1_9, i2_9, i3_9])
                T.reads(p9[i0_10, i1_10, i2_10, i3_10])
                T.writes(compute_4[i0_10, i1_10, i2_10, i3_10])
                compute_4[i0_10, i1_10, i2_10, i3_10] = T.q_multiply_shift(p9[i0_10, i1_10, i2_10, i3_10], 2101000910, 31, 0, dtype="int32")
        for i0_11, i1_11, i2_11, i3_11 in T.grid(16, 56, 56, 256):
            with T.block("T_add_2"):
                ax0, ax1, ax2, ax3 = T.axis.remap("SSSS", [i0_11, i1_11, i2_11, i3_11])
                T.reads(compute_3[ax0, ax1, ax2, ax3], compute_4[ax0, ax1, ax2, ax3])
                T.writes(T_add_2[ax0, ax1, ax2, ax3])
                T_add_2[ax0, ax1, ax2, ax3] = compute_3[ax0, ax1, ax2, ax3] + compute_4[ax0, ax1, ax2, ax3]
        for i0_12, i1_12, i2_12, i3_12 in T.grid(16, 56, 56, 256):
            with T.block("compute_4"):
                i0_13, i1_13, i2_13, i3_13 = T.axis.remap("SSSS", [i0_12, i1_12, i2_12, i3_12])
                T.reads(T_add_2[i0_13, i1_13, i2_13, i3_13])
                T.writes(compute[i0_13, i1_13, i2_13, i3_13])
                compute[i0_13, i1_13, i2_13, i3_13] = T.max(T.min(T_add_2[i0_13, i1_13, i2_13, i3_13], 255), 0)

we want to AutoInline "compute_3", "T_add_2" and "compute_4". If the order is "T_add_2" -> "compute_3" -> "compute_4", all three blocks can be inlined or reverse-inlined into "compute_2". However, if the order is "T_add_2" -> "compute_4" -> "compute_3", "compute_4" can be neither inlined nor reverse-inlined. This in turn can cause a buggy schedule to be generated (see the description in the test case).

We can avoid this problem by always AutoInlining the last block after all other blocks have been processed. This ensures that the last block can be reverse inlined.
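
To make the ordering issue concrete, here is a minimal pure-Python sketch. It does not use TVM; it only models the four blocks above as a producer graph under two simplifying assumptions of mine: a block with consumers can always be inlined into them, and an output block can be reverse-inlined only when it has exactly one producer block. The helper names (`make_graph`, `auto_inline`, `defer_last_block`, `run`) are hypothetical and exist only for this illustration; the real preconditions checked by TVM are stricter.

```python
# Toy model of the example: each block maps to the set of blocks producing its inputs.
# External inputs such as p9 are omitted. This is NOT the TVM implementation.

def make_graph():
    return {
        "compute_2": set(),                      # end of the anchor subgraph, left untouched
        "compute_3": set(),                      # reads only the external input p9
        "T_add_2": {"compute_2", "compute_3"},
        "compute_4": {"T_add_2"},                # the last (output) block
    }

def consumers(graph, name):
    """Blocks that read the output of `name`."""
    return [b for b, producers in graph.items() if name in producers]

def auto_inline(graph, name):
    """Try inline first, then reverse inline; report which one applied."""
    cons = consumers(graph, name)
    if cons:
        # Inline: fold the block into every consumer, which inherits its producers.
        producers = graph.pop(name)
        for c in cons:
            graph[c].discard(name)
            graph[c] |= producers
        return "inline"
    if len(graph[name]) == 1:
        # Reverse inline: fold the output block into its single producer.
        del graph[name]
        return "reverse_inline"
    return "failed"

def defer_last_block(order, last_block):
    """The ordering fix: process the last block after all other blocks."""
    return [b for b in order if b != last_block] + [last_block]

def run(order):
    graph = make_graph()
    results = [auto_inline(graph, name) for name in order]
    return results, sorted(graph)

bad_order = ["T_add_2", "compute_4", "compute_3"]
print(run(bad_order))
print(run(defer_last_block(bad_order, "compute_4")))
```

In this model the problematic order leaves "compute_4" behind because it still has two producers at the moment it is visited, while deferring it lets "compute_3" be inlined first so that only "compute_2" remains as its producer.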

@vinx13 @junrushao @zxybazh

@tvm-bot
Collaborator

tvm-bot commented Nov 9, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

<< "If a spatial block cannot be inlined, it should be the output block";
if (CanReverseComputeInline(sch->state(), block_sref)) {
  sch->ReverseComputeInline(block);
}
Member Author

This is another fix, just relaxing the wrong assumption at L144.

# "conv2d_nhwc_reindex_shared" has the predicate
# T.where(((ax1_0 * 4 + ax1_1) * 32 + ax1_2) * 2 + ax1_3 < 64) due to anchor-block scheduling
# (see Conv2dInt8_with_predicate_scheduled). Currently, if we try to reverse-inline a block to
# its producer that has a predicate, the predicate disappears after reverse inlining.
Member Author

cc @vinx13 to confirm whether applying reverse_compute_inline should be disallowed when the producer has a predicate. Currently it is allowed, and the predicate disappears. A minimal repro is at https://gist.github.com/masahi/01a80b86062122ad57b9b1fd785fb960
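
As a rough illustration of why a vanishing predicate matters, the sketch below enumerates the fused index from the predicate quoted in the test comment, with and without the `< 64` guard. The loop extents in `SPLIT` are my own assumption (the source gives only the predicate, not the extents), and the code is plain Python rather than TVM.

```python
from itertools import product

EXTENT = 64              # logical axis size, taken from the predicate `... < 64`
SPLIT = (1, 4, 32, 2)    # hypothetical extents for ax1_0, ax1_1, ax1_2, ax1_3

def fused_index(ax1_0, ax1_1, ax1_2, ax1_3):
    """Index expression that appears inside the T.where predicate."""
    return ((ax1_0 * 4 + ax1_1) * 32 + ax1_2) * 2 + ax1_3

def touched_indices(keep_predicate):
    """Collect every index the loop nest would access."""
    touched = set()
    for idx in product(*(range(n) for n in SPLIT)):
        i = fused_index(*idx)
        if keep_predicate and not i < EXTENT:
            continue  # the predicate masks iterations outside the logical extent
        touched.add(i)
    return touched

with_pred = touched_indices(keep_predicate=True)
without_pred = touched_indices(keep_predicate=False)
print(len(with_pred), len(without_pred))
```

Under these assumed extents, dropping the guard makes the loop nest visit indices beyond the logical extent of the axis, which is the kind of out-of-bounds access the predicate exists to prevent.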

@junrushao
Member

Will leave this PR to @vinx13 :-)

@vinx13 vinx13 merged commit 8453c9c into apache:main Nov 9, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 10, 2022
…ngAnchorTrace` (apache#13329)

* index on concat-fusion-fix: 3ffe5b1 fix te extern create_prim_func test

* Apply AutoInline to the last block after all other blocks are processed

* Do not require CanReverseComputeInline to be true when
CanComputeInline is false

* add comment

* add test

* cpplint