
[MetaSchedule] Improve inlining and VerifyGPUCode for quantized model workload #13334

Merged: 4 commits, Nov 11, 2022

Conversation

@masahi (Member) commented Nov 9, 2022

  1. A workload from quantized models often has a trivial block that only produces a constant scalar:
with T.block("compile_engine_const"):
    vi = T.axis.spatial(1, 0)
    T.reads()
    T.writes(compile_engine_const[()])
    compile_engine_const[()] = 59

This can be inlined by the existing AutoInline rule, but depending on the order in which spatial blocks are processed by AutoInline, these "compile_engine_const" blocks can get in the way of ReverseComputeInline on other blocks, since a constant block is also counted as a producer block. PostOrderApply currently processes the constant blocks at the very end, so ReverseComputeInline on blocks that consume such constants always fails to inline. So in practice, we are not generating a fused kernel for quantized conv2d today.

I added a simple inlining rule that inlines only such constant blocks. This rule is supposed to run before AutoInline, to unblock ReverseComputeInline. This lets us generate a fused kernel. On the int8 resnet50 model from PyTorch, the e2e perf improved from 6.8 to 5.2 msec, using batch size 16 and the same number of trials.
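The structural criterion such a rule relies on can be illustrated without TVM. A minimal sketch, assuming a simplified stand-in for a TIR block (the real rule inspects `tvm.tir.Block` nodes; the class and field names below are hypothetical):

```python
# Hypothetical sketch: recognize a "constant scalar" block like the
# compile_engine_const example above by its structure. SimpleBlock is
# a stand-in, not the actual TIR block class.
from dataclasses import dataclass
from typing import List

@dataclass
class SimpleBlock:
    reads: List[str]              # buffers the block reads
    iter_var_extents: List[int]   # extents of the block's iteration vars
    writes_scalar_constant: bool  # body is `buf[()] = <constant>`

def is_constant_scalar_block(block: SimpleBlock) -> bool:
    """A block is a constant producer if it reads nothing, has only
    trivial (extent-1) iteration, and stores a constant scalar."""
    return (
        not block.reads
        and all(extent == 1 for extent in block.iter_var_extents)
        and block.writes_scalar_constant
    )

# The compile_engine_const block from the description fits the pattern;
# a conv2d block with real producers does not.
const_block = SimpleBlock(reads=[], iter_var_extents=[1],
                          writes_scalar_constant=True)
conv_block = SimpleBlock(reads=["data", "weight"],
                         iter_var_extents=[16, 56, 56, 64],
                         writes_scalar_constant=False)
print(is_constant_scalar_block(const_block))  # True
print(is_constant_scalar_block(conv_block))   # False
```

Because the test is purely structural, it does not depend on the block being named "compile_engine_const", which matters for the follow-up discussion below.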

  2. Currently, VerifyGPUCode only checks the vector width used in BufferLoad and BufferStore. But quantized models use specialized intrinsics like q_multiply_shift_per_axis below, which uses 64-bit arithmetic internally.
T_cast[v0 // 49, v0 % 49 // 7, v0 % 7, v1] = T.cast(T.max(T.min(T.q_multiply_shift_per_axis(conv2d_nhwc_reindex_shared[v0, v1] - p2[0, 0, 0, v1] + p3[0, 0, 0, v1], p4[v1], p5[v1], p6[v1], 31, False, True, dtype="int32"), 255), 0), "uint8")

To accurately account for the data types used in a block, we need to lower those intrinsics before invoking TIR VerifyGPUCode and check the dtype of each CastNode.
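The idea can be sketched with a toy expression walk, independent of TVM. Assuming hypothetical stand-ins for lowered TIR nodes (`Cast`, `Mul`, `Var` here are not the real classes), a verifier can report the widest dtype any cast produces, which only becomes visible after the intrinsic is lowered:

```python
# Toy sketch of the extended check: after lowering, the int64
# arithmetic inside q_multiply_shift_per_axis shows up as explicit
# Cast nodes, so walking the tree reveals the true widest dtype.
from dataclasses import dataclass
from typing import Union

@dataclass
class Var:
    name: str
    dtype: str

@dataclass
class Cast:
    dtype: str
    value: "Expr"

@dataclass
class Mul:
    a: "Expr"
    b: "Expr"

Expr = Union[Var, Cast, Mul]

def dtype_bits(dtype: str) -> int:
    # "int32" -> 32, "int64" -> 64, "uint8" -> 8
    return int("".join(c for c in dtype if c.isdigit()))

def max_cast_bits(expr: Expr) -> int:
    """Widest dtype produced by any Cast node in the expression."""
    if isinstance(expr, Cast):
        return max(dtype_bits(expr.dtype), max_cast_bits(expr.value))
    if isinstance(expr, Mul):
        return max(max_cast_bits(expr.a), max_cast_bits(expr.b))
    return 0

# Before lowering, the intrinsic call hides its internal widening;
# after lowering, the int64 casts are explicit in the tree:
lowered = Cast("int32", Mul(Cast("int64", Var("x", "int32")),
                            Cast("int64", Var("m", "int32"))))
print(max_cast_bits(lowered))  # 64
```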

@vinx13 @junrushao @zxybazh

@tvm-bot (Collaborator) commented Nov 9, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@junrushao (Member) commented

Hey thanks for the contribution!

I was a bit uncertain whether we really want to do name checking to determine constants from the compile engine, because it relies on the assumption that Relay exists and that Relay always uses compile_engine_const as the name of the constants it introduces, which could be fragile in certain cases.

There is an alternative I could come up with, and please let me know if it makes sense:

Add a schedule_rule attribute here (at the line `"compile_engine_const", topi::kBroadcast);`), which will guide TIR to generate the annotation below:

T.block_attr({"schedule_rule": "compute_inline"})

Then register a PackedFunc meta_schedule.generic.compute_inline to apply compute_inline as part of the custom schedule rule.

Let me know if it makes sense!
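The dispatch mechanism proposed in the comment above can be sketched roughly as follows. The dict-based registry and the name-resolution convention are illustrative stand-ins; TVM actually uses its global PackedFunc registry (`tvm.register_func`) for this:

```python
# Illustrative sketch of the proposal: a block carrying
# block_attr({"schedule_rule": "compute_inline"}) gets scheduled by a
# function registered as "meta_schedule.generic.compute_inline".
from typing import Callable, Dict, List, Tuple

SCHEDULE_RULES: Dict[str, Callable] = {}

def register_rule(name: str) -> Callable:
    """Register a custom schedule rule under a fully qualified name."""
    def wrap(fn: Callable) -> Callable:
        SCHEDULE_RULES[name] = fn
        return fn
    return wrap

@register_rule("meta_schedule.generic.compute_inline")
def compute_inline_rule(trace: List[Tuple[str, str]], block: str):
    # In TVM this would call sch.compute_inline(block); here we just
    # record the scheduling decision in a trace list.
    trace.append(("compute_inline", block))
    return trace

def apply_annotated_rule(trace, block: str, block_attrs: Dict[str, str]):
    rule = block_attrs.get("schedule_rule")
    if rule is not None:
        # Hypothetical lookup convention: the annotation value
        # "compute_inline" resolves to the generic registered rule.
        fn = SCHEDULE_RULES.get("meta_schedule.generic." + rule)
        if fn is not None:
            return fn(trace, block)
    return trace

trace = apply_annotated_rule([], "compile_engine_const",
                             {"schedule_rule": "compute_inline"})
print(trace)  # [('compute_inline', 'compile_engine_const')]
```

The appeal of this design is that the decision to inline lives with the operator definition (via the annotation) rather than in a name-matching heuristic inside the tuning rules.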

@masahi (Member, Author) commented Nov 9, 2022

@junrushao I like your idea, I'll rework this.

@masahi closed this Nov 9, 2022

@masahi (Member, Author) commented Nov 9, 2022

@junrushao I realized that an easier way would be to check the contents of the block to determine whether it is a constant block, rather than relying on the block name.

@masahi reopened this Nov 9, 2022

@masahi (Member, Author) commented Nov 10, 2022

Removed the identification of constant blocks by name, and replaced it with a more robust method based on the block structure.

@masahi (Member, Author) commented Nov 10, 2022

cc @vinx13 @junrushao please take a look.

@junrushao (Member) left a comment

LGTM!

@masahi merged commit 93fdf83 into apache:main on Nov 11, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…el workload (apache#13334)

* [MetaSchedule] Add a new schedule rule to inline all scalar constants

* add doc

* reorg

* identify constant block by its structure, not by name
4 participants