[MetaSchedule] Improve inlining and VerifyGPUCode for quantized model workload #13334

Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
(branch force-pushed from 6e62f2a to 337c1c1)
Hey, thanks for the contribution! I was a bit uncertain if we really want to do name checking to determine constants from the compile engine, because it relies on the assumption that relay exists and that relay always uses the "compile_engine_const" name. There is an alternative I could come up with, and please let me know if it makes sense: add a block annotation in tvm/src/relay/backend/te_compiler_cache.cc (line 275 in fbe174b), so that the constant block carries `T.block_attr({"schedule_rule": "compute_inline"})`. Then register a PackedFunc for that schedule rule. Let me know if it makes sense!
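As a minimal sketch of that suggestion (assuming the convention that the `schedule_rule` annotation value names a globally registered PackedFunc which PostOrderApply dispatches to for the annotated block; the registered name and the signature below are assumptions for illustration, not code from this PR):

```python
# Hypothetical sketch: register a PackedFunc that could be dispatched to when a
# block carries T.block_attr({"schedule_rule": "compute_inline"}).
# The registered name and the (Schedule, BlockRV) -> [Schedule] signature are
# assumptions for illustration only.
import tvm
from tvm.tir import Schedule
from tvm.tir.schedule import BlockRV


@tvm.register_func("compute_inline")
def _schedule_rule_compute_inline(sch: Schedule, block: BlockRV):
    # Inline the annotated constant block so it no longer blocks other rules.
    sch.compute_inline(block)
    return [sch]
```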
@junrushao I like your idea, I'll rework this.
@junrushao I realized that an easier way would be to check the content of the block to determine if it is a constant block, rather than relying on the block name.
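As a rough illustration of that structural check (not the PR's actual implementation), a block could be treated as a constant block when it reads no buffers and merely stores a scalar constant; `is_constant_block` below is a hypothetical helper:

```python
# Hedged sketch: identify a "constant block" by its structure rather than by name.
from tvm import tir


def is_constant_block(block: tir.Block) -> bool:
    # A constant block reads nothing and writes a single buffer.
    if len(block.reads) != 0 or len(block.writes) != 1:
        return False
    body = block.body
    # The body should be a plain store of a scalar immediate value.
    return isinstance(body, tir.BufferStore) and isinstance(
        body.value, (tir.IntImm, tir.FloatImm)
    )
```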
(branch force-pushed from 337c1c1 to f398453)
Removed the identification of constant blocks by name, and replaced it with a more robust method based on the block structure.
cc @vinx13 @junrushao please take a look.
LGTM!
[MetaSchedule] Improve inlining and VerifyGPUCode for quantized model workload (apache#13334)

* [MetaSchedule] Add a new schedule rule to inline all scalar constants
* add doc
* reorg
* identify constant block by its structure, not by name
These "compile_engine_const" blocks can be inlined by the existing `AutoInline` rule, but depending on the order in which spatial blocks are processed by `AutoInline`, they can get in the way of `ReverseComputeInline` on other blocks, since the constant blocks are also counted as producer blocks. `PostOrderApply` currently processes the constant blocks at the very end, so `ReverseComputeInline` on blocks that consume such constants always fails to inline. So in practice, we are not generating a fused kernel for quantized conv2d today.

I added a simple inlining rule that inlines only such constant blocks. This rule is supposed to run before `AutoInline`, to unblock `ReverseComputeInline`. This lets us generate a fused kernel. On the int8 resnet50 model from PyTorch, the end-to-end perf improved from 6.8 to 5.2 msec, using batch size 16 and the same number of trials.
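As a rough sketch of how such a rule would be ordered relative to `AutoInline` (the `InlineConstantScalars` name and the argument list below are assumptions for illustration, not necessarily the final interface of this PR):

```python
# Hedged sketch: place the constant-inlining rule before AutoInline in the
# schedule-rule list, so constant blocks stop counting as producers before
# ReverseComputeInline is attempted on other blocks. Names and arguments are
# illustrative only.
from tvm import meta_schedule as ms

schedule_rules = [
    ms.schedule_rule.InlineConstantScalars(),  # hypothetical name for the new rule
    ms.schedule_rule.AutoInline(
        into_producer=False,
        into_consumer=True,
        inline_const_tensor=True,
        disallow_if_then_else=False,
        require_injective=False,
        require_ordered=False,
    ),
]
```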
`VerifyGPUCode` only checks the vector width used in `BufferLoad` and `BufferStore`. But quantized models use specialized intrinsics such as `q_multiply_shift_per_axis`, which uses 64-bit arithmetic internally. To accurately account for the data types used in a block, we need to lower those intrinsics before invoking TIR `VerifyGPUCode` and check the dtype of `CastNode`.
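For illustration only, one way to account for those wider types after lowering is to walk the PrimFunc and record the widest dtype introduced by `Cast` nodes; `max_cast_bits` below is a hypothetical helper, not the verifier change made in this PR:

```python
# Hedged sketch: after lowering intrinsics like q_multiply_shift_per_axis,
# scan the lowered PrimFunc for Cast nodes and report the widest bit width,
# so a GPU-code check could account for 64-bit intermediate arithmetic.
import tvm
from tvm import tir


def max_cast_bits(func: tir.PrimFunc) -> int:
    widest = 0

    def visit(node):
        nonlocal widest
        if isinstance(node, tir.Cast):
            widest = max(widest, tvm.DataType(node.dtype).bits)

    tir.stmt_functor.post_order_visit(func.body, visit)
    return widest
```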
@vinx13 @junrushao @zxybazh