[Relay] A set of utilities that allows a model to be run efficiently on tensorcores. #6748
Conversation
@Laurawly @masahi @csullivan @jroesch Can you guys take a look at this PR?
LGTM
Thanks for the layer count and recast mutator passes @jwfromm! They are quite useful additions to have.
```python
def __init__(self, skip_layers=None):
    self.skip_counter = 0
    self.skip_layers = skip_layers if skip_layers is not None else []
```
When have you found it useful to skip a specific layer of a given operator type / how do you envision it being used? Mainly for debugging and performance tests?
In this case, the first layer of most networks does not have a sufficient number of channels for our tensorcore schedules to be applied. Although this in theory wouldn't be a problem, there aren't HWNC schedules for GPU. So if you blindly apply ConvertLayout to all layers, you end up with a first layer that can't be executed. Skipping it during conversion is an elegant way to avoid this issue. I imagine a similar pathology could apply to other situations.
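For example (a minimal sketch, assuming `mod` is the Relay module being converted and the first conv2d is layer 0; the desired layouts are illustrative):

```python
import tvm
from tvm import relay

# Leave layer 0 (the first conv2d) in its original layout; convert the rest to HWNC.
with relay.transform.LayoutConfig(skip_layers=[0]):
    seq = tvm.transform.Sequential(
        [relay.transform.InferType(),
         relay.transform.ConvertLayout({"nn.conv2d": ["HWNC", "HWOI"]})]
    )
    with tvm.transform.PassContext(opt_level=3):
        mod = seq(mod)
```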
This collection of new utility functions enables a starting floating point model to be converted to a datatype and format that can be run using the efficient HWNC tensorcore schedules introduced in #6121. Although these schedules are the fastest available in TVM, they have a few very specific requirements that make them difficult to apply generally to models. Specifically, compatible operators must have inputs set to `int4` or `int8`, all compatible layers must be in the `HWNC` layout, and incompatible layers should be left in their original layout and datatype. There are currently no tools to make such changes to an existing model. To address this, I've written the following utilities:

- `count_layers`: A pass that determines the number of layers of the specified operator in a graph. Although generally useful, for tensorcores we use this to enable the `skip_layers` feature.
- `recast`: A pass that changes the input and output datatype of all specified operators in a graph, with the option to skip a set of layers. Although this pass is only useful for benchmarking, as it does not apply any intelligent quantization, this type of utility is a common topic on the Discuss forums and can serve as a good example for users interested in similar functionality.
- `LayoutConfig`: An optional scope that can be applied around the `ConvertLayout` pass. In this PR I use it to enable skipping the conversion of specified conv2d layers, but it could be extended for other customization down the line.
- HWNC support for `ConvertLayout`.

The combination of these utilities allows us to target HWNC tensorcores using a workflow such as this:
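A rough sketch of that workflow (the toy network, dtypes, skip indices, and desired layouts are just for illustration; a real model would be imported from a frontend and then autotuned):

```python
import tvm
from tvm import relay

# A toy two-conv2d network standing in for a real model.
data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
w1 = relay.var("w1", shape=(32, 3, 3, 3), dtype="float32")
w2 = relay.var("w2", shape=(32, 32, 3, 3), dtype="float32")
conv1 = relay.nn.conv2d(data, w1, channels=32, kernel_size=(3, 3), padding=(1, 1))
conv2 = relay.nn.conv2d(conv1, w2, channels=32, kernel_size=(3, 3), padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function(relay.analysis.free_vars(conv2), conv2))

# count_layers reports how many conv2d layers the skip indices below refer to.
num_convs = relay.analysis.count_layers(mod["main"], ["nn.conv2d"])

# Recast conv2d inputs to int8 (accumulating in int32), skipping the first layer.
mod["main"] = relay.transform.recast(
    mod["main"], "int8", "int32", ops=["nn.conv2d"], skip_layers=[0]
)

# Convert the remaining conv2d layers to the HWNC layout, again skipping layer 0.
with relay.transform.LayoutConfig(skip_layers=[0]):
    seq = tvm.transform.Sequential(
        [relay.transform.InferType(),
         relay.transform.ConvertLayout({"nn.conv2d": ["HWNC", "HWOI"]})]
    )
    with tvm.transform.PassContext(opt_level=3):
        mod = seq(mod)
```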
When autotuned, the resulting `mod` will qualify for using the HWNC tensorcore strategy.