
[AutoTVM] [TOPI] Support AutoTVM for int4 tensorcore #7831

Merged 14 commits from hypercubestart:tc-fix into apache:main on May 1, 2021

Conversation

@hypercubestart (Contributor) commented Apr 12, 2021

This PR adds support for int4 in AutoTVM and fixes bugs; done together with @ZihengJiang.

The current schedule for conv2d HWNC tensorcore is unsearchable by AutoTVM because of the error `ValueError: could not broadcast input array from shape (1850) into shape (1748)`, which is caused by a feature-length mismatch between different instantiated templates.

Narrowing the search space fixes the problem (a sketch of the narrowing pattern follows the table below), and we ran a few experiments with different schedule fixes on a T4:

| Workload (batch_size, in_channels, in_size, out_channels, kernel_size, stride, padding) | HWNC int4 time (#6121) | AS-ko, WS-kw | AS-ki, WS-kw | AS-kh, WS-ko | AS-ko, WS-kh |
|---|---|---|---|---|---|
| (8, 64, 56, 64, 3, 1, 1) | 0.1723 | 0.17988 | 0.19138 | 0.18075 | 0.18399 |
| (8, 64, 56, 128, 3, 2, 1) | 0.10278 | 0.10783 | 0.11104 | 0.13839 | 0.10446 |
| (8, 64, 56, 64, 1, 2, 0) | 0.0333 | 0.0187 | 0.01997 | 0.01933 | 0.0183 |
| (8, 128, 28, 128, 3, 1, 1) | 0.15088 | 0.1784 | 0.2296 | 0.21108 | 0.20623 |
| (8, 128, 28, 256, 3, 2, 1) | 0.11548 | 0.11616 | 0.1305 | 0.12947 | 0.15934 |
| (8, 128, 28, 256, 1, 2, 0) | 0.04219 | 0.02374 | 0.02575 | 0.02332 | 0.0223 |
| (8, 256, 14, 256, 3, 1, 1) | 0.05695 | 0.21981 | 0.24931 | 0.24194 | 0.27055 |
| (8, 256, 14, 512, 3, 2, 1) | 0.14456 | 0.14939 | 0.15589 | 0.14812 | 0.20209 |
| (8, 256, 14, 512, 1, 2, 0) | 0.0475 | 0.02659 | 0.02778 | 0.02531 | 0.02641 |
| (8, 512, 7, 512, 3, 1, 1) | 0.147156 | 0.245 | 0.27005 | 0.25863 | 0.25255 |

The results for int4 HWNC in #6121 are not reproducible in AutoTVM because of the feature-length mismatch.
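For context, the AS/WS column labels appear to denote the reduction axis at which the shared-memory cache stages for activations (AS) and weights (WS) are attached. As an illustration of the narrowing pattern referenced above, here is a minimal, hypothetical AutoTVM template on a simple matmul; the task name and workload are stand-ins, and the actual fix lives in the conv2d HWNC tensorcore schedule. The point is that the cache stage's attach axis is pinned rather than exposed as a knob, so every instantiated template yields feature vectors of the same length.

import tvm
from tvm import autotvm, te

@autotvm.template("example/pinned_attach_axis")  # hypothetical task name
def matmul_template(N, K, M):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

    s = te.create_schedule(C.op)
    CC = s.cache_write(C, "local")

    cfg = autotvm.get_config()
    i, j = s[C].op.axis
    # The tiling splits remain tunable knobs...
    cfg.define_split("tile_i", i, num_outputs=2)
    cfg.define_split("tile_j", j, num_outputs=2)
    io, ii = cfg["tile_i"].apply(s, C, i)
    jo, ji = cfg["tile_j"].apply(s, C, j)
    s[C].reorder(io, jo, ii, ji)

    # ...but the cache stage's attach axis is pinned instead of being a
    # define_knob choice, so the feature length is identical across the space.
    s[CC].compute_at(s[C], jo)
    return s, [A, B, C]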

cc: @Laurawly @Hzfengsy @anijain2305 @tqchen @masahi

@Hzfengsy (Member) left a comment


The code LGTM. But would you like to show some performance results for int4?

@hypercubestart (Contributor, Author) commented Apr 14, 2021

> The code LGTM. But would you like to show some performance results for int4?

Yes, I'm testing some combinations of the removed knobs and will post perf results once they reach parity with the results from #6121.

@ZihengJiang (Contributor)

LGTM. Thanks @hypercubestart!

@ZihengJiang ZihengJiang merged commit dc1f189 into apache:main May 1, 2021
@hypercubestart hypercubestart deleted the tc-fix branch May 2, 2021 00:21
umangyadav pushed a commit to umangyadav/tvm that referenced this pull request May 5, 2021

* initial
* int4 asnumpy
* remove
* random test
* format
* random
* remove unused import
* change dist range
* add fuse_pack in
* random engine
* reformat
* remove import
* add cuda context
* refactor code
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021

trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021
@jlimmm commented Sep 8, 2021

@hypercubestart Hello, I'd like to reproduce your table on T4+int4, but it seems that the current AutoTVM tutorial code (link) cannot use the int4+tensorcore template. It would be appreciated if you could share sample code. Thank you.

@hypercubestart (Contributor, Author) commented Sep 8, 2021

> @hypercubestart Hello, I'd like to reproduce your table on T4+int4, but it seems that the current AutoTVM tutorial code (link) cannot use the int4+tensorcore template. It would be appreciated if you could share sample code. Thank you.

Hi @jlimmm! Unfortunately I don't have the code anymore, but the PR has an example of creating a network consisting of a single int4 conv2d:

import tvm
from tvm import relay

# input_shape and kernel_shape were defined elsewhere in the PR's test code;
# the values below are illustrative placeholders (assumed NCHW / OIHW).
input_shape = (8, 512, 7, 7)
kernel_shape = (512, 512, 3, 3)

def get_mod():
    x = relay.var("x", relay.TensorType(input_shape, "float32"))
    y = relay.var("y", relay.TensorType(kernel_shape, "float32"))
    f = relay.Function(
        [x, y], relay.nn.conv2d(x, y, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3])
    )
    mod = tvm.IRModule()
    mod["main"] = f
    mod = relay.transform.InferType()(mod)
    return mod, {}

mod, params = get_mod()

# Convert the conv2d to the HWNC layout targeted by the tensorcore template.
layout_config = relay.transform.LayoutConfig()
desired_layouts = {"nn.conv2d": ["HWNC", "default"]}
with layout_config:
    seq = tvm.transform.Sequential([relay.transform.ConvertLayout(desired_layouts)])
    with tvm.transform.PassContext(opt_level=3):
        mod = seq(mod)

# Recast the network to int4 data with int32 accumulation.
mod = relay.transform.recast(mod, "int4", "int32")
This uses the utilities from #6748, so you could reuse most of the AutoTVM tutorial code and simply replace the network with the one shown above.

AutoTVM will then be able to automatically infer the int4+tensorcore template
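For the tuning loop itself, here is a minimal sketch along the lines of the AutoTVM tutorial; the target, trial budget, and log-file name below are arbitrary choices, not taken from this PR:

import tvm
from tvm import autotvm

target = tvm.target.cuda()  # e.g. a T4

# Extract tunable tasks from the int4 module built above.
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=1000,  # arbitrary tuning budget
        measure_option=autotvm.measure_option(
            builder=autotvm.LocalBuilder(),
            runner=autotvm.LocalRunner(number=5, repeat=3),
        ),
        callbacks=[autotvm.callback.log_to_file("int4_tune.log")],
    )

The best schedules recorded in int4_tune.log can then be applied with autotvm.apply_history_best when building the module.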

@liubowen520

Hi @hypercubestart, great work! I'm working with 4-bit in TVM now, but I found two points that can be improved in asnumpy:

  1. asnumpy doesn't support the conversion of negative numbers.
  2. asnumpy loses 4 bits of data when the shape is odd.

Am I right? I have modified these parts locally. Could you review them?

@hypercubestart

> Hi @hypercubestart, great work! I'm working with 4-bit in TVM now, but I found two points that can be improved in asnumpy:
>
>   1. asnumpy doesn't support the conversion of negative numbers.
>   2. asnumpy loses 4 bits of data when the shape is odd.
>
> Am I right? I have modified these parts locally. Could you review them?

@liubowen520 good points! Makes sense to me; feel free to create a PR and cc me and some other people to review.
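For anyone hitting the same two issues, here is a minimal numpy sketch of what sign-extended int4 unpacking could look like; the function name and the low-nibble-first packing order are assumptions, not the PR's actual asnumpy code:

import numpy as np

def int4_unpack(packed, n):
    # 'packed' is a uint8 array holding two int4 values per byte; 'n' is the
    # logical element count, which may be odd.
    lo = (packed & 0x0F).astype(np.int16)
    hi = ((packed >> 4) & 0x0F).astype(np.int16)
    out = np.empty(packed.size * 2, dtype=np.int16)
    out[0::2] = lo  # assumed low-nibble-first packing order
    out[1::2] = hi
    # Sign-extend: nibbles >= 8 are negative in two's-complement int4.
    out = np.where(out >= 8, out - 16, out)
    # When n is odd, the last nibble of the final byte is padding; truncating
    # to n keeps every real value instead of dropping one.
    return out[:n].astype(np.int8)

For example, int4_unpack(np.array([0xF0], dtype=np.uint8), 2) returns [0, -1], covering both the sign extension and the odd-length truncation.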
