
[AutoTVM] [TOPI] Support AutoTVM for int4 tensorcore #7831

Merged 14 commits from hypercubestart:tc-fix into apache:main on May 1, 2021

Conversation

@hypercubestart (Contributor) commented Apr 12, 2021

This PR adds support for int4 in AutoTVM and fixes bugs; done together with @ZihengJiang.

The current schedule for conv2d HWNC tensorcore is unsearchable by AutoTVM because of the error `ValueError: could not broadcast input array from shape (1850) into shape (1748)`, which is caused by a feature-length mismatch between different instantiated templates.

Narrowing the search space fixes the problem (a sketch of the narrowing pattern follows the table below), and we ran a few experiments with different schedule fixes on a T4:

| Workload (batch_size, in_channels, in_size, out_channels, kernel_size, stride, padding) | HWNC int4 time (#6121) | AS-ko, WS-kw | AS-ki, WS-kw | AS-kh, WS-ko | AS-ko, WS-kh |
|---|---|---|---|---|---|
| (8, 64, 56, 64, 3, 1, 1) | 0.1723 | 0.17988 | 0.19138 | 0.18075 | 0.18399 |
| (8, 64, 56, 128, 3, 2, 1) | 0.10278 | 0.10783 | 0.11104 | 0.13839 | 0.10446 |
| (8, 64, 56, 64, 1, 2, 0) | 0.0333 | 0.0187 | 0.01997 | 0.01933 | 0.0183 |
| (8, 128, 28, 128, 3, 1, 1) | 0.15088 | 0.1784 | 0.2296 | 0.21108 | 0.20623 |
| (8, 128, 28, 256, 3, 2, 1) | 0.11548 | 0.11616 | 0.1305 | 0.12947 | 0.15934 |
| (8, 128, 28, 256, 1, 2, 0) | 0.04219 | 0.02374 | 0.02575 | 0.02332 | 0.0223 |
| (8, 256, 14, 256, 3, 1, 1) | 0.05695 | 0.21981 | 0.24931 | 0.24194 | 0.27055 |
| (8, 256, 14, 512, 3, 2, 1) | 0.14456 | 0.14939 | 0.15589 | 0.14812 | 0.20209 |
| (8, 256, 14, 512, 1, 2, 0) | 0.0475 | 0.02659 | 0.02778 | 0.02531 | 0.02641 |
| (8, 512, 7, 512, 3, 1, 1) | 0.147156 | 0.245 | 0.27005 | 0.25863 | 0.25255 |

The results for int4 HWNC in #6121 are not reproducible in AutoTVM because of the feature-length mismatch.
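For context, the AS/WS column labels appear to denote the reduction axis at which the shared-memory cache stages for activations (AS) and weights (WS) are attached. As an illustration of the narrowing pattern referenced above, here is a minimal, hypothetical AutoTVM template on a simple matmul; the task name and workload are stand-ins, and the actual fix lives in the conv2d HWNC tensorcore schedule. The point is that the cache stage's attach axis is pinned rather than exposed as a knob, so every instantiated template yields feature vectors of the same length.

import tvm
from tvm import autotvm, te

@autotvm.template("example/pinned_attach_axis")  # hypothetical task name
def matmul_template(N, K, M):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

    s = te.create_schedule(C.op)
    CC = s.cache_write(C, "local")

    cfg = autotvm.get_config()
    i, j = s[C].op.axis
    # The tiling splits remain tunable knobs...
    cfg.define_split("tile_i", i, num_outputs=2)
    cfg.define_split("tile_j", j, num_outputs=2)
    io, ii = cfg["tile_i"].apply(s, C, i)
    jo, ji = cfg["tile_j"].apply(s, C, j)
    s[C].reorder(io, jo, ii, ji)

    # ...but the cache stage's attach axis is pinned instead of being a
    # define_knob choice, so the feature length is identical across the space.
    s[CC].compute_at(s[C], jo)
    return s, [A, B, C]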

cc: @Laurawly @Hzfengsy @anijain2305 @tqchen @masahi

@Hzfengsy (Member) left a comment


The code LGTM. But would you like to show some performance results for int4?

@hypercubestart (Contributor, Author) commented Apr 14, 2021

> The code LGTM. But would you like to show some performance results for int4?

Yes, I'm testing some combinations of the removed knobs and will post perf results once they reach parity with the results from #6121.

@ZihengJiang (Contributor)

LGTM. Thanks @hypercubestart!

@ZihengJiang ZihengJiang merged commit dc1f189 into apache:main May 1, 2021
@hypercubestart hypercubestart deleted the tc-fix branch May 2, 2021 00:21
umangyadav pushed a commit to umangyadav/tvm that referenced this pull request May 5, 2021

* initial
* int4 asnumpy
* remove
* random test
* format
* random
* remove unused import
* change dist range
* add fuse_pack in
* random engine
* reformat
* remove import
* add cuda context
* refactor code
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021

trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021
@jlimmm commented Sep 8, 2021

@hypercubestart Hello, I'd like to reproduce your table on T4+int4, but it seems that the current AutoTVM tutorial code (link) cannot use the int4+tensorcore template. It would be appreciated if you could share sample code. Thank you.

@hypercubestart (Contributor, Author) commented Sep 8, 2021

> @hypercubestart Hello, I'd like to reproduce your table on T4+int4, but it seems that the current AutoTVM tutorial code (link) cannot use the int4+tensorcore template. It would be appreciated if you could share sample code. Thank you.

Hi @jlimmm! Unfortunately I don't have the code anymore, but the PR has an example of creating a network consisting of a single int4 conv2d:

import tvm
from tvm import relay

# input_shape and kernel_shape were defined elsewhere in the PR's test code;
# the values below are illustrative placeholders (assumed NCHW / OIHW).
input_shape = (8, 512, 7, 7)
kernel_shape = (512, 512, 3, 3)

def get_mod():
    x = relay.var("x", relay.TensorType(input_shape, "float32"))
    y = relay.var("y", relay.TensorType(kernel_shape, "float32"))
    f = relay.Function(
        [x, y], relay.nn.conv2d(x, y, padding=[1, 1, 1, 1], channels=512, kernel_size=[3, 3])
    )
    mod = tvm.IRModule()
    mod["main"] = f
    mod = relay.transform.InferType()(mod)
    return mod, {}

mod, params = get_mod()

# Convert the conv2d to the HWNC layout targeted by the tensorcore template.
layout_config = relay.transform.LayoutConfig()
desired_layouts = {"nn.conv2d": ["HWNC", "default"]}
with layout_config:
    seq = tvm.transform.Sequential([relay.transform.ConvertLayout(desired_layouts)])
    with tvm.transform.PassContext(opt_level=3):
        mod = seq(mod)

# Recast the network to int4 data with int32 accumulation.
mod = relay.transform.recast(mod, "int4", "int32")
This uses the utilities from #6748, so you could reuse most of the AutoTVM tutorial code and simply replace the network with the one shown above.

AutoTVM will then be able to automatically infer the int4+tensorcore template
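For the tuning loop itself, here is a minimal sketch along the lines of the AutoTVM tutorial; the target, trial budget, and log-file name below are arbitrary choices, not taken from this PR:

import tvm
from tvm import autotvm

target = tvm.target.cuda()  # e.g. a T4

# Extract tunable tasks from the int4 module built above.
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=1000,  # arbitrary tuning budget
        measure_option=autotvm.measure_option(
            builder=autotvm.LocalBuilder(),
            runner=autotvm.LocalRunner(number=5, repeat=3),
        ),
        callbacks=[autotvm.callback.log_to_file("int4_tune.log")],
    )

The best schedules recorded in int4_tune.log can then be applied with autotvm.apply_history_best when building the module.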

@liubowen520

Hi @hypercubestart, great work! I'm working with 4-bit in TVM now, but I found two points that can be improved in asnumpy:

  1. asnumpy doesn't support the conversion of negative numbers.
  2. asnumpy loses 4 bits of data when the shape is odd.

Am I right? I have modified these parts locally. Could you review them?

@hypercubestart

> Hi @hypercubestart, great work! I'm working with 4-bit in TVM now, but I found two points that can be improved in asnumpy:
>
>   1. asnumpy doesn't support the conversion of negative numbers.
>   2. asnumpy loses 4 bits of data when the shape is odd.
>
> Am I right? I have modified these parts locally. Could you review them?

@liubowen520 good points! Makes sense to me; feel free to create a PR and cc me and some other people to review.
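For anyone hitting the same two issues, here is a minimal numpy sketch of what sign-extended int4 unpacking could look like; the function name and the low-nibble-first packing order are assumptions, not the PR's actual asnumpy code:

import numpy as np

def int4_unpack(packed, n):
    # 'packed' is a uint8 array holding two int4 values per byte; 'n' is the
    # logical element count, which may be odd.
    lo = (packed & 0x0F).astype(np.int16)
    hi = ((packed >> 4) & 0x0F).astype(np.int16)
    out = np.empty(packed.size * 2, dtype=np.int16)
    out[0::2] = lo  # assumed low-nibble-first packing order
    out[1::2] = hi
    # Sign-extend: nibbles >= 8 are negative in two's-complement int4.
    out = np.where(out >= 8, out - 16, out)
    # When n is odd, the last nibble of the final byte is padding; truncating
    # to n keeps every real value instead of dropping one.
    return out[:n].astype(np.int8)

For example, int4_unpack(np.array([0xF0], dtype=np.uint8), 2) returns [0, -1], covering both the sign extension and the odd-length truncation.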
