[AutoTVM] [TOPI] Support AutoTVM for int4 tensorcore #7831
Conversation
The code LGTM. But would you like to show some performance results for int4?
Yes, I'm testing some combinations of the removed knobs and will share perf results once they reach parity with the results from #6121.
LGTM. Thanks @hypercubestart!
* initial
* int4 asnumpy
* remove
* random test
* format
* random
* remove unused import
* change dist range
* add fuse_pack in
* random engine
* reformat
* remove import
* add cuda context
* refactor code
@hypercubestart Hello, I'd like to reproduce your table on T4+int4, but it seems that the current AutoTVM tutorial code (link) cannot utilize the int4+tensorcore template. I would appreciate it if you could share sample code. Thank you.
Hi @jlimmm! Unfortunately I don't have the code anymore, but the PR has an example of creating a network consisting of a single int4 conv2d: tvm/tests/python/topi/python/test_topi_conv2d_hwnc_tensorcore.py, lines 149 to 167 in f8b1df4.
AutoTVM will then be able to automatically infer the int4+tensorcore template.
Hi @hypercubestart, great work! I'm working with 4-bit in TVM now, but I found two points that can be improved in asnumpy.
@liubowen520 Good points! Makes sense to me; feel free to create a PR and cc me and some other people to review.
Adds support for int4 in AutoTVM and fixes bugs; done with @ZihengJiang.
The current schedule for conv2d hwnc tensorcore is unsearchable by AutoTVM because of the error
ValueError: could not broadcast input array from shape (1850) into shape (1748)
due to a feature length mismatch between different instantiated templates. Narrowing the search space fixes the problem, and we ran a few experiments over different schedule fixes on T4:
The results for int4 HWNC in #6121 are not reproducible in AutoTVM because of the feature length mismatch.
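The error class itself can be reproduced in isolation with plain NumPy (a minimal sketch: the array lengths below are just the feature-vector sizes from the traceback; the actual mismatch comes from different template instantiations emitting different numbers of features when AutoTVM's cost model batches them into one matrix):

```python
import numpy as np

# Hypothetical stand-ins for two feature vectors extracted from two
# instantiations of the same template. Their lengths differ, so they
# cannot be stacked into one fixed-width feature matrix.
features_a = np.zeros(1850)  # feature vector from one config
batch_row = np.empty(1748)   # row sized for a config with fewer features

try:
    # NumPy refuses to broadcast a length-1850 array into a length-1748 slot,
    # which is the same ValueError AutoTVM's cost model hits.
    batch_row[:] = features_a
except ValueError as err:
    print(err)
```

Narrowing the template's search space so every instantiation produces the same number of features removes the mismatch, which is why removing the offending knobs makes the schedule tunable again.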
cc: @Laurawly @Hzfengsy @anijain2305 @tqchen @masahi