[TOPI] VNNI support for int8 dense #10230
Merged
Conversation
masahi requested review from Laurawly, Huyuwei, kevinthesun, jwfromm, vinx13, yzhliu, mbrookhart and ZihengJiang as code owners on February 14, 2022 at 00:19
masahi requested review from jcf94, jroesch, slyubomirsky, icemelon, MarisaKirisame, zhiics, anijain2305, wweic, junrushao, comaniac, tqchen, areusch and merrymercy as code owners on February 14, 2022 at 00:19
elvin-n approved these changes on Feb 14, 2022
LGTM

Another perf result, this time on a desktop CPU, with TVM showing excellent performance!
junrushao approved these changes on Feb 14, 2022
Thanks! This is just amazing!!
ylc pushed a commit to ylc/tvm that referenced this pull request on Feb 16, 2022
* wip * revert for now * simplify blocking * add bench script * update type rel * refactor tests * end to end compilation working * paralleize outer loop * add shape check * fused schedule first cut * restore original test * black * add vnni check * add relay test * skip on ci * check dtype * lint * make it tunable * minor cleanup
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request on Apr 11, 2022
I started off with the test code in tvm/tests/python/contrib/test_gemm_acc32_vnni.py (line 30 in 720e7b1).
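For readers unfamiliar with VNNI, the sketch below (illustrative only, not code from this PR or from that test file) shows the computation the AVX-512 VNNI dot-product instruction (vpdpbusd) accelerates: each int32 output lane accumulates a 4-element uint8 x int8 dot product, which is why the int8 dense kernel wants the weight pre-packed into a blocked layout. The function name and the exact (N // 16, K // 4, 16, 4) blocking used here are assumptions for illustration.

```python
import numpy as np

# Illustrative NumPy reference for a VNNI-style int8 dense with a packed weight
# layout of (N // 16, K // 4, 16, 4). Names and blocking are for illustration only.
def dense_vnni_reference(data_u8, weight_packed_s8):
    m, k = data_u8.shape
    n_o, k_o, n_i, k_i = weight_packed_s8.shape  # n_i = 16 int32 lanes, k_i = 4 bytes
    out = np.zeros((m, n_o * n_i), dtype="int32")
    for i in range(m):
        for jo in range(n_o):
            for ji in range(n_i):
                acc = 0
                for ko in range(k_o):
                    # One vpdpbusd step: dot product of 4 uint8 values with
                    # 4 int8 values, accumulated into a 32-bit integer.
                    a = data_u8[i, ko * 4 : ko * 4 + 4].astype("int32")
                    b = weight_packed_s8[jo, ko, ji].astype("int32")
                    acc += int(np.dot(a, b))
                out[i, jo * n_i + ji] = acc
    return out

# Sanity check against a plain int32 matmul (K a multiple of 4, N a multiple of 16).
rng = np.random.default_rng(0)
data = rng.integers(0, 255, size=(8, 64), dtype=np.uint8)
weight = rng.integers(-128, 127, size=(32, 64), dtype=np.int8)
packed = weight.reshape(2, 16, 16, 4).transpose(0, 2, 1, 3)  # -> (n_o, k_o, n_i, k_i)
assert np.array_equal(
    dense_vnni_reference(data, packed),
    data.astype("int32") @ weight.astype("int32").T,
)
```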
Moreover, since I rely on alter op layout to enable this op (see tvm/python/tvm/topi/x86/dense_alter_op.py, lines 40 to 49 in 35f6bb1), the AutoTVM task extraction path (see tvm/python/tvm/autotvm/task/relay_integration.py, lines 52 to 54 in 187aeb5) needs to apply AlterOpLayout before extracting tasks. I refuse to add an ugly code path to work around this strange issue like the existing code does (cc @tkonolige).

cc @vinx13 @junrushao1994 @mbrookhart @tkonolige @elvin-n
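To make that concrete, here is a minimal sketch, using standard TVM APIs, of what applying AlterOpLayout ahead of AutoTVM task extraction can look like. The function name and target string are assumptions for illustration; this is not the code path in relay_integration.py referenced above.

```python
import tvm
from tvm import autotvm, relay

# Hedged sketch: run AlterOpLayout so dense is already rewritten to the packed,
# VNNI-friendly form before AutoTVM collects tuning tasks. The target string is
# an assumption; any x86 target with VNNI support would play the same role.
def extract_vnni_tasks(mod, params, target_str="llvm -mcpu=cascadelake"):
    target = tvm.target.Target(target_str)
    seq = tvm.transform.Sequential(
        [relay.transform.InferType(), relay.transform.AlterOpLayout()]
    )
    # AlterOpLayout dispatches on the current target, so enter the target context.
    with target, tvm.transform.PassContext(opt_level=3):
        mod = seq(mod)
    return autotvm.task.extract_from_program(mod, params=params, target=target)
```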
Current perf results (also see more results in #10230 (comment))
Comparison is against FBGEMM using their benchmark executable: https://github.com/pytorch/FBGEMM/blob/main/bench/GEMMsBenchmark.cc

The CPU is a Tiger Lake i7-1195G7 @ 2.90 GHz; all numbers are giga-ops per second (GOPS). I didn't spend much time on perf tuning, but the results look promising. Perf on bigger workloads doesn't look great and might need further investigation.
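As a side note on the units: GOPS figures for a dense/GEMM workload are typically derived from the operation count 2 * M * N * K (one multiply and one add per accumulation) divided by the measured runtime. A tiny sketch, with illustrative names only:

```python
# Illustrative only: convert a measured GEMM runtime into giga-ops per second.
def gops(m, n, k, seconds_per_run):
    # 2 * M * N * K multiply-add operations per run.
    return 2.0 * m * n * k / seconds_per_run / 1e9

# Example: a 1024 x 1024 x 1024 matmul that takes 2 ms is about 1074 GOPS.
print(gops(1024, 1024, 1024, 0.002))
```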
Also, I found that autotvm tuning (there is only one knob) helped single-threaded perf on some workloads, but it didn't help multi-threaded perf at all.
Single thread
4 threads