[Tensorize][TOPI] Add AMX Tensorizing for int8 batch matmul #13745
Conversation
python/tvm/topi/x86/batch_matmul.py
layout_trans = op.input_tensors[1]
batch_matmul_vnni_schedule(cfg, s, op.output(0), outs[0], layout_trans)
print(layout_trans)
Remove this debug `print`.
Done
lib = relay.build(mod, target=target)

asm = lib.lib.get_source("asm")
assert "vpdpbusd" in asm
This looks like a VNNI intrinsic, not an AMX one.
@masahi That was just for a local test and I forgot to change it back; it is fixed now.
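The point of the review comment above is that `vpdpbusd` is the AVX512-VNNI dot-product instruction, so asserting on it does not prove the AMX path was taken. A minimal sketch of what a corrected check could look like, as a plain helper over the assembly string (the helper name is hypothetical, and treating `tdpbusd`, the AMX-INT8 tile dot-product instruction, as the marker is an assumption about what the fixed test checks):

```python
# Sketch: verify the generated assembly uses AMX tile instructions
# rather than the VNNI intrinsic. "tdpbusd" is the AMX-INT8 tile
# dot-product instruction; "vpdpbusd" is its AVX512-VNNI counterpart.
def check_amx_tensorized(asm: str) -> bool:
    # The AMX schedule should emit tile ops and no VNNI dot products.
    return "tdpbusd" in asm and "vpdpbusd" not in asm

sample_asm = "tilezero %tmm0\ntdpbusd %tmm2, %tmm1, %tmm0\ntilestored ..."
print(check_amx_tensorized(sample_asm))  # True
```

In the actual test, `asm` would come from `lib.lib.get_source("asm")` as in the snippet above.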
…3745)
* amx int8 tensorized x86 bmm
* remove the unused amx schedule
* fix lint
* fix lint
* remove unused import
* fix Instr. assert in testcase
This PR tensorizes x86 int8 batch matmul, building on the previous AMX PR #13642. It unifies the VNNI and AMX batch matmul compute into `batch_matmul_int8`, then chooses the SIMD extension (VNNI or AMX) based on the target to perform the tensorization (basically the same approach as x86 `dense_int8`).
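The dispatch described above can be sketched in plain Python (this is an illustration, not TVM's actual implementation; the function name and the CPU feature tables are assumptions chosen for the example):

```python
# Illustrative sketch of a unified int8 schedule picking its SIMD path
# from the target CPU, mirroring how batch_matmul_int8 would select
# VNNI or AMX tensorization.

# Hypothetical feature tables for a few -mcpu values (assumed here):
AMX_CPUS = {"sapphirerapids"}
VNNI_CPUS = {"cascadelake", "icelake-server", "sapphirerapids"}

def pick_int8_simd(mcpu: str) -> str:
    """Return which int8 SIMD extension to tensorize with."""
    if mcpu in AMX_CPUS:
        return "amx"      # AMX tile instructions (e.g. tdpbusd)
    if mcpu in VNNI_CPUS:
        return "vnni"     # AVX512-VNNI (e.g. vpdpbusd)
    return "generic"      # fall back to a plain schedule

print(pick_int8_simd("sapphirerapids"))  # amx
print(pick_int8_simd("cascadelake"))     # vnni
```

Checking AMX first matters: a CPU such as Sapphire Rapids supports both extensions, and the wider AMX tiles are preferred when available.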