Is AMX supported for LLM inference? #517
Comments
If you use the bfloat16 or int8 data type, PyTorch and IPEX will use AMX.
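To make the reply above concrete, here is a minimal, hedged sketch of the usual way to enable the bfloat16 path with IPEX (ipex.optimize with dtype=torch.bfloat16 is the documented API; the tiny Linear model is just a placeholder, and on a CPU without AMX the same code still runs via AVX-512/AVX2 kernels):

```python
# Sketch: enabling the bfloat16 inference path in IPEX.
# Guarded import so the snippet degrades gracefully where torch/ipex
# are not installed; the else-branch is the actual IPEX usage.
try:
    import torch
    import intel_extension_for_pytorch as ipex
    HAVE_IPEX = True
except ImportError:
    HAVE_IPEX = False

if HAVE_IPEX:
    # Placeholder model; any eval-mode nn.Module works the same way.
    model = torch.nn.Linear(64, 64).eval()
    # ipex.optimize prepares the model for the bf16 kernel paths.
    model = ipex.optimize(model, dtype=torch.bfloat16)
    with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
        out = model(torch.randn(8, 64))
    print("output dtype:", out.dtype)
else:
    print("torch/ipex not available; sketch only")
```

Whether AMX is actually used underneath still depends on the CPU generation, as the next reply notes.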
Thanks for your response. If possible, could you please point out which C++ kernel code implements GEMM on AMX?
Please note that you will need a 4th-generation Xeon or later to take advantage of AMX. The TPP kernel you refer to invokes the micro-kernels in libxsmm, which leverage AMX on CPU platforms that have AMX hardware support. See
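One quick way to check whether a given machine actually exposes AMX is to look for the amx_* CPU flags that the Linux kernel reports for 4th-generation Xeon (Sapphire Rapids) and later. A small stdlib-only helper (not part of IPEX; detect_amx_flags is a hypothetical name):

```python
# Sketch: detect AMX support from /proc/cpuinfo (Linux only).
# The flags amx_tile, amx_bf16 and amx_int8 correspond to the three
# AMX feature bits a Sapphire Rapids or later CPU exposes.
def detect_amx_flags(path="/proc/cpuinfo"):
    wanted = ("amx_tile", "amx_bf16", "amx_int8")
    try:
        with open(path) as f:
            text = f.read()
    except OSError:  # non-Linux or restricted environment
        return []
    return [w for w in wanted if w in text]

found = detect_amx_flags()
print("AMX flags:", found if found else "none detected")
```

If the list is empty, the bfloat16/int8 paths will still run, just on AVX-512 (or older) kernels instead of AMX.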
@jgong5 IPEX calls the oneDNN kernels, doesn't it?
Not always. We have multiple choices of kernels for GEMMs: some are implemented with oneDNN, others with TPP/intrinsics kernels.
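For the oneDNN-backed kernels, oneDNN's verbose mode prints which implementation each primitive dispatched to; on AMX machines the kernel names typically contain "amx" (e.g. avx512_core_amx_bf16). A hedged sketch — ONEDNN_VERBOSE is a real oneDNN knob, but it must be set before the library initializes, i.e. before importing torch, and only operations routed through oneDNN produce verbose lines:

```python
# Sketch: use oneDNN verbose mode to see which GEMM kernel runs.
import os
os.environ["ONEDNN_VERBOSE"] = "1"  # must be set before torch loads oneDNN

try:
    import torch
    HAVE_TORCH = True
except ImportError:
    HAVE_TORCH = False

if HAVE_TORCH:
    a = torch.randn(128, 128, dtype=torch.bfloat16)
    b = torch.randn(128, 128, dtype=torch.bfloat16)
    # If oneDNN handles this matmul, lines starting with "onednn_verbose"
    # appear on stdout, naming the dispatched ISA (look for "amx").
    c = a @ b
    print("result shape:", tuple(c.shape))
else:
    print("torch not available; sketch only")
```

TPP/intrinsics kernels do not go through oneDNN, so they will not show up in this output.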
@Hyungyo1
@NeoZhangJianyu |
Describe the issue
Hi,
I have a quick question regarding the LLM inference on CPUs using this extension.
I've been digging into the LLM inference case, and it seems that the C++ kernels do not run on AMX (AVX-512 is the only ISA I see). For example, _IPEXlinearReluCPU calls the torch.ops.torch_ipex.tpp_linear_relu C++ code, which doesn't seem to run on AMX.
Is there any LLM layer that runs on AMX, and if so, which C++ code implements it?
Thank you.