
Is AMX supported for LLM inference? #517

Closed
Hyungyo1 opened this issue Jan 26, 2024 · 7 comments
Labels: CPU (CPU specific issues), Query

@Hyungyo1

Describe the issue

Hi,
I have a quick question about LLM inference on CPUs using this extension.
I've been digging into the LLM inference path, and it seems that the kernels written in C++ do not run on AMX (AVX-512 is the only ISA I see). For example, _IPEXlinearReluCPU calls the torch.ops.torch_ipex.tpp_linear_relu C++ code, which doesn't seem to run on AMX.
Is there any LLM layer that runs on AMX, and if so, which C++ code implements it?

Thank you.

jingxu10 added the CPU (CPU specific issues) and Query labels on Jan 26, 2024
@jingxu10 (Contributor)

If you use the bfloat16 or int8 data types, PyTorch and IPEX will use AMX.
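
For reference, a minimal sketch of that bf16 path (assuming a recent IPEX release; the two-layer model here is a toy stand-in, not anything from this issue):

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

# Toy stand-in for an LLM layer; nn.Linear lowers to a GEMM.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).eval()

# ipex.optimize converts weights and selects bf16 kernels; on 4th-gen
# Xeon (Sapphire Rapids) and later, these can dispatch to AMX.
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = model(torch.randn(8, 1024))
```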

@Hyungyo1 (Author)

Thanks for your response. If possible, could you point me to the C++ kernel that implements GEMM on AMX?

@jgong5 (Contributor) commented Jan 27, 2024

Please note that you will need a 4th-generation Xeon or newer to take advantage of AMX. The TPP kernel you refer to invokes the micro-kernels in libxsmm, which leverage AMX on CPU platforms that have AMX hardware support. See BrgemmTPP at

(BrgemmTPP<T, T>(BSb, Hk, Hc, Hc, Hk * Hc, C, Hk, K, 1.0, 0, Ncb)));

and its implementation, which calls into libxsmm.
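
A quick way to confirm the hardware prerequisite (a Linux-only sketch, not from this thread; it just checks /proc/cpuinfo for the AMX feature flags):

```python
# Linux-only sketch: check /proc/cpuinfo for the AMX feature flags
# (amx_tile, amx_bf16, amx_int8) exposed on 4th-gen Xeon and later.
def cpu_has_amx() -> bool:
    try:
        with open("/proc/cpuinfo") as f:
            flags = f.read()
    except OSError:
        return False
    return all(flag in flags for flag in ("amx_tile", "amx_bf16", "amx_int8"))

print("AMX available:", cpu_has_amx())
```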

@hezhiqian01
@jgong5 IPEX calls the oneDNN kernels, doesn't it?

@jgong5 (Contributor) commented Sep 3, 2024

> @jgong5 IPEX calls the oneDNN kernels, doesn't it?

Not always. We have multiple choices of kernels for GEMMs, some implemented with oneDNN and others implemented with TPP/intrinsics kernels.
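
One way to observe which family handled a given GEMM (a sketch, assuming Linux and a oneDNN-backed PyTorch build; oneDNN-dispatched kernels print trace lines naming their ISA, while TPP/libxsmm kernels do not appear in this log):

```python
# Sketch: enable oneDNN verbose logging, then run a bf16 GEMM.
# If the op is dispatched to oneDNN, a trace line naming the ISA
# (e.g. avx512_core_amx) is printed; TPP/libxsmm kernels stay silent here.
import os
os.environ["ONEDNN_VERBOSE"] = "1"  # set before torch initializes oneDNN

import torch

a = torch.randn(256, 256).to(torch.bfloat16)
b = torch.randn(256, 256).to(torch.bfloat16)
c = a @ b  # look for "amx" in the verbose output on stdout
```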

@NeoZhangJianyu (Contributor)

@Hyungyo1
Could you provide feedback?
If there are no more questions, we will close this issue.

@Hyungyo1 (Author) commented Sep 6, 2024

@NeoZhangJianyu
Yes, my question is answered. Thank you.
