Describe the issue
Hi,
I have a quick question about LLM inference on CPUs with this extension.
I've been digging into the LLM inference path, and it seems that the kernels written in C++ do not use AMX (AVX-512 is the only instruction set I can see them using). For example, _IPEXlinearReluCPU calls the C++ op torch.ops.torch_ipex.tpp_linear_relu, which doesn't appear to run on AMX.
Do any LLM layers run on AMX, and if so, which C++ code implements them?
Thank you.