functionalTensor is not supported by ipex custom kernel when using Torch.compile after ipex.llm.optimize #760
Comments
@ZhaoqiongZ, any update?
Hi @EikanWang, I have passed the issue to Su, Tong, since it is closely tied to the torch.compile feature.
Hi @lostkingdom4, thanks for the detailed reproducer!

```python
@torch.library.register_fake("torch_ipex::xetla_sdp_dropout")
def xetla_sdp_dropout(query, key, value, attn_mask, dropout_p, is_causal, scale):
    print("run into fake")
    assert False
```

You will find that this still does not throw the `AssertionError`. After further investigation, I found that when registering the custom op, it has the dispatch key of CompositeImplicitAutograd (one way to inspect the registered keys is sketched below). This makes the schema unable to be found correctly. A quick and temporary solution is to change the dispatch key. This solution is not perfect, just a temporary one, but normally it won't affect performance/accuracy much. We will try to fix that later. Thanks again for your patience!
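For reference, one way to check which dispatch keys an op was actually registered with is to dump its dispatch table. A minimal sketch using a private PyTorch helper (the output format varies across versions):

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  # registers the torch_ipex ops

# Dump the dispatch table for the custom op; the listing shows which
# keys (e.g. XPU, AutocastXPU, CompositeImplicitAutograd) have kernels.
print(torch._C._dispatch_dump("torch_ipex::xetla_sdp_dropout"))
```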
Hi @Stonepia, thanks for the feedback. It works for me!
Hi @Stonepia, I was trying to do the same thing for `torch.ops.torch_ipex.mm_qkv_out(input, self.weight, self.bias, q, k, v)`:

```python
@torch.library.register_fake("torch_ipex::mm_qkv_out.xpu")
def _(query, key, value, attn_mask, dropout_p, is_causal, scale):
    print("run into fake")
    assert False
```

I got:

```
RuntimeError: register_fake(...): the operator torch_ipex::mm_qkv_out.xpu already has an implementation for this device type via a pre-existing registration to DispatchKey::CompositeImplicitAutograd. CompositeImplicitAutograd operators do not need an fake impl; instead, the operator will decompose into its constituents and those can have fake impls defined on them.
```

I have already modified the dispatch key, but I think this might not be the problem. Is there a quick fix for this type of operator? I went through the C++ source code but I'm still not entirely sure where it is registered as CompositeImplicitAutograd.
I tried to solve the problem by modifying the code in csrc from:

```cpp
IPEX_OP_REGISTER("mm_qkv_out.xpu", at::AtenIpexTypeXPU::mm_qkv_out);
IPEX_OP_REGISTER_DISPATCH(
    "mm_qkv_out.xpu",
    at::AtenIpexTypeXPU::mm_qkv_out_autocast,
    c10::DispatchKey::AutocastXPU);
```

to:

```cpp
// IPEX_OP_REGISTER("mm_qkv_out", at::AtenIpexTypeXPU::mm_qkv_out);
IPEX_OP_REGISTER_DISPATCH(
    "mm_qkv_out",
    at::AtenIpexTypeXPU::mm_qkv_out,
    c10::DispatchKey::XPU);
```

The problem with the fake tensor seems to be solved. However, when running with torch.compile, Dynamo causes a graph break on this operator (one way to locate it is sketched right after this comment). Could you double-check that this will work for operators with a pre-existing registration to DispatchKey::CompositeImplicitAutograd? Meanwhile, what is the difference between IPEX_OP_REGISTER and IPEX_OP_REGISTER_DISPATCH? And why do we want to put `.xpu` at the end of `mm_qkv_out`? I know it is for overloading to XPU, but is this necessary? Thanks
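One way to locate the graph break Dynamo reports is `torch._dynamo.explain`. A minimal sketch with a toy module standing in for the real one (swap in the module that calls `torch.ops.torch_ipex.mm_qkv_out`):

```python
import torch
import torch._dynamo as dynamo

# Toy stand-in; replace with the attention/MLP module under test.
model = torch.nn.Linear(16, 16)
inputs = (torch.randn(2, 16),)

# explain() runs Dynamo once and reports every break and its reason.
explanation = dynamo.explain(model)(*inputs)
print("graph breaks:", explanation.graph_break_count)
for reason in explanation.break_reasons:
    print(reason)
```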
@Stonepia Thanks
Hi @lostkingdom4, apologies that I haven't had the bandwidth for this yet. I will update you once I have some new findings.
@Stonepia I also tried to understand the problem by going through the source code of IPEX_OP_REGISTER, IPEX_OP_REGISTER_DISPATCH, and TORCH_LIBRARY_IMPL. I think the problem is that once the operator is registered using IPEX_OP_REGISTER, the dispatch key is registered as CompositeImplicitAutograd. I've tried to use IPEX_OP_REGISTER_DISPATCH to register it to a specific dispatch key and rebuild, but the build is always unsuccessful.
@Stonepia
Hi, @lostkingdom4
> @Stonepia I also tried to understand the problem by going through the source code of IPEX_OP_REGISTER, IPEX_OP_REGISTER_DISPATCH, and TORCH_LIBRARY_IMPL. I think the problem is that once the operator is registered using IPEX_OP_REGISTER, the dispatch key is registered as CompositeImplicitAutograd. I've tried to use IPEX_OP_REGISTER_DISPATCH to register it to a specific dispatch key and rebuild, but the build is always unsuccessful.

As I basically need to rebuild the entire PyTorch and IPEX every time I try a new registration, it would be really helpful if you could give me some tips on building just the operator without rebuilding the whole stack, so I can iterate on this problem more conveniently. I might have a way to solve the problem, but building IPEX from scratch is killing me.
I don't think you need to rebuild every time. You could first try the Python op registration; it should behave the same as the C++ side, and you don't need to build PyTorch either. I suggest starting from a simpler custom op with Python registration to see if everything goes well, then moving to the harder one (the fused ops in IPEX). A minimal starting point is sketched below.
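As a concrete starting point, here is a minimal sketch of a Python-only custom op with a fake impl, using the `torch.library.custom_op` API (PyTorch 2.4+). The `demo` namespace and op are hypothetical, and the op is device-agnostic here for simplicity; on an XPU build you could restrict it via `device_types="xpu"`:

```python
import torch

# Hypothetical toy op in a throwaway "demo" namespace.
@torch.library.custom_op("demo::scale", mutates_args=())
def scale(x: torch.Tensor, factor: float) -> torch.Tensor:
    return x * factor

# Fake impl: only describes output metadata, never touches real data.
@scale.register_fake
def _(x, factor):
    return torch.empty_like(x)

# Sanity-check the registration, then compile through the op.
torch.library.opcheck(scale, (torch.randn(4), 2.0))

@torch.compile(fullgraph=True)
def f(x):
    return torch.ops.demo.scale(x, 2.0)

print(f(torch.randn(4)))
```

If `fullgraph=True` succeeds here, the registration pattern is sound and the remaining work is porting it to the fused IPEX operator.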
I see what you mean. I've already tried the custom operator registration, and it works. The only thing I'm not so sure about is the headers, for example the ones at the top of intel-extension-for-pytorch/csrc/gpu/aten/operators/XeGemm.cpp (lines 1 to 14 at commit 5b268a5).
But I think I will try to build it within the repository so I don't need to worry about the path. I will get back to you after I have some findings. Again, thanks for the information.
Describe the issue
I was attempting to use torch.compile after applying ipex.llm.optimize to a language model on a Max 1100 GPU. My goal is to improve torch.compile by recognizing the FX graph pattern and directly using custom kernels such as torch_ipex.xetla_sdp_dropout. However, when I tested torch.compile on NewIPEXBertSelfAttention, as shown in the following code, I got the following error:
I followed Adding torch.compile support for an operator to add a FakeTensor kernel for torch.ops.torch_ipex.xetla_sdp_dropout, and used torch.library.opcheck to get the detailed failure reason, as shown below.
The code fails on test_aot_dispatch_static and test_aot_dispatch_dynamic.
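For reference, the opcheck call was along these lines; this is a reconstruction, with placeholder shapes/dtypes and the argument order taken from the fake kernel signature above, so the exact values are assumptions:

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  # registers torch_ipex ops

# Placeholder SDPA-shaped inputs: (batch, heads, seq_len, head_dim).
q = torch.randn(1, 8, 128, 64, device="xpu", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# (query, key, value, attn_mask, dropout_p, is_causal, scale);
# attn_mask=None and scale=1.0 are guesses for this sketch.
torch.library.opcheck(
    torch.ops.torch_ipex.xetla_sdp_dropout,
    (q, k, v, None, 0.0, False, 1.0),
)
```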
I took a close look at the source code and found the problem is in OpOverload: it returns None and causes the failure. It seems the problem is that the FunctionalTensor is not supported even though I have already declared the FakeTensor kernel.
The error is as follows:
Any tips on how to solve this problem would be really helpful!
My system configuration is as follows: