LLVM 12's SLP Vectorizer introduces unhandled llvm.masked.gather calls #1139
Hi @maleadt, thanks for the bug report.
At the moment we have no guarantee that passing optimized IR through the translator will always work. From the OpenCL Guide:
If I understand correctly, that is because the translator was historically developed on the assumption that it would operate on non-optimized LLVM IR, so no one thought about all those intrinsics. However, we have been using it on optimized LLVM IR for SYCL in the intel/llvm GitHub repo for at least half a year already and have made some improvements in that area. In general, this direction is being tracked and discussed in #203, and there are known issues such as #481 and #645. For you, there are several ways to proceed:
Thanks for the details. I'm targeting oneAPI GPUs (i.e. using Intel's IGC/compute-runtime); any concrete suggestions on how to represent these vector intrinsics? If not, I think I'll try to emit unoptimized IR and use
Let's ask @PawelJurek and @aratajew here for their input. if
To handle (work around) the masked memory intrinsics (llvm.masked.gather*, llvm.masked.scatter*, llvm.masked.load*, llvm.masked.store*), we could reuse the ScalarizeMaskedMemIntrin pass: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp . When I last tried it, though, I ran into types that the translator does not support. I think that is still a bug in the translator rather than a problem with the pass (see #481 for example).
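For readers unfamiliar with what scalarizing a masked gather means, here is a minimal C++ sketch of the semantics only. It is not code from the LLVM pass or from the translator: the function name `masked_gather_scalarized` and the use of `std::array` to stand in for IR vectors are assumptions made for illustration. The idea is that each lane whose mask bit is set performs an ordinary scalar load through its own pointer, while inactive lanes keep the pass-through value, so no gather intrinsic remains.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration of the per-lane semantics of a masked gather
// after scalarization. The real pass rewrites LLVM IR; this only models
// the resulting behaviour with plain C++ values.
template <typename T, std::size_t N>
std::array<T, N> masked_gather_scalarized(const std::array<const T*, N>& ptrs,
                                          const std::array<bool, N>& mask,
                                          const std::array<T, N>& passthru) {
    // Start from the pass-through vector: inactive lanes keep these values.
    std::array<T, N> result = passthru;
    for (std::size_t lane = 0; lane < N; ++lane) {
        if (mask[lane]) {
            // Only active lanes touch memory, so an inactive lane may hold
            // an invalid pointer without causing a fault.
            assert(ptrs[lane] != nullptr);
            result[lane] = *ptrs[lane];
        }
    }
    return result;
}
```

Calling this with four pointers and the mask `{true, false, true, false}` loads lanes 0 and 2 and leaves lanes 1 and 3 at their pass-through values. Expressed this way the operation needs only scalar loads and conditional control flow.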
There is a new extension, intel/llvm#6613, which was recently implemented in the translator in #1580. I believe it should help, though it is not yet supported by Intel GPUs.
My front-end is emitting the following IR, which works fine:
On LLVM 12 the SLP Vectorizer introduces calls to gather intrinsics that the translator does not handle:
Is there any guarantee / expectation that code produced by LLVM's optimization passes is supported by the translator?