LLVM 12's SLP Vectorizer introduces unhandled llvm.masked.gather calls #1139

Open
maleadt opened this issue Aug 3, 2021 · 6 comments

@maleadt

maleadt commented Aug 3, 2021

My front-end is emitting the following IR, which works fine:

target triple = "spir64-unknown-unknown"

declare void @a([2 x i64]*)

declare {}* @b()

define void @c([3 x i64]* %.7) {
  %.1 = alloca [2 x i64], align 8
  call void @a([2 x i64]* %.1)
  %.4 = getelementptr [2 x i64], [2 x i64]* %.1, i32 0, i32 0
  br label %d.f

g.f:                                              ; preds = %d.f
  %.5 = load i64, i64* %.4, align 4
  %.6 = sub i64 %.5, 0
  br label %e.f

i.f:                                              ; preds = %e.f
  %.8 = getelementptr [3 x i64], [3 x i64]* %.7, i32 0, i32 0
  %.9 = load i64, i64* %.8, align 4
  %.10 = icmp slt i64 %.9, 0
  %.11 = xor i1 %.10, true
  %.12 = load i64, i64* %.8, align 4
  %.13 = select i1 %.11, i64 %.12, i64 0
  %.14 = getelementptr [3 x i64], [3 x i64]* %.7, i32 0, i32 2
  %.15 = load i64, i64* %.14, align 4
  %.16 = icmp slt i64 %.15, 0
  %.17 = xor i1 %.16, true
  %.18 = load i64, i64* %.14, align 4
  %.19 = select i1 %.17, i64 %.18, i64 0
  %.20 = icmp sle i64 1, %.53
  %.21 = icmp sle i64 %.53, %.13
  %.22 = zext i1 %.20 to i8
  %.23 = zext i1 %.21 to i8
  %.24 = and i8 %.22, %.23
  %.25 = trunc i8 %.24 to i1
  %.26 = icmp sle i64 0, %.54
  %.28 = zext i1 %.26 to i8
  %.30 = and i8 %.28, 1
  %.31 = trunc i8 %.30 to i1
  %.32 = load i64, i64* %.4, align 4
  %.33 = icmp sle i64 %.32, %.19
  %.35 = zext i1 %.33 to i8
  %.37 = trunc i8 %.35 to i1
  %.41 = zext i1 %.31 to i8
  %.42 = zext i1 %.37 to i8
  %.43 = and i8 %.41, %.42
  %.44 = trunc i8 %.43 to i1
  %.45 = zext i1 %.25 to i8
  %.46 = zext i1 %.44 to i8
  %.47 = and i8 %.45, %.46
  %.48 = trunc i8 %.47 to i1
  br i1 %.48, label %h.f, label %.

.:                                                ; preds = %i.f
  ret void

h.f:                                              ; preds = %i.f
  %1 = call {}* @b()
  unreachable

d.f:                                              ; preds = %0
  br label %g.f

e.f:                                              ; preds = %g.f
  %.51 = sdiv i64 %.6, 1
  %.52 = sub i64 %.6, 0
  %.53 = add i64 %.52, 0
  %.54 = add i64 %.51, 1
  br label %i.f
}
$ llvm-as unopt.ll -o unopt.bc
$ llvm-spirv --spirv-debug-info-version=ocl-100 unopt.bc

On LLVM 12 the SLP Vectorizer introduces calls to gather intrinsics that the translator does not handle:

$ opt -O3 unopt.bc -o opt.bc
$ llvm-spirv --spirv-debug-info-version=ocl-100 opt.bc
InvalidFunctionCall: Unexpected llvm intrinsic:
 llvm.masked.gather.v2i64.v2p0i64 [Src: ../lib/SPIRV/SPIRVWriter.cpp:2755  ]
declare void @a([2 x i64]*) local_unnamed_addr

declare {}* @b() local_unnamed_addr

define void @c([3 x i64]* nocapture readonly %.7) local_unnamed_addr {
d.f:
  %.1 = alloca [2 x i64], align 8
  call void @a([2 x i64]* nonnull %.1)
  %.4 = getelementptr inbounds [2 x i64], [2 x i64]* %.1, i64 0, i64 0
  %.5 = load i64, i64* %.4, align 8
  %.54 = add i64 %.5, 1
  %.8 = getelementptr [3 x i64], [3 x i64]* %.7, i64 0, i64 0
  %.14 = getelementptr [3 x i64], [3 x i64]* %.7, i64 0, i64 2
  %0 = insertelement <2 x i64*> poison, i64* %.8, i32 0
  %1 = insertelement <2 x i64*> %0, i64* %.14, i32 1
  %2 = call <2 x i64> @llvm.masked.gather.v2i64.v2p0i64(<2 x i64*> %1, i32 4, <2 x i1> <i1 true, i1 true>, <2 x i64> undef)
  %3 = icmp sgt <2 x i64> %2, zeroinitializer
  %4 = select <2 x i1> %3, <2 x i64> %2, <2 x i64> zeroinitializer
  %5 = insertelement <2 x i64> poison, i64 %.5, i32 0
  %6 = shufflevector <2 x i64> %5, <2 x i64> undef, <2 x i32> zeroinitializer
  %7 = icmp sle <2 x i64> %6, %4
  %8 = insertelement <2 x i64> %5, i64 %.54, i32 1
  %9 = icmp sgt <2 x i64> %8, <i64 0, i64 -1>
  %10 = and <2 x i1> %9, %7
  %shift = shufflevector <2 x i1> %10, <2 x i1> poison, <2 x i32> <i32 1, i32 undef>
  %11 = and <2 x i1> %10, %shift
  %.473 = extractelement <2 x i1> %11, i32 0
  br i1 %.473, label %h.f, label %.

.:                                                ; preds = %d.f
  ret void

h.f:                                              ; preds = %d.f
  %12 = call {}* @b()
  unreachable
}

; Function Attrs: nofree nosync nounwind readonly willreturn
declare <2 x i64> @llvm.masked.gather.v2i64.v2p0i64(<2 x i64*>, i32 immarg, <2 x i1>, <2 x i64>) #0

attributes #0 = { nofree nosync nounwind readonly willreturn }

Is there any guarantee / expectation that code produced by LLVM's optimization passes is supported by the translator?

@AlexeySachkov
Contributor

Hi @maleadt, thanks for the bug report

> Is there any guarantee / expectation that code produced by LLVM's optimization passes is supported by the translator?

At the moment there is no guarantee that passing optimized IR through the translator will always work. From the OpenCL Guide:

> Note: Converting IR produced with optimization levels other than -O0 is only available as an experimental feature and it is not guaranteed to work. In the majority of cases, the conversion is expected to succeed when optimizations are enabled. Developers are encouraged to file a bug report when issues are encountered. As a workaround when encountering an issue in translating modules obtained with optimizations, generate LLVM IR with optimizations disabled, and then use the stand-alone spirv-opt tool to optimize at the SPIR-V level.
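
The workaround quoted above can be sketched as a small Python driver. This is only an illustration, not something taken from the guide: the file names are hypothetical, and it assumes `llvm-as`, `llvm-spirv` and `spirv-opt` are available on `PATH`. `-O` is spirv-opt's general performance-optimization preset.

```python
import subprocess
from pathlib import Path


def workaround_commands(ll_file):
    """Build the command lines for: unoptimized IR -> SPIR-V -> spirv-opt.

    The output file names are derived from the input name and are
    hypothetical; adjust them to your own build layout.
    """
    src = Path(ll_file)
    bc = src.with_suffix(".bc")
    spv = src.with_suffix(".spv")
    opt_spv = src.with_name(src.stem + ".opt.spv")
    return [
        # Assemble the unoptimized textual IR into bitcode.
        ["llvm-as", str(src), "-o", str(bc)],
        # Translate the unoptimized bitcode to SPIR-V.
        ["llvm-spirv", str(bc), "-o", str(spv)],
        # Optimize at the SPIR-V level instead of the LLVM level.
        ["spirv-opt", "-O", str(spv), "-o", str(opt_spv)],
    ]


def run_workaround(ll_file):
    """Run each step in order, raising on the first failing tool."""
    for cmd in workaround_commands(ll_file):
        subprocess.run(cmd, check=True)
```

Calling `run_workaround("unopt.ll")` would then run the three tools in sequence; `workaround_commands` can be used on its own to inspect the command lines without executing anything.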

If I understand correctly, that is the case because historically the translator was developed on the assumption that it would operate on non-optimized LLVM IR, so no one thought about all those intrinsics. However, we have been using it on optimized LLVM IR for SYCL in the intel/llvm GitHub repo for at least half a year already and have made some improvements in that area.

In general, this direction is being tracked/discussed in #203 and there are known issues like #481 or #645.

For you, there are several ways to proceed:

  1. There is a flag that allows you to translate unknown LLVM intrinsics as if they were regular function calls:
    static cl::list<std::string> SPIRVAllowUnknownIntrinsics(
        "spirv-allow-unknown-intrinsics", cl::CommaSeparated,
        cl::desc("Unknown intrinsics that begin with any prefix from the "
                 "comma-separated input list will be translated as external "
                 "function calls in SPIR-V.\nLeaving any prefix unspecified "
                 "(default) would naturally allow all unknown intrinsics"),
        cl::value_desc("intrinsic_prefix_1,intrinsic_prefix_2"), cl::ValueOptional);

    This has some downsides: your SPIR-V consumer might not properly translate the calls back to LLVM intrinsics if you are not using the translator to consume SPIR-V. Another (more ideological) issue is that by turning an intrinsic into a function call you lose its semantics within SPIR-V: there is no capability attached to it, and there is no way to check whether your client supports that intrinsic using the mechanisms described in the SPIR-V spec.
  2. If possible, you can prepare a pass (or something like that) to lower the intrinsic into another construct, which can in turn be translated into SPIR-V. Examples include "Add pass to lower Bitcast to nonstandard type instructions" #1117, "Implement support for dynamic memmove" #1060, "Translate the llvm.fshr intrinsic function" #985, "Support llvm.is.constant" #672 and many other PRs.
  3. If possible, you can extend the translator to lower the intrinsic into an existing SPIR-V construct that has the same semantics. Examples include "Add possibility to lower llvm.fmuladd into mad from OpenCL extinst" #824, "Translate llvm.maxnum intrinsic function" #674, "Add llvm.ctpop* intrinsic translation" #775 and many others.
  4. If you get this far, the only option left is to define your own SPIR-V extension that can represent the semantics you want, publish it, and implement it here.
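
The prefix matching described in the option text of item 1 can be modelled in a few lines of Python. This is a sketch of my reading of that description, not the translator's actual code; the function names are hypothetical, and treating "flag not passed" as `None` is an assumption made for the illustration.

```python
def parse_flag_value(value):
    """Split the comma-separated flag value into a list of prefixes."""
    return [p for p in value.split(",") if p]


def translated_as_external_call(intrinsic, prefixes):
    """Decide whether an unknown intrinsic would be kept as a plain call.

    prefixes is None -> flag not passed: unknown intrinsics are rejected.
    prefixes is []   -> flag passed without a value: all are allowed.
    otherwise        -> allowed if the name starts with any listed prefix.
    """
    if prefixes is None:
        return False
    if not prefixes:
        return True
    return any(intrinsic.startswith(p) for p in prefixes)


name = "llvm.masked.gather.v2i64.v2p0i64"
allowed = translated_as_external_call(name, parse_flag_value("llvm.masked."))
```

Under this model, a prefix such as `llvm.masked.` would cover the gather call from this issue while still rejecting unrelated unknown intrinsics.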

@maleadt
Author

maleadt commented Aug 3, 2021

Thanks for the details. I'm targeting oneAPI GPUs (i.e. using Intel's IGC/compute-runtime); any concrete suggestions on how to represent these vector intrinsics?

If not, I think I'll try to emit unoptimized IR and use spirv-opt.

@AlexeySachkov
Contributor

> I'm targeting oneAPI GPUs (i.e. using Intel's IGC/compute-runtime); any concrete suggestions on how to represent these vector intrinsics?

Let's ask @PawelJurek and @aratajew here for their input. If @llvm.masked.gather.* is supported on their side, then the easiest way for you would be to use --spirv-allow-unknown-intrinsics, at least in the short to medium term, to discover other issues and have a proof of concept. Longer term, re-using existing SPIR-V capabilities or creating a proper new SPIR-V extension is preferable.

@PawelJurek
Contributor

@maleadt, @AlexeySachkov:

The @llvm.masked.* intrinsics are currently not supported in IGC, and no currently supported frontend produces them, so I guess the best way to go in the short term is to lower/expand these intrinsics on your side.

@Fznamznon
Contributor

To handle (work around) masked memory intrinsics (llvm.masked.gather*, llvm.masked.scatter*, llvm.masked.load*, llvm.masked.store*) we could re-use the ScalarizeMaskedMemIntrin pass: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp . When I last tried it, I ran into types that the translator does not support, but I think that is a bug in the translator rather than a problem with the pass (see #481 for example).
Unfortunately, I only tried to lower llvm.masked.gather, so there might be more problems.
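
For reference, the per-lane behaviour that scalarizing a masked gather has to preserve can be modelled in Python. This is a toy model of the intrinsic's semantics, not the output of ScalarizeMaskedMemIntrin; the memory dictionary, base address and values are all made up for illustration.

```python
def masked_gather(memory, ptrs, mask, passthru):
    """Reference semantics of llvm.masked.gather, one lane at a time.

    Enabled lanes load from their own pointer; disabled lanes take the
    pass-through value and never touch memory.
    """
    result = []
    for ptr, enabled, fallback in zip(ptrs, mask, passthru):
        result.append(memory[ptr] if enabled else fallback)
    return result


# Toy memory holding a [3 x i64] at a hypothetical base address.
BASE = 0x1000
memory = {BASE + 8 * i: value for i, value in enumerate([7, -3, 42])}

# As in the optimized IR above: the lanes point at elements 0 and 2 and the
# mask is all true, so the gather is equivalent to two plain scalar loads.
ptrs = [BASE, BASE + 16]
gathered = masked_gather(memory, ptrs, [True, True], [None, None])
```

Because the mask in this issue is the constant `<i1 true, i1 true>`, the scalarized form degenerates to the two original `load i64` instructions followed by `insertelement`, which is why expanding the intrinsic before translation is a plausible short-term route.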

@MrSidims
Contributor

There is a new extension, intel/llvm#6613, which was recently implemented in the translator in #1580. I believe it should help, though it is not yet supported on Intel GPUs.
