LLVM 12's SLP Vectorizer introduces unhandled llvm.masked.gather calls #1139

maleadt · 2021-08-03T08:11:45Z

My front-end is emitting the following IR, which works fine:

target triple = "spir64-unknown-unknown"

declare void @a([2 x i64]*)

declare {}* @b()

define void @c([3 x i64]* %.7) {
  %.1 = alloca [2 x i64], align 8
  call void @a([2 x i64]* %.1)
  %.4 = getelementptr [2 x i64], [2 x i64]* %.1, i32 0, i32 0
  br label %d.f

g.f:                                              ; preds = %d.f
  %.5 = load i64, i64* %.4, align 4
  %.6 = sub i64 %.5, 0
  br label %e.f

i.f:                                              ; preds = %e.f
  %.8 = getelementptr [3 x i64], [3 x i64]* %.7, i32 0, i32 0
  %.9 = load i64, i64* %.8, align 4
  %.10 = icmp slt i64 %.9, 0
  %.11 = xor i1 %.10, true
  %.12 = load i64, i64* %.8, align 4
  %.13 = select i1 %.11, i64 %.12, i64 0
  %.14 = getelementptr [3 x i64], [3 x i64]* %.7, i32 0, i32 2
  %.15 = load i64, i64* %.14, align 4
  %.16 = icmp slt i64 %.15, 0
  %.17 = xor i1 %.16, true
  %.18 = load i64, i64* %.14, align 4
  %.19 = select i1 %.17, i64 %.18, i64 0
  %.20 = icmp sle i64 1, %.53
  %.21 = icmp sle i64 %.53, %.13
  %.22 = zext i1 %.20 to i8
  %.23 = zext i1 %.21 to i8
  %.24 = and i8 %.22, %.23
  %.25 = trunc i8 %.24 to i1
  %.26 = icmp sle i64 0, %.54
  %.28 = zext i1 %.26 to i8
  %.30 = and i8 %.28, 1
  %.31 = trunc i8 %.30 to i1
  %.32 = load i64, i64* %.4, align 4
  %.33 = icmp sle i64 %.32, %.19
  %.35 = zext i1 %.33 to i8
  %.37 = trunc i8 %.35 to i1
  %.41 = zext i1 %.31 to i8
  %.42 = zext i1 %.37 to i8
  %.43 = and i8 %.41, %.42
  %.44 = trunc i8 %.43 to i1
  %.45 = zext i1 %.25 to i8
  %.46 = zext i1 %.44 to i8
  %.47 = and i8 %.45, %.46
  %.48 = trunc i8 %.47 to i1
  br i1 %.48, label %h.f, label %.

.:                                                ; preds = %i.f
  ret void

h.f:                                              ; preds = %i.f
  %1 = call {}* @b()
  unreachable

d.f:                                              ; preds = %0
  br label %g.f

e.f:                                              ; preds = %g.f
  %.51 = sdiv i64 %.6, 1
  %.52 = sub i64 %.6, 0
  %.53 = add i64 %.52, 0
  %.54 = add i64 %.51, 1
  br label %i.f
}

$ llvm-as unopt.ll -o unopt.bc
$ llvm-spirv --spirv-debug-info-version=ocl-100 unopt.bc

On LLVM 12, the SLP vectorizer introduces calls to masked gather intrinsics, which the translator does not handle:

$ opt -O3 unopt.bc -o opt.bc
$ llvm-spirv --spirv-debug-info-version=ocl-100 opt.bc
InvalidFunctionCall: Unexpected llvm intrinsic:
 llvm.masked.gather.v2i64.v2p0i64 [Src: ../lib/SPIRV/SPIRVWriter.cpp:2755  ]

declare void @a([2 x i64]*) local_unnamed_addr

declare {}* @b() local_unnamed_addr

define void @c([3 x i64]* nocapture readonly %.7) local_unnamed_addr {
d.f:
  %.1 = alloca [2 x i64], align 8
  call void @a([2 x i64]* nonnull %.1)
  %.4 = getelementptr inbounds [2 x i64], [2 x i64]* %.1, i64 0, i64 0
  %.5 = load i64, i64* %.4, align 8
  %.54 = add i64 %.5, 1
  %.8 = getelementptr [3 x i64], [3 x i64]* %.7, i64 0, i64 0
  %.14 = getelementptr [3 x i64], [3 x i64]* %.7, i64 0, i64 2
  %0 = insertelement <2 x i64*> poison, i64* %.8, i32 0
  %1 = insertelement <2 x i64*> %0, i64* %.14, i32 1
  %2 = call <2 x i64> @llvm.masked.gather.v2i64.v2p0i64(<2 x i64*> %1, i32 4, <2 x i1> <i1 true, i1 true>, <2 x i64> undef)
  %3 = icmp sgt <2 x i64> %2, zeroinitializer
  %4 = select <2 x i1> %3, <2 x i64> %2, <2 x i64> zeroinitializer
  %5 = insertelement <2 x i64> poison, i64 %.5, i32 0
  %6 = shufflevector <2 x i64> %5, <2 x i64> undef, <2 x i32> zeroinitializer
  %7 = icmp sle <2 x i64> %6, %4
  %8 = insertelement <2 x i64> %5, i64 %.54, i32 1
  %9 = icmp sgt <2 x i64> %8, <i64 0, i64 -1>
  %10 = and <2 x i1> %9, %7
  %shift = shufflevector <2 x i1> %10, <2 x i1> poison, <2 x i32> <i32 1, i32 undef>
  %11 = and <2 x i1> %10, %shift
  %.473 = extractelement <2 x i1> %11, i32 0
  br i1 %.473, label %h.f, label %.

.:                                                ; preds = %d.f
  ret void

h.f:                                              ; preds = %d.f
  %12 = call {}* @b()
  unreachable
}

; Function Attrs: nofree nosync nounwind readonly willreturn
declare <2 x i64> @llvm.masked.gather.v2i64.v2p0i64(<2 x i64*>, i32 immarg, <2 x i1>, <2 x i64>) #0

attributes #0 = { nofree nosync nounwind readonly willreturn }
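For reference, the LLVM Language Reference defines `llvm.masked.gather` as equivalent to a sequence of conditional scalar loads. Each lane whose mask bit is true loads through its own pointer, and each masked-off lane takes its value from the passthru operand without accessing its address. The C++ below is a hypothetical model of that definition, not LLVM or translator code; `masked_gather` and its parameter names are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical plain C++ model of llvm.masked.gather.v2i64.v2p0i64 (N = 2):
//   ptrs     - one address per lane (the <N x i64*> operand)
//   mask     - lanes with a true bit are loaded (the <N x i1> operand)
//   passthru - result for lanes whose mask bit is false (the <N x i64> operand)
// The i32 alignment operand only describes the pointers, so it is not modeled.
template <std::size_t N>
std::array<std::int64_t, N> masked_gather(
    const std::array<const std::int64_t*, N>& ptrs,
    const std::array<bool, N>& mask,
    const std::array<std::int64_t, N>& passthru) {
  std::array<std::int64_t, N> result{};
  for (std::size_t lane = 0; lane < N; ++lane) {
    if (mask[lane]) {
      assert(ptrs[lane] != nullptr && "enabled lanes must point at valid memory");
      result[lane] = *ptrs[lane];
    } else {
      result[lane] = passthru[lane];  // masked-off lane: no memory access
    }
  }
  return result;
}
```

With the all-true mask used in the optimized IR, lane 0 reads element 0 and lane 1 reads element 2 of `%.7`, which are the same two values the unoptimized IR loads through `%.8` and `%.14`.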

Is there any guarantee / expectation that code produced by LLVM's optimization passes is supported by the translator?


AlexeySachkov · 2021-08-03T08:49:34Z

Hi @maleadt, thanks for the bug report

Is there any guarantee / expectation that code produced by LLVM's optimization passes is supported by the translator?

At the moment we have no guarantee that passing optimized IR through the translator will always work. From the OpenCL Guide:

Note: Converting IR produced with optimization levels other than -O0 is only available as an experimental feature and it is not guaranteed to work. In the majority of cases, the conversion is expected to succeed when optimizations are enabled. Developers are encouraged to file a bug report when issues are encountered. As a workaround when encountering an issue in translating modules obtained with optimizations, generate LLVM IR with optimizations disabled, and then use the stand-alone spirv-opt tool to optimize at the SPIR-V level.

If I understand correctly, that is because the translator was historically developed under the assumption that it would operate on non-optimized LLVM IR, so no one thought about all those intrinsics. However, we have been using it on optimized LLVM IR for SYCL in the intel/llvm GitHub repo for at least half a year already and have made some improvements in that area.

In general, this direction is being tracked/discussed in #203 and there are known issues like #481 or #645.

You have several ways to proceed:

There is a flag that lets you translate unknown LLVM intrinsics as if they were regular function calls:

From SPIRV-LLVM-Translator/tools/llvm-spirv/llvm-spirv.cpp, lines 116 to 122 at commit bafc886:

static cl::list<std::string> SPIRVAllowUnknownIntrinsics(
    "spirv-allow-unknown-intrinsics", cl::CommaSeparated,
    cl::desc("Unknown intrinsics that begin with any prefix from the "
             "comma-separated input list will be translated as external "
             "function calls in SPIR-V.\nLeaving any prefix unspecified "
             "(default) would naturally allow all unknown intrinsics"),
    cl::value_desc("intrinsic_prefix_1,intrinsic_prefix_2"), cl::ValueOptional);

It has some downsides: if you are not using the translator to consume the SPIR-V, your consumer might not properly translate these calls back into LLVM intrinsics. Another (more ideological) issue is that by turning an intrinsic into a function call you lose its semantics within SPIR-V: there is no capability attached to it, and there is no way to check whether your client supports that intrinsic using the mechanism described in the SPIR-V spec.

If possible, you can write a pass (or something similar) that lowers the intrinsic into another construct, which can in turn be translated into SPIR-V. Examples include Add pass to lower Bitcast to nonstandard type instructions #1117, Implement support for dynamic memmove #1060, Translate the llvm.fshr intrinsic function #985, Support llvm.is.constant #672, and many other PRs.
If possible, you can extend the translator to lower the intrinsic into an existing SPIR-V construct with the same semantics. Examples include Add possibility to lower llvm.fmuladd into mad from OpenCL extinst #824, Translate llvm.maxnum intrinsic function #674, Add llvm.ctpop* intrinsic translation #775, and many others.
If none of the above works, the only option left is to define your own SPIR-V extension that can represent the semantics you want, publish it, and implement it here.
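The flag's help text describes a prefix match on the intrinsic name. As a rough model only, the decision for a single intrinsic could look like the sketch below. The function and its parameters are hypothetical, and the split between "flag absent" and "flag given with no prefixes" is an assumption drawn from the help text and the error in the report, not from the translator's code.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the --spirv-allow-unknown-intrinsics help text;
// NOT the translator's implementation.
//   flag_given - whether the flag appeared on the command line at all
//   prefixes   - the comma-separated values passed to the flag
bool is_unknown_intrinsic_allowed(bool flag_given,
                                  const std::vector<std::string>& prefixes,
                                  const std::string& intrinsic_name) {
  if (!flag_given) {
    return false;  // default: unknown intrinsics are rejected (as in the report)
  }
  if (prefixes.empty()) {
    return true;  // no prefix listed: every unknown intrinsic is allowed
  }
  for (const std::string& prefix : prefixes) {
    // compare() yields 0 only if intrinsic_name starts with prefix;
    // an empty prefix therefore matches any name.
    if (intrinsic_name.compare(0, prefix.size(), prefix) == 0) {
      return true;
    }
  }
  return false;
}
```

Under this reading, listing `llvm.masked.` as a prefix would let the gather from this issue through as an external function call, but the SPIR-V consumer still has to recognize that call.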

maleadt · 2021-08-03T09:17:36Z

Thanks for the details. I'm targeting oneAPI GPUs (i.e. using Intel's IGC/compute-runtime); any concrete suggestions on how to represent these vector intrinsics?

If not, I think I'll try to emit unoptimized IR and use spirv-opt.

AlexeySachkov · 2021-08-03T09:23:58Z

I'm targeting oneAPI GPUs (i.e. using Intel's IGC/compute-runtime); any concrete suggestions on how to represent these vector intrinsics?

Let's ask @PawelJurek and @aratajew for their input. If @llvm.masked.gather.* is supported on their side, then the easiest way for you would be to use --spirv-allow-unknown-intrinsics, at least in the short to medium term, to discover other issues and get a proof of concept. Longer term, reusing existing SPIR-V capabilities or creating a proper new SPIR-V extension are the preferable approaches.

PawelJurek · 2021-08-03T12:07:52Z

@maleadt, @AlexeySachkov:

@llvm.masked.* intrinsics are currently not supported in IGC, and no currently supported frontend produces them, so I think the best short-term option here is to lower/expand these intrinsics on your side.

Fznamznon · 2021-08-04T15:43:58Z

To handle (work around) masked memory intrinsics (llvm.masked.gather*, llvm.masked.scatter*, llvm.masked.load*, llvm.masked.store*), we could reuse the ScalarizeMaskedMemIntrin pass https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp . When I last tried it, I ran into types that the translator does not support, but I think that is a bug in the translator rather than a problem with the pass (see #481 for example).
Unfortunately, I only tried to lower llvm.masked.gather, so there might be more problems.
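As a rough illustration of what scalarizing a gather means, the two-lane case can be written as one guarded scalar load per lane. This follows the LangRef's definition of the gather as a sequence of conditional scalar loads; it is not the actual output of ScalarizeMaskedMemIntrin, and `I64x2`, `gather2_scalarized` and `gather2_all_true` are hypothetical names.

```cpp
#include <cstdint>

// Stands in for <2 x i64>.
struct I64x2 {
  std::int64_t lane[2];
};

// Variable mask: each lane gets its own guarded scalar load, roughly one
// conditional block per lane once expressed in IR.
I64x2 gather2_scalarized(const std::int64_t* p0, const std::int64_t* p1,
                         bool m0, bool m1, I64x2 passthru) {
  I64x2 result = passthru;       // masked-off lanes keep the passthru value
  if (m0) result.lane[0] = *p0;  // lane 0: load only if enabled
  if (m1) result.lane[1] = *p1;  // lane 1: load only if enabled
  return result;
}

// Constant all-true mask: the guards fold away and only two unconditional
// scalar loads remain, with no vector memory intrinsic involved.
I64x2 gather2_all_true(const std::int64_t* p0, const std::int64_t* p1) {
  return I64x2{{*p0, *p1}};
}
```

In the reported IR the mask is the constant `<i1 true, i1 true>`, so the second form applies and only plain loads remain, which the translator already handles in the unoptimized module.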

MrSidims · 2022-09-21T16:50:59Z

There is a new extension, intel/llvm#6613, which was recently implemented in the translator (#1580). I believe it should help, though it is not yet supported on Intel GPUs.
