s390x: vector cast using shuffle does not optimize well #129899

folkertdev · 2025-03-05T16:35:25Z

This LLVM IR

define range(i64 -128, 128) <2 x i64> @manual_vec_extend_s64(<16 x i8> %a) unnamed_addr {
start:
  %0 = shufflevector <16 x i8> %a, <16 x i8> poison, <2 x i32> <i32 7, i32 15>
  %1 = sext <2 x i8> %0 to <2 x i64>
  ret <2 x i64> %1
}

does not optimize to a single instruction.

The C code uses a slightly different (more manual) lowering to LLVM IR:

https://godbolt.org/z/aencTa3nq

define dso_local <2 x i64> @a(<16 x i8> noundef %a) local_unnamed_addr {
entry:
  %vecext.i = extractelement <16 x i8> %a, i64 7
  %conv.i = sext i8 %vecext.i to i64
  %vecinit.i = insertelement <2 x i64> poison, i64 %conv.i, i64 0
  %vecext1.i = extractelement <16 x i8> %a, i64 15
  %conv2.i = sext i8 %vecext1.i to i64
  %vecinit3.i = insertelement <2 x i64> %vecinit.i, i64 %conv2.i, i64 1
  ret <2 x i64> %vecinit3.i
}

but Rust can't replicate that at the moment. That's a bug we'll fix on the Rust side, but I still think the shufflevector version should also work. It is also smaller in LLVM IR, so might it be the more efficient choice for the clang lowering as well?
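To make the claimed equivalence concrete, here is a minimal scalar sketch of what the two IR forms compute. It is only a model: the function names `shuffle_then_extend` and `extract_extend_insert` are made up for illustration, plain arrays stand in for the vector types, and a zeroed array stands in for the `poison` start value.

```rust
/// Model of the `shufflevector` + `sext` form: pick lanes 7 and 15, then widen.
fn shuffle_then_extend(a: [i8; 16]) -> [i64; 2] {
    // shufflevector with mask <7, 15>
    let picked: [i8; 2] = [a[7], a[15]];
    // sext <2 x i8> to <2 x i64>
    picked.map(i64::from)
}

/// Model of the clang form: extract, widen, and insert one lane at a time.
fn extract_extend_insert(a: [i8; 16]) -> [i64; 2] {
    // Stand-in for the poison start vector; both lanes are overwritten below.
    let mut out = [0i64; 2];
    out[0] = i64::from(a[7]); // extractelement 7, sext, insertelement 0
    out[1] = i64::from(a[15]); // extractelement 15, sext, insertelement 1
    out
}
```

Both functions return the same result for every input, which is why I would expect the backend to select the same code for either form.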


folkertdev · 2025-03-06T09:32:43Z

A correction: the Rust LLVM IR always optimizes to the shufflevector version. I'm not sure why the C version doesn't, honestly. Maybe it's the poison values (which Rust will not emit).

Generate more efficient code for zero or sign extensions where the source is a subvector generated via SHUFFLE_VECTOR. Specifically, recognize patterns corresponding to (series of) VECTOR UNPACK instructions, or the VECTOR SIGN EXTEND TO DOUBLEWORD instruction. As a special case, also handle zero or sign extensions of a vector element to i128.

Fixes: llvm/llvm-project#129576
Fixes: llvm/llvm-project#129899

llvmbot added the new issue label Mar 5, 2025

EugeneZelenko added backend:SystemZ missed-optimization and removed new issue labels Mar 5, 2025

uweigand closed this as completed in 4a4987b Mar 15, 2025
