s390x: vector cast using shuffle does not optimize well #129899

Closed
folkertdev opened this issue Mar 5, 2025 · 1 comment

https://godbolt.org/z/6sodYY3fW

This LLVM IR

define range(i64 -128, 128) <2 x i64> @manual_vec_extend_s64(<16 x i8> %a) unnamed_addr {
start:
  %0 = shufflevector <16 x i8> %a, <16 x i8> poison, <2 x i32> <i32 7, i32 15>
  %1 = sext <2 x i8> %0 to <2 x i64>
  ret <2 x i64> %1
}

does not optimize to a single instruction on s390x.

The C code uses a slightly different (more manual) lowering to LLVM IR:

https://godbolt.org/z/aencTa3nq

define dso_local <2 x i64> @a(<16 x i8> noundef %a) local_unnamed_addr {
entry:
  %vecext.i = extractelement <16 x i8> %a, i64 7
  %conv.i = sext i8 %vecext.i to i64
  %vecinit.i = insertelement <2 x i64> poison, i64 %conv.i, i64 0
  %vecext1.i = extractelement <16 x i8> %a, i64 15
  %conv2.i = sext i8 %vecext1.i to i64
  %vecinit3.i = insertelement <2 x i64> %vecinit.i, i64 %conv2.i, i64 1
  ret <2 x i64> %vecinit3.i
}

but Rust can't replicate that lowering at the moment. That's a bug we'll fix on the Rust side, but I still think the shufflevector form should also work. It is also smaller in LLVM IR, so it might be the more efficient choice for the clang lowering as well?
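For readers without the godbolt links at hand, here is a minimal C sketch of the element-wise form, assuming GNU C vector extensions (GCC or clang). The type names `v16i8` and `v2i64` are made up for illustration, and this is not necessarily the source behind the link above; it only spells out the same extract, sign-extend, insert sequence for elements 7 and 15.

```c
/* GNU C vector extensions; the typedef names are hypothetical. */
typedef signed char v16i8 __attribute__((vector_size(16)));
typedef long long   v2i64 __attribute__((vector_size(16)));

/* Sign-extend byte elements 7 and 15 into the two 64-bit lanes.
 * On big-endian s390x these are the rightmost bytes of each doubleword. */
v2i64 manual_vec_extend_s64(v16i8 a) {
    v2i64 r = { (long long)a[7], (long long)a[15] };
    return r;
}
```

Calling it with a vector whose elements 7 and 15 are -7 and -128 should give the lanes -7 and -128. Whether a given compiler turns this into one instruction depends on the target, version, and flags.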


folkertdev commented Mar 6, 2025

A correction: the Rust LLVM IR always optimizes to the shufflevector version. I'm not sure why the C version doesn't, honestly. Maybe it's the poison values (which Rust will not emit).

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Mar 15, 2025
Generate more efficient code for zero or sign extensions where
the source is a subvector generated via SHUFFLE_VECTOR.

Specifically, recognize patterns corresponding to (series of)
VECTOR UNPACK instructions, or the VECTOR SIGN EXTEND TO
DOUBLEWORD instruction.

As a special case, also handle zero or sign extensions of a
vector element to i128.

Fixes: llvm/llvm-project#129576
Fixes: llvm/llvm-project#129899
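
As an illustration of the kind of source pattern the commit message describes (an extension whose input is a subvector of a wider vector), here is a small C sketch, again assuming GNU C vector extensions. The function name and typedefs are hypothetical. My understanding is that sign-extending the leftmost half of a byte vector is what VECTOR UNPACK HIGH does on s390x, but whether a compiler selects that instruction for this code is an assumption, not something the commit message states.

```c
/* GNU C vector extensions; the typedef names are hypothetical. */
typedef signed char v16i8 __attribute__((vector_size(16)));
typedef short       v8i16 __attribute__((vector_size(16)));

/* Sign-extend the first eight byte elements to eight 16-bit elements. */
v8i16 sext_first_half(v16i8 a) {
    v8i16 r;
    for (int i = 0; i < 8; i++)
        r[i] = (short)a[i];   /* element-wise sign extension */
    return r;
}
```

Chaining such a step (bytes to halfwords, halfwords to words, and so on) gives the "series of" unpack operations the message mentions.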