s390x: manual vec_subc_u128 is not recognized #129608

Closed
folkertdev opened this issue Mar 3, 2025 · 0 comments

https://godbolt.org/z/EjoWhj8MM

I expect these to optimize to the same output, but they do not:

define noundef <16 x i8> @vec_subc_u128_intrinsic(<16 x i8> %a, <16 x i8> %b) unnamed_addr {
start:
  %0 = bitcast <16 x i8> %a to i128
  %1 = bitcast <16 x i8> %b to i128
  %_3 = tail call noundef i128 @llvm.s390.vscbiq(i128 noundef %0, i128 noundef %1) #3
  %2 = bitcast i128 %_3 to <16 x i8>
  ret <16 x i8> %2
}

define <16 x i8> @vec_subc_u128_manual(<16 x i8> %a, <16 x i8> %b) unnamed_addr {
start:
  %0 = bitcast <16 x i8> %a to i128
  %1 = bitcast <16 x i8> %b to i128
  %_8.1 = icmp uge i128 %0, %1
  %_5 = zext i1 %_8.1 to i128
  %2 = bitcast i128 %_5 to <16 x i8>
  ret <16 x i8> %2
}

declare i128 @llvm.s390.vscbiq(i128, i128) unnamed_addr #2
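
For readers who do not want to open the godbolt link, here is a minimal scalar sketch in Rust of what the manual version computes. The function names are hypothetical and the sketch works on plain `u128` values rather than the vector types from the IR above; it is an illustration of the semantics, not the source behind the link. The borrow indication is 1 when `a - b` does not borrow, which is the same value as the `icmp uge` followed by `zext` in the manual IR.

```rust
/// Hypothetical scalar model of the manual version:
/// returns 1 when `a - b` does NOT borrow, else 0.
pub fn subc_u128_manual(a: u128, b: u128) -> u128 {
    // `overflowing_sub` reports whether the subtraction wrapped (a borrow);
    // the borrow indication is the negation of that flag.
    let (_, borrow) = a.overflowing_sub(b);
    (!borrow) as u128
}

/// The same value written as a plain unsigned compare,
/// mirroring `icmp uge` + `zext` in the IR.
pub fn subc_u128_compare(a: u128, b: u128) -> u128 {
    (a >= b) as u128
}
```

Both functions agree for every input, which is why the overflow check can legitimately be reduced to a compare.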

The equivalent with vec_addc_u128 does get optimized into just a vaccq instruction. For the subtraction here we get this:

vec_subc_u128_intrinsic:
        vscbiq  %v24, %v24, %v26
        br      %r14

.LCPI1_0:
        .quad   0
        .quad   1
vec_subc_u128_manual:
        veclg   %v24, %v26
        jlh     .LBB1_2
        vchlgs  %v0, %v26, %v24
.LBB1_2:
        ipm     %r0
        xilf    %r0, 268435456
        afi     %r0, 1879048192
        vlvgp   %v0, %r0, %r0
        larl    %r1, .LCPI1_0
        vl      %v1, 0(%r1), 3
        vrepib  %v2, 31
        vsrlb   %v0, %v0, %v2
        vsrl    %v0, %v0, %v2
        vn      %v24, %v0, %v1
        br      %r14

which is unfortunate.

In the addition case, we see

define <16 x i8> @vec_addc_u128_manual(<16 x i8> %a, <16 x i8> %b) unnamed_addr {
start:
  %0 = bitcast <16 x i8> %a to i128
  %1 = bitcast <16 x i8> %b to i128
  %2 = tail call { i128, i1 } @llvm.uadd.with.overflow.i128(i128 %0, i128 %1)
  %_7.1 = extractvalue { i128, i1 } %2, 1
  %_5 = zext i1 %_7.1 to i128
  %3 = bitcast i128 %_5 to <16 x i8>
  ret <16 x i8> %3
}

so here the @llvm.uadd.with.overflow.i128 call is explicitly present. That won't work for the unsigned overflowing subtraction, where the optimizer is too clever and reduces the overflow check to a plain compare.
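
As a rough illustration of the addition side, here is a scalar Rust sketch of the carry computation. The function names are hypothetical and the code uses plain `u128` values instead of vectors. The first form goes through `overflowing_add`, matching the overflow intrinsic that appears in the addition IR; the second is an open-coded compare that yields the same carry bit.

```rust
/// Hypothetical scalar model of the manual addition:
/// returns 1 when `a + b` carries out of 128 bits, else 0.
pub fn addc_u128_manual(a: u128, b: u128) -> u128 {
    // The overflow flag of an unsigned add is exactly the carry-out.
    let (_, carry) = a.overflowing_add(b);
    carry as u128
}

/// The same carry written as a compare: a wrapped sum is
/// smaller than either operand exactly when a carry occurred.
pub fn addc_u128_compare(a: u128, b: u128) -> u128 {
    (a.wrapping_add(b) < a) as u128
}
```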

@folkertdev folkertdev changed the title s390x: vec_subc_u128 is not recognized s390x: manual vec_subc_u128 is not recognized Mar 4, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Mar 15, 2025
Generate code using the VECTOR ADD COMPUTE CARRY and
VECTOR SUBTRACT COMPUTE BORROW INDICATION instructions
to implement open-coded IR with those semantics.

Handles integer vector types as well as i128.

Fixes: llvm/llvm-project#129608