[AArch64] Lower __builtin_bswap16 to rev16 if bswap followed by any_extend #105375
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository; in that case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-backend-aarch64

Author: None (adprasad-nvidia)

Changes

GCC compiles the built-in function __builtin_bswap16 to the ARM instruction rev16, which reverses the byte order of 16-bit data. Clang, on the other hand, compiles the same built-in function to e.g.

rev w8, w0
lsr w0, w8, #16

i.e. it performs a byte reversal of a 32-bit register (which moves the lower half, containing the 16-bit data, to the upper half) and then right shifts the reversed 16-bit data back to the lower half of the register.

Full diff: https://github.com/llvm/llvm-project/pull/105375.diff

5 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index a9324af5beb784..227ed141075582 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -2825,9 +2825,9 @@ def : InstAlias<"rev64 $Rd, $Rn", (REVXr GPR64:$Rd, GPR64:$Rn), 0>;
def : Pat<(bswap (rotr GPR32:$Rn, (i64 16))), (REV16Wr GPR32:$Rn)>;
def : Pat<(bswap (rotr GPR64:$Rn, (i64 32))), (REV32Xr GPR64:$Rn)>;
-// Match (srl (bswap x), C) -> revC if the upper bswap bits are known zero.
-def : Pat<(srl (bswap top16Zero:$Rn), (i64 16)), (REV16Wr GPR32:$Rn)>;
-def : Pat<(srl (bswap top32Zero:$Rn), (i64 32)), (REV32Xr GPR64:$Rn)>;
+// Match (srl (bswap x), C) -> revC.
+def : Pat<(srl (bswap GPR32:$Rn), (i64 16)), (REV16Wr GPR32:$Rn)>;
+def : Pat<(srl (bswap GPR64:$Rn), (i64 32)), (REV32Xr GPR64:$Rn)>;
def : Pat<(or (and (srl GPR64:$Rn, (i64 8)), (i64 0x00ff00ff00ff00ff)),
(and (shl GPR64:$Rn, (i64 8)), (i64 0xff00ff00ff00ff00))),
diff --git a/llvm/test/CodeGen/AArch64/arm64-rev.ll b/llvm/test/CodeGen/AArch64/arm64-rev.ll
index f548a0e01feee6..5973a6a0cf113f 100644
--- a/llvm/test/CodeGen/AArch64/arm64-rev.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-rev.ll
@@ -27,15 +27,13 @@ entry:
define i32 @test_rev_w_srl16(i16 %a) {
; CHECK-SD-LABEL: test_rev_w_srl16:
; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: rev w8, w0
-; CHECK-SD-NEXT: lsr w0, w8, #16
+; CHECK-SD-NEXT: rev16 w0, w0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: test_rev_w_srl16:
; CHECK-GI: // %bb.0: // %entry
; CHECK-GI-NEXT: and w8, w0, #0xffff
-; CHECK-GI-NEXT: rev w8, w8
-; CHECK-GI-NEXT: lsr w0, w8, #16
+; CHECK-GI-NEXT: rev16 w0, w8
; CHECK-GI-NEXT: ret
entry:
%0 = zext i16 %a to i32
@@ -48,8 +46,7 @@ define i32 @test_rev_w_srl16_load(ptr %a) {
; CHECK-LABEL: test_rev_w_srl16_load:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ldrh w8, [x0]
-; CHECK-NEXT: rev w8, w8
-; CHECK-NEXT: lsr w0, w8, #16
+; CHECK-NEXT: rev16 w0, w8
; CHECK-NEXT: ret
entry:
%0 = load i16, ptr %a
@@ -71,8 +68,7 @@ define i32 @test_rev_w_srl16_add(i8 %a, i8 %b) {
; CHECK-GI: // %bb.0: // %entry
; CHECK-GI-NEXT: and w8, w1, #0xff
; CHECK-GI-NEXT: add w8, w8, w0, uxtb
-; CHECK-GI-NEXT: rev w8, w8
-; CHECK-GI-NEXT: lsr w0, w8, #16
+; CHECK-GI-NEXT: rev16 w0, w8
; CHECK-GI-NEXT: ret
entry:
%0 = zext i8 %a to i32
@@ -89,15 +85,13 @@ define i64 @test_rev_x_srl32(i32 %a) {
; CHECK-SD-LABEL: test_rev_x_srl32:
; CHECK-SD: // %bb.0: // %entry
; CHECK-SD-NEXT: // kill: def $w0 killed $w0 def $x0
-; CHECK-SD-NEXT: rev x8, x0
-; CHECK-SD-NEXT: lsr x0, x8, #32
+; CHECK-SD-NEXT: rev32 x0, x0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: test_rev_x_srl32:
; CHECK-GI: // %bb.0: // %entry
; CHECK-GI-NEXT: mov w8, w0
-; CHECK-GI-NEXT: rev x8, x8
-; CHECK-GI-NEXT: lsr x0, x8, #32
+; CHECK-GI-NEXT: rev32 x0, x8
; CHECK-GI-NEXT: ret
entry:
%0 = zext i32 %a to i64
@@ -110,8 +104,7 @@ define i64 @test_rev_x_srl32_load(ptr %a) {
; CHECK-LABEL: test_rev_x_srl32_load:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ldr w8, [x0]
-; CHECK-NEXT: rev x8, x8
-; CHECK-NEXT: lsr x0, x8, #32
+; CHECK-NEXT: rev32 x0, x8
; CHECK-NEXT: ret
entry:
%0 = load i32, ptr %a
@@ -122,18 +115,11 @@ entry:
}
define i64 @test_rev_x_srl32_shift(i64 %a) {
-; CHECK-SD-LABEL: test_rev_x_srl32_shift:
-; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: ubfx x8, x0, #2, #29
-; CHECK-SD-NEXT: rev32 x0, x8
-; CHECK-SD-NEXT: ret
-;
-; CHECK-GI-LABEL: test_rev_x_srl32_shift:
-; CHECK-GI: // %bb.0: // %entry
-; CHECK-GI-NEXT: ubfx x8, x0, #2, #29
-; CHECK-GI-NEXT: rev x8, x8
-; CHECK-GI-NEXT: lsr x0, x8, #32
-; CHECK-GI-NEXT: ret
+; CHECK-LABEL: test_rev_x_srl32_shift:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: ubfx x8, x0, #2, #29
+; CHECK-NEXT: rev32 x0, x8
+; CHECK-NEXT: ret
entry:
%0 = shl i64 %a, 33
%1 = lshr i64 %0, 35
@@ -472,8 +458,7 @@ define void @test_rev16_truncstore() {
; CHECK-GI-NEXT: .LBB30_1: // %cleanup
; CHECK-GI-NEXT: // =>This Inner Loop Header: Depth=1
; CHECK-GI-NEXT: ldrh w8, [x8]
-; CHECK-GI-NEXT: rev w8, w8
-; CHECK-GI-NEXT: lsr w8, w8, #16
+; CHECK-GI-NEXT: rev16 w8, w8
; CHECK-GI-NEXT: strh w8, [x8]
; CHECK-GI-NEXT: tbz wzr, #0, .LBB30_1
; CHECK-GI-NEXT: .LBB30_2: // %fail
diff --git a/llvm/test/CodeGen/AArch64/bswap.ll b/llvm/test/CodeGen/AArch64/bswap.ll
index 071613b9cc011e..2a60abdc2308f0 100644
--- a/llvm/test/CodeGen/AArch64/bswap.ll
+++ b/llvm/test/CodeGen/AArch64/bswap.ll
@@ -6,8 +6,7 @@
define i16 @bswap_i16(i16 %a){
; CHECK-LABEL: bswap_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: rev w8, w0
-; CHECK-NEXT: lsr w0, w8, #16
+; CHECK-NEXT: rev16 w0, w0
; CHECK-NEXT: ret
%3 = call i16 @llvm.bswap.i16(i16 %a)
ret i16 %3
diff --git a/llvm/test/CodeGen/AArch64/memcmp.ll b/llvm/test/CodeGen/AArch64/memcmp.ll
index 4da7c8c95a4e4f..0a6a03844128c3 100644
--- a/llvm/test/CodeGen/AArch64/memcmp.ll
+++ b/llvm/test/CodeGen/AArch64/memcmp.ll
@@ -39,9 +39,8 @@ define i32 @length2(ptr %X, ptr %Y) nounwind {
; CHECK: // %bb.0:
; CHECK-NEXT: ldrh w8, [x0]
; CHECK-NEXT: ldrh w9, [x1]
-; CHECK-NEXT: rev w8, w8
+; CHECK-NEXT: rev16 w8, w8
; CHECK-NEXT: rev w9, w9
-; CHECK-NEXT: lsr w8, w8, #16
; CHECK-NEXT: sub w0, w8, w9, lsr #16
; CHECK-NEXT: ret
%m = tail call i32 @memcmp(ptr %X, ptr %Y, i64 2) nounwind
@@ -93,9 +92,8 @@ define i1 @length2_lt(ptr %X, ptr %Y) nounwind {
; CHECK: // %bb.0:
; CHECK-NEXT: ldrh w8, [x0]
; CHECK-NEXT: ldrh w9, [x1]
-; CHECK-NEXT: rev w8, w8
+; CHECK-NEXT: rev16 w8, w8
; CHECK-NEXT: rev w9, w9
-; CHECK-NEXT: lsr w8, w8, #16
; CHECK-NEXT: sub w8, w8, w9, lsr #16
; CHECK-NEXT: lsr w0, w8, #31
; CHECK-NEXT: ret
@@ -109,9 +107,8 @@ define i1 @length2_gt(ptr %X, ptr %Y) nounwind {
; CHECK: // %bb.0:
; CHECK-NEXT: ldrh w8, [x0]
; CHECK-NEXT: ldrh w9, [x1]
-; CHECK-NEXT: rev w8, w8
+; CHECK-NEXT: rev16 w8, w8
; CHECK-NEXT: rev w9, w9
-; CHECK-NEXT: lsr w8, w8, #16
; CHECK-NEXT: sub w8, w8, w9, lsr #16
; CHECK-NEXT: cmp w8, #0
; CHECK-NEXT: cset w0, gt
@@ -536,10 +533,8 @@ define i32 @length10(ptr %X, ptr %Y) nounwind {
; CHECK-NEXT: // %bb.1: // %loadbb1
; CHECK-NEXT: ldrh w8, [x0, #8]
; CHECK-NEXT: ldrh w9, [x1, #8]
-; CHECK-NEXT: rev w8, w8
-; CHECK-NEXT: rev w9, w9
-; CHECK-NEXT: lsr w8, w8, #16
-; CHECK-NEXT: lsr w9, w9, #16
+; CHECK-NEXT: rev16 w8, w8
+; CHECK-NEXT: rev16 w9, w9
; CHECK-NEXT: cmp x8, x9
; CHECK-NEXT: b.ne .LBB32_3
; CHECK-NEXT: // %bb.2:
diff --git a/llvm/test/CodeGen/AArch64/merge-trunc-store.ll b/llvm/test/CodeGen/AArch64/merge-trunc-store.ll
index b161d746ad11d5..4fcd030db1bace 100644
--- a/llvm/test/CodeGen/AArch64/merge-trunc-store.ll
+++ b/llvm/test/CodeGen/AArch64/merge-trunc-store.ll
@@ -10,8 +10,7 @@ define void @le_i16_to_i8(i16 %x, ptr %p0) {
;
; BE-LABEL: le_i16_to_i8:
; BE: // %bb.0:
-; BE-NEXT: rev w8, w0
-; BE-NEXT: lsr w8, w8, #16
+; BE-NEXT: rev16 w8, w0
; BE-NEXT: strh w8, [x1]
; BE-NEXT: ret
%sh1 = lshr i16 %x, 8
@@ -31,8 +30,7 @@ define void @le_i16_to_i8_order(i16 %x, ptr %p0) {
;
; BE-LABEL: le_i16_to_i8_order:
; BE: // %bb.0:
-; BE-NEXT: rev w8, w0
-; BE-NEXT: lsr w8, w8, #16
+; BE-NEXT: rev16 w8, w0
; BE-NEXT: strh w8, [x1]
; BE-NEXT: ret
%sh1 = lshr i16 %x, 8
@@ -47,8 +45,7 @@ define void @le_i16_to_i8_order(i16 %x, ptr %p0) {
define void @be_i16_to_i8_offset(i16 %x, ptr %p0) {
; LE-LABEL: be_i16_to_i8_offset:
; LE: // %bb.0:
-; LE-NEXT: rev w8, w0
-; LE-NEXT: lsr w8, w8, #16
+; LE-NEXT: rev16 w8, w0
; LE-NEXT: sturh w8, [x1, #11]
; LE-NEXT: ret
;
@@ -69,8 +66,7 @@ define void @be_i16_to_i8_offset(i16 %x, ptr %p0) {
define void @be_i16_to_i8_order(i16 %x, ptr %p0) {
; LE-LABEL: be_i16_to_i8_order:
; LE: // %bb.0:
-; LE-NEXT: rev w8, w0
-; LE-NEXT: lsr w8, w8, #16
+; LE-NEXT: rev16 w8, w0
; LE-NEXT: strh w8, [x1]
; LE-NEXT: ret
;
@DTeachs Thanks for pointing this out.
Force-pushed from 28aa56d to d6b0448.
Hi - I think this sounds good. There might be a way to handle this where it is performed from the demanded bits of the operation, but that isn't easy to do during selection and performing it at least for the anyext case sounds good for the time being.
I had some comments inline.
N->getOperand(0).getValueType().isScalarInteger() &&
N->getOperand(0).getValueType().getFixedSizeInBits() == 16) {
I think this could check N->getOperand(0).getValueType() == MVT::i16. Should it check that the input type is i32 too? Otherwise it would need to check i64 or i32, and add a pattern for i64 (and a test).
Thanks for the suggestion on simplifying to == MVT::i16.
By input type, I assume you mean the input type to the rev16 / output type of the old any_extend? As far as I can tell, this is only ever an i32. If an i64 is needed then a zero_extend is generated instead.
In the absence of writing a test, do you think it would be worthwhile to write a comment explaining this? No changes to the code would need to be made, because I am following your suggestion of using N->getValueType(0) instead of EVT(MVT::i32), which means the logic in the code is independent of whether it is i64 or i32.
To explain further, there are two cases.
Case 1: if the source code uses the result of the __builtin_bswap16 as either an i32 or an i64, then a zext i16 to i32/i64 is inserted in the IR before instruction selection. This patch's optimisation doesn't trigger because in the DAG, this zext becomes a zero_extend, not an any_extend.
Case 2: it's only if the result is used as an i16 that the zext is not inserted. In this case, the any_extend i16 to i32 seems to be inserted later, during the initial building of the DAG from IR (presumably to help with type legalisation later on). The code generator should always insert an any_extend i16 to i32, as there's never any reason for an any_extend i16 to i64. If there were (i.e. we needed the result to be an i64), we would be in case 1, not case 2.
I think Dave's suggestion is to add a check bswap->getOperand(0).getValueType() == MVT::i32, where bswap = N->getOperand(0).

That's a very good point: it guarantees that we are only dealing with REV16 for i32 where we know the upper bits are zero, and it avoids us having to worry about i64s and different patterns.
I'm not sure I understand. bswap's output type is always its input type, so bswap->getOperand(0).getValueType() is always i16, not i32 (assuming we've checked its output type is i16, which we already do). I just verified this by adding a check in the if statement that N->getOperand(0)->getOperand(0).getValueType() == MVT::i32. This makes the test that checks rev16 is generated fail, i.e. we generate the old rev and lsr instead of rev16, because the check evaluates to false.
We guarantee we are only dealing with REV16 for i32 because we insert an any_extend before the REV16, and that any_extend is guaranteed to be extending to i32.
I'm worried about anyext(bswap) coming up for other types. These two look like they should end up with an i64 anyext and an i128 anyext after a little optimization: https://godbolt.org/z/G1Kc9K1cv
OK, got it. I didn't realise that we can also get an any_extend from an optimised zero_extend.

I can, as suggested, avoid this by adding a check that the old any_extend output type / rev16 input type is i32, i.e. add a check that N.getValueType() == MVT::i32.
But it might also be relatively simple to handle the i64 and i128 cases and still get the optimised rev16 codegen. We could insert two any_extends instead of one: one immediately before the rev16 that extends i16 to i32, and one immediately after the rev16 that extends i32 to whatever the value type of the old any_extend was (i32, i64, i128...).

Would you be happy with the second option too?
Yeah, adding i64 sounds like a good idea. We can just add a new tablegen pattern for REV16Xr, if I understand it correctly.

Usually with the way types legalize, if you handle i32 and i64 then the other types will follow suit, as they get legalized to i32/i64 anyway. We just need to make sure we don't generate an i128 AArch64ISD::REV16 that cannot be selected.
N->getOperand(0).getOpcode() == ISD::BSWAP &&
N->getOperand(0).getValueType().isScalarInteger() &&
N->getOperand(0).getValueType().getFixedSizeInBits() == 16) {
SDNode *BswapNode = N->getOperand(0).getNode();
Usually we would do SDLoc DL(N); and use that elsewhere in both of the nodes.
By this do you mean just replacing the inline SDLoc(N) in the second DAG.getNode with DL, defined as you suggest?

I'm not sure how else I could use SDLoc DL(N), or how this would replace needing to define BswapNode. The first DAG.getNode uses SDLoc(BswapNode) instead of SDLoc(N), and I also use the definition of BswapNode in the BswapNode->getOperand(0) expression, not just as a location.
Replace SDLoc(BswapNode) with DL in the getNode calls, and define: SDLoc DL(N);
N->getOperand(0).getValueType().getFixedSizeInBits() == 16) {
SDNode *BswapNode = N->getOperand(0).getNode();
SDValue NewAnyExtend = DAG.getNode(ISD::ANY_EXTEND, SDLoc(BswapNode),
                                   EVT(MVT::i32), BswapNode->getOperand(0));
EVT(MVT::i32) can just be MVT::i32, or probably just N->getValueType(0).
Force-pushed from d6b0448 to 2d935a9.
@davemgreen @sjoerdmeijer Thanks for the comments, I have pushed changes addressing them.
Thanks. LGTM with an extra comment. You might have to rebase the patch too.
N->getOperand(0).getOpcode() == ISD::BSWAP &&
N->getOperand(0).getValueType() == MVT::i16 &&
(N->getValueType(0) == MVT::i32 || N->getValueType(0) == MVT::i64)) {
SDNode *BswapNode = N->getOperand(0).getNode();
SDValue BSwap = N->getOperand(0);, then it can use BSwap.getOperand(0) below.
Force-pushed from 2d935a9 to bbcebb7.
Force-pushed from bbcebb7 to fd5e47e.
LGTM, thanks.
Thanks for your help with this, Dave!
@adprasad-nvidia Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
[AArch64] Lower __builtin_bswap16 to rev16 if bswap followed by any_extend (llvm#105375)

GCC compiles the built-in function __builtin_bswap16 to the ARM instruction rev16, which reverses the byte order of 16-bit data. Clang, on the other hand, compiles the same built-in function to e.g.

```
rev w8, w0
lsr w0, w8, #16
```

i.e. it performs a byte reversal of a 32-bit register (which moves the lower half, containing the 16-bit data, to the upper half) and then right shifts the reversed 16-bit data back to the lower half of the register. We can improve Clang codegen by generating rev16 instead of rev and lsr, like GCC.