Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Allow LDR merge with same destination register by renaming #71908

Merged
merged 1 commit into from
Nov 23, 2023

Conversation

nocchijiang
Copy link
Contributor

The patch is based on a reverted patch: https://reviews.llvm.org/D103597. It was trying to rename registers before alias check, which is not safe and causes miscompiles. This patch does 2 things:

  1. Do the renaming with necessary checks passed, including alias check.
  2. Rename the register for the instructions between the pairs and combine the second load into the first. By doing so we can just check the renamability between the pairs and avoid scanning unknown amount of instructions before/after the pairs.

Necessary refactoring has been made in order to reuse as much code possible with STR renaming.

@llvmbot
Copy link
Member

llvmbot commented Nov 10, 2023

@llvm/pr-subscribers-backend-aarch64

Author: Zhaoxuan Jiang (nocchijiang)

Changes

The patch is based on a reverted patch: https://reviews.llvm.org/D103597. It was trying to rename registers before alias check, which is not safe and causes miscompiles. This patch does 2 things:

  1. Do the renaming with necessary checks passed, including alias check.
  2. Rename the register for the instructions between the pairs and combine the second load into the first. By doing so we can just check the renamability between the pairs and avoid scanning unknown amount of instructions before/after the pairs.

Necessary refactoring has been made in order to reuse as much code possible with STR renaming.


Patch is 61.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71908.diff

10 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp (+191-81)
  • (modified) llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll (+1-2)
  • (modified) llvm/test/CodeGen/AArch64/fexplog.ll (+40-60)
  • (modified) llvm/test/CodeGen/AArch64/fpow.ll (+34-40)
  • (modified) llvm/test/CodeGen/AArch64/frem.ll (+34-40)
  • (modified) llvm/test/CodeGen/AArch64/fsincos.ll (+16-24)
  • (modified) llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir (+1-4)
  • (modified) llvm/test/CodeGen/AArch64/neon-dotreduce.ll (+16-26)
  • (modified) llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir (+132-2)
  • (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-fp128.ll (+6-11)
diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 299ea43a539edb7..3827cef566f12ee 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -816,12 +816,11 @@ AArch64LoadStoreOpt::mergeNarrowZeroStores(MachineBasicBlock::iterator I,
   return NextI;
 }
 
-// Apply Fn to all instructions between MI and the beginning of the block, until
-// a def for DefReg is reached. Returns true, iff Fn returns true for all
-// visited instructions. Stop after visiting Limit iterations.
-static bool forAllMIsUntilDef(MachineInstr &MI, MCPhysReg DefReg,
-                              const TargetRegisterInfo *TRI, unsigned Limit,
-                              std::function<bool(MachineInstr &, bool)> &Fn) {
+static bool forAllMIsUntil(MachineInstr &MI,
+                           std::function<bool(MachineInstr &MI, bool)> Until,
+                           MCPhysReg DefReg, const TargetRegisterInfo *TRI,
+                           unsigned Limit,
+                           std::function<bool(MachineInstr &, bool)> &Fn) {
   auto MBB = MI.getParent();
   for (MachineInstr &I :
        instructionsWithoutDebug(MI.getReverseIterator(), MBB->instr_rend())) {
@@ -841,6 +840,17 @@ static bool forAllMIsUntilDef(MachineInstr &MI, MCPhysReg DefReg,
   return true;
 }
 
+// Apply Fn to all instructions between MI and the beginning of the block, until
+// a def for DefReg is reached. Returns true, iff Fn returns true for all
+// visited instructions. Stop after visiting Limit iterations.
+static bool forAllMIsUntilDef(MachineInstr &MI, MCPhysReg DefReg,
+                              const TargetRegisterInfo *TRI, unsigned Limit,
+                              std::function<bool(MachineInstr &, bool)> &Fn) {
+  return forAllMIsUntil(
+      MI, [](MachineInstr &, bool IsDef) { return IsDef; }, DefReg, TRI, Limit,
+      Fn);
+}
+
 static void updateDefinedRegisters(MachineInstr &MI, LiveRegUnits &Units,
                                    const TargetRegisterInfo *TRI) {
 
@@ -875,7 +885,7 @@ AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
   bool MergeForward = Flags.getMergeForward();
 
   std::optional<MCPhysReg> RenameReg = Flags.getRenameReg();
-  if (MergeForward && RenameReg) {
+  if (RenameReg) {
     MCRegister RegToRename = getLdStRegOp(*I).getReg();
     DefinedInBB.addReg(*RenameReg);
 
@@ -892,7 +902,8 @@ AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
     };
 
     std::function<bool(MachineInstr &, bool)> UpdateMIs =
-        [this, RegToRename, GetMatchingSubReg](MachineInstr &MI, bool IsDef) {
+        [this, RegToRename, GetMatchingSubReg, MergeForward](MachineInstr &MI,
+                                                             bool IsDef) {
           if (IsDef) {
             bool SeenDef = false;
             for (unsigned OpIdx = 0; OpIdx < MI.getNumOperands(); ++OpIdx) {
@@ -900,7 +911,8 @@ AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
               // Rename the first explicit definition and all implicit
               // definitions matching RegToRename.
               if (MOP.isReg() && !MOP.isDebug() && MOP.getReg() &&
-                  (!SeenDef || (MOP.isDef() && MOP.isImplicit())) &&
+                  (!MergeForward || !SeenDef ||
+                   (MOP.isDef() && MOP.isImplicit())) &&
                   TRI->regsOverlap(MOP.getReg(), RegToRename)) {
                 assert((MOP.isImplicit() ||
                         (MOP.isRenamable() && !MOP.isEarlyClobber())) &&
@@ -938,20 +950,35 @@ AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
           LLVM_DEBUG(dbgs() << "Renamed " << MI << "\n");
           return true;
         };
-    forAllMIsUntilDef(*I, RegToRename, TRI, LdStLimit, UpdateMIs);
+    if (MergeForward)
+      forAllMIsUntilDef(*I, RegToRename, TRI, LdStLimit, UpdateMIs);
+    else
+      forAllMIsUntil(
+          *std::prev(Paired),
+          [&I](MachineInstr &MI, bool) { return &MI == &*I; }, RegToRename, TRI,
+          LdStLimit, UpdateMIs);
 
 #if !defined(NDEBUG)
-    // Make sure the register used for renaming is not used between the paired
-    // instructions. That would trash the content before the new paired
-    // instruction.
+    // For forward merging store:
+    // Make sure the register used for renaming is not used between the
+    // paired instructions. That would trash the content before the new
+    // paired instruction.
+    MCPhysReg RegToCheck = *RenameReg;
+    // For backward merging load:
+    // Make sure the register being renamed is not used between the
+    // paired instructions. That would trash the content after the new
+    // paired instruction.
+    if (!MergeForward)
+      RegToCheck = RegToRename;
     for (auto &MI :
          iterator_range<MachineInstrBundleIterator<llvm::MachineInstr>>(
-             std::next(I), std::next(Paired)))
+             MergeForward ? std::next(I) : I,
+             MergeForward ? std::next(Paired) : Paired))
       assert(all_of(MI.operands(),
-                    [this, &RenameReg](const MachineOperand &MOP) {
+                    [this, RegToCheck](const MachineOperand &MOP) {
                       return !MOP.isReg() || MOP.isDebug() || !MOP.getReg() ||
                              MOP.isUndef() ||
-                             !TRI->regsOverlap(MOP.getReg(), *RenameReg);
+                             !TRI->regsOverlap(MOP.getReg(), RegToCheck);
                     }) &&
              "Rename register used between paired instruction, trashing the "
              "content");
@@ -1397,6 +1424,38 @@ static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,
   // FIXME: Can we also match a mixed sext/zext unscaled/scaled pair?
 }
 
+static bool canRenameMOP(const MachineOperand &MOP,
+                         const TargetRegisterInfo *TRI) {
+  if (MOP.isReg()) {
+    auto *RegClass = TRI->getMinimalPhysRegClass(MOP.getReg());
+    // Renaming registers with multiple disjunct sub-registers (e.g. the
+    // result of a LD3) means that all sub-registers are renamed, potentially
+    // impacting other instructions we did not check. Bail out.
+    // Note that this relies on the structure of the AArch64 register file. In
+    // particular, a subregister cannot be written without overwriting the
+    // whole register.
+    if (RegClass->HasDisjunctSubRegs) {
+      LLVM_DEBUG(
+          dbgs()
+          << "  Cannot rename operands with multiple disjunct subregisters ("
+          << MOP << ")\n");
+      return false;
+    }
+
+    // We cannot rename arbitrary implicit-defs, the specific rule to rewrite
+    // them must be known. For example, in ORRWrs the implicit-def
+    // corresponds to the result register.
+    if (MOP.isImplicit() && MOP.isDef()) {
+      if (!isRewritableImplicitDef(MOP.getParent()->getOpcode()))
+        return false;
+      return TRI->isSuperOrSubRegisterEq(
+          MOP.getParent()->getOperand(0).getReg(), MOP.getReg());
+    }
+  }
+  return MOP.isImplicit() ||
+         (MOP.isRenamable() && !MOP.isEarlyClobber() && !MOP.isTied());
+}
+
 static bool
 canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
                  SmallPtrSetImpl<const TargetRegisterClass *> &RequiredClasses,
@@ -1406,10 +1465,6 @@ canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
 
   // Check if we can find an unused register which we can use to rename
   // the register used by the first load/store.
-  auto *RegClass = TRI->getMinimalPhysRegClass(getLdStRegOp(FirstMI).getReg());
-  MachineFunction &MF = *FirstMI.getParent()->getParent();
-  if (!RegClass || !MF.getRegInfo().tracksLiveness())
-    return false;
 
   auto RegToRename = getLdStRegOp(FirstMI).getReg();
   // For now, we only rename if the store operand gets killed at the store.
@@ -1423,36 +1478,6 @@ canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
     LLVM_DEBUG(dbgs() << "  Operand not killed at " << FirstMI << "\n");
     return false;
   }
-  auto canRenameMOP = [TRI](const MachineOperand &MOP) {
-    if (MOP.isReg()) {
-      auto *RegClass = TRI->getMinimalPhysRegClass(MOP.getReg());
-      // Renaming registers with multiple disjunct sub-registers (e.g. the
-      // result of a LD3) means that all sub-registers are renamed, potentially
-      // impacting other instructions we did not check. Bail out.
-      // Note that this relies on the structure of the AArch64 register file. In
-      // particular, a subregister cannot be written without overwriting the
-      // whole register.
-      if (RegClass->HasDisjunctSubRegs) {
-        LLVM_DEBUG(
-            dbgs()
-            << "  Cannot rename operands with multiple disjunct subregisters ("
-            << MOP << ")\n");
-        return false;
-      }
-
-      // We cannot rename arbitrary implicit-defs, the specific rule to rewrite
-      // them must be known. For example, in ORRWrs the implicit-def
-      // corresponds to the result register.
-      if (MOP.isImplicit() && MOP.isDef()) {
-        if (!isRewritableImplicitDef(MOP.getParent()->getOpcode()))
-          return false;
-        return TRI->isSuperOrSubRegisterEq(
-            MOP.getParent()->getOperand(0).getReg(), MOP.getReg());
-      }
-    }
-    return MOP.isImplicit() ||
-           (MOP.isRenamable() && !MOP.isEarlyClobber() && !MOP.isTied());
-  };
 
   bool FoundDef = false;
 
@@ -1495,7 +1520,7 @@ canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
         if (!MOP.isReg() || !MOP.isDef() || MOP.isDebug() || !MOP.getReg() ||
             !TRI->regsOverlap(MOP.getReg(), RegToRename))
           continue;
-        if (!canRenameMOP(MOP)) {
+        if (!canRenameMOP(MOP, TRI)) {
           LLVM_DEBUG(dbgs()
                      << "  Cannot rename " << MOP << " in " << MI << "\n");
           return false;
@@ -1509,7 +1534,7 @@ canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
             !TRI->regsOverlap(MOP.getReg(), RegToRename))
           continue;
 
-        if (!canRenameMOP(MOP)) {
+        if (!canRenameMOP(MOP, TRI)) {
           LLVM_DEBUG(dbgs()
                      << "  Cannot rename " << MOP << " in " << MI << "\n");
           return false;
@@ -1530,6 +1555,56 @@ canRenameUpToDef(MachineInstr &FirstMI, LiveRegUnits &UsedInBetween,
   return true;
 }
 
+// We want to merge the second load into the first by rewriting the usages of
+// the same reg between first (incl.) and second (excl.). We don't need to care
+// about any insns before FirstLoad or after SecondLoad.
+// 1. The second load writes new value into the same reg.
+//    - The renaming is impossible to impact later use of the reg.
+//    - The second load always trash the value written by the first load which
+//      means the reg must be killed before the second load.
+// 2. The first load must be a def for the same reg so we don't need to look
+//    into anything before it.
+static bool canRenameUntilSecondLoad(
+    MachineInstr &FirstLoad, MachineInstr &SecondLoad,
+    LiveRegUnits &UsedInBetween,
+    SmallPtrSetImpl<const TargetRegisterClass *> &RequiredClasses,
+    const TargetRegisterInfo *TRI) {
+  if (FirstLoad.isPseudo())
+    return false;
+
+  UsedInBetween.accumulate(FirstLoad);
+  auto RegToRename = getLdStRegOp(FirstLoad).getReg();
+  bool Success = std::all_of(
+      FirstLoad.getIterator(), SecondLoad.getIterator(),
+      [&](MachineBasicBlock::iterator MBBI) {
+        MachineInstr &MI = *MBBI;
+
+        LLVM_DEBUG(dbgs() << "Checking " << MI << "\n");
+        // Currently we do not try to rename across frame-setup instructions.
+        if (MI.getFlag(MachineInstr::FrameSetup)) {
+          LLVM_DEBUG(dbgs()
+                     << "  Cannot rename framesetup instructions currently ("
+                     << MI << ")\n");
+          return false;
+        }
+
+        for (auto &MOP : MI.operands()) {
+          if (!MOP.isReg() || MOP.isDebug() || !MOP.getReg() ||
+              !TRI->regsOverlap(MOP.getReg(), RegToRename))
+            continue;
+          if (!canRenameMOP(MOP, TRI)) {
+            LLVM_DEBUG(dbgs()
+                       << "  Cannot rename " << MOP << " in " << MI << "\n");
+            return false;
+          }
+          RequiredClasses.insert(TRI->getMinimalPhysRegClass(MOP.getReg()));
+        }
+
+        return true;
+      });
+  return Success;
+}
+
 // Check if we can find a physical register for renaming \p Reg. This register
 // must:
 // * not be defined already in \p DefinedInBB; DefinedInBB must contain all
@@ -1578,6 +1653,41 @@ static std::optional<MCPhysReg> tryToFindRegisterToRename(
   return std::nullopt;
 }
 
+// For store pairs: returns a register from FirstMI to the beginning of the
+// block that can be renamed.
+// For load pairs: returns a register from FirstMI to MI that can be renamed.
+static std::optional<MCPhysReg> findRenameRegForSameLdStRegPair(
+    std::optional<bool> MaybeCanRename, MachineInstr &FirstMI, MachineInstr &MI,
+    Register Reg, LiveRegUnits &DefinedInBB, LiveRegUnits &UsedInBetween,
+    SmallPtrSetImpl<const TargetRegisterClass *> &RequiredClasses,
+    const TargetRegisterInfo *TRI) {
+  std::optional<MCPhysReg> RenameReg;
+  if (!DebugCounter::shouldExecute(RegRenamingCounter))
+    return RenameReg;
+
+  auto *RegClass = TRI->getMinimalPhysRegClass(getLdStRegOp(FirstMI).getReg());
+  MachineFunction &MF = *FirstMI.getParent()->getParent();
+  if (!RegClass || !MF.getRegInfo().tracksLiveness())
+    return RenameReg;
+
+  const bool IsLoad = FirstMI.mayLoad();
+
+  if (!MaybeCanRename) {
+    if (IsLoad)
+      MaybeCanRename = {canRenameUntilSecondLoad(FirstMI, MI, UsedInBetween,
+                                                 RequiredClasses, TRI)};
+    else
+      MaybeCanRename = {
+          canRenameUpToDef(FirstMI, UsedInBetween, RequiredClasses, TRI)};
+  }
+
+  if (*MaybeCanRename) {
+    RenameReg = tryToFindRegisterToRename(MF, Reg, DefinedInBB, UsedInBetween,
+                                          RequiredClasses, TRI);
+  }
+  return RenameReg;
+}
+
 /// Scan the instructions looking for a load/store that can be combined with the
 /// current instruction into a wider equivalent or a load/store pair.
 MachineBasicBlock::iterator
@@ -1730,17 +1840,6 @@ AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
             continue;
           }
         }
-        // If the destination register of one load is the same register or a
-        // sub/super register of the other load, bail and keep looking. A
-        // load-pair instruction with both destination registers the same is
-        // UNPREDICTABLE and will result in an exception.
-        if (MayLoad &&
-            TRI->isSuperOrSubRegisterEq(Reg, getLdStRegOp(MI).getReg())) {
-          LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits, UsedRegUnits,
-                                            TRI);
-          MemInsns.push_back(&MI);
-          continue;
-        }
 
         // If the BaseReg has been modified, then we cannot do the optimization.
         // For example, in the following pattern
@@ -1751,17 +1850,37 @@ AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
         if (!ModifiedRegUnits.available(BaseReg))
           return E;
 
+        const bool SameLoadReg = MayLoad && TRI->isSuperOrSubRegisterEq(
+                                                Reg, getLdStRegOp(MI).getReg());
+
         // If the Rt of the second instruction was not modified or used between
         // the two instructions and none of the instructions between the second
         // and first alias with the second, we can combine the second into the
         // first.
         if (ModifiedRegUnits.available(getLdStRegOp(MI).getReg()) &&
-            !(MI.mayLoad() &&
+            !(MI.mayLoad() && !SameLoadReg &&
               !UsedRegUnits.available(getLdStRegOp(MI).getReg())) &&
             !mayAlias(MI, MemInsns, AA)) {
+          // For pairs loading into the same reg, try to find a renaming
+          // opportunity to allow the renaming of Reg between FirstMI and MI
+          // and combine MI into FirstMI; otherwise bail and keep looking.
+          if (SameLoadReg) {
+            std::optional<MCPhysReg> RenameReg =
+                findRenameRegForSameLdStRegPair(MaybeCanRename, FirstMI, MI,
+                                                Reg, DefinedInBB, UsedInBetween,
+                                                RequiredClasses, TRI);
+            if (!RenameReg) {
+              LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits,
+                                                UsedRegUnits, TRI);
+              MemInsns.push_back(&MI);
+              continue;
+            }
+            Flags.setRenameReg(*RenameReg);
+          }
 
           Flags.setMergeForward(false);
-          Flags.clearRenameReg();
+          if (!SameLoadReg)
+            Flags.clearRenameReg();
           return MBBI;
         }
 
@@ -1779,22 +1898,13 @@ AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
             return MBBI;
           }
 
-          if (DebugCounter::shouldExecute(RegRenamingCounter)) {
-            if (!MaybeCanRename)
-              MaybeCanRename = {canRenameUpToDef(FirstMI, UsedInBetween,
-                                                 RequiredClasses, TRI)};
-
-            if (*MaybeCanRename) {
-              std::optional<MCPhysReg> MaybeRenameReg =
-                  tryToFindRegisterToRename(*FirstMI.getParent()->getParent(),
-                                            Reg, DefinedInBB, UsedInBetween,
-                                            RequiredClasses, TRI);
-              if (MaybeRenameReg) {
-                Flags.setRenameReg(*MaybeRenameReg);
-                Flags.setMergeForward(true);
-                MBBIWithRenameReg = MBBI;
-              }
-            }
+          std::optional<MCPhysReg> RenameReg = findRenameRegForSameLdStRegPair(
+              MaybeCanRename, FirstMI, MI, Reg, DefinedInBB, UsedInBetween,
+              RequiredClasses, TRI);
+          if (RenameReg) {
+            Flags.setMergeForward(true);
+            Flags.setRenameReg(*RenameReg);
+            MBBIWithRenameReg = MBBI;
           }
         }
         // Unable to combine these instructions due to interference in between.
diff --git a/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll b/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
index de07ed1b5d7ec96..e688af7fdeca38a 100644
--- a/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
@@ -125,9 +125,8 @@ define dso_local void @test_va_copy() {
 
 ; CHECK: add x[[SRC:[0-9]+]], {{x[0-9]+}}, :lo12:var
 
-; CHECK: ldr [[BLOCKB:q[0-9]+]], [x[[SRC]], #16]
+; CHECK: ldp [[BLOCKA:q[0-9]+]], [[BLOCKB:q[0-9]+]], [x[[SRC]]]
 ; CHECK: add x[[DST:[0-9]+]], {{x[0-9]+}}, :lo12:second_list
-; CHECK: ldr [[BLOCKA:q[0-9]+]], [x[[SRC]]]
 ; CHECK: stp [[BLOCKA]], [[BLOCKB]], [x[[DST]]]
   ret void
 ; CHECK: ret
diff --git a/llvm/test/CodeGen/AArch64/fexplog.ll b/llvm/test/CodeGen/AArch64/fexplog.ll
index be6040faa2836bf..26c0b68307b32b4 100644
--- a/llvm/test/CodeGen/AArch64/fexplog.ll
+++ b/llvm/test/CodeGen/AArch64/fexplog.ll
@@ -713,14 +713,12 @@ define <7 x half> @exp_v7f16(<7 x half> %a) {
 ; CHECK-GI-NEXT:    ldr x30, [sp, #144] // 8-byte Folded Reload
 ; CHECK-GI-NEXT:    ldp d11, d10, [sp, #112] // 16-byte Folded Reload
 ; CHECK-GI-NEXT:    mov v1.h[1], v2.h[0]
-; CHECK-GI-NEXT:    ldr q2, [sp, #48] // 16-byte Folded Reload
+; CHECK-GI-NEXT:    ldp q2, q3, [sp, #32] // 32-byte Folded Reload
 ; CHECK-GI-NEXT:    ldp d13, d12, [sp, #96] // 16-byte Folded Reload
-; CHECK-GI-NEXT:    mov v1.h[2], v2.h[0]
-; CHECK-GI-NEXT:    ldr q2, [sp, #32] // 16-byte Folded Reload
+; CHECK-GI-NEXT:    mov v1.h[2], v3.h[0]
 ; CHECK-GI-NEXT:    mov v1.h[3], v2.h[0]
-; CHECK-GI-NEXT:    ldr q2, [sp, #16] // 16-byte Folded Reload
-; CHECK-GI-NEXT:    mov v1.h[4], v2.h[0]
-; CHECK-GI-NEXT:    ldr q2, [sp] // 16-byte Folded Reload
+; CHECK-GI-NEXT:    ldp q2, q3, [sp] // 32-byte Folded Reload
+; CHECK-GI-NEXT:    mov v1.h[4], v3.h[0]
 ; CHECK-GI-NEXT:    mov v1.h[5], v2.h[0]
 ; CHECK-GI-NEXT:    mov v1.h[6]...
[truncated]

@nocchijiang
Copy link
Contributor Author

@mstorsjo @sjoerdmeijer @fhahn @MeeraN7

Since I don't have commit access, please help add reviewers for me and take a look at the PR when you are free. Thanks!

@davemgreen
Copy link
Collaborator

Hi. This looks like an interesting patch. Have you tried running a bootstrap or the llvm-test-suite, to make sure it behaves OK? Thanks

@nocchijiang
Copy link
Contributor Author

nocchijiang commented Nov 11, 2023

Have you tried running a bootstrap or the llvm-test-suite, to make sure it behaves OK?

I tested the patch by running llvm-test-suite before rebasing onto the tip of main. I just found that the test was broken by the patch landed 2 weeks ago, which is fixed by the commit I just pushed. @c-rhodes please also have a look at this to make sure I'm not breaking anything.

I'm not familiar with bootstrapping stuff. I'm working on an Apple Silicon Mac so I followed this guide to build Apple clang but no luck:

ninja: error: unknown target 'stage2-distribution'

Update: I just tried the simplest bootstrapping configuration introduced by the guide above and the bootstrapping process finished successfully. Please let me know if I should test some other stuff if necessary.

@nocchijiang nocchijiang force-pushed the aarch64_ldst_rename_ldr branch from d95829d to 126c69f Compare November 16, 2023 07:33
@nocchijiang
Copy link
Contributor Author

Rebased onto latest main to resolve conflicts.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Whenever we try these kind of things there seems to often be a high chance of problems occurring. Please make sure you have tested thoroughly, but from what I can tell this LGTM.

llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir Outdated Show resolved Hide resolved
The patch is based on a reverted patch: https://reviews.llvm.org/D103597.
It was trying to rename registers before alias check, which is not safe
and causes miscompiles. This patch does 2 things:

1. Do the renaming with necessary checks passed, including alias check.
2. Rename the register for the instructions between the pairs and
   combine the second load into the first. By doing so we can just check
   the renamability between the pairs and avoid scanning unknown amount
   of instructions before/after the pairs.

Necessary refactoring has been made in order to reuse as much code as
possible with STR renaming.
@nocchijiang nocchijiang force-pushed the aarch64_ldst_rename_ldr branch from b99641a to 1dca0af Compare November 23, 2023 08:06
@nocchijiang
Copy link
Contributor Author

Rebased, squashed all commits and addressed comments.

@davemgreen Since I don't have commit access yet, could you please merge the PR for me? Before pushing the update, I have tested the patch by compiling Chromium using the bootstrapped Clang, passing all the unit tests from Chromium on an Apple Silicon Mac.

@davemgreen
Copy link
Collaborator

Thanks. Sounds good.

@davemgreen davemgreen merged commit 147c5d6 into llvm:main Nov 23, 2023
2 of 3 checks passed
davemgreen added a commit that referenced this pull request Nov 29, 2023
This includes a couple of fixes after #71908 for bundles and some cleanup for
the debug output. One was an iterator type that asserted on bundles, the second
a rather subtle issue where forAllMIsUntilDef would hit the LdStLimit when
renaming registers, meaning the last instruction was not updated leaving an
invalid `ldp x6, x6` instruction.
sjoerdmeijer added a commit to sjoerdmeijer/llvm-project that referenced this pull request Jan 10, 2024
Target hook `canPairLdStOpc` is missing quite a few opcodes for which
LDPs/STPs can created. I was hoping that it would not be necessary to
add these missing opcodes here and that the attached motivating test
case would be handled by the LoadStoreOptimiser (especially after
 llvm#71908), but it's not. The problem is that after register allocation
some things are a lot harder to do. Consider this motivating example

[1] renamable $q1 = LDURQi renamable $x9, -16 :: (load (s128) from %ir.r51, align 8, !tbaa !0)
[2] renamable $q2 = LDURQi renamable $x0, -16 :: (load (s128) from %ir.r53, align 8, !tbaa !4)
[3] renamable $q1 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMLSv2f64 killed renamable $q1(tied-def 0), killed renamable $q2, renamable $q0, implicit $fpcr
[4] STURQi killed renamable $q1, renamable $x9, -16 :: (store (s128) into %ir.r51, align 1, !tbaa !0)
[5] renamable $q1 = LDRQui renamable $x9, 0 :: (load (s128) from %ir.r.G0001_609.0, align 8, !tbaa !0)

We can't combine the the load in line [5] into the load on [1]: regisister q1 is
used in between. And we can can't combine [1] into [5]: it is aliasing with
the STR on line [4].

That's why I thought that adding this opcode pair here to the MI scheduler was
a good compromise.
sjoerdmeijer added a commit that referenced this pull request Jan 11, 2024
Target hook `canPairLdStOpc` is missing quite a few opcodes for which
LDPs/STPs can created. I was hoping that it would not be necessary to
add these missing opcodes here and that the attached motivating test
case would be handled by the LoadStoreOptimiser (especially after
#71908), but it's not. The problem is that after register allocation
some things are a lot harder to do. Consider this for the motivating
example

```
[1] renamable $q1 = LDURQi renamable $x9, -16 :: (load (s128) from %ir.r51, align 8, !tbaa !0)
[2] renamable $q2 = LDURQi renamable $x0, -16 :: (load (s128) from %ir.r53, align 8, !tbaa !4)
[3] renamable $q1 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMLSv2f64 killed renamable $q1(tied-def 0), killed renamable $q2, renamable $q0, implicit $fpcr
[4] STURQi killed renamable $q1, renamable $x9, -16 :: (store (s128) into %ir.r51, align 1, !tbaa !0)
[5] renamable $q1 = LDRQui renamable $x9, 0 :: (load (s128) from %ir.r.G0001_609.0, align 8, !tbaa !0)
```
We can't combine the the load in line [5] into the load on [1]:
regisister q1 is used in between. And we can can't combine [1] into 
[5]: it is aliasing with the STR on line [4].

So, adding some missing opcodes here seems the best/easiest approach.
I will follow up to add some more missing cases here.
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
Target hook `canPairLdStOpc` is missing quite a few opcodes for which
LDPs/STPs can created. I was hoping that it would not be necessary to
add these missing opcodes here and that the attached motivating test
case would be handled by the LoadStoreOptimiser (especially after
llvm#71908), but it's not. The problem is that after register allocation
some things are a lot harder to do. Consider this for the motivating
example

```
[1] renamable $q1 = LDURQi renamable $x9, -16 :: (load (s128) from %ir.r51, align 8, !tbaa !0)
[2] renamable $q2 = LDURQi renamable $x0, -16 :: (load (s128) from %ir.r53, align 8, !tbaa !4)
[3] renamable $q1 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMLSv2f64 killed renamable $q1(tied-def 0), killed renamable $q2, renamable $q0, implicit $fpcr
[4] STURQi killed renamable $q1, renamable $x9, -16 :: (store (s128) into %ir.r51, align 1, !tbaa !0)
[5] renamable $q1 = LDRQui renamable $x9, 0 :: (load (s128) from %ir.r.G0001_609.0, align 8, !tbaa !0)
```
We can't combine the the load in line [5] into the load on [1]:
regisister q1 is used in between. And we can can't combine [1] into 
[5]: it is aliasing with the STR on line [4].

So, adding some missing opcodes here seems the best/easiest approach.
I will follow up to add some more missing cases here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants