[AArch64] Define high bits of FPR and GPR registers (take 2) #114827

sdesmalen-arm · 2024-11-04T16:31:32Z

This is a step towards enabling subreg liveness tracking for AArch64, which requires that registers are fully covered by their subregisters, as covered here #109797.

There are several changes in this patch:

AArch64RegisterInfo.td and tests: Define the high bits like B0_HI, H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some register class, this added a register class which meant that we had to update 'magic numbers' in several tests.

The use of ComposedSubRegIndex helped 'compress' the number of bits required for the lanemask. The correctness of the masks is tested by an explicit unit tests.
LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for register tuples, but with this change to describe the high bits, a register like 'D0' will also have 'HasDisjunctSubRegs' set to true (because it's fullly covered by S0 and S0_HI). The fix here is to explicitly test if the register class is one of the known D/Q/Z tuples.
TableGen: The handling of the isArtificial flag was entirely broken. Skipping out too early from some of the loops led to incorrect internal representation of the (sub)register(index) hierarchy, and thus resulted in incorrect TableGen info.

(This follows on from #114263 which was mysteriously closed.)

This is a step towards enabling subreg liveness tracking for AArch64, which requires that registers are fully covered by their subregisters, as covered here llvm#109797. There are several changes in this patch: * AArch64RegisterInfo.td and tests: Define the high bits like B0_HI, H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some register class, this added a register class which meant that we had to update 'magic numbers' in several tests. The use of ComposedSubRegIndex helped 'compress' the number of bits required for the lanemask. The correctness of the masks is tested by an explicit unit tests. * LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for register tuples, but with this change to describe the high bits, a register like 'D0' will also have 'HasDisjunctSubRegs' set to true (because it's fullly covered by S0 and S0_HI). The fix here is to explicitly test if the register class is one of the known D/Q/Z tuples. * TableGen: The handling of the isArtificial flag was entirely broken. Skipping out too early from some of the loops led to incorrect internal representation of the (sub)register(index) hierarchy, and thus resulted in incorrect TableGen info.

llvmbot · 2024-11-04T16:32:09Z

@llvm/pr-subscribers-tablegen
@llvm/pr-subscribers-tools-llvm-mca

@llvm/pr-subscribers-llvm-globalisel

Author: Sander de Smalen (sdesmalen-arm)

Changes

This is a step towards enabling subreg liveness tracking for AArch64, which requires that registers are fully covered by their subregisters, as covered here #109797.

There are several changes in this patch:

AArch64RegisterInfo.td and tests: Define the high bits like B0_HI, H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some register class, this added a register class which meant that we had to update 'magic numbers' in several tests.

The use of ComposedSubRegIndex helped 'compress' the number of bits required for the lanemask. The correctness of the masks is tested by an explicit unit tests.
LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for register tuples, but with this change to describe the high bits, a register like 'D0' will also have 'HasDisjunctSubRegs' set to true (because it's fullly covered by S0 and S0_HI). The fix here is to explicitly test if the register class is one of the known D/Q/Z tuples.
TableGen: The handling of the isArtificial flag was entirely broken. Skipping out too early from some of the loops led to incorrect internal representation of the (sub)register(index) hierarchy, and thus resulted in incorrect TableGen info.

(This follows on from #114263 which was mysteriously closed.)

Patch is 134.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114827.diff

21 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp (+4-1)
(modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+56-2)
(modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.td (+263-234)
(modified) llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir (+4-4)
(modified) llvm/test/CodeGen/AArch64/aarch64-sve-asm.ll (+11-11)
(modified) llvm/test/CodeGen/AArch64/blr-bti-preserves-operands.mir (+1-1)
(modified) llvm/test/CodeGen/AArch64/emit_fneg_with_non_register_operand.mir (+4-4)
(modified) llvm/test/CodeGen/AArch64/expand-blr-rvmarker-pseudo.mir (+6-6)
(modified) llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir (+30-30)
(modified) llvm/test/CodeGen/AArch64/machine-outliner-calls.mir (+1-1)
(modified) llvm/test/CodeGen/AArch64/misched-bundle.mir (+23-3)
(modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-01.mir (+12)
(modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-02.mir (+4-1)
(modified) llvm/test/CodeGen/AArch64/peephole-insvigpr.mir (+2-2)
(modified) llvm/test/CodeGen/AArch64/preserve.ll (+3-4)
(modified) llvm/test/CodeGen/AArch64/strpre-str-merge.mir (+14-14)
(modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.mir (+1-1)
(modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryComm-merging.mir (+1-1)
(modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryCommWithRev-merging.mir (+1-1)
(added) llvm/unittests/Target/AArch64/AArch64RegisterInfoTest.cpp (+152)
(modified) llvm/unittests/Target/AArch64/CMakeLists.txt (+1)

diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 1a9e5899892a1b..b051122609883f 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -1541,7 +1541,10 @@ static bool canRenameMOP(const MachineOperand &MOP,
     // Note that this relies on the structure of the AArch64 register file. In
     // particular, a subregister cannot be written without overwriting the
     // whole register.
-    if (RegClass->HasDisjunctSubRegs) {
+    if (RegClass->HasDisjunctSubRegs && RegClass->CoveredBySubRegs &&
+        (TRI->getSubRegisterClass(RegClass, AArch64::dsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::qsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::zsub0))) {
       LLVM_DEBUG(
           dbgs()
           << "  Cannot rename operands with multiple disjunct subregisters ("
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index 18290dd5f32df9..a885ce218f6eb2 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -424,6 +424,57 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
   return {};
 }
 
+static const MCPhysReg ReservedHi[] = {
+    AArch64::B0_HI,  AArch64::B1_HI,  AArch64::B2_HI,  AArch64::B3_HI,
+    AArch64::B4_HI,  AArch64::B5_HI,  AArch64::B6_HI,  AArch64::B7_HI,
+    AArch64::B8_HI,  AArch64::B9_HI,  AArch64::B10_HI, AArch64::B11_HI,
+    AArch64::B12_HI, AArch64::B13_HI, AArch64::B14_HI, AArch64::B15_HI,
+    AArch64::B16_HI, AArch64::B17_HI, AArch64::B18_HI, AArch64::B19_HI,
+    AArch64::B20_HI, AArch64::B21_HI, AArch64::B22_HI, AArch64::B23_HI,
+    AArch64::B24_HI, AArch64::B25_HI, AArch64::B26_HI, AArch64::B27_HI,
+    AArch64::B28_HI, AArch64::B29_HI, AArch64::B30_HI, AArch64::B31_HI,
+    AArch64::H0_HI,  AArch64::H1_HI,  AArch64::H2_HI,  AArch64::H3_HI,
+    AArch64::H4_HI,  AArch64::H5_HI,  AArch64::H6_HI,  AArch64::H7_HI,
+    AArch64::H8_HI,  AArch64::H9_HI,  AArch64::H10_HI, AArch64::H11_HI,
+    AArch64::H12_HI, AArch64::H13_HI, AArch64::H14_HI, AArch64::H15_HI,
+    AArch64::H16_HI, AArch64::H17_HI, AArch64::H18_HI, AArch64::H19_HI,
+    AArch64::H20_HI, AArch64::H21_HI, AArch64::H22_HI, AArch64::H23_HI,
+    AArch64::H24_HI, AArch64::H25_HI, AArch64::H26_HI, AArch64::H27_HI,
+    AArch64::H28_HI, AArch64::H29_HI, AArch64::H30_HI, AArch64::H31_HI,
+    AArch64::S0_HI,  AArch64::S1_HI,  AArch64::S2_HI,  AArch64::S3_HI,
+    AArch64::S4_HI,  AArch64::S5_HI,  AArch64::S6_HI,  AArch64::S7_HI,
+    AArch64::S8_HI,  AArch64::S9_HI,  AArch64::S10_HI, AArch64::S11_HI,
+    AArch64::S12_HI, AArch64::S13_HI, AArch64::S14_HI, AArch64::S15_HI,
+    AArch64::S16_HI, AArch64::S17_HI, AArch64::S18_HI, AArch64::S19_HI,
+    AArch64::S20_HI, AArch64::S21_HI, AArch64::S22_HI, AArch64::S23_HI,
+    AArch64::S24_HI, AArch64::S25_HI, AArch64::S26_HI, AArch64::S27_HI,
+    AArch64::S28_HI, AArch64::S29_HI, AArch64::S30_HI, AArch64::S31_HI,
+    AArch64::D0_HI,  AArch64::D1_HI,  AArch64::D2_HI,  AArch64::D3_HI,
+    AArch64::D4_HI,  AArch64::D5_HI,  AArch64::D6_HI,  AArch64::D7_HI,
+    AArch64::D8_HI,  AArch64::D9_HI,  AArch64::D10_HI, AArch64::D11_HI,
+    AArch64::D12_HI, AArch64::D13_HI, AArch64::D14_HI, AArch64::D15_HI,
+    AArch64::D16_HI, AArch64::D17_HI, AArch64::D18_HI, AArch64::D19_HI,
+    AArch64::D20_HI, AArch64::D21_HI, AArch64::D22_HI, AArch64::D23_HI,
+    AArch64::D24_HI, AArch64::D25_HI, AArch64::D26_HI, AArch64::D27_HI,
+    AArch64::D28_HI, AArch64::D29_HI, AArch64::D30_HI, AArch64::D31_HI,
+    AArch64::Q0_HI,  AArch64::Q1_HI,  AArch64::Q2_HI,  AArch64::Q3_HI,
+    AArch64::Q4_HI,  AArch64::Q5_HI,  AArch64::Q6_HI,  AArch64::Q7_HI,
+    AArch64::Q8_HI,  AArch64::Q9_HI,  AArch64::Q10_HI, AArch64::Q11_HI,
+    AArch64::Q12_HI, AArch64::Q13_HI, AArch64::Q14_HI, AArch64::Q15_HI,
+    AArch64::Q16_HI, AArch64::Q17_HI, AArch64::Q18_HI, AArch64::Q19_HI,
+    AArch64::Q20_HI, AArch64::Q21_HI, AArch64::Q22_HI, AArch64::Q23_HI,
+    AArch64::Q24_HI, AArch64::Q25_HI, AArch64::Q26_HI, AArch64::Q27_HI,
+    AArch64::Q28_HI, AArch64::Q29_HI, AArch64::Q30_HI, AArch64::Q31_HI,
+    AArch64::W0_HI,  AArch64::W1_HI,  AArch64::W2_HI,  AArch64::W3_HI,
+    AArch64::W4_HI,  AArch64::W5_HI,  AArch64::W6_HI,  AArch64::W7_HI,
+    AArch64::W8_HI,  AArch64::W9_HI,  AArch64::W10_HI, AArch64::W11_HI,
+    AArch64::W12_HI, AArch64::W13_HI, AArch64::W14_HI, AArch64::W15_HI,
+    AArch64::W16_HI, AArch64::W17_HI, AArch64::W18_HI, AArch64::W19_HI,
+    AArch64::W20_HI, AArch64::W21_HI, AArch64::W22_HI, AArch64::W23_HI,
+    AArch64::W24_HI, AArch64::W25_HI, AArch64::W26_HI, AArch64::W27_HI,
+    AArch64::W28_HI, AArch64::W29_HI, AArch64::W30_HI, AArch64::WSP_HI,
+    AArch64::WZR_HI};
+
 BitVector
 AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
   const AArch64FrameLowering *TFI = getFrameLowering(MF);
@@ -490,7 +541,10 @@ AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
     markSuperRegs(Reserved, AArch64::W28);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  for (Register R : ReservedHi)
+    Reserved.set(R);
+
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
@@ -514,7 +568,7 @@ AArch64RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
       markSuperRegs(Reserved, AArch64::LR);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
index 4117d74d10c1e7..5f7f6aa7a7bf99 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
@@ -20,33 +20,39 @@ class AArch64Reg<bits<16> enc, string n, list<Register> subregs = [],
 
 let Namespace = "AArch64" in {
   // SubRegIndexes for GPR registers
-  def sub_32 : SubRegIndex<32>;
-  def sube64 : SubRegIndex<64>;
-  def subo64 : SubRegIndex<64>;
-  def sube32 : SubRegIndex<32>;
-  def subo32 : SubRegIndex<32>;
+  def sub_32   : SubRegIndex<32,  0>;
+  def sub_32_hi: SubRegIndex<32, 32>;
+  def sube64   : SubRegIndex<64>;
+  def subo64   : SubRegIndex<64>;
+  def sube32   : SubRegIndex<32>;
+  def subo32   : SubRegIndex<32>;
 
   // SubRegIndexes for FPR/Vector registers
-  def bsub : SubRegIndex<8>;
-  def hsub : SubRegIndex<16>;
-  def ssub : SubRegIndex<32>;
-  def dsub : SubRegIndex<64>;
-  def zsub : SubRegIndex<128>;
+  def bsub    : SubRegIndex<8, 0>;
+  def bsub_hi : SubRegIndex<8, 8>;
+  def hsub    : SubRegIndex<16, 0>;
+  def hsub_hi : SubRegIndex<16, 16>;
+  def ssub    : SubRegIndex<32, 0>;
+  def ssub_hi : SubRegIndex<32, 32>;
+  def dsub    : SubRegIndex<64, 0>;
+  def dsub_hi : SubRegIndex<64, 64>;
+  def zsub    : SubRegIndex<128, 0>;
+  def zsub_hi : SubRegIndex<-1, 128>;
   // Note: Code depends on these having consecutive numbers
-  def zsub0 : SubRegIndex<128, -1>;
-  def zsub1 : SubRegIndex<128, -1>;
-  def zsub2 : SubRegIndex<128, -1>;
-  def zsub3 : SubRegIndex<128, -1>;
-  // Note: Code depends on these having consecutive numbers
-  def dsub0 : SubRegIndex<64>;
-  def dsub1 : SubRegIndex<64>;
-  def dsub2 : SubRegIndex<64>;
-  def dsub3 : SubRegIndex<64>;
+  def zsub0 : SubRegIndex<-1>;
+  def zsub1 : SubRegIndex<-1>;
+  def zsub2 : SubRegIndex<-1>;
+  def zsub3 : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
   def qsub0 : SubRegIndex<128>;
-  def qsub1 : SubRegIndex<128>;
-  def qsub2 : SubRegIndex<128>;
-  def qsub3 : SubRegIndex<128>;
+  def qsub1 : ComposedSubRegIndex<zsub1, zsub>;
+  def qsub2 : ComposedSubRegIndex<zsub2, zsub>;
+  def qsub3 : ComposedSubRegIndex<zsub3, zsub>;
+  // Note: Code depends on these having consecutive numbers
+  def dsub0 : SubRegIndex<64>;
+  def dsub1 : ComposedSubRegIndex<qsub1, dsub>;
+  def dsub2 : ComposedSubRegIndex<qsub2, dsub>;
+  def dsub3 : ComposedSubRegIndex<qsub3, dsub>;
 
   // SubRegIndexes for SME Matrix tiles
   def zasubb  : SubRegIndex<2048>; // (16 x 16)/1 bytes  = 2048 bits
@@ -60,10 +66,10 @@ let Namespace = "AArch64" in {
   def zasubq1 : SubRegIndex<128>;  // (16 x 16)/16 bytes = 128 bits
 
   // SubRegIndexes for SVE Predicates
-  def psub  : SubRegIndex<16>;
+  def psub  : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
-  def psub0 : SubRegIndex<16, -1>;
-  def psub1 : SubRegIndex<16, -1>;
+  def psub0 : SubRegIndex<-1>;
+  def psub1 : SubRegIndex<-1>;
 }
 
 let Namespace = "AArch64" in {
@@ -74,6 +80,14 @@ let Namespace = "AArch64" in {
 //===----------------------------------------------------------------------===//
 // Registers
 //===----------------------------------------------------------------------===//
+
+foreach i = 0-30 in {
+  // Define W0_HI, W1_HI, .. W30_HI
+  def W#i#_HI : AArch64Reg<-1,  "w"#i#"_hi"> { let isArtificial = 1; }
+}
+def WSP_HI : AArch64Reg<-1,  "wsp_hi"> { let isArtificial = 1; }
+def WZR_HI : AArch64Reg<-1,  "wzr_hi"> { let isArtificial = 1; }
+
 def W0    : AArch64Reg<0,   "w0" >, DwarfRegNum<[0]>;
 def W1    : AArch64Reg<1,   "w1" >, DwarfRegNum<[1]>;
 def W2    : AArch64Reg<2,   "w2" >, DwarfRegNum<[2]>;
@@ -106,44 +120,42 @@ def W28   : AArch64Reg<28, "w28">, DwarfRegNum<[28]>;
 def W29   : AArch64Reg<29, "w29">, DwarfRegNum<[29]>;
 def W30   : AArch64Reg<30, "w30">, DwarfRegNum<[30]>;
 def WSP   : AArch64Reg<31, "wsp">, DwarfRegNum<[31]>;
-let isConstant = true in
-def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP>;
-
-let SubRegIndices = [sub_32] in {
-def X0    : AArch64Reg<0,   "x0",  [W0]>, DwarfRegAlias<W0>;
-def X1    : AArch64Reg<1,   "x1",  [W1]>, DwarfRegAlias<W1>;
-def X2    : AArch64Reg<2,   "x2",  [W2]>, DwarfRegAlias<W2>;
-def X3    : AArch64Reg<3,   "x3",  [W3]>, DwarfRegAlias<W3>;
-def X4    : AArch64Reg<4,   "x4",  [W4]>, DwarfRegAlias<W4>;
-def X5    : AArch64Reg<5,   "x5",  [W5]>, DwarfRegAlias<W5>;
-def X6    : AArch64Reg<6,   "x6",  [W6]>, DwarfRegAlias<W6>;
-def X7    : AArch64Reg<7,   "x7",  [W7]>, DwarfRegAlias<W7>;
-def X8    : AArch64Reg<8,   "x8",  [W8]>, DwarfRegAlias<W8>;
-def X9    : AArch64Reg<9,   "x9",  [W9]>, DwarfRegAlias<W9>;
-def X10   : AArch64Reg<10, "x10", [W10]>, DwarfRegAlias<W10>;
-def X11   : AArch64Reg<11, "x11", [W11]>, DwarfRegAlias<W11>;
-def X12   : AArch64Reg<12, "x12", [W12]>, DwarfRegAlias<W12>;
-def X13   : AArch64Reg<13, "x13", [W13]>, DwarfRegAlias<W13>;
-def X14   : AArch64Reg<14, "x14", [W14]>, DwarfRegAlias<W14>;
-def X15   : AArch64Reg<15, "x15", [W15]>, DwarfRegAlias<W15>;
-def X16   : AArch64Reg<16, "x16", [W16]>, DwarfRegAlias<W16>;
-def X17   : AArch64Reg<17, "x17", [W17]>, DwarfRegAlias<W17>;
-def X18   : AArch64Reg<18, "x18", [W18]>, DwarfRegAlias<W18>;
-def X19   : AArch64Reg<19, "x19", [W19]>, DwarfRegAlias<W19>;
-def X20   : AArch64Reg<20, "x20", [W20]>, DwarfRegAlias<W20>;
-def X21   : AArch64Reg<21, "x21", [W21]>, DwarfRegAlias<W21>;
-def X22   : AArch64Reg<22, "x22", [W22]>, DwarfRegAlias<W22>;
-def X23   : AArch64Reg<23, "x23", [W23]>, DwarfRegAlias<W23>;
-def X24   : AArch64Reg<24, "x24", [W24]>, DwarfRegAlias<W24>;
-def X25   : AArch64Reg<25, "x25", [W25]>, DwarfRegAlias<W25>;
-def X26   : AArch64Reg<26, "x26", [W26]>, DwarfRegAlias<W26>;
-def X27   : AArch64Reg<27, "x27", [W27]>, DwarfRegAlias<W27>;
-def X28   : AArch64Reg<28, "x28", [W28]>, DwarfRegAlias<W28>;
-def FP    : AArch64Reg<29, "x29", [W29]>, DwarfRegAlias<W29>;
-def LR    : AArch64Reg<30, "x30", [W30]>, DwarfRegAlias<W30>;
-def SP    : AArch64Reg<31, "sp",  [WSP]>, DwarfRegAlias<WSP>;
-let isConstant = true in
-def XZR   : AArch64Reg<31, "xzr", [WZR]>, DwarfRegAlias<WSP>;
+def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP> { let isConstant = true; }
+
+let SubRegIndices = [sub_32, sub_32_hi], CoveredBySubRegs = 1 in {
+def X0    : AArch64Reg<0,   "x0",  [W0,  W0_HI]>, DwarfRegAlias<W0>;
+def X1    : AArch64Reg<1,   "x1",  [W1,  W1_HI]>, DwarfRegAlias<W1>;
+def X2    : AArch64Reg<2,   "x2",  [W2,  W2_HI]>, DwarfRegAlias<W2>;
+def X3    : AArch64Reg<3,   "x3",  [W3,  W3_HI]>, DwarfRegAlias<W3>;
+def X4    : AArch64Reg<4,   "x4",  [W4,  W4_HI]>, DwarfRegAlias<W4>;
+def X5    : AArch64Reg<5,   "x5",  [W5,  W5_HI]>, DwarfRegAlias<W5>;
+def X6    : AArch64Reg<6,   "x6",  [W6,  W6_HI]>, DwarfRegAlias<W6>;
+def X7    : AArch64Reg<7,   "x7",  [W7,  W7_HI]>, DwarfRegAlias<W7>;
+def X8    : AArch64Reg<8,   "x8",  [W8,  W8_HI]>, DwarfRegAlias<W8>;
+def X9    : AArch64Reg<9,   "x9",  [W9,  W9_HI]>, DwarfRegAlias<W9>;
+def X10   : AArch64Reg<10, "x10", [W10, W10_HI]>, DwarfRegAlias<W10>;
+def X11   : AArch64Reg<11, "x11", [W11, W11_HI]>, DwarfRegAlias<W11>;
+def X12   : AArch64Reg<12, "x12", [W12, W12_HI]>, DwarfRegAlias<W12>;
+def X13   : AArch64Reg<13, "x13", [W13, W13_HI]>, DwarfRegAlias<W13>;
+def X14   : AArch64Reg<14, "x14", [W14, W14_HI]>, DwarfRegAlias<W14>;
+def X15   : AArch64Reg<15, "x15", [W15, W15_HI]>, DwarfRegAlias<W15>;
+def X16   : AArch64Reg<16, "x16", [W16, W16_HI]>, DwarfRegAlias<W16>;
+def X17   : AArch64Reg<17, "x17", [W17, W17_HI]>, DwarfRegAlias<W17>;
+def X18   : AArch64Reg<18, "x18", [W18, W18_HI]>, DwarfRegAlias<W18>;
+def X19   : AArch64Reg<19, "x19", [W19, W19_HI]>, DwarfRegAlias<W19>;
+def X20   : AArch64Reg<20, "x20", [W20, W20_HI]>, DwarfRegAlias<W20>;
+def X21   : AArch64Reg<21, "x21", [W21, W21_HI]>, DwarfRegAlias<W21>;
+def X22   : AArch64Reg<22, "x22", [W22, W22_HI]>, DwarfRegAlias<W22>;
+def X23   : AArch64Reg<23, "x23", [W23, W23_HI]>, DwarfRegAlias<W23>;
+def X24   : AArch64Reg<24, "x24", [W24, W24_HI]>, DwarfRegAlias<W24>;
+def X25   : AArch64Reg<25, "x25", [W25, W25_HI]>, DwarfRegAlias<W25>;
+def X26   : AArch64Reg<26, "x26", [W26, W26_HI]>, DwarfRegAlias<W26>;
+def X27   : AArch64Reg<27, "x27", [W27, W27_HI]>, DwarfRegAlias<W27>;
+def X28   : AArch64Reg<28, "x28", [W28, W28_HI]>, DwarfRegAlias<W28>;
+def FP    : AArch64Reg<29, "x29", [W29, W29_HI]>, DwarfRegAlias<W29>;
+def LR    : AArch64Reg<30, "x30", [W30, W30_HI]>, DwarfRegAlias<W30>;
+def SP    : AArch64Reg<31, "sp",  [WSP, WSP_HI]>, DwarfRegAlias<WSP>;
+def XZR   : AArch64Reg<31, "xzr", [WZR, WZR_HI]>, DwarfRegAlias<WSP> { let isConstant = true; }
 }
 
 // Condition code register.
@@ -290,6 +302,14 @@ def CCR : RegisterClass<"AArch64", [i32], 32, (add NZCV)> {
 // Floating Point Scalar Registers
 //===----------------------------------------------------------------------===//
 
+foreach i = 0-31 in {
+  def B#i#_HI : AArch64Reg<-1,  "b"#i#"_hi"> { let isArtificial = 1; }
+  def H#i#_HI : AArch64Reg<-1,  "h"#i#"_hi"> { let isArtificial = 1; }
+  def S#i#_HI : AArch64Reg<-1,  "s"#i#"_hi"> { let isArtificial = 1; }
+  def D#i#_HI : AArch64Reg<-1,  "d"#i#"_hi"> { let isArtificial = 1; }
+  def Q#i#_HI : AArch64Reg<-1,  "q"#i#"_hi"> { let isArtificial = 1; }
+}
+
 def B0    : AArch64Reg<0,   "b0">, DwarfRegNum<[64]>;
 def B1    : AArch64Reg<1,   "b1">, DwarfRegNum<[65]>;
 def B2    : AArch64Reg<2,   "b2">, DwarfRegNum<[66]>;
@@ -323,144 +343,144 @@ def B29   : AArch64Reg<29, "b29">, DwarfRegNum<[93]>;
 def B30   : AArch64Reg<30, "b30">, DwarfRegNum<[94]>;
 def B31   : AArch64Reg<31, "b31">, DwarfRegNum<[95]>;
 
-let SubRegIndices = [bsub] in {
-def H0    : AArch64Reg<0,   "h0", [B0]>, DwarfRegAlias<B0>;
-def H1    : AArch64Reg<1,   "h1", [B1]>, DwarfRegAlias<B1>;
-def H2    : AArch64Reg<2,   "h2", [B2]>, DwarfRegAlias<B2>;
-def H3    : AArch64Reg<3,   "h3", [B3]>, DwarfRegAlias<B3>;
-def H4    : AArch64Reg<4,   "h4", [B4]>, DwarfRegAlias<B4>;
-def H5    : AArch64Reg<5,   "h5", [B5]>, DwarfRegAlias<B5>;
-def H6    : AArch64Reg<6,   "h6", [B6]>, DwarfRegAlias<B6>;
-def H7    : AArch64Reg<7,   "h7", [B7]>, DwarfRegAlias<B7>;
-def H8    : AArch64Reg<8,   "h8", [B8]>, DwarfRegAlias<B8>;
-def H9    : AArch64Reg<9,   "h9", [B9]>, DwarfRegAlias<B9>;
-def H10   : AArch64Reg<10, "h10", [B10]>, DwarfRegAlias<B10>;
-def H11   : AArch64Reg<11, "h11", [B11]>, DwarfRegAlias<B11>;
-def H12   : AArch64Reg<12, "h12", [B12]>, DwarfRegAlias<B12>;
-def H13   : AArch64Reg<13, "h13", [B13]>, DwarfRegAlias<B13>;
-def H14   : AArch64Reg<14, "h14", [B14]>, DwarfRegAlias<B14>;
-def H15   : AArch64Reg<15, "h15", [B15]>, DwarfRegAlias<B15>;
-def H16   : AArch64Reg<16, "h16", [B16]>, DwarfRegAlias<B16>;
-def H17   : AArch64Reg<17, "h17", [B17]>, DwarfRegAlias<B17>;
-def H18   : AArch64Reg<18, "h18", [B18]>, DwarfRegAlias<B18>;
-def H19   : AArch64Reg<19, "h19", [B19]>, DwarfRegAlias<B19>;
-def H20   : AArch64Reg<20, "h20", [B20]>, DwarfRegAlias<B20>;
-def H21   : AArch64Reg<21, "h21", [B21]>, DwarfRegAlias<B21>;
-def H22   : AArch64Reg<22, "h22", [B22]>, DwarfRegAlias<B22>;
-def H23   : AArch64Reg<23, "h23", [B23]>, DwarfRegAlias<B23>;
-def H24   : AArch64Reg<24, "h24", [B24]>, DwarfRegAlias<B24>;
-def H25   : AArch64Reg<25, "h25", [B25]>, DwarfRegAlias<B25>;
-def H26   : AArch64Reg<26, "h26", [B26]>, DwarfRegAlias<B26>;
-def H27   : AArch64Reg<27, "h27", [B27]>, DwarfRegAlias<B27>;
-def H28   : AArch64Reg<28, "h28", [B28]>, DwarfRegAlias<B28>;
-def H29   : AArch64Reg<29, "h29", [B29]>, DwarfRegAlias<B29>;
-def H30   : AArch64Reg<30, "h30", [B30]>, DwarfRegAlias<B30>;
-def H31   : AArch64Reg<31, "h31", [B31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [hsub] in {
-def S0    : AArch64Reg<0,   "s0", [H0]>, DwarfRegAlias<B0>;
-def S1    : AArch64Reg<1,   "s1", [H1]>, DwarfRegAlias<B1>;
-def S2    : AArch64Reg<2,   "s2", [H2]>, DwarfRegAlias<B2>;
-def S3    : AArch64Reg<3,   "s3", [H3]>, DwarfRegAlias<B3>;
-def S4    : AArch64Reg<4,   "s4", [H4]>, DwarfRegAlias<B4>;
-def S5    : AArch64Reg<5,   "s5", [H5]>, DwarfRegAlias<B5>;
-def S6    : AArch64Reg<6,   "s6", [H6]>, DwarfRegAlias<B6>;
-def S7    : AArch64Reg<7,   "s7", [H7]>, DwarfRegAlias<B7>;
-def S8    : AArch64Reg<8,   "s8", [H8]>, DwarfRegAlias<B8>;
-def S9    : AArch64Reg<9,   "s9", [H9]>, DwarfRegAlias<B9>;
-def S10   : AArch64Reg<10, "s10", [H10]>, DwarfRegAlias<B10>;
-def S11   : AArch64Reg<11, "s11", [H11]>, DwarfRegAlias<B11>;
-def S12   : AArch64Reg<12, "s12", [H12]>, DwarfRegAlias<B12>;
-def S13   : AArch64Reg<13, "s13", [H13]>, DwarfRegAlias<B13>;
-def S14   : AArch64Reg<14, "s14", [H14]>, DwarfRegAlias<B14>;
-def S15   : AArch64Reg<15, "s15", [H15]>, DwarfRegAlias<B15>;
-def S16   : AArch64Reg<16, "s16", [H16]>, DwarfRegAlias<B16>;
-def S17   : AArch64Reg<17, "s17", [H17]>, DwarfRegAlias<B17>;
-def S18   : AArch64Reg<18, "s18", [H18]>, DwarfRegAlias<B18>;
-def S19   : AArch64Reg<19, "s19", [H19]>, DwarfRegAlias<B19>;
-def S20   : AArch64Reg<20, "s20", [H20]>, DwarfRegAlias<B20>;
-def S21   : AArch64Reg<21, "s21", [H21]>, DwarfRegAlias<B21>;
-def S22   : AArch64Reg<22, "s22", [H22]>, DwarfRegAlias<B22>;
-def S23   : AArch64Reg<23, "s23", [H23]>, DwarfRegAlias<B23>;
-def S24   : AArch64Reg<24, "s24", [H24]>, DwarfRegAlias<B24>;
-def S25   : AArch64Reg<25, "s25", [H25]>, DwarfRegAlias<B25>;
-def S26   : AArch64Reg<26, "s26", [H26]>, DwarfRegAlias<B26>;
-def S27   : AArch64Reg<27, "s27", [H27]>, DwarfRegAlias<B27>;
-def S28   : AArch64Reg<28, "s28", [H28]>, DwarfRegAlias<B28>;
-def S29   : AArch64Reg<29, "s29", [H29]>, DwarfRegAlias<B29>;
-def S30   : AArch64Reg<30, "s30", [H30]>, DwarfRegAlias<B30>;
-def S31   : AArch64Reg<31, "s31", [H31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [ssub], RegAltNameIndices = [vreg, vlist1] in {
-def D0    : AArch64Reg<0,   "d0", [S0], ["v0", ""]>, DwarfRegAlias<B0>;
-def D1    : AArch64Reg<1,   "d1", [S1], ["v1", ""]>, DwarfRegAlias<B1>;
-def D2    : AArch64Reg<2,   "d2", [S2], ["v2", ""]>, DwarfRegAlias<B2>;
-def D3    : AArch64Reg<3,   "d3", [S3], ["v3", ""]>, DwarfRegAlias<B3>;
-def D4    : AArch64Reg<4,   "d4", [S4], ["v4", ""]>, DwarfRegAlias<B4>;
-def D5    : AArch64Reg<5,   "d5", [S5], ["v5", ""]>, DwarfRegAlias<B5>;
-def D6    : AArch64Reg<6,   "d6", [S6], ["v6", ""]>, DwarfRegAlias<B6>;
-def D7    : AArch64Reg<7,   "d7", [S7], ["v7", ""]>, DwarfRegAlias<B7>;
-def D8    : AArch64Reg<8,   "d8", [S8], ["v8", ""]>, DwarfRegAlias<B8>;
-def D9    : AArch64Reg<9,   "d9", [S9], ["v9", ""]>, DwarfRegAlias<B9>;
-def D10   : AArch64Reg<10, "d10", [S10], ["v10", ""]>, DwarfRegAlias<B10>;
-def D11   : AArch64Reg<11, "d11", [S11], ["v11", ""]>, DwarfRegAlias<B11>;
-def D12   : AArch64Reg<12, "d12", [S12], ["v12", ""]>, DwarfRegAlias<B12>;
-def D13   : AArch64Reg<13, "d13", [S13], ["v13", ""]>, DwarfRegAl...
[truncated]

llvmbot · 2024-11-04T16:32:09Z

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

This is a step towards enabling subreg liveness tracking for AArch64, which requires that registers are fully covered by their subregisters, as covered here #109797.

There are several changes in this patch:

AArch64RegisterInfo.td and tests: Define the high bits like B0_HI, H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some register class, this added a register class which meant that we had to update 'magic numbers' in several tests.

The use of ComposedSubRegIndex helped 'compress' the number of bits required for the lanemask. The correctness of the masks is tested by an explicit unit tests.
LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for register tuples, but with this change to describe the high bits, a register like 'D0' will also have 'HasDisjunctSubRegs' set to true (because it's fullly covered by S0 and S0_HI). The fix here is to explicitly test if the register class is one of the known D/Q/Z tuples.
TableGen: The handling of the isArtificial flag was entirely broken. Skipping out too early from some of the loops led to incorrect internal representation of the (sub)register(index) hierarchy, and thus resulted in incorrect TableGen info.

(This follows on from #114263 which was mysteriously closed.)

Patch is 134.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114827.diff

21 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp (+4-1)
(modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+56-2)
(modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.td (+263-234)
(modified) llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir (+4-4)
(modified) llvm/test/CodeGen/AArch64/aarch64-sve-asm.ll (+11-11)
(modified) llvm/test/CodeGen/AArch64/blr-bti-preserves-operands.mir (+1-1)
(modified) llvm/test/CodeGen/AArch64/emit_fneg_with_non_register_operand.mir (+4-4)
(modified) llvm/test/CodeGen/AArch64/expand-blr-rvmarker-pseudo.mir (+6-6)
(modified) llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir (+30-30)
(modified) llvm/test/CodeGen/AArch64/machine-outliner-calls.mir (+1-1)
(modified) llvm/test/CodeGen/AArch64/misched-bundle.mir (+23-3)
(modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-01.mir (+12)
(modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-02.mir (+4-1)
(modified) llvm/test/CodeGen/AArch64/peephole-insvigpr.mir (+2-2)
(modified) llvm/test/CodeGen/AArch64/preserve.ll (+3-4)
(modified) llvm/test/CodeGen/AArch64/strpre-str-merge.mir (+14-14)
(modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.mir (+1-1)
(modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryComm-merging.mir (+1-1)
(modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryCommWithRev-merging.mir (+1-1)
(added) llvm/unittests/Target/AArch64/AArch64RegisterInfoTest.cpp (+152)
(modified) llvm/unittests/Target/AArch64/CMakeLists.txt (+1)

diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 1a9e5899892a1b..b051122609883f 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -1541,7 +1541,10 @@ static bool canRenameMOP(const MachineOperand &MOP,
     // Note that this relies on the structure of the AArch64 register file. In
     // particular, a subregister cannot be written without overwriting the
     // whole register.
-    if (RegClass->HasDisjunctSubRegs) {
+    if (RegClass->HasDisjunctSubRegs && RegClass->CoveredBySubRegs &&
+        (TRI->getSubRegisterClass(RegClass, AArch64::dsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::qsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::zsub0))) {
       LLVM_DEBUG(
           dbgs()
           << "  Cannot rename operands with multiple disjunct subregisters ("
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index 18290dd5f32df9..a885ce218f6eb2 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -424,6 +424,57 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
   return {};
 }
 
+static const MCPhysReg ReservedHi[] = {
+    AArch64::B0_HI,  AArch64::B1_HI,  AArch64::B2_HI,  AArch64::B3_HI,
+    AArch64::B4_HI,  AArch64::B5_HI,  AArch64::B6_HI,  AArch64::B7_HI,
+    AArch64::B8_HI,  AArch64::B9_HI,  AArch64::B10_HI, AArch64::B11_HI,
+    AArch64::B12_HI, AArch64::B13_HI, AArch64::B14_HI, AArch64::B15_HI,
+    AArch64::B16_HI, AArch64::B17_HI, AArch64::B18_HI, AArch64::B19_HI,
+    AArch64::B20_HI, AArch64::B21_HI, AArch64::B22_HI, AArch64::B23_HI,
+    AArch64::B24_HI, AArch64::B25_HI, AArch64::B26_HI, AArch64::B27_HI,
+    AArch64::B28_HI, AArch64::B29_HI, AArch64::B30_HI, AArch64::B31_HI,
+    AArch64::H0_HI,  AArch64::H1_HI,  AArch64::H2_HI,  AArch64::H3_HI,
+    AArch64::H4_HI,  AArch64::H5_HI,  AArch64::H6_HI,  AArch64::H7_HI,
+    AArch64::H8_HI,  AArch64::H9_HI,  AArch64::H10_HI, AArch64::H11_HI,
+    AArch64::H12_HI, AArch64::H13_HI, AArch64::H14_HI, AArch64::H15_HI,
+    AArch64::H16_HI, AArch64::H17_HI, AArch64::H18_HI, AArch64::H19_HI,
+    AArch64::H20_HI, AArch64::H21_HI, AArch64::H22_HI, AArch64::H23_HI,
+    AArch64::H24_HI, AArch64::H25_HI, AArch64::H26_HI, AArch64::H27_HI,
+    AArch64::H28_HI, AArch64::H29_HI, AArch64::H30_HI, AArch64::H31_HI,
+    AArch64::S0_HI,  AArch64::S1_HI,  AArch64::S2_HI,  AArch64::S3_HI,
+    AArch64::S4_HI,  AArch64::S5_HI,  AArch64::S6_HI,  AArch64::S7_HI,
+    AArch64::S8_HI,  AArch64::S9_HI,  AArch64::S10_HI, AArch64::S11_HI,
+    AArch64::S12_HI, AArch64::S13_HI, AArch64::S14_HI, AArch64::S15_HI,
+    AArch64::S16_HI, AArch64::S17_HI, AArch64::S18_HI, AArch64::S19_HI,
+    AArch64::S20_HI, AArch64::S21_HI, AArch64::S22_HI, AArch64::S23_HI,
+    AArch64::S24_HI, AArch64::S25_HI, AArch64::S26_HI, AArch64::S27_HI,
+    AArch64::S28_HI, AArch64::S29_HI, AArch64::S30_HI, AArch64::S31_HI,
+    AArch64::D0_HI,  AArch64::D1_HI,  AArch64::D2_HI,  AArch64::D3_HI,
+    AArch64::D4_HI,  AArch64::D5_HI,  AArch64::D6_HI,  AArch64::D7_HI,
+    AArch64::D8_HI,  AArch64::D9_HI,  AArch64::D10_HI, AArch64::D11_HI,
+    AArch64::D12_HI, AArch64::D13_HI, AArch64::D14_HI, AArch64::D15_HI,
+    AArch64::D16_HI, AArch64::D17_HI, AArch64::D18_HI, AArch64::D19_HI,
+    AArch64::D20_HI, AArch64::D21_HI, AArch64::D22_HI, AArch64::D23_HI,
+    AArch64::D24_HI, AArch64::D25_HI, AArch64::D26_HI, AArch64::D27_HI,
+    AArch64::D28_HI, AArch64::D29_HI, AArch64::D30_HI, AArch64::D31_HI,
+    AArch64::Q0_HI,  AArch64::Q1_HI,  AArch64::Q2_HI,  AArch64::Q3_HI,
+    AArch64::Q4_HI,  AArch64::Q5_HI,  AArch64::Q6_HI,  AArch64::Q7_HI,
+    AArch64::Q8_HI,  AArch64::Q9_HI,  AArch64::Q10_HI, AArch64::Q11_HI,
+    AArch64::Q12_HI, AArch64::Q13_HI, AArch64::Q14_HI, AArch64::Q15_HI,
+    AArch64::Q16_HI, AArch64::Q17_HI, AArch64::Q18_HI, AArch64::Q19_HI,
+    AArch64::Q20_HI, AArch64::Q21_HI, AArch64::Q22_HI, AArch64::Q23_HI,
+    AArch64::Q24_HI, AArch64::Q25_HI, AArch64::Q26_HI, AArch64::Q27_HI,
+    AArch64::Q28_HI, AArch64::Q29_HI, AArch64::Q30_HI, AArch64::Q31_HI,
+    AArch64::W0_HI,  AArch64::W1_HI,  AArch64::W2_HI,  AArch64::W3_HI,
+    AArch64::W4_HI,  AArch64::W5_HI,  AArch64::W6_HI,  AArch64::W7_HI,
+    AArch64::W8_HI,  AArch64::W9_HI,  AArch64::W10_HI, AArch64::W11_HI,
+    AArch64::W12_HI, AArch64::W13_HI, AArch64::W14_HI, AArch64::W15_HI,
+    AArch64::W16_HI, AArch64::W17_HI, AArch64::W18_HI, AArch64::W19_HI,
+    AArch64::W20_HI, AArch64::W21_HI, AArch64::W22_HI, AArch64::W23_HI,
+    AArch64::W24_HI, AArch64::W25_HI, AArch64::W26_HI, AArch64::W27_HI,
+    AArch64::W28_HI, AArch64::W29_HI, AArch64::W30_HI, AArch64::WSP_HI,
+    AArch64::WZR_HI};
+
 BitVector
 AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
   const AArch64FrameLowering *TFI = getFrameLowering(MF);
@@ -490,7 +541,10 @@ AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
     markSuperRegs(Reserved, AArch64::W28);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  for (Register R : ReservedHi)
+    Reserved.set(R);
+
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
@@ -514,7 +568,7 @@ AArch64RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
       markSuperRegs(Reserved, AArch64::LR);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
index 4117d74d10c1e7..5f7f6aa7a7bf99 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
@@ -20,33 +20,39 @@ class AArch64Reg<bits<16> enc, string n, list<Register> subregs = [],
 
 let Namespace = "AArch64" in {
   // SubRegIndexes for GPR registers
-  def sub_32 : SubRegIndex<32>;
-  def sube64 : SubRegIndex<64>;
-  def subo64 : SubRegIndex<64>;
-  def sube32 : SubRegIndex<32>;
-  def subo32 : SubRegIndex<32>;
+  def sub_32   : SubRegIndex<32,  0>;
+  def sub_32_hi: SubRegIndex<32, 32>;
+  def sube64   : SubRegIndex<64>;
+  def subo64   : SubRegIndex<64>;
+  def sube32   : SubRegIndex<32>;
+  def subo32   : SubRegIndex<32>;
 
   // SubRegIndexes for FPR/Vector registers
-  def bsub : SubRegIndex<8>;
-  def hsub : SubRegIndex<16>;
-  def ssub : SubRegIndex<32>;
-  def dsub : SubRegIndex<64>;
-  def zsub : SubRegIndex<128>;
+  def bsub    : SubRegIndex<8, 0>;
+  def bsub_hi : SubRegIndex<8, 8>;
+  def hsub    : SubRegIndex<16, 0>;
+  def hsub_hi : SubRegIndex<16, 16>;
+  def ssub    : SubRegIndex<32, 0>;
+  def ssub_hi : SubRegIndex<32, 32>;
+  def dsub    : SubRegIndex<64, 0>;
+  def dsub_hi : SubRegIndex<64, 64>;
+  def zsub    : SubRegIndex<128, 0>;
+  def zsub_hi : SubRegIndex<-1, 128>;
   // Note: Code depends on these having consecutive numbers
-  def zsub0 : SubRegIndex<128, -1>;
-  def zsub1 : SubRegIndex<128, -1>;
-  def zsub2 : SubRegIndex<128, -1>;
-  def zsub3 : SubRegIndex<128, -1>;
-  // Note: Code depends on these having consecutive numbers
-  def dsub0 : SubRegIndex<64>;
-  def dsub1 : SubRegIndex<64>;
-  def dsub2 : SubRegIndex<64>;
-  def dsub3 : SubRegIndex<64>;
+  def zsub0 : SubRegIndex<-1>;
+  def zsub1 : SubRegIndex<-1>;
+  def zsub2 : SubRegIndex<-1>;
+  def zsub3 : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
   def qsub0 : SubRegIndex<128>;
-  def qsub1 : SubRegIndex<128>;
-  def qsub2 : SubRegIndex<128>;
-  def qsub3 : SubRegIndex<128>;
+  def qsub1 : ComposedSubRegIndex<zsub1, zsub>;
+  def qsub2 : ComposedSubRegIndex<zsub2, zsub>;
+  def qsub3 : ComposedSubRegIndex<zsub3, zsub>;
+  // Note: Code depends on these having consecutive numbers
+  def dsub0 : SubRegIndex<64>;
+  def dsub1 : ComposedSubRegIndex<qsub1, dsub>;
+  def dsub2 : ComposedSubRegIndex<qsub2, dsub>;
+  def dsub3 : ComposedSubRegIndex<qsub3, dsub>;
 
   // SubRegIndexes for SME Matrix tiles
   def zasubb  : SubRegIndex<2048>; // (16 x 16)/1 bytes  = 2048 bits
@@ -60,10 +66,10 @@ let Namespace = "AArch64" in {
   def zasubq1 : SubRegIndex<128>;  // (16 x 16)/16 bytes = 128 bits
 
   // SubRegIndexes for SVE Predicates
-  def psub  : SubRegIndex<16>;
+  def psub  : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
-  def psub0 : SubRegIndex<16, -1>;
-  def psub1 : SubRegIndex<16, -1>;
+  def psub0 : SubRegIndex<-1>;
+  def psub1 : SubRegIndex<-1>;
 }
 
 let Namespace = "AArch64" in {
@@ -74,6 +80,14 @@ let Namespace = "AArch64" in {
 //===----------------------------------------------------------------------===//
 // Registers
 //===----------------------------------------------------------------------===//
+
+foreach i = 0-30 in {
+  // Define W0_HI, W1_HI, .. W30_HI
+  def W#i#_HI : AArch64Reg<-1,  "w"#i#"_hi"> { let isArtificial = 1; }
+}
+def WSP_HI : AArch64Reg<-1,  "wsp_hi"> { let isArtificial = 1; }
+def WZR_HI : AArch64Reg<-1,  "wzr_hi"> { let isArtificial = 1; }
+
 def W0    : AArch64Reg<0,   "w0" >, DwarfRegNum<[0]>;
 def W1    : AArch64Reg<1,   "w1" >, DwarfRegNum<[1]>;
 def W2    : AArch64Reg<2,   "w2" >, DwarfRegNum<[2]>;
@@ -106,44 +120,42 @@ def W28   : AArch64Reg<28, "w28">, DwarfRegNum<[28]>;
 def W29   : AArch64Reg<29, "w29">, DwarfRegNum<[29]>;
 def W30   : AArch64Reg<30, "w30">, DwarfRegNum<[30]>;
 def WSP   : AArch64Reg<31, "wsp">, DwarfRegNum<[31]>;
-let isConstant = true in
-def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP>;
-
-let SubRegIndices = [sub_32] in {
-def X0    : AArch64Reg<0,   "x0",  [W0]>, DwarfRegAlias<W0>;
-def X1    : AArch64Reg<1,   "x1",  [W1]>, DwarfRegAlias<W1>;
-def X2    : AArch64Reg<2,   "x2",  [W2]>, DwarfRegAlias<W2>;
-def X3    : AArch64Reg<3,   "x3",  [W3]>, DwarfRegAlias<W3>;
-def X4    : AArch64Reg<4,   "x4",  [W4]>, DwarfRegAlias<W4>;
-def X5    : AArch64Reg<5,   "x5",  [W5]>, DwarfRegAlias<W5>;
-def X6    : AArch64Reg<6,   "x6",  [W6]>, DwarfRegAlias<W6>;
-def X7    : AArch64Reg<7,   "x7",  [W7]>, DwarfRegAlias<W7>;
-def X8    : AArch64Reg<8,   "x8",  [W8]>, DwarfRegAlias<W8>;
-def X9    : AArch64Reg<9,   "x9",  [W9]>, DwarfRegAlias<W9>;
-def X10   : AArch64Reg<10, "x10", [W10]>, DwarfRegAlias<W10>;
-def X11   : AArch64Reg<11, "x11", [W11]>, DwarfRegAlias<W11>;
-def X12   : AArch64Reg<12, "x12", [W12]>, DwarfRegAlias<W12>;
-def X13   : AArch64Reg<13, "x13", [W13]>, DwarfRegAlias<W13>;
-def X14   : AArch64Reg<14, "x14", [W14]>, DwarfRegAlias<W14>;
-def X15   : AArch64Reg<15, "x15", [W15]>, DwarfRegAlias<W15>;
-def X16   : AArch64Reg<16, "x16", [W16]>, DwarfRegAlias<W16>;
-def X17   : AArch64Reg<17, "x17", [W17]>, DwarfRegAlias<W17>;
-def X18   : AArch64Reg<18, "x18", [W18]>, DwarfRegAlias<W18>;
-def X19   : AArch64Reg<19, "x19", [W19]>, DwarfRegAlias<W19>;
-def X20   : AArch64Reg<20, "x20", [W20]>, DwarfRegAlias<W20>;
-def X21   : AArch64Reg<21, "x21", [W21]>, DwarfRegAlias<W21>;
-def X22   : AArch64Reg<22, "x22", [W22]>, DwarfRegAlias<W22>;
-def X23   : AArch64Reg<23, "x23", [W23]>, DwarfRegAlias<W23>;
-def X24   : AArch64Reg<24, "x24", [W24]>, DwarfRegAlias<W24>;
-def X25   : AArch64Reg<25, "x25", [W25]>, DwarfRegAlias<W25>;
-def X26   : AArch64Reg<26, "x26", [W26]>, DwarfRegAlias<W26>;
-def X27   : AArch64Reg<27, "x27", [W27]>, DwarfRegAlias<W27>;
-def X28   : AArch64Reg<28, "x28", [W28]>, DwarfRegAlias<W28>;
-def FP    : AArch64Reg<29, "x29", [W29]>, DwarfRegAlias<W29>;
-def LR    : AArch64Reg<30, "x30", [W30]>, DwarfRegAlias<W30>;
-def SP    : AArch64Reg<31, "sp",  [WSP]>, DwarfRegAlias<WSP>;
-let isConstant = true in
-def XZR   : AArch64Reg<31, "xzr", [WZR]>, DwarfRegAlias<WSP>;
+def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP> { let isConstant = true; }
+
+let SubRegIndices = [sub_32, sub_32_hi], CoveredBySubRegs = 1 in {
+def X0    : AArch64Reg<0,   "x0",  [W0,  W0_HI]>, DwarfRegAlias<W0>;
+def X1    : AArch64Reg<1,   "x1",  [W1,  W1_HI]>, DwarfRegAlias<W1>;
+def X2    : AArch64Reg<2,   "x2",  [W2,  W2_HI]>, DwarfRegAlias<W2>;
+def X3    : AArch64Reg<3,   "x3",  [W3,  W3_HI]>, DwarfRegAlias<W3>;
+def X4    : AArch64Reg<4,   "x4",  [W4,  W4_HI]>, DwarfRegAlias<W4>;
+def X5    : AArch64Reg<5,   "x5",  [W5,  W5_HI]>, DwarfRegAlias<W5>;
+def X6    : AArch64Reg<6,   "x6",  [W6,  W6_HI]>, DwarfRegAlias<W6>;
+def X7    : AArch64Reg<7,   "x7",  [W7,  W7_HI]>, DwarfRegAlias<W7>;
+def X8    : AArch64Reg<8,   "x8",  [W8,  W8_HI]>, DwarfRegAlias<W8>;
+def X9    : AArch64Reg<9,   "x9",  [W9,  W9_HI]>, DwarfRegAlias<W9>;
+def X10   : AArch64Reg<10, "x10", [W10, W10_HI]>, DwarfRegAlias<W10>;
+def X11   : AArch64Reg<11, "x11", [W11, W11_HI]>, DwarfRegAlias<W11>;
+def X12   : AArch64Reg<12, "x12", [W12, W12_HI]>, DwarfRegAlias<W12>;
+def X13   : AArch64Reg<13, "x13", [W13, W13_HI]>, DwarfRegAlias<W13>;
+def X14   : AArch64Reg<14, "x14", [W14, W14_HI]>, DwarfRegAlias<W14>;
+def X15   : AArch64Reg<15, "x15", [W15, W15_HI]>, DwarfRegAlias<W15>;
+def X16   : AArch64Reg<16, "x16", [W16, W16_HI]>, DwarfRegAlias<W16>;
+def X17   : AArch64Reg<17, "x17", [W17, W17_HI]>, DwarfRegAlias<W17>;
+def X18   : AArch64Reg<18, "x18", [W18, W18_HI]>, DwarfRegAlias<W18>;
+def X19   : AArch64Reg<19, "x19", [W19, W19_HI]>, DwarfRegAlias<W19>;
+def X20   : AArch64Reg<20, "x20", [W20, W20_HI]>, DwarfRegAlias<W20>;
+def X21   : AArch64Reg<21, "x21", [W21, W21_HI]>, DwarfRegAlias<W21>;
+def X22   : AArch64Reg<22, "x22", [W22, W22_HI]>, DwarfRegAlias<W22>;
+def X23   : AArch64Reg<23, "x23", [W23, W23_HI]>, DwarfRegAlias<W23>;
+def X24   : AArch64Reg<24, "x24", [W24, W24_HI]>, DwarfRegAlias<W24>;
+def X25   : AArch64Reg<25, "x25", [W25, W25_HI]>, DwarfRegAlias<W25>;
+def X26   : AArch64Reg<26, "x26", [W26, W26_HI]>, DwarfRegAlias<W26>;
+def X27   : AArch64Reg<27, "x27", [W27, W27_HI]>, DwarfRegAlias<W27>;
+def X28   : AArch64Reg<28, "x28", [W28, W28_HI]>, DwarfRegAlias<W28>;
+def FP    : AArch64Reg<29, "x29", [W29, W29_HI]>, DwarfRegAlias<W29>;
+def LR    : AArch64Reg<30, "x30", [W30, W30_HI]>, DwarfRegAlias<W30>;
+def SP    : AArch64Reg<31, "sp",  [WSP, WSP_HI]>, DwarfRegAlias<WSP>;
+def XZR   : AArch64Reg<31, "xzr", [WZR, WZR_HI]>, DwarfRegAlias<WSP> { let isConstant = true; }
 }
 
 // Condition code register.
@@ -290,6 +302,14 @@ def CCR : RegisterClass<"AArch64", [i32], 32, (add NZCV)> {
 // Floating Point Scalar Registers
 //===----------------------------------------------------------------------===//
 
+foreach i = 0-31 in {
+  def B#i#_HI : AArch64Reg<-1,  "b"#i#"_hi"> { let isArtificial = 1; }
+  def H#i#_HI : AArch64Reg<-1,  "h"#i#"_hi"> { let isArtificial = 1; }
+  def S#i#_HI : AArch64Reg<-1,  "s"#i#"_hi"> { let isArtificial = 1; }
+  def D#i#_HI : AArch64Reg<-1,  "d"#i#"_hi"> { let isArtificial = 1; }
+  def Q#i#_HI : AArch64Reg<-1,  "q"#i#"_hi"> { let isArtificial = 1; }
+}
+
 def B0    : AArch64Reg<0,   "b0">, DwarfRegNum<[64]>;
 def B1    : AArch64Reg<1,   "b1">, DwarfRegNum<[65]>;
 def B2    : AArch64Reg<2,   "b2">, DwarfRegNum<[66]>;
@@ -323,144 +343,144 @@ def B29   : AArch64Reg<29, "b29">, DwarfRegNum<[93]>;
 def B30   : AArch64Reg<30, "b30">, DwarfRegNum<[94]>;
 def B31   : AArch64Reg<31, "b31">, DwarfRegNum<[95]>;
 
-let SubRegIndices = [bsub] in {
-def H0    : AArch64Reg<0,   "h0", [B0]>, DwarfRegAlias<B0>;
-def H1    : AArch64Reg<1,   "h1", [B1]>, DwarfRegAlias<B1>;
-def H2    : AArch64Reg<2,   "h2", [B2]>, DwarfRegAlias<B2>;
-def H3    : AArch64Reg<3,   "h3", [B3]>, DwarfRegAlias<B3>;
-def H4    : AArch64Reg<4,   "h4", [B4]>, DwarfRegAlias<B4>;
-def H5    : AArch64Reg<5,   "h5", [B5]>, DwarfRegAlias<B5>;
-def H6    : AArch64Reg<6,   "h6", [B6]>, DwarfRegAlias<B6>;
-def H7    : AArch64Reg<7,   "h7", [B7]>, DwarfRegAlias<B7>;
-def H8    : AArch64Reg<8,   "h8", [B8]>, DwarfRegAlias<B8>;
-def H9    : AArch64Reg<9,   "h9", [B9]>, DwarfRegAlias<B9>;
-def H10   : AArch64Reg<10, "h10", [B10]>, DwarfRegAlias<B10>;
-def H11   : AArch64Reg<11, "h11", [B11]>, DwarfRegAlias<B11>;
-def H12   : AArch64Reg<12, "h12", [B12]>, DwarfRegAlias<B12>;
-def H13   : AArch64Reg<13, "h13", [B13]>, DwarfRegAlias<B13>;
-def H14   : AArch64Reg<14, "h14", [B14]>, DwarfRegAlias<B14>;
-def H15   : AArch64Reg<15, "h15", [B15]>, DwarfRegAlias<B15>;
-def H16   : AArch64Reg<16, "h16", [B16]>, DwarfRegAlias<B16>;
-def H17   : AArch64Reg<17, "h17", [B17]>, DwarfRegAlias<B17>;
-def H18   : AArch64Reg<18, "h18", [B18]>, DwarfRegAlias<B18>;
-def H19   : AArch64Reg<19, "h19", [B19]>, DwarfRegAlias<B19>;
-def H20   : AArch64Reg<20, "h20", [B20]>, DwarfRegAlias<B20>;
-def H21   : AArch64Reg<21, "h21", [B21]>, DwarfRegAlias<B21>;
-def H22   : AArch64Reg<22, "h22", [B22]>, DwarfRegAlias<B22>;
-def H23   : AArch64Reg<23, "h23", [B23]>, DwarfRegAlias<B23>;
-def H24   : AArch64Reg<24, "h24", [B24]>, DwarfRegAlias<B24>;
-def H25   : AArch64Reg<25, "h25", [B25]>, DwarfRegAlias<B25>;
-def H26   : AArch64Reg<26, "h26", [B26]>, DwarfRegAlias<B26>;
-def H27   : AArch64Reg<27, "h27", [B27]>, DwarfRegAlias<B27>;
-def H28   : AArch64Reg<28, "h28", [B28]>, DwarfRegAlias<B28>;
-def H29   : AArch64Reg<29, "h29", [B29]>, DwarfRegAlias<B29>;
-def H30   : AArch64Reg<30, "h30", [B30]>, DwarfRegAlias<B30>;
-def H31   : AArch64Reg<31, "h31", [B31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [hsub] in {
-def S0    : AArch64Reg<0,   "s0", [H0]>, DwarfRegAlias<B0>;
-def S1    : AArch64Reg<1,   "s1", [H1]>, DwarfRegAlias<B1>;
-def S2    : AArch64Reg<2,   "s2", [H2]>, DwarfRegAlias<B2>;
-def S3    : AArch64Reg<3,   "s3", [H3]>, DwarfRegAlias<B3>;
-def S4    : AArch64Reg<4,   "s4", [H4]>, DwarfRegAlias<B4>;
-def S5    : AArch64Reg<5,   "s5", [H5]>, DwarfRegAlias<B5>;
-def S6    : AArch64Reg<6,   "s6", [H6]>, DwarfRegAlias<B6>;
-def S7    : AArch64Reg<7,   "s7", [H7]>, DwarfRegAlias<B7>;
-def S8    : AArch64Reg<8,   "s8", [H8]>, DwarfRegAlias<B8>;
-def S9    : AArch64Reg<9,   "s9", [H9]>, DwarfRegAlias<B9>;
-def S10   : AArch64Reg<10, "s10", [H10]>, DwarfRegAlias<B10>;
-def S11   : AArch64Reg<11, "s11", [H11]>, DwarfRegAlias<B11>;
-def S12   : AArch64Reg<12, "s12", [H12]>, DwarfRegAlias<B12>;
-def S13   : AArch64Reg<13, "s13", [H13]>, DwarfRegAlias<B13>;
-def S14   : AArch64Reg<14, "s14", [H14]>, DwarfRegAlias<B14>;
-def S15   : AArch64Reg<15, "s15", [H15]>, DwarfRegAlias<B15>;
-def S16   : AArch64Reg<16, "s16", [H16]>, DwarfRegAlias<B16>;
-def S17   : AArch64Reg<17, "s17", [H17]>, DwarfRegAlias<B17>;
-def S18   : AArch64Reg<18, "s18", [H18]>, DwarfRegAlias<B18>;
-def S19   : AArch64Reg<19, "s19", [H19]>, DwarfRegAlias<B19>;
-def S20   : AArch64Reg<20, "s20", [H20]>, DwarfRegAlias<B20>;
-def S21   : AArch64Reg<21, "s21", [H21]>, DwarfRegAlias<B21>;
-def S22   : AArch64Reg<22, "s22", [H22]>, DwarfRegAlias<B22>;
-def S23   : AArch64Reg<23, "s23", [H23]>, DwarfRegAlias<B23>;
-def S24   : AArch64Reg<24, "s24", [H24]>, DwarfRegAlias<B24>;
-def S25   : AArch64Reg<25, "s25", [H25]>, DwarfRegAlias<B25>;
-def S26   : AArch64Reg<26, "s26", [H26]>, DwarfRegAlias<B26>;
-def S27   : AArch64Reg<27, "s27", [H27]>, DwarfRegAlias<B27>;
-def S28   : AArch64Reg<28, "s28", [H28]>, DwarfRegAlias<B28>;
-def S29   : AArch64Reg<29, "s29", [H29]>, DwarfRegAlias<B29>;
-def S30   : AArch64Reg<30, "s30", [H30]>, DwarfRegAlias<B30>;
-def S31   : AArch64Reg<31, "s31", [H31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [ssub], RegAltNameIndices = [vreg, vlist1] in {
-def D0    : AArch64Reg<0,   "d0", [S0], ["v0", ""]>, DwarfRegAlias<B0>;
-def D1    : AArch64Reg<1,   "d1", [S1], ["v1", ""]>, DwarfRegAlias<B1>;
-def D2    : AArch64Reg<2,   "d2", [S2], ["v2", ""]>, DwarfRegAlias<B2>;
-def D3    : AArch64Reg<3,   "d3", [S3], ["v3", ""]>, DwarfRegAlias<B3>;
-def D4    : AArch64Reg<4,   "d4", [S4], ["v4", ""]>, DwarfRegAlias<B4>;
-def D5    : AArch64Reg<5,   "d5", [S5], ["v5", ""]>, DwarfRegAlias<B5>;
-def D6    : AArch64Reg<6,   "d6", [S6], ["v6", ""]>, DwarfRegAlias<B6>;
-def D7    : AArch64Reg<7,   "d7", [S7], ["v7", ""]>, DwarfRegAlias<B7>;
-def D8    : AArch64Reg<8,   "d8", [S8], ["v8", ""]>, DwarfRegAlias<B8>;
-def D9    : AArch64Reg<9,   "d9", [S9], ["v9", ""]>, DwarfRegAlias<B9>;
-def D10   : AArch64Reg<10, "d10", [S10], ["v10", ""]>, DwarfRegAlias<B10>;
-def D11   : AArch64Reg<11, "d11", [S11], ["v11", ""]>, DwarfRegAlias<B11>;
-def D12   : AArch64Reg<12, "d12", [S12], ["v12", ""]>, DwarfRegAlias<B12>;
-def D13   : AArch64Reg<13, "d13", [S13], ["v13", ""]>, DwarfRegAl...
[truncated]

arsenm · 2024-11-04T23:02:16Z

llvm/lib/Target/AArch64/AArch64RegisterInfo.td

-  def sube32 : SubRegIndex<32>;
-  def subo32 : SubRegIndex<32>;
+  def sub_32   : SubRegIndex<32,  0>;
+  def sub_32_hi: SubRegIndex<32, 32>;


Comment the artificial cases? Could we use an explicit marker for these?

I've added a comment to describe the artificial cases. I'm not sure what you mean with an explicit marker though?

I mean an isArtificial bit on SubRegIndex

isArtificial is a property of a Register or a RegisterClass, but not a SubRegIndex.

sdesmalen-arm · 2024-11-05T11:33:39Z

@Rin18 would you be happy to have a look at the V1-clear-upper-regs.s test to see if there's any issues with the generated output? I'm not sure if there is a hidden bug in there (it seems to iterate through all subregs, including artificial ones, which may affect the output)

Rin18 · 2024-11-06T17:39:53Z

@Rin18 would you be happy to have a look at the V1-clear-upper-regs.s test to see if there's any issues with the generated output? I'm not sure if there is a hidden bug in there (it seems to iterate through all subregs, including artificial ones, which may affect the output)

I took a look and compared the MCA output with and without your patch. The first thing that jumped to my attention is that your patch adds dependencies on super registers. If you run the V1-clear-upper-regs.s test with -debug and look at the output for the GPR32-bit test, you'll see a lot of these lines:

[PRF] Found a dependent use of Register X0 (defined by instruction #

Similarly for the other super registers in the remaining tests in that file.

I've not been able to find the exact cause. I think this has something to do with the addition of *_HI registers and the fact that super registers are now described as *sub + *sub_hi.

sdesmalen-arm · 2024-11-08T13:39:20Z

I'm not entirely sure how llvm-mca works, but I think the issue is that the code adds dependences for all sub-registers of a register, but when clearsSuperRegisters evalutes to true it clears dependences for super-registers without considering any of the sibling _hi registers. For example; a write to w0 also clears a dependence on x0, but the dependence on artificial register w0_hi should also be cleared. It seems easiest not to consider artificial registers in the computation to start with.

Rin18 · 2024-11-08T13:58:18Z

but I think the issue is that the code adds dependences for all sub-registers of a register, but when clearsSuperRegisters evalutes to true it clears dependences for super-registers without considering any of the sibling _hi registers

That sounds right. Before your patch, there were no _hi registers to consider when updating dependencies. Now that artificial registers are added it makes sense to skip them. Looks like that fixed the dependencies created in the MCA *clear-upper-regs.s tests, thanks!

davemgreen

The general register changes sound to me.

davemgreen · 2024-11-11T10:15:55Z

llvm/include/llvm/MC/MCRegisterInfo.h

@@ -396,6 +399,9 @@ class MCRegisterInfo {
  /// Returns true if the given register is constant.
  bool isConstant(MCRegister RegNo) const { return get(RegNo).IsConstant; }

+  /// Returns true if the given register is artificial.


Is it worth explaining what Artificial means more precisely?

davemgreen · 2024-11-11T10:23:17Z

llvm/test/CodeGen/AArch64/misched-bundle.mir

 # CHECK-NEXT:   # rdefs left       : 0
 # CHECK-NEXT:   Latency            : 3
 # CHECK-NEXT:   Depth              : 0
 # CHECK-NEXT:   Height             : 7
 # CHECK-NEXT:   Successors:
 # CHECK-NEXT:     SU(7): Out  Latency=1
+# CHECK-NEXT:     SU(7): Out  Latency=1


Does this happen just for bundles or for other instructions? I was wondering if it was worth filtering out Artificial dependencies as they should be redundant, I believe? It might not matter too much if it is only bundles, as it doesn't seem to be doing any harm and they come up less often so would have less of a compile-time overhead to have so many deps.

It's not specific to BUNDLEs, but it is a little specific to how this test was written:

BUNDLE implicit-def $z1, implicit-def $q1, implicit-def $d1, implicit-def $s1, implicit-def $h1, implicit-def $b1, implicit $z5, implicit $p0, implicit killed $z4, implicit killed $z3 {

the instruction unnecessarily specifies all sub-registers of z1 as well as implicit-defs; b1, h1, s1, .. q1, even though implicit-def $z1 would have been sufficient. When I remove the implicit-def for the sub-registers, the scheduler no longer creates the extra out-dependences.

I can see that the scheduler goes through all register units for a given register to create dependences. Before this change there was only a single RegUnit for all registers z1, q1, s1, .. b1, whereas now there are 6 separate regunits, hence the extra dependences.

I'd need to dig a bit further to understand why the scheduler adds these when the implicit-def operands are present, although this isn't a common case for regular instructions (it may be more common for BUNDLE's?) so don't think this has much of an effect in practice. If you're satisfied with that answer, I'd prefer to move forward with the patch as-is, since I've got a few more follow-up patches to fix up other things. I'll look further into it anyway, just in case there's something that needs fixing.

github-actions · 2024-11-12T09:11:08Z

✅ With the latest revision this PR passed the C/C++ code formatter.

davemgreen

Thanks. If there are no objections to the name "Artificial", then LGTM.

sdesmalen-arm · 2024-11-13T15:01:56Z

Thanks. If there are no objections to the name "Artificial", then LGTM.

Thanks @davemgreen! The name Artificial is already established in TableGen and include/llvm/Target/Target.td, I've just reused the name here.

llvm-ci · 2024-11-14T09:45:40Z

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/14632