Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Define high bits of FPR and GPR registers. #114263

Conversation

sdesmalen-arm
Copy link
Collaborator

This is a step towards enabling subreg liveness tracking for AArch64, which requires that registers are fully covered by their subregisters, as covered here #109797.

There are several changes in this patch:

  • AArch64RegisterInfo.td and tests: Define the high bits like B0_HI, H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some register class, this added a register class which meant that we had to update 'magic numbers' in several tests.

    The use of ComposedSubRegIndex helped 'compress' the number of bits required for the lanemask. The correctness of the masks is tested by an explicit unit tests.

  • LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for register tuples, but with this change to describe the high bits, a register like 'D0' will also have 'HasDisjunctSubRegs' set to true (because it's fullly covered by S0 and S0_HI). The fix here is to explicitly test if the register class is one of the known D/Q/Z tuples.

  • TableGen: The handling of the isArtificial flag was entirely broken. Skipping out too early from some of the loops led to incorrect internal representation of the (sub)register(index) hierarchy, and thus resulted in incorrect TableGen info.

@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2024

@llvm/pr-subscribers-tablegen

@llvm/pr-subscribers-llvm-globalisel

Author: Sander de Smalen (sdesmalen-arm)

Changes

This is a step towards enabling subreg liveness tracking for AArch64, which requires that registers are fully covered by their subregisters, as covered here #109797.

There are several changes in this patch:

  • AArch64RegisterInfo.td and tests: Define the high bits like B0_HI, H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some register class, this added a register class which meant that we had to update 'magic numbers' in several tests.

    The use of ComposedSubRegIndex helped 'compress' the number of bits required for the lanemask. The correctness of the masks is tested by an explicit unit tests.

  • LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for register tuples, but with this change to describe the high bits, a register like 'D0' will also have 'HasDisjunctSubRegs' set to true (because it's fullly covered by S0 and S0_HI). The fix here is to explicitly test if the register class is one of the known D/Q/Z tuples.

  • TableGen: The handling of the isArtificial flag was entirely broken. Skipping out too early from some of the loops led to incorrect internal representation of the (sub)register(index) hierarchy, and thus resulted in incorrect TableGen info.


Patch is 137.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114263.diff

23 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp (+4-1)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+57-2)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.td (+263-234)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-sve-asm.ll (+11-11)
  • (modified) llvm/test/CodeGen/AArch64/blr-bti-preserves-operands.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/emit_fneg_with_non_register_operand.mir (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/expand-blr-rvmarker-pseudo.mir (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir (+30-30)
  • (modified) llvm/test/CodeGen/AArch64/machine-outliner-calls.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/misched-bundle.mir (+23-3)
  • (modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-01.mir (+12)
  • (modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-02.mir (+4-1)
  • (modified) llvm/test/CodeGen/AArch64/peephole-insvigpr.mir (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/preserve.ll (+3-4)
  • (modified) llvm/test/CodeGen/AArch64/strpre-str-merge.mir (+14-14)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryComm-merging.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryCommWithRev-merging.mir (+1-1)
  • (added) llvm/unittests/Target/AArch64/AArch64RegisterInfoTest.cpp (+148)
  • (modified) llvm/unittests/Target/AArch64/CMakeLists.txt (+1)
  • (modified) llvm/utils/TableGen/Common/CodeGenRegisters.cpp (+8-8)
  • (modified) llvm/utils/TableGen/RegisterInfoEmitter.cpp (+1)
diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 1a9e5899892a1b..b051122609883f 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -1541,7 +1541,10 @@ static bool canRenameMOP(const MachineOperand &MOP,
     // Note that this relies on the structure of the AArch64 register file. In
     // particular, a subregister cannot be written without overwriting the
     // whole register.
-    if (RegClass->HasDisjunctSubRegs) {
+    if (RegClass->HasDisjunctSubRegs && RegClass->CoveredBySubRegs &&
+        (TRI->getSubRegisterClass(RegClass, AArch64::dsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::qsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::zsub0))) {
       LLVM_DEBUG(
           dbgs()
           << "  Cannot rename operands with multiple disjunct subregisters ("
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index 18290dd5f32df9..1ef75c6e02e8c7 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -424,6 +424,58 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
   return {};
 }
 
+static SmallVector<MCPhysReg> ReservedHi = {
+    AArch64::B0_HI,  AArch64::B1_HI,  AArch64::B2_HI,  AArch64::B3_HI,
+    AArch64::B4_HI,  AArch64::B5_HI,  AArch64::B6_HI,  AArch64::B7_HI,
+    AArch64::B8_HI,  AArch64::B9_HI,  AArch64::B10_HI, AArch64::B11_HI,
+    AArch64::B12_HI, AArch64::B13_HI, AArch64::B14_HI, AArch64::B15_HI,
+    AArch64::B16_HI, AArch64::B17_HI, AArch64::B18_HI, AArch64::B19_HI,
+    AArch64::B20_HI, AArch64::B21_HI, AArch64::B22_HI, AArch64::B23_HI,
+    AArch64::B24_HI, AArch64::B25_HI, AArch64::B26_HI, AArch64::B27_HI,
+    AArch64::B28_HI, AArch64::B29_HI, AArch64::B30_HI, AArch64::B31_HI,
+    AArch64::H0_HI,  AArch64::H1_HI,  AArch64::H2_HI,  AArch64::H3_HI,
+    AArch64::H4_HI,  AArch64::H5_HI,  AArch64::H6_HI,  AArch64::H7_HI,
+    AArch64::H8_HI,  AArch64::H9_HI,  AArch64::H10_HI, AArch64::H11_HI,
+    AArch64::H12_HI, AArch64::H13_HI, AArch64::H14_HI, AArch64::H15_HI,
+    AArch64::H16_HI, AArch64::H17_HI, AArch64::H18_HI, AArch64::H19_HI,
+    AArch64::H20_HI, AArch64::H21_HI, AArch64::H22_HI, AArch64::H23_HI,
+    AArch64::H24_HI, AArch64::H25_HI, AArch64::H26_HI, AArch64::H27_HI,
+    AArch64::H28_HI, AArch64::H29_HI, AArch64::H30_HI, AArch64::H31_HI,
+    AArch64::S0_HI,  AArch64::S1_HI,  AArch64::S2_HI,  AArch64::S3_HI,
+    AArch64::S4_HI,  AArch64::S5_HI,  AArch64::S6_HI,  AArch64::S7_HI,
+    AArch64::S8_HI,  AArch64::S9_HI,  AArch64::S10_HI, AArch64::S11_HI,
+    AArch64::S12_HI, AArch64::S13_HI, AArch64::S14_HI, AArch64::S15_HI,
+    AArch64::S16_HI, AArch64::S17_HI, AArch64::S18_HI, AArch64::S19_HI,
+    AArch64::S20_HI, AArch64::S21_HI, AArch64::S22_HI, AArch64::S23_HI,
+    AArch64::S24_HI, AArch64::S25_HI, AArch64::S26_HI, AArch64::S27_HI,
+    AArch64::S28_HI, AArch64::S29_HI, AArch64::S30_HI, AArch64::S31_HI,
+    AArch64::D0_HI,  AArch64::D1_HI,  AArch64::D2_HI,  AArch64::D3_HI,
+    AArch64::D4_HI,  AArch64::D5_HI,  AArch64::D6_HI,  AArch64::D7_HI,
+    AArch64::D8_HI,  AArch64::D9_HI,  AArch64::D10_HI, AArch64::D11_HI,
+    AArch64::D12_HI, AArch64::D13_HI, AArch64::D14_HI, AArch64::D15_HI,
+    AArch64::D16_HI, AArch64::D17_HI, AArch64::D18_HI, AArch64::D19_HI,
+    AArch64::D20_HI, AArch64::D21_HI, AArch64::D22_HI, AArch64::D23_HI,
+    AArch64::D24_HI, AArch64::D25_HI, AArch64::D26_HI, AArch64::D27_HI,
+    AArch64::D28_HI, AArch64::D29_HI, AArch64::D30_HI, AArch64::D31_HI,
+    AArch64::Q0_HI,  AArch64::Q1_HI,  AArch64::Q2_HI,  AArch64::Q3_HI,
+    AArch64::Q4_HI,  AArch64::Q5_HI,  AArch64::Q6_HI,  AArch64::Q7_HI,
+    AArch64::Q8_HI,  AArch64::Q9_HI,  AArch64::Q10_HI, AArch64::Q11_HI,
+    AArch64::Q12_HI, AArch64::Q13_HI, AArch64::Q14_HI, AArch64::Q15_HI,
+    AArch64::Q16_HI, AArch64::Q17_HI, AArch64::Q18_HI, AArch64::Q19_HI,
+    AArch64::Q20_HI, AArch64::Q21_HI, AArch64::Q22_HI, AArch64::Q23_HI,
+    AArch64::Q24_HI, AArch64::Q25_HI, AArch64::Q26_HI, AArch64::Q27_HI,
+    AArch64::Q28_HI, AArch64::Q29_HI, AArch64::Q30_HI, AArch64::Q31_HI,
+    AArch64::W0_HI,  AArch64::W1_HI,  AArch64::W2_HI,  AArch64::W3_HI,
+    AArch64::W4_HI,  AArch64::W5_HI,  AArch64::W6_HI,  AArch64::W7_HI,
+    AArch64::W8_HI,  AArch64::W9_HI,  AArch64::W10_HI, AArch64::W11_HI,
+    AArch64::W12_HI, AArch64::W13_HI, AArch64::W14_HI, AArch64::W15_HI,
+    AArch64::W16_HI, AArch64::W17_HI, AArch64::W18_HI, AArch64::W19_HI,
+    AArch64::W20_HI, AArch64::W21_HI, AArch64::W22_HI, AArch64::W23_HI,
+    AArch64::W24_HI, AArch64::W25_HI, AArch64::W26_HI, AArch64::W27_HI,
+    AArch64::W28_HI, AArch64::W29_HI, AArch64::W30_HI, AArch64::WSP_HI,
+    AArch64::WZR_HI
+    };
+
 BitVector
 AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
   const AArch64FrameLowering *TFI = getFrameLowering(MF);
@@ -490,7 +542,10 @@ AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
     markSuperRegs(Reserved, AArch64::W28);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  for (Register R : ReservedHi)
+    Reserved.set(R);
+
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
@@ -514,7 +569,7 @@ AArch64RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
       markSuperRegs(Reserved, AArch64::LR);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
index 4117d74d10c1e7..5f7f6aa7a7bf99 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
@@ -20,33 +20,39 @@ class AArch64Reg<bits<16> enc, string n, list<Register> subregs = [],
 
 let Namespace = "AArch64" in {
   // SubRegIndexes for GPR registers
-  def sub_32 : SubRegIndex<32>;
-  def sube64 : SubRegIndex<64>;
-  def subo64 : SubRegIndex<64>;
-  def sube32 : SubRegIndex<32>;
-  def subo32 : SubRegIndex<32>;
+  def sub_32   : SubRegIndex<32,  0>;
+  def sub_32_hi: SubRegIndex<32, 32>;
+  def sube64   : SubRegIndex<64>;
+  def subo64   : SubRegIndex<64>;
+  def sube32   : SubRegIndex<32>;
+  def subo32   : SubRegIndex<32>;
 
   // SubRegIndexes for FPR/Vector registers
-  def bsub : SubRegIndex<8>;
-  def hsub : SubRegIndex<16>;
-  def ssub : SubRegIndex<32>;
-  def dsub : SubRegIndex<64>;
-  def zsub : SubRegIndex<128>;
+  def bsub    : SubRegIndex<8, 0>;
+  def bsub_hi : SubRegIndex<8, 8>;
+  def hsub    : SubRegIndex<16, 0>;
+  def hsub_hi : SubRegIndex<16, 16>;
+  def ssub    : SubRegIndex<32, 0>;
+  def ssub_hi : SubRegIndex<32, 32>;
+  def dsub    : SubRegIndex<64, 0>;
+  def dsub_hi : SubRegIndex<64, 64>;
+  def zsub    : SubRegIndex<128, 0>;
+  def zsub_hi : SubRegIndex<-1, 128>;
   // Note: Code depends on these having consecutive numbers
-  def zsub0 : SubRegIndex<128, -1>;
-  def zsub1 : SubRegIndex<128, -1>;
-  def zsub2 : SubRegIndex<128, -1>;
-  def zsub3 : SubRegIndex<128, -1>;
-  // Note: Code depends on these having consecutive numbers
-  def dsub0 : SubRegIndex<64>;
-  def dsub1 : SubRegIndex<64>;
-  def dsub2 : SubRegIndex<64>;
-  def dsub3 : SubRegIndex<64>;
+  def zsub0 : SubRegIndex<-1>;
+  def zsub1 : SubRegIndex<-1>;
+  def zsub2 : SubRegIndex<-1>;
+  def zsub3 : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
   def qsub0 : SubRegIndex<128>;
-  def qsub1 : SubRegIndex<128>;
-  def qsub2 : SubRegIndex<128>;
-  def qsub3 : SubRegIndex<128>;
+  def qsub1 : ComposedSubRegIndex<zsub1, zsub>;
+  def qsub2 : ComposedSubRegIndex<zsub2, zsub>;
+  def qsub3 : ComposedSubRegIndex<zsub3, zsub>;
+  // Note: Code depends on these having consecutive numbers
+  def dsub0 : SubRegIndex<64>;
+  def dsub1 : ComposedSubRegIndex<qsub1, dsub>;
+  def dsub2 : ComposedSubRegIndex<qsub2, dsub>;
+  def dsub3 : ComposedSubRegIndex<qsub3, dsub>;
 
   // SubRegIndexes for SME Matrix tiles
   def zasubb  : SubRegIndex<2048>; // (16 x 16)/1 bytes  = 2048 bits
@@ -60,10 +66,10 @@ let Namespace = "AArch64" in {
   def zasubq1 : SubRegIndex<128>;  // (16 x 16)/16 bytes = 128 bits
 
   // SubRegIndexes for SVE Predicates
-  def psub  : SubRegIndex<16>;
+  def psub  : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
-  def psub0 : SubRegIndex<16, -1>;
-  def psub1 : SubRegIndex<16, -1>;
+  def psub0 : SubRegIndex<-1>;
+  def psub1 : SubRegIndex<-1>;
 }
 
 let Namespace = "AArch64" in {
@@ -74,6 +80,14 @@ let Namespace = "AArch64" in {
 //===----------------------------------------------------------------------===//
 // Registers
 //===----------------------------------------------------------------------===//
+
+foreach i = 0-30 in {
+  // Define W0_HI, W1_HI, .. W30_HI
+  def W#i#_HI : AArch64Reg<-1,  "w"#i#"_hi"> { let isArtificial = 1; }
+}
+def WSP_HI : AArch64Reg<-1,  "wsp_hi"> { let isArtificial = 1; }
+def WZR_HI : AArch64Reg<-1,  "wzr_hi"> { let isArtificial = 1; }
+
 def W0    : AArch64Reg<0,   "w0" >, DwarfRegNum<[0]>;
 def W1    : AArch64Reg<1,   "w1" >, DwarfRegNum<[1]>;
 def W2    : AArch64Reg<2,   "w2" >, DwarfRegNum<[2]>;
@@ -106,44 +120,42 @@ def W28   : AArch64Reg<28, "w28">, DwarfRegNum<[28]>;
 def W29   : AArch64Reg<29, "w29">, DwarfRegNum<[29]>;
 def W30   : AArch64Reg<30, "w30">, DwarfRegNum<[30]>;
 def WSP   : AArch64Reg<31, "wsp">, DwarfRegNum<[31]>;
-let isConstant = true in
-def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP>;
-
-let SubRegIndices = [sub_32] in {
-def X0    : AArch64Reg<0,   "x0",  [W0]>, DwarfRegAlias<W0>;
-def X1    : AArch64Reg<1,   "x1",  [W1]>, DwarfRegAlias<W1>;
-def X2    : AArch64Reg<2,   "x2",  [W2]>, DwarfRegAlias<W2>;
-def X3    : AArch64Reg<3,   "x3",  [W3]>, DwarfRegAlias<W3>;
-def X4    : AArch64Reg<4,   "x4",  [W4]>, DwarfRegAlias<W4>;
-def X5    : AArch64Reg<5,   "x5",  [W5]>, DwarfRegAlias<W5>;
-def X6    : AArch64Reg<6,   "x6",  [W6]>, DwarfRegAlias<W6>;
-def X7    : AArch64Reg<7,   "x7",  [W7]>, DwarfRegAlias<W7>;
-def X8    : AArch64Reg<8,   "x8",  [W8]>, DwarfRegAlias<W8>;
-def X9    : AArch64Reg<9,   "x9",  [W9]>, DwarfRegAlias<W9>;
-def X10   : AArch64Reg<10, "x10", [W10]>, DwarfRegAlias<W10>;
-def X11   : AArch64Reg<11, "x11", [W11]>, DwarfRegAlias<W11>;
-def X12   : AArch64Reg<12, "x12", [W12]>, DwarfRegAlias<W12>;
-def X13   : AArch64Reg<13, "x13", [W13]>, DwarfRegAlias<W13>;
-def X14   : AArch64Reg<14, "x14", [W14]>, DwarfRegAlias<W14>;
-def X15   : AArch64Reg<15, "x15", [W15]>, DwarfRegAlias<W15>;
-def X16   : AArch64Reg<16, "x16", [W16]>, DwarfRegAlias<W16>;
-def X17   : AArch64Reg<17, "x17", [W17]>, DwarfRegAlias<W17>;
-def X18   : AArch64Reg<18, "x18", [W18]>, DwarfRegAlias<W18>;
-def X19   : AArch64Reg<19, "x19", [W19]>, DwarfRegAlias<W19>;
-def X20   : AArch64Reg<20, "x20", [W20]>, DwarfRegAlias<W20>;
-def X21   : AArch64Reg<21, "x21", [W21]>, DwarfRegAlias<W21>;
-def X22   : AArch64Reg<22, "x22", [W22]>, DwarfRegAlias<W22>;
-def X23   : AArch64Reg<23, "x23", [W23]>, DwarfRegAlias<W23>;
-def X24   : AArch64Reg<24, "x24", [W24]>, DwarfRegAlias<W24>;
-def X25   : AArch64Reg<25, "x25", [W25]>, DwarfRegAlias<W25>;
-def X26   : AArch64Reg<26, "x26", [W26]>, DwarfRegAlias<W26>;
-def X27   : AArch64Reg<27, "x27", [W27]>, DwarfRegAlias<W27>;
-def X28   : AArch64Reg<28, "x28", [W28]>, DwarfRegAlias<W28>;
-def FP    : AArch64Reg<29, "x29", [W29]>, DwarfRegAlias<W29>;
-def LR    : AArch64Reg<30, "x30", [W30]>, DwarfRegAlias<W30>;
-def SP    : AArch64Reg<31, "sp",  [WSP]>, DwarfRegAlias<WSP>;
-let isConstant = true in
-def XZR   : AArch64Reg<31, "xzr", [WZR]>, DwarfRegAlias<WSP>;
+def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP> { let isConstant = true; }
+
+let SubRegIndices = [sub_32, sub_32_hi], CoveredBySubRegs = 1 in {
+def X0    : AArch64Reg<0,   "x0",  [W0,  W0_HI]>, DwarfRegAlias<W0>;
+def X1    : AArch64Reg<1,   "x1",  [W1,  W1_HI]>, DwarfRegAlias<W1>;
+def X2    : AArch64Reg<2,   "x2",  [W2,  W2_HI]>, DwarfRegAlias<W2>;
+def X3    : AArch64Reg<3,   "x3",  [W3,  W3_HI]>, DwarfRegAlias<W3>;
+def X4    : AArch64Reg<4,   "x4",  [W4,  W4_HI]>, DwarfRegAlias<W4>;
+def X5    : AArch64Reg<5,   "x5",  [W5,  W5_HI]>, DwarfRegAlias<W5>;
+def X6    : AArch64Reg<6,   "x6",  [W6,  W6_HI]>, DwarfRegAlias<W6>;
+def X7    : AArch64Reg<7,   "x7",  [W7,  W7_HI]>, DwarfRegAlias<W7>;
+def X8    : AArch64Reg<8,   "x8",  [W8,  W8_HI]>, DwarfRegAlias<W8>;
+def X9    : AArch64Reg<9,   "x9",  [W9,  W9_HI]>, DwarfRegAlias<W9>;
+def X10   : AArch64Reg<10, "x10", [W10, W10_HI]>, DwarfRegAlias<W10>;
+def X11   : AArch64Reg<11, "x11", [W11, W11_HI]>, DwarfRegAlias<W11>;
+def X12   : AArch64Reg<12, "x12", [W12, W12_HI]>, DwarfRegAlias<W12>;
+def X13   : AArch64Reg<13, "x13", [W13, W13_HI]>, DwarfRegAlias<W13>;
+def X14   : AArch64Reg<14, "x14", [W14, W14_HI]>, DwarfRegAlias<W14>;
+def X15   : AArch64Reg<15, "x15", [W15, W15_HI]>, DwarfRegAlias<W15>;
+def X16   : AArch64Reg<16, "x16", [W16, W16_HI]>, DwarfRegAlias<W16>;
+def X17   : AArch64Reg<17, "x17", [W17, W17_HI]>, DwarfRegAlias<W17>;
+def X18   : AArch64Reg<18, "x18", [W18, W18_HI]>, DwarfRegAlias<W18>;
+def X19   : AArch64Reg<19, "x19", [W19, W19_HI]>, DwarfRegAlias<W19>;
+def X20   : AArch64Reg<20, "x20", [W20, W20_HI]>, DwarfRegAlias<W20>;
+def X21   : AArch64Reg<21, "x21", [W21, W21_HI]>, DwarfRegAlias<W21>;
+def X22   : AArch64Reg<22, "x22", [W22, W22_HI]>, DwarfRegAlias<W22>;
+def X23   : AArch64Reg<23, "x23", [W23, W23_HI]>, DwarfRegAlias<W23>;
+def X24   : AArch64Reg<24, "x24", [W24, W24_HI]>, DwarfRegAlias<W24>;
+def X25   : AArch64Reg<25, "x25", [W25, W25_HI]>, DwarfRegAlias<W25>;
+def X26   : AArch64Reg<26, "x26", [W26, W26_HI]>, DwarfRegAlias<W26>;
+def X27   : AArch64Reg<27, "x27", [W27, W27_HI]>, DwarfRegAlias<W27>;
+def X28   : AArch64Reg<28, "x28", [W28, W28_HI]>, DwarfRegAlias<W28>;
+def FP    : AArch64Reg<29, "x29", [W29, W29_HI]>, DwarfRegAlias<W29>;
+def LR    : AArch64Reg<30, "x30", [W30, W30_HI]>, DwarfRegAlias<W30>;
+def SP    : AArch64Reg<31, "sp",  [WSP, WSP_HI]>, DwarfRegAlias<WSP>;
+def XZR   : AArch64Reg<31, "xzr", [WZR, WZR_HI]>, DwarfRegAlias<WSP> { let isConstant = true; }
 }
 
 // Condition code register.
@@ -290,6 +302,14 @@ def CCR : RegisterClass<"AArch64", [i32], 32, (add NZCV)> {
 // Floating Point Scalar Registers
 //===----------------------------------------------------------------------===//
 
+foreach i = 0-31 in {
+  def B#i#_HI : AArch64Reg<-1,  "b"#i#"_hi"> { let isArtificial = 1; }
+  def H#i#_HI : AArch64Reg<-1,  "h"#i#"_hi"> { let isArtificial = 1; }
+  def S#i#_HI : AArch64Reg<-1,  "s"#i#"_hi"> { let isArtificial = 1; }
+  def D#i#_HI : AArch64Reg<-1,  "d"#i#"_hi"> { let isArtificial = 1; }
+  def Q#i#_HI : AArch64Reg<-1,  "q"#i#"_hi"> { let isArtificial = 1; }
+}
+
 def B0    : AArch64Reg<0,   "b0">, DwarfRegNum<[64]>;
 def B1    : AArch64Reg<1,   "b1">, DwarfRegNum<[65]>;
 def B2    : AArch64Reg<2,   "b2">, DwarfRegNum<[66]>;
@@ -323,144 +343,144 @@ def B29   : AArch64Reg<29, "b29">, DwarfRegNum<[93]>;
 def B30   : AArch64Reg<30, "b30">, DwarfRegNum<[94]>;
 def B31   : AArch64Reg<31, "b31">, DwarfRegNum<[95]>;
 
-let SubRegIndices = [bsub] in {
-def H0    : AArch64Reg<0,   "h0", [B0]>, DwarfRegAlias<B0>;
-def H1    : AArch64Reg<1,   "h1", [B1]>, DwarfRegAlias<B1>;
-def H2    : AArch64Reg<2,   "h2", [B2]>, DwarfRegAlias<B2>;
-def H3    : AArch64Reg<3,   "h3", [B3]>, DwarfRegAlias<B3>;
-def H4    : AArch64Reg<4,   "h4", [B4]>, DwarfRegAlias<B4>;
-def H5    : AArch64Reg<5,   "h5", [B5]>, DwarfRegAlias<B5>;
-def H6    : AArch64Reg<6,   "h6", [B6]>, DwarfRegAlias<B6>;
-def H7    : AArch64Reg<7,   "h7", [B7]>, DwarfRegAlias<B7>;
-def H8    : AArch64Reg<8,   "h8", [B8]>, DwarfRegAlias<B8>;
-def H9    : AArch64Reg<9,   "h9", [B9]>, DwarfRegAlias<B9>;
-def H10   : AArch64Reg<10, "h10", [B10]>, DwarfRegAlias<B10>;
-def H11   : AArch64Reg<11, "h11", [B11]>, DwarfRegAlias<B11>;
-def H12   : AArch64Reg<12, "h12", [B12]>, DwarfRegAlias<B12>;
-def H13   : AArch64Reg<13, "h13", [B13]>, DwarfRegAlias<B13>;
-def H14   : AArch64Reg<14, "h14", [B14]>, DwarfRegAlias<B14>;
-def H15   : AArch64Reg<15, "h15", [B15]>, DwarfRegAlias<B15>;
-def H16   : AArch64Reg<16, "h16", [B16]>, DwarfRegAlias<B16>;
-def H17   : AArch64Reg<17, "h17", [B17]>, DwarfRegAlias<B17>;
-def H18   : AArch64Reg<18, "h18", [B18]>, DwarfRegAlias<B18>;
-def H19   : AArch64Reg<19, "h19", [B19]>, DwarfRegAlias<B19>;
-def H20   : AArch64Reg<20, "h20", [B20]>, DwarfRegAlias<B20>;
-def H21   : AArch64Reg<21, "h21", [B21]>, DwarfRegAlias<B21>;
-def H22   : AArch64Reg<22, "h22", [B22]>, DwarfRegAlias<B22>;
-def H23   : AArch64Reg<23, "h23", [B23]>, DwarfRegAlias<B23>;
-def H24   : AArch64Reg<24, "h24", [B24]>, DwarfRegAlias<B24>;
-def H25   : AArch64Reg<25, "h25", [B25]>, DwarfRegAlias<B25>;
-def H26   : AArch64Reg<26, "h26", [B26]>, DwarfRegAlias<B26>;
-def H27   : AArch64Reg<27, "h27", [B27]>, DwarfRegAlias<B27>;
-def H28   : AArch64Reg<28, "h28", [B28]>, DwarfRegAlias<B28>;
-def H29   : AArch64Reg<29, "h29", [B29]>, DwarfRegAlias<B29>;
-def H30   : AArch64Reg<30, "h30", [B30]>, DwarfRegAlias<B30>;
-def H31   : AArch64Reg<31, "h31", [B31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [hsub] in {
-def S0    : AArch64Reg<0,   "s0", [H0]>, DwarfRegAlias<B0>;
-def S1    : AArch64Reg<1,   "s1", [H1]>, DwarfRegAlias<B1>;
-def S2    : AArch64Reg<2,   "s2", [H2]>, DwarfRegAlias<B2>;
-def S3    : AArch64Reg<3,   "s3", [H3]>, DwarfRegAlias<B3>;
-def S4    : AArch64Reg<4,   "s4", [H4]>, DwarfRegAlias<B4>;
-def S5    : AArch64Reg<5,   "s5", [H5]>, DwarfRegAlias<B5>;
-def S6    : AArch64Reg<6,   "s6", [H6]>, DwarfRegAlias<B6>;
-def S7    : AArch64Reg<7,   "s7", [H7]>, DwarfRegAlias<B7>;
-def S8    : AArch64Reg<8,   "s8", [H8]>, DwarfRegAlias<B8>;
-def S9    : AArch64Reg<9,   "s9", [H9]>, DwarfRegAlias<B9>;
-def S10   : AArch64Reg<10, "s10", [H10]>, DwarfRegAlias<B10>;
-def S11   : AArch64Reg<11, "s11", [H11]>, DwarfRegAlias<B11>;
-def S12   : AArch64Reg<12, "s12", [H12]>, DwarfRegAlias<B12>;
-def S13   : AArch64Reg<13, "s13", [H13]>, DwarfRegAlias<B13>;
-def S14   : AArch64Reg<14, "s14", [H14]>, DwarfRegAlias<B14>;
-def S15   : AArch64Reg<15, "s15", [H15]>, DwarfRegAlias<B15>;
-def S16   : AArch64Reg<16, "s16", [H16]>, DwarfRegAlias<B16>;
-def S17   : AArch64Reg<17, "s17", [H17]>, DwarfRegAlias<B17>;
-def S18   : AArch64Reg<18, "s18", [H18]>, DwarfRegAlias<B18>;
-def S19   : AArch64Reg<19, "s19", [H19]>, DwarfRegAlias<B19>;
-def S20   : AArch64Reg<20, "s20", [H20]>, DwarfRegAlias<B20>;
-def S21   : AArch64Reg<21, "s21", [H21]>, DwarfRegAlias<B21>;
-def S22   : AArch64Reg<22, "s22", [H22]>, DwarfRegAlias<B22>;
-def S23   : AArch64Reg<23, "s23", [H23]>, DwarfRegAlias<B23>;
-def S24   : AArch64Reg<24, "s24", [H24]>, DwarfRegAlias<B24>;
-def S25   : AArch64Reg<25, "s25", [H25]>, DwarfRegAlias<B25>;
-def S26   : AArch64Reg<26, "s26", [H26]>, DwarfRegAlias<B26>;
-def S27   : AArch64Reg<27, "s27", [H27]>, DwarfRegAlias<B27>;
-def S28   : AArch64Reg<28, "s28", [H28]>, DwarfRegAlias<B28>;
-def S29   : AArch64Reg<29, "s29", [H29]>, DwarfRegAlias<B29>;
-def S30   : AArch64Reg<30, "s30", [H30]>, DwarfRegAlias<B30>;
-def S31   : AArch64Reg<31, "s31", [H31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [ssub], RegAltNameIndices = [vreg, vlist1] in {
-def D0    : AArch64Reg<0,   "d0", [S0], ["v0", ""]>, DwarfRegAlias<B0>;
-def D1    : AArch64Reg<1,   "d1", [S1], ["v1", ""]>, DwarfRegAlias<B1>;
-def D2    : AArch64Reg<2,   "d2", [S2], ["v2", ""]>, DwarfRegAlias<B2>;
-def D3    : AArch64Reg<3,   "d3", [S3], ["v3", ""]>, DwarfRegAlias<B3>;
-def D4    : AArch64Reg<4,   "d4", [S4], ["v4", ""]>, DwarfRegAlias<B4>;
-def D5    : AArch64Reg<5,   "d5", [S5], ["v5", ""]>, DwarfRegAlias<B5>;
-def D6    : AArch64Reg<6,   "d6", [S6], ["v6", ""]>, DwarfRegAlias<B6>;
-def D7    : AArch64Reg<7,   "d7", [S7], ["v7", ""]>, DwarfRegAlias<B7>;
-def D8    : AArch64Reg<8,   "d8", [S8], ["v8", ""]>, DwarfRegAlias<B8>;
-def D9    : AArch64Reg<9,   "d9", [S9], ["v9", ""]>, DwarfRegAlias<B9>;
-def D10   : AArch64Reg<10, "d10", [S10], ["v10", ""]>, DwarfRegAlias<B10>;
-def D11   : AArch64Reg<11, "d11", [S11], ["v11", ""]>, DwarfRegAlias<B11>;
-def D12   : AArch64Reg<12, "d12", [S12], ["v12", ""]>, DwarfRegAlias<B12>;
-def D13   : AArch64Reg<13, "d13", [S13], ["v13", ""]>,...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

This is a step towards enabling subreg liveness tracking for AArch64, which requires that registers are fully covered by their subregisters, as covered here #109797.

There are several changes in this patch:

  • AArch64RegisterInfo.td and tests: Define the high bits like B0_HI, H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some register class, this added a register class which meant that we had to update 'magic numbers' in several tests.

    The use of ComposedSubRegIndex helped 'compress' the number of bits required for the lanemask. The correctness of the masks is tested by an explicit unit tests.

  • LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for register tuples, but with this change to describe the high bits, a register like 'D0' will also have 'HasDisjunctSubRegs' set to true (because it's fullly covered by S0 and S0_HI). The fix here is to explicitly test if the register class is one of the known D/Q/Z tuples.

  • TableGen: The handling of the isArtificial flag was entirely broken. Skipping out too early from some of the loops led to incorrect internal representation of the (sub)register(index) hierarchy, and thus resulted in incorrect TableGen info.


Patch is 137.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114263.diff

23 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp (+4-1)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+57-2)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.td (+263-234)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-sve-asm.ll (+11-11)
  • (modified) llvm/test/CodeGen/AArch64/blr-bti-preserves-operands.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/emit_fneg_with_non_register_operand.mir (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/expand-blr-rvmarker-pseudo.mir (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir (+30-30)
  • (modified) llvm/test/CodeGen/AArch64/machine-outliner-calls.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/misched-bundle.mir (+23-3)
  • (modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-01.mir (+12)
  • (modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-02.mir (+4-1)
  • (modified) llvm/test/CodeGen/AArch64/peephole-insvigpr.mir (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/preserve.ll (+3-4)
  • (modified) llvm/test/CodeGen/AArch64/strpre-str-merge.mir (+14-14)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryComm-merging.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryCommWithRev-merging.mir (+1-1)
  • (added) llvm/unittests/Target/AArch64/AArch64RegisterInfoTest.cpp (+148)
  • (modified) llvm/unittests/Target/AArch64/CMakeLists.txt (+1)
  • (modified) llvm/utils/TableGen/Common/CodeGenRegisters.cpp (+8-8)
  • (modified) llvm/utils/TableGen/RegisterInfoEmitter.cpp (+1)
diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 1a9e5899892a1b..b051122609883f 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -1541,7 +1541,10 @@ static bool canRenameMOP(const MachineOperand &MOP,
     // Note that this relies on the structure of the AArch64 register file. In
     // particular, a subregister cannot be written without overwriting the
     // whole register.
-    if (RegClass->HasDisjunctSubRegs) {
+    if (RegClass->HasDisjunctSubRegs && RegClass->CoveredBySubRegs &&
+        (TRI->getSubRegisterClass(RegClass, AArch64::dsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::qsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::zsub0))) {
       LLVM_DEBUG(
           dbgs()
           << "  Cannot rename operands with multiple disjunct subregisters ("
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index 18290dd5f32df9..1ef75c6e02e8c7 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -424,6 +424,58 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
   return {};
 }
 
+static SmallVector<MCPhysReg> ReservedHi = {
+    AArch64::B0_HI,  AArch64::B1_HI,  AArch64::B2_HI,  AArch64::B3_HI,
+    AArch64::B4_HI,  AArch64::B5_HI,  AArch64::B6_HI,  AArch64::B7_HI,
+    AArch64::B8_HI,  AArch64::B9_HI,  AArch64::B10_HI, AArch64::B11_HI,
+    AArch64::B12_HI, AArch64::B13_HI, AArch64::B14_HI, AArch64::B15_HI,
+    AArch64::B16_HI, AArch64::B17_HI, AArch64::B18_HI, AArch64::B19_HI,
+    AArch64::B20_HI, AArch64::B21_HI, AArch64::B22_HI, AArch64::B23_HI,
+    AArch64::B24_HI, AArch64::B25_HI, AArch64::B26_HI, AArch64::B27_HI,
+    AArch64::B28_HI, AArch64::B29_HI, AArch64::B30_HI, AArch64::B31_HI,
+    AArch64::H0_HI,  AArch64::H1_HI,  AArch64::H2_HI,  AArch64::H3_HI,
+    AArch64::H4_HI,  AArch64::H5_HI,  AArch64::H6_HI,  AArch64::H7_HI,
+    AArch64::H8_HI,  AArch64::H9_HI,  AArch64::H10_HI, AArch64::H11_HI,
+    AArch64::H12_HI, AArch64::H13_HI, AArch64::H14_HI, AArch64::H15_HI,
+    AArch64::H16_HI, AArch64::H17_HI, AArch64::H18_HI, AArch64::H19_HI,
+    AArch64::H20_HI, AArch64::H21_HI, AArch64::H22_HI, AArch64::H23_HI,
+    AArch64::H24_HI, AArch64::H25_HI, AArch64::H26_HI, AArch64::H27_HI,
+    AArch64::H28_HI, AArch64::H29_HI, AArch64::H30_HI, AArch64::H31_HI,
+    AArch64::S0_HI,  AArch64::S1_HI,  AArch64::S2_HI,  AArch64::S3_HI,
+    AArch64::S4_HI,  AArch64::S5_HI,  AArch64::S6_HI,  AArch64::S7_HI,
+    AArch64::S8_HI,  AArch64::S9_HI,  AArch64::S10_HI, AArch64::S11_HI,
+    AArch64::S12_HI, AArch64::S13_HI, AArch64::S14_HI, AArch64::S15_HI,
+    AArch64::S16_HI, AArch64::S17_HI, AArch64::S18_HI, AArch64::S19_HI,
+    AArch64::S20_HI, AArch64::S21_HI, AArch64::S22_HI, AArch64::S23_HI,
+    AArch64::S24_HI, AArch64::S25_HI, AArch64::S26_HI, AArch64::S27_HI,
+    AArch64::S28_HI, AArch64::S29_HI, AArch64::S30_HI, AArch64::S31_HI,
+    AArch64::D0_HI,  AArch64::D1_HI,  AArch64::D2_HI,  AArch64::D3_HI,
+    AArch64::D4_HI,  AArch64::D5_HI,  AArch64::D6_HI,  AArch64::D7_HI,
+    AArch64::D8_HI,  AArch64::D9_HI,  AArch64::D10_HI, AArch64::D11_HI,
+    AArch64::D12_HI, AArch64::D13_HI, AArch64::D14_HI, AArch64::D15_HI,
+    AArch64::D16_HI, AArch64::D17_HI, AArch64::D18_HI, AArch64::D19_HI,
+    AArch64::D20_HI, AArch64::D21_HI, AArch64::D22_HI, AArch64::D23_HI,
+    AArch64::D24_HI, AArch64::D25_HI, AArch64::D26_HI, AArch64::D27_HI,
+    AArch64::D28_HI, AArch64::D29_HI, AArch64::D30_HI, AArch64::D31_HI,
+    AArch64::Q0_HI,  AArch64::Q1_HI,  AArch64::Q2_HI,  AArch64::Q3_HI,
+    AArch64::Q4_HI,  AArch64::Q5_HI,  AArch64::Q6_HI,  AArch64::Q7_HI,
+    AArch64::Q8_HI,  AArch64::Q9_HI,  AArch64::Q10_HI, AArch64::Q11_HI,
+    AArch64::Q12_HI, AArch64::Q13_HI, AArch64::Q14_HI, AArch64::Q15_HI,
+    AArch64::Q16_HI, AArch64::Q17_HI, AArch64::Q18_HI, AArch64::Q19_HI,
+    AArch64::Q20_HI, AArch64::Q21_HI, AArch64::Q22_HI, AArch64::Q23_HI,
+    AArch64::Q24_HI, AArch64::Q25_HI, AArch64::Q26_HI, AArch64::Q27_HI,
+    AArch64::Q28_HI, AArch64::Q29_HI, AArch64::Q30_HI, AArch64::Q31_HI,
+    AArch64::W0_HI,  AArch64::W1_HI,  AArch64::W2_HI,  AArch64::W3_HI,
+    AArch64::W4_HI,  AArch64::W5_HI,  AArch64::W6_HI,  AArch64::W7_HI,
+    AArch64::W8_HI,  AArch64::W9_HI,  AArch64::W10_HI, AArch64::W11_HI,
+    AArch64::W12_HI, AArch64::W13_HI, AArch64::W14_HI, AArch64::W15_HI,
+    AArch64::W16_HI, AArch64::W17_HI, AArch64::W18_HI, AArch64::W19_HI,
+    AArch64::W20_HI, AArch64::W21_HI, AArch64::W22_HI, AArch64::W23_HI,
+    AArch64::W24_HI, AArch64::W25_HI, AArch64::W26_HI, AArch64::W27_HI,
+    AArch64::W28_HI, AArch64::W29_HI, AArch64::W30_HI, AArch64::WSP_HI,
+    AArch64::WZR_HI
+    };
+
 BitVector
 AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
   const AArch64FrameLowering *TFI = getFrameLowering(MF);
@@ -490,7 +542,10 @@ AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
     markSuperRegs(Reserved, AArch64::W28);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  for (Register R : ReservedHi)
+    Reserved.set(R);
+
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
@@ -514,7 +569,7 @@ AArch64RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
       markSuperRegs(Reserved, AArch64::LR);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
index 4117d74d10c1e7..5f7f6aa7a7bf99 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
@@ -20,33 +20,39 @@ class AArch64Reg<bits<16> enc, string n, list<Register> subregs = [],
 
 let Namespace = "AArch64" in {
   // SubRegIndexes for GPR registers
-  def sub_32 : SubRegIndex<32>;
-  def sube64 : SubRegIndex<64>;
-  def subo64 : SubRegIndex<64>;
-  def sube32 : SubRegIndex<32>;
-  def subo32 : SubRegIndex<32>;
+  def sub_32   : SubRegIndex<32,  0>;
+  def sub_32_hi: SubRegIndex<32, 32>;
+  def sube64   : SubRegIndex<64>;
+  def subo64   : SubRegIndex<64>;
+  def sube32   : SubRegIndex<32>;
+  def subo32   : SubRegIndex<32>;
 
   // SubRegIndexes for FPR/Vector registers
-  def bsub : SubRegIndex<8>;
-  def hsub : SubRegIndex<16>;
-  def ssub : SubRegIndex<32>;
-  def dsub : SubRegIndex<64>;
-  def zsub : SubRegIndex<128>;
+  def bsub    : SubRegIndex<8, 0>;
+  def bsub_hi : SubRegIndex<8, 8>;
+  def hsub    : SubRegIndex<16, 0>;
+  def hsub_hi : SubRegIndex<16, 16>;
+  def ssub    : SubRegIndex<32, 0>;
+  def ssub_hi : SubRegIndex<32, 32>;
+  def dsub    : SubRegIndex<64, 0>;
+  def dsub_hi : SubRegIndex<64, 64>;
+  def zsub    : SubRegIndex<128, 0>;
+  def zsub_hi : SubRegIndex<-1, 128>;
   // Note: Code depends on these having consecutive numbers
-  def zsub0 : SubRegIndex<128, -1>;
-  def zsub1 : SubRegIndex<128, -1>;
-  def zsub2 : SubRegIndex<128, -1>;
-  def zsub3 : SubRegIndex<128, -1>;
-  // Note: Code depends on these having consecutive numbers
-  def dsub0 : SubRegIndex<64>;
-  def dsub1 : SubRegIndex<64>;
-  def dsub2 : SubRegIndex<64>;
-  def dsub3 : SubRegIndex<64>;
+  def zsub0 : SubRegIndex<-1>;
+  def zsub1 : SubRegIndex<-1>;
+  def zsub2 : SubRegIndex<-1>;
+  def zsub3 : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
   def qsub0 : SubRegIndex<128>;
-  def qsub1 : SubRegIndex<128>;
-  def qsub2 : SubRegIndex<128>;
-  def qsub3 : SubRegIndex<128>;
+  def qsub1 : ComposedSubRegIndex<zsub1, zsub>;
+  def qsub2 : ComposedSubRegIndex<zsub2, zsub>;
+  def qsub3 : ComposedSubRegIndex<zsub3, zsub>;
+  // Note: Code depends on these having consecutive numbers
+  def dsub0 : SubRegIndex<64>;
+  def dsub1 : ComposedSubRegIndex<qsub1, dsub>;
+  def dsub2 : ComposedSubRegIndex<qsub2, dsub>;
+  def dsub3 : ComposedSubRegIndex<qsub3, dsub>;
 
   // SubRegIndexes for SME Matrix tiles
   def zasubb  : SubRegIndex<2048>; // (16 x 16)/1 bytes  = 2048 bits
@@ -60,10 +66,10 @@ let Namespace = "AArch64" in {
   def zasubq1 : SubRegIndex<128>;  // (16 x 16)/16 bytes = 128 bits
 
   // SubRegIndexes for SVE Predicates
-  def psub  : SubRegIndex<16>;
+  def psub  : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
-  def psub0 : SubRegIndex<16, -1>;
-  def psub1 : SubRegIndex<16, -1>;
+  def psub0 : SubRegIndex<-1>;
+  def psub1 : SubRegIndex<-1>;
 }
 
 let Namespace = "AArch64" in {
@@ -74,6 +80,14 @@ let Namespace = "AArch64" in {
 //===----------------------------------------------------------------------===//
 // Registers
 //===----------------------------------------------------------------------===//
+
+foreach i = 0-30 in {
+  // Define W0_HI, W1_HI, .. W30_HI
+  def W#i#_HI : AArch64Reg<-1,  "w"#i#"_hi"> { let isArtificial = 1; }
+}
+def WSP_HI : AArch64Reg<-1,  "wsp_hi"> { let isArtificial = 1; }
+def WZR_HI : AArch64Reg<-1,  "wzr_hi"> { let isArtificial = 1; }
+
 def W0    : AArch64Reg<0,   "w0" >, DwarfRegNum<[0]>;
 def W1    : AArch64Reg<1,   "w1" >, DwarfRegNum<[1]>;
 def W2    : AArch64Reg<2,   "w2" >, DwarfRegNum<[2]>;
@@ -106,44 +120,42 @@ def W28   : AArch64Reg<28, "w28">, DwarfRegNum<[28]>;
 def W29   : AArch64Reg<29, "w29">, DwarfRegNum<[29]>;
 def W30   : AArch64Reg<30, "w30">, DwarfRegNum<[30]>;
 def WSP   : AArch64Reg<31, "wsp">, DwarfRegNum<[31]>;
-let isConstant = true in
-def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP>;
-
-let SubRegIndices = [sub_32] in {
-def X0    : AArch64Reg<0,   "x0",  [W0]>, DwarfRegAlias<W0>;
-def X1    : AArch64Reg<1,   "x1",  [W1]>, DwarfRegAlias<W1>;
-def X2    : AArch64Reg<2,   "x2",  [W2]>, DwarfRegAlias<W2>;
-def X3    : AArch64Reg<3,   "x3",  [W3]>, DwarfRegAlias<W3>;
-def X4    : AArch64Reg<4,   "x4",  [W4]>, DwarfRegAlias<W4>;
-def X5    : AArch64Reg<5,   "x5",  [W5]>, DwarfRegAlias<W5>;
-def X6    : AArch64Reg<6,   "x6",  [W6]>, DwarfRegAlias<W6>;
-def X7    : AArch64Reg<7,   "x7",  [W7]>, DwarfRegAlias<W7>;
-def X8    : AArch64Reg<8,   "x8",  [W8]>, DwarfRegAlias<W8>;
-def X9    : AArch64Reg<9,   "x9",  [W9]>, DwarfRegAlias<W9>;
-def X10   : AArch64Reg<10, "x10", [W10]>, DwarfRegAlias<W10>;
-def X11   : AArch64Reg<11, "x11", [W11]>, DwarfRegAlias<W11>;
-def X12   : AArch64Reg<12, "x12", [W12]>, DwarfRegAlias<W12>;
-def X13   : AArch64Reg<13, "x13", [W13]>, DwarfRegAlias<W13>;
-def X14   : AArch64Reg<14, "x14", [W14]>, DwarfRegAlias<W14>;
-def X15   : AArch64Reg<15, "x15", [W15]>, DwarfRegAlias<W15>;
-def X16   : AArch64Reg<16, "x16", [W16]>, DwarfRegAlias<W16>;
-def X17   : AArch64Reg<17, "x17", [W17]>, DwarfRegAlias<W17>;
-def X18   : AArch64Reg<18, "x18", [W18]>, DwarfRegAlias<W18>;
-def X19   : AArch64Reg<19, "x19", [W19]>, DwarfRegAlias<W19>;
-def X20   : AArch64Reg<20, "x20", [W20]>, DwarfRegAlias<W20>;
-def X21   : AArch64Reg<21, "x21", [W21]>, DwarfRegAlias<W21>;
-def X22   : AArch64Reg<22, "x22", [W22]>, DwarfRegAlias<W22>;
-def X23   : AArch64Reg<23, "x23", [W23]>, DwarfRegAlias<W23>;
-def X24   : AArch64Reg<24, "x24", [W24]>, DwarfRegAlias<W24>;
-def X25   : AArch64Reg<25, "x25", [W25]>, DwarfRegAlias<W25>;
-def X26   : AArch64Reg<26, "x26", [W26]>, DwarfRegAlias<W26>;
-def X27   : AArch64Reg<27, "x27", [W27]>, DwarfRegAlias<W27>;
-def X28   : AArch64Reg<28, "x28", [W28]>, DwarfRegAlias<W28>;
-def FP    : AArch64Reg<29, "x29", [W29]>, DwarfRegAlias<W29>;
-def LR    : AArch64Reg<30, "x30", [W30]>, DwarfRegAlias<W30>;
-def SP    : AArch64Reg<31, "sp",  [WSP]>, DwarfRegAlias<WSP>;
-let isConstant = true in
-def XZR   : AArch64Reg<31, "xzr", [WZR]>, DwarfRegAlias<WSP>;
+def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP> { let isConstant = true; }
+
+let SubRegIndices = [sub_32, sub_32_hi], CoveredBySubRegs = 1 in {
+def X0    : AArch64Reg<0,   "x0",  [W0,  W0_HI]>, DwarfRegAlias<W0>;
+def X1    : AArch64Reg<1,   "x1",  [W1,  W1_HI]>, DwarfRegAlias<W1>;
+def X2    : AArch64Reg<2,   "x2",  [W2,  W2_HI]>, DwarfRegAlias<W2>;
+def X3    : AArch64Reg<3,   "x3",  [W3,  W3_HI]>, DwarfRegAlias<W3>;
+def X4    : AArch64Reg<4,   "x4",  [W4,  W4_HI]>, DwarfRegAlias<W4>;
+def X5    : AArch64Reg<5,   "x5",  [W5,  W5_HI]>, DwarfRegAlias<W5>;
+def X6    : AArch64Reg<6,   "x6",  [W6,  W6_HI]>, DwarfRegAlias<W6>;
+def X7    : AArch64Reg<7,   "x7",  [W7,  W7_HI]>, DwarfRegAlias<W7>;
+def X8    : AArch64Reg<8,   "x8",  [W8,  W8_HI]>, DwarfRegAlias<W8>;
+def X9    : AArch64Reg<9,   "x9",  [W9,  W9_HI]>, DwarfRegAlias<W9>;
+def X10   : AArch64Reg<10, "x10", [W10, W10_HI]>, DwarfRegAlias<W10>;
+def X11   : AArch64Reg<11, "x11", [W11, W11_HI]>, DwarfRegAlias<W11>;
+def X12   : AArch64Reg<12, "x12", [W12, W12_HI]>, DwarfRegAlias<W12>;
+def X13   : AArch64Reg<13, "x13", [W13, W13_HI]>, DwarfRegAlias<W13>;
+def X14   : AArch64Reg<14, "x14", [W14, W14_HI]>, DwarfRegAlias<W14>;
+def X15   : AArch64Reg<15, "x15", [W15, W15_HI]>, DwarfRegAlias<W15>;
+def X16   : AArch64Reg<16, "x16", [W16, W16_HI]>, DwarfRegAlias<W16>;
+def X17   : AArch64Reg<17, "x17", [W17, W17_HI]>, DwarfRegAlias<W17>;
+def X18   : AArch64Reg<18, "x18", [W18, W18_HI]>, DwarfRegAlias<W18>;
+def X19   : AArch64Reg<19, "x19", [W19, W19_HI]>, DwarfRegAlias<W19>;
+def X20   : AArch64Reg<20, "x20", [W20, W20_HI]>, DwarfRegAlias<W20>;
+def X21   : AArch64Reg<21, "x21", [W21, W21_HI]>, DwarfRegAlias<W21>;
+def X22   : AArch64Reg<22, "x22", [W22, W22_HI]>, DwarfRegAlias<W22>;
+def X23   : AArch64Reg<23, "x23", [W23, W23_HI]>, DwarfRegAlias<W23>;
+def X24   : AArch64Reg<24, "x24", [W24, W24_HI]>, DwarfRegAlias<W24>;
+def X25   : AArch64Reg<25, "x25", [W25, W25_HI]>, DwarfRegAlias<W25>;
+def X26   : AArch64Reg<26, "x26", [W26, W26_HI]>, DwarfRegAlias<W26>;
+def X27   : AArch64Reg<27, "x27", [W27, W27_HI]>, DwarfRegAlias<W27>;
+def X28   : AArch64Reg<28, "x28", [W28, W28_HI]>, DwarfRegAlias<W28>;
+def FP    : AArch64Reg<29, "x29", [W29, W29_HI]>, DwarfRegAlias<W29>;
+def LR    : AArch64Reg<30, "x30", [W30, W30_HI]>, DwarfRegAlias<W30>;
+def SP    : AArch64Reg<31, "sp",  [WSP, WSP_HI]>, DwarfRegAlias<WSP>;
+def XZR   : AArch64Reg<31, "xzr", [WZR, WZR_HI]>, DwarfRegAlias<WSP> { let isConstant = true; }
 }
 
 // Condition code register.
@@ -290,6 +302,14 @@ def CCR : RegisterClass<"AArch64", [i32], 32, (add NZCV)> {
 // Floating Point Scalar Registers
 //===----------------------------------------------------------------------===//
 
+foreach i = 0-31 in {
+  def B#i#_HI : AArch64Reg<-1,  "b"#i#"_hi"> { let isArtificial = 1; }
+  def H#i#_HI : AArch64Reg<-1,  "h"#i#"_hi"> { let isArtificial = 1; }
+  def S#i#_HI : AArch64Reg<-1,  "s"#i#"_hi"> { let isArtificial = 1; }
+  def D#i#_HI : AArch64Reg<-1,  "d"#i#"_hi"> { let isArtificial = 1; }
+  def Q#i#_HI : AArch64Reg<-1,  "q"#i#"_hi"> { let isArtificial = 1; }
+}
+
 def B0    : AArch64Reg<0,   "b0">, DwarfRegNum<[64]>;
 def B1    : AArch64Reg<1,   "b1">, DwarfRegNum<[65]>;
 def B2    : AArch64Reg<2,   "b2">, DwarfRegNum<[66]>;
@@ -323,144 +343,144 @@ def B29   : AArch64Reg<29, "b29">, DwarfRegNum<[93]>;
 def B30   : AArch64Reg<30, "b30">, DwarfRegNum<[94]>;
 def B31   : AArch64Reg<31, "b31">, DwarfRegNum<[95]>;
 
-let SubRegIndices = [bsub] in {
-def H0    : AArch64Reg<0,   "h0", [B0]>, DwarfRegAlias<B0>;
-def H1    : AArch64Reg<1,   "h1", [B1]>, DwarfRegAlias<B1>;
-def H2    : AArch64Reg<2,   "h2", [B2]>, DwarfRegAlias<B2>;
-def H3    : AArch64Reg<3,   "h3", [B3]>, DwarfRegAlias<B3>;
-def H4    : AArch64Reg<4,   "h4", [B4]>, DwarfRegAlias<B4>;
-def H5    : AArch64Reg<5,   "h5", [B5]>, DwarfRegAlias<B5>;
-def H6    : AArch64Reg<6,   "h6", [B6]>, DwarfRegAlias<B6>;
-def H7    : AArch64Reg<7,   "h7", [B7]>, DwarfRegAlias<B7>;
-def H8    : AArch64Reg<8,   "h8", [B8]>, DwarfRegAlias<B8>;
-def H9    : AArch64Reg<9,   "h9", [B9]>, DwarfRegAlias<B9>;
-def H10   : AArch64Reg<10, "h10", [B10]>, DwarfRegAlias<B10>;
-def H11   : AArch64Reg<11, "h11", [B11]>, DwarfRegAlias<B11>;
-def H12   : AArch64Reg<12, "h12", [B12]>, DwarfRegAlias<B12>;
-def H13   : AArch64Reg<13, "h13", [B13]>, DwarfRegAlias<B13>;
-def H14   : AArch64Reg<14, "h14", [B14]>, DwarfRegAlias<B14>;
-def H15   : AArch64Reg<15, "h15", [B15]>, DwarfRegAlias<B15>;
-def H16   : AArch64Reg<16, "h16", [B16]>, DwarfRegAlias<B16>;
-def H17   : AArch64Reg<17, "h17", [B17]>, DwarfRegAlias<B17>;
-def H18   : AArch64Reg<18, "h18", [B18]>, DwarfRegAlias<B18>;
-def H19   : AArch64Reg<19, "h19", [B19]>, DwarfRegAlias<B19>;
-def H20   : AArch64Reg<20, "h20", [B20]>, DwarfRegAlias<B20>;
-def H21   : AArch64Reg<21, "h21", [B21]>, DwarfRegAlias<B21>;
-def H22   : AArch64Reg<22, "h22", [B22]>, DwarfRegAlias<B22>;
-def H23   : AArch64Reg<23, "h23", [B23]>, DwarfRegAlias<B23>;
-def H24   : AArch64Reg<24, "h24", [B24]>, DwarfRegAlias<B24>;
-def H25   : AArch64Reg<25, "h25", [B25]>, DwarfRegAlias<B25>;
-def H26   : AArch64Reg<26, "h26", [B26]>, DwarfRegAlias<B26>;
-def H27   : AArch64Reg<27, "h27", [B27]>, DwarfRegAlias<B27>;
-def H28   : AArch64Reg<28, "h28", [B28]>, DwarfRegAlias<B28>;
-def H29   : AArch64Reg<29, "h29", [B29]>, DwarfRegAlias<B29>;
-def H30   : AArch64Reg<30, "h30", [B30]>, DwarfRegAlias<B30>;
-def H31   : AArch64Reg<31, "h31", [B31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [hsub] in {
-def S0    : AArch64Reg<0,   "s0", [H0]>, DwarfRegAlias<B0>;
-def S1    : AArch64Reg<1,   "s1", [H1]>, DwarfRegAlias<B1>;
-def S2    : AArch64Reg<2,   "s2", [H2]>, DwarfRegAlias<B2>;
-def S3    : AArch64Reg<3,   "s3", [H3]>, DwarfRegAlias<B3>;
-def S4    : AArch64Reg<4,   "s4", [H4]>, DwarfRegAlias<B4>;
-def S5    : AArch64Reg<5,   "s5", [H5]>, DwarfRegAlias<B5>;
-def S6    : AArch64Reg<6,   "s6", [H6]>, DwarfRegAlias<B6>;
-def S7    : AArch64Reg<7,   "s7", [H7]>, DwarfRegAlias<B7>;
-def S8    : AArch64Reg<8,   "s8", [H8]>, DwarfRegAlias<B8>;
-def S9    : AArch64Reg<9,   "s9", [H9]>, DwarfRegAlias<B9>;
-def S10   : AArch64Reg<10, "s10", [H10]>, DwarfRegAlias<B10>;
-def S11   : AArch64Reg<11, "s11", [H11]>, DwarfRegAlias<B11>;
-def S12   : AArch64Reg<12, "s12", [H12]>, DwarfRegAlias<B12>;
-def S13   : AArch64Reg<13, "s13", [H13]>, DwarfRegAlias<B13>;
-def S14   : AArch64Reg<14, "s14", [H14]>, DwarfRegAlias<B14>;
-def S15   : AArch64Reg<15, "s15", [H15]>, DwarfRegAlias<B15>;
-def S16   : AArch64Reg<16, "s16", [H16]>, DwarfRegAlias<B16>;
-def S17   : AArch64Reg<17, "s17", [H17]>, DwarfRegAlias<B17>;
-def S18   : AArch64Reg<18, "s18", [H18]>, DwarfRegAlias<B18>;
-def S19   : AArch64Reg<19, "s19", [H19]>, DwarfRegAlias<B19>;
-def S20   : AArch64Reg<20, "s20", [H20]>, DwarfRegAlias<B20>;
-def S21   : AArch64Reg<21, "s21", [H21]>, DwarfRegAlias<B21>;
-def S22   : AArch64Reg<22, "s22", [H22]>, DwarfRegAlias<B22>;
-def S23   : AArch64Reg<23, "s23", [H23]>, DwarfRegAlias<B23>;
-def S24   : AArch64Reg<24, "s24", [H24]>, DwarfRegAlias<B24>;
-def S25   : AArch64Reg<25, "s25", [H25]>, DwarfRegAlias<B25>;
-def S26   : AArch64Reg<26, "s26", [H26]>, DwarfRegAlias<B26>;
-def S27   : AArch64Reg<27, "s27", [H27]>, DwarfRegAlias<B27>;
-def S28   : AArch64Reg<28, "s28", [H28]>, DwarfRegAlias<B28>;
-def S29   : AArch64Reg<29, "s29", [H29]>, DwarfRegAlias<B29>;
-def S30   : AArch64Reg<30, "s30", [H30]>, DwarfRegAlias<B30>;
-def S31   : AArch64Reg<31, "s31", [H31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [ssub], RegAltNameIndices = [vreg, vlist1] in {
-def D0    : AArch64Reg<0,   "d0", [S0], ["v0", ""]>, DwarfRegAlias<B0>;
-def D1    : AArch64Reg<1,   "d1", [S1], ["v1", ""]>, DwarfRegAlias<B1>;
-def D2    : AArch64Reg<2,   "d2", [S2], ["v2", ""]>, DwarfRegAlias<B2>;
-def D3    : AArch64Reg<3,   "d3", [S3], ["v3", ""]>, DwarfRegAlias<B3>;
-def D4    : AArch64Reg<4,   "d4", [S4], ["v4", ""]>, DwarfRegAlias<B4>;
-def D5    : AArch64Reg<5,   "d5", [S5], ["v5", ""]>, DwarfRegAlias<B5>;
-def D6    : AArch64Reg<6,   "d6", [S6], ["v6", ""]>, DwarfRegAlias<B6>;
-def D7    : AArch64Reg<7,   "d7", [S7], ["v7", ""]>, DwarfRegAlias<B7>;
-def D8    : AArch64Reg<8,   "d8", [S8], ["v8", ""]>, DwarfRegAlias<B8>;
-def D9    : AArch64Reg<9,   "d9", [S9], ["v9", ""]>, DwarfRegAlias<B9>;
-def D10   : AArch64Reg<10, "d10", [S10], ["v10", ""]>, DwarfRegAlias<B10>;
-def D11   : AArch64Reg<11, "d11", [S11], ["v11", ""]>, DwarfRegAlias<B11>;
-def D12   : AArch64Reg<12, "d12", [S12], ["v12", ""]>, DwarfRegAlias<B12>;
-def D13   : AArch64Reg<13, "d13", [S13], ["v13", ""]>,...
[truncated]

Copy link

github-actions bot commented Oct 30, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to split out the tablegen changes

@@ -424,6 +424,58 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
return {};
}

static SmallVector<MCPhysReg> ReservedHi = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this std::array or a regular C array. Alternatively could introduce a new register class for the synthetic cases.

It might also be possible to get away without explicitly marking these as reserved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively could introduce a new register class for the synthetic cases.

That's actually something I had to do (see here), as the compiler otherwise runs into some assertion requiring the (artificial) registers to have a corresponding regclass.

It might also be possible to get away without explicitly marking these as reserved

When I remove these from the list of reserved registers, a lot of the tests fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of failures? AMDGPU also has synthetic 16-bit high sub registers and they are not explicitly reserved. Are you adding these to an allocatable class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failures in LiveRangeCalc. This happens because in AArch64RegisterInfo::getStrictlyReservedRegs it marks e.g. 32-bit WSP and all of it's super-registers (in this case 64-bit SP) as reserved. WSP_HI is a sibling register of WSP but should also be marked as reserved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what are the actual failures, messages, location? If the high half of register isn't allocatable / addressable in the first place, it shouldn't just appear to cause issues

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without marking the registers as reserved, then for the example below:

---
name:            sv2i64
tracksRegLiveness: true
body:             |
  bb.0.entry:
    liveins: $q0, $q1

    %0:fpr128 = COPY $q0
    %1:fpr128 = COPY $q1
    %35:gpr64 = COPY %0.dsub
    %36:gpr64 = COPY %1.dsub
    %9:gpr64 = SDIVXr %35, %36
    %37:gpr64 = UMOVvi64 %0, 1
    %38:gpr64 = UMOVvi64 %1, 1
    %10:gpr64 = SDIVXr %37, %38
    %19:fpr128 = INSvi64gpr undef %19, 0, %9
    %19:fpr128 = INSvi64gpr %19, 1, %10
    %39:gpr64 = COPY %19.dsub
    %24:gpr64 = MADDXrrr %39, %36, $xzr
    %41:gpr64 = UMOVvi64 %19, 1
    %25:gpr64 = MADDXrrr %41, %38, $xzr
    %34:fpr128 = INSvi64gpr undef %34, 0, %24
    %34:fpr128 = INSvi64gpr %34, 1, %25
    %2:fpr128 = SUBv2i64 %0, %34
    $q0 = COPY %2
    RET_ReallyLR implicit $q0
...

When I run this with:

llc -global-isel -verify-machineinstrs -run-pass=machine-scheduler

It fails with:

Use of $xzr does not have a corresponding definition on every path:
216r %10:gpr64 = MADDXrrr %9:gpr64, %3:gpr64, $xzr
LLVM ERROR: Use not jointly dominated by defs.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ./bin/llc -global-isel -verify-machineinstrs -run-pass=machine-scheduler /tmp/t.mir -o -
1.      Running pass 'Function Pass Manager' on module '/tmp/t.mir'.
2.      Running pass 'Machine Instruction Scheduler' on function '@sv2i64'
 ...
 #8 0x0000ffff80062b7c llvm::LiveRangeCalc::findReachingDefs(llvm::LiveRange&, llvm::MachineBasicBlock&, llvm::SlotIndex, unsigned int, llvm::ArrayRef<llvm::SlotIndex>)
 #9 0x0000ffff80063e94 llvm::LiveRangeCalc::extend(llvm::LiveRange&, llvm::SlotIndex, unsigned int, llvm::ArrayRef<llvm::SlotIndex>)
#10 0x0000ffff80064a18 llvm::LiveIntervalCalc::extendToUses(llvm::LiveRange&, llvm::Register, llvm::LaneBitmask, llvm::LiveInterval*)
#11 0x0000ffff8003e82c llvm::LiveIntervals::computeRegUnitRange(llvm::LiveRange&, unsigned int)
#12 0x0000ffff80044cdc llvm::LiveIntervals::HMEditor::updateAllRanges(llvm::MachineInstr*)
#13 0x0000ffff8004848c llvm::LiveIntervals::handleMove(llvm::MachineInstr&, bool)
#14 0x0000ffff801f44ec llvm::ScheduleDAGMI::moveInstruction(llvm::MachineInstr*, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>)
#15 0x0000ffff801fdb58 llvm::ScheduleDAGMILive::scheduleMI(llvm::SUnit*, bool)
#16 0x0000ffff8020b214 llvm::ScheduleDAGMILive::schedule()
#17 0x0000ffff801f0934 (anonymous namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) (.isra.0) MachineScheduler.cpp:0:0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This smells like an unrelated bug, this is not the kind of error I expected

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a bug; the code for moving an instruction goes through the list of operands to update the register's liverange. For each physreg it then goes through the regunits to calculate/update the liverange for that regunit, but only if the regunit is not reserved.

The code that determines if the register is reserved says:

// A register unit is considered reserved if all its roots and all their
// super registers are reserved.

Without this change to AArch64RegisterInfo.cpp, WZR and XZR are marked as reserved, but WZR_HI isn't (because WZR_HI is a sibling of WZR, and markSuperRegs marks only XZR as reserved), and so IsReserved is false for the WZR_HI regunit.

Why this doesn't fail for AMDGPU I don't know, perhaps these registers are always virtual and they never go down this path.


// Test that there is no overlap between different (sub)registers
// in a tuple.
ASSERT_EQ(TRI.getSubRegIndexLaneMask(AArch64::dsub0) &
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXPECT_EQ throughout

When CoveredBySubRegs is true and a sub-register consists of two
parts; a regular subreg and an artificial subreg, then TableGen
should consider both as a concatenation of subregs.

This happens for example when a 64-bit register 'D0' consists of
32-bit 'S0_HI' (artificial) and 'S0', and 'S0' consists of (16-bit)
'H0_HI' (artificial) and 'H0'. Then the concatenation should be:
S0_HI, H0_HI, H0.
TableGen builds up a map of "SubRegIdx -> Subclass" where Subclass is
the largest class where all registers have SubRegIdx as a
sub-register. When SubRegIdx (vis-a-vis the sub-register) is
artificial it should still include it in the map. This map is used in
various places, including in the calculation of the Lanemask of a
register class, which otherwise calculates an incorrect lanemask.
This is a step towards enabling subreg liveness tracking for AArch64, which
requires that registers are fully covered by their subregisters, as
covered here llvm#109797.

There are several changes in this patch:

* AArch64RegisterInfo.td and tests: Define the high bits like B0_HI,
  H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some
  register class, this added a register class which meant that we had
  to update 'magic numbers' in several tests.

  The use of ComposedSubRegIndex helped 'compress' the number
  of bits required for the lanemask. The correctness of the masks
  is tested by an explicit unit tests.

* LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for
  register tuples, but with this change to describe the high bits, a
  register like 'D0' will also have 'HasDisjunctSubRegs' set to true
  (because it's fullly covered by S0 and S0_HI). The fix here is to
  explicitly test if the register class is one of the known D/Q/Z
  tuples.

* TableGen: The handling of the isArtificial flag was entirely broken.
  Skipping out too early from some of the loops led to incorrect
  internal representation of the (sub)register(index) hierarchy, and
  thus resulted in incorrect TableGen info.
@sdesmalen-arm
Copy link
Collaborator Author

Is it possible to split out the tablegen changes

Sure, I've created #114391 and #114392!

@sdesmalen-arm sdesmalen-arm changed the base branch from main to users/sdesmalen-arm/srlt-fix-tablegen-artificial-subreg-map October 31, 2024 11:51
@@ -424,6 +424,57 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
return {};
}

static MCPhysReg ReservedHi[] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing const

@@ -424,6 +424,58 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
return {};
}

static SmallVector<MCPhysReg> ReservedHi = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of failures? AMDGPU also has synthetic 16-bit high sub registers and they are not explicitly reserved. Are you adding these to an allocatable class?

@sdesmalen-arm sdesmalen-arm force-pushed the users/sdesmalen-arm/srlt-fix-tablegen-artificial-subreg-map branch 2 times, most recently from 303e1c8 to 371287e Compare November 4, 2024 15:54
@sdesmalen-arm sdesmalen-arm deleted the branch llvm:users/sdesmalen-arm/srlt-fix-tablegen-artificial-subreg-map November 4, 2024 16:10
@arsenm
Copy link
Contributor

arsenm commented Nov 4, 2024

Was this reopened as a new PR?

@sdesmalen-arm
Copy link
Collaborator Author

It wasn't, but I also didn't realise that I closed it. Could Github have done this automatically after the branch it was based of was deleted? (I was about to push the rebased branch of this PR after merging #114391 and #114392)

@sdesmalen-arm
Copy link
Collaborator Author

Trying to reopen..

@sdesmalen-arm
Copy link
Collaborator Author

I think I need to create a new PR for this as Github doesn't allow me to reopen and choose a different branch to merge into.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants