Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Define high bits of FPR and GPR registers (take 2) #114827

Merged
merged 7 commits into from
Nov 14, 2024

Conversation

sdesmalen-arm
Copy link
Collaborator

This is a step towards enabling subreg liveness tracking for AArch64, which requires that registers are fully covered by their subregisters, as covered here #109797.

There are several changes in this patch:

  • AArch64RegisterInfo.td and tests: Define the high bits like B0_HI, H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some register class, this added a register class which meant that we had to update 'magic numbers' in several tests.

    The use of ComposedSubRegIndex helped 'compress' the number of bits required for the lanemask. The correctness of the masks is tested by an explicit unit tests.

  • LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for register tuples, but with this change to describe the high bits, a register like 'D0' will also have 'HasDisjunctSubRegs' set to true (because it's fullly covered by S0 and S0_HI). The fix here is to explicitly test if the register class is one of the known D/Q/Z tuples.

  • TableGen: The handling of the isArtificial flag was entirely broken. Skipping out too early from some of the loops led to incorrect internal representation of the (sub)register(index) hierarchy, and thus resulted in incorrect TableGen info.

(This follows on from #114263 which was mysteriously closed.)

This is a step towards enabling subreg liveness tracking for AArch64, which
requires that registers are fully covered by their subregisters, as
covered here llvm#109797.

There are several changes in this patch:

* AArch64RegisterInfo.td and tests: Define the high bits like B0_HI,
  H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some
  register class, this added a register class which meant that we had
  to update 'magic numbers' in several tests.

  The use of ComposedSubRegIndex helped 'compress' the number
  of bits required for the lanemask. The correctness of the masks
  is tested by an explicit unit tests.

* LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for
  register tuples, but with this change to describe the high bits, a
  register like 'D0' will also have 'HasDisjunctSubRegs' set to true
  (because it's fullly covered by S0 and S0_HI). The fix here is to
  explicitly test if the register class is one of the known D/Q/Z
  tuples.

* TableGen: The handling of the isArtificial flag was entirely broken.
  Skipping out too early from some of the loops led to incorrect
  internal representation of the (sub)register(index) hierarchy, and
  thus resulted in incorrect TableGen info.
@llvmbot
Copy link
Member

llvmbot commented Nov 4, 2024

@llvm/pr-subscribers-tablegen
@llvm/pr-subscribers-tools-llvm-mca

@llvm/pr-subscribers-llvm-globalisel

Author: Sander de Smalen (sdesmalen-arm)

Changes

This is a step towards enabling subreg liveness tracking for AArch64, which requires that registers are fully covered by their subregisters, as covered here #109797.

There are several changes in this patch:

  • AArch64RegisterInfo.td and tests: Define the high bits like B0_HI, H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some register class, this added a register class which meant that we had to update 'magic numbers' in several tests.

    The use of ComposedSubRegIndex helped 'compress' the number of bits required for the lanemask. The correctness of the masks is tested by an explicit unit tests.

  • LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for register tuples, but with this change to describe the high bits, a register like 'D0' will also have 'HasDisjunctSubRegs' set to true (because it's fullly covered by S0 and S0_HI). The fix here is to explicitly test if the register class is one of the known D/Q/Z tuples.

  • TableGen: The handling of the isArtificial flag was entirely broken. Skipping out too early from some of the loops led to incorrect internal representation of the (sub)register(index) hierarchy, and thus resulted in incorrect TableGen info.

(This follows on from #114263 which was mysteriously closed.)


Patch is 134.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114827.diff

21 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp (+4-1)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+56-2)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.td (+263-234)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-sve-asm.ll (+11-11)
  • (modified) llvm/test/CodeGen/AArch64/blr-bti-preserves-operands.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/emit_fneg_with_non_register_operand.mir (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/expand-blr-rvmarker-pseudo.mir (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir (+30-30)
  • (modified) llvm/test/CodeGen/AArch64/machine-outliner-calls.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/misched-bundle.mir (+23-3)
  • (modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-01.mir (+12)
  • (modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-02.mir (+4-1)
  • (modified) llvm/test/CodeGen/AArch64/peephole-insvigpr.mir (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/preserve.ll (+3-4)
  • (modified) llvm/test/CodeGen/AArch64/strpre-str-merge.mir (+14-14)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryComm-merging.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryCommWithRev-merging.mir (+1-1)
  • (added) llvm/unittests/Target/AArch64/AArch64RegisterInfoTest.cpp (+152)
  • (modified) llvm/unittests/Target/AArch64/CMakeLists.txt (+1)
diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 1a9e5899892a1b..b051122609883f 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -1541,7 +1541,10 @@ static bool canRenameMOP(const MachineOperand &MOP,
     // Note that this relies on the structure of the AArch64 register file. In
     // particular, a subregister cannot be written without overwriting the
     // whole register.
-    if (RegClass->HasDisjunctSubRegs) {
+    if (RegClass->HasDisjunctSubRegs && RegClass->CoveredBySubRegs &&
+        (TRI->getSubRegisterClass(RegClass, AArch64::dsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::qsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::zsub0))) {
       LLVM_DEBUG(
           dbgs()
           << "  Cannot rename operands with multiple disjunct subregisters ("
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index 18290dd5f32df9..a885ce218f6eb2 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -424,6 +424,57 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
   return {};
 }
 
+static const MCPhysReg ReservedHi[] = {
+    AArch64::B0_HI,  AArch64::B1_HI,  AArch64::B2_HI,  AArch64::B3_HI,
+    AArch64::B4_HI,  AArch64::B5_HI,  AArch64::B6_HI,  AArch64::B7_HI,
+    AArch64::B8_HI,  AArch64::B9_HI,  AArch64::B10_HI, AArch64::B11_HI,
+    AArch64::B12_HI, AArch64::B13_HI, AArch64::B14_HI, AArch64::B15_HI,
+    AArch64::B16_HI, AArch64::B17_HI, AArch64::B18_HI, AArch64::B19_HI,
+    AArch64::B20_HI, AArch64::B21_HI, AArch64::B22_HI, AArch64::B23_HI,
+    AArch64::B24_HI, AArch64::B25_HI, AArch64::B26_HI, AArch64::B27_HI,
+    AArch64::B28_HI, AArch64::B29_HI, AArch64::B30_HI, AArch64::B31_HI,
+    AArch64::H0_HI,  AArch64::H1_HI,  AArch64::H2_HI,  AArch64::H3_HI,
+    AArch64::H4_HI,  AArch64::H5_HI,  AArch64::H6_HI,  AArch64::H7_HI,
+    AArch64::H8_HI,  AArch64::H9_HI,  AArch64::H10_HI, AArch64::H11_HI,
+    AArch64::H12_HI, AArch64::H13_HI, AArch64::H14_HI, AArch64::H15_HI,
+    AArch64::H16_HI, AArch64::H17_HI, AArch64::H18_HI, AArch64::H19_HI,
+    AArch64::H20_HI, AArch64::H21_HI, AArch64::H22_HI, AArch64::H23_HI,
+    AArch64::H24_HI, AArch64::H25_HI, AArch64::H26_HI, AArch64::H27_HI,
+    AArch64::H28_HI, AArch64::H29_HI, AArch64::H30_HI, AArch64::H31_HI,
+    AArch64::S0_HI,  AArch64::S1_HI,  AArch64::S2_HI,  AArch64::S3_HI,
+    AArch64::S4_HI,  AArch64::S5_HI,  AArch64::S6_HI,  AArch64::S7_HI,
+    AArch64::S8_HI,  AArch64::S9_HI,  AArch64::S10_HI, AArch64::S11_HI,
+    AArch64::S12_HI, AArch64::S13_HI, AArch64::S14_HI, AArch64::S15_HI,
+    AArch64::S16_HI, AArch64::S17_HI, AArch64::S18_HI, AArch64::S19_HI,
+    AArch64::S20_HI, AArch64::S21_HI, AArch64::S22_HI, AArch64::S23_HI,
+    AArch64::S24_HI, AArch64::S25_HI, AArch64::S26_HI, AArch64::S27_HI,
+    AArch64::S28_HI, AArch64::S29_HI, AArch64::S30_HI, AArch64::S31_HI,
+    AArch64::D0_HI,  AArch64::D1_HI,  AArch64::D2_HI,  AArch64::D3_HI,
+    AArch64::D4_HI,  AArch64::D5_HI,  AArch64::D6_HI,  AArch64::D7_HI,
+    AArch64::D8_HI,  AArch64::D9_HI,  AArch64::D10_HI, AArch64::D11_HI,
+    AArch64::D12_HI, AArch64::D13_HI, AArch64::D14_HI, AArch64::D15_HI,
+    AArch64::D16_HI, AArch64::D17_HI, AArch64::D18_HI, AArch64::D19_HI,
+    AArch64::D20_HI, AArch64::D21_HI, AArch64::D22_HI, AArch64::D23_HI,
+    AArch64::D24_HI, AArch64::D25_HI, AArch64::D26_HI, AArch64::D27_HI,
+    AArch64::D28_HI, AArch64::D29_HI, AArch64::D30_HI, AArch64::D31_HI,
+    AArch64::Q0_HI,  AArch64::Q1_HI,  AArch64::Q2_HI,  AArch64::Q3_HI,
+    AArch64::Q4_HI,  AArch64::Q5_HI,  AArch64::Q6_HI,  AArch64::Q7_HI,
+    AArch64::Q8_HI,  AArch64::Q9_HI,  AArch64::Q10_HI, AArch64::Q11_HI,
+    AArch64::Q12_HI, AArch64::Q13_HI, AArch64::Q14_HI, AArch64::Q15_HI,
+    AArch64::Q16_HI, AArch64::Q17_HI, AArch64::Q18_HI, AArch64::Q19_HI,
+    AArch64::Q20_HI, AArch64::Q21_HI, AArch64::Q22_HI, AArch64::Q23_HI,
+    AArch64::Q24_HI, AArch64::Q25_HI, AArch64::Q26_HI, AArch64::Q27_HI,
+    AArch64::Q28_HI, AArch64::Q29_HI, AArch64::Q30_HI, AArch64::Q31_HI,
+    AArch64::W0_HI,  AArch64::W1_HI,  AArch64::W2_HI,  AArch64::W3_HI,
+    AArch64::W4_HI,  AArch64::W5_HI,  AArch64::W6_HI,  AArch64::W7_HI,
+    AArch64::W8_HI,  AArch64::W9_HI,  AArch64::W10_HI, AArch64::W11_HI,
+    AArch64::W12_HI, AArch64::W13_HI, AArch64::W14_HI, AArch64::W15_HI,
+    AArch64::W16_HI, AArch64::W17_HI, AArch64::W18_HI, AArch64::W19_HI,
+    AArch64::W20_HI, AArch64::W21_HI, AArch64::W22_HI, AArch64::W23_HI,
+    AArch64::W24_HI, AArch64::W25_HI, AArch64::W26_HI, AArch64::W27_HI,
+    AArch64::W28_HI, AArch64::W29_HI, AArch64::W30_HI, AArch64::WSP_HI,
+    AArch64::WZR_HI};
+
 BitVector
 AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
   const AArch64FrameLowering *TFI = getFrameLowering(MF);
@@ -490,7 +541,10 @@ AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
     markSuperRegs(Reserved, AArch64::W28);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  for (Register R : ReservedHi)
+    Reserved.set(R);
+
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
@@ -514,7 +568,7 @@ AArch64RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
       markSuperRegs(Reserved, AArch64::LR);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
index 4117d74d10c1e7..5f7f6aa7a7bf99 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
@@ -20,33 +20,39 @@ class AArch64Reg<bits<16> enc, string n, list<Register> subregs = [],
 
 let Namespace = "AArch64" in {
   // SubRegIndexes for GPR registers
-  def sub_32 : SubRegIndex<32>;
-  def sube64 : SubRegIndex<64>;
-  def subo64 : SubRegIndex<64>;
-  def sube32 : SubRegIndex<32>;
-  def subo32 : SubRegIndex<32>;
+  def sub_32   : SubRegIndex<32,  0>;
+  def sub_32_hi: SubRegIndex<32, 32>;
+  def sube64   : SubRegIndex<64>;
+  def subo64   : SubRegIndex<64>;
+  def sube32   : SubRegIndex<32>;
+  def subo32   : SubRegIndex<32>;
 
   // SubRegIndexes for FPR/Vector registers
-  def bsub : SubRegIndex<8>;
-  def hsub : SubRegIndex<16>;
-  def ssub : SubRegIndex<32>;
-  def dsub : SubRegIndex<64>;
-  def zsub : SubRegIndex<128>;
+  def bsub    : SubRegIndex<8, 0>;
+  def bsub_hi : SubRegIndex<8, 8>;
+  def hsub    : SubRegIndex<16, 0>;
+  def hsub_hi : SubRegIndex<16, 16>;
+  def ssub    : SubRegIndex<32, 0>;
+  def ssub_hi : SubRegIndex<32, 32>;
+  def dsub    : SubRegIndex<64, 0>;
+  def dsub_hi : SubRegIndex<64, 64>;
+  def zsub    : SubRegIndex<128, 0>;
+  def zsub_hi : SubRegIndex<-1, 128>;
   // Note: Code depends on these having consecutive numbers
-  def zsub0 : SubRegIndex<128, -1>;
-  def zsub1 : SubRegIndex<128, -1>;
-  def zsub2 : SubRegIndex<128, -1>;
-  def zsub3 : SubRegIndex<128, -1>;
-  // Note: Code depends on these having consecutive numbers
-  def dsub0 : SubRegIndex<64>;
-  def dsub1 : SubRegIndex<64>;
-  def dsub2 : SubRegIndex<64>;
-  def dsub3 : SubRegIndex<64>;
+  def zsub0 : SubRegIndex<-1>;
+  def zsub1 : SubRegIndex<-1>;
+  def zsub2 : SubRegIndex<-1>;
+  def zsub3 : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
   def qsub0 : SubRegIndex<128>;
-  def qsub1 : SubRegIndex<128>;
-  def qsub2 : SubRegIndex<128>;
-  def qsub3 : SubRegIndex<128>;
+  def qsub1 : ComposedSubRegIndex<zsub1, zsub>;
+  def qsub2 : ComposedSubRegIndex<zsub2, zsub>;
+  def qsub3 : ComposedSubRegIndex<zsub3, zsub>;
+  // Note: Code depends on these having consecutive numbers
+  def dsub0 : SubRegIndex<64>;
+  def dsub1 : ComposedSubRegIndex<qsub1, dsub>;
+  def dsub2 : ComposedSubRegIndex<qsub2, dsub>;
+  def dsub3 : ComposedSubRegIndex<qsub3, dsub>;
 
   // SubRegIndexes for SME Matrix tiles
   def zasubb  : SubRegIndex<2048>; // (16 x 16)/1 bytes  = 2048 bits
@@ -60,10 +66,10 @@ let Namespace = "AArch64" in {
   def zasubq1 : SubRegIndex<128>;  // (16 x 16)/16 bytes = 128 bits
 
   // SubRegIndexes for SVE Predicates
-  def psub  : SubRegIndex<16>;
+  def psub  : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
-  def psub0 : SubRegIndex<16, -1>;
-  def psub1 : SubRegIndex<16, -1>;
+  def psub0 : SubRegIndex<-1>;
+  def psub1 : SubRegIndex<-1>;
 }
 
 let Namespace = "AArch64" in {
@@ -74,6 +80,14 @@ let Namespace = "AArch64" in {
 //===----------------------------------------------------------------------===//
 // Registers
 //===----------------------------------------------------------------------===//
+
+foreach i = 0-30 in {
+  // Define W0_HI, W1_HI, .. W30_HI
+  def W#i#_HI : AArch64Reg<-1,  "w"#i#"_hi"> { let isArtificial = 1; }
+}
+def WSP_HI : AArch64Reg<-1,  "wsp_hi"> { let isArtificial = 1; }
+def WZR_HI : AArch64Reg<-1,  "wzr_hi"> { let isArtificial = 1; }
+
 def W0    : AArch64Reg<0,   "w0" >, DwarfRegNum<[0]>;
 def W1    : AArch64Reg<1,   "w1" >, DwarfRegNum<[1]>;
 def W2    : AArch64Reg<2,   "w2" >, DwarfRegNum<[2]>;
@@ -106,44 +120,42 @@ def W28   : AArch64Reg<28, "w28">, DwarfRegNum<[28]>;
 def W29   : AArch64Reg<29, "w29">, DwarfRegNum<[29]>;
 def W30   : AArch64Reg<30, "w30">, DwarfRegNum<[30]>;
 def WSP   : AArch64Reg<31, "wsp">, DwarfRegNum<[31]>;
-let isConstant = true in
-def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP>;
-
-let SubRegIndices = [sub_32] in {
-def X0    : AArch64Reg<0,   "x0",  [W0]>, DwarfRegAlias<W0>;
-def X1    : AArch64Reg<1,   "x1",  [W1]>, DwarfRegAlias<W1>;
-def X2    : AArch64Reg<2,   "x2",  [W2]>, DwarfRegAlias<W2>;
-def X3    : AArch64Reg<3,   "x3",  [W3]>, DwarfRegAlias<W3>;
-def X4    : AArch64Reg<4,   "x4",  [W4]>, DwarfRegAlias<W4>;
-def X5    : AArch64Reg<5,   "x5",  [W5]>, DwarfRegAlias<W5>;
-def X6    : AArch64Reg<6,   "x6",  [W6]>, DwarfRegAlias<W6>;
-def X7    : AArch64Reg<7,   "x7",  [W7]>, DwarfRegAlias<W7>;
-def X8    : AArch64Reg<8,   "x8",  [W8]>, DwarfRegAlias<W8>;
-def X9    : AArch64Reg<9,   "x9",  [W9]>, DwarfRegAlias<W9>;
-def X10   : AArch64Reg<10, "x10", [W10]>, DwarfRegAlias<W10>;
-def X11   : AArch64Reg<11, "x11", [W11]>, DwarfRegAlias<W11>;
-def X12   : AArch64Reg<12, "x12", [W12]>, DwarfRegAlias<W12>;
-def X13   : AArch64Reg<13, "x13", [W13]>, DwarfRegAlias<W13>;
-def X14   : AArch64Reg<14, "x14", [W14]>, DwarfRegAlias<W14>;
-def X15   : AArch64Reg<15, "x15", [W15]>, DwarfRegAlias<W15>;
-def X16   : AArch64Reg<16, "x16", [W16]>, DwarfRegAlias<W16>;
-def X17   : AArch64Reg<17, "x17", [W17]>, DwarfRegAlias<W17>;
-def X18   : AArch64Reg<18, "x18", [W18]>, DwarfRegAlias<W18>;
-def X19   : AArch64Reg<19, "x19", [W19]>, DwarfRegAlias<W19>;
-def X20   : AArch64Reg<20, "x20", [W20]>, DwarfRegAlias<W20>;
-def X21   : AArch64Reg<21, "x21", [W21]>, DwarfRegAlias<W21>;
-def X22   : AArch64Reg<22, "x22", [W22]>, DwarfRegAlias<W22>;
-def X23   : AArch64Reg<23, "x23", [W23]>, DwarfRegAlias<W23>;
-def X24   : AArch64Reg<24, "x24", [W24]>, DwarfRegAlias<W24>;
-def X25   : AArch64Reg<25, "x25", [W25]>, DwarfRegAlias<W25>;
-def X26   : AArch64Reg<26, "x26", [W26]>, DwarfRegAlias<W26>;
-def X27   : AArch64Reg<27, "x27", [W27]>, DwarfRegAlias<W27>;
-def X28   : AArch64Reg<28, "x28", [W28]>, DwarfRegAlias<W28>;
-def FP    : AArch64Reg<29, "x29", [W29]>, DwarfRegAlias<W29>;
-def LR    : AArch64Reg<30, "x30", [W30]>, DwarfRegAlias<W30>;
-def SP    : AArch64Reg<31, "sp",  [WSP]>, DwarfRegAlias<WSP>;
-let isConstant = true in
-def XZR   : AArch64Reg<31, "xzr", [WZR]>, DwarfRegAlias<WSP>;
+def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP> { let isConstant = true; }
+
+let SubRegIndices = [sub_32, sub_32_hi], CoveredBySubRegs = 1 in {
+def X0    : AArch64Reg<0,   "x0",  [W0,  W0_HI]>, DwarfRegAlias<W0>;
+def X1    : AArch64Reg<1,   "x1",  [W1,  W1_HI]>, DwarfRegAlias<W1>;
+def X2    : AArch64Reg<2,   "x2",  [W2,  W2_HI]>, DwarfRegAlias<W2>;
+def X3    : AArch64Reg<3,   "x3",  [W3,  W3_HI]>, DwarfRegAlias<W3>;
+def X4    : AArch64Reg<4,   "x4",  [W4,  W4_HI]>, DwarfRegAlias<W4>;
+def X5    : AArch64Reg<5,   "x5",  [W5,  W5_HI]>, DwarfRegAlias<W5>;
+def X6    : AArch64Reg<6,   "x6",  [W6,  W6_HI]>, DwarfRegAlias<W6>;
+def X7    : AArch64Reg<7,   "x7",  [W7,  W7_HI]>, DwarfRegAlias<W7>;
+def X8    : AArch64Reg<8,   "x8",  [W8,  W8_HI]>, DwarfRegAlias<W8>;
+def X9    : AArch64Reg<9,   "x9",  [W9,  W9_HI]>, DwarfRegAlias<W9>;
+def X10   : AArch64Reg<10, "x10", [W10, W10_HI]>, DwarfRegAlias<W10>;
+def X11   : AArch64Reg<11, "x11", [W11, W11_HI]>, DwarfRegAlias<W11>;
+def X12   : AArch64Reg<12, "x12", [W12, W12_HI]>, DwarfRegAlias<W12>;
+def X13   : AArch64Reg<13, "x13", [W13, W13_HI]>, DwarfRegAlias<W13>;
+def X14   : AArch64Reg<14, "x14", [W14, W14_HI]>, DwarfRegAlias<W14>;
+def X15   : AArch64Reg<15, "x15", [W15, W15_HI]>, DwarfRegAlias<W15>;
+def X16   : AArch64Reg<16, "x16", [W16, W16_HI]>, DwarfRegAlias<W16>;
+def X17   : AArch64Reg<17, "x17", [W17, W17_HI]>, DwarfRegAlias<W17>;
+def X18   : AArch64Reg<18, "x18", [W18, W18_HI]>, DwarfRegAlias<W18>;
+def X19   : AArch64Reg<19, "x19", [W19, W19_HI]>, DwarfRegAlias<W19>;
+def X20   : AArch64Reg<20, "x20", [W20, W20_HI]>, DwarfRegAlias<W20>;
+def X21   : AArch64Reg<21, "x21", [W21, W21_HI]>, DwarfRegAlias<W21>;
+def X22   : AArch64Reg<22, "x22", [W22, W22_HI]>, DwarfRegAlias<W22>;
+def X23   : AArch64Reg<23, "x23", [W23, W23_HI]>, DwarfRegAlias<W23>;
+def X24   : AArch64Reg<24, "x24", [W24, W24_HI]>, DwarfRegAlias<W24>;
+def X25   : AArch64Reg<25, "x25", [W25, W25_HI]>, DwarfRegAlias<W25>;
+def X26   : AArch64Reg<26, "x26", [W26, W26_HI]>, DwarfRegAlias<W26>;
+def X27   : AArch64Reg<27, "x27", [W27, W27_HI]>, DwarfRegAlias<W27>;
+def X28   : AArch64Reg<28, "x28", [W28, W28_HI]>, DwarfRegAlias<W28>;
+def FP    : AArch64Reg<29, "x29", [W29, W29_HI]>, DwarfRegAlias<W29>;
+def LR    : AArch64Reg<30, "x30", [W30, W30_HI]>, DwarfRegAlias<W30>;
+def SP    : AArch64Reg<31, "sp",  [WSP, WSP_HI]>, DwarfRegAlias<WSP>;
+def XZR   : AArch64Reg<31, "xzr", [WZR, WZR_HI]>, DwarfRegAlias<WSP> { let isConstant = true; }
 }
 
 // Condition code register.
@@ -290,6 +302,14 @@ def CCR : RegisterClass<"AArch64", [i32], 32, (add NZCV)> {
 // Floating Point Scalar Registers
 //===----------------------------------------------------------------------===//
 
+foreach i = 0-31 in {
+  def B#i#_HI : AArch64Reg<-1,  "b"#i#"_hi"> { let isArtificial = 1; }
+  def H#i#_HI : AArch64Reg<-1,  "h"#i#"_hi"> { let isArtificial = 1; }
+  def S#i#_HI : AArch64Reg<-1,  "s"#i#"_hi"> { let isArtificial = 1; }
+  def D#i#_HI : AArch64Reg<-1,  "d"#i#"_hi"> { let isArtificial = 1; }
+  def Q#i#_HI : AArch64Reg<-1,  "q"#i#"_hi"> { let isArtificial = 1; }
+}
+
 def B0    : AArch64Reg<0,   "b0">, DwarfRegNum<[64]>;
 def B1    : AArch64Reg<1,   "b1">, DwarfRegNum<[65]>;
 def B2    : AArch64Reg<2,   "b2">, DwarfRegNum<[66]>;
@@ -323,144 +343,144 @@ def B29   : AArch64Reg<29, "b29">, DwarfRegNum<[93]>;
 def B30   : AArch64Reg<30, "b30">, DwarfRegNum<[94]>;
 def B31   : AArch64Reg<31, "b31">, DwarfRegNum<[95]>;
 
-let SubRegIndices = [bsub] in {
-def H0    : AArch64Reg<0,   "h0", [B0]>, DwarfRegAlias<B0>;
-def H1    : AArch64Reg<1,   "h1", [B1]>, DwarfRegAlias<B1>;
-def H2    : AArch64Reg<2,   "h2", [B2]>, DwarfRegAlias<B2>;
-def H3    : AArch64Reg<3,   "h3", [B3]>, DwarfRegAlias<B3>;
-def H4    : AArch64Reg<4,   "h4", [B4]>, DwarfRegAlias<B4>;
-def H5    : AArch64Reg<5,   "h5", [B5]>, DwarfRegAlias<B5>;
-def H6    : AArch64Reg<6,   "h6", [B6]>, DwarfRegAlias<B6>;
-def H7    : AArch64Reg<7,   "h7", [B7]>, DwarfRegAlias<B7>;
-def H8    : AArch64Reg<8,   "h8", [B8]>, DwarfRegAlias<B8>;
-def H9    : AArch64Reg<9,   "h9", [B9]>, DwarfRegAlias<B9>;
-def H10   : AArch64Reg<10, "h10", [B10]>, DwarfRegAlias<B10>;
-def H11   : AArch64Reg<11, "h11", [B11]>, DwarfRegAlias<B11>;
-def H12   : AArch64Reg<12, "h12", [B12]>, DwarfRegAlias<B12>;
-def H13   : AArch64Reg<13, "h13", [B13]>, DwarfRegAlias<B13>;
-def H14   : AArch64Reg<14, "h14", [B14]>, DwarfRegAlias<B14>;
-def H15   : AArch64Reg<15, "h15", [B15]>, DwarfRegAlias<B15>;
-def H16   : AArch64Reg<16, "h16", [B16]>, DwarfRegAlias<B16>;
-def H17   : AArch64Reg<17, "h17", [B17]>, DwarfRegAlias<B17>;
-def H18   : AArch64Reg<18, "h18", [B18]>, DwarfRegAlias<B18>;
-def H19   : AArch64Reg<19, "h19", [B19]>, DwarfRegAlias<B19>;
-def H20   : AArch64Reg<20, "h20", [B20]>, DwarfRegAlias<B20>;
-def H21   : AArch64Reg<21, "h21", [B21]>, DwarfRegAlias<B21>;
-def H22   : AArch64Reg<22, "h22", [B22]>, DwarfRegAlias<B22>;
-def H23   : AArch64Reg<23, "h23", [B23]>, DwarfRegAlias<B23>;
-def H24   : AArch64Reg<24, "h24", [B24]>, DwarfRegAlias<B24>;
-def H25   : AArch64Reg<25, "h25", [B25]>, DwarfRegAlias<B25>;
-def H26   : AArch64Reg<26, "h26", [B26]>, DwarfRegAlias<B26>;
-def H27   : AArch64Reg<27, "h27", [B27]>, DwarfRegAlias<B27>;
-def H28   : AArch64Reg<28, "h28", [B28]>, DwarfRegAlias<B28>;
-def H29   : AArch64Reg<29, "h29", [B29]>, DwarfRegAlias<B29>;
-def H30   : AArch64Reg<30, "h30", [B30]>, DwarfRegAlias<B30>;
-def H31   : AArch64Reg<31, "h31", [B31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [hsub] in {
-def S0    : AArch64Reg<0,   "s0", [H0]>, DwarfRegAlias<B0>;
-def S1    : AArch64Reg<1,   "s1", [H1]>, DwarfRegAlias<B1>;
-def S2    : AArch64Reg<2,   "s2", [H2]>, DwarfRegAlias<B2>;
-def S3    : AArch64Reg<3,   "s3", [H3]>, DwarfRegAlias<B3>;
-def S4    : AArch64Reg<4,   "s4", [H4]>, DwarfRegAlias<B4>;
-def S5    : AArch64Reg<5,   "s5", [H5]>, DwarfRegAlias<B5>;
-def S6    : AArch64Reg<6,   "s6", [H6]>, DwarfRegAlias<B6>;
-def S7    : AArch64Reg<7,   "s7", [H7]>, DwarfRegAlias<B7>;
-def S8    : AArch64Reg<8,   "s8", [H8]>, DwarfRegAlias<B8>;
-def S9    : AArch64Reg<9,   "s9", [H9]>, DwarfRegAlias<B9>;
-def S10   : AArch64Reg<10, "s10", [H10]>, DwarfRegAlias<B10>;
-def S11   : AArch64Reg<11, "s11", [H11]>, DwarfRegAlias<B11>;
-def S12   : AArch64Reg<12, "s12", [H12]>, DwarfRegAlias<B12>;
-def S13   : AArch64Reg<13, "s13", [H13]>, DwarfRegAlias<B13>;
-def S14   : AArch64Reg<14, "s14", [H14]>, DwarfRegAlias<B14>;
-def S15   : AArch64Reg<15, "s15", [H15]>, DwarfRegAlias<B15>;
-def S16   : AArch64Reg<16, "s16", [H16]>, DwarfRegAlias<B16>;
-def S17   : AArch64Reg<17, "s17", [H17]>, DwarfRegAlias<B17>;
-def S18   : AArch64Reg<18, "s18", [H18]>, DwarfRegAlias<B18>;
-def S19   : AArch64Reg<19, "s19", [H19]>, DwarfRegAlias<B19>;
-def S20   : AArch64Reg<20, "s20", [H20]>, DwarfRegAlias<B20>;
-def S21   : AArch64Reg<21, "s21", [H21]>, DwarfRegAlias<B21>;
-def S22   : AArch64Reg<22, "s22", [H22]>, DwarfRegAlias<B22>;
-def S23   : AArch64Reg<23, "s23", [H23]>, DwarfRegAlias<B23>;
-def S24   : AArch64Reg<24, "s24", [H24]>, DwarfRegAlias<B24>;
-def S25   : AArch64Reg<25, "s25", [H25]>, DwarfRegAlias<B25>;
-def S26   : AArch64Reg<26, "s26", [H26]>, DwarfRegAlias<B26>;
-def S27   : AArch64Reg<27, "s27", [H27]>, DwarfRegAlias<B27>;
-def S28   : AArch64Reg<28, "s28", [H28]>, DwarfRegAlias<B28>;
-def S29   : AArch64Reg<29, "s29", [H29]>, DwarfRegAlias<B29>;
-def S30   : AArch64Reg<30, "s30", [H30]>, DwarfRegAlias<B30>;
-def S31   : AArch64Reg<31, "s31", [H31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [ssub], RegAltNameIndices = [vreg, vlist1] in {
-def D0    : AArch64Reg<0,   "d0", [S0], ["v0", ""]>, DwarfRegAlias<B0>;
-def D1    : AArch64Reg<1,   "d1", [S1], ["v1", ""]>, DwarfRegAlias<B1>;
-def D2    : AArch64Reg<2,   "d2", [S2], ["v2", ""]>, DwarfRegAlias<B2>;
-def D3    : AArch64Reg<3,   "d3", [S3], ["v3", ""]>, DwarfRegAlias<B3>;
-def D4    : AArch64Reg<4,   "d4", [S4], ["v4", ""]>, DwarfRegAlias<B4>;
-def D5    : AArch64Reg<5,   "d5", [S5], ["v5", ""]>, DwarfRegAlias<B5>;
-def D6    : AArch64Reg<6,   "d6", [S6], ["v6", ""]>, DwarfRegAlias<B6>;
-def D7    : AArch64Reg<7,   "d7", [S7], ["v7", ""]>, DwarfRegAlias<B7>;
-def D8    : AArch64Reg<8,   "d8", [S8], ["v8", ""]>, DwarfRegAlias<B8>;
-def D9    : AArch64Reg<9,   "d9", [S9], ["v9", ""]>, DwarfRegAlias<B9>;
-def D10   : AArch64Reg<10, "d10", [S10], ["v10", ""]>, DwarfRegAlias<B10>;
-def D11   : AArch64Reg<11, "d11", [S11], ["v11", ""]>, DwarfRegAlias<B11>;
-def D12   : AArch64Reg<12, "d12", [S12], ["v12", ""]>, DwarfRegAlias<B12>;
-def D13   : AArch64Reg<13, "d13", [S13], ["v13", ""]>, DwarfRegAl...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Nov 4, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

This is a step towards enabling subreg liveness tracking for AArch64, which requires that registers are fully covered by their subregisters, as covered here #109797.

There are several changes in this patch:

  • AArch64RegisterInfo.td and tests: Define the high bits like B0_HI, H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some register class, this added a register class which meant that we had to update 'magic numbers' in several tests.

    The use of ComposedSubRegIndex helped 'compress' the number of bits required for the lanemask. The correctness of the masks is tested by an explicit unit tests.

  • LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for register tuples, but with this change to describe the high bits, a register like 'D0' will also have 'HasDisjunctSubRegs' set to true (because it's fullly covered by S0 and S0_HI). The fix here is to explicitly test if the register class is one of the known D/Q/Z tuples.

  • TableGen: The handling of the isArtificial flag was entirely broken. Skipping out too early from some of the loops led to incorrect internal representation of the (sub)register(index) hierarchy, and thus resulted in incorrect TableGen info.

(This follows on from #114263 which was mysteriously closed.)


Patch is 134.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114827.diff

21 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp (+4-1)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+56-2)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.td (+263-234)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-sve-asm.ll (+11-11)
  • (modified) llvm/test/CodeGen/AArch64/blr-bti-preserves-operands.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/emit_fneg_with_non_register_operand.mir (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/expand-blr-rvmarker-pseudo.mir (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir (+30-30)
  • (modified) llvm/test/CodeGen/AArch64/machine-outliner-calls.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/misched-bundle.mir (+23-3)
  • (modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-01.mir (+12)
  • (modified) llvm/test/CodeGen/AArch64/misched-detail-resource-booking-02.mir (+4-1)
  • (modified) llvm/test/CodeGen/AArch64/peephole-insvigpr.mir (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/preserve.ll (+3-4)
  • (modified) llvm/test/CodeGen/AArch64/strpre-str-merge.mir (+14-14)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryComm-merging.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-intrinsics-int-binaryCommWithRev-merging.mir (+1-1)
  • (added) llvm/unittests/Target/AArch64/AArch64RegisterInfoTest.cpp (+152)
  • (modified) llvm/unittests/Target/AArch64/CMakeLists.txt (+1)
diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
index 1a9e5899892a1b..b051122609883f 100644
--- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
@@ -1541,7 +1541,10 @@ static bool canRenameMOP(const MachineOperand &MOP,
     // Note that this relies on the structure of the AArch64 register file. In
     // particular, a subregister cannot be written without overwriting the
     // whole register.
-    if (RegClass->HasDisjunctSubRegs) {
+    if (RegClass->HasDisjunctSubRegs && RegClass->CoveredBySubRegs &&
+        (TRI->getSubRegisterClass(RegClass, AArch64::dsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::qsub0) ||
+         TRI->getSubRegisterClass(RegClass, AArch64::zsub0))) {
       LLVM_DEBUG(
           dbgs()
           << "  Cannot rename operands with multiple disjunct subregisters ("
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index 18290dd5f32df9..a885ce218f6eb2 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -424,6 +424,57 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
   return {};
 }
 
+static const MCPhysReg ReservedHi[] = {
+    AArch64::B0_HI,  AArch64::B1_HI,  AArch64::B2_HI,  AArch64::B3_HI,
+    AArch64::B4_HI,  AArch64::B5_HI,  AArch64::B6_HI,  AArch64::B7_HI,
+    AArch64::B8_HI,  AArch64::B9_HI,  AArch64::B10_HI, AArch64::B11_HI,
+    AArch64::B12_HI, AArch64::B13_HI, AArch64::B14_HI, AArch64::B15_HI,
+    AArch64::B16_HI, AArch64::B17_HI, AArch64::B18_HI, AArch64::B19_HI,
+    AArch64::B20_HI, AArch64::B21_HI, AArch64::B22_HI, AArch64::B23_HI,
+    AArch64::B24_HI, AArch64::B25_HI, AArch64::B26_HI, AArch64::B27_HI,
+    AArch64::B28_HI, AArch64::B29_HI, AArch64::B30_HI, AArch64::B31_HI,
+    AArch64::H0_HI,  AArch64::H1_HI,  AArch64::H2_HI,  AArch64::H3_HI,
+    AArch64::H4_HI,  AArch64::H5_HI,  AArch64::H6_HI,  AArch64::H7_HI,
+    AArch64::H8_HI,  AArch64::H9_HI,  AArch64::H10_HI, AArch64::H11_HI,
+    AArch64::H12_HI, AArch64::H13_HI, AArch64::H14_HI, AArch64::H15_HI,
+    AArch64::H16_HI, AArch64::H17_HI, AArch64::H18_HI, AArch64::H19_HI,
+    AArch64::H20_HI, AArch64::H21_HI, AArch64::H22_HI, AArch64::H23_HI,
+    AArch64::H24_HI, AArch64::H25_HI, AArch64::H26_HI, AArch64::H27_HI,
+    AArch64::H28_HI, AArch64::H29_HI, AArch64::H30_HI, AArch64::H31_HI,
+    AArch64::S0_HI,  AArch64::S1_HI,  AArch64::S2_HI,  AArch64::S3_HI,
+    AArch64::S4_HI,  AArch64::S5_HI,  AArch64::S6_HI,  AArch64::S7_HI,
+    AArch64::S8_HI,  AArch64::S9_HI,  AArch64::S10_HI, AArch64::S11_HI,
+    AArch64::S12_HI, AArch64::S13_HI, AArch64::S14_HI, AArch64::S15_HI,
+    AArch64::S16_HI, AArch64::S17_HI, AArch64::S18_HI, AArch64::S19_HI,
+    AArch64::S20_HI, AArch64::S21_HI, AArch64::S22_HI, AArch64::S23_HI,
+    AArch64::S24_HI, AArch64::S25_HI, AArch64::S26_HI, AArch64::S27_HI,
+    AArch64::S28_HI, AArch64::S29_HI, AArch64::S30_HI, AArch64::S31_HI,
+    AArch64::D0_HI,  AArch64::D1_HI,  AArch64::D2_HI,  AArch64::D3_HI,
+    AArch64::D4_HI,  AArch64::D5_HI,  AArch64::D6_HI,  AArch64::D7_HI,
+    AArch64::D8_HI,  AArch64::D9_HI,  AArch64::D10_HI, AArch64::D11_HI,
+    AArch64::D12_HI, AArch64::D13_HI, AArch64::D14_HI, AArch64::D15_HI,
+    AArch64::D16_HI, AArch64::D17_HI, AArch64::D18_HI, AArch64::D19_HI,
+    AArch64::D20_HI, AArch64::D21_HI, AArch64::D22_HI, AArch64::D23_HI,
+    AArch64::D24_HI, AArch64::D25_HI, AArch64::D26_HI, AArch64::D27_HI,
+    AArch64::D28_HI, AArch64::D29_HI, AArch64::D30_HI, AArch64::D31_HI,
+    AArch64::Q0_HI,  AArch64::Q1_HI,  AArch64::Q2_HI,  AArch64::Q3_HI,
+    AArch64::Q4_HI,  AArch64::Q5_HI,  AArch64::Q6_HI,  AArch64::Q7_HI,
+    AArch64::Q8_HI,  AArch64::Q9_HI,  AArch64::Q10_HI, AArch64::Q11_HI,
+    AArch64::Q12_HI, AArch64::Q13_HI, AArch64::Q14_HI, AArch64::Q15_HI,
+    AArch64::Q16_HI, AArch64::Q17_HI, AArch64::Q18_HI, AArch64::Q19_HI,
+    AArch64::Q20_HI, AArch64::Q21_HI, AArch64::Q22_HI, AArch64::Q23_HI,
+    AArch64::Q24_HI, AArch64::Q25_HI, AArch64::Q26_HI, AArch64::Q27_HI,
+    AArch64::Q28_HI, AArch64::Q29_HI, AArch64::Q30_HI, AArch64::Q31_HI,
+    AArch64::W0_HI,  AArch64::W1_HI,  AArch64::W2_HI,  AArch64::W3_HI,
+    AArch64::W4_HI,  AArch64::W5_HI,  AArch64::W6_HI,  AArch64::W7_HI,
+    AArch64::W8_HI,  AArch64::W9_HI,  AArch64::W10_HI, AArch64::W11_HI,
+    AArch64::W12_HI, AArch64::W13_HI, AArch64::W14_HI, AArch64::W15_HI,
+    AArch64::W16_HI, AArch64::W17_HI, AArch64::W18_HI, AArch64::W19_HI,
+    AArch64::W20_HI, AArch64::W21_HI, AArch64::W22_HI, AArch64::W23_HI,
+    AArch64::W24_HI, AArch64::W25_HI, AArch64::W26_HI, AArch64::W27_HI,
+    AArch64::W28_HI, AArch64::W29_HI, AArch64::W30_HI, AArch64::WSP_HI,
+    AArch64::WZR_HI};
+
 BitVector
 AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
   const AArch64FrameLowering *TFI = getFrameLowering(MF);
@@ -490,7 +541,10 @@ AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
     markSuperRegs(Reserved, AArch64::W28);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  for (Register R : ReservedHi)
+    Reserved.set(R);
+
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
@@ -514,7 +568,7 @@ AArch64RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
       markSuperRegs(Reserved, AArch64::LR);
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  assert(checkAllSuperRegsMarked(Reserved, ReservedHi));
   return Reserved;
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
index 4117d74d10c1e7..5f7f6aa7a7bf99 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.td
@@ -20,33 +20,39 @@ class AArch64Reg<bits<16> enc, string n, list<Register> subregs = [],
 
 let Namespace = "AArch64" in {
   // SubRegIndexes for GPR registers
-  def sub_32 : SubRegIndex<32>;
-  def sube64 : SubRegIndex<64>;
-  def subo64 : SubRegIndex<64>;
-  def sube32 : SubRegIndex<32>;
-  def subo32 : SubRegIndex<32>;
+  def sub_32   : SubRegIndex<32,  0>;
+  def sub_32_hi: SubRegIndex<32, 32>;
+  def sube64   : SubRegIndex<64>;
+  def subo64   : SubRegIndex<64>;
+  def sube32   : SubRegIndex<32>;
+  def subo32   : SubRegIndex<32>;
 
   // SubRegIndexes for FPR/Vector registers
-  def bsub : SubRegIndex<8>;
-  def hsub : SubRegIndex<16>;
-  def ssub : SubRegIndex<32>;
-  def dsub : SubRegIndex<64>;
-  def zsub : SubRegIndex<128>;
+  def bsub    : SubRegIndex<8, 0>;
+  def bsub_hi : SubRegIndex<8, 8>;
+  def hsub    : SubRegIndex<16, 0>;
+  def hsub_hi : SubRegIndex<16, 16>;
+  def ssub    : SubRegIndex<32, 0>;
+  def ssub_hi : SubRegIndex<32, 32>;
+  def dsub    : SubRegIndex<64, 0>;
+  def dsub_hi : SubRegIndex<64, 64>;
+  def zsub    : SubRegIndex<128, 0>;
+  def zsub_hi : SubRegIndex<-1, 128>;
   // Note: Code depends on these having consecutive numbers
-  def zsub0 : SubRegIndex<128, -1>;
-  def zsub1 : SubRegIndex<128, -1>;
-  def zsub2 : SubRegIndex<128, -1>;
-  def zsub3 : SubRegIndex<128, -1>;
-  // Note: Code depends on these having consecutive numbers
-  def dsub0 : SubRegIndex<64>;
-  def dsub1 : SubRegIndex<64>;
-  def dsub2 : SubRegIndex<64>;
-  def dsub3 : SubRegIndex<64>;
+  def zsub0 : SubRegIndex<-1>;
+  def zsub1 : SubRegIndex<-1>;
+  def zsub2 : SubRegIndex<-1>;
+  def zsub3 : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
   def qsub0 : SubRegIndex<128>;
-  def qsub1 : SubRegIndex<128>;
-  def qsub2 : SubRegIndex<128>;
-  def qsub3 : SubRegIndex<128>;
+  def qsub1 : ComposedSubRegIndex<zsub1, zsub>;
+  def qsub2 : ComposedSubRegIndex<zsub2, zsub>;
+  def qsub3 : ComposedSubRegIndex<zsub3, zsub>;
+  // Note: Code depends on these having consecutive numbers
+  def dsub0 : SubRegIndex<64>;
+  def dsub1 : ComposedSubRegIndex<qsub1, dsub>;
+  def dsub2 : ComposedSubRegIndex<qsub2, dsub>;
+  def dsub3 : ComposedSubRegIndex<qsub3, dsub>;
 
   // SubRegIndexes for SME Matrix tiles
   def zasubb  : SubRegIndex<2048>; // (16 x 16)/1 bytes  = 2048 bits
@@ -60,10 +66,10 @@ let Namespace = "AArch64" in {
   def zasubq1 : SubRegIndex<128>;  // (16 x 16)/16 bytes = 128 bits
 
   // SubRegIndexes for SVE Predicates
-  def psub  : SubRegIndex<16>;
+  def psub  : SubRegIndex<-1>;
   // Note: Code depends on these having consecutive numbers
-  def psub0 : SubRegIndex<16, -1>;
-  def psub1 : SubRegIndex<16, -1>;
+  def psub0 : SubRegIndex<-1>;
+  def psub1 : SubRegIndex<-1>;
 }
 
 let Namespace = "AArch64" in {
@@ -74,6 +80,14 @@ let Namespace = "AArch64" in {
 //===----------------------------------------------------------------------===//
 // Registers
 //===----------------------------------------------------------------------===//
+
+foreach i = 0-30 in {
+  // Define W0_HI, W1_HI, .. W30_HI
+  def W#i#_HI : AArch64Reg<-1,  "w"#i#"_hi"> { let isArtificial = 1; }
+}
+def WSP_HI : AArch64Reg<-1,  "wsp_hi"> { let isArtificial = 1; }
+def WZR_HI : AArch64Reg<-1,  "wzr_hi"> { let isArtificial = 1; }
+
 def W0    : AArch64Reg<0,   "w0" >, DwarfRegNum<[0]>;
 def W1    : AArch64Reg<1,   "w1" >, DwarfRegNum<[1]>;
 def W2    : AArch64Reg<2,   "w2" >, DwarfRegNum<[2]>;
@@ -106,44 +120,42 @@ def W28   : AArch64Reg<28, "w28">, DwarfRegNum<[28]>;
 def W29   : AArch64Reg<29, "w29">, DwarfRegNum<[29]>;
 def W30   : AArch64Reg<30, "w30">, DwarfRegNum<[30]>;
 def WSP   : AArch64Reg<31, "wsp">, DwarfRegNum<[31]>;
-let isConstant = true in
-def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP>;
-
-let SubRegIndices = [sub_32] in {
-def X0    : AArch64Reg<0,   "x0",  [W0]>, DwarfRegAlias<W0>;
-def X1    : AArch64Reg<1,   "x1",  [W1]>, DwarfRegAlias<W1>;
-def X2    : AArch64Reg<2,   "x2",  [W2]>, DwarfRegAlias<W2>;
-def X3    : AArch64Reg<3,   "x3",  [W3]>, DwarfRegAlias<W3>;
-def X4    : AArch64Reg<4,   "x4",  [W4]>, DwarfRegAlias<W4>;
-def X5    : AArch64Reg<5,   "x5",  [W5]>, DwarfRegAlias<W5>;
-def X6    : AArch64Reg<6,   "x6",  [W6]>, DwarfRegAlias<W6>;
-def X7    : AArch64Reg<7,   "x7",  [W7]>, DwarfRegAlias<W7>;
-def X8    : AArch64Reg<8,   "x8",  [W8]>, DwarfRegAlias<W8>;
-def X9    : AArch64Reg<9,   "x9",  [W9]>, DwarfRegAlias<W9>;
-def X10   : AArch64Reg<10, "x10", [W10]>, DwarfRegAlias<W10>;
-def X11   : AArch64Reg<11, "x11", [W11]>, DwarfRegAlias<W11>;
-def X12   : AArch64Reg<12, "x12", [W12]>, DwarfRegAlias<W12>;
-def X13   : AArch64Reg<13, "x13", [W13]>, DwarfRegAlias<W13>;
-def X14   : AArch64Reg<14, "x14", [W14]>, DwarfRegAlias<W14>;
-def X15   : AArch64Reg<15, "x15", [W15]>, DwarfRegAlias<W15>;
-def X16   : AArch64Reg<16, "x16", [W16]>, DwarfRegAlias<W16>;
-def X17   : AArch64Reg<17, "x17", [W17]>, DwarfRegAlias<W17>;
-def X18   : AArch64Reg<18, "x18", [W18]>, DwarfRegAlias<W18>;
-def X19   : AArch64Reg<19, "x19", [W19]>, DwarfRegAlias<W19>;
-def X20   : AArch64Reg<20, "x20", [W20]>, DwarfRegAlias<W20>;
-def X21   : AArch64Reg<21, "x21", [W21]>, DwarfRegAlias<W21>;
-def X22   : AArch64Reg<22, "x22", [W22]>, DwarfRegAlias<W22>;
-def X23   : AArch64Reg<23, "x23", [W23]>, DwarfRegAlias<W23>;
-def X24   : AArch64Reg<24, "x24", [W24]>, DwarfRegAlias<W24>;
-def X25   : AArch64Reg<25, "x25", [W25]>, DwarfRegAlias<W25>;
-def X26   : AArch64Reg<26, "x26", [W26]>, DwarfRegAlias<W26>;
-def X27   : AArch64Reg<27, "x27", [W27]>, DwarfRegAlias<W27>;
-def X28   : AArch64Reg<28, "x28", [W28]>, DwarfRegAlias<W28>;
-def FP    : AArch64Reg<29, "x29", [W29]>, DwarfRegAlias<W29>;
-def LR    : AArch64Reg<30, "x30", [W30]>, DwarfRegAlias<W30>;
-def SP    : AArch64Reg<31, "sp",  [WSP]>, DwarfRegAlias<WSP>;
-let isConstant = true in
-def XZR   : AArch64Reg<31, "xzr", [WZR]>, DwarfRegAlias<WSP>;
+def WZR   : AArch64Reg<31, "wzr">, DwarfRegAlias<WSP> { let isConstant = true; }
+
+let SubRegIndices = [sub_32, sub_32_hi], CoveredBySubRegs = 1 in {
+def X0    : AArch64Reg<0,   "x0",  [W0,  W0_HI]>, DwarfRegAlias<W0>;
+def X1    : AArch64Reg<1,   "x1",  [W1,  W1_HI]>, DwarfRegAlias<W1>;
+def X2    : AArch64Reg<2,   "x2",  [W2,  W2_HI]>, DwarfRegAlias<W2>;
+def X3    : AArch64Reg<3,   "x3",  [W3,  W3_HI]>, DwarfRegAlias<W3>;
+def X4    : AArch64Reg<4,   "x4",  [W4,  W4_HI]>, DwarfRegAlias<W4>;
+def X5    : AArch64Reg<5,   "x5",  [W5,  W5_HI]>, DwarfRegAlias<W5>;
+def X6    : AArch64Reg<6,   "x6",  [W6,  W6_HI]>, DwarfRegAlias<W6>;
+def X7    : AArch64Reg<7,   "x7",  [W7,  W7_HI]>, DwarfRegAlias<W7>;
+def X8    : AArch64Reg<8,   "x8",  [W8,  W8_HI]>, DwarfRegAlias<W8>;
+def X9    : AArch64Reg<9,   "x9",  [W9,  W9_HI]>, DwarfRegAlias<W9>;
+def X10   : AArch64Reg<10, "x10", [W10, W10_HI]>, DwarfRegAlias<W10>;
+def X11   : AArch64Reg<11, "x11", [W11, W11_HI]>, DwarfRegAlias<W11>;
+def X12   : AArch64Reg<12, "x12", [W12, W12_HI]>, DwarfRegAlias<W12>;
+def X13   : AArch64Reg<13, "x13", [W13, W13_HI]>, DwarfRegAlias<W13>;
+def X14   : AArch64Reg<14, "x14", [W14, W14_HI]>, DwarfRegAlias<W14>;
+def X15   : AArch64Reg<15, "x15", [W15, W15_HI]>, DwarfRegAlias<W15>;
+def X16   : AArch64Reg<16, "x16", [W16, W16_HI]>, DwarfRegAlias<W16>;
+def X17   : AArch64Reg<17, "x17", [W17, W17_HI]>, DwarfRegAlias<W17>;
+def X18   : AArch64Reg<18, "x18", [W18, W18_HI]>, DwarfRegAlias<W18>;
+def X19   : AArch64Reg<19, "x19", [W19, W19_HI]>, DwarfRegAlias<W19>;
+def X20   : AArch64Reg<20, "x20", [W20, W20_HI]>, DwarfRegAlias<W20>;
+def X21   : AArch64Reg<21, "x21", [W21, W21_HI]>, DwarfRegAlias<W21>;
+def X22   : AArch64Reg<22, "x22", [W22, W22_HI]>, DwarfRegAlias<W22>;
+def X23   : AArch64Reg<23, "x23", [W23, W23_HI]>, DwarfRegAlias<W23>;
+def X24   : AArch64Reg<24, "x24", [W24, W24_HI]>, DwarfRegAlias<W24>;
+def X25   : AArch64Reg<25, "x25", [W25, W25_HI]>, DwarfRegAlias<W25>;
+def X26   : AArch64Reg<26, "x26", [W26, W26_HI]>, DwarfRegAlias<W26>;
+def X27   : AArch64Reg<27, "x27", [W27, W27_HI]>, DwarfRegAlias<W27>;
+def X28   : AArch64Reg<28, "x28", [W28, W28_HI]>, DwarfRegAlias<W28>;
+def FP    : AArch64Reg<29, "x29", [W29, W29_HI]>, DwarfRegAlias<W29>;
+def LR    : AArch64Reg<30, "x30", [W30, W30_HI]>, DwarfRegAlias<W30>;
+def SP    : AArch64Reg<31, "sp",  [WSP, WSP_HI]>, DwarfRegAlias<WSP>;
+def XZR   : AArch64Reg<31, "xzr", [WZR, WZR_HI]>, DwarfRegAlias<WSP> { let isConstant = true; }
 }
 
 // Condition code register.
@@ -290,6 +302,14 @@ def CCR : RegisterClass<"AArch64", [i32], 32, (add NZCV)> {
 // Floating Point Scalar Registers
 //===----------------------------------------------------------------------===//
 
+foreach i = 0-31 in {
+  def B#i#_HI : AArch64Reg<-1,  "b"#i#"_hi"> { let isArtificial = 1; }
+  def H#i#_HI : AArch64Reg<-1,  "h"#i#"_hi"> { let isArtificial = 1; }
+  def S#i#_HI : AArch64Reg<-1,  "s"#i#"_hi"> { let isArtificial = 1; }
+  def D#i#_HI : AArch64Reg<-1,  "d"#i#"_hi"> { let isArtificial = 1; }
+  def Q#i#_HI : AArch64Reg<-1,  "q"#i#"_hi"> { let isArtificial = 1; }
+}
+
 def B0    : AArch64Reg<0,   "b0">, DwarfRegNum<[64]>;
 def B1    : AArch64Reg<1,   "b1">, DwarfRegNum<[65]>;
 def B2    : AArch64Reg<2,   "b2">, DwarfRegNum<[66]>;
@@ -323,144 +343,144 @@ def B29   : AArch64Reg<29, "b29">, DwarfRegNum<[93]>;
 def B30   : AArch64Reg<30, "b30">, DwarfRegNum<[94]>;
 def B31   : AArch64Reg<31, "b31">, DwarfRegNum<[95]>;
 
-let SubRegIndices = [bsub] in {
-def H0    : AArch64Reg<0,   "h0", [B0]>, DwarfRegAlias<B0>;
-def H1    : AArch64Reg<1,   "h1", [B1]>, DwarfRegAlias<B1>;
-def H2    : AArch64Reg<2,   "h2", [B2]>, DwarfRegAlias<B2>;
-def H3    : AArch64Reg<3,   "h3", [B3]>, DwarfRegAlias<B3>;
-def H4    : AArch64Reg<4,   "h4", [B4]>, DwarfRegAlias<B4>;
-def H5    : AArch64Reg<5,   "h5", [B5]>, DwarfRegAlias<B5>;
-def H6    : AArch64Reg<6,   "h6", [B6]>, DwarfRegAlias<B6>;
-def H7    : AArch64Reg<7,   "h7", [B7]>, DwarfRegAlias<B7>;
-def H8    : AArch64Reg<8,   "h8", [B8]>, DwarfRegAlias<B8>;
-def H9    : AArch64Reg<9,   "h9", [B9]>, DwarfRegAlias<B9>;
-def H10   : AArch64Reg<10, "h10", [B10]>, DwarfRegAlias<B10>;
-def H11   : AArch64Reg<11, "h11", [B11]>, DwarfRegAlias<B11>;
-def H12   : AArch64Reg<12, "h12", [B12]>, DwarfRegAlias<B12>;
-def H13   : AArch64Reg<13, "h13", [B13]>, DwarfRegAlias<B13>;
-def H14   : AArch64Reg<14, "h14", [B14]>, DwarfRegAlias<B14>;
-def H15   : AArch64Reg<15, "h15", [B15]>, DwarfRegAlias<B15>;
-def H16   : AArch64Reg<16, "h16", [B16]>, DwarfRegAlias<B16>;
-def H17   : AArch64Reg<17, "h17", [B17]>, DwarfRegAlias<B17>;
-def H18   : AArch64Reg<18, "h18", [B18]>, DwarfRegAlias<B18>;
-def H19   : AArch64Reg<19, "h19", [B19]>, DwarfRegAlias<B19>;
-def H20   : AArch64Reg<20, "h20", [B20]>, DwarfRegAlias<B20>;
-def H21   : AArch64Reg<21, "h21", [B21]>, DwarfRegAlias<B21>;
-def H22   : AArch64Reg<22, "h22", [B22]>, DwarfRegAlias<B22>;
-def H23   : AArch64Reg<23, "h23", [B23]>, DwarfRegAlias<B23>;
-def H24   : AArch64Reg<24, "h24", [B24]>, DwarfRegAlias<B24>;
-def H25   : AArch64Reg<25, "h25", [B25]>, DwarfRegAlias<B25>;
-def H26   : AArch64Reg<26, "h26", [B26]>, DwarfRegAlias<B26>;
-def H27   : AArch64Reg<27, "h27", [B27]>, DwarfRegAlias<B27>;
-def H28   : AArch64Reg<28, "h28", [B28]>, DwarfRegAlias<B28>;
-def H29   : AArch64Reg<29, "h29", [B29]>, DwarfRegAlias<B29>;
-def H30   : AArch64Reg<30, "h30", [B30]>, DwarfRegAlias<B30>;
-def H31   : AArch64Reg<31, "h31", [B31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [hsub] in {
-def S0    : AArch64Reg<0,   "s0", [H0]>, DwarfRegAlias<B0>;
-def S1    : AArch64Reg<1,   "s1", [H1]>, DwarfRegAlias<B1>;
-def S2    : AArch64Reg<2,   "s2", [H2]>, DwarfRegAlias<B2>;
-def S3    : AArch64Reg<3,   "s3", [H3]>, DwarfRegAlias<B3>;
-def S4    : AArch64Reg<4,   "s4", [H4]>, DwarfRegAlias<B4>;
-def S5    : AArch64Reg<5,   "s5", [H5]>, DwarfRegAlias<B5>;
-def S6    : AArch64Reg<6,   "s6", [H6]>, DwarfRegAlias<B6>;
-def S7    : AArch64Reg<7,   "s7", [H7]>, DwarfRegAlias<B7>;
-def S8    : AArch64Reg<8,   "s8", [H8]>, DwarfRegAlias<B8>;
-def S9    : AArch64Reg<9,   "s9", [H9]>, DwarfRegAlias<B9>;
-def S10   : AArch64Reg<10, "s10", [H10]>, DwarfRegAlias<B10>;
-def S11   : AArch64Reg<11, "s11", [H11]>, DwarfRegAlias<B11>;
-def S12   : AArch64Reg<12, "s12", [H12]>, DwarfRegAlias<B12>;
-def S13   : AArch64Reg<13, "s13", [H13]>, DwarfRegAlias<B13>;
-def S14   : AArch64Reg<14, "s14", [H14]>, DwarfRegAlias<B14>;
-def S15   : AArch64Reg<15, "s15", [H15]>, DwarfRegAlias<B15>;
-def S16   : AArch64Reg<16, "s16", [H16]>, DwarfRegAlias<B16>;
-def S17   : AArch64Reg<17, "s17", [H17]>, DwarfRegAlias<B17>;
-def S18   : AArch64Reg<18, "s18", [H18]>, DwarfRegAlias<B18>;
-def S19   : AArch64Reg<19, "s19", [H19]>, DwarfRegAlias<B19>;
-def S20   : AArch64Reg<20, "s20", [H20]>, DwarfRegAlias<B20>;
-def S21   : AArch64Reg<21, "s21", [H21]>, DwarfRegAlias<B21>;
-def S22   : AArch64Reg<22, "s22", [H22]>, DwarfRegAlias<B22>;
-def S23   : AArch64Reg<23, "s23", [H23]>, DwarfRegAlias<B23>;
-def S24   : AArch64Reg<24, "s24", [H24]>, DwarfRegAlias<B24>;
-def S25   : AArch64Reg<25, "s25", [H25]>, DwarfRegAlias<B25>;
-def S26   : AArch64Reg<26, "s26", [H26]>, DwarfRegAlias<B26>;
-def S27   : AArch64Reg<27, "s27", [H27]>, DwarfRegAlias<B27>;
-def S28   : AArch64Reg<28, "s28", [H28]>, DwarfRegAlias<B28>;
-def S29   : AArch64Reg<29, "s29", [H29]>, DwarfRegAlias<B29>;
-def S30   : AArch64Reg<30, "s30", [H30]>, DwarfRegAlias<B30>;
-def S31   : AArch64Reg<31, "s31", [H31]>, DwarfRegAlias<B31>;
-}
-
-let SubRegIndices = [ssub], RegAltNameIndices = [vreg, vlist1] in {
-def D0    : AArch64Reg<0,   "d0", [S0], ["v0", ""]>, DwarfRegAlias<B0>;
-def D1    : AArch64Reg<1,   "d1", [S1], ["v1", ""]>, DwarfRegAlias<B1>;
-def D2    : AArch64Reg<2,   "d2", [S2], ["v2", ""]>, DwarfRegAlias<B2>;
-def D3    : AArch64Reg<3,   "d3", [S3], ["v3", ""]>, DwarfRegAlias<B3>;
-def D4    : AArch64Reg<4,   "d4", [S4], ["v4", ""]>, DwarfRegAlias<B4>;
-def D5    : AArch64Reg<5,   "d5", [S5], ["v5", ""]>, DwarfRegAlias<B5>;
-def D6    : AArch64Reg<6,   "d6", [S6], ["v6", ""]>, DwarfRegAlias<B6>;
-def D7    : AArch64Reg<7,   "d7", [S7], ["v7", ""]>, DwarfRegAlias<B7>;
-def D8    : AArch64Reg<8,   "d8", [S8], ["v8", ""]>, DwarfRegAlias<B8>;
-def D9    : AArch64Reg<9,   "d9", [S9], ["v9", ""]>, DwarfRegAlias<B9>;
-def D10   : AArch64Reg<10, "d10", [S10], ["v10", ""]>, DwarfRegAlias<B10>;
-def D11   : AArch64Reg<11, "d11", [S11], ["v11", ""]>, DwarfRegAlias<B11>;
-def D12   : AArch64Reg<12, "d12", [S12], ["v12", ""]>, DwarfRegAlias<B12>;
-def D13   : AArch64Reg<13, "d13", [S13], ["v13", ""]>, DwarfRegAl...
[truncated]

def sube32 : SubRegIndex<32>;
def subo32 : SubRegIndex<32>;
def sub_32 : SubRegIndex<32, 0>;
def sub_32_hi: SubRegIndex<32, 32>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment the artificial cases? Could we use an explicit marker for these?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment to describe the artificial cases. I'm not sure what you mean with an explicit marker though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean an isArtificial bit on SubRegIndex

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isArtificial is a property of a Register or a RegisterClass, but not a SubRegIndex.

@sdesmalen-arm sdesmalen-arm requested a review from Rin18 November 5, 2024 11:31
@sdesmalen-arm
Copy link
Collaborator Author

@Rin18 would you be happy to have a look at the V1-clear-upper-regs.s test to see if there's any issues with the generated output? I'm not sure if there is a hidden bug in there (it seems to iterate through all subregs, including artificial ones, which may affect the output)

@Rin18
Copy link

Rin18 commented Nov 6, 2024

@Rin18 would you be happy to have a look at the V1-clear-upper-regs.s test to see if there's any issues with the generated output? I'm not sure if there is a hidden bug in there (it seems to iterate through all subregs, including artificial ones, which may affect the output)

I took a look and compared the MCA output with and without your patch. The first thing that jumped to my attention is that your patch adds dependencies on super registers. If you run the V1-clear-upper-regs.s test with -debug and look at the output for the GPR32-bit test, you'll see a lot of these lines:

[PRF] Found a dependent use of Register X0 (defined by instruction #

Similarly for the other super registers in the remaining tests in that file.

I've not been able to find the exact cause. I think this has something to do with the addition of *_HI registers and the fact that super registers are now described as *sub + *sub_hi.

@sdesmalen-arm
Copy link
Collaborator Author

I'm not entirely sure how llvm-mca works, but I think the issue is that the code adds dependences for all sub-registers of a register, but when clearsSuperRegisters evalutes to true it clears dependences for super-registers without considering any of the sibling _hi registers. For example; a write to w0 also clears a dependence on x0, but the dependence on artificial register w0_hi should also be cleared. It seems easiest not to consider artificial registers in the computation to start with.

@Rin18
Copy link

Rin18 commented Nov 8, 2024

but I think the issue is that the code adds dependences for all sub-registers of a register, but when clearsSuperRegisters evalutes to true it clears dependences for super-registers without considering any of the sibling _hi registers

That sounds right. Before your patch, there were no _hi registers to consider when updating dependencies. Now that artificial registers are added it makes sense to skip them. Looks like that fixed the dependencies created in the MCA *clear-upper-regs.s tests, thanks!

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general register changes sound to me.

@@ -396,6 +399,9 @@ class MCRegisterInfo {
/// Returns true if the given register is constant.
bool isConstant(MCRegister RegNo) const { return get(RegNo).IsConstant; }

/// Returns true if the given register is artificial.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth explaining what Artificial means more precisely?

# CHECK-NEXT: # rdefs left : 0
# CHECK-NEXT: Latency : 3
# CHECK-NEXT: Depth : 0
# CHECK-NEXT: Height : 7
# CHECK-NEXT: Successors:
# CHECK-NEXT: SU(7): Out Latency=1
# CHECK-NEXT: SU(7): Out Latency=1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this happen just for bundles or for other instructions? I was wondering if it was worth filtering out Artificial dependencies as they should be redundant, I believe? It might not matter too much if it is only bundles, as it doesn't seem to be doing any harm and they come up less often so would have less of a compile-time overhead to have so many deps.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not specific to BUNDLEs, but it is a little specific to how this test was written:

BUNDLE implicit-def $z1, implicit-def $q1, implicit-def $d1, implicit-def $s1, implicit-def $h1, implicit-def $b1, implicit $z5, implicit $p0, implicit killed $z4, implicit killed $z3 {

the instruction unnecessarily specifies all sub-registers of z1 as well as implicit-defs; b1, h1, s1, .. q1, even though implicit-def $z1 would have been sufficient. When I remove the implicit-def for the sub-registers, the scheduler no longer creates the extra out-dependences.

I can see that the scheduler goes through all register units for a given register to create dependences. Before this change there was only a single RegUnit for all registers z1, q1, s1, .. b1, whereas now there are 6 separate regunits, hence the extra dependences.

I'd need to dig a bit further to understand why the scheduler adds these when the implicit-def operands are present, although this isn't a common case for regular instructions (it may be more common for BUNDLE's?) so don't think this has much of an effect in practice. If you're satisfied with that answer, I'd prefer to move forward with the patch as-is, since I've got a few more follow-up patches to fix up other things. I'll look further into it anyway, just in case there's something that needs fixing.

Copy link

github-actions bot commented Nov 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. If there are no objections to the name "Artificial", then LGTM.

@sdesmalen-arm
Copy link
Collaborator Author

Thanks. If there are no objections to the name "Artificial", then LGTM.

Thanks @davemgreen! The name Artificial is already established in TableGen and include/llvm/Target/Target.td, I've just reused the name here.

@sdesmalen-arm sdesmalen-arm merged commit c1c68ba into llvm:main Nov 14, 2024
5 of 8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 14, 2024

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/14632

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'BOLT-Unit :: Core/./CoreTests/11/23' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/build/buildbot/premerge-monolithic-linux/build/tools/bolt/unittests/Core/./CoreTests-BOLT-Unit-1195098-11-23.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=23 GTEST_SHARD_INDEX=11 /build/buildbot/premerge-monolithic-linux/build/tools/bolt/unittests/Core/./CoreTests
--

Script:
--
/build/buildbot/premerge-monolithic-linux/build/tools/bolt/unittests/Core/./CoreTests --gtest_filter=AArch64/MCPlusBuilderTester.AliasX0/0
--
/build/buildbot/premerge-monolithic-linux/llvm-project/bolt/unittests/Core/MCPlusBuilder.cpp:76: Failure
Expected equality of these values:
  BV.count()
    Which is: 6
  Count
    Which is: 5


/build/buildbot/premerge-monolithic-linux/llvm-project/bolt/unittests/Core/MCPlusBuilder.cpp:76
Expected equality of these values:
  BV.count()
    Which is: 6
  Count
    Which is: 5



********************


sdesmalen-arm added a commit that referenced this pull request Nov 14, 2024
Landing #114827 broke these tests, because they did not account
for the new artificial registers.
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 14, 2024

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-shared running on bolt-worker while building llvm at step 6 "test-build-bolt-check-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/151/builds/3244

Here is the relevant piece of the build log for the reference
Step 6 (test-build-bolt-check-bolt) failure: test (failure)
******************** TEST 'BOLT-Unit :: Core/./CoreTests/11/23' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/unittests/Core/./CoreTests-BOLT-Unit-3389418-11-23.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=23 GTEST_SHARD_INDEX=11 /home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/unittests/Core/./CoreTests
--

Script:
--
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/build/tools/bolt/unittests/Core/./CoreTests --gtest_filter=AArch64/MCPlusBuilderTester.AliasX0/0
--
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/llvm-project/bolt/unittests/Core/MCPlusBuilder.cpp:76: Failure
Expected equality of these values:
  BV.count()
    Which is: 6
  Count
    Which is: 5


/home/worker/bolt-worker2/bolt-x86_64-ubuntu-shared/llvm-project/bolt/unittests/Core/MCPlusBuilder.cpp:76
Expected equality of these values:
  BV.count()
    Which is: 6
  Count
    Which is: 5



********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 14, 2024

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-dylib running on bolt-worker while building llvm at step 6 "test-build-bolt-check-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/119/builds/3088

Here is the relevant piece of the build log for the reference
Step 6 (test-build-bolt-check-bolt) failure: test (failure)
******************** TEST 'BOLT-Unit :: Core/./CoreTests/12/23' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/worker/bolt-worker2/bolt-x86_64-ubuntu-dylib/build/tools/bolt/unittests/Core/./CoreTests-BOLT-Unit-3417609-12-23.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=23 GTEST_SHARD_INDEX=12 /home/worker/bolt-worker2/bolt-x86_64-ubuntu-dylib/build/tools/bolt/unittests/Core/./CoreTests
--

Script:
--
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-dylib/build/tools/bolt/unittests/Core/./CoreTests --gtest_filter=AArch64/MCPlusBuilderTester.AliasSmallerX0/0
--
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-dylib/llvm-project/bolt/unittests/Core/MCPlusBuilder.cpp:76: Failure
Expected equality of these values:
  BV.count()
    Which is: 3
  Count
    Which is: 2


/home/worker/bolt-worker2/bolt-x86_64-ubuntu-dylib/llvm-project/bolt/unittests/Core/MCPlusBuilder.cpp:76
Expected equality of these values:
  BV.count()
    Which is: 3
  Count
    Which is: 2



********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 14, 2024

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-nfc running on bolt-worker while building llvm at step 8 "test-build-bolt-check-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/92/builds/9698

Here is the relevant piece of the build log for the reference
Step 8 (test-build-bolt-check-bolt) failure: test (failure)
******************** TEST 'BOLT-Unit :: Core/./CoreTests/12/23' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/unittests/Core/./CoreTests-BOLT-Unit-3490729-12-23.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=23 GTEST_SHARD_INDEX=12 /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/unittests/Core/./CoreTests
--

Script:
--
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/unittests/Core/./CoreTests --gtest_filter=AArch64/MCPlusBuilderTester.AliasSmallerX0/0
--
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/llvm-project/bolt/unittests/Core/MCPlusBuilder.cpp:76: Failure
Expected equality of these values:
  BV.count()
    Which is: 3
  Count
    Which is: 2


/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/llvm-project/bolt/unittests/Core/MCPlusBuilder.cpp:76
Expected equality of these values:
  BV.count()
    Which is: 3
  Count
    Which is: 2



********************

Step 12 (nfc-check-bolt) failure: NFC check-bolt completed (failure)
******************** TEST 'BOLT-Unit :: Core/./CoreTests/12/23' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/unittests/Core/./CoreTests-BOLT-Unit-3506925-12-23.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=23 GTEST_SHARD_INDEX=12 /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/unittests/Core/./CoreTests
--

Script:
--
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/unittests/Core/./CoreTests --gtest_filter=AArch64/MCPlusBuilderTester.AliasSmallerX0/0
--
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/llvm-project/bolt/unittests/Core/MCPlusBuilder.cpp:76: Failure
Expected equality of these values:
  BV.count()
    Which is: 3
  Count
    Which is: 2


/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/llvm-project/bolt/unittests/Core/MCPlusBuilder.cpp:76
Expected equality of these values:
  BV.count()
    Which is: 3
  Count
    Which is: 2



********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 14, 2024

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/1601

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/37/87' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-4896-37-87.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=87 GTEST_SHARD_INDEX=37 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 15, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux-bootstrap-ubsan running on sanitizer-buildbot10 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/85/builds/2780

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{flags} substitution: '-pthread --target=aarch64-unknown-linux-gnu -g -fno-omit-frame-pointer -fsanitize=undefined -fno-sanitize=float-divide-by-zero -fno-sanitize-recover=all'
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{compile_flags} substitution: '-nostdinc++ -I %{target-include-dir} -I %{include-dir} -I %{libcxx-dir}/test/support -std=c++26 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings'
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{link_flags} substitution: '-lc++experimental -nostdlib++ -L %{lib-dir} -Wl,-rpath,%{lib-dir} -lc++ -latomic'
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{benchmark_flags} substitution: '-I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test/benchmarks/google-benchmark/include -L /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test/benchmarks/google-benchmark/lib -l benchmark'
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{exec} substitution: '%{executor} --execdir %T -- '
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) All available features: -faligned-allocation, -fsized-deallocation, add-latomic-workaround, buildhost=linux, c++26, c++experimental, can-create-symlinks, clang, clang-20, clang-20.0, clang-20.0.0, diagnose-if-support, enable-benchmarks=no, gcc-style-warnings, glibc-old-ru_RU-decimal-point, has-1024-bit-atomics, has-64-bit-atomics, has-fblocks, has-fconstexpr-steps, has-unix-headers, large_tests, libcpp-abi-version=1, libcpp-hardening-mode=none, libcpp-has-no-availability-markup, libcpp-has-thread-api-pthread, linux, objective-c++, optimization=none, stdlib=libc++, stdlib=llvm-libc++, target=aarch64-unknown-linux-gnu, thread-safety, ubsan, verify-support
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 1500 seconds was requested on the command line. Forcing timeout to be 1500 seconds.
-- Testing: 9909 of 9929 tests, 48 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
TIMEOUT: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_copy.pass.cpp (9908 of 9909)
******************** TEST 'llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_copy.pass.cpp' FAILED ********************
Exit Code: -9
Timeout: Reached timeout of 1500 seconds

Command Output (stdout):
--
# COMPILED WITH
/home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang++ /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/test/std/experimental/simd/simd.class/simd_copy.pass.cpp -pthread --target=aarch64-unknown-linux-gnu -g -fno-omit-frame-pointer -fsanitize=undefined -fno-sanitize=float-divide-by-zero -fno-sanitize-recover=all -nostdinc++ -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/include/c++/v1 -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/include/c++/v1 -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/test/support -std=c++26 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings  -lc++experimental -nostdlib++ -L /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/lib -Wl,-rpath,/home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/lib -lc++ -latomic -o /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test/std/experimental/simd/simd.class/Output/simd_copy.pass.cpp.dir/t.tmp.exe
# executed command: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang++ /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/test/std/experimental/simd/simd.class/simd_copy.pass.cpp -pthread --target=aarch64-unknown-linux-gnu -g -fno-omit-frame-pointer -fsanitize=undefined -fno-sanitize=float-divide-by-zero -fno-sanitize-recover=all -nostdinc++ -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/include/c++/v1 -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/include/c++/v1 -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/test/support -std=c++26 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings -lc++experimental -nostdlib++ -L /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/lib -Wl,-rpath,/home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/lib -lc++ -latomic -o /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test/std/experimental/simd/simd.class/Output/simd_copy.pass.cpp.dir/t.tmp.exe
# note: command had no output on stdout or stderr
# error: command failed with exit status: -9
# error: command reached timeout: True

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
1500.24s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_copy.pass.cpp
1011.59s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.reference/reference_arith_operators.pass.cpp
980.81s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.reference/reference_bitwise_operators.pass.cpp
946.01s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.modifying.operations/alg.transform/ranges.transform.binary.range.pass.cpp
878.59s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.reference/reference_assignment.pass.cpp
875.65s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_unary.pass.cpp
847.19s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.modifying.operations/alg.transform/ranges.transform.binary.iterator.pass.cpp
646.63s: llvm-libc++-shared.cfg.in :: std/depr/depr.c.headers/math_h.pass.cpp
628.25s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.sorting/alg.set.operations/set.symmetric.difference/ranges_set_symmetric_difference.pass.cpp
627.67s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.sorting/alg.set.operations/set.union/ranges_set_union.pass.cpp
558.26s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_ctor_load.pass.cpp
552.08s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp
545.78s: llvm-libc++-shared.cfg.in :: std/atomics/atomics.ref/compare_exchange_weak.pass.cpp
525.16s: llvm-libc++-shared.cfg.in :: std/atomics/atomics.ref/compare_exchange_strong.pass.cpp
513.14s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.nonmodifying/alg.ends_with/ranges.ends_with.pass.cpp
465.94s: llvm-libc++-shared.cfg.in :: std/strings/basic.string/string.modifiers/string_replace/size_size_string_size_size.pass.cpp
458.15s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_ctor_default.pass.cpp
447.93s: llvm-libc++-shared.cfg.in :: std/containers/container.adaptors/container.adaptors.format/format.functions.vformat.pass.cpp
417.77s: llvm-libc++-shared.cfg.in :: std/strings/basic.string/string.modifiers/string_replace/size_size_T_size_size.pass.cpp
401.12s: llvm-libc++-shared.cfg.in :: std/containers/container.adaptors/container.adaptors.format/format.functions.format.pass.cpp
Step 11 (stage2/ubsan check-cxx) failure: stage2/ubsan check-cxx (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{flags} substitution: '-pthread --target=aarch64-unknown-linux-gnu -g -fno-omit-frame-pointer -fsanitize=undefined -fno-sanitize=float-divide-by-zero -fno-sanitize-recover=all'
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{compile_flags} substitution: '-nostdinc++ -I %{target-include-dir} -I %{include-dir} -I %{libcxx-dir}/test/support -std=c++26 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings'
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{link_flags} substitution: '-lc++experimental -nostdlib++ -L %{lib-dir} -Wl,-rpath,%{lib-dir} -lc++ -latomic'
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{benchmark_flags} substitution: '-I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test/benchmarks/google-benchmark/include -L /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test/benchmarks/google-benchmark/lib -l benchmark'
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) Using %{exec} substitution: '%{executor} --execdir %T -- '
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libc++-shared.cfg.in) All available features: -faligned-allocation, -fsized-deallocation, add-latomic-workaround, buildhost=linux, c++26, c++experimental, can-create-symlinks, clang, clang-20, clang-20.0, clang-20.0.0, diagnose-if-support, enable-benchmarks=no, gcc-style-warnings, glibc-old-ru_RU-decimal-point, has-1024-bit-atomics, has-64-bit-atomics, has-fblocks, has-fconstexpr-steps, has-unix-headers, large_tests, libcpp-abi-version=1, libcpp-hardening-mode=none, libcpp-has-no-availability-markup, libcpp-has-thread-api-pthread, linux, objective-c++, optimization=none, stdlib=libc++, stdlib=llvm-libc++, target=aarch64-unknown-linux-gnu, thread-safety, ubsan, verify-support
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 1500 seconds was requested on the command line. Forcing timeout to be 1500 seconds.
-- Testing: 9909 of 9929 tests, 48 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
TIMEOUT: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_copy.pass.cpp (9908 of 9909)
******************** TEST 'llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_copy.pass.cpp' FAILED ********************
Exit Code: -9
Timeout: Reached timeout of 1500 seconds

Command Output (stdout):
--
# COMPILED WITH
/home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang++ /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/test/std/experimental/simd/simd.class/simd_copy.pass.cpp -pthread --target=aarch64-unknown-linux-gnu -g -fno-omit-frame-pointer -fsanitize=undefined -fno-sanitize=float-divide-by-zero -fno-sanitize-recover=all -nostdinc++ -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/include/c++/v1 -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/include/c++/v1 -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/test/support -std=c++26 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings  -lc++experimental -nostdlib++ -L /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/lib -Wl,-rpath,/home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/lib -lc++ -latomic -o /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test/std/experimental/simd/simd.class/Output/simd_copy.pass.cpp.dir/t.tmp.exe
# executed command: /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm_build0/bin/clang++ /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/test/std/experimental/simd/simd.class/simd_copy.pass.cpp -pthread --target=aarch64-unknown-linux-gnu -g -fno-omit-frame-pointer -fsanitize=undefined -fno-sanitize=float-divide-by-zero -fno-sanitize-recover=all -nostdinc++ -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/include/c++/v1 -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/include/c++/v1 -I /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/libcxx/test/support -std=c++26 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings -lc++experimental -nostdlib++ -L /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/lib -Wl,-rpath,/home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test-suite-install/lib -lc++ -latomic -o /home/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/libcxx_build_ubsan/libcxx/test/std/experimental/simd/simd.class/Output/simd_copy.pass.cpp.dir/t.tmp.exe
# note: command had no output on stdout or stderr
# error: command failed with exit status: -9
# error: command reached timeout: True

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
1500.24s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_copy.pass.cpp
1011.59s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.reference/reference_arith_operators.pass.cpp
980.81s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.reference/reference_bitwise_operators.pass.cpp
946.01s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.modifying.operations/alg.transform/ranges.transform.binary.range.pass.cpp
878.59s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.reference/reference_assignment.pass.cpp
875.65s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_unary.pass.cpp
847.19s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.modifying.operations/alg.transform/ranges.transform.binary.iterator.pass.cpp
646.63s: llvm-libc++-shared.cfg.in :: std/depr/depr.c.headers/math_h.pass.cpp
628.25s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.sorting/alg.set.operations/set.symmetric.difference/ranges_set_symmetric_difference.pass.cpp
627.67s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.sorting/alg.set.operations/set.union/ranges_set_union.pass.cpp
558.26s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_ctor_load.pass.cpp
552.08s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.sorting/alg.set.operations/set.intersection/ranges_set_intersection.pass.cpp
545.78s: llvm-libc++-shared.cfg.in :: std/atomics/atomics.ref/compare_exchange_weak.pass.cpp
525.16s: llvm-libc++-shared.cfg.in :: std/atomics/atomics.ref/compare_exchange_strong.pass.cpp
513.14s: llvm-libc++-shared.cfg.in :: std/algorithms/alg.nonmodifying/alg.ends_with/ranges.ends_with.pass.cpp
465.94s: llvm-libc++-shared.cfg.in :: std/strings/basic.string/string.modifiers/string_replace/size_size_string_size_size.pass.cpp
458.15s: llvm-libc++-shared.cfg.in :: std/experimental/simd/simd.class/simd_ctor_default.pass.cpp
447.93s: llvm-libc++-shared.cfg.in :: std/containers/container.adaptors/container.adaptors.format/format.functions.vformat.pass.cpp
417.77s: llvm-libc++-shared.cfg.in :: std/strings/basic.string/string.modifiers/string_replace/size_size_T_size_size.pass.cpp
401.12s: llvm-libc++-shared.cfg.in :: std/containers/container.adaptors/container.adaptors.format/format.functions.format.pass.cpp

akshayrdeodhar pushed a commit to akshayrdeodhar/llvm-project that referenced this pull request Nov 18, 2024
…4827)

This is a step towards enabling subreg liveness tracking for AArch64,
which requires that registers are fully covered by their subregisters,
as covered here llvm#109797.

There are several changes in this patch:

* AArch64RegisterInfo.td and tests: Define the high bits like B0_HI,
H0_HI, S0_HI, D0_HI, Q0_HI. Because the bits must be defined by some
register class, this added a register class which meant that we had to
update 'magic numbers' in several tests.

The use of ComposedSubRegIndex helped 'compress' the number of bits
required for the lanemask. The correctness of the masks is tested by an
explicit unit tests.

* LoadStoreOptimizer: previously 'HasDisjunctSubRegs' was only true for
register tuples, but with this change to describe the high bits, a
register like 'D0' will also have 'HasDisjunctSubRegs' set to true
(because it's fullly covered by S0 and S0_HI). The fix here is to
explicitly test if the register class is one of the known D/Q/Z tuples.
akshayrdeodhar pushed a commit to akshayrdeodhar/llvm-project that referenced this pull request Nov 18, 2024
Landing llvm#114827 broke these tests, because they did not account
for the new artificial registers.
@zeroomega
Copy link
Contributor

zeroomega commented Nov 21, 2024

Hi,

We noticed a large clang/llvm test time regression on our clang linux-arm64 and mac-arm64 builders after this patch is landed.
The test step took around 18mins before this patch was landed see: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-arm64/b8731309118845812913/overview (please expand the 'clang' step).

The test step took around 41mins after this change was landed. see: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-arm64/b8731304524561843377/overview

We also noticed that our stage2 build now takes 2x of time. It appears this patch caused a large performance regression on arm64 binaries generated by clang.

Could you take a look?
If it takes some time to fix, could you revert your change first?

@PiJoules
Copy link
Contributor

PiJoules commented Nov 21, 2024

Also this really affects our mac arm64 builders from 2.8hrs to do a full build and run tests (https://ci.chromium.org/ui/p/fuchsia/builders/prod/clang-mac-arm64/b8731305238018805313/overview) to >5hrs for a full build and run tests (https://ci.chromium.org/ui/p/fuchsia/builders/prod/clang-mac-arm64/b8731291539321484417/overview).

@sdesmalen-arm
Copy link
Collaborator Author

Thanks for letting me know. From what I can quickly see there is only a time difference for the runtime tests. The clang, polly, llvm and lld tests run in about the same time. @PiJoules the build you linked to was interrupted while building the check-clang target, which for the other builds pointed to by @zeroomega wasn't any slower. Perhaps I'm missing something here though.
I do see various other patches in the blamelist, has it been confirmed that this patch is the cause of the increased testing time?

@zeroomega
Copy link
Contributor

zeroomega commented Nov 21, 2024

Thanks for letting me know. From what I can quickly see there is only a time difference for the runtime tests. The clang, polly, llvm and lld tests run in about the same time. @PiJoules the build you linked to was interrupted while building the check-clang target, which for the other builds pointed to by @zeroomega wasn't any slower. Perhaps I'm missing something here though. I do see various other patches in the blamelist, has it been confirmed that this patch is the cause of the increased testing time?

The patch is confirmed through bisecting.

https://ci.chromium.org/b/8730624321112150225 builds e52238b and the test time looks normal.

https://ci.chromium.org/b/8730630997932921601 builds c1c68ba and the test time regressed. c1c68ba (#114827) is 1 commit after e52238b

For stage2 builders, I didn't do a bisection since they use LTO and they take very long time to build. But I have some indirect data:

https://ci.chromium.org/ui/p/fuchsia/builders/prod/clang-linux-arm64/b8730794949594129265/overview was from 2 days ago (of course after this patch was landed). It took 2.5hrs for building clang and runtimes. and 2.1hrs for running all the tests.
https://ci.chromium.org/ui/p/fuchsia/builders/prod/clang-linux-arm64/b8731348039888781153/overview was from 11/13 (before this patch was landed). It took 43mins to build clang and runtimes and 18 minutes to run the tests. If you feel it is necessary to perform a before/after stage2 build test on build time, though we feel it is unnecessary since the bisecting on the stage1 build already showed it clearly, we can do that, it will take about 6 hours.

I suspect the patch caused a general performance regression of binaries built by clang. Because the stage1 clang build was using the host toolchian, so the build time wasn't affected. But clang, lld and runtimes tests were using the freshly built clang, they were affected. The stage2 builder was using the stage1 toolchain (which has the performance bug), so the build time also regressed.

Let us know if you need additional information.

@vitalybuka
Copy link
Collaborator

vitalybuka commented Nov 22, 2024

This patch makes libcxx tests to timeouts here
https://lab.llvm.org/buildbot/#/builders/85

Many tests slowed down 10x or more.

CC @fmayer

vitalybuka added a commit that referenced this pull request Nov 22, 2024
maksfb added a commit that referenced this pull request Nov 22, 2024
This reverts commit 576865a.

Depends on #114827 that was reverted.
sdesmalen-arm added a commit that referenced this pull request Nov 27, 2024
…114827)"

The issue with slow compile-time was caused by an assert in
AArch64RegisterInfo.cpp. The assert invokes 'checkAllSuperRegsMarked'
after adding all the reserved registers. This call gets very expensive
after adding the _HI registers due to the way the function searches
in the 'Exception' list, which is expected to be a small list but isn't
(the patch added 190 _HI regs).

It was possible to rewrite the code in such a way that the _HI registers
are marked as reserved after the check. This makes the problem go away
entirely and restores compile-time to what it was before (tested for
`check-runtimes`, which previously showed a ~5x slowdown).

This reverts commits:
  1434d2a
  2704647
sdesmalen-arm added a commit that referenced this pull request Nov 28, 2024
…6248)"

This patch can now reland after 318c69d relanded #114827.

This reverts commit 1683f84.
sdesmalen-arm added a commit that referenced this pull request Nov 28, 2024
…zing. (#116191)"

This patch can now reland after 318c69d relanded #114827.

This reverts commit 14a58a1.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants