[AArch64][PAC] Move emission of LR checks in tail calls to AsmPrinter #110705

Merged: 9 commits merged into main on Nov 12, 2024

Conversation

@atrosinenko (Contributor) commented on Oct 1, 2024

Move the emission of the checks performed on the authenticated LR value
during tail calls to the AArch64AsmPrinter class, so that different checker
sequences can be reused by pseudo instructions expanded there.
This adds one more option to the AuthCheckMethod enumeration: a generic
XPAC variant that is not restricted to checking the LR register.
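
As a rough illustration (the registers and the D-key below are chosen arbitrarily, not taken from the patch), the trapping form of the new generic XPAC check expands to a sequence of the following shape; unlike the pre-existing XPACHint method based on xpaclri, it is not tied to the LR register:

	mov	x17, x16        // copy the authenticated value into the scratch register
	xpacd	x17             // strip the PAC from the copy (xpaci for I-keys)
	cmp	x16, x17        // a successfully authenticated pointer equals its stripped copy
	b.eq	.Lauth_success_0
	brk	#0xc472         // trap; the immediate encodes the key (0xc470 + key id)
.Lauth_success_0: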

@llvmbot (Member) commented on Oct 1, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Anatoly Trosinenko (atrosinenko)

Changes

Move the emission of the checks performed on the authenticated LR value
during tail calls to the AArch64AsmPrinter class, so that different checker
sequences can be reused by pseudo instructions expanded there.
This adds one more option to the AuthCheckMethod enumeration: a generic
XPAC variant that is not restricted to checking the LR register.

Shorten the generic XPAC-based non-trapping sequence by one mov
instruction: perform XPAC on the tested register itself instead of the
scratch one, as XPACLRI cannot operate on the scratch register anyway.


Patch is 33.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110705.diff

9 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp (+112-31)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+13)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+2)
  • (modified) llvm/lib/Target/AArch64/AArch64PointerAuth.cpp (+2-180)
  • (modified) llvm/lib/Target/AArch64/AArch64PointerAuth.h (+18-22)
  • (modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (-2)
  • (modified) llvm/lib/Target/AArch64/AArch64Subtarget.h (-23)
  • (modified) llvm/test/CodeGen/AArch64/ptrauth-ret-trap.ll (+18-18)
  • (modified) llvm/test/CodeGen/AArch64/sign-return-address-tailcall.ll (+27-27)
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index 6d2dd0ecbccf31..50502477706ccf 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -153,6 +153,7 @@ class AArch64AsmPrinter : public AsmPrinter {
   void emitPtrauthCheckAuthenticatedValue(Register TestedReg,
                                           Register ScratchReg,
                                           AArch64PACKey::ID Key,
+                                          AArch64PAuth::AuthCheckMethod Method,
                                           bool ShouldTrap,
                                           const MCSymbol *OnFailure);
 
@@ -1731,7 +1732,8 @@ unsigned AArch64AsmPrinter::emitPtrauthDiscriminator(uint16_t Disc,
 /// of proceeding to the next instruction (only if ShouldTrap is false).
 void AArch64AsmPrinter::emitPtrauthCheckAuthenticatedValue(
     Register TestedReg, Register ScratchReg, AArch64PACKey::ID Key,
-    bool ShouldTrap, const MCSymbol *OnFailure) {
+    AArch64PAuth::AuthCheckMethod Method, bool ShouldTrap,
+    const MCSymbol *OnFailure) {
   // Insert a sequence to check if authentication of TestedReg succeeded,
   // such as:
   //
@@ -1757,38 +1759,70 @@ void AArch64AsmPrinter::emitPtrauthCheckAuthenticatedValue(
   //    Lsuccess:
   //      ...
   //
-  // This sequence is expensive, but we need more information to be able to
-  // do better.
-  //
-  // We can't TBZ the poison bit because EnhancedPAC2 XORs the PAC bits
-  // on failure.
-  // We can't TST the PAC bits because we don't always know how the address
-  // space is setup for the target environment (and the bottom PAC bit is
-  // based on that).
-  // Either way, we also don't always know whether TBI is enabled or not for
-  // the specific target environment.
+  // See the documentation on AuthCheckMethod enumeration constants for
+  // the specific code sequences that can be used to perform the check.
+  using AArch64PAuth::AuthCheckMethod;
 
-  unsigned XPACOpc = getXPACOpcodeForKey(Key);
+  if (Method == AuthCheckMethod::None)
+    return;
+  if (Method == AuthCheckMethod::DummyLoad) {
+    EmitToStreamer(MCInstBuilder(AArch64::LDRWui)
+                       .addReg(getWRegFromXReg(ScratchReg))
+                       .addReg(TestedReg)
+                       .addImm(0));
+    assert(ShouldTrap && !OnFailure && "DummyLoad always traps on error");
+    return;
+  }
 
   MCSymbol *SuccessSym = createTempSymbol("auth_success_");
+  if (Method == AuthCheckMethod::XPAC || Method == AuthCheckMethod::XPACHint) {
+    //  mov Xscratch, Xtested
+    emitMovXReg(ScratchReg, TestedReg);
 
-  //  mov Xscratch, Xtested
-  emitMovXReg(ScratchReg, TestedReg);
-
-  //  xpac(i|d) Xscratch
-  EmitToStreamer(MCInstBuilder(XPACOpc).addReg(ScratchReg).addReg(ScratchReg));
+    if (Method == AuthCheckMethod::XPAC) {
+      //  xpac(i|d) Xscratch
+      unsigned XPACOpc = getXPACOpcodeForKey(Key);
+      EmitToStreamer(
+          MCInstBuilder(XPACOpc).addReg(ScratchReg).addReg(ScratchReg));
+    } else {
+      //  xpaclri
+
+      // Note that this method applies XPAC to TestedReg instead of ScratchReg.
+      assert(TestedReg == AArch64::LR &&
+             "XPACHint mode is only compatible with checking the LR register");
+      assert((Key == AArch64PACKey::IA || Key == AArch64PACKey::IB) &&
+             "XPACHint mode is only compatible with I-keys");
+      EmitToStreamer(MCInstBuilder(AArch64::XPACLRI));
+    }
 
-  //  cmp Xtested, Xscratch
-  EmitToStreamer(MCInstBuilder(AArch64::SUBSXrs)
-                     .addReg(AArch64::XZR)
-                     .addReg(TestedReg)
-                     .addReg(ScratchReg)
-                     .addImm(0));
+    //  cmp Xtested, Xscratch
+    EmitToStreamer(MCInstBuilder(AArch64::SUBSXrs)
+                       .addReg(AArch64::XZR)
+                       .addReg(TestedReg)
+                       .addReg(ScratchReg)
+                       .addImm(0));
 
-  //  b.eq Lsuccess
-  EmitToStreamer(MCInstBuilder(AArch64::Bcc)
-                     .addImm(AArch64CC::EQ)
-                     .addExpr(MCSymbolRefExpr::create(SuccessSym, OutContext)));
+    //  b.eq Lsuccess
+    EmitToStreamer(
+        MCInstBuilder(AArch64::Bcc)
+            .addImm(AArch64CC::EQ)
+            .addExpr(MCSymbolRefExpr::create(SuccessSym, OutContext)));
+  } else if (Method == AuthCheckMethod::HighBitsNoTBI) {
+    //  eor Xscratch, Xtested, Xtested, lsl #1
+    EmitToStreamer(MCInstBuilder(AArch64::EORXrs)
+                       .addReg(ScratchReg)
+                       .addReg(TestedReg)
+                       .addReg(TestedReg)
+                       .addImm(1));
+    //  tbz Xscratch, #62, Lsuccess
+    EmitToStreamer(
+        MCInstBuilder(AArch64::TBZX)
+            .addReg(ScratchReg)
+            .addImm(62)
+            .addExpr(MCSymbolRefExpr::create(SuccessSym, OutContext)));
+  } else {
+    llvm_unreachable("Unsupported check method");
+  }
 
   if (ShouldTrap) {
     assert(!OnFailure && "Cannot specify OnFailure with ShouldTrap");
@@ -1802,9 +1836,26 @@ void AArch64AsmPrinter::emitPtrauthCheckAuthenticatedValue(
     // Note that this can introduce an authentication oracle (such as based on
     // the high bits of the re-signed value).
 
-    // FIXME: Can we simply return the AUT result, already in TestedReg?
-    //  mov Xtested, Xscratch
-    emitMovXReg(TestedReg, ScratchReg);
+    // FIXME: The XPAC method can be optimized by applying XPAC to TestedReg
+    //        instead of ScratchReg, thus eliminating one `mov` instruction.
+    //        Both XPAC and XPACHint can be further optimized by not using a
+    //        conditional branch jumping over an unconditional one.
+
+    switch (Method) {
+    case AuthCheckMethod::XPACHint:
+      // LR is already XPAC-ed at this point.
+      break;
+    case AuthCheckMethod::XPAC:
+      //  mov Xtested, Xscratch
+      emitMovXReg(TestedReg, ScratchReg);
+      break;
+    default:
+      // If Xtested was not XPAC-ed so far, emit XPAC here.
+      //  xpac(i|d) Xtested
+      unsigned XPACOpc = getXPACOpcodeForKey(Key);
+      EmitToStreamer(
+          MCInstBuilder(XPACOpc).addReg(TestedReg).addReg(TestedReg));
+    }
 
     if (OnFailure) {
       //  b Lend
@@ -1830,7 +1881,7 @@ void AArch64AsmPrinter::emitPtrauthAuthResign(const MachineInstr *MI) {
   //      ; sign x16 (if AUTPAC)
   //    Lend:   ; if not trapping on failure
   //
-  // with the checking sequence chosen depending on whether we should check
+  // with the checking sequence chosen depending on whether/how we should check
   // the pointer and whether we should trap on failure.
 
   // By default, auth/resign sequences check for auth failures.
@@ -1890,6 +1941,7 @@ void AArch64AsmPrinter::emitPtrauthAuthResign(const MachineInstr *MI) {
       EndSym = createTempSymbol("resign_end_");
 
     emitPtrauthCheckAuthenticatedValue(AArch64::X16, AArch64::X17, AUTKey,
+                                       AArch64PAuth::AuthCheckMethod::XPAC,
                                        ShouldTrap, EndSym);
   }
 
@@ -2260,11 +2312,34 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
     OutStreamer->emitLabel(LOHLabel);
   }
 
+  // With Pointer Authentication, it may be needed to explicitly check the
+  // authenticated value in LR when performing a tail call.
+  // Otherwise, the callee may re-sign the invalid return address,
+  // introducing a signing oracle.
+  auto CheckLRInTailCall = [this](Register CallDestinationReg) {
+    if (!AArch64FI->shouldSignReturnAddress(*MF))
+      return;
+
+    auto LRCheckMethod = STI->getAuthenticatedLRCheckMethod(*MF);
+    if (LRCheckMethod == AArch64PAuth::AuthCheckMethod::None)
+      return;
+
+    Register ScratchReg =
+        CallDestinationReg == AArch64::X16 ? AArch64::X17 : AArch64::X16;
+    AArch64PACKey::ID Key =
+        AArch64FI->shouldSignWithBKey() ? AArch64PACKey::IB : AArch64PACKey::IA;
+    emitPtrauthCheckAuthenticatedValue(
+        AArch64::LR, ScratchReg, Key, LRCheckMethod,
+        /*ShouldTrap=*/true, /*OnFailure=*/nullptr);
+  };
+
   AArch64TargetStreamer *TS =
     static_cast<AArch64TargetStreamer *>(OutStreamer->getTargetStreamer());
   // Do any manual lowerings.
   switch (MI->getOpcode()) {
   default:
+    assert(!AArch64InstrInfo::isTailCallReturnInst(*MI) &&
+           "Unhandled tail call instruction");
     break;
   case AArch64::HINT: {
     // CurrentPatchableFunctionEntrySym can be CurrentFnBegin only for
@@ -2404,6 +2479,8 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
                               ? AArch64::X17
                               : AArch64::X16;
 
+    CheckLRInTailCall(MI->getOperand(0).getReg());
+
     unsigned DiscReg = AddrDisc;
     if (Disc) {
       if (AddrDisc != AArch64::NoRegister) {
@@ -2434,6 +2511,8 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   case AArch64::TCRETURNrix17:
   case AArch64::TCRETURNrinotx16:
   case AArch64::TCRETURNriALL: {
+    CheckLRInTailCall(MI->getOperand(0).getReg());
+
     MCInst TmpInst;
     TmpInst.setOpcode(AArch64::BR);
     TmpInst.addOperand(MCOperand::createReg(MI->getOperand(0).getReg()));
@@ -2441,6 +2520,8 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
     return;
   }
   case AArch64::TCRETURNdi: {
+    CheckLRInTailCall(AArch64::NoRegister);
+
     MCOperand Dest;
     MCInstLowering.lowerOperand(MI->getOperand(0), Dest);
     MCInst TmpInst;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 32bc0e7d0d6475..d54582df819604 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -107,6 +107,19 @@ unsigned AArch64InstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
   unsigned NumBytes = 0;
   const MCInstrDesc &Desc = MI.getDesc();
 
+  if (!MI.isBundle() && isTailCallReturnInst(MI)) {
+    NumBytes = Desc.getSize() ? Desc.getSize() : 4;
+
+    const auto *MFI = MF->getInfo<AArch64FunctionInfo>();
+    if (!MFI->shouldSignReturnAddress(MF))
+      return NumBytes;
+
+    auto &STI = MF->getSubtarget<AArch64Subtarget>();
+    auto Method = STI.getAuthenticatedLRCheckMethod(*MF);
+    NumBytes += AArch64PAuth::getCheckerSizeInBytes(Method);
+    return NumBytes;
+  }
+
   // Size should be preferably set in
   // llvm/lib/Target/AArch64/AArch64InstrInfo.td (default case).
   // Specific cases handle instructions of variable sizes
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 4374d92a5b7b16..57f0d76136e63e 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -1903,6 +1903,8 @@ let Predicates = [HasPAuth] in {
   }
 
   // Size 16: 4 fixed + 8 variable, to compute discriminator.
+  // The size returned by getInstSizeInBytes() is incremented according
+  // to the variant of LR check.
   let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1, Size = 16,
       Uses = [SP] in {
     def AUTH_TCRETURN
diff --git a/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp b/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp
index 92ab4b5c3d251f..e966234296df57 100644
--- a/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp
+++ b/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp
@@ -12,7 +12,6 @@
 #include "AArch64InstrInfo.h"
 #include "AArch64MachineFunctionInfo.h"
 #include "AArch64Subtarget.h"
-#include "Utils/AArch64BaseInfo.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineModuleInfo.h"
@@ -35,15 +34,8 @@ class AArch64PointerAuth : public MachineFunctionPass {
   StringRef getPassName() const override { return AARCH64_POINTER_AUTH_NAME; }
 
 private:
-  /// An immediate operand passed to BRK instruction, if it is ever emitted.
-  static unsigned BrkOperandForKey(AArch64PACKey::ID KeyId) {
-    const unsigned BrkOperandBase = 0xc470;
-    return BrkOperandBase + KeyId;
-  }
-
   const AArch64Subtarget *Subtarget = nullptr;
   const AArch64InstrInfo *TII = nullptr;
-  const AArch64RegisterInfo *TRI = nullptr;
 
   void signLR(MachineFunction &MF, MachineBasicBlock::iterator MBBI) const;
 
@@ -230,97 +222,6 @@ void AArch64PointerAuth::authenticateLR(
   }
 }
 
-namespace {
-
-// Mark dummy LDR instruction as volatile to prevent removing it as dead code.
-MachineMemOperand *createCheckMemOperand(MachineFunction &MF,
-                                         const AArch64Subtarget &Subtarget) {
-  MachinePointerInfo PointerInfo(Subtarget.getAddressCheckPSV());
-  auto MOVolatileLoad =
-      MachineMemOperand::MOLoad | MachineMemOperand::MOVolatile;
-
-  return MF.getMachineMemOperand(PointerInfo, MOVolatileLoad, 4, Align(4));
-}
-
-} // namespace
-
-void llvm::AArch64PAuth::checkAuthenticatedRegister(
-    MachineBasicBlock::iterator MBBI, AuthCheckMethod Method,
-    Register AuthenticatedReg, Register TmpReg, bool UseIKey, unsigned BrkImm) {
-
-  MachineBasicBlock &MBB = *MBBI->getParent();
-  MachineFunction &MF = *MBB.getParent();
-  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
-  const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
-  DebugLoc DL = MBBI->getDebugLoc();
-
-  // All terminator instructions should be grouped at the end of the machine
-  // basic block, with no non-terminator instructions between them. Depending on
-  // the method requested, we will insert some regular instructions, maybe
-  // followed by a conditional branch instruction, which is a terminator, before
-  // MBBI. Thus, MBBI is expected to be the first terminator of its MBB.
-  assert(MBBI->isTerminator() && MBBI == MBB.getFirstTerminator() &&
-         "MBBI should be the first terminator in MBB");
-
-  // First, handle the methods not requiring creating extra MBBs.
-  switch (Method) {
-  default:
-    break;
-  case AuthCheckMethod::None:
-    return;
-  case AuthCheckMethod::DummyLoad:
-    BuildMI(MBB, MBBI, DL, TII->get(AArch64::LDRWui), getWRegFromXReg(TmpReg))
-        .addReg(AuthenticatedReg)
-        .addImm(0)
-        .addMemOperand(createCheckMemOperand(MF, Subtarget));
-    return;
-  }
-
-  // Control flow has to be changed, so arrange new MBBs.
-
-  // The block that explicitly generates a break-point exception on failure.
-  MachineBasicBlock *BreakBlock =
-      MF.CreateMachineBasicBlock(MBB.getBasicBlock());
-  MF.push_back(BreakBlock);
-  MBB.addSuccessor(BreakBlock);
-
-  BuildMI(BreakBlock, DL, TII->get(AArch64::BRK)).addImm(BrkImm);
-
-  switch (Method) {
-  case AuthCheckMethod::None:
-  case AuthCheckMethod::DummyLoad:
-    llvm_unreachable("Should be handled above");
-  case AuthCheckMethod::HighBitsNoTBI:
-    BuildMI(MBB, MBBI, DL, TII->get(AArch64::EORXrs), TmpReg)
-        .addReg(AuthenticatedReg)
-        .addReg(AuthenticatedReg)
-        .addImm(1);
-    BuildMI(MBB, MBBI, DL, TII->get(AArch64::TBNZX))
-        .addReg(TmpReg)
-        .addImm(62)
-        .addMBB(BreakBlock);
-    return;
-  case AuthCheckMethod::XPACHint:
-    assert(AuthenticatedReg == AArch64::LR &&
-           "XPACHint mode is only compatible with checking the LR register");
-    assert(UseIKey && "XPACHint mode is only compatible with I-keys");
-    BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), TmpReg)
-        .addReg(AArch64::XZR)
-        .addReg(AArch64::LR)
-        .addImm(0);
-    BuildMI(MBB, MBBI, DL, TII->get(AArch64::XPACLRI));
-    BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBSXrs), AArch64::XZR)
-        .addReg(TmpReg)
-        .addReg(AArch64::LR)
-        .addImm(0);
-    BuildMI(MBB, MBBI, DL, TII->get(AArch64::Bcc))
-        .addImm(AArch64CC::NE)
-        .addMBB(BreakBlock);
-    return;
-  }
-  llvm_unreachable("Unknown AuthCheckMethod enum");
-}
-
 unsigned llvm::AArch64PAuth::getCheckerSizeInBytes(AuthCheckMethod Method) {
   switch (Method) {
   case AuthCheckMethod::None:
@@ -330,63 +231,12 @@ unsigned llvm::AArch64PAuth::getCheckerSizeInBytes(AuthCheckMethod Method) {
   case AuthCheckMethod::HighBitsNoTBI:
     return 12;
   case AuthCheckMethod::XPACHint:
+  case AuthCheckMethod::XPAC:
     return 20;
   }
   llvm_unreachable("Unknown AuthCheckMethod enum");
 }
 
-bool AArch64PointerAuth::checkAuthenticatedLR(
-    MachineBasicBlock::iterator TI) const {
-  const AArch64FunctionInfo *MFnI = TI->getMF()->getInfo<AArch64FunctionInfo>();
-  AArch64PACKey::ID KeyId =
-      MFnI->shouldSignWithBKey() ? AArch64PACKey::IB : AArch64PACKey::IA;
-
-  AuthCheckMethod Method =
-      Subtarget->getAuthenticatedLRCheckMethod(*TI->getMF());
-
-  if (Method == AuthCheckMethod::None)
-    return false;
-
-  // FIXME If FEAT_FPAC is implemented by the CPU, this check can be skipped.
-
-  assert(!TI->getMF()->hasWinCFI() && "WinCFI is not yet supported");
-
-  // The following code may create a signing oracle:
-  //
-  //   <authenticate LR>
-  //   TCRETURN          ; the callee may sign and spill the LR in its prologue
-  //
-  // To avoid generating a signing oracle, check the authenticated value
-  // before possibly re-signing it in the callee, as follows:
-  //
-  //   <authenticate LR>
-  //   <check if LR contains a valid address>
-  //   b.<cond> break_block
-  // ret_block:
-  //   TCRETURN
-  // break_block:
-  //   brk <BrkOperand>
-  //
-  // or just
-  //
-  //   <authenticate LR>
-  //   ldr tmp, [lr]
-  //   TCRETURN
-
-  // TmpReg is chosen assuming X16 and X17 are dead after TI.
-  assert(AArch64InstrInfo::isTailCallReturnInst(*TI) &&
-         "Tail call is expected");
-  Register TmpReg =
-      TI->readsRegister(AArch64::X16, TRI) ? AArch64::X17 : AArch64::X16;
-  assert(!TI->readsRegister(TmpReg, TRI) &&
-         "More than a single register is used by TCRETURN");
-
-  checkAuthenticatedRegister(TI, Method, AArch64::LR, TmpReg, /*UseIKey=*/true,
-                             BrkOperandForKey(KeyId));
-
-  return true;
-}
-
 void AArch64PointerAuth::emitBlend(MachineBasicBlock::iterator MBBI,
                                    Register Result, Register AddrDisc,
                                    unsigned IntDisc) const {
@@ -414,38 +264,21 @@ void AArch64PointerAuth::expandPAuthBlend(
 }
 
 bool AArch64PointerAuth::runOnMachineFunction(MachineFunction &MF) {
-  const auto *MFnI = MF.getInfo<AArch64FunctionInfo>();
-
   Subtarget = &MF.getSubtarget<AArch64Subtarget>();
   TII = Subtarget->getInstrInfo();
-  TRI = Subtarget->getRegisterInfo();
 
   SmallVector<MachineBasicBlock::instr_iterator> PAuthPseudoInstrs;
-  SmallVector<MachineBasicBlock::instr_iterator> TailCallInstrs;
 
   bool Modified = false;
-  bool HasAuthenticationInstrs = false;
 
   for (auto &MBB : MF) {
-    // Using instr_iterator to catch unsupported bundled TCRETURN* instructions
-    // instead of just skipping them.
-    for (auto &MI : MBB.instrs()) {
+    for (auto &MI : MBB) {
       switch (MI.getOpcode()) {
       default:
-        // Bundled TCRETURN* instructions (such as created by KCFI)
-        // are not supported yet, but no support is required if no
-        // PAUTH_EPILOGUE instructions exist in the same function.
-        // Skip the BUNDLE instruction itself (actual bundled instructions
-        // follow it in the instruction list).
-        if (MI.isBundle())
-          continue;
-        if (AArch64InstrInfo::isTailCallReturnInst(MI))
-          TailCallInstrs.push_back(MI.getIterator());
         break;
       case AArch64::PAUTH_PROLOGUE:
       case AArch64::PAUTH_EPILOGUE:
       case AArch64::PAUTH_BLEND:
-        assert(!MI.isBundled());
         PAuthPseudoInstrs.push_back(MI.getIterator());
         break;
       }
@@ -459,7 +292,6 @@ bool AArch64PointerAuth::runOnMachineFunction(MachineFunction &MF) {
       break;
     case AArch64::PAUTH_EPILOGUE:
       authenticateLR(MF, It);
-      HasAuthenticationInstrs = true;
       break;
     case AArch64::PAUTH_BLEND:
       expandPAuthBlend(It);
@@ -471,15 +303,5 @@ bool AArch64PointerAuth::runOnMachineFunction(MachineFunction &MF) {
     Modified = true;
   }
 
-  // FIXME Do we need to emit any PAuth-related epilogue code at all
-  //       when SCS is enabled?
-  if (HasAuthenticationInstrs &&
-      !MFnI->needsShadowCallStackPrologueEpilogue(MF)) {
-    for (auto TailCall : TailCallInstrs) {
-      assert(...
[truncated]

@atrosinenko atrosinenko force-pushed the users/atrosinenko/move-lr-checks-to-asmprinter branch from 60ca463 to 089cc13 on October 2, 2024 09:32
@atrosinenko (Contributor, Author) commented:

Updated the commit message: removed the paragraph about dropping one mov instruction from the non-trapping variant of the check. My initial idea was to make the non-hint xpac(i|d) operate on the tested register itself (just like xpaclri does), but that change was dropped from the final version of the patch to avoid unnecessary changes to the tests.
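
For context, a minimal sketch of that dropped idea (illustrative registers, non-trapping variant): applying xpac(i|d) to the tested register itself means that on failure the tested register already holds the stripped value, so the extra mov on the failure path becomes unnecessary, while on success stripping an already-canonical pointer is a no-op:

	mov	x17, x16        // keep the post-authentication value for the comparison
	xpacd	x16             // strip the PAC in place, on the tested register itself
	cmp	x17, x16        // equal iff authentication succeeded
	b.eq	.Lauth_success_0
	// on failure x16 already holds the stripped value, so no extra mov is needed
.Lauth_success_0: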

@kovdan01 kovdan01 self-requested a review October 2, 2024 17:47
@kovdan01 (Contributor) commented on Oct 4, 2024

@atrosinenko With this patch applied, the following tests in the test-suite compiled in Release mode fail with a segmentation fault:

  1. SingleSource/Benchmarks/Misc-C++-EH/spirit.test
  2. MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test
  3. MultiSource/Benchmarks/Prolangs-C++/shapes/shapes.test
  4. MultiSource/Benchmarks/Prolangs-C++/ocean/ocean.test
  5. MultiSource/Benchmarks/Bullet/bullet.test
  6. MultiSource/Applications/lambda-0.1.3/lambda.test
  7. MultiSource/Applications/kimwitu++/kc.test

I'm now preparing a minimal reproducer for that. Stay tuned.

@kovdan01 (Contributor) commented on Oct 7, 2024

@atrosinenko Here is the reproducer IR code.

define void @_ZN7myshape4moveEv(ptr %this) #0 {
entry:
  %0 = load ptr, ptr %this, align 8
  %vtable = load ptr, ptr %0, align 8
  %1 = ptrtoint ptr %0 to i64
  %2 = tail call i64 @llvm.ptrauth.blend(i64 %1, i64 50297)
  %3 = ptrtoint ptr %vtable to i64
  %4 = tail call i64 @llvm.ptrauth.auth(i64 %3, i32 2, i64 %2)
  %5 = inttoptr i64 %4 to ptr
  %6 = load ptr, ptr %5, align 8
  %7 = tail call i64 @llvm.ptrauth.blend(i64 %4, i64 36564)
  tail call void %6(ptr noundef nonnull align 8 dereferenceable(8) %0) [ "ptrauth"(i32 0, i64 %7) ]
  %r = getelementptr inbounds i8, ptr %this, i64 8
  %8 = load ptr, ptr %r, align 8
  %vtable2 = load ptr, ptr %8, align 8
  %9 = ptrtoint ptr %8 to i64
  %10 = tail call i64 @llvm.ptrauth.blend(i64 %9, i64 50297)
  %11 = ptrtoint ptr %vtable2 to i64
  %12 = tail call i64 @llvm.ptrauth.auth(i64 %11, i32 2, i64 %10)
  %13 = inttoptr i64 %12 to ptr
  %14 = load ptr, ptr %13, align 8
  %15 = tail call i64 @llvm.ptrauth.blend(i64 %12, i64 36564)
  tail call void %14(ptr noundef nonnull align 8 dereferenceable(8) %8) [ "ptrauth"(i32 0, i64 %15) ]
  ret void
}

declare i64 @llvm.ptrauth.blend(i64, i64)

declare i64 @llvm.ptrauth.auth(i64, i32 immarg, i64)

attributes #0 = { "ptrauth-auth-traps" "ptrauth-calls" "ptrauth-returns" }

When compiled with llc -O3 -mtriple aarch64-linux-gnu -mattr=+pauth, the following assembly is generated. The X16 register is clobbered by the check inserted under else if (Method == AuthCheckMethod::HighBitsNoTBI). Its previous value (the pointer used for address discrimination) is neither spilled before nor restored after that check. As a result, the last braa instruction uses an incorrect discriminator value in X16, which leads to an authentication failure and an attempt to pass control to an address with a non-zero PAC field, causing a segfault. Probably, a different scratch register (e.g. X17) should be used for the check sequence if X16 is already in use for address discrimination.

_ZN7myshape4moveEv:                     // @_ZN7myshape4moveEv
	.cfi_startproc
// %bb.0:                               // %entry
	.cfi_b_key_frame
	pacibsp
	stp	x30, x19, [sp, #-16]!           // 16-byte Folded Spill
	.cfi_negate_ra_state
	.cfi_def_cfa_offset 16
	.cfi_offset w19, -8
	.cfi_offset w30, -16
	mov	x19, x0
	ldr	x0, [x0]
	ldr	x16, [x0]
	mov	x17, x0
	movk	x17, #50297, lsl #48
	autda	x16, x17
	mov	x17, x16
	xpacd	x17
	cmp	x16, x17
	b.eq	.Lauth_success_0
	brk	#0xc472
.Lauth_success_0:
	ldr	x9, [x16]
	mov	x8, x16
	mov	x17, x8
	movk	x17, #36564, lsl #48
	blraa	x9, x17
	ldr	x0, [x19, #8]
	ldr	x16, [x0]
	mov	x17, x0
	movk	x17, #50297, lsl #48
	autda	x16, x17
	mov	x17, x16
	xpacd	x17
	cmp	x16, x17
	b.eq	.Lauth_success_1
	brk	#0xc472
.Lauth_success_1:
	ldr	x1, [x16]
	ldp	x30, x19, [sp], #16             // 16-byte Folded Reload
	autibsp
	eor	x16, x30, x30, lsl #1
	tbz	x16, #62, .Lauth_success_2
	brk	#0xc471
.Lauth_success_2:
	movk	x16, #36564, lsl #48
	braa	x1, x16

The IR corresponds to the following C++ code.

struct shape {
  virtual void move() = 0;
};

struct line : public shape {
  void move () override;
};

struct myshape {
  line *l;
  line *r;
  void move();
};

void myshape::move() {
  l->move();
  r->move();
}

@kovdan01 (Contributor) left a review.

@atrosinenko (Contributor, Author) commented:

@kovdan01 Thank you for the reproducer!

I restored the original check TI->readsRegister(AArch64::X16, TRI) ? AArch64::X17 : AArch64::X16 when computing the scratch register, as checking just the specific operand did not handle the AUTH_TCRETURN* pseudos which actually have two register operands. Additionally, the register class of the second register operand of these pseudo instructions was restricted, as it seems to be just a coincidence that at least one scratch register was available for authenticated tail calls.
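
To illustrate the scratch register choice (a hypothetical epilogue, not taken from the test suite): if the tail-call pseudo reads x16, the LR check has to use x17 as its scratch register so that the value in x16 survives until the final branch:

	autibsp
	eor	x17, x30, x30, lsl #1   // HighBitsNoTBI check of LR, using x17 as scratch
	tbz	x17, #62, .Lauth_success_0
	brk	#0xc471
.Lauth_success_0:
	braa	x1, x16                 // x16 (read by the tail call) is still intact here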

@atrosinenko atrosinenko requested a review from kovdan01 October 7, 2024 15:21
@atrosinenko (Contributor, Author) commented on Oct 7, 2024

For the reproducer, the generated code became a bit longer, but it is now correct:

_ZN7myshape4moveEv:                     // @_ZN7myshape4moveEv
        .cfi_startproc
// %bb.0:                               // %entry
        .cfi_b_key_frame
        pacibsp
        stp     x30, x19, [sp, #-16]!           // 16-byte Folded Spill
        .cfi_negate_ra_state
        .cfi_def_cfa_offset 16
        .cfi_offset w19, -8
        .cfi_offset w30, -16
        mov     x19, x0
        ldr     x0, [x0]
        ldr     x16, [x0]
        mov     x17, x0
        movk    x17, #50297, lsl #48
        autda   x16, x17
        mov     x17, x16
        xpacd   x17
        cmp     x16, x17
        b.eq    .Lauth_success_0
        brk     #0xc472
.Lauth_success_0:
        ldr     x9, [x16]
        mov     x8, x16
        mov     x17, x8
        movk    x17, #36564, lsl #48
        blraa   x9, x17
        ldr     x0, [x19, #8]
        ldr     x16, [x0]
        mov     x17, x0
        movk    x17, #50297, lsl #48
        autda   x16, x17
        mov     x17, x16
        xpacd   x17
        cmp     x16, x17
        b.eq    .Lauth_success_1
        brk     #0xc472
.Lauth_success_1:
        ldr     x2, [x16]
        mov     x1, x16
        ldp     x30, x19, [sp], #16             // 16-byte Folded Reload
        autibsp
        eor     x16, x30, x30, lsl #1
        tbz     x16, #62, .Lauth_success_2
        brk     #0xc471
.Lauth_success_2:
        mov     x16, x1
        movk    x16, #36564, lsl #48
        braa    x2, x16

@kovdan01 (Contributor) left a review:

With the latest update, the test-suite passes, thanks!

@kovdan01 kovdan01 self-requested a review October 8, 2024 20:25
@kovdan01 kovdan01 dismissed their stale review October 8, 2024 20:26

Issue fixed in edaae6a

@kovdan01 (Contributor) left a review:

@atrosinenko Could you please add a test for the case you've fixed in your latest commit edaae6a? I might be missing something, but the fix looks untested.

Review thread on llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (outdated, resolved)
@atrosinenko (Contributor, Author) commented:

@kovdan01 Regarding the incorrect choice of the scratch register, I doubt I can implement a reliable test: with my particular fix, it would require crafting an example that is likely to be emitted as an AUTH_TCRETURN* with the AddrDisc operand being x16. I cannot simply write an MIR input with x16 as AddrDisc, as that is explicitly prohibited by edaae6a. I could write an input with some registers already allocated and some not (such as a copy from x16 to a virtual register used as AddrDisc) and start the pipeline before the register allocator pass, but this looks like overengineering. Or I could use your example as an LLVM IR input, but then the test could pass for a lot of other reasons. So let's consider this tested by compiling a lot of sources with an assertion-enabled build of LLVM :)

On the other hand, restricting the second operand is questionable. Maybe it would be better to place the destination operand in an "unsafe" register, as it is kind of "inherently untrusted" before authentication, but this is not suitable for AUTH_TCRETURN_BTI. Having two similar instructions with drastically different restrictions on their operands seems quite strange, so I just assume that usage of the "safe" x16 and x17 registers outside of PAuth pseudos is not guaranteed anyway.

@atrosinenko (Contributor, Author) commented:

For uniformity, added AuthCheckMethod::XPAC to the list of all possible check methods and a corresponding RUN line to llvm/test/CodeGen/AArch64/sign-return-address-tailcall.ll (this is not a test for edaae6a, though).

@asl (Collaborator) commented on Oct 14, 2024

@kbeyls @ahmedbougacha Will you please take a look?

@kovdan01 (Contributor) left a review:

LGTM, thanks!

Review thread on llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp (resolved)
@atrosinenko atrosinenko force-pushed the users/atrosinenko/factor-out-auth-checks branch from f788c67 to bba4166 on October 25, 2024 12:28
@atrosinenko atrosinenko force-pushed the users/atrosinenko/move-lr-checks-to-asmprinter branch from 5d35fc5 to 8c2472c on October 25, 2024 12:28
Base automatically changed from users/atrosinenko/factor-out-auth-checks to main October 25, 2024 14:03
@atrosinenko atrosinenko force-pushed the users/atrosinenko/move-lr-checks-to-asmprinter branch from 8c2472c to 98e1252 on October 25, 2024 14:06
@atrosinenko atrosinenko force-pushed the users/atrosinenko/move-lr-checks-to-asmprinter branch from 98e1252 to 1e92b53 on November 5, 2024 18:17
@atrosinenko (Contributor, Author) commented:

I have just rebased the PR onto the current main branch (the first three commits are exactly the same, except for conflict resolution with 5192cb7) and added a few improvements, sorry for the delay.

According to code size metrics collected by building llvm-test-suite at different optimization levels, restricting the register class of the $AddrDisc operand actually regressed the code size. Restricting the call target, on the other hand, turned out to have a negligible impact on .text section size (and only in a couple of tests). Thus, I updated the definition of AUTH_TCRETURN to avoid regressing the code size.

Presently, the PR still restricts the register class of $AddrDisc of AUTH_TCRETURN_BTI, which could still regress the code size when BTI is enabled. The original implementation in the AArch64PointerAuth pass turned out to be equally affected by this bug, though, so restricting $AddrDisc at least makes it correct. I added a test that catches the bug both for AUTH_TCRETURN and AUTH_TCRETURN_BTI.

@kovdan01 Are you OK with the changes?

@kovdan01 (Contributor) left a review:

I left several comments with questions and suggestions. Apart from them, the PR LGTM.

Review threads (resolved): llvm/test/CodeGen/AArch64/ptrauth-tail-call-regalloc.ll (four outdated threads), llvm/test/CodeGen/AArch64/ptrauth-call.ll
@kovdan01 (Contributor) left a review:

LGTM

@atrosinenko atrosinenko force-pushed the users/atrosinenko/move-lr-checks-to-asmprinter branch from ec05d10 to a12ced5 on November 12, 2024 12:55
@atrosinenko atrosinenko merged commit 44076c9 into main Nov 12, 2024
8 checks passed
@atrosinenko atrosinenko deleted the users/atrosinenko/move-lr-checks-to-asmprinter branch November 12, 2024 15:27