[AArch64] Add streaming-mode stack hazard optimization remarks #101695
Conversation
Emit an optimization remark when objects in the stack frame may cause hazards in a streaming mode function. The analysis requires either the `aarch64-stack-hazard-size` or `aarch64-stack-hazard-remark-size` flag to be set by the user, with the former flag taking precedence.
@llvm/pr-subscribers-backend-aarch64

Author: Hari Limaye (hazzlim)

Changes: Emit an optimization remark when objects in the stack frame may cause hazards in a streaming mode function. The analysis requires either the `aarch64-stack-hazard-size` or `aarch64-stack-hazard-remark-size` flag to be set by the user, with the former flag taking precedence.

Patch is 20.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101695.diff — 4 Files Affected:
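At its core, the analysis sorts stack objects by offset and flags adjacent pairs where a GP-accessed and an FP-accessed object sit closer together than the hazard size. A simplified, self-contained sketch of that rule (the `Obj` struct and `findHazardPairs` helper are illustrative stand-ins for the patch's `StackAccess` machinery, not the actual implementation):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative stand-in for the pass's StackAccess record.
struct Obj {
  int64_t Start; // Offset of the object from SP.
  int64_t Size;  // Object size in bytes.
  bool IsFP;     // Accessed by FP/SVE (SME-side) instructions.
  int64_t end() const { return Start + Size; }
};

// Count adjacent CPU/SME object pairs closer than HazardSize, mirroring
// the pairwise distance check in the patch's emitRemarks.
static unsigned findHazardPairs(std::vector<Obj> Objs, uint64_t HazardSize) {
  std::sort(Objs.begin(), Objs.end(),
            [](const Obj &A, const Obj &B) { return A.Start < B.Start; });
  unsigned Hazards = 0;
  for (size_t I = 0; I + 1 < Objs.size(); ++I) {
    const Obj &First = Objs[I], &Second = Objs[I + 1];
    // Overlapping objects wrap to a huge unsigned distance, as in the patch.
    if (First.IsFP != Second.IsFP &&
        static_cast<uint64_t>(Second.Start - First.end()) < HazardSize)
      ++Hazards;
  }
  return Hazards;
}
```

The real pass additionally distinguishes PPR accesses and mixed-access objects; see `emitRemarks` in the diff below.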
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index bd530903bb664..5b134f5e35324 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -240,6 +240,7 @@
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/FormatVariadic.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"
@@ -275,6 +276,10 @@ cl::opt<bool> EnableHomogeneousPrologEpilog(
// Stack hazard padding size. 0 = disabled.
static cl::opt<unsigned> StackHazardSize("aarch64-stack-hazard-size",
cl::init(0), cl::Hidden);
+// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
+static cl::opt<unsigned>
+ StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
+ cl::Hidden);
// Whether to insert padding into non-streaming functions (for testing).
static cl::opt<bool>
StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
@@ -2615,9 +2620,16 @@ AArch64FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
const auto &MFI = MF.getFrameInfo();
int64_t ObjectOffset = MFI.getObjectOffset(FI);
+ StackOffset SVEStackSize = getSVEStackSize(MF);
+
+ // For VLA-area objects, just emit an offset at the end of the stack frame.
+ // Whilst not quite correct, these objects do live at the end of the frame and
+ // so it is more useful for analysis for the offset to reflect this.
+ if (MFI.isVariableSizedObjectIndex(FI)) {
+ return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
+ }
// This is correct in the absence of any SVE stack objects.
- StackOffset SVEStackSize = getSVEStackSize(MF);
if (!SVEStackSize)
return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
@@ -3528,13 +3540,9 @@ bool AArch64FrameLowering::restoreCalleeSavedRegisters(
return true;
}
-// Return the FrameID for a Load/Store instruction by looking at the MMO.
-static std::optional<int> getLdStFrameID(const MachineInstr &MI,
- const MachineFrameInfo &MFI) {
- if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
- return std::nullopt;
-
- MachineMemOperand *MMO = *MI.memoperands_begin();
+// Return the FrameID for a MMO.
+static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
+ const MachineFrameInfo &MFI) {
auto *PSV =
dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
if (PSV)
@@ -3552,6 +3560,15 @@ static std::optional<int> getLdStFrameID(const MachineInstr &MI,
return std::nullopt;
}
+// Return the FrameID for a Load/Store instruction by looking at the first MMO.
+static std::optional<int> getLdStFrameID(const MachineInstr &MI,
+ const MachineFrameInfo &MFI) {
+ if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
+ return std::nullopt;
+
+ return getMMOFrameID(*MI.memoperands_begin(), MFI);
+}
+
// Check if a Hazard slot is needed for the current function, and if so create
// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
// which can be used to determine if any hazard padding is needed.
@@ -4626,6 +4643,10 @@ void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
if (StackTaggingMergeSetTag)
II = tryMergeAdjacentSTG(II, this, RS);
}
+
+ // Run remarks pass.
+ MachineOptimizationRemarkEmitter ORE(MF, nullptr);
+ emitRemarks(MF, ORE);
}
/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
@@ -5029,3 +5050,171 @@ void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
MI->eraseFromParent();
}
}
+
+struct StackAccess {
+ enum AccessType {
+ NotAccessed = 0, // Stack object not accessed by load/store instructions.
+ GPR = 1 << 0, // A general purpose register.
+ PPR = 1 << 1, // A predicate register.
+ FPR = 1 << 2, // A floating point/Neon/SVE register.
+ };
+
+ int Idx;
+ StackOffset Offset;
+ int64_t Size;
+ unsigned AccessTypes;
+
+ StackAccess() : Idx(0), Offset(), Size(0), AccessTypes(NotAccessed) {}
+
+ bool operator<(const StackAccess &Rhs) const {
+ return std::make_tuple(start(), Idx) <
+ std::make_tuple(Rhs.start(), Rhs.Idx);
+ }
+
+ bool isCPU() const {
+ // Predicate register load and store instructions execute on the CPU.
+ return AccessTypes & (AccessType::GPR | AccessType::PPR);
+ }
+ bool isSME() const { return AccessTypes & AccessType::FPR; }
+ bool isMixed() const { return ((AccessTypes & (AccessTypes - 1)) != 0); }
+
+ int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
+ int64_t end() const { return start() + Size; }
+
+ std::string getTypeString() const {
+ switch (AccessTypes) {
+ case AccessType::FPR:
+ return "FPR";
+ case AccessType::PPR:
+ return "PPR";
+ case AccessType::GPR:
+ return "GPR";
+ case AccessType::NotAccessed:
+ return "NA";
+ default:
+ return "Mixed";
+ }
+ }
+
+ void print(raw_ostream &OS) const {
+ OS << getTypeString() << " stack object at [SP"
+ << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
+ if (Offset.getScalable())
+ OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
+ << " * vscale";
+ OS << "]";
+ }
+};
+
+static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
+ SA.print(OS);
+ return OS;
+}
+
+void AArch64FrameLowering::emitRemarks(
+ const MachineFunction &MF, MachineOptimizationRemarkEmitter &ORE) const {
+
+ SMEAttrs Attrs(MF.getFunction());
+ if (Attrs.hasNonStreamingInterfaceAndBody())
+ return;
+
+ const uint64_t HazardSize =
+ (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
+
+ if (HazardSize == 0)
+ return;
+
+ const MachineFrameInfo &MFI = MF.getFrameInfo();
+
+ std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
+
+ size_t NumFPLdSt = 0;
+ size_t NumNonFPLdSt = 0;
+
+ // Collect stack accesses via Load/Store instructions.
+ for (const MachineBasicBlock &MBB : MF) {
+ for (const MachineInstr &MI : MBB) {
+ if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
+ continue;
+ for (MachineMemOperand *MMO : MI.memoperands()) {
+ std::optional<int> FI = getMMOFrameID(MMO, MFI);
+ if (FI && !MFI.isDeadObjectIndex(*FI)) {
+ int FrameIdx = *FI;
+
+ size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
+ if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
+ StackAccesses[ArrIdx].Idx = FrameIdx;
+ StackAccesses[ArrIdx].Offset =
+ getFrameIndexReferenceFromSP(MF, FrameIdx);
+ StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
+ }
+
+ unsigned RegTy = StackAccess::AccessType::GPR;
+ if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector) {
+ if (AArch64::PPRRegClass.contains(MI.getOperand(0).getReg()))
+ RegTy = StackAccess::PPR;
+ else
+ RegTy = StackAccess::FPR;
+ } else if (AArch64InstrInfo::isFpOrNEON(MI)) {
+ RegTy = StackAccess::FPR;
+ }
+
+ StackAccesses[ArrIdx].AccessTypes |= RegTy;
+
+ if (RegTy == StackAccess::FPR)
+ ++NumFPLdSt;
+ else
+ ++NumNonFPLdSt;
+ }
+ }
+ }
+ }
+
+ if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
+ return;
+
+ llvm::sort(StackAccesses);
+ StackAccesses.erase(llvm::remove_if(StackAccesses,
+ [](const StackAccess &S) {
+ return S.AccessTypes ==
+ StackAccess::NotAccessed;
+ }),
+ StackAccesses.end());
+
+ SmallVector<const StackAccess *> MixedObjects;
+ SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
+
+ if (StackAccesses.front().isMixed())
+ MixedObjects.push_back(&StackAccesses.front());
+
+ for (auto It = StackAccesses.begin(), End = StackAccesses.end();
+ It != (End - 1); ++It) {
+ const auto &First = *It;
+ const auto &Second = *(It + 1);
+
+ if (Second.isMixed())
+ MixedObjects.push_back(&Second);
+
+ if ((First.isSME() && Second.isCPU()) ||
+ (First.isCPU() && Second.isSME())) {
+ uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
+ if (Distance < HazardSize)
+ HazardPairs.emplace_back(&First, &Second);
+ }
+ }
+
+ auto EmitRemark = [&](llvm::StringRef Str) {
+ ORE.emit([&]() {
+ auto R = MachineOptimizationRemarkAnalysis(
+ "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
+ return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
+ });
+ };
+
+ for (const auto &P : HazardPairs)
+ EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
+
+ for (const auto *Obj : MixedObjects)
+ EmitRemark(
+ formatv("{0} accessed by both GP and FP instructions", *Obj).str());
+}
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.h b/llvm/lib/Target/AArch64/AArch64FrameLowering.h
index 0ebab1700e9ce..396caa6b04868 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.h
@@ -13,8 +13,9 @@
#ifndef LLVM_LIB_TARGET_AARCH64_AARCH64FRAMELOWERING_H
#define LLVM_LIB_TARGET_AARCH64_AARCH64FRAMELOWERING_H
-#include "llvm/Support/TypeSize.h"
+#include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
#include "llvm/CodeGen/TargetFrameLowering.h"
+#include "llvm/Support/TypeSize.h"
namespace llvm {
@@ -178,6 +179,9 @@ class AArch64FrameLowering : public TargetFrameLowering {
inlineStackProbeLoopExactMultiple(MachineBasicBlock::iterator MBBI,
int64_t NegProbeSize,
Register TargetReg) const;
+
+ void emitRemarks(const MachineFunction &MF,
+ MachineOptimizationRemarkEmitter &ORE) const;
};
} // End llvm namespace
diff --git a/llvm/test/CodeGen/AArch64/ssve-stack-hazard-remarks.ll b/llvm/test/CodeGen/AArch64/ssve-stack-hazard-remarks.ll
new file mode 100644
index 0000000000000..94b915eb42cfd
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/ssve-stack-hazard-remarks.ll
@@ -0,0 +1,156 @@
+; RUN: llc < %s -mtriple=aarch64 -mattr=+sve2 -pass-remarks-analysis=sme -aarch64-stack-hazard-remark-size=64 -o /dev/null < %s 2>&1 | FileCheck %s --check-prefixes=CHECK
+; RUN: llc < %s -mtriple=aarch64 -mattr=+sve2 -pass-remarks-analysis=sme -aarch64-stack-hazard-size=1024 -o /dev/null < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-PADDING
+
+; Don't emit remarks for non-streaming functions.
+define float @csr_x20_stackargs_notsc(float %a, float %b, float %c, float %d, float %e, float %f, float %g, float %h, float %i) {
+; CHECK-NOT: remark: <unknown>:0:0: stack hazard in 'csr_x20_stackargs_notsc':
+; CHECK-PADDING-NOT: remark: <unknown>:0:0: stack hazard in 'csr_x20_stackargs_notsc':
+entry:
+ tail call void asm sideeffect "", "~{x20}"() #1
+ ret float %i
+}
+
+; Don't emit remarks for functions that only access GPR stack objects.
+define i64 @stackargs_gpr(i64 %a, i64 %b, i64 %c, i64 %d, i64 %e, i64 %f, i64 %g, i64 %h, i64 %i) #2 {
+; CHECK-NOT: remark: <unknown>:0:0: stack hazard in 'csr_x20_stackargs_gpr':
+; CHECK-PADDING-NOT: remark: <unknown>:0:0: stack hazard in 'csr_x20_stackargs_gpr':
+entry:
+ ret i64 %i
+}
+
+; Don't emit remarks for functions that only access FPR stack objects.
+define double @stackargs_fpr(double %a, double %b, double %c, double %d, double %e, double %f, double %g, double %h, double %i) #2 {
+; CHECK-NOT: remark: <unknown>:0:0: stack hazard in 'csr_x20_stackargs_fpr':
+; CHECK-PADDING-NOT: remark: <unknown>:0:0: stack hazard in 'csr_x20_stackargs_fpr':
+entry:
+ ret double %i
+}
+
+; As this case is handled by addition of stack hazard padding, only emit remarks when this is not switched on.
+define i32 @csr_d8_alloci64(i64 %d) #2 {
+; CHECK: remark: <unknown>:0:0: stack hazard in 'csr_d8_alloci64': FPR stack object at [SP-16] is too close to GPR stack object at [SP-8]
+; CHECK-PADDING-NOT: remark: <unknown>:0:0: stack hazard in 'csr_d8_alloci64':
+entry:
+ %a = alloca i64
+ tail call void asm sideeffect "", "~{d8}"() #1
+ store i64 %d, ptr %a
+ ret i32 0
+}
+
+; As this case is handled by addition of stack hazard padding, only emit remarks when this is not switched on.
+define i32 @csr_d8_allocnxv4i32(i64 %d) #2 {
+; CHECK: remark: <unknown>:0:0: stack hazard in 'csr_d8_allocnxv4i32': FPR stack object at [SP-16] is too close to GPR stack object at [SP-8]
+; CHECK-PADDING-NOT: remark: <unknown>:0:0: stack hazard in 'csr_d8_allocnxv4i32':
+entry:
+ %a = alloca <vscale x 4 x i32>
+ tail call void asm sideeffect "", "~{d8}"() #1
+ store <vscale x 4 x i32> zeroinitializer, ptr %a
+ ret i32 0
+}
+
+define float @csr_x20_stackargs(float %a, float %b, float %c, float %d, float %e, float %f, float %g, float %h, float %i) #2 {
+; CHECK: remark: <unknown>:0:0: stack hazard in 'csr_x20_stackargs': GPR stack object at [SP-16] is too close to FPR stack object at [SP+0]
+; CHECK-PADDING: remark: <unknown>:0:0: stack hazard in 'csr_x20_stackargs': GPR stack object at [SP-16] is too close to FPR stack object at [SP+0]
+entry:
+ tail call void asm sideeffect "", "~{x20}"() #1
+ ret float %i
+}
+
+; In this case, addition of stack hazard padding triggers an x29 (fp) spill, so a hazard occurs between the FPR argument and the GPR spill.
+define float @csr_d8_stackargs(float %a, float %b, float %c, float %d, float %e, float %f, float %g, float %h, float %i) #2 {
+; CHECK-NOT: remark: <unknown>:0:0: stack hazard in 'csr_d8_stackargs':
+; CHECK-PADDING: remark: <unknown>:0:0: stack hazard in 'csr_d8_stackargs': GPR stack object at [SP-8] is too close to FPR stack object at [SP+0]
+entry:
+ tail call void asm sideeffect "", "~{d8}"() #1
+ ret float %i
+}
+
+; SVE calling conventions
+; Predicate register spills end up in FP region, currently.
+
+define i32 @svecc_call(<4 x i16> %P0, ptr %P1, i32 %P2, <vscale x 16 x i8> %P3, i16 %P4) #2 {
+; CHECK: remark: <unknown>:0:0: stack hazard in 'svecc_call': PPR stack object at [SP-48-258 * vscale] is too close to FPR stack object at [SP-48-256 * vscale]
+; CHECK: remark: <unknown>:0:0: stack hazard in 'svecc_call': FPR stack object at [SP-48-16 * vscale] is too close to GPR stack object at [SP-48]
+; CHECK-PADDING: remark: <unknown>:0:0: stack hazard in 'svecc_call': PPR stack object at [SP-1072-258 * vscale] is too close to FPR stack object at [SP-1072-256 * vscale]
+; CHECK-PADDING-NOT: remark: <unknown>:0:0: stack hazard in 'svecc_call':
+entry:
+ tail call void asm sideeffect "", "~{x0},~{x28},~{x27},~{x3}"() #2
+ %call = call ptr @memset(ptr noundef nonnull %P1, i32 noundef 45, i32 noundef 37)
+ ret i32 -396142473
+}
+
+define i32 @svecc_alloca_call(<4 x i16> %P0, ptr %P1, i32 %P2, <vscale x 16 x i8> %P3, i16 %P4) #2 {
+; CHECK: remark: <unknown>:0:0: stack hazard in 'svecc_alloca_call': PPR stack object at [SP-48-258 * vscale] is too close to FPR stack object at [SP-48-256 * vscale]
+; CHECK: remark: <unknown>:0:0: stack hazard in 'svecc_alloca_call': FPR stack object at [SP-48-16 * vscale] is too close to GPR stack object at [SP-48]
+; CHECK-PADDING: remark: <unknown>:0:0: stack hazard in 'svecc_alloca_call': PPR stack object at [SP-1072-258 * vscale] is too close to FPR stack object at [SP-1072-256 * vscale]
+; CHECK-PADDING-NOT: remark: <unknown>:0:0: stack hazard in 'svecc_alloca_call':
+entry:
+ tail call void asm sideeffect "", "~{x0},~{x28},~{x27},~{x3}"() #2
+ %0 = alloca [37 x i8], align 16
+ %call = call ptr @memset(ptr noundef nonnull %0, i32 noundef 45, i32 noundef 37)
+ ret i32 -396142473
+}
+declare ptr @memset(ptr, i32, i32)
+
+%struct.test_struct = type { i32, float, i32 }
+
+define i32 @mixed_stack_object(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i, i64 %mixed_obj) #2 {
+; CHECK: remark: <unknown>:0:0: stack hazard in 'mixed_stack_object': Mixed stack object at [SP+8] accessed by both GP and FP instructions
+; CHECK-PADDING: remark: <unknown>:0:0: stack hazard in 'mixed_stack_object': Mixed stack object at [SP+8] accessed by both GP and FP instructions
+entry:
+ %t.sroa.0.0.extract.trunc = trunc i64 %mixed_obj to i32
+ %t.sroa.2.0.extract.shift = lshr i64 %mixed_obj, 32
+ %t.sroa.2.0.extract.trunc = trunc nuw i64 %t.sroa.2.0.extract.shift to i32
+ %0 = bitcast i32 %t.sroa.2.0.extract.trunc to float
+ %conv = sitofp i32 %t.sroa.0.0.extract.trunc to float
+ %add = fadd float %conv, %0
+ %conv2 = fptosi float %add to i32
+ ret i32 %conv2
+}
+
+define i32 @mixed_stack_objects(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i, i64 %mixed_obj_0, i64 %mixed_obj_1) #2 {
+; CHECK: remark: <unknown>:0:0: stack hazard in 'mixed_stack_objects': Mixed stack object at [SP+8] is too close to Mixed stack object at [SP+16]
+; CHECK: remark: <unknown>:0:0: stack hazard in 'mixed_stack_objects': Mixed stack object at [SP+8] accessed by both GP and FP instructions
+; CHECK: remark: <unknown>:0:0: stack hazard in 'mixed_stack_objects': Mixed stack object at [SP+16] accessed by both GP and FP instructions
+; CHECK-PADDING: remark: <unknown>:0:0: stack hazard in 'mixed_stack_objects': Mixed stack object at [SP+8] is too close to Mixed stack object at [SP+16]
+; CHECK-PADDING: remark: <unknown>:0:0: stack hazard in 'mixed_stack_objects': Mixed stack object at [SP+8] accessed by both GP and FP instructions
+; CHECK-PADDING: remark: <unknown>:0:0: stack hazard in 'mixed_stack_objects': Mixed stack object at [SP+16] accessed by both GP and FP instructions
+entry:
+ %t0.sroa.0.0.extract.trunc = trunc i64 %mixed_obj_0 to i32
+ %t0.sroa.2.0.extract.shift = lshr i64 %mixed_obj_0, 32
+ %t0.sroa.2.0.extract.trunc = trunc nuw i64 %t0.sroa.2.0.extract.shift to i32
+ %t1.sroa.0.0.extract.trunc = trunc i64 %mixed_obj_1 to i32
+ %t1.sroa.2.0.extract.shift = lshr i64 %mixed_obj_1, 32
+ %t1.sroa.2.0.extract.trunc = trunc nuw i64 %t1.sroa.2.0.extract.shift to i32
+ %0 = bitcast i32 %t0.sroa.2.0.extract.trunc to float
+ %1 = bitcast i32 %t1.sroa.2.0.extract.trunc to float
+ %conv0 = sitofp i32 %t0.sroa.0.0.extract.trunc to float
+ %conv1 = sitofp i32 %t1.sroa.0.0.extract.trunc to float
+ %add0 = fadd float %conv0, %0
+ %add1 = fadd float %conv1, %1
+ %add = fadd float %add0, %add1
+ %conv2 = fptosi float %add to i32
+ ret i32 %conv2
+}
+
+; VLA-area stack objects are not separated.
+define i32 @csr_d8_allocnxv4i32i32f64_vlai32f64(double %d, i32 %i) #2 {
+; CHECK: remark: <unknown>:0:0: stack hazard in 'csr_d8_allocnxv4i32i32f64_vlai32f64': GPR stack object at [SP-48-16 * vscale] is too close to FPR stack object at [SP-48-16 * vscale]
+; CHECK: remark: <unknown>:0:0: stack hazard in 'csr_d8_allocnxv4i32i32f64_vlai32f64': FPR stack object at [SP-32] is too close to GPR stack object at [SP-24]
+; CHECK-PADDING: remark: <unknown>:0:0: stack hazard in 'csr_d8_allocnxv4i32i32f64_vlai32f64': GPR stack object at [SP-2096-16 * vscale] is too close to FPR stack object at [SP-2096-16 * vscale]
+; CHECK-PADDING-NOT: remark: <unknown>:0:0: stack hazard in 'csr_d8_allocnxv4i32i32f64_vlai32f64':
+entry:
+ %a = alloca <vscale x 4 x i32>
+ %0 = zext i32 %i to i64
+ %vla0 = alloca i32, i64 %0
+ %vla1 = alloca double, i64 %0
+ %c = alloca double
+ tail call void asm sideeffect "", "~{d8}"() #1
+ store <vscale x 4 x i32> zeroinitializer, ptr %a
+ store i32 zeroinitializer, ptr %vla0
+ store double %d, ptr %vla1
+ store double %d, ptr %c
+ ret i32 0
+}
+
+attributes #2 = { "aarch64_pstate_sm_compatible" }
diff --git a/llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll b/llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll
index 431c9dc76508f..ec94198a08ca7 100644
--- a/llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll
+++ b/llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll
@@ -150,8 +150,8 @@ entry:
; CHECK-FRAMELAYOUT-NEXT...
[truncated]
Thanks for adding this. It sounds like it should be useful for detecting when stack hazard padding is able to fix the problems. I had some minor comments, but it otherwise looks sensible to me.
@@ -4626,6 +4643,10 @@ void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
     if (StackTaggingMergeSetTag)
       II = tryMergeAdjacentSTG(II, this, RS);
   }

+  // Run remarks pass.
+  MachineOptimizationRemarkEmitter ORE(MF, nullptr);
PEI has an ORE that comes from the pass pipeline. Could PEI pass its ORE to `processFunctionBeforeFrameIndicesReplaced`?
Good point - I've added a new virtual `emitRemarks` function to TFI so that we can reuse the ORE from PEI. I've put the call to `TFI.emitRemarks()` at the end of `PEI::runOnMachineFunction`, as other target-independent remarks passes are run here.
    return AccessTypes & (AccessType::GPR | AccessType::PPR);
  }
  bool isSME() const { return AccessTypes & AccessType::FPR; }
  bool isMixed() const { return ((AccessTypes & (AccessTypes - 1)) != 0); }
Does this handle GPR & PPR? Maybe make it simpler and just return `isCPU() && isSME()`.
Good point - Done.
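For reference, the two `isMixed` formulations discussed above differ exactly on the GPR+PPR case: the `x & (x - 1)` trick tests for more than one access-type bit being set, while `isCPU() && isSME()` only fires when both a CPU-side and an SME-side access occurred. A standalone sketch (enum values mirror the patch; the free-function helpers are illustrative):

```cpp
// Access-type bits, as in the patch's StackAccess::AccessType.
enum AccessType : unsigned {
  NotAccessed = 0,
  GPR = 1 << 0, // general purpose register
  PPR = 1 << 1, // predicate register (load/stores execute on the CPU)
  FPR = 1 << 2, // FP/Neon/SVE register (SME side)
};

// Original formulation: true when more than one bit is set.
inline bool isMixedBitTrick(unsigned AccessTypes) {
  return (AccessTypes & (AccessTypes - 1)) != 0;
}

// Reviewer's suggestion: true only when CPU-side and SME-side
// accesses are both present.
inline bool isCPU(unsigned T) { return (T & (GPR | PPR)) != 0; }
inline bool isSME(unsigned T) { return (T & FPR) != 0; }
inline bool isMixedSimpler(unsigned T) { return isCPU(T) && isSME(T); }
```

With GPR|PPR, the bit trick reports "mixed" even though both accesses stay on the CPU side, which is why the simpler form was adopted.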
  for (auto It = StackAccesses.begin(), End = StackAccesses.end();
       It != (End - 1); ++It) {
Maybe use `End = std::prev(StackAccesses.end());`.
Good point - Done.
  const MachineFrameInfo &MFI = MF.getFrameInfo();

  std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
I'm not sure what happens if `MFI.getNumObjects() == 0`, but it might be worth bailing out early.
Good point - Done.
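The bail-out matters because the loop above runs from `begin()` to `End - 1`: on an empty container, both `StackAccesses.front()` and `end() - 1` are invalid. A minimal sketch of the safe adjacent-pair pattern (the helper name is illustrative, not the patch's code):

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Visit adjacent element pairs; the explicit empty check is what makes
// the `end() - 1` style of loop well-defined.
inline std::size_t countAdjacentPairs(const std::vector<int> &V) {
  if (V.empty()) // guard: otherwise std::prev(V.end()) is invalid
    return 0;
  std::size_t Pairs = 0;
  for (auto It = V.begin(), End = std::prev(V.end()); It != End; ++It)
    ++Pairs; // each iteration sees the pair (*It, *(It + 1))
  return Pairs;
}
```

A single-element container is already safe here, since `std::prev(end())` equals `begin()` and the loop body never runs.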
  ret i32 %conv2
}

define i32 @mixed_stack_objects(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i, i64 %mixed_obj_0, i64 %mixed_obj_1) #2 {
Is it possible to make this not rely on fcvt? That might be something we try to optimize in the future.
I've refactored these tests to just use mixed local objects (allocas) and removed the stuff producing fcvt instructions.
Thanks. LGTM
/cherry-pick a98a0dc
…101695) Emit an optimization remark when objects in the stack frame may cause hazards in a streaming mode function. The analysis requires either the `aarch64-stack-hazard-size` or `aarch64-stack-hazard-remark-size` flag to be set by the user, with the former flag taking precedence. (cherry picked from commit a98a0dc)
/pull-request #102168