[KernelInfo] Implement new LLVM IR pass for GPU code analysis #102944
Conversation
This patch implements an LLVM IR pass, named kernel-info, that reports various statistics for codes compiled for GPUs. The ultimate goal of these statistics is to help identify bad code patterns and ways to mitigate them. The pass operates at the LLVM IR level so that it can, in theory, support any LLVM-based compiler for programming languages supporting GPUs. It has been tested so far with LLVM IR generated by Clang for OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs. By default, the pass is disabled. For convenience, `-kernel-info-end-lto` inserts it at the end of LTO, and options like `-Rpass=kernel-info` enable its remarks. Example opt and clang command lines appear in comments in `llvm/include/llvm/Analysis/KernelInfo.h`. Remarks include summary statistics (e.g., total size of static allocas) and individual occurrences (e.g., source location of each alloca). Examples of its output appear in tests in `llvm/test/Analysis/KernelInfo`.
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-llvm-analysis. Author: Joel E. Denny (jdenny-ornl). Patch is 129.37 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/102944.diff. 20 Files Affected:
diff --git a/llvm/include/llvm/Analysis/KernelInfo.h b/llvm/include/llvm/Analysis/KernelInfo.h
new file mode 100644
index 00000000000000..5495bb2fd4d925
--- /dev/null
+++ b/llvm/include/llvm/Analysis/KernelInfo.h
@@ -0,0 +1,148 @@
+//=- KernelInfo.h - Kernel Analysis -------------------------------*- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the KernelInfo, KernelInfoAnalysis, and KernelInfoPrinter
+// classes used to extract function properties from a GPU kernel.
+//
+// To analyze a C program as it appears to an LLVM GPU backend at the end of
+// LTO:
+//
+// $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
+// -Rpass=kernel-info -mllvm -kernel-info-end-lto
+//
+// To analyze specified LLVM IR, perhaps previously generated by something like
+// 'clang -save-temps -g -fopenmp --offload-arch=native test.c':
+//
+// $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
+// -pass-remarks=kernel-info -passes=kernel-info
+//
+// kernel-info can also be inserted into a specified LLVM pass pipeline using
+// -kernel-info-end-lto, or it can be positioned explicitly in that pipeline:
+//
+// $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
+// -Rpass=kernel-info -mllvm -kernel-info-end-lto \
+// -Xoffload-linker --lto-newpm-passes='lto<O2>'
+//
+// $ clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
+// -Rpass=kernel-info \
+// -Xoffload-linker --lto-newpm-passes='lto<O2>,module(kernel-info)'
+//
+// $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
+// -pass-remarks=kernel-info -kernel-info-end-lto -passes='lto<O2>'
+//
+// $ opt -disable-output test-openmp-nvptx64-nvidia-cuda-sm_70.bc \
+// -pass-remarks=kernel-info -passes='lto<O2>,module(kernel-info)'
+// ===---------------------------------------------------------------------===//
+
+#ifndef LLVM_ANALYSIS_KERNELINFO_H
+#define LLVM_ANALYSIS_KERNELINFO_H
+
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+
+namespace llvm {
+class DominatorTree;
+class Function;
+
+/// Data structure holding function info for kernels.
+class KernelInfo {
+ void updateForBB(const BasicBlock &BB, int64_t Direction,
+ OptimizationRemarkEmitter &ORE);
+
+public:
+ static KernelInfo getKernelInfo(Function &F, FunctionAnalysisManager &FAM);
+
+ bool operator==(const KernelInfo &FPI) const {
+ return std::memcmp(this, &FPI, sizeof(KernelInfo)) == 0;
+ }
+
+ bool operator!=(const KernelInfo &FPI) const { return !(*this == FPI); }
+
+ /// If false, nothing was recorded here because the supplied function didn't
+ /// appear in a module compiled for a GPU.
+ bool IsValid = false;
+
+ /// Whether the function has external linkage and is not a kernel function.
+ bool ExternalNotKernel = false;
+
+ /// OpenMP Launch bounds.
+ ///@{
+ std::optional<int64_t> OmpTargetNumTeams;
+ std::optional<int64_t> OmpTargetThreadLimit;
+ ///@}
+
+ /// AMDGPU launch bounds.
+ ///@{
+ std::optional<int64_t> AmdgpuMaxNumWorkgroupsX;
+ std::optional<int64_t> AmdgpuMaxNumWorkgroupsY;
+ std::optional<int64_t> AmdgpuMaxNumWorkgroupsZ;
+ std::optional<int64_t> AmdgpuFlatWorkGroupSizeMin;
+ std::optional<int64_t> AmdgpuFlatWorkGroupSizeMax;
+ std::optional<int64_t> AmdgpuWavesPerEuMin;
+ std::optional<int64_t> AmdgpuWavesPerEuMax;
+ ///@}
+
+ /// NVPTX launch bounds.
+ ///@{
+ std::optional<int64_t> Maxclusterrank;
+ std::optional<int64_t> Maxntidx;
+ ///@}
+
+ /// The number of alloca instructions inside the function, the number of those
+ /// with allocation sizes that cannot be determined at compile time, and the
+ /// sum of the sizes that can be.
+ ///
+ /// With the current implementation for at least some GPU archs,
+ /// AllocasDyn > 0 might not be possible, but we report AllocasDyn anyway in
+ /// case the implementation changes.
+ int64_t Allocas = 0;
+ int64_t AllocasDyn = 0;
+ int64_t AllocasStaticSizeSum = 0;
+
+ /// Number of direct/indirect calls (anything derived from CallBase).
+ int64_t DirectCalls = 0;
+ int64_t IndirectCalls = 0;
+
+ /// Number of direct calls made from this function to other functions
+ /// defined in this module.
+ int64_t DirectCallsToDefinedFunctions = 0;
+
+ /// Number of calls of type InvokeInst.
+ int64_t Invokes = 0;
+
+ /// Number of addrspace(0) memory accesses (via load, store, etc.).
+ int64_t AddrspaceZeroAccesses = 0;
+};
+
+/// Analysis class for KernelInfo.
+class KernelInfoAnalysis : public AnalysisInfoMixin<KernelInfoAnalysis> {
+public:
+ static AnalysisKey Key;
+
+ using Result = const KernelInfo;
+
+ KernelInfo run(Function &F, FunctionAnalysisManager &FAM) {
+ return KernelInfo::getKernelInfo(F, FAM);
+ }
+};
+
+/// Printer pass for KernelInfoAnalysis.
+///
+/// It just calls KernelInfoAnalysis, which prints remarks if they are enabled.
+class KernelInfoPrinter : public PassInfoMixin<KernelInfoPrinter> {
+public:
+ explicit KernelInfoPrinter() {}
+
+ PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
+ AM.getResult<KernelInfoAnalysis>(F);
+ return PreservedAnalyses::all();
+ }
+
+ static bool isRequired() { return true; }
+};
+} // namespace llvm
+#endif // LLVM_ANALYSIS_KERNELINFO_H
diff --git a/llvm/include/llvm/Target/TargetMachine.h b/llvm/include/llvm/Target/TargetMachine.h
index c3e9d41315f617..5c338a8fcd0cfb 100644
--- a/llvm/include/llvm/Target/TargetMachine.h
+++ b/llvm/include/llvm/Target/TargetMachine.h
@@ -18,6 +18,7 @@
#include "llvm/IR/PassManager.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/CodeGen.h"
+#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/PGOOptions.h"
#include "llvm/Target/CGPassBuilderOption.h"
@@ -27,6 +28,8 @@
#include <string>
#include <utility>
+extern llvm::cl::opt<bool> KernelInfoEndLTO;
+
namespace llvm {
class AAManager;
diff --git a/llvm/lib/Analysis/CMakeLists.txt b/llvm/lib/Analysis/CMakeLists.txt
index 2cb3547ec40473..02e76af8d903de 100644
--- a/llvm/lib/Analysis/CMakeLists.txt
+++ b/llvm/lib/Analysis/CMakeLists.txt
@@ -78,6 +78,7 @@ add_llvm_component_library(LLVMAnalysis
InstructionPrecedenceTracking.cpp
InstructionSimplify.cpp
InteractiveModelRunner.cpp
+ KernelInfo.cpp
LazyBranchProbabilityInfo.cpp
LazyBlockFrequencyInfo.cpp
LazyCallGraph.cpp
diff --git a/llvm/lib/Analysis/KernelInfo.cpp b/llvm/lib/Analysis/KernelInfo.cpp
new file mode 100644
index 00000000000000..9df3b5b32afcb4
--- /dev/null
+++ b/llvm/lib/Analysis/KernelInfo.cpp
@@ -0,0 +1,350 @@
+//===- KernelInfo.cpp - Kernel Analysis -----------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the KernelInfo, KernelInfoAnalysis, and KernelInfoPrinter
+// classes used to extract function properties from a kernel.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Analysis/KernelInfo.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/IR/DebugInfo.h"
+#include "llvm/IR/Dominators.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Metadata.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Passes/PassBuilder.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "kernel-info"
+
+static bool isKernelFunction(Function &F) {
+ // TODO: Is this general enough? Consider languages beyond OpenMP.
+ return F.hasFnAttribute("kernel");
+}
+
+static void identifyFunction(OptimizationRemark &R, const Function &F) {
+ if (auto *SubProgram = F.getSubprogram()) {
+ if (SubProgram->isArtificial())
+ R << "artificial ";
+ }
+ R << "function '" << F.getName() << "'";
+}
+
+static void remarkAlloca(OptimizationRemarkEmitter &ORE, const Function &Caller,
+ const AllocaInst &Alloca,
+ TypeSize::ScalarTy StaticSize) {
+ ORE.emit([&] {
+ StringRef Name;
+ DebugLoc Loc;
+ bool Artificial = false;
+ auto DVRs = findDVRDeclares(&const_cast<AllocaInst &>(Alloca));
+ if (!DVRs.empty()) {
+ const DbgVariableRecord &DVR = **DVRs.begin();
+ Name = DVR.getVariable()->getName();
+ Loc = DVR.getDebugLoc();
+ Artificial = DVR.Variable->isArtificial();
+ }
+ OptimizationRemark R(DEBUG_TYPE, "Alloca", DiagnosticLocation(Loc),
+ Alloca.getParent());
+ R << "in ";
+ identifyFunction(R, Caller);
+ R << ", ";
+ if (Artificial)
+ R << "artificial ";
+ if (Name.empty()) {
+ R << "unnamed alloca ";
+ if (DVRs.empty())
+ R << "(missing debug metadata) ";
+ } else {
+ R << "alloca '" << Name << "' ";
+ }
+ R << "with ";
+ if (StaticSize)
+ R << "static size of " << itostr(StaticSize) << " bytes";
+ else
+ R << "dynamic size";
+ return R;
+ });
+}
+
+static void remarkCall(OptimizationRemarkEmitter &ORE, const Function &Caller,
+ const CallBase &Call, StringRef CallKind,
+ StringRef RemarkKind) {
+ ORE.emit([&] {
+ OptimizationRemark R(DEBUG_TYPE, RemarkKind, &Call);
+ R << "in ";
+ identifyFunction(R, Caller);
+ R << ", " << CallKind;
+ if (const Function *Callee =
+ dyn_cast_or_null<Function>(Call.getCalledOperand())) {
+ R << ", callee is";
+ StringRef Name = Callee->getName();
+ if (auto *SubProgram = Callee->getSubprogram()) {
+ if (SubProgram->isArtificial())
+ R << " artificial";
+ }
+ if (!Name.empty())
+ R << " '" << Name << "'";
+ else
+ R << " with unknown name";
+ }
+ return R;
+ });
+}
+
+static void remarkAddrspaceZeroAccess(OptimizationRemarkEmitter &ORE,
+ const Function &Caller,
+ const Instruction &Inst) {
+ ORE.emit([&] {
+ OptimizationRemark R(DEBUG_TYPE, "AddrspaceZeroAccess", &Inst);
+ R << "in ";
+ identifyFunction(R, Caller);
+ if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(&Inst)) {
+ R << ", '" << II->getCalledFunction()->getName() << "' call";
+ } else {
+ R << ", '" << Inst.getOpcodeName() << "' instruction";
+ }
+ if (Inst.hasName())
+ R << " ('%" << Inst.getName() << "')";
+ R << " accesses memory in addrspace(0)";
+ return R;
+ });
+}
+
+void KernelInfo::updateForBB(const BasicBlock &BB, int64_t Direction,
+ OptimizationRemarkEmitter &ORE) {
+ assert(Direction == 1 || Direction == -1);
+ const Function &F = *BB.getParent();
+ const Module &M = *F.getParent();
+ const DataLayout &DL = M.getDataLayout();
+ for (const Instruction &I : BB.instructionsWithoutDebug()) {
+ if (const AllocaInst *Alloca = dyn_cast<AllocaInst>(&I)) {
+ Allocas += Direction;
+ TypeSize::ScalarTy StaticSize = 0;
+ if (std::optional<TypeSize> Size = Alloca->getAllocationSize(DL)) {
+ StaticSize = Size->getFixedValue();
+ assert(StaticSize <= std::numeric_limits<int64_t>::max());
+ AllocasStaticSizeSum += Direction * StaticSize;
+ } else {
+ AllocasDyn += Direction;
+ }
+ remarkAlloca(ORE, F, *Alloca, StaticSize);
+ } else if (const CallBase *Call = dyn_cast<CallBase>(&I)) {
+ std::string CallKind;
+ std::string RemarkKind;
+ if (Call->isIndirectCall()) {
+ IndirectCalls += Direction;
+ CallKind += "indirect";
+ RemarkKind += "Indirect";
+ } else {
+ DirectCalls += Direction;
+ CallKind += "direct";
+ RemarkKind += "Direct";
+ }
+ if (isa<InvokeInst>(Call)) {
+ Invokes += Direction;
+ CallKind += " invoke";
+ RemarkKind += "Invoke";
+ } else {
+ CallKind += " call";
+ RemarkKind += "Call";
+ }
+ if (!Call->isIndirectCall()) {
+ if (const Function *Callee = Call->getCalledFunction()) {
+ if (Callee && !Callee->isIntrinsic() && !Callee->isDeclaration()) {
+ DirectCallsToDefinedFunctions += Direction;
+ CallKind += " to defined function";
+ RemarkKind += "ToDefinedFunction";
+ }
+ }
+ }
+ remarkCall(ORE, F, *Call, CallKind, RemarkKind);
+ if (const AnyMemIntrinsic *MI = dyn_cast<AnyMemIntrinsic>(Call)) {
+ if (MI->getDestAddressSpace() == 0) {
+ AddrspaceZeroAccesses += Direction;
+ remarkAddrspaceZeroAccess(ORE, F, I);
+ } else if (const AnyMemTransferInst *MT =
+ dyn_cast<AnyMemTransferInst>(MI)) {
+ if (MT->getSourceAddressSpace() == 0) {
+ AddrspaceZeroAccesses += Direction;
+ remarkAddrspaceZeroAccess(ORE, F, I);
+ }
+ }
+ }
+ } else if (const LoadInst *Load = dyn_cast<LoadInst>(&I)) {
+ if (Load->getPointerAddressSpace() == 0) {
+ AddrspaceZeroAccesses += Direction;
+ remarkAddrspaceZeroAccess(ORE, F, I);
+ }
+ } else if (const StoreInst *Store = dyn_cast<StoreInst>(&I)) {
+ if (Store->getPointerAddressSpace() == 0) {
+ AddrspaceZeroAccesses += Direction;
+ remarkAddrspaceZeroAccess(ORE, F, I);
+ }
+ } else if (const AtomicRMWInst *At = dyn_cast<AtomicRMWInst>(&I)) {
+ if (At->getPointerAddressSpace() == 0) {
+ AddrspaceZeroAccesses += Direction;
+ remarkAddrspaceZeroAccess(ORE, F, I);
+ }
+ } else if (const AtomicCmpXchgInst *At = dyn_cast<AtomicCmpXchgInst>(&I)) {
+ if (At->getPointerAddressSpace() == 0) {
+ AddrspaceZeroAccesses += Direction;
+ remarkAddrspaceZeroAccess(ORE, F, I);
+ }
+ }
+ }
+}
+
+static void remarkProperty(OptimizationRemarkEmitter &ORE, const Function &F,
+ StringRef Name, int64_t Value) {
+ ORE.emit([&] {
+ OptimizationRemark R(DEBUG_TYPE, Name, &F);
+ R << "in ";
+ identifyFunction(R, F);
+ R << ", " << Name << " = " << itostr(Value);
+ return R;
+ });
+}
+
+static void remarkProperty(OptimizationRemarkEmitter &ORE, const Function &F,
+ StringRef Name, std::optional<int64_t> Value) {
+ if (!Value)
+ return;
+ remarkProperty(ORE, F, Name, Value.value());
+}
+
+static std::vector<std::optional<int64_t>>
+parseFnAttrAsIntegerFields(Function &F, StringRef Name, unsigned NumFields) {
+ std::vector<std::optional<int64_t>> Result(NumFields);
+ Attribute A = F.getFnAttribute(Name);
+ if (!A.isStringAttribute())
+ return Result;
+ StringRef Rest = A.getValueAsString();
+ for (unsigned I = 0; I < NumFields; ++I) {
+ StringRef Field;
+ std::tie(Field, Rest) = Rest.split(',');
+ if (Field.empty())
+ break;
+ int64_t Val;
+ if (Field.getAsInteger(0, Val)) {
+ F.getContext().emitError("cannot parse integer in attribute '" + Name +
+ "': " + Field);
+ break;
+ }
+ Result[I] = Val;
+ }
+ if (!Rest.empty())
+ F.getContext().emitError("too many fields in attribute " + Name);
+ return Result;
+}
+
+static std::optional<int64_t> parseFnAttrAsInteger(Function &F,
+ StringRef Name) {
+ return parseFnAttrAsIntegerFields(F, Name, 1)[0];
+}
+
+// TODO: This nearly duplicates the same function in OMPIRBuilder.cpp. Can we
+// share?
+static MDNode *getNVPTXMDNode(Function &F, StringRef Name) {
+ Module &M = *F.getParent();
+ NamedMDNode *MD = M.getNamedMetadata("nvvm.annotations");
+ if (!MD)
+ return nullptr;
+ for (auto *Op : MD->operands()) {
+ if (Op->getNumOperands() != 3)
+ continue;
+ auto *KernelOp = dyn_cast<ConstantAsMetadata>(Op->getOperand(0));
+ if (!KernelOp || KernelOp->getValue() != &F)
+ continue;
+ auto *Prop = dyn_cast<MDString>(Op->getOperand(1));
+ if (!Prop || Prop->getString() != Name)
+ continue;
+ return Op;
+ }
+ return nullptr;
+}
+
+static std::optional<int64_t> parseNVPTXMDNodeAsInteger(Function &F,
+ StringRef Name) {
+ std::optional<int64_t> Result;
+ if (MDNode *ExistingOp = getNVPTXMDNode(F, Name)) {
+ auto *Op = cast<ConstantAsMetadata>(ExistingOp->getOperand(2));
+ Result = cast<ConstantInt>(Op->getValue())->getZExtValue();
+ }
+ return Result;
+}
+
+KernelInfo KernelInfo::getKernelInfo(Function &F,
+ FunctionAnalysisManager &FAM) {
+ KernelInfo KI;
+ // Only analyze modules for GPUs.
+ // TODO: This would be more maintainable if there were an isGPU.
+ const std::string &TT = F.getParent()->getTargetTriple();
+ llvm::Triple T(TT);
+ if (!T.isAMDGPU() && !T.isNVPTX())
+ return KI;
+ KI.IsValid = true;
+
+ // Record function properties.
+ KI.ExternalNotKernel = F.hasExternalLinkage() && !isKernelFunction(F);
+ KI.OmpTargetNumTeams = parseFnAttrAsInteger(F, "omp_target_num_teams");
+ KI.OmpTargetThreadLimit = parseFnAttrAsInteger(F, "omp_target_thread_limit");
+ auto AmdgpuMaxNumWorkgroups =
+ parseFnAttrAsIntegerFields(F, "amdgpu-max-num-workgroups", 3);
+ KI.AmdgpuMaxNumWorkgroupsX = AmdgpuMaxNumWorkgroups[0];
+ KI.AmdgpuMaxNumWorkgroupsY = AmdgpuMaxNumWorkgroups[1];
+ KI.AmdgpuMaxNumWorkgroupsZ = AmdgpuMaxNumWorkgroups[2];
+ auto AmdgpuFlatWorkGroupSize =
+ parseFnAttrAsIntegerFields(F, "amdgpu-flat-work-group-size", 2);
+ KI.AmdgpuFlatWorkGroupSizeMin = AmdgpuFlatWorkGroupSize[0];
+ KI.AmdgpuFlatWorkGroupSizeMax = AmdgpuFlatWorkGroupSize[1];
+ auto AmdgpuWavesPerEu =
+ parseFnAttrAsIntegerFields(F, "amdgpu-waves-per-eu", 2);
+ KI.AmdgpuWavesPerEuMin = AmdgpuWavesPerEu[0];
+ KI.AmdgpuWavesPerEuMax = AmdgpuWavesPerEu[1];
+ KI.Maxclusterrank = parseNVPTXMDNodeAsInteger(F, "maxclusterrank");
+ KI.Maxntidx = parseNVPTXMDNodeAsInteger(F, "maxntidx");
+
+ const DominatorTree &DT = FAM.getResult<DominatorTreeAnalysis>(F);
+ auto &ORE = FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);
+ for (const auto &BB : F)
+ if (DT.isReachableFromEntry(&BB))
+ KI.updateForBB(BB, +1, ORE);
+
+#define REMARK_PROPERTY(PROP_NAME) \
+ remarkProperty(ORE, F, #PROP_NAME, KI.PROP_NAME)
+ REMARK_PROPERTY(ExternalNotKernel);
+ REMARK_PROPERTY(OmpTargetNumTeams);
+ REMARK_PROPERTY(OmpTargetThreadLimit);
+ REMARK_PROPERTY(AmdgpuMaxNumWorkgroupsX);
+ REMARK_PROPERTY(AmdgpuMaxNumWorkgroupsY);
+ REMARK_PROPERTY(AmdgpuMaxNumWorkgroupsZ);
+ REMARK_PROPERTY(AmdgpuFlatWorkGroupSizeMin);
+ REMARK_PROPERTY(AmdgpuFlatWorkGroupSizeMax);
+ REMARK_PROPERTY(AmdgpuWavesPerEuMin);
+ REMARK_PROPERTY(AmdgpuWavesPerEuMax);
+ REMARK_PROPERTY(Maxclusterrank);
+ REMARK_PROPERTY(Maxntidx);
+ REMARK_PROPERTY(Allocas);
+ REMARK_PROPERTY(AllocasStaticSizeSum);
+ REMARK_PROPERTY(AllocasDyn);
+ REMARK_PROPERTY(DirectCalls);
+ REMARK_PROPERTY(IndirectCalls);
+ REMARK_PROPERTY(DirectCallsToDefinedFunctions);
+ REMARK_PROPERTY(Invokes);
+ REMARK_PROPERTY(AddrspaceZeroAccesses);
+#undef REMARK_PROPERTY
+
+ return KI;
+}
+
+AnalysisKey KernelInfoAnalysis::Key;
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 46f43f3de4705c..61677f02783cc9 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -44,6 +44,7 @@
#include "llvm/Analysis/InlineAdvisor.h"
#include "llvm/Analysis/InlineSizeEstimatorAnalysis.h"
#include "llvm/Analysis/InstCount.h"
+#include "llvm/Analysis/KernelInfo.h"
#include "llvm/Analysis/LazyCallGraph.h"
#include "llvm/Analysis/LazyValueInfo.h"
#include "llvm/Analysis/Lint.h"
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 0cec9fbd7cd05e..dcfa732f410b38 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -278,6 +278,7 @@ FUNCTION_ANALYSIS(
MachineFunctionAnalysis(static_ca...
[truncated]
Can you put an rst file in https://github.com/llvm/llvm-project/tree/main/llvm/docs instead of hiding everything in a header?
My general question is, the kernel info looks highly "target" dependent. I'm not sure if it is a good idea to have them all combined together in a target "independent" manner.
What's the alternative? 2-3 passes that do basically the same thing (e.g., infos about alloca, call, etc.)?
An alternative is to have a base class with common info and then have the target-dependent part in subclasses.
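A minimal sketch of what such a split could look like; the class names are purely illustrative and are not from this patch:

#include <cstdint>
#include <optional>

// Sketch only: target-independent counters live in a base class, and each
// target contributes its own launch-bound fields in a subclass.
struct KernelInfoBase {
  int64_t Allocas = 0;
  int64_t DirectCalls = 0;
  int64_t IndirectCalls = 0;
  int64_t AddrspaceZeroAccesses = 0;
  // ... other target-independent statistics ...
};

struct AMDGPUKernelInfo : KernelInfoBase {
  std::optional<int64_t> FlatWorkGroupSizeMin, FlatWorkGroupSizeMax;
  std::optional<int64_t> WavesPerEuMin, WavesPerEuMax;
};

struct NVPTXKernelInfo : KernelInfoBase {
  std::optional<int64_t> Maxclusterrank;
  std::optional<int64_t> Maxntidx;
};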
llvm/lib/Analysis/KernelInfo.cpp (outdated)
auto AmdgpuWavesPerEu =
    parseFnAttrAsIntegerFields(F, "amdgpu-waves-per-eu", 2);
Don't we already report information on this in the pass remarks in the AsmPrinter?
The AMDGPU remarks emit 0 for values of non-kernel functions, IIRC. Having a single location for all relevant remarks is also helpful. Only post-instruction selection & reg alloc stuff should be printed per-target (=for AMDGPU) late.
That's just a bug though
I don't see how that buys us much and I feel there is a cost to it (more files, more places, more boilerplate). What do we gain by splitting it up? In the existing code we can/should just print target dependent features (like waves-per-eu) only for AMDGPU and call it a day, no?
We have to be more careful about targets in the test suite now because `getFlatAddressSpace` returns garbage for unsupported targets. Should we change the remarks to say flat addrspace instead of addrspace(0)?
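If the check were keyed off the target's flat address space rather than a hard-coded addrspace(0), it might look roughly like the following sketch; the helper name is hypothetical and this is not code from the patch:

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch only: compare pointer address spaces against the target's flat
// address space; as noted above, the value returned for targets without a
// flat address space is not meaningful.
static bool isFlatAddrspaceAccess(const Instruction &I,
                                  const TargetTransformInfo &TTI) {
  unsigned FlatAS = TTI.getFlatAddressSpace();
  if (const auto *Load = dyn_cast<LoadInst>(&I))
    return Load->getPointerAddressSpace() == FlatAS;
  if (const auto *Store = dyn_cast<StoreInst>(&I))
    return Store->getPointerAddressSpace() == FlatAS;
  return false;
}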
It's in
-kernel-info-end-lto doesn't insert kernel-info for CPU modules. If the user explicitly specifies the pass for a CPU module, then it will run now.
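For context, here is a rough sketch of how a GPU target could gate the pass on -kernel-info-end-lto so that CPU modules never see it; the callback and its placement are assumptions for illustration, not necessarily how this patch wires it up:

#include "llvm/Analysis/KernelInfo.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Target/TargetMachine.h" // declares the extern KernelInfoEndLTO cl::opt
using namespace llvm;

// Sketch only: a GPU target's pass-builder callbacks add KernelInfoPrinter at
// the end of the full-LTO pipeline when -kernel-info-end-lto is set. Because
// only GPU targets would register this, CPU modules run the pass only if the
// user names it explicitly in -passes.
static void registerKernelInfoAtEndOfLTO(PassBuilder &PB) {
  if (!KernelInfoEndLTO)
    return;
  PB.registerFullLinkTimeOptimizationLastEPCallback(
      [](ModulePassManager &MPM, OptimizationLevel) {
        MPM.addPass(createModuleToFunctionPassAdaptor(KernelInfoPrinter()));
      });
}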
define void @h() !dbg !3 {
entry:
  ; CHECK: remark: test.c:0:0: in artificial function 'h', artificial alloca 'dyn_ptr' with static size of 8 bytes
Not sure what's artificial about it
This test makes sure that functions marked as artificial in metadata are reported that way. It's been a while, but my recollection is I started with IR generated by clang, ran it through llvm-reduce, and then further reduced it by hand.
What are you objecting to? Do you want the test function to look more like an artificial function in some way? Do you want artificial functions not to be called out by kernel-info?
; DEFINE: %{fcheck-on} = FileCheck -match-full-lines %S/Inputs/test.ll
; DEFINE: %{fcheck-off} = FileCheck -allow-empty -check-prefixes=NONE \
; DEFINE:   %S/Inputs/test.ll
Test the bounds attributes?
I don't know what you're asking. This test checks when kernel-info is enabled by looking for its output, and it arbitrarily chooses omp_target_num_teams as the output to look for. Do you want more bounds checking? Do you want something else instead?
They were originally copied from FunctionPropertiesAnalysis.cpp.
See llvm/test/Analysis/KernelInfo/openmp/README.md.
You can test this locally with the following command:

git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 7b1becd940cb93f8b63c9872e1af7431dea353d1 1f1ca6cfc17def5595adb71ea00282154d803fc7 llvm/include/llvm/Analysis/KernelInfo.h llvm/lib/Analysis/KernelInfo.cpp llvm/test/Analysis/KernelInfo/allocas.ll llvm/test/Analysis/KernelInfo/calls.ll llvm/test/Analysis/KernelInfo/enable-kernel-info/Inputs/test.ll llvm/test/Analysis/KernelInfo/flat-addrspace/Inputs/test.ll llvm/test/Analysis/KernelInfo/launch-bounds/amdgpu.ll llvm/test/Analysis/KernelInfo/launch-bounds/nvptx.ll llvm/test/Analysis/KernelInfo/linkage.ll llvm/test/Analysis/KernelInfo/openmp/amdgpu.ll llvm/test/Analysis/KernelInfo/openmp/nvptx.ll llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/IR/Function.h llvm/include/llvm/Target/TargetMachine.h llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/Passes/PassBuilder.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h llvm/lib/Target/TargetMachine.cpp llvm/lib/Transforms/IPO/OpenMPOpt.cpp

The following files introduce new uses of undef:
Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. In tests, avoid using undef. For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.
See llvm/test/Analysis/KernelInfo/openmp/README.md.
When clang stops generating those uses, we can update the tests.
Two minor issues and then we can merge this.
llvm/lib/IR/Module.cpp (outdated)
@@ -322,6 +322,32 @@ void Module::eraseNamedMetadata(NamedMDNode *NMD) {
  eraseNamedMDNode(NMD);
}

SetVector<Function *> Module::getDeviceKernels() {
  // TODO: Create a more cross-platform way of determining device kernels.
  NamedMDNode *MD = getNamedMetadata("nvvm.annotations");
I'm working on deprecating this method of specifying a kernel in favor of the ptx_kernel calling convention (see #120806). I think it would be good to check the calling convention of all functions in the module and add them to the set if they have ptx_kernel or amdgpu_kernel (maybe others too?).
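A minimal sketch of that kind of check; the helper name is hypothetical, but the calling-convention values are the existing PTX_Kernel and AMDGPU_KERNEL entries:

#include "llvm/IR/CallingConv.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Sketch only: treat a function as a device kernel if it uses a GPU kernel
// calling convention rather than relying on nvvm.annotations metadata.
static bool hasKernelCallingConv(const Function &F) {
  switch (F.getCallingConv()) {
  case CallingConv::PTX_Kernel:
  case CallingConv::AMDGPU_KERNEL:
  case CallingConv::SPIR_KERNEL: // "maybe others too", per the comment above
    return true;
  default:
    return false;
  }
}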
Thanks for letting me know. Now that your PR #122320 has landed, I've updated this PR to use it.
I see that llvm::omp::getDeviceKernels checks for the kernel calling convention but also still for kernel in nvvm.annotations. Should KernelInfo check for the latter as well? My update only checks for the former.
#119261 will auto-upgrade nvvm.annotations allowing you to only check the CC. Currently you'll still need to check both but hopefully soon only CC will be needed.
When is that needed? Just for previously generated LLVM IR? Clang at today's main seems to generate the calling conventions already.
Yea, to maintain backwards compatibility with older IR or out-of-tree front-ends.
Makes sense. But for KernelInfo, I'm not sure if it's worth it. We could probably live with that small shortcoming until PR #119261 lands. @jdoerfert?
Backwards compatibility with old GPU IR is not much of a thing. This should be fine.
This reverts commit bb9d5c2. This will facilitate merging main due to 07ed818 (PR llvm#122320), which changes llvm::omp::getDeviceKernels. Will rewrite and reapply after merging main.
Also, regenerate OpenMP tests from current clang so they see the new kernel calling conventions.
LG, I think.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/50/builds/9588. Here is the relevant piece of the build log for reference.
I was trying to fix the bazel build after this change, and it seems it introduces a couple of circular dependencies. I think the circular dependency can be broken by removing the PassBuilder.h include, which seems to be unused, and replacing the TargetMachine include with a forward decl. Something like this:
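The suggested snippet is not reproduced above; a rough sketch of the kind of change being described, with assumed file contents rather than the actual fix, is:

// llvm/include/llvm/Analysis/KernelInfo.h (sketch only, assumed contents)
// Replace the heavyweight include with a forward declaration; a forward
// declaration is enough when the header only refers to TargetMachine by
// pointer or reference.
// #include "llvm/Target/TargetMachine.h"   // removed
namespace llvm {
class TargetMachine;
} // namespace llvm

// llvm/lib/Analysis/KernelInfo.cpp (sketch only, assumed contents)
// #include "llvm/Passes/PassBuilder.h"     // removed: unused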
This patch implements an LLVM IR pass, named kernel-info, that reports various statistics for codes compiled for GPUs. The ultimate goal of these statistics is to help identify bad code patterns and ways to mitigate them. The pass operates at the LLVM IR level so that it can, in theory, support any LLVM-based compiler for programming languages supporting GPUs. It has been tested so far with LLVM IR generated by Clang for OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs.

By default, the pass runs at the end of LTO, and options like -Rpass=kernel-info enable its remarks. Example opt and clang command lines appear in llvm/docs/KernelInfo.rst. Remarks include summary statistics (e.g., total size of static allocas) and individual occurrences (e.g., source location of each alloca). Examples of its output appear in tests in llvm/test/Analysis/KernelInfo.