Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Remove unused VGPRSingleUseHintInsts feature #109769

Merged

Conversation

ScottEgerton
Copy link
Member

No description provided.

@llvmbot llvmbot added backend:AMDGPU mc Machine (object) code labels Sep 24, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 24, 2024

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-amdgpu

Author: Scott Egerton (ScottEgerton)

Changes

Patch is 94.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109769.diff

25 Files Affected:

  • (modified) llvm/docs/AMDGPUUsage.rst (+1-3)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (-3)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (-13)
  • (removed) llvm/lib/Target/AMDGPU/AMDGPUInsertSingleUseVDST.cpp (-245)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (-10)
  • (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (-1)
  • (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (-3)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (-2)
  • (modified) llvm/lib/Target/AMDGPU/SOPInstructions.td (-11)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (-18)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (+2)
  • (modified) llvm/lib/Target/AMDGPU/VOP1Instructions.td (+3-15)
  • (modified) llvm/lib/Target/AMDGPU/VOP2Instructions.td (+2-4)
  • (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+16-19)
  • (modified) llvm/lib/Target/AMDGPU/VOP3PInstructions.td (+4-8)
  • (modified) llvm/lib/Target/AMDGPU/VOPCInstructions.td (+5-8)
  • (modified) llvm/lib/Target/AMDGPU/VOPInstructions.td (+1-19)
  • (removed) llvm/test/CodeGen/AMDGPU/insert-singleuse-vdst.mir (-1420)
  • (removed) llvm/test/MC/AMDGPU/gfx1150_asm_sopp.s (-10)
  • (modified) llvm/test/MC/AMDGPU/gfx11_unsupported.s (-3)
  • (modified) llvm/test/MC/AMDGPU/gfx12_asm_sopp.s (-9)
  • (modified) llvm/test/MC/Disassembler/AMDGPU/decode-err.txt (-5)
  • (removed) llvm/test/MC/Disassembler/AMDGPU/gfx1150_dasm_sopp.txt (-10)
  • (modified) llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_sopp.txt (-8)
  • (modified) llvm/utils/gn/secondary/llvm/lib/Target/AMDGPU/BUILD.gn (-1)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 4b48b54b18bb99..9e11b13c101d47 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -611,9 +611,7 @@ Generic processor code objects are versioned. See :ref:`amdgpu-generic-processor
                                                                                                 - ``gfx1152``
 
                                                                                                 SALU floating point instructions
-                                                                                                and single-use VGPR hint
-                                                                                                instructions are not available
-                                                                                                on:
+                                                                                                are not available on:
 
                                                                                                 - ``gfx1150``
                                                                                                 - ``gfx1151``
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index b2dd354e496a2e..4abb5a63ab6d2c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -405,9 +405,6 @@ extern char &SIModeRegisterID;
 void initializeAMDGPUInsertDelayAluPass(PassRegistry &);
 extern char &AMDGPUInsertDelayAluID;
 
-void initializeAMDGPUInsertSingleUseVDSTPass(PassRegistry &);
-extern char &AMDGPUInsertSingleUseVDSTID;
-
 void initializeSIInsertHardClausesPass(PassRegistry &);
 extern char &SIInsertHardClausesID;
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 5757ac0d4454d0..66474dfcbb5cc0 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -929,12 +929,6 @@ def FeatureSALUFloatInsts : SubtargetFeature<"salu-float",
   "Has SALU floating point instructions"
 >;
 
-def FeatureVGPRSingleUseHintInsts : SubtargetFeature<"vgpr-singleuse-hint",
-  "HasVGPRSingleUseHintInsts",
-  "true",
-  "Has single-use VGPR hint instructions"
->;
-
 def FeaturePseudoScalarTrans : SubtargetFeature<"pseudo-scalar-trans",
   "HasPseudoScalarTrans",
   "true",
@@ -1615,14 +1609,12 @@ def FeatureISAVersion11_5_0 : FeatureSet<
   !listconcat(FeatureISAVersion11_Common.Features,
     [FeatureSALUFloatInsts,
      FeatureDPPSrc1SGPR,
-     FeatureVGPRSingleUseHintInsts,
      FeatureRequiredExportPriority])>;
 
 def FeatureISAVersion11_5_1 : FeatureSet<
   !listconcat(FeatureISAVersion11_Common.Features,
     [FeatureSALUFloatInsts,
      FeatureDPPSrc1SGPR,
-     FeatureVGPRSingleUseHintInsts,
      Feature1_5xVGPRs,
      FeatureRequiredExportPriority])>;
 
@@ -1630,7 +1622,6 @@ def FeatureISAVersion11_5_2 : FeatureSet<
   !listconcat(FeatureISAVersion11_Common.Features,
     [FeatureSALUFloatInsts,
      FeatureDPPSrc1SGPR,
-     FeatureVGPRSingleUseHintInsts,
      FeatureRequiredExportPriority])>;
 
 def FeatureISAVersion12 : FeatureSet<
@@ -1663,7 +1654,6 @@ def FeatureISAVersion12 : FeatureSet<
    FeatureSALUFloatInsts,
    FeaturePseudoScalarTrans,
    FeatureHasRestrictedSOffset,
-   FeatureVGPRSingleUseHintInsts,
    FeatureScalarDwordx3Loads,
    FeatureDPPSrc1SGPR,
    FeatureMaxHardClauseLength32,
@@ -2267,9 +2257,6 @@ def HasNotMADIntraFwdBug : Predicate<"!Subtarget->hasMADIntraFwdBug()">;
 def HasSALUFloatInsts : Predicate<"Subtarget->hasSALUFloatInsts()">,
   AssemblerPredicate<(all_of FeatureSALUFloatInsts)>;
 
-def HasVGPRSingleUseHintInsts : Predicate<"Subtarget->hasVGPRSingleUseHintInsts()">,
-  AssemblerPredicate<(all_of FeatureVGPRSingleUseHintInsts)>;
-
 def HasPseudoScalarTrans : Predicate<"Subtarget->hasPseudoScalarTrans()">,
   AssemblerPredicate<(all_of FeaturePseudoScalarTrans)>;
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInsertSingleUseVDST.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInsertSingleUseVDST.cpp
deleted file mode 100644
index 43b3bf43fe56db..00000000000000
--- a/llvm/lib/Target/AMDGPU/AMDGPUInsertSingleUseVDST.cpp
+++ /dev/null
@@ -1,245 +0,0 @@
-//===- AMDGPUInsertSingleUseVDST.cpp - Insert s_singleuse_vdst instructions ==//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-/// \file
-/// Insert s_singleuse_vdst instructions on GFX11.5+ to mark regions of VALU
-/// instructions that produce single-use VGPR values. If the value is forwarded
-/// to the consumer instruction prior to VGPR writeback, the hardware can
-/// then skip (kill) the VGPR write.
-//
-//===----------------------------------------------------------------------===//
-
-#include "AMDGPU.h"
-#include "AMDGPUGenSearchableTables.inc"
-#include "GCNSubtarget.h"
-#include "SIInstrInfo.h"
-#include "SIRegisterInfo.h"
-#include "llvm/ADT/DenseMap.h"
-#include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/SmallVector.h"
-#include "llvm/ADT/StringRef.h"
-#include "llvm/CodeGen/MachineBasicBlock.h"
-#include "llvm/CodeGen/MachineFunction.h"
-#include "llvm/CodeGen/MachineFunctionPass.h"
-#include "llvm/CodeGen/MachineInstr.h"
-#include "llvm/CodeGen/MachineInstrBuilder.h"
-#include "llvm/CodeGen/MachineOperand.h"
-#include "llvm/CodeGen/Register.h"
-#include "llvm/IR/DebugLoc.h"
-#include "llvm/MC/MCRegister.h"
-#include "llvm/MC/MCRegisterInfo.h"
-#include "llvm/Pass.h"
-#include <array>
-
-using namespace llvm;
-
-#define DEBUG_TYPE "amdgpu-insert-single-use-vdst"
-
-namespace {
-class AMDGPUInsertSingleUseVDST : public MachineFunctionPass {
-private:
-  const SIInstrInfo *SII;
-  class SingleUseInstruction {
-  private:
-    static const unsigned MaxSkipRange = 0b111;
-    static const unsigned MaxNumberOfSkipRegions = 2;
-
-    unsigned LastEncodedPositionEnd;
-    MachineInstr *ProducerInstr;
-
-    std::array<unsigned, MaxNumberOfSkipRegions + 1> SingleUseRegions;
-    SmallVector<unsigned, MaxNumberOfSkipRegions> SkipRegions;
-
-    // Adds a skip region into the instruction.
-    void skip(const unsigned ProducerPosition) {
-      while (LastEncodedPositionEnd + MaxSkipRange < ProducerPosition) {
-        SkipRegions.push_back(MaxSkipRange);
-        LastEncodedPositionEnd += MaxSkipRange;
-      }
-      SkipRegions.push_back(ProducerPosition - LastEncodedPositionEnd);
-      LastEncodedPositionEnd = ProducerPosition;
-    }
-
-    bool currentRegionHasSpace() {
-      const auto Region = SkipRegions.size();
-      // The first region has an extra bit of encoding space.
-      return SingleUseRegions[Region] <
-             ((Region == MaxNumberOfSkipRegions) ? 0b1111U : 0b111U);
-    }
-
-    unsigned encodeImm() {
-      // Handle the first Single Use Region separately as it has an extra bit
-      // of encoding space.
-      unsigned Imm = SingleUseRegions[SkipRegions.size()];
-      unsigned ShiftAmount = 4;
-      for (unsigned i = SkipRegions.size(); i > 0; i--) {
-        Imm |= SkipRegions[i - 1] << ShiftAmount;
-        ShiftAmount += 3;
-        Imm |= SingleUseRegions[i - 1] << ShiftAmount;
-        ShiftAmount += 3;
-      }
-      return Imm;
-    }
-
-  public:
-    SingleUseInstruction(const unsigned ProducerPosition,
-                         MachineInstr *Producer)
-        : LastEncodedPositionEnd(ProducerPosition + 1), ProducerInstr(Producer),
-          SingleUseRegions({1, 0, 0}) {}
-
-    // Returns false if adding a new single use producer failed. This happens
-    // because it could not be encoded, either because there is no room to
-    // encode another single use producer region or that this single use
-    // producer is too far away to encode the amount of instructions to skip.
-    bool tryAddProducer(const unsigned ProducerPosition, MachineInstr *MI) {
-      // Producer is too far away to encode into this instruction or another
-      // skip region is needed and SkipRegions.size() = 2 so there's no room for
-      // another skip region, therefore a new instruction is needed.
-      if (LastEncodedPositionEnd +
-              (MaxSkipRange * (MaxNumberOfSkipRegions - SkipRegions.size())) <
-          ProducerPosition)
-        return false;
-
-      // If a skip region is needed.
-      if (LastEncodedPositionEnd != ProducerPosition ||
-          !currentRegionHasSpace()) {
-        // If the current region is out of space therefore a skip region would
-        // be needed, but there is no room for another skip region.
-        if (SkipRegions.size() == MaxNumberOfSkipRegions)
-          return false;
-        skip(ProducerPosition);
-      }
-
-      SingleUseRegions[SkipRegions.size()]++;
-      LastEncodedPositionEnd = ProducerPosition + 1;
-      ProducerInstr = MI;
-      return true;
-    }
-
-    auto emit(const SIInstrInfo *SII) {
-      return BuildMI(*ProducerInstr->getParent(), ProducerInstr, DebugLoc(),
-                     SII->get(AMDGPU::S_SINGLEUSE_VDST))
-          .addImm(encodeImm());
-    }
-  };
-
-public:
-  static char ID;
-
-  AMDGPUInsertSingleUseVDST() : MachineFunctionPass(ID) {}
-
-  void insertSingleUseInstructions(
-      ArrayRef<std::pair<unsigned, MachineInstr *>> SingleUseProducers) const {
-    SmallVector<SingleUseInstruction> Instructions;
-
-    for (auto &[Position, MI] : SingleUseProducers) {
-      // Encode this position into the last single use instruction if possible.
-      if (Instructions.empty() ||
-          !Instructions.back().tryAddProducer(Position, MI)) {
-        // If not, add a new instruction.
-        Instructions.push_back(SingleUseInstruction(Position, MI));
-      }
-    }
-
-    for (auto &Instruction : Instructions)
-      Instruction.emit(SII);
-  }
-
-  bool runOnMachineFunction(MachineFunction &MF) override {
-    const auto &ST = MF.getSubtarget<GCNSubtarget>();
-    if (!ST.hasVGPRSingleUseHintInsts())
-      return false;
-
-    SII = ST.getInstrInfo();
-    const auto *TRI = &SII->getRegisterInfo();
-    bool InstructionEmitted = false;
-
-    for (MachineBasicBlock &MBB : MF) {
-      DenseMap<MCRegUnit, unsigned> RegisterUseCount;
-
-      // Handle boundaries at the end of basic block separately to avoid
-      // false positives. If they are live at the end of a basic block then
-      // assume it has more uses later on.
-      for (const auto &Liveout : MBB.liveouts()) {
-        for (MCRegUnitMaskIterator Units(Liveout.PhysReg, TRI); Units.isValid();
-             ++Units) {
-          const auto [Unit, Mask] = *Units;
-          if ((Mask & Liveout.LaneMask).any())
-            RegisterUseCount[Unit] = 2;
-        }
-      }
-
-      SmallVector<std::pair<unsigned, MachineInstr *>>
-          SingleUseProducerPositions;
-
-      unsigned VALUInstrCount = 0;
-      for (MachineInstr &MI : reverse(MBB.instrs())) {
-        // All registers in all operands need to be single use for an
-        // instruction to be marked as a single use producer.
-        bool AllProducerOperandsAreSingleUse = true;
-
-        // Gather a list of Registers used before updating use counts to avoid
-        // double counting registers that appear multiple times in a single
-        // MachineInstr.
-        SmallVector<MCRegUnit> RegistersUsed;
-
-        for (const auto &Operand : MI.all_defs()) {
-          const auto Reg = Operand.getReg();
-
-          const auto RegUnits = TRI->regunits(Reg);
-          if (any_of(RegUnits, [&RegisterUseCount](const MCRegUnit Unit) {
-                return RegisterUseCount[Unit] > 1;
-              }))
-            AllProducerOperandsAreSingleUse = false;
-
-          // Reset uses count when a register is no longer live.
-          for (const MCRegUnit Unit : RegUnits)
-            RegisterUseCount.erase(Unit);
-        }
-
-        for (const auto &Operand : MI.all_uses()) {
-          const auto Reg = Operand.getReg();
-
-          // Count the number of times each register is read.
-          for (const MCRegUnit Unit : TRI->regunits(Reg)) {
-            if (!is_contained(RegistersUsed, Unit))
-              RegistersUsed.push_back(Unit);
-          }
-        }
-        for (const MCRegUnit Unit : RegistersUsed)
-          RegisterUseCount[Unit]++;
-
-        // Do not attempt to optimise across exec mask changes.
-        if (MI.modifiesRegister(AMDGPU::EXEC, TRI) ||
-            AMDGPU::isInvalidSingleUseConsumerInst(MI.getOpcode())) {
-          for (auto &UsedReg : RegisterUseCount)
-            UsedReg.second = 2;
-        }
-
-        if (!SIInstrInfo::isVALU(MI) ||
-            AMDGPU::isInvalidSingleUseProducerInst(MI.getOpcode()))
-          continue;
-        if (AllProducerOperandsAreSingleUse) {
-          SingleUseProducerPositions.push_back({VALUInstrCount, &MI});
-          InstructionEmitted = true;
-        }
-        VALUInstrCount++;
-      }
-      insertSingleUseInstructions(SingleUseProducerPositions);
-    }
-    return InstructionEmitted;
-  }
-};
-} // namespace
-
-char AMDGPUInsertSingleUseVDST::ID = 0;
-
-char &llvm::AMDGPUInsertSingleUseVDSTID = AMDGPUInsertSingleUseVDST::ID;
-
-INITIALIZE_PASS(AMDGPUInsertSingleUseVDST, DEBUG_TYPE,
-                "AMDGPU Insert SingleUseVDST", false, false)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 04fdee0819b502..abd50748f2cc05 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -311,12 +311,6 @@ static cl::opt<bool> EnableSIModeRegisterPass(
   cl::init(true),
   cl::Hidden);
 
-// Enable GFX11.5+ s_singleuse_vdst insertion
-static cl::opt<bool>
-    EnableInsertSingleUseVDST("amdgpu-enable-single-use-vdst",
-                              cl::desc("Enable s_singleuse_vdst insertion"),
-                              cl::init(false), cl::Hidden);
-
 // Enable GFX11+ s_delay_alu insertion
 static cl::opt<bool>
     EnableInsertDelayAlu("amdgpu-enable-delay-alu",
@@ -450,7 +444,6 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeAMDGPURewriteUndefForPHILegacyPass(*PR);
   initializeAMDGPUUnifyMetadataPass(*PR);
   initializeSIAnnotateControlFlowLegacyPass(*PR);
-  initializeAMDGPUInsertSingleUseVDSTPass(*PR);
   initializeAMDGPUInsertDelayAluPass(*PR);
   initializeSIInsertHardClausesPass(*PR);
   initializeSIInsertWaitcntsPass(*PR);
@@ -1518,9 +1511,6 @@ void GCNPassConfig::addPreEmitPass() {
   // cases.
   addPass(&PostRAHazardRecognizerID);
 
-  if (isPassEnabled(EnableInsertSingleUseVDST, CodeGenOptLevel::Less))
-    addPass(&AMDGPUInsertSingleUseVDSTID);
-
   if (isPassEnabled(EnableInsertDelayAlu, CodeGenOptLevel::Less))
     addPass(&AMDGPUInsertDelayAluID);
 
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index e813653158e5d9..7c883cc2017ddd 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -81,7 +81,6 @@ add_llvm_target(AMDGPUCodeGen
   AMDGPUMCInstLower.cpp
   AMDGPUMemoryUtils.cpp
   AMDGPUIGroupLP.cpp
-  AMDGPUInsertSingleUseVDST.cpp
   AMDGPUMarkLastScratchLoad.cpp
   AMDGPUMIRFormatter.cpp
   AMDGPUOpenCLEnqueuedBlockLowering.cpp
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index a4ae8a1be32258..e6b7342d5fffcf 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -215,7 +215,6 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   bool HasPackedTID = false;
   bool ScalarizeGlobal = false;
   bool HasSALUFloatInsts = false;
-  bool HasVGPRSingleUseHintInsts = false;
   bool HasPseudoScalarTrans = false;
   bool HasRestrictedSOffset = false;
 
@@ -1280,8 +1279,6 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
 
   bool hasSALUFloatInsts() const { return HasSALUFloatInsts; }
 
-  bool hasVGPRSingleUseHintInsts() const { return HasVGPRSingleUseHintInsts; }
-
   bool hasPseudoScalarTrans() const { return HasPseudoScalarTrans; }
 
   bool hasRestrictedSOffset() const { return HasRestrictedSOffset; }
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
index c016be2fc6c0fb..087ca1f954464d 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
@@ -2409,8 +2409,6 @@ class VOPProfile <list<ValueType> _ArgVT, bit _EnableClamp = 0> {
   field bit EnableClamp = _EnableClamp;
   field bit IsTrue16 = 0;
   field bit IsRealTrue16 = 0;
-  field bit IsInvalidSingleUseConsumer = 0;
-  field bit IsInvalidSingleUseProducer = 0;
 
   field ValueType DstVT = ArgVT[0];
   field ValueType Src0VT = ArgVT[1];
diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index 2e73a1a15f6b32..9da27a7c7ee7d6 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1752,11 +1752,6 @@ let OtherPredicates = [HasExportInsts] in
                                 "$simm16">;
 } // End SubtargetPredicate = isGFX11Plus
 
-let SubtargetPredicate = HasVGPRSingleUseHintInsts in {
-  def S_SINGLEUSE_VDST :
-    SOPP_Pseudo<"s_singleuse_vdst", (ins s16imm:$simm16), "$simm16">;
-} // End SubtargetPredicate = HasVGPRSingeUseHintInsts
-
 let SubtargetPredicate = isGFX12Plus, hasSideEffects = 1 in {
   def S_WAIT_LOADCNT :
     SOPP_Pseudo<"s_wait_loadcnt", (ins s16imm:$simm16), "$simm16",
@@ -2676,12 +2671,6 @@ defm S_ICACHE_INV                 : SOPP_Real_32_gfx11_gfx12<0x03c>;
 
 defm S_BARRIER                    : SOPP_Real_32_gfx11<0x03d>;
 
-//===----------------------------------------------------------------------===//
-// SOPP - GFX1150, GFX12.
-//===----------------------------------------------------------------------===//
-
-defm S_SINGLEUSE_VDST             : SOPP_Real_32_gfx11_gfx12<0x013>;
-
 //===----------------------------------------------------------------------===//
 // SOPP - GFX6, GFX7, GFX8, GFX9, GFX10
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 8b5ec8793d84a2..f32c82f1e4ba4c 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -379,12 +379,6 @@ struct VOPTrue16Info {
   bool IsTrue16;
 };
 
-struct SingleUseExceptionInfo {
-  uint16_t Opcode;
-  bool IsInvalidSingleUseConsumer;
-  bool IsInvalidSingleUseProducer;
-};
-
 struct FP8DstByteSelInfo {
   uint16_t Opcode;
   bool HasFP8DstByteSel;
@@ -396,8 +390,6 @@ struct FP8DstByteSelInfo {
 #define GET_MTBUFInfoTable_IMPL
 #define GET_MUBUFInfoTable_DECL
 #define GET_MUBUFInfoTable_IMPL
-#define GET_SingleUseExceptionTable_DECL
-#define GET_SingleUseExceptionTable_IMPL
 #define GET_SMInfoTable_DECL
 #define GET_SMInfoTable_IMPL
 #define GET_VOP1InfoTable_DECL
@@ -626,16 +618,6 @@ bool isTrue16Inst(unsigned Opc) {
   return Info ? Info->IsTrue16 : false;
 }
 
-bool isInvalidSingleUseConsumerInst(unsigned Opc) {
-  const SingleUseExceptionInfo *Info = getSingleUseExceptionHelper(Opc);
-  return Info && Info->IsInvalidSingleUseConsumer;
-}
-
-bool isInvalidSingleUseProducerInst(unsigned Opc) {
-  const SingleUseExceptionInfo *Info = getSingleUseExceptionHelper(Opc);
-  return Info && Info->IsInvalidSingleUseProducer;
-}
-
 bool isFP8DstSelInst(unsigned Opc) {
   const FP8DstByteSelInfo *Info = getFP8DstByteSelHelper(Opc);
   return Info ? Info->HasFP8DstByteSel : false;
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
index 35c080d8e0bebc..da37534f2fa4ff 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
@@ -870,6 +870,8 @@ bool isInvalidSingleUseConsumerInst(unsigned Opc);
 LLVM_READONLY
 bool isInvalidSingleUseProducerInst(unsigned Opc);
 
+bool isDPMACCInstruction(unsigned Opc);
+
 LLVM_READONLY
 unsigned mapWMMA2AddrTo3AddrOpcode(unsigned Opc);
 
diff --git a/llvm/lib/Target/AMDGPU/VOP1Instructions.td b/llvm/lib/Target/AMDGPU/VOP1Instructions.td
index 33f2f9f1f5c5b9..bd805059705783 100644
--- a/llvm/lib/Target/AMDGPU/VOP1Instructions.td
+++ b/llvm/lib/Target/AMDGPU...
[truncated]

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM given that it was never enabled.

@ScottEgerton ScottEgerton merged commit 396f677 into llvm:main Sep 24, 2024
12 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 24, 2024

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/6260

Here is the relevant piece of the build log for the reference
Step 3 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-build.sh --jobs=' (failure)
...
[38/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/memmove-hip-6.0.2.dir/memmove.hip.o -o External/HIP/memmove-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/memmove.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/memmove.reference_output-hip-6.0.2
[39/40] /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /buildbot/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
[40/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -o External/HIP/TheNextWeek-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/TheNextWeek.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/TheNextWeek.reference_output-hip-6.0.2
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja -v check-hip-simple
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 7 tests, 7 workers --
Testing:  0.. 10.. 20.. 30.. 40
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 7)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

Input 1:
Memory access fault by GPU node-1 (Agent handle: 0x55fa24479ac0) on address (nil). Reason: Page not present or supervisor privilege.
exit 134

Input 2:
image width = 1200 height = 675
block size = (16, 16) grid size = (75, 43)
Start rendering by GPU.
Done.
gpu.ppm and ref.ppm are the same.
exit 0

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 331.82s

Total Discovered Tests: 7
  Passed: 6 (85.71%)
  Failed: 1 (14.29%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
Step 12 (Testing HIP test-suite) failure: Testing HIP test-suite (failure)
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja -v check-hip-simple
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 7 tests, 7 workers --
Testing:  0.. 10.. 20.. 30.. 40
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 7)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

Input 1:
Memory access fault by GPU node-1 (Agent handle: 0x55fa24479ac0) on address (nil). Reason: Page not present or supervisor privilege.
exit 134

Input 2:
image width = 1200 height = 675
block size = (16, 16) grid size = (75, 43)
Start rendering by GPU.
Done.
gpu.ppm and ref.ppm are the same.
exit 0

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 331.82s

Total Discovered Tests: 7
  Passed: 6 (85.71%)
  Failed: 1 (14.29%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
program finished with exit code 1
elapsedTime=668.438789

@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 24, 2024

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/1225

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/35/87' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-20544-35-87.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=87 GTEST_SHARD_INDEX=35 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************


augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Sep 26, 2024
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
rocm-ci pushed a commit to ROCm/llvm-project that referenced this pull request Dec 20, 2024
Fixes SWDEV-498470

(cherry picked from commit 396f677)
Change-Id: If965dab720db1a0fe76af6014ae6e57476c0c6d3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants