[LLVM] Add __builtin_readsteadycounter intrinsic and builtin for realtime clocks #81331
Figured out where to specify that the transformation is legal for the target.
You can test this locally with the following command:
git-clang-format --diff 2dbfa8407e7d2f4293add33b5ead3f2d5fcd04e9 4a0ee4be9690e0665ca93d63ffdd2dea404fd72d -- clang/lib/CodeGen/CGBuiltin.cpp clang/test/CodeGen/builtins.c llvm/include/llvm/CodeGen/ISDOpcodes.h llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp llvm/lib/CodeGen/IntrinsicLowering.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp llvm/lib/CodeGen/TargetLoweringBase.cpp llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/lib/Target/AMDGPU/SIISelLowering.cpp llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 274b9e7a04..4feb9d6ec0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -2649,7 +2649,9 @@ void DAGTypeLegalizer::ExpandIntegerResult(SDNode *N, unsigned ResNo) {
case ISD::LOAD: ExpandIntRes_LOAD(cast<LoadSDNode>(N), Lo, Hi); break;
case ISD::MUL: ExpandIntRes_MUL(N, Lo, Hi); break;
case ISD::READCYCLECOUNTER:
- case ISD::READSTEADYCOUNTER: ExpandIntRes_READCOUNTER(N, Lo, Hi); break;
+ case ISD::READSTEADYCOUNTER:
+ ExpandIntRes_READCOUNTER(N, Lo, Hi);
+ break;
case ISD::SDIV: ExpandIntRes_SDIV(N, Lo, Hi); break;
case ISD::SIGN_EXTEND: ExpandIntRes_SIGN_EXTEND(N, Lo, Hi); break;
case ISD::SIGN_EXTEND_INREG: ExpandIntRes_SIGN_EXTEND_INREG(N, Lo, Hi); break;
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 9114987162..2b09703084 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -439,7 +439,7 @@ private:
void ExpandIntRes_CTPOP (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_CTTZ (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_LOAD (LoadSDNode *N, SDValue &Lo, SDValue &Hi);
- void ExpandIntRes_READCOUNTER (SDNode *N, SDValue &Lo, SDValue &Hi);
+ void ExpandIntRes_READCOUNTER(SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_SIGN_EXTEND (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_SIGN_EXTEND_INREG (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_TRUNCATE (SDNode *N, SDValue &Lo, SDValue &Hi);
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index 0fbd999694..f06462ac5f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -104,7 +104,8 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::ATOMIC_STORE: return "AtomicStore";
case ISD::PCMARKER: return "PCMarker";
case ISD::READCYCLECOUNTER: return "ReadCycleCounter";
- case ISD::READSTEADYCOUNTER: return "ReadSteadyCounter";
+ case ISD::READSTEADYCOUNTER:
+ return "ReadSteadyCounter";
case ISD::SRCVALUE: return "SrcValue";
case ISD::MDNODE_SDNODE: return "MDNode";
case ISD::EntryToken: return "EntryToken";
Title changed from "__builtin_readfixedtimer intrinsic and builtin" to "__builtin_readfixedtimer intrinsic and builtin for realtime clocks"
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-clang
Author: Joseph Huber (jhuber6)
Changes
Summary: This patch only adds support for the NVPTX and AMDGPU targets. This is done as a new and separate builtin rather than an argument to
Patch is 21.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81331.diff
26 Files Affected:
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index e91156837290f7..a30fc15183bfd4 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -2764,6 +2764,37 @@ Query for this feature with ``__has_builtin(__builtin_readcyclecounter)``. Note
that even if present, its use may depend on run-time privilege or other OS
controlled state.
+``__builtin_readfixedtimer``
+------------------------------
+
+``__builtin_readfixedtimer`` is used to access the fixed frequency counter
+register (or a similar steady-rate clock) on those targets that support it.
+The function is similar to ``__builtin_readcyclecounter`` above except that the
+frequency is fixed, making it suitable for measuring elapsed time.
+
+**Syntax**:
+
+.. code-block:: c++
+
+ __builtin_readfixedtimer()
+
+**Example of Use**:
+
+.. code-block:: c++
+
+ unsigned long long t0 = __builtin_readfixedtimer();
+ do_something();
+ unsigned long long t1 = __builtin_readfixedtimer();
+ unsigned long long secs_to_do_something = (t1 - t0) / tick_rate;
+
+**Description**:
+
+The ``__builtin_readfixedtimer()`` builtin returns the frequency counter value.
+When not supported by the target, the return value is always zero. This builtin
+takes no arguments and produces an unsigned long long result.
+
+Query for this feature with ``__has_builtin(__builtin_readfixedtimer)``.
+
``__builtin_dump_struct``
-------------------------
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 31a2bdeb2d3e5e..3bc043b35e187b 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -1110,6 +1110,12 @@ def ReadCycleCounter : Builtin {
let Prototype = "unsigned long long int()";
}
+def ReadFixedTimer : Builtin {
+ let Spellings = ["__builtin_readfixedtimer"];
+ let Attributes = [NoThrow];
+ let Prototype = "unsigned long long int()";
+}
+
def Trap : Builtin {
let Spellings = ["__builtin_trap"];
let Attributes = [NoThrow, NoReturn];
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index a7a410dab1a018..8da8bbc56758d5 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -3443,6 +3443,10 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
Function *F = CGM.getIntrinsic(Intrinsic::readcyclecounter);
return RValue::get(Builder.CreateCall(F));
}
+ case Builtin::BI__builtin_readfixedtimer: {
+ Function *F = CGM.getIntrinsic(Intrinsic::readfixedtimer);
+ return RValue::get(Builder.CreateCall(F));
+ }
case Builtin::BI__builtin___clear_cache: {
Value *Begin = EmitScalarExpr(E->getArg(0));
Value *End = EmitScalarExpr(E->getArg(1));
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index 349d1286c8dc4f..882e80c521e897 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1179,6 +1179,12 @@ enum NodeType {
/// counter-like register (or other high accuracy low latency clock source).
READCYCLECOUNTER,
+ /// READFIXEDTIMER - This corresponds to the readfixedtimer intrinsic.
+ /// It has the same semantics as the READCYCLECOUNTER implementation except
+ /// that the result is the content of the architecture-specific fixed
+ /// frequency counter suitable for measuring elapsed time.
+ READFIXEDTIMER,
+
/// HANDLENODE node - Used as a handle for various purposes.
HANDLENODE,
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 3c19c7b063652c..4d7c57944f3778 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -870,6 +870,8 @@ def int_pcmarker : DefaultAttrsIntrinsic<[], [llvm_i32_ty]>;
def int_readcyclecounter : DefaultAttrsIntrinsic<[llvm_i64_ty]>;
+def int_readfixedtimer : DefaultAttrsIntrinsic<[llvm_i64_ty]>;
+
// The assume intrinsic is marked InaccessibleMemOnly so that proper control
// dependencies will be maintained.
def int_assume : DefaultAttrsIntrinsic<
diff --git a/llvm/include/llvm/Support/TargetOpcodes.def b/llvm/include/llvm/Support/TargetOpcodes.def
index abb237083d254e..29c6b6488ebb72 100644
--- a/llvm/include/llvm/Support/TargetOpcodes.def
+++ b/llvm/include/llvm/Support/TargetOpcodes.def
@@ -352,6 +352,9 @@ HANDLE_TARGET_OPCODE(G_INTRINSIC_ROUNDEVEN)
/// INTRINSIC readcyclecounter
HANDLE_TARGET_OPCODE(G_READCYCLECOUNTER)
+/// INTRINSIC readfixedtimer
+HANDLE_TARGET_OPCODE(G_READFIXEDTIMER)
+
/// Generic load (including anyext load)
HANDLE_TARGET_OPCODE(G_LOAD)
diff --git a/llvm/include/llvm/Target/GenericOpcodes.td b/llvm/include/llvm/Target/GenericOpcodes.td
index 2c73b67f9e1af0..1b3f84c1d782bf 100644
--- a/llvm/include/llvm/Target/GenericOpcodes.td
+++ b/llvm/include/llvm/Target/GenericOpcodes.td
@@ -1101,6 +1101,12 @@ def G_READCYCLECOUNTER : GenericInstruction {
let hasSideEffects = true;
}
+def G_READFIXEDTIMER : GenericInstruction {
+ let OutOperandList = (outs type0:$dst);
+ let InOperandList = (ins);
+ let hasSideEffects = true;
+}
+
//------------------------------------------------------------------------------
// Memory ops
//------------------------------------------------------------------------------
diff --git a/llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td b/llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td
index f792237203b431..e3e9622ae919d9 100644
--- a/llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td
+++ b/llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td
@@ -167,6 +167,7 @@ def : GINodeEquiv<G_FMAXNUM_IEEE, fmaxnum_ieee>;
def : GINodeEquiv<G_FMAXIMUM, fmaximum>;
def : GINodeEquiv<G_FMINIMUM, fminimum>;
def : GINodeEquiv<G_READCYCLECOUNTER, readcyclecounter>;
+def : GINodeEquiv<G_READFIXEDTIMER, readfixedtimer>;
def : GINodeEquiv<G_ROTR, rotr>;
def : GINodeEquiv<G_ROTL, rotl>;
def : GINodeEquiv<G_LROUND, lround>;
diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td
index 22360353790dbc..bedfa6807f425a 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -657,6 +657,9 @@ def prefetch : SDNode<"ISD::PREFETCH" , SDTPrefetch,
def readcyclecounter : SDNode<"ISD::READCYCLECOUNTER", SDTIntLeaf,
[SDNPHasChain, SDNPSideEffect]>;
+def readfixedtimer : SDNode<"ISD::READFIXEDTIMER", SDTIntLeaf,
+ [SDNPHasChain, SDNPSideEffect]>;
+
def membarrier : SDNode<"ISD::MEMBARRIER", SDTNone,
[SDNPHasChain, SDNPSideEffect]>;
diff --git a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
index c1d8e890a66edb..2b2d03ab69c01c 100644
--- a/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -1885,6 +1885,8 @@ unsigned IRTranslator::getSimpleIntrinsicOpcode(Intrinsic::ID ID) {
return TargetOpcode::G_INTRINSIC_TRUNC;
case Intrinsic::readcyclecounter:
return TargetOpcode::G_READCYCLECOUNTER;
+ case Intrinsic::readfixedtimer:
+ return TargetOpcode::G_READFIXEDTIMER;
case Intrinsic::ptrmask:
return TargetOpcode::G_PTRMASK;
case Intrinsic::lrint:
diff --git a/llvm/lib/CodeGen/IntrinsicLowering.cpp b/llvm/lib/CodeGen/IntrinsicLowering.cpp
index 61920a0e04ab59..106799bc306850 100644
--- a/llvm/lib/CodeGen/IntrinsicLowering.cpp
+++ b/llvm/lib/CodeGen/IntrinsicLowering.cpp
@@ -312,6 +312,12 @@ void IntrinsicLowering::LowerIntrinsicCall(CallInst *CI) {
CI->replaceAllUsesWith(ConstantInt::get(Type::getInt64Ty(Context), 0));
break;
}
+ case Intrinsic::readfixedtimer: {
+ errs() << "WARNING: this target does not support the llvm.readfixedtimer"
+ << " intrinsic. It is being lowered to a constant 0\n";
+ CI->replaceAllUsesWith(ConstantInt::get(Type::getInt64Ty(Context), 0));
+ break;
+ }
case Intrinsic::dbg_declare:
case Intrinsic::dbg_label:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 892bfbd62f0d02..29c2356cdc438c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -1127,8 +1127,9 @@ void SelectionDAGLegalize::LegalizeOp(SDNode *Node) {
Action = TargetLowering::Custom;
break;
case ISD::READCYCLECOUNTER:
- // READCYCLECOUNTER returns an i64, even if type legalization might have
- // expanded that to several smaller types.
+ case ISD::READFIXEDTIMER:
+ // READCYCLECOUNTER and READFIXEDTIMER return an i64, even if type
+ // legalization might have expanded that to several smaller types.
Action = TLI.getOperationAction(Node->getOpcode(), MVT::i64);
break;
case ISD::READ_REGISTER:
@@ -3080,6 +3081,7 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Results.push_back(Node->getOperand(0));
break;
case ISD::READCYCLECOUNTER:
+ case ISD::READFIXEDTIMER:
// If the target didn't expand this, just return 'zero' and preserve the
// chain.
Results.append(Node->getNumValues() - 1,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 39b7e061554141..643940b69f92ef 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -2648,7 +2648,8 @@ void DAGTypeLegalizer::ExpandIntegerResult(SDNode *N, unsigned ResNo) {
case ISD::LLRINT: ExpandIntRes_XROUND_XRINT(N, Lo, Hi); break;
case ISD::LOAD: ExpandIntRes_LOAD(cast<LoadSDNode>(N), Lo, Hi); break;
case ISD::MUL: ExpandIntRes_MUL(N, Lo, Hi); break;
- case ISD::READCYCLECOUNTER: ExpandIntRes_READCYCLECOUNTER(N, Lo, Hi); break;
+ case ISD::READCYCLECOUNTER:
+ case ISD::READFIXEDTIMER: ExpandIntRes_READCOUNTER(N, Lo, Hi); break;
case ISD::SDIV: ExpandIntRes_SDIV(N, Lo, Hi); break;
case ISD::SIGN_EXTEND: ExpandIntRes_SIGN_EXTEND(N, Lo, Hi); break;
case ISD::SIGN_EXTEND_INREG: ExpandIntRes_SIGN_EXTEND_INREG(N, Lo, Hi); break;
@@ -4026,8 +4027,8 @@ void DAGTypeLegalizer::ExpandIntRes_MUL(SDNode *N,
Lo, Hi);
}
-void DAGTypeLegalizer::ExpandIntRes_READCYCLECOUNTER(SDNode *N, SDValue &Lo,
- SDValue &Hi) {
+void DAGTypeLegalizer::ExpandIntRes_READCOUNTER(SDNode *N, SDValue &Lo,
+ SDValue &Hi) {
SDLoc DL(N);
EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
SDVTList VTs = DAG.getVTList(NVT, NVT, MVT::Other);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 09f0bca8b8611e..91149871628574 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -439,7 +439,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
void ExpandIntRes_CTPOP (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_CTTZ (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_LOAD (LoadSDNode *N, SDValue &Lo, SDValue &Hi);
- void ExpandIntRes_READCYCLECOUNTER (SDNode *N, SDValue &Lo, SDValue &Hi);
+ void ExpandIntRes_READCOUNTER (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_SIGN_EXTEND (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_SIGN_EXTEND_INREG (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_TRUNCATE (SDNode *N, SDValue &Lo, SDValue &Hi);
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 5ce1013f30fd1b..ee1164c48140b3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -6781,6 +6781,14 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
DAG.setRoot(Res.getValue(1));
return;
}
+ case Intrinsic::readfixedtimer: {
+ SDValue Op = getRoot();
+ Res = DAG.getNode(ISD::READFIXEDTIMER, sdl,
+ DAG.getVTList(MVT::i64, MVT::Other), Op);
+ setValue(&I, Res);
+ DAG.setRoot(Res.getValue(1));
+ return;
+ }
case Intrinsic::bitreverse:
setValue(&I, DAG.getNode(ISD::BITREVERSE, sdl,
getValue(I.getArgOperand(0)).getValueType(),
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index a28d834f0522f2..0d308fa8ad16be 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -104,6 +104,7 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::ATOMIC_STORE: return "AtomicStore";
case ISD::PCMARKER: return "PCMarker";
case ISD::READCYCLECOUNTER: return "ReadCycleCounter";
+ case ISD::READFIXEDTIMER: return "ReadFixedTimer";
case ISD::SRCVALUE: return "SrcValue";
case ISD::MDNODE_SDNODE: return "MDNode";
case ISD::EntryToken: return "EntryToken";
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index d8302ba2b42608..e06b266ba3cc1d 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -964,6 +964,9 @@ void TargetLoweringBase::initActions() {
// Most targets also ignore the @llvm.readcyclecounter intrinsic.
setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Expand);
+ // Most targets also ignore the @llvm.readfixedtimer intrinsic.
+ setOperationAction(ISD::READFIXEDTIMER, MVT::i64, Expand);
+
// ConstantFP nodes default to expand. Targets can either change this to
// Legal, in which case all fp constants are legal, or use isFPImmLegal()
// to optimize expansions for certain constants.
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 97952de3e6a37b..63f843ea94fd71 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -1957,6 +1957,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
getActionDefinitionsBuilder(G_READCYCLECOUNTER)
.legalFor({S64});
+ getActionDefinitionsBuilder(G_READFIXEDTIMER).legalFor({S64});
+
getActionDefinitionsBuilder(G_FENCE)
.alwaysLegal();
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index 5323e4fc58de80..4a2c2ecdd8dd8b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -4051,6 +4051,7 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
case AMDGPU::G_CONSTANT:
case AMDGPU::G_GLOBAL_VALUE:
case AMDGPU::G_BLOCK_ADDR:
+ case AMDGPU::G_READFIXEDTIMER:
case AMDGPU::G_READCYCLECOUNTER: {
unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a64a9e608f2173..361e7421f15dc4 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -468,6 +468,10 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
// On SI this is s_memtime and s_memrealtime on VI.
setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);
+
+ if (Subtarget->hasSMemRealTime() ||
+ Subtarget->getGeneration() >= AMDGPUSubtarget::GFX11)
+ setOperationAction(ISD::READFIXEDTIMER, MVT::i64, Legal);
setOperationAction({ISD::TRAP, ISD::DEBUGTRAP}, MVT::Other, Custom);
if (Subtarget->has16BitInsts()) {
diff --git a/llvm/lib/Target/AMDGPU/SMInstructions.td b/llvm/lib/Target/AMDGPU/SMInstructions.td
index f3096962e2f3e8..680b32dd855e55 100644
--- a/llvm/lib/Target/AMDGPU/SMInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -1068,6 +1068,20 @@ def : GCNPat <
}
} // let OtherPredicates = [HasShaderCyclesRegister]
+let OtherPredicates = [HasSMemRealTime] in {
+def : GCNPat <
+ (i64 (readfixedtimer)),
+ (S_MEMREALTIME)
+>;
+} // let OtherPredicates = [HasSMemRealTime]
+
+let SubtargetPredicate = isGFX11Plus in {
+def : GCNPat <
+ (i64 (readfixedtimer)),
+ (S_SENDMSG_RTN_B64 (i32 /*MSG_RTN_GET_REALTIME=*/0x83))
+>;
+} // let SubtargetPredicate = [isGFX11Plus]
+
def i32imm_zero : TImmLeaf <i32, [{
return Imm == 0;
}]>;
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index c7bc623a88e1b9..47b2d538063f9d 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -489,6 +489,9 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v2f16, Expand);
setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2f16, Expand);
+ if (STI.getSmVersion() >= 30 && STI.getPTXVersion() > 31)
+ setOperationAction(ISD::READFIXEDTIMER, MVT::i64, Legal);
+
setFP16OperationAction(ISD::SETCC, MVT::f16, Legal, Promote);
setFP16OperationAction(ISD::SETCC, MVT::v2f16, Legal, Expand);
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index 365afc6bd8c617..da99e3ec125345 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -3759,7 +3759,6 @@ def CALL_PROTOTYPE :
include "NVPTXIntrinsics.td"
-
//-----------------------------------
// Notes
//-----------------------------------
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 2330d7213c26dc..3d9023dc6697b4 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -6372,12 +6372,16 @@ def INT_PTX_SREG_LANEMASK_GE :
def INT_PTX_SREG_LANEMASK_GT :
PTX_READ_SREG_R32<"lanemask_gt", int_nvvm_read_ptx_sreg_lanemask_gt>;
+let hasSideEffects = 1 in {
def INT_PTX_SREG_CLOCK :
PTX_READ_SREG_R32<"clock", int_nvvm_read_ptx_sreg_clock>;
def INT_PTX_SREG_CLOCK64 :
PTX_READ_SREG_R64<"clock64", int_nvvm_read_ptx_sreg_clock64>;
def INT_PTX_SREG_GLOBALTIMER :
PTX_READ_SREG_R64<"globaltimer", int_nvvm_read_ptx_sreg_globaltimer>;
+}
+
+def: Pat <(i64 (readfixedtimer)), (INT_PTX_SREG_GLOBALTIMER)>;
def INT_PTX_SREG_PM0 : PTX_READ_SREG_R32<"pm0", int_nvvm_read_ptx_sreg_pm0>;
def INT_PTX_SREG_PM1 : PTX_READ_SREG_R32<"pm1", int_nvvm_read_ptx_sreg_pm1>;
diff --git a/llvm/test/CodeGen/AMDGPU/readfixedtimer.ll b/llvm/test/CodeGen/AMDGPU/readfixedtimer.ll
new file mode 100644
index 00000000000000..e6af30b90b138d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/readfixedtimer.ll
@@ -0,0 +1,24 @@
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX700
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX900
+; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX900
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX1100
+; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX1100
+
+declare i64 @llvm.readfixedtimer() #0
+
+; GCN-LABEL: {{^}}test_readfixedtimer:
+; GFX700: s_mov_b32 s[[REG:[0-9]+]], 0
+; GFX900: s_memrealtime s[[[LO:[0-9]+]]:[[HI:[0-9]+]]]
+; GFX900: s_memrealtime s[[[LO:[0-9]+]]:[[HI:[0-9]+]]]
+; GFX1100: s_sendmsg_rtn_b64 s[[[LO:[0-9]+]]:[[HI:[0-9]+]]], sendmsg(MSG_RTN_GET_REALTIME)
+; GFX1100: s_sendmsg_rtn_b64 s[[[LO:[0-9]+]]:[[HI:[0-9]+]]], sendmsg(MSG_RTN_GET_REALTIME)
+define amdgpu_kernel void @test_readfixedtimer(ptr addrspace(1) %out) #0 {
+ %cycle0 = call i64 @llvm.readfixedtimer()
+ store volatile i64 %cycle0, ptr addrspace(1) %out
+
+ %cycle1 = call i64 @llvm.readfixedtimer()
+ store volatile i64...
[truncated]
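The documentation hunk above describes reading the counter twice and dividing by a tick rate, and notes that unsupported targets return zero. A minimal sketch of that pattern follows, assuming the builtin keeps the `__builtin_readfixedtimer` spelling used in this revision of the patch. The helpers `read_ticks` and `elapsed_seconds`, the `std::chrono` fallback, and the caller-supplied tick rate are hypothetical and not part of the patch; the real tick rate is target-specific and is not provided by the builtin.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Older compilers may not provide __has_builtin at all.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical helper (not part of the patch): read the fixed-frequency
// counter when the builtin is available, otherwise fall back to the
// host's steady clock, expressed in nanoseconds.
static std::uint64_t read_ticks() {
#if __has_builtin(__builtin_readfixedtimer)
  return __builtin_readfixedtimer();
#else
  auto now = std::chrono::steady_clock::now().time_since_epoch();
  return static_cast<std::uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
#endif
}

// Hypothetical helper: convert a tick delta to seconds. The subtraction is
// done on unsigned 64-bit values, so a single counter wrap still yields the
// correct delta. ticks_per_second is an assumption supplied by the caller.
static double elapsed_seconds(std::uint64_t t0, std::uint64_t t1,
                              double ticks_per_second) {
  assert(ticks_per_second > 0.0);
  return static_cast<double>(t1 - t0) / ticks_per_second;
}
```

Dividing in floating point avoids the truncation to whole seconds that the integer division in the documentation example would produce. On a target where the intrinsic is expanded to the constant zero, both reads return 0 and the computed interval is 0, so callers that need a real measurement should check `__has_builtin` first, as the documentation suggests.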
case ISD::SIGN_EXTEND_INREG: ExpandIntRes_SIGN_EXTEND_INREG(N, Lo, Hi); break;
@@ -4026,8 +4027,8 @@ void DAGTypeLegalizer::ExpandIntRes_MUL(SDNode *N,
Lo, Hi);
}
-void DAGTypeLegalizer::ExpandIntRes_READCYCLECOUNTER(SDNode *N, SDValue &Lo,
- SDValue &Hi) {
+void DAGTypeLegalizer::ExpandIntRes_READCOUNTER(SDNode *N, SDValue &Lo,
+ SDValue &Hi) {
SDLoc DL(N);
EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
SDVTList VTs = DAG.getVTList(NVT, NVT, MVT::Other);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 09f0bca8b8611e..91149871628574 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -439,7 +439,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
void ExpandIntRes_CTPOP (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_CTTZ (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_LOAD (LoadSDNode *N, SDValue &Lo, SDValue &Hi);
- void ExpandIntRes_READCYCLECOUNTER (SDNode *N, SDValue &Lo, SDValue &Hi);
+ void ExpandIntRes_READCOUNTER (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_SIGN_EXTEND (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_SIGN_EXTEND_INREG (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_TRUNCATE (SDNode *N, SDValue &Lo, SDValue &Hi);
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 5ce1013f30fd1b..ee1164c48140b3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -6781,6 +6781,14 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
DAG.setRoot(Res.getValue(1));
return;
}
+ case Intrinsic::readfixedtimer: {
+ SDValue Op = getRoot();
+ Res = DAG.getNode(ISD::READFIXEDTIMER, sdl,
+ DAG.getVTList(MVT::i64, MVT::Other), Op);
+ setValue(&I, Res);
+ DAG.setRoot(Res.getValue(1));
+ return;
+ }
case Intrinsic::bitreverse:
setValue(&I, DAG.getNode(ISD::BITREVERSE, sdl,
getValue(I.getArgOperand(0)).getValueType(),
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index a28d834f0522f2..0d308fa8ad16be 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -104,6 +104,7 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::ATOMIC_STORE: return "AtomicStore";
case ISD::PCMARKER: return "PCMarker";
case ISD::READCYCLECOUNTER: return "ReadCycleCounter";
+ case ISD::READFIXEDTIMER: return "ReadFixedTimer";
case ISD::SRCVALUE: return "SrcValue";
case ISD::MDNODE_SDNODE: return "MDNode";
case ISD::EntryToken: return "EntryToken";
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index d8302ba2b42608..e06b266ba3cc1d 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -964,6 +964,9 @@ void TargetLoweringBase::initActions() {
// Most targets also ignore the @llvm.readcyclecounter intrinsic.
setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Expand);
+ // Most targets also ignore the @llvm.readfixedtimer intrinsic.
+ setOperationAction(ISD::READFIXEDTIMER, MVT::i64, Expand);
+
// ConstantFP nodes default to expand. Targets can either change this to
// Legal, in which case all fp constants are legal, or use isFPImmLegal()
// to optimize expansions for certain constants.
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 97952de3e6a37b..63f843ea94fd71 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -1957,6 +1957,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
getActionDefinitionsBuilder(G_READCYCLECOUNTER)
.legalFor({S64});
+ getActionDefinitionsBuilder(G_READFIXEDTIMER).legalFor({S64});
+
getActionDefinitionsBuilder(G_FENCE)
.alwaysLegal();
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index 5323e4fc58de80..4a2c2ecdd8dd8b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -4051,6 +4051,7 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
case AMDGPU::G_CONSTANT:
case AMDGPU::G_GLOBAL_VALUE:
case AMDGPU::G_BLOCK_ADDR:
+ case AMDGPU::G_READFIXEDTIMER:
case AMDGPU::G_READCYCLECOUNTER: {
unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a64a9e608f2173..361e7421f15dc4 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -468,6 +468,10 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
// On SI this is s_memtime and s_memrealtime on VI.
setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);
+
+ if (Subtarget->hasSMemRealTime() ||
+ Subtarget->getGeneration() >= AMDGPUSubtarget::GFX11)
+ setOperationAction(ISD::READFIXEDTIMER, MVT::i64, Legal);
setOperationAction({ISD::TRAP, ISD::DEBUGTRAP}, MVT::Other, Custom);
if (Subtarget->has16BitInsts()) {
diff --git a/llvm/lib/Target/AMDGPU/SMInstructions.td b/llvm/lib/Target/AMDGPU/SMInstructions.td
index f3096962e2f3e8..680b32dd855e55 100644
--- a/llvm/lib/Target/AMDGPU/SMInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -1068,6 +1068,20 @@ def : GCNPat <
}
} // let OtherPredicates = [HasShaderCyclesRegister]
+let OtherPredicates = [HasSMemRealTime] in {
+def : GCNPat <
+ (i64 (readfixedtimer)),
+ (S_MEMREALTIME)
+>;
+} // let OtherPredicates = [HasSMemRealTime]
+
+let SubtargetPredicate = isGFX11Plus in {
+def : GCNPat <
+ (i64 (readfixedtimer)),
+ (S_SENDMSG_RTN_B64 (i32 /*MSG_RTN_GET_REALTIME=*/0x83))
+>;
+} // let SubtargetPredicate = [isGFX11Plus]
+
def i32imm_zero : TImmLeaf <i32, [{
return Imm == 0;
}]>;
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index c7bc623a88e1b9..47b2d538063f9d 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -489,6 +489,9 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v2f16, Expand);
setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2f16, Expand);
+ if (STI.getSmVersion() >= 30 && STI.getPTXVersion() > 31)
+ setOperationAction(ISD::READFIXEDTIMER, MVT::i64, Legal);
+
setFP16OperationAction(ISD::SETCC, MVT::f16, Legal, Promote);
setFP16OperationAction(ISD::SETCC, MVT::v2f16, Legal, Expand);
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index 365afc6bd8c617..da99e3ec125345 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -3759,7 +3759,6 @@ def CALL_PROTOTYPE :
include "NVPTXIntrinsics.td"
-
//-----------------------------------
// Notes
//-----------------------------------
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 2330d7213c26dc..3d9023dc6697b4 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -6372,12 +6372,16 @@ def INT_PTX_SREG_LANEMASK_GE :
def INT_PTX_SREG_LANEMASK_GT :
PTX_READ_SREG_R32<"lanemask_gt", int_nvvm_read_ptx_sreg_lanemask_gt>;
+let hasSideEffects = 1 in {
def INT_PTX_SREG_CLOCK :
PTX_READ_SREG_R32<"clock", int_nvvm_read_ptx_sreg_clock>;
def INT_PTX_SREG_CLOCK64 :
PTX_READ_SREG_R64<"clock64", int_nvvm_read_ptx_sreg_clock64>;
def INT_PTX_SREG_GLOBALTIMER :
PTX_READ_SREG_R64<"globaltimer", int_nvvm_read_ptx_sreg_globaltimer>;
+}
+
+def: Pat <(i64 (readfixedtimer)), (INT_PTX_SREG_GLOBALTIMER)>;
def INT_PTX_SREG_PM0 : PTX_READ_SREG_R32<"pm0", int_nvvm_read_ptx_sreg_pm0>;
def INT_PTX_SREG_PM1 : PTX_READ_SREG_R32<"pm1", int_nvvm_read_ptx_sreg_pm1>;
diff --git a/llvm/test/CodeGen/AMDGPU/readfixedtimer.ll b/llvm/test/CodeGen/AMDGPU/readfixedtimer.ll
new file mode 100644
index 00000000000000..e6af30b90b138d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/readfixedtimer.ll
@@ -0,0 +1,24 @@
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX700
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX900
+; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX900
+; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX1100
+; RUN: llc -global-isel=1 -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GCN,GFX1100
+
+declare i64 @llvm.readfixedtimer() #0
+
+; GCN-LABEL: {{^}}test_readfixedtimer:
+; GFX700: s_mov_b32 s[[REG:[0-9]+]], 0
+; GFX900: s_memrealtime s[[[LO:[0-9]+]]:[[HI:[0-9]+]]]
+; GFX900: s_memrealtime s[[[LO:[0-9]+]]:[[HI:[0-9]+]]]
+; GFX1100: s_sendmsg_rtn_b64 s[[[LO:[0-9]+]]:[[HI:[0-9]+]]], sendmsg(MSG_RTN_GET_REALTIME)
+; GFX1100: s_sendmsg_rtn_b64 s[[[LO:[0-9]+]]:[[HI:[0-9]+]]], sendmsg(MSG_RTN_GET_REALTIME)
+define amdgpu_kernel void @test_readfixedtimer(ptr addrspace(1) %out) #0 {
+ %cycle0 = call i64 @llvm.readfixedtimer()
+ store volatile i64 %cycle0, ptr addrspace(1) %out
+
+ %cycle1 = call i64 @llvm.readfixedtimer()
+ store volatile i64...
[truncated]
Formatting is expected to fail to preserve local style.
Generally looks good to me. Just not sure about the name. "Fixed timer" sounds pretty confusing to me. Probably `readfixedfreqtimer`?
case Intrinsic::readfixedtimer: {
errs() << "WARNING: this target does not support the llvm.readfixedtimer"
<< " intrinsic. It is being lowered to a constant 0\n";
CI->replaceAllUsesWith(ConstantInt::get(Type::getInt64Ty(Context), 0));
`Constant::getNullValue(Type::getInt64Ty(Context))`?
I copied it from the other one, but either way works. I think setting it to zero is a little more explicit.
Naming is the hard part. I was also thinking about |
Missing the clang builtin test
@@ -312,6 +312,12 @@ void IntrinsicLowering::LowerIntrinsicCall(CallInst *CI) {
CI->replaceAllUsesWith(ConstantInt::get(Type::getInt64Ty(Context), 0));
break;
}
case Intrinsic::readfixedtimer: {
errs() << "WARNING: this target does not support the llvm.readfixedtimer"
I'm surprised we have this random printing here. Is IntrinsicLowering even used?
Not that I could tell, but I figured better safe than sorry.
@@ -3759,7 +3759,6 @@ def CALL_PROTOTYPE :

include "NVPTXIntrinsics.td"
Random whitespace change
Looks good.
Missing |
Force-pushed: 6b85d8e to 164d977
Title changed from "`__builtin_readfixedtimer` intrinsic and buiiltin for realtime clocks" to "`__builtin_readsteadycounter` intrinsic and buiiltin for realtime clocks"
Added clang test and renamed to |
Force-pushed: 1099392 to 3034107
Title changed from "`__builtin_readsteadycounter` intrinsic and buiiltin for realtime clocks" to "`__builtin_readsteadycounter` intrinsic and builtin for realtime clocks"
Are we assuming any particular relationship to `__builtin_readcyclecounter` in terms of scales etc? `__builtin_readsteadycounter` could be used to access x86 MPERF clock counters, but to access the corresponding APERF clock we'd then need a `__builtin_readvariablecounter` equivalent (`__builtin_readcyclecounter` gives the separate RDTSC clock value)
Not currently at least. The idea is just to expose the counter that has a fixed tick rate, compared to the cycle counter which is variable.
New intrinsic sounds right - a constant frequency counter is a different thing to a variable frequency counter. "Steady" implies unchanging, so I'd agree with We can't have a ratio between the two counters since one changes frequency and one doesn't. Does x64 have something that maps usefully onto a fixed frequency counter intrinsic?
I think |
LGTM with few nits for general and NVPTX parts.
``__builtin_readsteadycounter`` is used to access the fixed frequency counter
register (or a similar steady-rate clock) on those targets that support it.
The function is similar to ``__builtin_readcyclecounter`` above except that the
frequency is fixed, making it suitable for measuring elapsed time.
Should we mention that we do not guarantee any particular frequency, just that it's stable, and that it's up to the user to figure out the actual frequency if they need it?
Good point, done.
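Because only the stability of the rate is promised, turning a tick delta into wall-clock time needs a frequency obtained from somewhere else. The following is a minimal sketch of that conversion; `ticks_to_ns` and its `freq_hz` parameter are hypothetical names for illustration and are not part of the patch, and the caller is assumed to know the counter frequency for their device.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the patch): converts a tick delta from a
// fixed-frequency counter into nanoseconds. freq_hz is the counter frequency
// in Hz, which the caller must look up for the specific target.
//
// The delta is split into whole seconds and a remainder so the intermediate
// product stays within 64 bits. This assumes freq_hz is below roughly
// 1.8e10 Hz and is non-zero.
inline std::uint64_t ticks_to_ns(std::uint64_t ticks, std::uint64_t freq_hz) {
  const std::uint64_t ns_per_sec = 1000000000ULL;
  const std::uint64_t whole_seconds = ticks / freq_hz;
  const std::uint64_t remainder_ticks = ticks % freq_hz;
  return whole_seconds * ns_per_sec +
         (remainder_ticks * ns_per_sec) / freq_hz;
}
```

For example, with an assumed 100 MHz counter, a delta of 100000000 ticks corresponds to one second.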
@@ -104,6 +104,7 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::ATOMIC_STORE: return "AtomicStore";
case ISD::PCMARKER: return "PCMarker";
case ISD::READCYCLECOUNTER: return "ReadCycleCounter";
case ISD::READSTEADYCOUNTER: return "ReadFixedTimer";
Should it be "ReadSteadyCounter" ?
Also, whitespace/alignment looks off.
Forgot to fix that when I renamed it, thanks for spotting it.
Force-pushed: db53943 to 50c0bac
Add to release notes?
Summary: This patch adds a new intrinsic and builtin function mirroring the existing `__builtin_readcyclecounter`. The difference is that this implementation targets a separate counter, available on some targets, that returns a fixed-frequency clock suitable for determining elapsed time. This differs from the cycle counter, which often has a variable frequency. This is currently only valid for the NVPTX and AMDGPU targets.
Force-pushed: 50c0bac to 4a0ee4b
Done
Summary: Recent patches have added solutions to the remaining sources of divergence. This patch simply removes the last occurrences of things like `has_builtin`, `ifdef` or builtins with feature requirements. The one exception here is `nanosleep`, but I made changes in the `__nvvm_reflect` pass to make usage like this actually work at O0. Depends on llvm#81331
LG
This intrinsic was introduced by #81331 and is a lot like `llvm.readcyclecounter`. For the RISCV implementation, we rename the `ReadCycleWide` pseudo to `ReadCounterWide` and make it accept two operands (the low and high parts of the counter). As for the legalization and lowering parts, we reuse the code of `ISD::READCYCLECOUNTER` (making it able to handle both intrinsics), and we use the `time` CSR for `ISD::READSTEADYCOUNTER`. Tests using Clang builtins were run on real hardware and work as expected. Reviewers: asb, MaskRay, dtcxzyw, preames, topperc, jhuber6 Reviewed By: jhuber6, asb, MaskRay, dtcxzyw Pull Request: #82322
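When a 64-bit counter is exposed as separate low and high 32-bit halves, a common way to read it consistently is to read high, then low, then high again, retrying if the high half changed in between. The sketch below illustrates that general technique only; `read_counter_wide` and the `read_lo`/`read_hi` callables are hypothetical stand-ins, and the source does not show whether the RISCV pseudo expands to exactly this sequence.

```cpp
#include <cstdint>

// Hypothetical sketch (not taken from the RISCV patch): assembles a 64-bit
// counter value from two 32-bit reads. read_lo and read_hi stand in for the
// target-specific reads of the low and high halves.
template <typename ReadLo, typename ReadHi>
std::uint64_t read_counter_wide(ReadLo read_lo, ReadHi read_hi) {
  std::uint32_t hi = 0;
  std::uint32_t lo = 0;
  std::uint32_t hi_again = 0;
  do {
    hi = read_hi();
    lo = read_lo();
    hi_again = read_hi();
    // If the high half changed, the low half wrapped between the reads,
    // so the pair (hi, lo) may be inconsistent and must be re-read.
  } while (hi != hi_again);
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

Without the second read of the high half, a carry out of the low half between the two reads could yield a value that is off by about 2^32 ticks.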
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter, available on some targets, that
returns a fixed-frequency clock suitable for determining elapsed time.
This differs from the cycle counter, which often has a variable frequency.
This patch only adds support for the NVPTX and AMDGPU targets.
This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
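A minimal usage sketch of the builtin described above, under some assumptions: `read_steady_ticks`, `elapsed_ticks` and the `HAVE_READSTEADYCOUNTER` macro are hypothetical names, and the `std::chrono::steady_clock` fallback is this sketch's own choice for compilers that lack the builtin. Note that, per the diff, targets that do not mark the operation as legal lower it to a constant 0, so on such targets both reads may return zero.

```cpp
#include <chrono>
#include <cstdint>

// Detect the builtin without breaking compilers that lack __has_builtin.
#if defined(__has_builtin)
#if __has_builtin(__builtin_readsteadycounter)
#define HAVE_READSTEADYCOUNTER 1  // hypothetical macro name for this sketch
#endif
#endif

// Hypothetical wrapper: returns the current value of a steady-rate counter.
// The unit is "ticks" of unspecified frequency when the builtin is used, and
// nanoseconds when the std::chrono fallback is used.
inline std::uint64_t read_steady_ticks() {
#ifdef HAVE_READSTEADYCOUNTER
  return __builtin_readsteadycounter();
#else
  const auto now = std::chrono::steady_clock::now().time_since_epoch();
  return static_cast<std::uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
#endif
}

// Hypothetical helper: measures how many ticks a callable takes.
template <typename F>
std::uint64_t elapsed_ticks(F &&work) {
  const std::uint64_t start = read_steady_ticks();
  work();
  const std::uint64_t stop = read_steady_ticks();
  return stop - start;
}
```

Keeping the read behind one small wrapper means the timing code stays the same whether or not the compiler provides the builtin.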