[NVPTX][DagCombiner] Eliminate guards on shift amount because PTX shifts automatically clamp #172431
base: main
Conversation
@llvm/pr-subscribers-backend-nvptx

Author: Yonah Goldberg (YonahGoldberg)

Changes

Transform patterns like:

  (select (ugt shift, BitWidth-1), 0, (srl/shl x, shift))
  (select (ult shift, BitWidth), (srl/shl x, shift), 0)

Into:

  (srl/shl x, shift)

These patterns arise from C/C++ code like `shift >= 32 ? 0 : x >> shift` that guards against undefined behavior.

Full diff: https://github.com/llvm/llvm-project/pull/172431.diff

3 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 92f3865818530..e3dc73f625c3c 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -867,13 +867,14 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
// We have some custom DAG combine patterns for these nodes
setTargetDAGCombine(
- {ISD::ADD, ISD::AND, ISD::EXTRACT_VECTOR_ELT,
- ISD::FADD, ISD::FMAXNUM, ISD::FMINNUM,
- ISD::FMAXIMUM, ISD::FMINIMUM, ISD::FMAXIMUMNUM,
- ISD::FMINIMUMNUM, ISD::MUL, ISD::SHL,
- ISD::SREM, ISD::UREM, ISD::VSELECT,
- ISD::BUILD_VECTOR, ISD::ADDRSPACECAST, ISD::LOAD,
- ISD::STORE, ISD::ZERO_EXTEND, ISD::SIGN_EXTEND});
+ {ISD::ADD, ISD::AND, ISD::EXTRACT_VECTOR_ELT,
+ ISD::FADD, ISD::FMAXNUM, ISD::FMINNUM,
+ ISD::FMAXIMUM, ISD::FMINIMUM, ISD::FMAXIMUMNUM,
+ ISD::FMINIMUMNUM, ISD::MUL, ISD::SELECT,
+ ISD::SHL, ISD::SREM, ISD::UREM,
+ ISD::VSELECT, ISD::BUILD_VECTOR, ISD::ADDRSPACECAST,
+ ISD::LOAD, ISD::STORE, ISD::ZERO_EXTEND,
+ ISD::SIGN_EXTEND});
// setcc for f16x2 and bf16x2 needs special handling to prevent
// legalizer's attempt to scalarize it due to v2i1 not being legal.
@@ -6233,6 +6234,105 @@ static SDValue PerformEXTRACTCombine(SDNode *N,
return Result;
}
+/// Transform patterns like:
+/// (select (ugt shift, BitWidth-1), 0, (srl/shl x, shift))
+/// (select (ult shift, BitWidth), (srl/shl x, shift), 0)
+/// Into:
+/// (NVPTXISD::SRL_CLAMP x, shift) or (NVPTXISD::SHL_CLAMP x, shift)
+///
+/// These patterns arise from C/C++ code like `shift >= 32 ? 0 : x >> shift`
+/// which guards against undefined behavior. PTX shr/shl instructions clamp
+/// shift amounts >= BitWidth to produce 0 for logical shifts, making the
+/// guard redundant.
+///
+/// Note: We only handle SRL and SHL, not SRA, because arithmetic right
+/// shifts produce sign-extended results (0 or -1) when shift >= BitWidth,
+/// which differs from the C pattern that always returns 0.
+static SDValue PerformSELECTShiftCombine(SDNode *N,
+ TargetLowering::DAGCombinerInfo &DCI) {
+ SDValue Cond = N->getOperand(0);
+ SDValue TrueVal = N->getOperand(1);
+ SDValue FalseVal = N->getOperand(2);
+
+ // We're looking for:
+ // (select (setcc shift, BitWidth-1, ugt), 0, (shift x, shift))
+ // or:
+ // (select (setcc shift, BitWidth, ult), (shift x, shift), 0)
+
+ SDValue ZeroVal, ShiftOp;
+ bool Inverted = false;
+
+ if (isConstZero(TrueVal)) {
+ ZeroVal = TrueVal;
+ ShiftOp = FalseVal;
+ } else if (isConstZero(FalseVal)) {
+ ZeroVal = FalseVal;
+ ShiftOp = TrueVal;
+ Inverted = true;
+ } else {
+ return SDValue();
+ }
+
+ // Only handle logical shifts (SRL, SHL), not arithmetic (SRA)
+ unsigned ShiftOpc = ShiftOp.getOpcode();
+ if (ShiftOpc != ISD::SRL && ShiftOpc != ISD::SHL)
+ return SDValue();
+
+ // Condition must be a SETCC
+ if (Cond.getOpcode() != ISD::SETCC)
+ return SDValue();
+
+ ISD::CondCode CC = cast<CondCodeSDNode>(Cond.getOperand(2))->get();
+ SDValue CondLHS = Cond.getOperand(0);
+ SDValue CondRHS = Cond.getOperand(1);
+ SDValue ShiftAmt = ShiftOp.getOperand(1);
+
+ // Look through truncation - NVPTX truncates 64-bit shift amounts to 32-bit
+ if (ShiftAmt.getOpcode() == ISD::TRUNCATE)
+ ShiftAmt = ShiftAmt.getOperand(0);
+
+ // The value being compared must be the same as the shift amount.
+ // e.g., in "shift > 31 ? 0 : x >> shift", both must use the same 'shift'.
+ if (CondLHS != ShiftAmt)
+ return SDValue();
+
+ auto *Threshold = dyn_cast<ConstantSDNode>(CondRHS);
+ if (!Threshold)
+ return SDValue();
+
+ unsigned BitWidth = ShiftOp.getValueType().getSizeInBits();
+ uint64_t ThreshVal = Threshold->getZExtValue();
+
+ // Check for valid patterns based on select orientation:
+ //
+ // When TrueVal is 0 (not inverted):
+ // (select (ugt shift, BitWidth-1), 0, shift_result)
+ // i.e., shift > 31 ? 0 : x >> shift
+ //
+ // When FalseVal is 0 (inverted):
+ // (select (ult shift, BitWidth), shift_result, 0)
+ // i.e., shift < 32 ? x >> shift : 0
+ //
+ // Both patterns return 0 when shift >= BitWidth, which PTX handles natively.
+ bool ValidPattern = false;
+ if (!Inverted && CC == ISD::SETUGT && ThreshVal == BitWidth - 1)
+ ValidPattern = true;
+ else if (Inverted && CC == ISD::SETULT && ThreshVal == BitWidth)
+ ValidPattern = true;
+
+ if (!ValidPattern)
+ return SDValue();
+
+ // Pattern matched! Return a custom clamp node that has defined semantics
+ // for out-of-range shift amounts, matching PTX's clamping behavior.
+ SelectionDAG &DAG = DCI.DAG;
+ SDLoc DL(N);
+ unsigned ClampOpc =
+ (ShiftOpc == ISD::SRL) ? NVPTXISD::SRL_CLAMP : NVPTXISD::SHL_CLAMP;
+ return DAG.getNode(ClampOpc, DL, ShiftOp.getValueType(),
+ ShiftOp.getOperand(0), ShiftOp.getOperand(1));
+}
+
static SDValue PerformVSELECTCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI) {
SDValue VA = N->getOperand(1);
@@ -6544,6 +6644,8 @@ SDValue NVPTXTargetLowering::PerformDAGCombine(SDNode *N,
case NVPTXISD::StoreV2:
case NVPTXISD::StoreV4:
return combineSTORE(N, DCI, STI);
+ case ISD::SELECT:
+ return PerformSELECTShiftCombine(N, DCI);
case ISD::VSELECT:
return PerformVSELECTCombine(N, DCI);
}
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index 9c1ec38fb1f3a..74a552502ccf2 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -1323,6 +1323,15 @@ defm SHL : SHIFT<"shl.b", shl>;
defm SRA : SHIFT<"shr.s", sra>;
defm SRL : SHIFT<"shr.u", srl>;
+// Shift with clamping semantics - these have defined behavior for shift amounts
+// >= BitWidth (returning 0 for logical shifts). Used to optimize guarded shift
+// patterns like `shift >= 32 ? 0 : x >> shift`.
+def shl_clamp : SDNode<"NVPTXISD::SHL_CLAMP", SDTIntShiftOp, []>;
+def srl_clamp : SDNode<"NVPTXISD::SRL_CLAMP", SDTIntShiftOp, []>;
+
+defm SHL_CLAMP : SHIFT<"shl.b", shl_clamp>;
+defm SRL_CLAMP : SHIFT<"shr.u", srl_clamp>;
+
// Bit-reverse
foreach t = [I64RT, I32RT] in
def BREV_ # t.PtxType :
diff --git a/llvm/test/CodeGen/NVPTX/shift-opt.ll b/llvm/test/CodeGen/NVPTX/shift-opt.ll
index e0d22c62993ba..06ab7b1cdc09b 100644
--- a/llvm/test/CodeGen/NVPTX/shift-opt.ll
+++ b/llvm/test/CodeGen/NVPTX/shift-opt.ll
@@ -171,3 +171,263 @@ define i64 @test_negative_use_shl(i64 %x, i32 %y) {
call void @use(i64 %shl)
ret i64 %srl
}
+
+;; ============================================================================
+;; Guarded shift patterns
+;;
+;; C/C++ code like `shift >= 32 ? 0 : x >> shift` generates a guarded shift
+;; pattern to avoid undefined behavior. PTX shr/shl instructions clamp shift
+;; amounts >= BitWidth to produce 0, making the guard redundant.
+;;
+;; Transformation 1 (ugt form):
+;; (select (icmp ugt shift, BitWidth-1), 0, (srl x, shift))
+;; i.e., shift > 31 ? 0 : x >> shift
+;; --> (srl x, shift)
+;;
+;; Transformation 2 (ult form, equivalent after InstCombine):
+;; (select (icmp ult shift, BitWidth), (srl x, shift), 0)
+;; i.e., shift < 32 ? x >> shift : 0
+;; --> (srl x, shift)
+;;
+;; Same applies to shl (left shift).
+;; ============================================================================
+
+;; --- i16 shr tests ---
+
+; (select (ugt shift, 15), 0, (srl x, shift)) --> (srl x, shift)
+define i16 @test_guarded_i16_ugt(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ugt(
+; CHECK: {
+; CHECK-NEXT: .reg .b16 %rs<3>;
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b16 %rs1, [test_guarded_i16_ugt_param_0];
+; CHECK-NEXT: ld.param.b16 %r1, [test_guarded_i16_ugt_param_1];
+; CHECK-NEXT: shr.u16 %rs2, %rs1, %r1;
+; CHECK-NEXT: cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i16 %shift, 15
+ %shr = lshr i16 %x, %shift
+ %sel = select i1 %cmp, i16 0, i16 %shr
+ ret i16 %sel
+}
+
+; (select (ult shift, 16), (srl x, shift), 0) --> (srl x, shift)
+define i16 @test_guarded_i16_ult(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ult(
+; CHECK: {
+; CHECK-NEXT: .reg .b16 %rs<3>;
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b16 %rs1, [test_guarded_i16_ult_param_0];
+; CHECK-NEXT: ld.param.b16 %r1, [test_guarded_i16_ult_param_1];
+; CHECK-NEXT: shr.u16 %rs2, %rs1, %r1;
+; CHECK-NEXT: cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i16 %shift, 16
+ %shr = lshr i16 %x, %shift
+ %sel = select i1 %cmp, i16 %shr, i16 0
+ ret i16 %sel
+}
+
+;; --- i32 shr tests ---
+
+; (select (ugt shift, 31), 0, (srl x, shift)) --> (srl x, shift)
+define i32 @test_guarded_i32_ugt(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ugt(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i32_ugt_param_0];
+; CHECK-NEXT: ld.param.b32 %r2, [test_guarded_i32_ugt_param_1];
+; CHECK-NEXT: shr.u32 %r3, %r1, %r2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i32 %shift, 31
+ %shr = lshr i32 %x, %shift
+ %sel = select i1 %cmp, i32 0, i32 %shr
+ ret i32 %sel
+}
+
+; (select (ult shift, 32), (srl x, shift), 0) --> (srl x, shift)
+define i32 @test_guarded_i32_ult(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ult(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i32_ult_param_0];
+; CHECK-NEXT: ld.param.b32 %r2, [test_guarded_i32_ult_param_1];
+; CHECK-NEXT: shr.u32 %r3, %r1, %r2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i32 %shift, 32
+ %shr = lshr i32 %x, %shift
+ %sel = select i1 %cmp, i32 %shr, i32 0
+ ret i32 %sel
+}
+
+;; --- i64 shr tests ---
+
+; (select (ugt shift, 63), 0, (srl x, shift)) --> (srl x, shift)
+define i64 @test_guarded_i64_ugt(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ugt(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b64 %rd1, [test_guarded_i64_ugt_param_0];
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i64_ugt_param_1];
+; CHECK-NEXT: shr.u64 %rd2, %rd1, %r1;
+; CHECK-NEXT: st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i64 %shift, 63
+ %shr = lshr i64 %x, %shift
+ %sel = select i1 %cmp, i64 0, i64 %shr
+ ret i64 %sel
+}
+
+; (select (ult shift, 64), (srl x, shift), 0) --> (srl x, shift)
+define i64 @test_guarded_i64_ult(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ult(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b64 %rd1, [test_guarded_i64_ult_param_0];
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i64_ult_param_1];
+; CHECK-NEXT: shr.u64 %rd2, %rd1, %r1;
+; CHECK-NEXT: st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i64 %shift, 64
+ %shr = lshr i64 %x, %shift
+ %sel = select i1 %cmp, i64 %shr, i64 0
+ ret i64 %sel
+}
+
+;; --- i16 shl tests ---
+
+; (select (ugt shift, 15), 0, (shl x, shift)) --> (shl x, shift)
+define i16 @test_guarded_i16_ugt_shl(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ugt_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b16 %rs<3>;
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b16 %rs1, [test_guarded_i16_ugt_shl_param_0];
+; CHECK-NEXT: ld.param.b16 %r1, [test_guarded_i16_ugt_shl_param_1];
+; CHECK-NEXT: shl.b16 %rs2, %rs1, %r1;
+; CHECK-NEXT: cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i16 %shift, 15
+ %shl = shl i16 %x, %shift
+ %sel = select i1 %cmp, i16 0, i16 %shl
+ ret i16 %sel
+}
+
+; (select (ult shift, 16), (shl x, shift), 0) --> (shl x, shift)
+define i16 @test_guarded_i16_ult_shl(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ult_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b16 %rs<3>;
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b16 %rs1, [test_guarded_i16_ult_shl_param_0];
+; CHECK-NEXT: ld.param.b16 %r1, [test_guarded_i16_ult_shl_param_1];
+; CHECK-NEXT: shl.b16 %rs2, %rs1, %r1;
+; CHECK-NEXT: cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i16 %shift, 16
+ %shl = shl i16 %x, %shift
+ %sel = select i1 %cmp, i16 %shl, i16 0
+ ret i16 %sel
+}
+
+;; --- i32 shl tests ---
+
+; (select (ugt shift, 31), 0, (shl x, shift)) --> (shl x, shift)
+define i32 @test_guarded_i32_ugt_shl(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ugt_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i32_ugt_shl_param_0];
+; CHECK-NEXT: ld.param.b32 %r2, [test_guarded_i32_ugt_shl_param_1];
+; CHECK-NEXT: shl.b32 %r3, %r1, %r2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i32 %shift, 31
+ %shl = shl i32 %x, %shift
+ %sel = select i1 %cmp, i32 0, i32 %shl
+ ret i32 %sel
+}
+
+; (select (ult shift, 32), (shl x, shift), 0) --> (shl x, shift)
+define i32 @test_guarded_i32_ult_shl(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ult_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i32_ult_shl_param_0];
+; CHECK-NEXT: ld.param.b32 %r2, [test_guarded_i32_ult_shl_param_1];
+; CHECK-NEXT: shl.b32 %r3, %r1, %r2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i32 %shift, 32
+ %shl = shl i32 %x, %shift
+ %sel = select i1 %cmp, i32 %shl, i32 0
+ ret i32 %sel
+}
+
+;; --- i64 shl tests ---
+
+; (select (ugt shift, 63), 0, (shl x, shift)) --> (shl x, shift)
+define i64 @test_guarded_i64_ugt_shl(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ugt_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b64 %rd1, [test_guarded_i64_ugt_shl_param_0];
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i64_ugt_shl_param_1];
+; CHECK-NEXT: shl.b64 %rd2, %rd1, %r1;
+; CHECK-NEXT: st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i64 %shift, 63
+ %shl = shl i64 %x, %shift
+ %sel = select i1 %cmp, i64 0, i64 %shl
+ ret i64 %sel
+}
+
+; (select (ult shift, 64), (shl x, shift), 0) --> (shl x, shift)
+define i64 @test_guarded_i64_ult_shl(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ult_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b64 %rd1, [test_guarded_i64_ult_shl_param_0];
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i64_ult_shl_param_1];
+; CHECK-NEXT: shl.b64 %rd2, %rd1, %r1;
+; CHECK-NEXT: st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i64 %shift, 64
+ %shl = shl i64 %x, %shift
+ %sel = select i1 %cmp, i64 %shl, i64 0
+ ret i64 %sel
+}
// We're looking for:
//   (select (setcc shift, BitWidth-1, ugt), 0, (shift x, shift))
// or:
//   (select (setcc shift, BitWidth, ult), (shift x, shift), 0)
Can you use sd_match here?
Fixed that thanks.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.

🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed.
Artem-B left a comment:
LGTM
// shift_amt > threshold ? 0 : shift_op
bool MatchedUGT =
    sd_match(N, m_Select(m_SetCC(m_Value(ShiftAmt), m_Value(Threshold),
Use m_SpecificInt for threshold
Updated, I made it a lot cleaner I think
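For readers following the thread, here is a rough sketch of how the ugt form might be matched once sd_match and m_SpecificInt are used, as suggested above. This is illustrative only: the helper name, the use of m_Zero/m_SpecificCondCode/m_Srl, and the fold to a plain srl are assumptions, not the PR's final code.

```cpp
#include "llvm/CodeGen/SDPatternMatch.h"
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;
using namespace llvm::SDPatternMatch;

// Sketch only: match (select (setcc shift, BitWidth-1, ugt), 0, (srl x, shift))
// and fold it to a plain srl, relying on PTX clamping oversized shift amounts.
static SDValue foldGuardedSrlSketch(SDNode *N, SelectionDAG &DAG) {
  SDValue X, CmpAmt, ShiftAmt;
  unsigned BitWidth = N->getValueType(0).getSizeInBits();

  bool MatchedUGT = sd_match(
      N, m_Select(m_SetCC(m_Value(CmpAmt), m_SpecificInt(BitWidth - 1),
                          m_SpecificCondCode(ISD::SETUGT)),
                  m_Zero(), m_Srl(m_Value(X), m_Value(ShiftAmt))));

  // The value compared against the threshold must be the shift amount itself
  // (the actual patch also looks through a truncate of the shift amount).
  if (!MatchedUGT || CmpAmt != ShiftAmt)
    return SDValue();

  return DAG.getNode(ISD::SRL, SDLoc(N), N->getValueType(0), X, ShiftAmt);
}
```

The ult form would be matched analogously, with m_SpecificInt(BitWidth), ISD::SETULT, and the zero on the false side of the select.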
AlexMaclean left a comment:
Let's guard this combine to only run after legalization. I worry we could end up folding some weird type we don't support and then end up with a node we cannot select.
Makes a lot of sense. This way I can also eliminate the explicit bitwidth check because after legalization there are no i8 ops.
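Presumably that guard is a single early exit along these lines; the exact placement is an assumption, though TargetLowering::DAGCombinerInfo does provide isAfterLegalizeDAG():

```cpp
// Run this fold only after DAG legalization, so we never rewrite a select
// over a type we cannot select a shift for (e.g. i8; after legalization
// there are no i8 ops left on NVPTX).
if (!DCI.isAfterLegalizeDAG())
  return SDValue();
```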
Transform patterns like:

  (select (ugt shift, BitWidth-1), 0, (srl/shl x, shift))
  (select (ult shift, BitWidth), (srl/shl x, shift), 0)

Into:

  (srl/shl x, shift)

These patterns arise from C/C++ code like `shift >= 32 ? 0 : x >> shift`
which guards against undefined behavior. PTX shr/shl instructions clamp
shift amounts >= BitWidth to produce 0 for logical shifts, making the
guard redundant.
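As a concrete illustration, here is a minimal C++ sketch of the guarded-shift idiom this targets; the function name guarded_lshr is hypothetical, not taken from the PR:

```cpp
#include <cstdint>

// Hypothetical example of the guarded-shift idiom. Portable C++ must guard
// because `x >> shift` is undefined behavior when shift >= 32, but PTX's
// shr.u32 already clamps such shift amounts and produces 0, so on NVPTX the
// guard (and the select it lowers to) is redundant.
uint32_t guarded_lshr(uint32_t x, uint32_t shift) {
  return shift >= 32 ? 0u : (x >> shift);
}
```

With this combine, the select disappears and the function lowers to a single shr.u32, as the new tests check.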