Conversation

@YonahGoldberg
Contributor

Transform patterns like:

(select (ugt shift, BitWidth-1), 0, (srl/shl x, shift))
(select (ult shift, BitWidth), (srl/shl x, shift), 0)

Into:

(srl/shl x, shift)

These patterns arise from C/C++ code like `shift >= 32 ? 0 : x >> shift`,
which guards against undefined behavior. PTX shr/shl instructions clamp
shift amounts >= BitWidth, producing 0 for logical shifts and making the
guard redundant.
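
For concreteness, a minimal illustration of the guarded source pattern (hypothetical code; the function name is ours, not from the patch):

#include <cstdint>

// C++ leaves `x >> shift` undefined for shift >= 32, so portable code
// guards the shift with a select. PTX's shr.u32 already yields 0 for
// shift amounts >= 32, which is what makes the guard redundant on NVPTX.
uint32_t guarded_shr(uint32_t x, uint32_t shift) {
  return shift >= 32 ? 0u : x >> shift;
}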

@llvmbot
Member

llvmbot commented Dec 16, 2025

@llvm/pr-subscribers-backend-nvptx

Author: Yonah Goldberg (YonahGoldberg)

Changes

Transform patterns like:

(select (ugt shift, BitWidth-1), 0, (srl/shl x, shift))
(select (ult shift, BitWidth), (srl/shl x, shift), 0)

Into:

(srl/shl x, shift)

These patterns arise from C/C++ code like `shift >= 32 ? 0 : x >> shift`,
which guards against undefined behavior. PTX shr/shl instructions clamp
shift amounts >= BitWidth, producing 0 for logical shifts and making the
guard redundant.


Full diff: https://github.com/llvm/llvm-project/pull/172431.diff

3 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+109-7)
  • (modified) llvm/lib/Target/NVPTX/NVPTXInstrInfo.td (+9)
  • (modified) llvm/test/CodeGen/NVPTX/shift-opt.ll (+260)
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 92f3865818530..e3dc73f625c3c 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -867,13 +867,14 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
 
   // We have some custom DAG combine patterns for these nodes
   setTargetDAGCombine(
-      {ISD::ADD,          ISD::AND,           ISD::EXTRACT_VECTOR_ELT,
-       ISD::FADD,         ISD::FMAXNUM,       ISD::FMINNUM,
-       ISD::FMAXIMUM,     ISD::FMINIMUM,      ISD::FMAXIMUMNUM,
-       ISD::FMINIMUMNUM,  ISD::MUL,           ISD::SHL,
-       ISD::SREM,         ISD::UREM,          ISD::VSELECT,
-       ISD::BUILD_VECTOR, ISD::ADDRSPACECAST, ISD::LOAD,
-       ISD::STORE,        ISD::ZERO_EXTEND,   ISD::SIGN_EXTEND});
+      {ISD::ADD,         ISD::AND,          ISD::EXTRACT_VECTOR_ELT,
+       ISD::FADD,        ISD::FMAXNUM,      ISD::FMINNUM,
+       ISD::FMAXIMUM,    ISD::FMINIMUM,     ISD::FMAXIMUMNUM,
+       ISD::FMINIMUMNUM, ISD::MUL,          ISD::SELECT,
+       ISD::SHL,         ISD::SREM,         ISD::UREM,
+       ISD::VSELECT,     ISD::BUILD_VECTOR, ISD::ADDRSPACECAST,
+       ISD::LOAD,        ISD::STORE,        ISD::ZERO_EXTEND,
+       ISD::SIGN_EXTEND});
 
   // setcc for f16x2 and bf16x2 needs special handling to prevent
   // legalizer's attempt to scalarize it due to v2i1 not being legal.
@@ -6233,6 +6234,105 @@ static SDValue PerformEXTRACTCombine(SDNode *N,
   return Result;
 }
 
+/// Transform patterns like:
+///   (select (ugt shift, BitWidth-1), 0, (srl/shl x, shift))
+///   (select (ult shift, BitWidth), (srl/shl x, shift), 0)
+/// Into:
+///   (NVPTXISD::SRL_CLAMP x, shift) or (NVPTXISD::SHL_CLAMP x, shift)
+///
+/// These patterns arise from C/C++ code like `shift >= 32 ? 0 : x >> shift`
+/// which guards against undefined behavior. PTX shr/shl instructions clamp
+/// shift amounts >= BitWidth to produce 0 for logical shifts, making the
+/// guard redundant.
+///
+/// Note: We only handle SRL and SHL, not SRA, because arithmetic right
+/// shifts produce sign-extended results (0 or -1) when shift >= BitWidth,
+/// which differs from the C pattern that always returns 0.
+static SDValue PerformSELECTShiftCombine(SDNode *N,
+                                         TargetLowering::DAGCombinerInfo &DCI) {
+  SDValue Cond = N->getOperand(0);
+  SDValue TrueVal = N->getOperand(1);
+  SDValue FalseVal = N->getOperand(2);
+
+  // We're looking for:
+  //   (select (setcc shift, BitWidth-1, ugt), 0, (shift x, shift))
+  // or:
+  //   (select (setcc shift, BitWidth, ult), (shift x, shift), 0)
+
+  SDValue ZeroVal, ShiftOp;
+  bool Inverted = false;
+
+  if (isConstZero(TrueVal)) {
+    ZeroVal = TrueVal;
+    ShiftOp = FalseVal;
+  } else if (isConstZero(FalseVal)) {
+    ZeroVal = FalseVal;
+    ShiftOp = TrueVal;
+    Inverted = true;
+  } else {
+    return SDValue();
+  }
+
+  // Only handle logical shifts (SRL, SHL), not arithmetic (SRA)
+  unsigned ShiftOpc = ShiftOp.getOpcode();
+  if (ShiftOpc != ISD::SRL && ShiftOpc != ISD::SHL)
+    return SDValue();
+
+  // Condition must be a SETCC
+  if (Cond.getOpcode() != ISD::SETCC)
+    return SDValue();
+
+  ISD::CondCode CC = cast<CondCodeSDNode>(Cond.getOperand(2))->get();
+  SDValue CondLHS = Cond.getOperand(0);
+  SDValue CondRHS = Cond.getOperand(1);
+  SDValue ShiftAmt = ShiftOp.getOperand(1);
+
+  // Look through truncation - NVPTX truncates 64-bit shift amounts to 32-bit
+  if (ShiftAmt.getOpcode() == ISD::TRUNCATE)
+    ShiftAmt = ShiftAmt.getOperand(0);
+
+  // The value being compared must be the same as the shift amount.
+  // e.g., in "shift > 31 ? 0 : x >> shift", both must use the same 'shift'.
+  if (CondLHS != ShiftAmt)
+    return SDValue();
+
+  auto *Threshold = dyn_cast<ConstantSDNode>(CondRHS);
+  if (!Threshold)
+    return SDValue();
+
+  unsigned BitWidth = ShiftOp.getValueType().getSizeInBits();
+  uint64_t ThreshVal = Threshold->getZExtValue();
+
+  // Check for valid patterns based on select orientation:
+  //
+  // When TrueVal is 0 (not inverted):
+  //   (select (ugt shift, BitWidth-1), 0, shift_result)
+  //   i.e., shift > 31 ? 0 : x >> shift
+  //
+  // When FalseVal is 0 (inverted):
+  //   (select (ult shift, BitWidth), shift_result, 0)
+  //   i.e., shift < 32 ? x >> shift : 0
+  //
+  // Both patterns return 0 when shift >= BitWidth, which PTX handles natively.
+  bool ValidPattern = false;
+  if (!Inverted && CC == ISD::SETUGT && ThreshVal == BitWidth - 1)
+    ValidPattern = true;
+  else if (Inverted && CC == ISD::SETULT && ThreshVal == BitWidth)
+    ValidPattern = true;
+
+  if (!ValidPattern)
+    return SDValue();
+
+  // Pattern matched! Return a custom clamp node that has defined semantics
+  // for out-of-range shift amounts, matching PTX's clamping behavior.
+  SelectionDAG &DAG = DCI.DAG;
+  SDLoc DL(N);
+  unsigned ClampOpc =
+      (ShiftOpc == ISD::SRL) ? NVPTXISD::SRL_CLAMP : NVPTXISD::SHL_CLAMP;
+  return DAG.getNode(ClampOpc, DL, ShiftOp.getValueType(),
+                     ShiftOp.getOperand(0), ShiftOp.getOperand(1));
+}
+
 static SDValue PerformVSELECTCombine(SDNode *N,
                                      TargetLowering::DAGCombinerInfo &DCI) {
   SDValue VA = N->getOperand(1);
@@ -6544,6 +6644,8 @@ SDValue NVPTXTargetLowering::PerformDAGCombine(SDNode *N,
   case NVPTXISD::StoreV2:
   case NVPTXISD::StoreV4:
     return combineSTORE(N, DCI, STI);
+  case ISD::SELECT:
+    return PerformSELECTShiftCombine(N, DCI);
   case ISD::VSELECT:
     return PerformVSELECTCombine(N, DCI);
   }
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index 9c1ec38fb1f3a..74a552502ccf2 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -1323,6 +1323,15 @@ defm SHL : SHIFT<"shl.b", shl>;
 defm SRA : SHIFT<"shr.s", sra>;
 defm SRL : SHIFT<"shr.u", srl>;
 
+// Shift with clamping semantics - these have defined behavior for shift amounts
+// >= BitWidth (returning 0 for logical shifts). Used to optimize guarded shift
+// patterns like `shift >= 32 ? 0 : x >> shift`.
+def shl_clamp : SDNode<"NVPTXISD::SHL_CLAMP", SDTIntShiftOp, []>;
+def srl_clamp : SDNode<"NVPTXISD::SRL_CLAMP", SDTIntShiftOp, []>;
+
+defm SHL_CLAMP : SHIFT<"shl.b", shl_clamp>;
+defm SRL_CLAMP : SHIFT<"shr.u", srl_clamp>;
+
 // Bit-reverse
 foreach t = [I64RT, I32RT] in
   def BREV_ # t.PtxType :
diff --git a/llvm/test/CodeGen/NVPTX/shift-opt.ll b/llvm/test/CodeGen/NVPTX/shift-opt.ll
index e0d22c62993ba..06ab7b1cdc09b 100644
--- a/llvm/test/CodeGen/NVPTX/shift-opt.ll
+++ b/llvm/test/CodeGen/NVPTX/shift-opt.ll
@@ -171,3 +171,263 @@ define i64 @test_negative_use_shl(i64 %x, i32 %y) {
   call void @use(i64 %shl)
   ret i64 %srl
 }
+
+;; ============================================================================
+;; Guarded shift patterns
+;;
+;; C/C++ code like `shift >= 32 ? 0 : x >> shift` generates a guarded shift
+;; pattern to avoid undefined behavior. PTX shr/shl instructions clamp shift
+;; amounts >= BitWidth to produce 0, making the guard redundant.
+;;
+;; Transformation 1 (ugt form):
+;;   (select (icmp ugt shift, BitWidth-1), 0, (srl x, shift))
+;;   i.e., shift > 31 ? 0 : x >> shift
+;;   -->  (srl x, shift)
+;;
+;; Transformation 2 (ult form, equivalent after InstCombine):
+;;   (select (icmp ult shift, BitWidth), (srl x, shift), 0)
+;;   i.e., shift < 32 ? x >> shift : 0
+;;   -->  (srl x, shift)
+;;
+;; Same applies to shl (left shift).
+;; ============================================================================
+
+;; --- i16 shr tests ---
+
+; (select (ugt shift, 15), 0, (srl x, shift)) --> (srl x, shift)
+define i16 @test_guarded_i16_ugt(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ugt(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b16 %rs<3>;
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b16 %rs1, [test_guarded_i16_ugt_param_0];
+; CHECK-NEXT:    ld.param.b16 %r1, [test_guarded_i16_ugt_param_1];
+; CHECK-NEXT:    shr.u16 %rs2, %rs1, %r1;
+; CHECK-NEXT:    cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ugt i16 %shift, 15
+  %shr = lshr i16 %x, %shift
+  %sel = select i1 %cmp, i16 0, i16 %shr
+  ret i16 %sel
+}
+
+; (select (ult shift, 16), (srl x, shift), 0) --> (srl x, shift)
+define i16 @test_guarded_i16_ult(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ult(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b16 %rs<3>;
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b16 %rs1, [test_guarded_i16_ult_param_0];
+; CHECK-NEXT:    ld.param.b16 %r1, [test_guarded_i16_ult_param_1];
+; CHECK-NEXT:    shr.u16 %rs2, %rs1, %r1;
+; CHECK-NEXT:    cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ult i16 %shift, 16
+  %shr = lshr i16 %x, %shift
+  %sel = select i1 %cmp, i16 %shr, i16 0
+  ret i16 %sel
+}
+
+;; --- i32 shr tests ---
+
+; (select (ugt shift, 31), 0, (srl x, shift)) --> (srl x, shift)
+define i32 @test_guarded_i32_ugt(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ugt(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b32 %r1, [test_guarded_i32_ugt_param_0];
+; CHECK-NEXT:    ld.param.b32 %r2, [test_guarded_i32_ugt_param_1];
+; CHECK-NEXT:    shr.u32 %r3, %r1, %r2;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ugt i32 %shift, 31
+  %shr = lshr i32 %x, %shift
+  %sel = select i1 %cmp, i32 0, i32 %shr
+  ret i32 %sel
+}
+
+; (select (ult shift, 32), (srl x, shift), 0) --> (srl x, shift)
+define i32 @test_guarded_i32_ult(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ult(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b32 %r1, [test_guarded_i32_ult_param_0];
+; CHECK-NEXT:    ld.param.b32 %r2, [test_guarded_i32_ult_param_1];
+; CHECK-NEXT:    shr.u32 %r3, %r1, %r2;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ult i32 %shift, 32
+  %shr = lshr i32 %x, %shift
+  %sel = select i1 %cmp, i32 %shr, i32 0
+  ret i32 %sel
+}
+
+;; --- i64 shr tests ---
+
+; (select (ugt shift, 63), 0, (srl x, shift)) --> (srl x, shift)
+define i64 @test_guarded_i64_ugt(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ugt(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b64 %rd1, [test_guarded_i64_ugt_param_0];
+; CHECK-NEXT:    ld.param.b32 %r1, [test_guarded_i64_ugt_param_1];
+; CHECK-NEXT:    shr.u64 %rd2, %rd1, %r1;
+; CHECK-NEXT:    st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ugt i64 %shift, 63
+  %shr = lshr i64 %x, %shift
+  %sel = select i1 %cmp, i64 0, i64 %shr
+  ret i64 %sel
+}
+
+; (select (ult shift, 64), (srl x, shift), 0) --> (srl x, shift)
+define i64 @test_guarded_i64_ult(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ult(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b64 %rd1, [test_guarded_i64_ult_param_0];
+; CHECK-NEXT:    ld.param.b32 %r1, [test_guarded_i64_ult_param_1];
+; CHECK-NEXT:    shr.u64 %rd2, %rd1, %r1;
+; CHECK-NEXT:    st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ult i64 %shift, 64
+  %shr = lshr i64 %x, %shift
+  %sel = select i1 %cmp, i64 %shr, i64 0
+  ret i64 %sel
+}
+
+;; --- i16 shl tests ---
+
+; (select (ugt shift, 15), 0, (shl x, shift)) --> (shl x, shift)
+define i16 @test_guarded_i16_ugt_shl(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ugt_shl(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b16 %rs<3>;
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b16 %rs1, [test_guarded_i16_ugt_shl_param_0];
+; CHECK-NEXT:    ld.param.b16 %r1, [test_guarded_i16_ugt_shl_param_1];
+; CHECK-NEXT:    shl.b16 %rs2, %rs1, %r1;
+; CHECK-NEXT:    cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ugt i16 %shift, 15
+  %shl = shl i16 %x, %shift
+  %sel = select i1 %cmp, i16 0, i16 %shl
+  ret i16 %sel
+}
+
+; (select (ult shift, 16), (shl x, shift), 0) --> (shl x, shift)
+define i16 @test_guarded_i16_ult_shl(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ult_shl(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b16 %rs<3>;
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b16 %rs1, [test_guarded_i16_ult_shl_param_0];
+; CHECK-NEXT:    ld.param.b16 %r1, [test_guarded_i16_ult_shl_param_1];
+; CHECK-NEXT:    shl.b16 %rs2, %rs1, %r1;
+; CHECK-NEXT:    cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ult i16 %shift, 16
+  %shl = shl i16 %x, %shift
+  %sel = select i1 %cmp, i16 %shl, i16 0
+  ret i16 %sel
+}
+
+;; --- i32 shl tests ---
+
+; (select (ugt shift, 31), 0, (shl x, shift)) --> (shl x, shift)
+define i32 @test_guarded_i32_ugt_shl(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ugt_shl(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b32 %r1, [test_guarded_i32_ugt_shl_param_0];
+; CHECK-NEXT:    ld.param.b32 %r2, [test_guarded_i32_ugt_shl_param_1];
+; CHECK-NEXT:    shl.b32 %r3, %r1, %r2;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ugt i32 %shift, 31
+  %shl = shl i32 %x, %shift
+  %sel = select i1 %cmp, i32 0, i32 %shl
+  ret i32 %sel
+}
+
+; (select (ult shift, 32), (shl x, shift), 0) --> (shl x, shift)
+define i32 @test_guarded_i32_ult_shl(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ult_shl(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b32 %r1, [test_guarded_i32_ult_shl_param_0];
+; CHECK-NEXT:    ld.param.b32 %r2, [test_guarded_i32_ult_shl_param_1];
+; CHECK-NEXT:    shl.b32 %r3, %r1, %r2;
+; CHECK-NEXT:    st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ult i32 %shift, 32
+  %shl = shl i32 %x, %shift
+  %sel = select i1 %cmp, i32 %shl, i32 0
+  ret i32 %sel
+}
+
+;; --- i64 shl tests ---
+
+; (select (ugt shift, 63), 0, (shl x, shift)) --> (shl x, shift)
+define i64 @test_guarded_i64_ugt_shl(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ugt_shl(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b64 %rd1, [test_guarded_i64_ugt_shl_param_0];
+; CHECK-NEXT:    ld.param.b32 %r1, [test_guarded_i64_ugt_shl_param_1];
+; CHECK-NEXT:    shl.b64 %rd2, %rd1, %r1;
+; CHECK-NEXT:    st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ugt i64 %shift, 63
+  %shl = shl i64 %x, %shift
+  %sel = select i1 %cmp, i64 0, i64 %shl
+  ret i64 %sel
+}
+
+; (select (ult shift, 64), (shl x, shift), 0) --> (shl x, shift)
+define i64 @test_guarded_i64_ult_shl(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ult_shl(
+; CHECK:       {
+; CHECK-NEXT:    .reg .b32 %r<2>;
+; CHECK-NEXT:    .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.b64 %rd1, [test_guarded_i64_ult_shl_param_0];
+; CHECK-NEXT:    ld.param.b32 %r1, [test_guarded_i64_ult_shl_param_1];
+; CHECK-NEXT:    shl.b64 %rd2, %rd1, %r1;
+; CHECK-NEXT:    st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT:    ret;
+  %cmp = icmp ult i64 %shift, 64
+  %shl = shl i64 %x, %shift
+  %sel = select i1 %cmp, i64 %shl, i64 0
+  ret i64 %sel
+}

Comment on lines 6257 to 6260
  // We're looking for:
  //   (select (setcc shift, BitWidth-1, ugt), 0, (shift x, shift))
  // or:
  //   (select (setcc shift, BitWidth, ult), (shift x, shift), 0)
Member

Can you use sd_match here?

Contributor Author

Fixed that, thanks.
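
For readers unfamiliar with SDPatternMatch, a first pass at the rewrite might look like the sketch below. This is illustrative only, not the exact code that landed: it covers just the ugt form, and the threshold is still captured as a generic value that must be checked by hand.

using namespace llvm::SDPatternMatch;

SDValue X, ShiftAmt, Threshold;
// Matches (select (setcc shift, thresh, ugt), 0, (srl x, shift)).
bool MatchedUGT = sd_match(
    N, m_Select(m_SetCC(m_Value(ShiftAmt), m_Value(Threshold),
                        m_SpecificCondCode(ISD::SETUGT)),
                m_Zero(), m_Srl(m_Value(X), m_Deferred(ShiftAmt))));
// Threshold must then be verified to equal BitWidth-1 separately,
// which is exactly what the m_SpecificInt suggestion below removes.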

@github-actions

github-actions bot commented Dec 16, 2025

🐧 Linux x64 Test Results

  • 187560 tests passed
  • 4968 tests skipped

✅ The build succeeded and all tests passed.

@github-actions

github-actions bot commented Dec 16, 2025

🪟 Windows x64 Test Results

  • 128709 tests passed
  • 2824 tests skipped

✅ The build succeeded and all tests passed.

Member

@Artem-B Artem-B left a comment

LGTM


  // shift_amt > threshold ? 0 : shift_op
  bool MatchedUGT =
      sd_match(N, m_Select(m_SetCC(m_Value(ShiftAmt), m_Value(Threshold),
Member

Use m_SpecificInt for threshold

Contributor Author

Updated; I made it a lot cleaner, I think.
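
Sketch of the cleaner form, with the threshold folded into the pattern via m_SpecificInt (again illustrative, assuming X/ShiftAmt as in the earlier sketch and BitWidth holding the scalar size of the shifted type):

bool MatchedUGT = sd_match(
    N, m_Select(m_SetCC(m_Value(ShiftAmt), m_SpecificInt(BitWidth - 1),
                        m_SpecificCondCode(ISD::SETUGT)),
                m_Zero(), m_Srl(m_Value(X), m_Deferred(ShiftAmt))));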

Member

@AlexMaclean AlexMaclean left a comment

Let's guard this combine to run only after legalization. I worry we could end up folding some weird type we don't support and then be left with a node we cannot select.

@YonahGoldberg
Copy link
Contributor Author

Let's guard this combine to run only after legalization. I worry we could end up folding some weird type we don't support and then be left with a node we cannot select.

Makes a lot of sense. This way I can also eliminate the explicit bitwidth check, because after legalization there are no i8 ops.
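
The guard itself is a one-line early-out at the top of the combine. A sketch (DAGCombinerInfo::isAfterLegalizeDAG is the existing hook for this; the body is abbreviated):

static SDValue PerformSELECTShiftCombine(SDNode *N,
                                         TargetLowering::DAGCombinerInfo &DCI) {
  // Run only after legalization so we never fold an unsupported type into
  // a clamp node that cannot be selected.
  if (!DCI.isAfterLegalizeDAG())
    return SDValue();
  // ... pattern matching as above ...
}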
