[NVPTX][DagCombiner] Eliminate guards on shift amount because PTX shifts automatically clamp #172431
base: main
Conversation
@llvm/pr-subscribers-backend-nvptx

Author: Yonah Goldberg (YonahGoldberg)

Changes

Transform patterns like:

  (select (ugt shift, BitWidth-1), 0, (srl/shl x, shift))
  (select (ult shift, BitWidth), (srl/shl x, shift), 0)

Into:

  (srl/shl x, shift)

These patterns arise from C/C++ code like `shift >= 32 ? 0 : x >> shift` that guards against undefined behavior.

Full diff: https://github.com/llvm/llvm-project/pull/172431.diff

3 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 92f3865818530..e3dc73f625c3c 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -867,13 +867,14 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
// We have some custom DAG combine patterns for these nodes
setTargetDAGCombine(
- {ISD::ADD, ISD::AND, ISD::EXTRACT_VECTOR_ELT,
- ISD::FADD, ISD::FMAXNUM, ISD::FMINNUM,
- ISD::FMAXIMUM, ISD::FMINIMUM, ISD::FMAXIMUMNUM,
- ISD::FMINIMUMNUM, ISD::MUL, ISD::SHL,
- ISD::SREM, ISD::UREM, ISD::VSELECT,
- ISD::BUILD_VECTOR, ISD::ADDRSPACECAST, ISD::LOAD,
- ISD::STORE, ISD::ZERO_EXTEND, ISD::SIGN_EXTEND});
+ {ISD::ADD, ISD::AND, ISD::EXTRACT_VECTOR_ELT,
+ ISD::FADD, ISD::FMAXNUM, ISD::FMINNUM,
+ ISD::FMAXIMUM, ISD::FMINIMUM, ISD::FMAXIMUMNUM,
+ ISD::FMINIMUMNUM, ISD::MUL, ISD::SELECT,
+ ISD::SHL, ISD::SREM, ISD::UREM,
+ ISD::VSELECT, ISD::BUILD_VECTOR, ISD::ADDRSPACECAST,
+ ISD::LOAD, ISD::STORE, ISD::ZERO_EXTEND,
+ ISD::SIGN_EXTEND});
// setcc for f16x2 and bf16x2 needs special handling to prevent
// legalizer's attempt to scalarize it due to v2i1 not being legal.
@@ -6233,6 +6234,105 @@ static SDValue PerformEXTRACTCombine(SDNode *N,
return Result;
}
+/// Transform patterns like:
+/// (select (ugt shift, BitWidth-1), 0, (srl/shl x, shift))
+/// (select (ult shift, BitWidth), (srl/shl x, shift), 0)
+/// Into:
+/// (NVPTXISD::SRL_CLAMP x, shift) or (NVPTXISD::SHL_CLAMP x, shift)
+///
+/// These patterns arise from C/C++ code like `shift >= 32 ? 0 : x >> shift`
+/// which guards against undefined behavior. PTX shr/shl instructions clamp
+/// shift amounts >= BitWidth to produce 0 for logical shifts, making the
+/// guard redundant.
+///
+/// Note: We only handle SRL and SHL, not SRA, because arithmetic right
+/// shifts produce sign-extended results (0 or -1) when shift >= BitWidth,
+/// which differs from the C pattern that always returns 0.
+static SDValue PerformSELECTShiftCombine(SDNode *N,
+ TargetLowering::DAGCombinerInfo &DCI) {
+ SDValue Cond = N->getOperand(0);
+ SDValue TrueVal = N->getOperand(1);
+ SDValue FalseVal = N->getOperand(2);
+
+ // We're looking for:
+ // (select (setcc shift, BitWidth-1, ugt), 0, (shift x, shift))
+ // or:
+ // (select (setcc shift, BitWidth, ult), (shift x, shift), 0)
+
+ SDValue ZeroVal, ShiftOp;
+ bool Inverted = false;
+
+ if (isConstZero(TrueVal)) {
+ ZeroVal = TrueVal;
+ ShiftOp = FalseVal;
+ } else if (isConstZero(FalseVal)) {
+ ZeroVal = FalseVal;
+ ShiftOp = TrueVal;
+ Inverted = true;
+ } else {
+ return SDValue();
+ }
+
+ // Only handle logical shifts (SRL, SHL), not arithmetic (SRA)
+ unsigned ShiftOpc = ShiftOp.getOpcode();
+ if (ShiftOpc != ISD::SRL && ShiftOpc != ISD::SHL)
+ return SDValue();
+
+ // Condition must be a SETCC
+ if (Cond.getOpcode() != ISD::SETCC)
+ return SDValue();
+
+ ISD::CondCode CC = cast<CondCodeSDNode>(Cond.getOperand(2))->get();
+ SDValue CondLHS = Cond.getOperand(0);
+ SDValue CondRHS = Cond.getOperand(1);
+ SDValue ShiftAmt = ShiftOp.getOperand(1);
+
+ // Look through truncation - NVPTX truncates 64-bit shift amounts to 32-bit
+ if (ShiftAmt.getOpcode() == ISD::TRUNCATE)
+ ShiftAmt = ShiftAmt.getOperand(0);
+
+ // The value being compared must be the same as the shift amount.
+ // e.g., in "shift > 31 ? 0 : x >> shift", both must use the same 'shift'.
+ if (CondLHS != ShiftAmt)
+ return SDValue();
+
+ auto *Threshold = dyn_cast<ConstantSDNode>(CondRHS);
+ if (!Threshold)
+ return SDValue();
+
+ unsigned BitWidth = ShiftOp.getValueType().getSizeInBits();
+ uint64_t ThreshVal = Threshold->getZExtValue();
+
+ // Check for valid patterns based on select orientation:
+ //
+ // When TrueVal is 0 (not inverted):
+ // (select (ugt shift, BitWidth-1), 0, shift_result)
+ // i.e., shift > 31 ? 0 : x >> shift
+ //
+ // When FalseVal is 0 (inverted):
+ // (select (ult shift, BitWidth), shift_result, 0)
+ // i.e., shift < 32 ? x >> shift : 0
+ //
+ // Both patterns return 0 when shift >= BitWidth, which PTX handles natively.
+ bool ValidPattern = false;
+ if (!Inverted && CC == ISD::SETUGT && ThreshVal == BitWidth - 1)
+ ValidPattern = true;
+ else if (Inverted && CC == ISD::SETULT && ThreshVal == BitWidth)
+ ValidPattern = true;
+
+ if (!ValidPattern)
+ return SDValue();
+
+ // Pattern matched! Return a custom clamp node that has defined semantics
+ // for out-of-range shift amounts, matching PTX's clamping behavior.
+ SelectionDAG &DAG = DCI.DAG;
+ SDLoc DL(N);
+ unsigned ClampOpc =
+ (ShiftOpc == ISD::SRL) ? NVPTXISD::SRL_CLAMP : NVPTXISD::SHL_CLAMP;
+ return DAG.getNode(ClampOpc, DL, ShiftOp.getValueType(),
+ ShiftOp.getOperand(0), ShiftOp.getOperand(1));
+}
+
static SDValue PerformVSELECTCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI) {
SDValue VA = N->getOperand(1);
@@ -6544,6 +6644,8 @@ SDValue NVPTXTargetLowering::PerformDAGCombine(SDNode *N,
case NVPTXISD::StoreV2:
case NVPTXISD::StoreV4:
return combineSTORE(N, DCI, STI);
+ case ISD::SELECT:
+ return PerformSELECTShiftCombine(N, DCI);
case ISD::VSELECT:
return PerformVSELECTCombine(N, DCI);
}
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index 9c1ec38fb1f3a..74a552502ccf2 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -1323,6 +1323,15 @@ defm SHL : SHIFT<"shl.b", shl>;
defm SRA : SHIFT<"shr.s", sra>;
defm SRL : SHIFT<"shr.u", srl>;
+// Shift with clamping semantics - these have defined behavior for shift amounts
+// >= BitWidth (returning 0 for logical shifts). Used to optimize guarded shift
+// patterns like `shift >= 32 ? 0 : x >> shift`.
+def shl_clamp : SDNode<"NVPTXISD::SHL_CLAMP", SDTIntShiftOp, []>;
+def srl_clamp : SDNode<"NVPTXISD::SRL_CLAMP", SDTIntShiftOp, []>;
+
+defm SHL_CLAMP : SHIFT<"shl.b", shl_clamp>;
+defm SRL_CLAMP : SHIFT<"shr.u", srl_clamp>;
+
// Bit-reverse
foreach t = [I64RT, I32RT] in
def BREV_ # t.PtxType :
diff --git a/llvm/test/CodeGen/NVPTX/shift-opt.ll b/llvm/test/CodeGen/NVPTX/shift-opt.ll
index e0d22c62993ba..06ab7b1cdc09b 100644
--- a/llvm/test/CodeGen/NVPTX/shift-opt.ll
+++ b/llvm/test/CodeGen/NVPTX/shift-opt.ll
@@ -171,3 +171,263 @@ define i64 @test_negative_use_shl(i64 %x, i32 %y) {
call void @use(i64 %shl)
ret i64 %srl
}
+
+;; ============================================================================
+;; Guarded shift patterns
+;;
+;; C/C++ code like `shift >= 32 ? 0 : x >> shift` generates a guarded shift
+;; pattern to avoid undefined behavior. PTX shr/shl instructions clamp shift
+;; amounts >= BitWidth to produce 0, making the guard redundant.
+;;
+;; Transformation 1 (ugt form):
+;; (select (icmp ugt shift, BitWidth-1), 0, (srl x, shift))
+;; i.e., shift > 31 ? 0 : x >> shift
+;; --> (srl x, shift)
+;;
+;; Transformation 2 (ult form, equivalent after InstCombine):
+;; (select (icmp ult shift, BitWidth), (srl x, shift), 0)
+;; i.e., shift < 32 ? x >> shift : 0
+;; --> (srl x, shift)
+;;
+;; Same applies to shl (left shift).
+;; ============================================================================
+
+;; --- i16 shr tests ---
+
+; (select (ugt shift, 15), 0, (srl x, shift)) --> (srl x, shift)
+define i16 @test_guarded_i16_ugt(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ugt(
+; CHECK: {
+; CHECK-NEXT: .reg .b16 %rs<3>;
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b16 %rs1, [test_guarded_i16_ugt_param_0];
+; CHECK-NEXT: ld.param.b16 %r1, [test_guarded_i16_ugt_param_1];
+; CHECK-NEXT: shr.u16 %rs2, %rs1, %r1;
+; CHECK-NEXT: cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i16 %shift, 15
+ %shr = lshr i16 %x, %shift
+ %sel = select i1 %cmp, i16 0, i16 %shr
+ ret i16 %sel
+}
+
+; (select (ult shift, 16), (srl x, shift), 0) --> (srl x, shift)
+define i16 @test_guarded_i16_ult(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ult(
+; CHECK: {
+; CHECK-NEXT: .reg .b16 %rs<3>;
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b16 %rs1, [test_guarded_i16_ult_param_0];
+; CHECK-NEXT: ld.param.b16 %r1, [test_guarded_i16_ult_param_1];
+; CHECK-NEXT: shr.u16 %rs2, %rs1, %r1;
+; CHECK-NEXT: cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i16 %shift, 16
+ %shr = lshr i16 %x, %shift
+ %sel = select i1 %cmp, i16 %shr, i16 0
+ ret i16 %sel
+}
+
+;; --- i32 shr tests ---
+
+; (select (ugt shift, 31), 0, (srl x, shift)) --> (srl x, shift)
+define i32 @test_guarded_i32_ugt(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ugt(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i32_ugt_param_0];
+; CHECK-NEXT: ld.param.b32 %r2, [test_guarded_i32_ugt_param_1];
+; CHECK-NEXT: shr.u32 %r3, %r1, %r2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i32 %shift, 31
+ %shr = lshr i32 %x, %shift
+ %sel = select i1 %cmp, i32 0, i32 %shr
+ ret i32 %sel
+}
+
+; (select (ult shift, 32), (srl x, shift), 0) --> (srl x, shift)
+define i32 @test_guarded_i32_ult(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ult(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i32_ult_param_0];
+; CHECK-NEXT: ld.param.b32 %r2, [test_guarded_i32_ult_param_1];
+; CHECK-NEXT: shr.u32 %r3, %r1, %r2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i32 %shift, 32
+ %shr = lshr i32 %x, %shift
+ %sel = select i1 %cmp, i32 %shr, i32 0
+ ret i32 %sel
+}
+
+;; --- i64 shr tests ---
+
+; (select (ugt shift, 63), 0, (srl x, shift)) --> (srl x, shift)
+define i64 @test_guarded_i64_ugt(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ugt(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b64 %rd1, [test_guarded_i64_ugt_param_0];
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i64_ugt_param_1];
+; CHECK-NEXT: shr.u64 %rd2, %rd1, %r1;
+; CHECK-NEXT: st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i64 %shift, 63
+ %shr = lshr i64 %x, %shift
+ %sel = select i1 %cmp, i64 0, i64 %shr
+ ret i64 %sel
+}
+
+; (select (ult shift, 64), (srl x, shift), 0) --> (srl x, shift)
+define i64 @test_guarded_i64_ult(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ult(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b64 %rd1, [test_guarded_i64_ult_param_0];
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i64_ult_param_1];
+; CHECK-NEXT: shr.u64 %rd2, %rd1, %r1;
+; CHECK-NEXT: st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i64 %shift, 64
+ %shr = lshr i64 %x, %shift
+ %sel = select i1 %cmp, i64 %shr, i64 0
+ ret i64 %sel
+}
+
+;; --- i16 shl tests ---
+
+; (select (ugt shift, 15), 0, (shl x, shift)) --> (shl x, shift)
+define i16 @test_guarded_i16_ugt_shl(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ugt_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b16 %rs<3>;
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b16 %rs1, [test_guarded_i16_ugt_shl_param_0];
+; CHECK-NEXT: ld.param.b16 %r1, [test_guarded_i16_ugt_shl_param_1];
+; CHECK-NEXT: shl.b16 %rs2, %rs1, %r1;
+; CHECK-NEXT: cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i16 %shift, 15
+ %shl = shl i16 %x, %shift
+ %sel = select i1 %cmp, i16 0, i16 %shl
+ ret i16 %sel
+}
+
+; (select (ult shift, 16), (shl x, shift), 0) --> (shl x, shift)
+define i16 @test_guarded_i16_ult_shl(i16 %x, i16 %shift) {
+; CHECK-LABEL: test_guarded_i16_ult_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b16 %rs<3>;
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b16 %rs1, [test_guarded_i16_ult_shl_param_0];
+; CHECK-NEXT: ld.param.b16 %r1, [test_guarded_i16_ult_shl_param_1];
+; CHECK-NEXT: shl.b16 %rs2, %rs1, %r1;
+; CHECK-NEXT: cvt.u32.u16 %r2, %rs2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i16 %shift, 16
+ %shl = shl i16 %x, %shift
+ %sel = select i1 %cmp, i16 %shl, i16 0
+ ret i16 %sel
+}
+
+;; --- i32 shl tests ---
+
+; (select (ugt shift, 31), 0, (shl x, shift)) --> (shl x, shift)
+define i32 @test_guarded_i32_ugt_shl(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ugt_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i32_ugt_shl_param_0];
+; CHECK-NEXT: ld.param.b32 %r2, [test_guarded_i32_ugt_shl_param_1];
+; CHECK-NEXT: shl.b32 %r3, %r1, %r2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i32 %shift, 31
+ %shl = shl i32 %x, %shift
+ %sel = select i1 %cmp, i32 0, i32 %shl
+ ret i32 %sel
+}
+
+; (select (ult shift, 32), (shl x, shift), 0) --> (shl x, shift)
+define i32 @test_guarded_i32_ult_shl(i32 %x, i32 %shift) {
+; CHECK-LABEL: test_guarded_i32_ult_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i32_ult_shl_param_0];
+; CHECK-NEXT: ld.param.b32 %r2, [test_guarded_i32_ult_shl_param_1];
+; CHECK-NEXT: shl.b32 %r3, %r1, %r2;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r3;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i32 %shift, 32
+ %shl = shl i32 %x, %shift
+ %sel = select i1 %cmp, i32 %shl, i32 0
+ ret i32 %sel
+}
+
+;; --- i64 shl tests ---
+
+; (select (ugt shift, 63), 0, (shl x, shift)) --> (shl x, shift)
+define i64 @test_guarded_i64_ugt_shl(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ugt_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b64 %rd1, [test_guarded_i64_ugt_shl_param_0];
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i64_ugt_shl_param_1];
+; CHECK-NEXT: shl.b64 %rd2, %rd1, %r1;
+; CHECK-NEXT: st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ugt i64 %shift, 63
+ %shl = shl i64 %x, %shift
+ %sel = select i1 %cmp, i64 0, i64 %shl
+ ret i64 %sel
+}
+
+; (select (ult shift, 64), (shl x, shift), 0) --> (shl x, shift)
+define i64 @test_guarded_i64_ult_shl(i64 %x, i64 %shift) {
+; CHECK-LABEL: test_guarded_i64_ult_shl(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.b64 %rd1, [test_guarded_i64_ult_shl_param_0];
+; CHECK-NEXT: ld.param.b32 %r1, [test_guarded_i64_ult_shl_param_1];
+; CHECK-NEXT: shl.b64 %rd2, %rd1, %r1;
+; CHECK-NEXT: st.param.b64 [func_retval0], %rd2;
+; CHECK-NEXT: ret;
+ %cmp = icmp ult i64 %shift, 64
+ %shl = shl i64 %x, %shift
+ %sel = select i1 %cmp, i64 %shl, i64 0
+ ret i64 %sel
+}
// We're looking for:
//   (select (setcc shift, BitWidth-1, ugt), 0, (shift x, shift))
// or:
//   (select (setcc shift, BitWidth, ult), (shift x, shift), 0)
Can you use sd_match here?
Fixed that thanks.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.

🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed.
Artem-B left a comment:
LGTM
// shift_amt > threshold ? 0 : shift_op
bool MatchedUGT =
    sd_match(N, m_Select(m_SetCC(m_Value(ShiftAmt), m_Value(Threshold),
Use m_SpecificInt for threshold
Updated, I made it a lot cleaner I think
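For readers following the thread, here is a rough sketch of how the ugt form might be matched once sd_match and m_SpecificInt are used, as suggested above. This is illustrative only: the helper name, the use of m_Zero/m_SpecificCondCode/m_Srl, and the fold to a plain srl are assumptions, not the PR's final code.

```cpp
#include "llvm/CodeGen/SDPatternMatch.h"
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;
using namespace llvm::SDPatternMatch;

// Sketch only: match (select (setcc shift, BitWidth-1, ugt), 0, (srl x, shift))
// and fold it to a plain srl, relying on PTX clamping oversized shift amounts.
static SDValue foldGuardedSrlSketch(SDNode *N, SelectionDAG &DAG) {
  SDValue X, CmpAmt, ShiftAmt;
  unsigned BitWidth = N->getValueType(0).getSizeInBits();

  bool MatchedUGT = sd_match(
      N, m_Select(m_SetCC(m_Value(CmpAmt), m_SpecificInt(BitWidth - 1),
                          m_SpecificCondCode(ISD::SETUGT)),
                  m_Zero(), m_Srl(m_Value(X), m_Value(ShiftAmt))));

  // The value compared against the threshold must be the shift amount itself
  // (the actual patch also looks through a truncate of the shift amount).
  if (!MatchedUGT || CmpAmt != ShiftAmt)
    return SDValue();

  return DAG.getNode(ISD::SRL, SDLoc(N), N->getValueType(0), X, ShiftAmt);
}
```

The ult form would be matched analogously, with m_SpecificInt(BitWidth), ISD::SETULT, and the zero on the false side of the select.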
AlexMaclean left a comment:
Let's guard this combine to only run after legalization. I worry we could end up folding some weird type we don't support and then end up with a node we cannot select.
Makes a lot of sense. This way I can also eliminate the explicit bitwidth check because after legalization there are no i8 ops.
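Presumably that guard is a single early exit along these lines; the exact placement is an assumption, though TargetLowering::DAGCombinerInfo does provide isAfterLegalizeDAG():

```cpp
// Run this fold only after DAG legalization, so we never rewrite a select
// over a type we cannot select a shift for (e.g. i8; after legalization
// there are no i8 ops left on NVPTX).
if (!DCI.isAfterLegalizeDAG())
  return SDValue();
```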
Transform patterns like:

  (select (ugt shift, BitWidth-1), 0, (srl/shl x, shift))
  (select (ult shift, BitWidth), (srl/shl x, shift), 0)

Into:

  (srl/shl x, shift)

These patterns arise from C/C++ code like `shift >= 32 ? 0 : x >> shift`
which guards against undefined behavior. PTX shr/shl instructions clamp
shift amounts >= BitWidth to produce 0 for logical shifts, making the
guard redundant.
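As a concrete illustration, here is a minimal C++ sketch of the guarded-shift idiom this targets; the function name guarded_lshr is hypothetical, not taken from the PR:

```cpp
#include <cstdint>

// Hypothetical example of the guarded-shift idiom. Portable C++ must guard
// because `x >> shift` is undefined behavior when shift >= 32, but PTX's
// shr.u32 already clamps such shift amounts and produces 0, so on NVPTX the
// guard (and the select it lowers to) is redundant.
uint32_t guarded_lshr(uint32_t x, uint32_t shift) {
  return shift >= 32 ? 0u : (x >> shift);
}
```

With this combine, the select disappears and the function lowers to a single shr.u32, as the new tests check.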