[IndVars] Mark truncs as nuw/nsw #88686

Merged: 1 commit, Apr 16, 2024

Conversation

nikic (Contributor) commented Apr 15, 2024

When inserting truncs during IV widening, mark the trunc as either nuw or nsw depending on whether zext or sext widening was used. For non-negative IVs both nuw and nsw apply.
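
To make the flag semantics concrete, here is a hand-written sketch (not taken from the patch's tests) of what the widened output looks like for a non-negative IV; the function name and shape are illustrative only:

```llvm
; An i32 IV counting from 0 under a signed bound has been widened to i64.
; Its value always lies in [0, INT32_MAX), so truncating back to i32 drops
; no information and the trunc may carry both nuw and nsw.
define void @sketch(i32 %count) {
entry:
  %wide.count = sext i32 %count to i64
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %narrow = trunc nuw nsw i64 %iv to i32   ; the rewritten narrow use
  call void @use(i32 %narrow)
  %iv.next = add nuw nsw i64 %iv, 1
  %cmp = icmp slt i64 %iv.next, %wide.count
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}

declare void @use(i32)
```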

llvmbot (Member) commented Apr 15, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-ir

Author: Nikita Popov (nikic)

Changes

When inserting truncs during IV widening, mark the trunc as either nuw or nsw depending on whether zext or sext widening was used. For non-negative IVs both nuw and nsw apply.


Patch is 38.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88686.diff

16 Files Affected:

  • (modified) llvm/include/llvm/IR/IRBuilder.h (+12-2)
  • (modified) llvm/lib/Transforms/Utils/SimplifyIndVar.cpp (+22-18)
  • (modified) llvm/test/Transforms/IndVarSimplify/AArch64/widen-loop-comp.ll (+4-4)
  • (modified) llvm/test/Transforms/IndVarSimplify/X86/iv-widen.ll (+8-8)
  • (modified) llvm/test/Transforms/IndVarSimplify/elim-extend.ll (+1-1)
  • (modified) llvm/test/Transforms/IndVarSimplify/hoist-wide-inc-for-narrow-use-recompute-flags.ll (+1-1)
  • (modified) llvm/test/Transforms/IndVarSimplify/iv-sext.ll (+1-1)
  • (modified) llvm/test/Transforms/IndVarSimplify/iv-widen-elim-ext.ll (+4-4)
  • (modified) llvm/test/Transforms/IndVarSimplify/lftr.ll (+2-2)
  • (modified) llvm/test/Transforms/IndVarSimplify/no-iv-rewrite.ll (+1-1)
  • (modified) llvm/test/Transforms/IndVarSimplify/post-inc-range.ll (+1-1)
  • (modified) llvm/test/Transforms/IndVarSimplify/pr25578.ll (+1-1)
  • (modified) llvm/test/Transforms/IndVarSimplify/pr55925.ll (+4-4)
  • (modified) llvm/test/Transforms/IndVarSimplify/widen-nonnegative-countdown.ll (+11-11)
  • (modified) llvm/test/Transforms/IndVarSimplify/widen-nonnegative.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopFlatten/widen-iv3.ll (+1-1)
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index f381273c46cfb8..b6534a1962a2f5 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -2004,8 +2004,18 @@ class IRBuilderBase {
   // Instruction creation methods: Cast/Conversion Operators
   //===--------------------------------------------------------------------===//
 
-  Value *CreateTrunc(Value *V, Type *DestTy, const Twine &Name = "") {
-    return CreateCast(Instruction::Trunc, V, DestTy, Name);
+  Value *CreateTrunc(Value *V, Type *DestTy, const Twine &Name = "",
+                     bool IsNUW = false, bool IsNSW = false) {
+    if (V->getType() == DestTy)
+      return V;
+    if (Value *Folded = Folder.FoldCast(Instruction::Trunc, V, DestTy))
+      return Folded;
+    Instruction *I = CastInst::Create(Instruction::Trunc, V, DestTy);
+    if (IsNUW)
+      I->setHasNoUnsignedWrap();
+    if (IsNSW)
+      I->setHasNoSignedWrap();
+    return Insert(I, Name);
   }
 
   Value *CreateZExt(Value *V, Type *DestTy, const Twine &Name = "",
diff --git a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
index 440fe0790d7950..31be7d62c8d1d8 100644
--- a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
@@ -1153,6 +1153,7 @@ class WidenIV {
 
   Instruction *widenIVUse(NarrowIVDefUse DU, SCEVExpander &Rewriter,
                           PHINode *OrigPhi, PHINode *WidePhi);
+  void truncateIVUse(NarrowIVDefUse DU);
 
   bool widenLoopCompare(NarrowIVDefUse DU);
   bool widenWithVariantUse(NarrowIVDefUse DU);
@@ -1569,15 +1570,18 @@ WidenIV::WidenedRecTy WidenIV::getWideRecurrence(WidenIV::NarrowIVDefUse DU) {
 
 /// This IV user cannot be widened. Replace this use of the original narrow IV
 /// with a truncation of the new wide IV to isolate and eliminate the narrow IV.
-static void truncateIVUse(WidenIV::NarrowIVDefUse DU, DominatorTree *DT,
-                          LoopInfo *LI) {
+void WidenIV::truncateIVUse(NarrowIVDefUse DU) {
   auto *InsertPt = getInsertPointForUses(DU.NarrowUse, DU.NarrowDef, DT, LI);
   if (!InsertPt)
     return;
   LLVM_DEBUG(dbgs() << "INDVARS: Truncate IV " << *DU.WideDef << " for user "
                     << *DU.NarrowUse << "\n");
+  ExtendKind ExtKind = getExtendKind(DU.NarrowDef);
   IRBuilder<> Builder(InsertPt);
-  Value *Trunc = Builder.CreateTrunc(DU.WideDef, DU.NarrowDef->getType());
+  Value *Trunc =
+      Builder.CreateTrunc(DU.WideDef, DU.NarrowDef->getType(), "",
+                          DU.NeverNegative || ExtKind == ExtendKind::Zero,
+                          DU.NeverNegative || ExtKind == ExtendKind::Sign);
   DU.NarrowUse->replaceUsesOfWith(DU.NarrowDef, Trunc);
 }
 
@@ -1826,6 +1830,13 @@ Instruction *WidenIV::widenIVUse(WidenIV::NarrowIVDefUse DU,
   assert(ExtendKindMap.count(DU.NarrowDef) &&
          "Should already know the kind of extension used to widen NarrowDef");
 
+  // This narrow use can be widened by a sext if it's non-negative or its narrow
+  // def was widened by a sext. Same for zext.
+  bool CanWidenBySExt =
+      DU.NeverNegative || getExtendKind(DU.NarrowDef) == ExtendKind::Sign;
+  bool CanWidenByZExt =
+      DU.NeverNegative || getExtendKind(DU.NarrowDef) == ExtendKind::Zero;
+
   // Stop traversing the def-use chain at inner-loop phis or post-loop phis.
   if (PHINode *UsePhi = dyn_cast<PHINode>(DU.NarrowUse)) {
     if (LI->getLoopFor(UsePhi->getParent()) != L) {
@@ -1833,7 +1844,7 @@ Instruction *WidenIV::widenIVUse(WidenIV::NarrowIVDefUse DU,
       // After SimplifyCFG most loop exit targets have a single predecessor.
       // Otherwise fall back to a truncate within the loop.
       if (UsePhi->getNumOperands() != 1)
-        truncateIVUse(DU, DT, LI);
+        truncateIVUse(DU);
       else {
         // Widening the PHI requires us to insert a trunc.  The logical place
         // for this trunc is in the same BB as the PHI.  This is not possible if
@@ -1847,7 +1858,8 @@ Instruction *WidenIV::widenIVUse(WidenIV::NarrowIVDefUse DU,
         WidePhi->addIncoming(DU.WideDef, UsePhi->getIncomingBlock(0));
         BasicBlock *WidePhiBB = WidePhi->getParent();
         IRBuilder<> Builder(WidePhiBB, WidePhiBB->getFirstInsertionPt());
-        Value *Trunc = Builder.CreateTrunc(WidePhi, DU.NarrowDef->getType());
+        Value *Trunc = Builder.CreateTrunc(WidePhi, DU.NarrowDef->getType(), "",
+                                           CanWidenByZExt, CanWidenBySExt);
         UsePhi->replaceAllUsesWith(Trunc);
         DeadInsts.emplace_back(UsePhi);
         LLVM_DEBUG(dbgs() << "INDVARS: Widen lcssa phi " << *UsePhi << " to "
@@ -1857,18 +1869,9 @@ Instruction *WidenIV::widenIVUse(WidenIV::NarrowIVDefUse DU,
     }
   }
 
-  // This narrow use can be widened by a sext if it's non-negative or its narrow
-  // def was widened by a sext. Same for zext.
-  auto canWidenBySExt = [&]() {
-    return DU.NeverNegative || getExtendKind(DU.NarrowDef) == ExtendKind::Sign;
-  };
-  auto canWidenByZExt = [&]() {
-    return DU.NeverNegative || getExtendKind(DU.NarrowDef) == ExtendKind::Zero;
-  };
-
   // Our raison d'etre! Eliminate sign and zero extension.
-  if ((match(DU.NarrowUse, m_SExtLike(m_Value())) && canWidenBySExt()) ||
-      (isa<ZExtInst>(DU.NarrowUse) && canWidenByZExt())) {
+  if ((match(DU.NarrowUse, m_SExtLike(m_Value())) && CanWidenBySExt) ||
+      (isa<ZExtInst>(DU.NarrowUse) && CanWidenByZExt)) {
     Value *NewDef = DU.WideDef;
     if (DU.NarrowUse->getType() != WideType) {
       unsigned CastWidth = SE->getTypeSizeInBits(DU.NarrowUse->getType());
@@ -1876,7 +1879,8 @@ Instruction *WidenIV::widenIVUse(WidenIV::NarrowIVDefUse DU,
       if (CastWidth < IVWidth) {
         // The cast isn't as wide as the IV, so insert a Trunc.
         IRBuilder<> Builder(DU.NarrowUse);
-        NewDef = Builder.CreateTrunc(DU.WideDef, DU.NarrowUse->getType());
+        NewDef = Builder.CreateTrunc(DU.WideDef, DU.NarrowUse->getType(), "",
+                                     CanWidenByZExt, CanWidenBySExt);
       }
       else {
         // A wider extend was hidden behind a narrower one. This may induce
@@ -1975,7 +1979,7 @@ Instruction *WidenIV::widenIVUse(WidenIV::NarrowIVDefUse DU,
   // This user does not evaluate to a recurrence after widening, so don't
   // follow it. Instead insert a Trunc to kill off the original use,
   // eventually isolating the original narrow IV so it can be removed.
-  truncateIVUse(DU, DT, LI);
+  truncateIVUse(DU);
   return nullptr;
 }
 
diff --git a/llvm/test/Transforms/IndVarSimplify/AArch64/widen-loop-comp.ll b/llvm/test/Transforms/IndVarSimplify/AArch64/widen-loop-comp.ll
index 6f659a88da2e2b..c5f656c870a23a 100644
--- a/llvm/test/Transforms/IndVarSimplify/AArch64/widen-loop-comp.ll
+++ b/llvm/test/Transforms/IndVarSimplify/AArch64/widen-loop-comp.ll
@@ -41,7 +41,7 @@ define i32 @test1() {
 ; CHECK-NEXT:    br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[FOR_COND]]
 ; CHECK:       if.then:
 ; CHECK-NEXT:    [[I_05_LCSSA_WIDE:%.*]] = phi i64 [ [[INDVARS_IV]], [[FOR_BODY]] ]
-; CHECK-NEXT:    [[TMP5:%.*]] = trunc i64 [[I_05_LCSSA_WIDE]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = trunc nuw nsw i64 [[I_05_LCSSA_WIDE]] to i32
 ; CHECK-NEXT:    store i32 [[TMP5]], ptr @idx, align 4
 ; CHECK-NEXT:    br label [[FOR_END:%.*]]
 ; CHECK:       for.cond.for.end.loopexit_crit_edge:
@@ -237,7 +237,7 @@ define i32 @test4(i32 %a) {
 ; CHECK-NEXT:    [[CONV3:%.*]] = trunc i32 [[OR]] to i8
 ; CHECK-NEXT:    [[CALL:%.*]] = call i32 @fn1(i8 signext [[CONV3]])
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nsw i32 [[INDVARS_IV]], -1
-; CHECK-NEXT:    [[TMP0:%.*]] = trunc i32 [[INDVARS_IV_NEXT]] to i8
+; CHECK-NEXT:    [[TMP0:%.*]] = trunc nuw i32 [[INDVARS_IV_NEXT]] to i8
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp sgt i8 [[TMP0]], -14
 ; CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
 ; CHECK:       for.end:
@@ -466,7 +466,7 @@ define i32 @test9(ptr %a, i32 %b, i32 %init) {
 ; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ADD]] = add nsw i32 [[SUM_0]], [[TMP1]]
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
-; CHECK-NEXT:    [[TMP2:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
+; CHECK-NEXT:    [[TMP2:%.*]] = trunc nuw i64 [[INDVARS_IV_NEXT]] to i32
 ; CHECK-NEXT:    [[CMP2:%.*]] = icmp slt i32 0, [[TMP2]]
 ; CHECK-NEXT:    br i1 [[CMP2]], label [[FOR_COND]], label [[FOR_END]]
 ; CHECK:       for.end:
@@ -997,7 +997,7 @@ define i32 @test16_unsigned_neg(i32 %start, ptr %p, ptr %q, i32 %x) {
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ [[TMP0]], [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    [[COND:%.*]] = icmp eq i64 [[INDVARS_IV]], 0
-; CHECK-NEXT:    [[TMP1:%.*]] = trunc i64 [[INDVARS_IV]] to i32
+; CHECK-NEXT:    [[TMP1:%.*]] = trunc nuw i64 [[INDVARS_IV]] to i32
 ; CHECK-NEXT:    [[FOO:%.*]] = add i32 [[TMP1]], -1
 ; CHECK-NEXT:    br i1 [[COND]], label [[EXIT:%.*]], label [[GUARDED:%.*]]
 ; CHECK:       guarded:
diff --git a/llvm/test/Transforms/IndVarSimplify/X86/iv-widen.ll b/llvm/test/Transforms/IndVarSimplify/X86/iv-widen.ll
index d05755bea0dddc..4e0c503794bfe4 100644
--- a/llvm/test/Transforms/IndVarSimplify/X86/iv-widen.ll
+++ b/llvm/test/Transforms/IndVarSimplify/X86/iv-widen.ll
@@ -23,7 +23,7 @@ define void @loop_0(ptr %a) {
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, [[B18_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[B24:%.*]] ]
 ; CHECK-NEXT:    call void @use(i64 [[INDVARS_IV]])
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
-; CHECK-NEXT:    [[TMP0:%.*]] = trunc i64 [[INDVARS_IV]] to i32
+; CHECK-NEXT:    [[TMP0:%.*]] = trunc nuw nsw i64 [[INDVARS_IV]] to i32
 ; CHECK-NEXT:    [[O:%.*]] = getelementptr i32, ptr [[A:%.*]], i32 [[TMP0]]
 ; CHECK-NEXT:    [[V:%.*]] = load i32, ptr [[O]], align 4
 ; CHECK-NEXT:    [[T:%.*]] = icmp eq i32 [[V]], 0
@@ -37,7 +37,7 @@ define void @loop_0(ptr %a) {
 ; CHECK-NEXT:    ret void
 ; CHECK:       exit24:
 ; CHECK-NEXT:    [[DOT02_LCSSA_WIDE:%.*]] = phi i64 [ [[INDVARS_IV]], [[B18]] ]
-; CHECK-NEXT:    [[TMP1:%.*]] = trunc i64 [[DOT02_LCSSA_WIDE]] to i32
+; CHECK-NEXT:    [[TMP1:%.*]] = trunc nuw nsw i64 [[DOT02_LCSSA_WIDE]] to i32
 ; CHECK-NEXT:    call void @dummy(i32 [[TMP1]])
 ; CHECK-NEXT:    unreachable
 ;
@@ -159,7 +159,7 @@ declare void @dummy(i32)
 declare void @dummy.i64(i64)
 
 
-define void @loop_2(i32 %size, i32 %nsteps, i32 %hsize, ptr %lined, i8 %tmp1) {
+define void @loop_2(i32 %size, i32 %nsteps, i32 %hsize, ptr %lined, i8 %arg) {
 ; CHECK-LABEL: @loop_2(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[CMP215:%.*]] = icmp sgt i32 [[SIZE:%.*]], 1
@@ -180,12 +180,12 @@ define void @loop_2(i32 %size, i32 %nsteps, i32 %hsize, ptr %lined, i8 %tmp1) {
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 1, [[FOR_BODY2_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY2]] ]
 ; CHECK-NEXT:    [[TMP4:%.*]] = add nsw i64 [[TMP3]], [[INDVARS_IV]]
 ; CHECK-NEXT:    [[ADD_PTR:%.*]] = getelementptr inbounds i8, ptr [[LINED:%.*]], i64 [[TMP4]]
-; CHECK-NEXT:    store i8 [[TMP1:%.*]], ptr [[ADD_PTR]], align 1
+; CHECK-NEXT:    store i8 [[ARG:%.*]], ptr [[ADD_PTR]], align 1
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
 ; CHECK-NEXT:    br i1 [[EXITCOND]], label [[FOR_BODY2]], label [[FOR_BODY3_PREHEADER:%.*]]
 ; CHECK:       for.body3.preheader:
-; CHECK-NEXT:    [[TMP5:%.*]] = trunc i64 [[TMP3]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = trunc nsw i64 [[TMP3]] to i32
 ; CHECK-NEXT:    [[TMP6:%.*]] = zext i32 [[TMP5]] to i64
 ; CHECK-NEXT:    [[WIDE_TRIP_COUNT7:%.*]] = zext i32 [[SIZE]] to i64
 ; CHECK-NEXT:    br label [[FOR_BODY3:%.*]]
@@ -193,7 +193,7 @@ define void @loop_2(i32 %size, i32 %nsteps, i32 %hsize, ptr %lined, i8 %tmp1) {
 ; CHECK-NEXT:    [[INDVARS_IV3:%.*]] = phi i64 [ 1, [[FOR_BODY3_PREHEADER]] ], [ [[INDVARS_IV_NEXT4:%.*]], [[FOR_BODY3]] ]
 ; CHECK-NEXT:    [[TMP7:%.*]] = add nuw nsw i64 [[TMP6]], [[INDVARS_IV3]]
 ; CHECK-NEXT:    [[ADD_PTR2:%.*]] = getelementptr inbounds i8, ptr [[LINED]], i64 [[TMP7]]
-; CHECK-NEXT:    store i8 [[TMP1]], ptr [[ADD_PTR2]], align 1
+; CHECK-NEXT:    store i8 [[ARG]], ptr [[ADD_PTR2]], align 1
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT4]] = add nuw nsw i64 [[INDVARS_IV3]], 1
 ; CHECK-NEXT:    [[EXITCOND8:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT4]], [[WIDE_TRIP_COUNT7]]
 ; CHECK-NEXT:    br i1 [[EXITCOND8]], label [[FOR_BODY3]], label [[FOR_INC_LOOPEXIT:%.*]]
@@ -222,7 +222,7 @@ for.body2:
   %add4 = add nsw i32 %add, %k
   %idx.ext = sext i32 %add4 to i64
   %add.ptr = getelementptr inbounds i8, ptr %lined, i64 %idx.ext
-  store i8 %tmp1, ptr %add.ptr, align 1
+  store i8 %arg, ptr %add.ptr, align 1
   %inc = add nsw i32 %k, 1
   %cmp2 = icmp slt i32 %inc, %size
   br i1 %cmp2, label %for.body2, label %for.body3
@@ -233,7 +233,7 @@ for.body3:
   %add5 = add nuw i32 %add, %l
   %idx.ext2 = zext i32 %add5 to i64
   %add.ptr2 = getelementptr inbounds i8, ptr %lined, i64 %idx.ext2
-  store i8 %tmp1, ptr %add.ptr2, align 1
+  store i8 %arg, ptr %add.ptr2, align 1
   %inc2 = add nsw i32 %l, 1
   %cmp3 = icmp slt i32 %inc2, %size
   br i1 %cmp3, label %for.body3, label %for.inc
diff --git a/llvm/test/Transforms/IndVarSimplify/elim-extend.ll b/llvm/test/Transforms/IndVarSimplify/elim-extend.ll
index 54bb9951ff66ab..01c95dadd16261 100644
--- a/llvm/test/Transforms/IndVarSimplify/elim-extend.ll
+++ b/llvm/test/Transforms/IndVarSimplify/elim-extend.ll
@@ -142,7 +142,7 @@ define void @nestedIV(ptr %address, i32 %limit) nounwind {
 ; CHECK-NEXT:    br i1 [[EXITCOND]], label [[INNERLOOP]], label [[INNEREXIT:%.*]]
 ; CHECK:       innerexit:
 ; CHECK-NEXT:    [[INNERCOUNT_LCSSA_WIDE:%.*]] = phi i64 [ [[INDVARS_IV_NEXT]], [[INNERLOOP]] ]
-; CHECK-NEXT:    [[TMP3:%.*]] = trunc i64 [[INNERCOUNT_LCSSA_WIDE]] to i32
+; CHECK-NEXT:    [[TMP3:%.*]] = trunc nsw i64 [[INNERCOUNT_LCSSA_WIDE]] to i32
 ; CHECK-NEXT:    br label [[OUTERMERGE]]
 ; CHECK:       outermerge:
 ; CHECK-NEXT:    [[INNERCOUNT_MERGE]] = phi i32 [ [[TMP3]], [[INNEREXIT]] ], [ [[INNERCOUNT]], [[INNERPREHEADER]] ]
diff --git a/llvm/test/Transforms/IndVarSimplify/hoist-wide-inc-for-narrow-use-recompute-flags.ll b/llvm/test/Transforms/IndVarSimplify/hoist-wide-inc-for-narrow-use-recompute-flags.ll
index cc99ee312ccb7f..1135ca9dbf00dc 100644
--- a/llvm/test/Transforms/IndVarSimplify/hoist-wide-inc-for-narrow-use-recompute-flags.ll
+++ b/llvm/test/Transforms/IndVarSimplify/hoist-wide-inc-for-narrow-use-recompute-flags.ll
@@ -15,7 +15,7 @@ define void @test_pr82243(ptr %f) {
 ; CHECK-NEXT:    [[GEP_IV_EXT:%.*]] = getelementptr i32, ptr [[F]], i64 [[INDVARS_IV]]
 ; CHECK-NEXT:    store i32 1, ptr [[GEP_IV_EXT]], align 4
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], -1
-; CHECK-NEXT:    [[TMP0:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
+; CHECK-NEXT:    [[TMP0:%.*]] = trunc nuw nsw i64 [[INDVARS_IV_NEXT]] to i32
 ; CHECK-NEXT:    [[SHL:%.*]] = shl i32 123, [[TMP0]]
 ; CHECK-NEXT:    [[GEP_SHL:%.*]] = getelementptr i32, ptr [[F]], i32 [[SHL]]
 ; CHECK-NEXT:    br label [[INNER_HEADER:%.*]]
diff --git a/llvm/test/Transforms/IndVarSimplify/iv-sext.ll b/llvm/test/Transforms/IndVarSimplify/iv-sext.ll
index 450913f16baa29..95a036f0e54c7c 100644
--- a/llvm/test/Transforms/IndVarSimplify/iv-sext.ll
+++ b/llvm/test/Transforms/IndVarSimplify/iv-sext.ll
@@ -99,7 +99,7 @@ define void @t(ptr %pval1, ptr %peakWeight, ptr %nrgReducePeakrate, i32 %bandEdg
 ; CHECK-NEXT:    [[VAL35_LCSSA:%.*]] = phi float [ [[VAL35]], [[BB5]] ]
 ; CHECK-NEXT:    [[VAL31_LCSSA_WIDE:%.*]] = phi i64 [ [[INDVARS_IV_NEXT]], [[BB5]] ]
 ; CHECK-NEXT:    [[VAL30_LCSSA:%.*]] = phi float [ [[VAL30]], [[BB5]] ]
-; CHECK-NEXT:    [[TMP4:%.*]] = trunc i64 [[VAL31_LCSSA_WIDE]] to i32
+; CHECK-NEXT:    [[TMP4:%.*]] = trunc nsw i64 [[VAL31_LCSSA_WIDE]] to i32
 ; CHECK-NEXT:    br label [[BB7]]
 ; CHECK:       bb7:
 ; CHECK-NEXT:    [[DISTERBHI_2_LCSSA]] = phi float [ [[VAL30_LCSSA]], [[BB5_BB7_CRIT_EDGE]] ], [ [[DISTERBHI_0_PH]], [[BB5_PREHEADER]] ]
diff --git a/llvm/test/Transforms/IndVarSimplify/iv-widen-elim-ext.ll b/llvm/test/Transforms/IndVarSimplify/iv-widen-elim-ext.ll
index 59a0241bfe9fde..a83e9ce74b12ab 100644
--- a/llvm/test/Transforms/IndVarSimplify/iv-widen-elim-ext.ll
+++ b/llvm/test/Transforms/IndVarSimplify/iv-widen-elim-ext.ll
@@ -22,7 +22,7 @@ define void @foo(ptr %A, ptr %B, ptr %C, i32 %N) {
 ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr [[C:%.*]], i64 [[TMP1]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[ARRAYIDX2]], align 4
 ; CHECK-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP0]], [[TMP2]]
-; CHECK-NEXT:    [[TMP3:%.*]] = trunc i64 [[TMP1]] to i32
+; CHECK-NEXT:    [[TMP3:%.*]] = trunc nuw nsw i64 [[TMP1]] to i32
 ; CHECK-NEXT:    [[DIV0:%.*]] = udiv i32 5, [[TMP3]]
 ; CHECK-NEXT:    [[ADD4:%.*]] = add nsw i32 [[ADD3]], [[DIV0]]
 ; CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[INDVARS_IV]]
@@ -224,7 +224,7 @@ define i32 @foo3(i32 %M) {
 ; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[ARRAYIDX2]], align 4
 ; CHECK-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP1]], [[TMP2]]
 ; CHECK-NEXT:    [[TMP3:%.*]] = add nsw i64 [[INDVARS_IV]], [[TMP0]]
-; CHECK-NEXT:    [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; CHECK-NEXT:    [[TMP4:%.*]] = trunc nsw i64 [[TMP3]] to i32
 ; CHECK-NEXT:    [[IDXPROM4:%.*]] = zext i32 [[TMP4]] to i64
 ; CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds [100 x i32], ptr @a, i64 0, i64 [[IDXPROM4]]
 ; CHECK-NEXT:    store i32 [[ADD]], ptr [[ARRAYIDX5]], align 4
@@ -365,7 +365,7 @@ define i32 @foo5(ptr %input, i32 %length, ptr %in) {
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ 1, [[FOR_BODY_LR_PH]] ]
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[INPUT]], align 8
-; CHECK-NEXT:    [[TMP5:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
+; CHECK-NEXT:    [[TMP5:%.*]] = trunc nuw nsw i64 [[INDVARS_IV_NEXT]] to i32
 ; CHECK-NEXT:    [[MUL:%.*]] = mul nsw i32 [[TMP4]], [[TMP5]]
 ; CHECK-NEXT:    [[IDX_EXT:%.*]] = sext i32 [[MUL]] to i64
 ; CHECK-NEXT:    [[ADD_PTR:%.*]] = getelementptr inbounds i32, ptr [[IN:%.*]], i64 [[IDX_EXT]]
@@ -514,7 +514,7 @@ define void @foo7(i32 %n, ptr %a, i32 %x) {
 ; CHECK-NEXT:    [[TMP2:%.*]] = shl nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT:    [[TMP3:%.*]] = or disjoint i64 [[TMP2]], 1
 ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[TMP3]]
-; CHECK-NEXT:    [[TMP4:%.*]] = trunc i64 [[INDVARS_IV]] to i32
+; CHECK-NEXT:    [[TMP4:%.*]] = trunc nsw i64 [[INDVARS_IV]] to i32
 ; CHECK-NEXT:    store i32 [[TMP4]], ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], [[TMP0]]
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT]], [[TMP1]]
diff --git a/llvm/test/Transforms/IndVarSimplify/lftr.ll b/llvm/test/Transforms/IndVarSimplify/lftr.ll
index 41db925de577ea..7f4820f093e55e 100644
--- a/llvm/test/Transforms/IndVarSimplify/lftr.ll
+++ b/llvm/test/Transforms/IndVarSimplify/lftr.ll
@@ -525,7 +525,7 @@ define float @wide_trip_count_test3(ptr %b,
 ; CHECK-NEXT:    [[TMP0:%.*]] = add nsw i64 [[INDVARS_IV]], 20
 ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 [[TMP0]]
 ; CHECK-NEXT:    [[TEMP:%.*]] = load float, ptr [[ARRAYIDX]], align 4
-; CHECK-NEXT:    [[TMP1:%.*]] = trunc i64 [[INDVARS_IV]] to i32
+; CHECK-NEXT:    [[TMP1:%.*]] = trunc nsw i64 [[INDVARS_IV]] to i32
 ; CHECK-NEXT:    [[CONV:%.*]] = sitofp i32 [[TMP1]] to float
 ; CHECK-NEXT:    [[MUL:%.*]] = fmul float [[CONV]], [[TEMP]]
 ; CHECK-NEXT:    [[ADD1]] = fadd float [[SUM_07]], [[MUL]]
@@ -584,7 +584,7 @@ define float @wide_trip_count_test4(ptr %b,
 ; CHECK-NEXT:    [[TMP0:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 20
 ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 [[TMP0]]
 ; CHECK-NEXT:    [[TEMP:%.*]] = load float, ptr [[ARRAYIDX]], align 4
-; CHECK-NEXT:    [[TMP1:%.*]] = trunc i64 [[INDVARS_IV]] to i32
+; CHECK-NEXT:    [[TMP1:%.*]] = trunc nuw nsw i64 [[INDVARS_IV]] to i32
 ; CHECK-NEXT:    [[CONV:%.*]] = sitofp i32 [[TMP1]] to float
 ;...
[truncated]

preames (Collaborator) left a comment

LGTM

One of the thoughts I'd had when seeing this langref proposal was that it might allow us to simplify this code significantly. I believe one of the things which makes this code complex is that we have to "commit" to our signed or unsigned interpretation by performing all rewrites at once since we couldn't materialize a wide IV + truncate and preserve the information about the choice. With this IR change, we now have that possibility. This might not fully work out - there's other reasons for complexity in this code - but mentioning it in case it inspires you. :)

nikic merged commit 4b22a92 into llvm:main on Apr 16, 2024
nikic deleted the indvars-trunc-nw branch on April 16, 2024 at 01:42
nikic (Contributor, Author) commented Apr 16, 2024

> One of the thoughts I'd had when seeing this langref proposal was that it might allow us to simplify this code significantly. I believe one of the things which makes this code complex is that we have to "commit" to our signed or unsigned interpretation by performing all rewrites at once since we couldn't materialize a wide IV + truncate and preserve the information about the choice. With this IR change, we now have that possibility. This might not fully work out - there's other reasons for complexity in this code - but mentioning it in case it inspires you. :)

Right. What IV widening conceptually does is to zext/sext the IV and then add trunc nuw/nsw at all uses -- and then implement a whole bunch of combines to get rid of that trunc again. That last part is no longer strictly necessary, but I suspect that we do need it in practice for phase ordering reasons: We probably can't leave these truncs around until the next InstCombine run that eliminates them, especially considering interleaved loop passes.
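
As a minimal, hand-written illustration of that conceptual model (not output from the pass; names are made up), this is the intermediate form with the flagged trunc still in place:

```llvm
; The original i32 IV has been widened to i64; its remaining narrow use (a
; sign-extended GEP index) goes through a flagged trunc. Because the trunc is
; nsw, the sext round-trips losslessly, so "sext (trunc nsw %iv)" can later be
; folded back to plain %iv, which is the kind of fold the in-pass combines
; perform today.
define void @widened_with_trunc(ptr %a, i32 %n) {
entry:
  %wide.n = sext i32 %n to i64
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %narrow = trunc nsw i64 %iv to i32   ; trunc inserted at the narrow use
  %idx = sext i32 %narrow to i64       ; foldable: this is just %iv again
  %gep = getelementptr i32, ptr %a, i64 %idx
  store i32 0, ptr %gep
  %iv.next = add nsw i64 %iv, 1
  %cmp = icmp slt i64 %iv.next, %wide.n
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}
```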

preames (Collaborator) commented Apr 16, 2024

> One of the thoughts I'd had when seeing this langref proposal was that it might allow us to simplify this code significantly. I believe one of the things which makes this code complex is that we have to "commit" to our signed or unsigned interpretation by performing all rewrites at once since we couldn't materialize a wide IV + truncate and preserve the information about the choice. With this IR change, we now have that possibility. This might not fully work out - there's other reasons for complexity in this code - but mentioning it in case it inspires you. :)
>
> Right. What IV widening conceptually does is to zext/sext the IV and then add trunc nuw/nsw at all uses -- and then implement a whole bunch of combines to get rid of that trunc again. That last part is no longer strictly necessary, but I suspect that we do need it in practice for phase ordering reasons: We probably can't leave these truncs around until the next InstCombine run that eliminates them, especially considering interleaved loop passes.

Oh, I hadn't meant anything that radical. If nothing else, we have a bunch of SCEV based reasoning in the widening code which InstCombine can't handle.

I'd simply meant that we could separate widening into two "passes" inside IndVarSimplify. Step 1 - Decide how to widen, insert the trunc nsw/nuw in the trivial position. Step 2 - Eliminate the trunc if possible.

One advantage of this approach is that it gives us a lot more confidence that the elimination code is correct, because we can write specific tests which don't involve the widening strategy heuristics.
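
For a sense of what such focused tests could look like (hypothetical IR, not from this PR), the elimination step only has to recognize that a flagged trunc followed by the matching extension is the identity:

```llvm
; zext (trunc nuw X) == X: nuw promises no set high bits were dropped.
define i64 @elim_zext(i64 %x) {
  %t = trunc nuw i64 %x to i32
  %e = zext i32 %t to i64
  ret i64 %e            ; simplifies to: ret i64 %x
}

; sext (trunc nsw X) == X: nsw promises the value already fit i32's signed
; range, so sign-extending the truncated value reproduces X.
define i64 @elim_sext(i64 %x) {
  %t = trunc nsw i64 %x to i32
  %e = sext i32 %t to i64
  ret i64 %e            ; simplifies to: ret i64 %x
}
```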
