[InstCombine] Make indexed compare fold GEP source type independent #71663

nikic · 2023-11-08T11:39:12Z

The indexed compare fold converts comparisons of GEPs with the same (indirect) base into comparisons of offsets. Currently, it only supports GEPs with the same source element type.

This change makes the transform operate on offsets instead, which removes the type dependence. To keep closer to the scope of the original implementation, this keeps the limitation that we should only have at most one variable index per GEP.
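
To illustrate why working on byte offsets removes the type dependence, here is a minimal standalone sketch (plain C++, no LLVM APIs). It assumes both pointers are in-bounds offsets from one base, which is what the inbounds requirement in the fold guarantees. All function names are hypothetical; the constants mirror the `indexed_compare_different_types` test, where an `i64` GEP with index 100 is compared against an `i32` GEP with a variable index.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of a single-index GEP: index scaled by the element size.
constexpr std::int64_t byteOffset(std::int64_t index, std::int64_t elemSize) {
  return index * elemSize;
}

// "lhs < rhs" evaluated on real addresses derived from the same base.
bool lessByPointer(const unsigned char *base, std::int64_t lhsOff,
                   std::int64_t rhsOff) {
  return (base + lhsOff) < (base + rhsOff);
}

// The same comparison evaluated on byte offsets only (signed compare).
constexpr bool lessByOffset(std::int64_t lhsOff, std::int64_t rhsOff) {
  return lhsOff < rhsOff;
}

// Checks that both forms agree for every in-bounds i32 index.
bool foldAgreesWithPointerCompare() {
  static const unsigned char buffer[1024] = {};
  const std::int64_t lhs = byteOffset(100, 8); // like GEP i64, base, 100
  for (std::int64_t i = 0; i < 256; ++i) {
    const std::int64_t rhs = byteOffset(i, 4); // like GEP i32, base, i
    assert(rhs < 1024);                        // stay inside the buffer
    if (lessByPointer(buffer, lhs, rhs) != lessByOffset(lhs, rhs))
      return false;
  }
  return true;
}
```

Once both sides are expressed in bytes, the element types (`i64` vs. `i32` here) no longer matter; only the scaled offsets are compared.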

This addresses the main regression from #68882.

TBH I have some doubts that this is really a useful transform (at least for the case where there are extra pointer users, so we have to rematerialize pointers at some point). I can only assume it exists for a reason...

llvmbot · 2023-11-08T11:39:49Z

@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Full diff: https://github.com/llvm/llvm-project/pull/71663.diff

3 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+36-59)
(modified) llvm/test/Transforms/InstCombine/indexed-gep-compares.ll (+25-20)
(modified) llvm/test/Transforms/InstCombine/opaque-ptr.ll (+11-10)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 7c2ad92f919a3cc..c4b8ed7be3f97fe 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -18,6 +18,7 @@
 #include "llvm/Analysis/CmpInstAnalysis.h"
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/Analysis/InstructionSimplify.h"
+#include "llvm/Analysis/Utils/Local.h"
 #include "llvm/Analysis/VectorUtils.h"
 #include "llvm/IR/ConstantRange.h"
 #include "llvm/IR/DataLayout.h"
@@ -413,11 +414,12 @@ Instruction *InstCombinerImpl::foldCmpLoadFromIndexedGlobal(
 /// Returns true if we can rewrite Start as a GEP with pointer Base
 /// and some integer offset. The nodes that need to be re-written
 /// for this transformation will be added to Explored.
-static bool canRewriteGEPAsOffset(Type *ElemTy, Value *Start, Value *Base,
+static bool canRewriteGEPAsOffset(Value *Start, Value *Base,
                                   const DataLayout &DL,
                                   SetVector<Value *> &Explored) {
   SmallVector<Value *, 16> WorkList(1, Start);
   Explored.insert(Base);
+  uint64_t IndexSize = DL.getIndexTypeSizeInBits(Start->getType());
 
   // The following traversal gives us an order which can be used
   // when doing the final transformation. Since in the final
@@ -447,11 +449,11 @@ static bool canRewriteGEPAsOffset(Type *ElemTy, Value *Start, Value *Base,
         return false;
 
       if (auto *GEP = dyn_cast<GEPOperator>(V)) {
-        // We're limiting the GEP to having one index. This will preserve
-        // the original pointer type. We could handle more cases in the
-        // future.
-        if (GEP->getNumIndices() != 1 || !GEP->isInBounds() ||
-            GEP->getSourceElementType() != ElemTy)
+        // Only allow GEPs with at most one variable offset.
+        APInt Offset(IndexSize, 0);
+        MapVector<Value *, APInt> VarOffsets;
+        if (!GEP->collectOffset(DL, IndexSize, VarOffsets, Offset) ||
+            VarOffsets.size() > 1)
           return false;
 
         if (!Explored.contains(GEP->getOperand(0)))
@@ -528,7 +530,7 @@ static void setInsertionPoint(IRBuilder<> &Builder, Value *V,
 
 /// Returns a re-written value of Start as an indexed GEP using Base as a
 /// pointer.
-static Value *rewriteGEPAsOffset(Type *ElemTy, Value *Start, Value *Base,
+static Value *rewriteGEPAsOffset(Value *Start, Value *Base,
                                  const DataLayout &DL,
                                  SetVector<Value *> &Explored,
                                  InstCombiner &IC) {
@@ -539,8 +541,8 @@ static Value *rewriteGEPAsOffset(Type *ElemTy, Value *Start, Value *Base,
   // 3. Add the edges for the PHI nodes.
   // 4. Emit GEPs to get the original pointers.
   // 5. Remove the original instructions.
-  Type *IndexType = IntegerType::get(
-      Base->getContext(), DL.getIndexTypeSizeInBits(Start->getType()));
+  uint64_t IndexSize = DL.getIndexTypeSizeInBits(Start->getType());
+  Type *IndexType = IntegerType::get(Base->getContext(), IndexSize);
 
   DenseMap<Value *, Value *> NewInsts;
   NewInsts[Base] = ConstantInt::getNullValue(IndexType);
@@ -559,29 +561,22 @@ static Value *rewriteGEPAsOffset(Type *ElemTy, Value *Start, Value *Base,
 
   // Create all the other instructions.
   for (Value *Val : Explored) {
-
     if (NewInsts.contains(Val))
       continue;
 
     if (auto *GEP = dyn_cast<GEPOperator>(Val)) {
-      Value *Index = NewInsts[GEP->getOperand(1)] ? NewInsts[GEP->getOperand(1)]
-                                                  : GEP->getOperand(1);
-      setInsertionPoint(Builder, GEP);
-      // Indices might need to be sign extended. GEPs will magically do
-      // this, but we need to do it ourselves here.
-      if (Index->getType()->getScalarSizeInBits() !=
-          NewInsts[GEP->getOperand(0)]->getType()->getScalarSizeInBits()) {
-        Index = Builder.CreateSExtOrTrunc(
-            Index, NewInsts[GEP->getOperand(0)]->getType(),
-            GEP->getOperand(0)->getName() + ".sext");
-      }
+      APInt Offset(IndexSize, 0);
+      MapVector<Value *, APInt> VarOffsets;
+      GEP->collectOffset(DL, IndexSize, VarOffsets, Offset);
 
-      auto *Op = NewInsts[GEP->getOperand(0)];
+      setInsertionPoint(Builder, GEP);
+      Value *Op = NewInsts[GEP->getOperand(0)];
+      Value *OffsetV = emitGEPOffset(&Builder, DL, GEP);
       if (isa<ConstantInt>(Op) && cast<ConstantInt>(Op)->isZero())
-        NewInsts[GEP] = Index;
+        NewInsts[GEP] = OffsetV;
       else
         NewInsts[GEP] = Builder.CreateNSWAdd(
-            Op, Index, GEP->getOperand(0)->getName() + ".add");
+            Op, OffsetV, GEP->getOperand(0)->getName() + ".add");
       continue;
     }
     if (isa<PHINode>(Val))
@@ -609,23 +604,14 @@ static Value *rewriteGEPAsOffset(Type *ElemTy, Value *Start, Value *Base,
     }
   }
 
-  PointerType *PtrTy = PointerType::get(
-      Base->getContext(), Start->getType()->getPointerAddressSpace());
   for (Value *Val : Explored) {
     if (Val == Base)
       continue;
 
-    // Depending on the type, for external users we have to emit
-    // a GEP or a GEP + ptrtoint.
     setInsertionPoint(Builder, Val, false);
-
-    // Cast base to the expected type.
-    Value *NewVal = Builder.CreateBitOrPointerCast(
-        Base, PtrTy, Start->getName() + "to.ptr");
-    NewVal = Builder.CreateInBoundsGEP(ElemTy, NewVal, ArrayRef(NewInsts[Val]),
-                                       Val->getName() + ".ptr");
-    NewVal = Builder.CreateBitOrPointerCast(
-        NewVal, Val->getType(), Val->getName() + ".conv");
+    // Create GEP for external users.
+    Value *NewVal = Builder.CreateInBoundsGEP(
+        Builder.getInt8Ty(), Base, NewInsts[Val], Val->getName() + ".ptr");
     IC.replaceInstUsesWith(*cast<Instruction>(Val), NewVal);
     // Add old instruction to worklist for DCE. We don't directly remove it
     // here because the original compare is one of the users.
@@ -637,28 +623,18 @@ static Value *rewriteGEPAsOffset(Type *ElemTy, Value *Start, Value *Base,
 
 /// Looks through GEPs in order to express the input Value as a constant
 /// indexed GEP. Returns a pair containing the GEPs Pointer and Index.
-static std::pair<Value *, Value *>
-getAsConstantIndexedAddress(Type *ElemTy, Value *V, const DataLayout &DL) {
-  Type *IndexType = IntegerType::get(V->getContext(),
-                                     DL.getIndexTypeSizeInBits(V->getType()));
-
-  Constant *Index = ConstantInt::getNullValue(IndexType);
+static std::pair<Value *, APInt>
+getAsConstantIndexedAddress(Value *V, const DataLayout &DL) {
+  APInt Offset = APInt(DL.getIndexTypeSizeInBits(V->getType()), 0);
   while (GEPOperator *GEP = dyn_cast<GEPOperator>(V)) {
     // We accept only inbouds GEPs here to exclude the possibility of
     // overflow.
-    if (!GEP->isInBounds())
+    if (!GEP->isInBounds() || !GEP->accumulateConstantOffset(DL, Offset))
       break;
-    if (GEP->hasAllConstantIndices() && GEP->getNumIndices() == 1 &&
-        GEP->getSourceElementType() == ElemTy &&
-        GEP->getOperand(1)->getType() == IndexType) {
-      V = GEP->getOperand(0);
-      Constant *GEPIndex = static_cast<Constant *>(GEP->getOperand(1));
-      Index = ConstantExpr::getAdd(Index, GEPIndex);
-      continue;
-    }
-    break;
+
+    V = GEP->getPointerOperand();
   }
-  return {V, Index};
+  return {V, Offset};
 }
 
 /// Converts (CMP GEPLHS, RHS) if this change would make RHS a constant.
@@ -675,14 +651,14 @@ static Instruction *transformToIndexedCompare(GEPOperator *GEPLHS, Value *RHS,
   if (!GEPLHS->hasAllConstantIndices())
     return nullptr;
 
-  Type *ElemTy = GEPLHS->getSourceElementType();
-  Value *PtrBase, *Index;
-  std::tie(PtrBase, Index) = getAsConstantIndexedAddress(ElemTy, GEPLHS, DL);
+  Value *PtrBase;
+  APInt Offset;
+  std::tie(PtrBase, Offset) = getAsConstantIndexedAddress(GEPLHS, DL);
 
   // The set of nodes that will take part in this transformation.
   SetVector<Value *> Nodes;
 
-  if (!canRewriteGEPAsOffset(ElemTy, RHS, PtrBase, DL, Nodes))
+  if (!canRewriteGEPAsOffset(RHS, PtrBase, DL, Nodes))
     return nullptr;
 
   // We know we can re-write this as
@@ -691,13 +667,14 @@ static Instruction *transformToIndexedCompare(GEPOperator *GEPLHS, Value *RHS,
   // can't have overflow on either side. We can therefore re-write
   // this as:
   //   OFFSET1 cmp OFFSET2
-  Value *NewRHS = rewriteGEPAsOffset(ElemTy, RHS, PtrBase, DL, Nodes, IC);
+  Value *NewRHS = rewriteGEPAsOffset(RHS, PtrBase, DL, Nodes, IC);
 
   // RewriteGEPAsOffset has replaced RHS and all of its uses with a re-written
   // GEP having PtrBase as the pointer base, and has returned in NewRHS the
   // offset. Since Index is the offset of LHS to the base pointer, we will now
   // compare the offsets instead of comparing the pointers.
-  return new ICmpInst(ICmpInst::getSignedPredicate(Cond), Index, NewRHS);
+  return new ICmpInst(ICmpInst::getSignedPredicate(Cond),
+                      IC.Builder.getInt(Offset), NewRHS);
 }
 
 /// Fold comparisons between a GEP instruction and something else. At this point
diff --git a/llvm/test/Transforms/InstCombine/indexed-gep-compares.ll b/llvm/test/Transforms/InstCombine/indexed-gep-compares.ll
index 4681fdc59a99693..e34fd032b17d98b 100644
--- a/llvm/test/Transforms/InstCombine/indexed-gep-compares.ll
+++ b/llvm/test/Transforms/InstCombine/indexed-gep-compares.ll
@@ -6,14 +6,15 @@ target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:32-f3
 define ptr@test1(ptr %A, i32 %Offset) {
 ; CHECK-LABEL: @test1(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP_IDX:%.*]] = shl nsw i32 [[OFFSET:%.*]], 2
 ; CHECK-NEXT:    br label [[BB:%.*]]
 ; CHECK:       bb:
-; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i32 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[OFFSET:%.*]], [[ENTRY:%.*]] ]
-; CHECK-NEXT:    [[RHS_ADD]] = add nsw i32 [[RHS_IDX]], 1
-; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i32 [[RHS_IDX]], 100
+; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i32 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[TMP_IDX]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[RHS_ADD]] = add nsw i32 [[RHS_IDX]], 4
+; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i32 [[RHS_IDX]], 400
 ; CHECK-NEXT:    br i1 [[COND]], label [[BB2:%.*]], label [[BB]]
 ; CHECK:       bb2:
-; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i32 [[RHS_IDX]]
+; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i8, ptr [[A:%.*]], i32 [[RHS_IDX]]
 ; CHECK-NEXT:    ret ptr [[RHS_PTR]]
 ;
 entry:
@@ -34,15 +35,16 @@ bb2:
 define ptr@test2(i32 %A, i32 %Offset) {
 ; CHECK-LABEL: @test2(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP_IDX:%.*]] = shl nsw i32 [[OFFSET:%.*]], 2
 ; CHECK-NEXT:    br label [[BB:%.*]]
 ; CHECK:       bb:
-; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i32 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[OFFSET:%.*]], [[ENTRY:%.*]] ]
-; CHECK-NEXT:    [[RHS_ADD]] = add nsw i32 [[RHS_IDX]], 1
-; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i32 [[RHS_IDX]], 100
+; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i32 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[TMP_IDX]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[RHS_ADD]] = add nsw i32 [[RHS_IDX]], 4
+; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i32 [[RHS_IDX]], 400
 ; CHECK-NEXT:    br i1 [[COND]], label [[BB2:%.*]], label [[BB]]
 ; CHECK:       bb2:
 ; CHECK-NEXT:    [[A_PTR:%.*]] = inttoptr i32 [[A:%.*]] to ptr
-; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i32, ptr [[A_PTR]], i32 [[RHS_IDX]]
+; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i8, ptr [[A_PTR]], i32 [[RHS_IDX]]
 ; CHECK-NEXT:    ret ptr [[RHS_PTR]]
 ;
 entry:
@@ -99,16 +101,17 @@ bb2:
 define ptr@test4(i16 %A, i32 %Offset) {
 ; CHECK-LABEL: @test4(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP_IDX:%.*]] = shl nsw i32 [[OFFSET:%.*]], 2
 ; CHECK-NEXT:    br label [[BB:%.*]]
 ; CHECK:       bb:
-; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i32 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[OFFSET:%.*]], [[ENTRY:%.*]] ]
-; CHECK-NEXT:    [[RHS_ADD]] = add nsw i32 [[RHS_IDX]], 1
-; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i32 [[RHS_IDX]], 100
+; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i32 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[TMP_IDX]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[RHS_ADD]] = add nsw i32 [[RHS_IDX]], 4
+; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i32 [[RHS_IDX]], 400
 ; CHECK-NEXT:    br i1 [[COND]], label [[BB2:%.*]], label [[BB]]
 ; CHECK:       bb2:
 ; CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[A:%.*]] to i32
 ; CHECK-NEXT:    [[A_PTR:%.*]] = inttoptr i32 [[TMP0]] to ptr
-; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i32, ptr [[A_PTR]], i32 [[RHS_IDX]]
+; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i8, ptr [[A_PTR]], i32 [[RHS_IDX]]
 ; CHECK-NEXT:    ret ptr [[RHS_PTR]]
 ;
 entry:
@@ -137,14 +140,15 @@ define ptr@test5(i32 %Offset) personality ptr @__gxx_personality_v0 {
 ; CHECK-NEXT:    [[A:%.*]] = invoke ptr @fun_ptr()
 ; CHECK-NEXT:            to label [[CONT:%.*]] unwind label [[LPAD:%.*]]
 ; CHECK:       cont:
+; CHECK-NEXT:    [[TMP_IDX:%.*]] = shl nsw i32 [[OFFSET:%.*]], 2
 ; CHECK-NEXT:    br label [[BB:%.*]]
 ; CHECK:       bb:
-; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i32 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[OFFSET:%.*]], [[CONT]] ]
-; CHECK-NEXT:    [[RHS_ADD]] = add nsw i32 [[RHS_IDX]], 1
-; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i32 [[RHS_IDX]], 100
+; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i32 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[TMP_IDX]], [[CONT]] ]
+; CHECK-NEXT:    [[RHS_ADD]] = add nsw i32 [[RHS_IDX]], 4
+; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i32 [[RHS_IDX]], 400
 ; CHECK-NEXT:    br i1 [[COND]], label [[BB2:%.*]], label [[BB]]
 ; CHECK:       bb2:
-; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[RHS_IDX]]
+; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i8, ptr [[A]], i32 [[RHS_IDX]]
 ; CHECK-NEXT:    ret ptr [[RHS_PTR]]
 ; CHECK:       lpad:
 ; CHECK-NEXT:    [[L:%.*]] = landingpad { ptr, i32 }
@@ -181,15 +185,16 @@ define ptr@test6(i32 %Offset) personality ptr @__gxx_personality_v0 {
 ; CHECK-NEXT:    [[A:%.*]] = invoke i32 @fun_i32()
 ; CHECK-NEXT:            to label [[CONT:%.*]] unwind label [[LPAD:%.*]]
 ; CHECK:       cont:
+; CHECK-NEXT:    [[TMP_IDX:%.*]] = shl nsw i32 [[OFFSET:%.*]], 2
 ; CHECK-NEXT:    br label [[BB:%.*]]
 ; CHECK:       bb:
-; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i32 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[OFFSET:%.*]], [[CONT]] ]
-; CHECK-NEXT:    [[RHS_ADD]] = add nsw i32 [[RHS_IDX]], 1
-; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i32 [[RHS_IDX]], 100
+; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i32 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[TMP_IDX]], [[CONT]] ]
+; CHECK-NEXT:    [[RHS_ADD]] = add nsw i32 [[RHS_IDX]], 4
+; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i32 [[RHS_IDX]], 400
 ; CHECK-NEXT:    br i1 [[COND]], label [[BB2:%.*]], label [[BB]]
 ; CHECK:       bb2:
 ; CHECK-NEXT:    [[A_PTR:%.*]] = inttoptr i32 [[A]] to ptr
-; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i32, ptr [[A_PTR]], i32 [[RHS_IDX]]
+; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i8, ptr [[A_PTR]], i32 [[RHS_IDX]]
 ; CHECK-NEXT:    ret ptr [[RHS_PTR]]
 ; CHECK:       lpad:
 ; CHECK-NEXT:    [[L:%.*]] = landingpad { ptr, i32 }
diff --git a/llvm/test/Transforms/InstCombine/opaque-ptr.ll b/llvm/test/Transforms/InstCombine/opaque-ptr.ll
index 900d3f142a6ff79..4448a49ad92bf5d 100644
--- a/llvm/test/Transforms/InstCombine/opaque-ptr.ll
+++ b/llvm/test/Transforms/InstCombine/opaque-ptr.ll
@@ -382,14 +382,15 @@ define <4 x i1> @compare_geps_same_indices_scalar_vector_base_mismatch(ptr %ptr,
 define ptr @indexed_compare(ptr %A, i64 %offset) {
 ; CHECK-LABEL: @indexed_compare(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP_IDX:%.*]] = shl nsw i64 [[OFFSET:%.*]], 2
 ; CHECK-NEXT:    br label [[BB:%.*]]
 ; CHECK:       bb:
-; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i64 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[OFFSET:%.*]], [[ENTRY:%.*]] ]
-; CHECK-NEXT:    [[RHS_ADD]] = add nsw i64 [[RHS_IDX]], 1
-; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i64 [[RHS_IDX]], 100
+; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i64 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[TMP_IDX]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[RHS_ADD]] = add nsw i64 [[RHS_IDX]], 4
+; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i64 [[RHS_IDX]], 400
 ; CHECK-NEXT:    br i1 [[COND]], label [[BB2:%.*]], label [[BB]]
 ; CHECK:       bb2:
-; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[RHS_IDX]]
+; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i8, ptr [[A:%.*]], i64 [[RHS_IDX]]
 ; CHECK-NEXT:    ret ptr [[RHS_PTR]]
 ;
 entry:
@@ -410,16 +411,16 @@ bb2:
 define ptr @indexed_compare_different_types(ptr %A, i64 %offset) {
 ; CHECK-LABEL: @indexed_compare_different_types(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[TMP:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[OFFSET:%.*]]
+; CHECK-NEXT:    [[TMP_IDX:%.*]] = shl nsw i64 [[OFFSET:%.*]], 2
 ; CHECK-NEXT:    br label [[BB:%.*]]
 ; CHECK:       bb:
-; CHECK-NEXT:    [[RHS:%.*]] = phi ptr [ [[RHS_NEXT:%.*]], [[BB]] ], [ [[TMP]], [[ENTRY:%.*]] ]
-; CHECK-NEXT:    [[LHS:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 100
-; CHECK-NEXT:    [[RHS_NEXT]] = getelementptr inbounds i32, ptr [[RHS]], i64 1
-; CHECK-NEXT:    [[COND:%.*]] = icmp ult ptr [[LHS]], [[RHS]]
+; CHECK-NEXT:    [[RHS_IDX:%.*]] = phi i64 [ [[RHS_ADD:%.*]], [[BB]] ], [ [[TMP_IDX]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[RHS_ADD]] = add nsw i64 [[RHS_IDX]], 4
+; CHECK-NEXT:    [[COND:%.*]] = icmp sgt i64 [[RHS_IDX]], 800
 ; CHECK-NEXT:    br i1 [[COND]], label [[BB2:%.*]], label [[BB]]
 ; CHECK:       bb2:
-; CHECK-NEXT:    ret ptr [[RHS]]
+; CHECK-NEXT:    [[RHS_PTR:%.*]] = getelementptr inbounds i8, ptr [[A:%.*]], i64 [[RHS_IDX]]
+; CHECK-NEXT:    ret ptr [[RHS_PTR]]
 ;
 entry:
   %tmp = getelementptr inbounds i32, ptr %A, i64 %offset
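
The updated CHECK lines in `test1` through `test6` all follow one pattern: the start value is scaled by 4 (`shl nsw ..., 2`), the step changes from 1 to 4, and the bound changes from 100 to 400. The following standalone C++ sketch models that loop in both units to show they exit at equivalent points. It is only a model of the expected IR, not code from the patch, and the function names are hypothetical.

```cpp
#include <cstdint>

// Old form: the induction variable counts i32 elements.
// Returns the value of the index when the loop exits.
std::int64_t exitIndexInElements(std::int64_t start) {
  std::int64_t idx = start;
  while (!(idx > 100)) // icmp sgt idx, 100
    idx += 1;          // add nsw idx, 1
  return idx;
}

// New form: the induction variable counts bytes.
// Returns the value of the byte offset when the loop exits.
std::int64_t exitOffsetInBytes(std::int64_t start) {
  std::int64_t off = start * 4; // shl nsw start, 2
  while (!(off > 400))          // icmp sgt off, 400
    off += 4;                   // add nsw off, 4
  return off;
}
```

For any start value that does not overflow, the byte form exits with exactly four times the element form, which is why the final pointer can be rebuilt as `getelementptr inbounds i8` with the byte offset.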

nikic · 2023-11-21T08:28:45Z

ping

goldsteinn · 2023-11-21T18:13:33Z

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp

-
-      auto *Op = NewInsts[GEP->getOperand(0)];
+      Value *Op = NewInsts[GEP->getOperand(0)];
+      Value *OffsetV = emitGEPOffset(&Builder, DL, GEP);
      if (isa<ConstantInt>(Op) && cast<ConstantInt>(Op)->isZero())


nit: match(Op, m_Zero())

goldsteinn · 2023-11-21T18:30:50Z

LGTM.

That being said, I'm not particularly familiar with these functions, so could you wait a day before pushing to give others time to review?

dtcxzyw · 2023-11-21T18:57:00Z

I saw performance regressions with this patch. Please give me some time to check the artifacts.
PR Link: plctlab/llvm-ci#784
Artifacts: https://github.com/dtcxzyw/llvm-ci/actions/runs/6943611865

dtcxzyw · 2023-11-27T02:00:33Z

I saw performance regressions with this patch. Please give me some time to check the artifacts. PR Link: dtcxzyw/llvm-ci#784 Artifacts: https://github.com/dtcxzyw/llvm-ci/actions/runs/6943611865

In MultiSource/Benchmarks/MallocBench/cfrac/cfrac:

Files ./artifacts/binaries/1e915a03d89253f1f99ca32f670d4058_bc/seg11.ll and ./artifacts/binaries/fe454fa883b692ec582f94c980b07f2b_bc/seg11.ll differ
function @llvm.memset.p0.i64 exists only in right module
function @llvm.usub.sat.i64 exists only in right module
in function pmul:
  in block %191 / %191:
    >   %192 = tail call ptr @errorp(i32 noundef signext 1, ptr noundef nonnull @.str.llvm.13542161874275324736, ptr noundef nonnull @.str.1.llvm.13542161874275324736) #6
    <   %192 = tail call ptr @errorp(i32 noundef signext 1, ptr noundef nonnull @.str.llvm.2182144003073642098, ptr noundef nonnull @.str.1.llvm.2182144003073642098) #4
  in block %219 / %219:
    >   %230 = getelementptr inbounds %struct.precisionType.0, ptr %202, i64 0, i32 4
    <   %230 = getelementptr inbounds %struct.precisionType, ptr %215, i64 0, i32 4
        %231 = zext i16 %205 to i64
    >   %232 = shl nuw nsw i64 %231, 1
    >   %233 = add nuw nsw i64 %232, 6
    >   %234 = tail call i64 @llvm.usub.sat.i64(i64 %232, i64 2)
    >   %235 = sub nsw i64 %233, %234
    >   %236 = getelementptr i8, ptr %215, i64 %235
    >   %237 = add nuw nsw i64 %234, 2
    >   tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 2 dereferenceable(1) %236, i8 0, i64 %237, i1 false), !tbaa !7
    >   %238 = getelementptr inbounds %struct.precisionType.0, ptr %203, i64 0, i32 4
    >   %239 = getelementptr inbounds i8, ptr %215, i64 8
    >   %240 = getelementptr inbounds i16, ptr %238, i64 %231
    >   %241 = ptrtoint ptr %230 to i64
    >   %242 = zext i16 %208 to i64
    >   %243 = getelementptr inbounds i16, ptr %230, i64 %242
    <   %232 = getelementptr inbounds i16, ptr %230, i64 %231
  in block %217 / %217:
    >   %218 = tail call ptr @errorp(i32 noundef signext 1, ptr noundef nonnull @.str.llvm.13542161874275324736, ptr noundef nonnull @.str.1.llvm.13542161874275324736) #6
    <   %218 = tail call ptr @errorp(i32 noundef signext 1, ptr noundef nonnull @.str.llvm.2182144003073642098, ptr noundef nonnull @.str.1.llvm.2182144003073642098) #4
  in block %387 / %366:
    >   %367 = icmp ult i32 %360, 65536
    <   %388 = icmp ult i32 %381, 65536

It seems like converting GEPs into shl+add may cause regressions :(
We should improve BasicAA to handle more patterns in the future.
But it shouldn't block this patch, as we saw performance/code size improvements in some benchmarks.

nikic · 2023-11-27T14:53:55Z

@dtcxzyw I'm not seeing something exactly like that (using usub.sat) in my own IR diffs, but I do see something quite similar in pcmul.c.ll (https://gist.github.com/nikic/805ec1c7a79b8a94ff6c1a60a4a19d3e). As far as I can tell, what happens there is that a loop that stores 0 gets converted into a memset by LIR now. Nominally this is a positive change, but it can also be negative if the number of stores is actually low (as the trip count calculation is complex).
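
For readers unfamiliar with the loop-idiom conversion mentioned here, a rough standalone sketch of the two forms follows. The loop shape and the 16-bit element type are assumptions for illustration (suggested by the `i16` GEPs in the IR diff above), not the actual cfrac source, and all names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Store-by-store form: one zero store per iteration.
void zeroWithLoop(std::uint16_t *digits, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    digits[i] = 0;
}

// Idiom form: one memset whose length is derived from the trip count.
void zeroWithMemset(std::uint16_t *digits, std::size_t count) {
  if (count != 0)
    std::memset(digits, 0, count * sizeof(std::uint16_t));
}

// Checks that both forms leave identical memory, including a trailing
// sentinel element that neither form should touch.
bool bothFormsAgree(std::size_t count) {
  std::vector<std::uint16_t> a(count + 1, 0xFFFF), b(count + 1, 0xFFFF);
  zeroWithLoop(a.data(), count);
  zeroWithMemset(b.data(), count);
  return a == b;
}
```

The two forms are equivalent in effect. The trade-off described above is about cost: when the count is small, the call and the trip count computation may outweigh the handful of stores they replace.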

dtcxzyw

LGTM.

nikic requested review from dtcxzyw and goldsteinn November 8, 2023 11:39

llvmbot added the llvm:transforms label Nov 8, 2023

dtcxzyw mentioned this pull request Nov 8, 2023

test PR71663 plctlab/llvm-ci#784

Closed

dtcxzyw reviewed Nov 8, 2023

goldsteinn reviewed Nov 13, 2023

goldsteinn reviewed Nov 13, 2023

nikic force-pushed the instcombine-indexed-compare branch from d248f7d to 9c7db50 Compare November 14, 2023 14:08

goldsteinn reviewed Nov 21, 2023

nikic commented Nov 21, 2023

goldsteinn reviewed Nov 21, 2023

dtcxzyw reviewed Nov 27, 2023

nikic and others added 5 commits November 27, 2023 09:43

Use stripAndAccumulateConstantOffsets()

a2b4bb7

remove dead code

79c3ce2

Simplify check for at most one variable index

d3f9de6

Restore inbounds check

af6cc8f

nikic force-pushed the instcombine-indexed-compare branch from e04064e to af6cc8f Compare November 27, 2023 08:46

dtcxzyw approved these changes Nov 27, 2023

nikic merged commit d01237c into llvm:main Nov 28, 2023
