[SROA]: Only defer trying partial sized ptr or ptr vector types #82279

jrbyrnes · 2024-02-19T21:42:24Z

#77678 introduced regressions of the following type: LoadStoreTys had the better candidate type, but it was not selected since considering LoadStoreTys was deferred until after full consideration of CandidateTys (those that are the same size as the partition).

See: https://godbolt.org/z/G5sxbf1sx

The issue that #77678 set out to resolve only required deferring consideration of partial-sized ptr or ptr vector types, so this patch defers only those, which resolves this class of regressions.
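For readers skimming the description, here is a minimal sketch of the deferral rule, written as standalone C++ rather than against LLVM's API. `SliceInfo`, `Buckets`, and `classify` are hypothetical names invented for illustration. The real patch works on `Slice`, `Partition`, and `SetVector<Type *>`, and de-duplicates types, which this sketch does not.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a load/store slice; not an LLVM type.
struct SliceInfo {
  std::string TypeName;  // e.g. "i64", "ptr", "<2 x ptr>"
  bool IsPtrOrPtrVector; // the (element) type is a pointer
  uint64_t Begin, End;   // offsets of the slice within the alloca
};

struct Buckets {
  std::vector<std::string> LoadStoreTys; // considered in the first round
  std::vector<std::string> DeferredTys;  // considered only if round one fails
};

// Defer a type only when it is a pointer (or vector of pointers) AND its
// slice does not cover the whole partition [PBegin, PEnd).
Buckets classify(const std::vector<SliceInfo> &Slices, uint64_t PBegin,
                 uint64_t PEnd) {
  Buckets B;
  for (const SliceInfo &S : Slices) {
    bool Partial = S.Begin != PBegin || S.End != PEnd;
    if (S.IsPtrOrPtrVector && Partial)
      B.DeferredTys.push_back(S.TypeName);
    else
      B.LoadStoreTys.push_back(S.TypeName);
  }
  return B;
}
```

Under this rule a partial-sized `i64` store is no longer deferred, so it can contribute a candidate such as `<2 x i64>` in the first round, while a partial-sized `ptr` still waits for the second round.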

llvmbot · 2024-02-19T21:42:51Z

@llvm/pr-subscribers-llvm-transforms

Author: Jeffrey Byrnes (jrbyrnes)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/82279.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Scalar/SROA.cpp (+45-28)
(modified) llvm/test/Transforms/SROA/vector-promotion.ll (+32)

diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index 6c8785d52c4eab..b12d156e067edd 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -2271,6 +2271,7 @@ static VectorType *isVectorPromotionViable(Partition &P, const DataLayout &DL) {
   // we have different element types.
   SmallVector<VectorType *, 4> CandidateTys;
   SetVector<Type *> LoadStoreTys;
+  SetVector<Type *> DeferredTys;
   Type *CommonEltTy = nullptr;
   VectorType *CommonVecPtrTy = nullptr;
   bool HaveVecPtrTy = false;
@@ -2305,6 +2306,35 @@ static VectorType *isVectorPromotionViable(Partition &P, const DataLayout &DL) {
     }
   };
 
+  auto createAndCheckVectorTypesForPromotion =
+      [&](SetVector<Type *> OtherTys,
+          SmallVector<VectorType *, 4> CandidateTysCopy) {
+        // Consider additional vector types where the element type size is a
+        // multiple of load/store element size.
+        for (Type *Ty : OtherTys) {
+          if (!VectorType::isValidElementType(Ty))
+            continue;
+          unsigned TypeSize = DL.getTypeSizeInBits(Ty).getFixedValue();
+          // Make a copy of CandidateTys and iterate through it, because we
+          // might append to CandidateTys in the loop.
+          for (VectorType *&VTy : CandidateTysCopy) {
+            unsigned VectorSize = DL.getTypeSizeInBits(VTy).getFixedValue();
+            unsigned ElementSize =
+                DL.getTypeSizeInBits(VTy->getElementType()).getFixedValue();
+            if (TypeSize != VectorSize && TypeSize != ElementSize &&
+                VectorSize % TypeSize == 0) {
+              VectorType *NewVTy =
+                  VectorType::get(Ty, VectorSize / TypeSize, false);
+              CheckCandidateType(NewVTy);
+            }
+          }
+        }
+
+        return checkVectorTypesForPromotion(
+            P, DL, CandidateTys, HaveCommonEltTy, CommonEltTy, HaveVecPtrTy,
+            HaveCommonVecPtrTy, CommonVecPtrTy);
+      };
+
   // Put load and store types into a set for de-duplication.
   for (const Slice &S : P) {
     Type *Ty;
@@ -2314,44 +2344,31 @@ static VectorType *isVectorPromotionViable(Partition &P, const DataLayout &DL) {
       Ty = SI->getValueOperand()->getType();
     else
       continue;
+
+    auto CandTy =
+        isa<VectorType>(Ty) ? cast<VectorType>(Ty)->getElementType() : Ty;
+    if (CandTy->isPointerTy() && (S.beginOffset() != P.beginOffset() ||
+                                  S.endOffset() != P.endOffset())) {
+      DeferredTys.insert(Ty);
+      continue;
+    }
+
     LoadStoreTys.insert(Ty);
     // Consider any loads or stores that are the exact size of the slice.
     if (S.beginOffset() == P.beginOffset() && S.endOffset() == P.endOffset())
       CheckCandidateType(Ty);
   }
 
-  if (auto *VTy = checkVectorTypesForPromotion(
-          P, DL, CandidateTys, HaveCommonEltTy, CommonEltTy, HaveVecPtrTy,
-          HaveCommonVecPtrTy, CommonVecPtrTy))
+  SmallVector<VectorType *, 4> CandidateTysCopy = CandidateTys;
+  if (auto *VTy =
+          createAndCheckVectorTypesForPromotion(LoadStoreTys, CandidateTysCopy))
     return VTy;
 
-  // Consider additional vector types where the element type size is a
-  // multiple of load/store element size.
-  for (Type *Ty : LoadStoreTys) {
-    if (!VectorType::isValidElementType(Ty))
-      continue;
-    unsigned TypeSize = DL.getTypeSizeInBits(Ty).getFixedValue();
-    // Make a copy of CandidateTys and iterate through it, because we might
-    // append to CandidateTys in the loop.
-    SmallVector<VectorType *, 4> CandidateTysCopy = CandidateTys;
-    CandidateTys.clear();
-    for (VectorType *&VTy : CandidateTysCopy) {
-      unsigned VectorSize = DL.getTypeSizeInBits(VTy).getFixedValue();
-      unsigned ElementSize =
-          DL.getTypeSizeInBits(VTy->getElementType()).getFixedValue();
-      if (TypeSize != VectorSize && TypeSize != ElementSize &&
-          VectorSize % TypeSize == 0) {
-        VectorType *NewVTy = VectorType::get(Ty, VectorSize / TypeSize, false);
-        CheckCandidateType(NewVTy);
-      }
-    }
-  }
-
-  return checkVectorTypesForPromotion(P, DL, CandidateTys, HaveCommonEltTy,
-                                      CommonEltTy, HaveVecPtrTy,
-                                      HaveCommonVecPtrTy, CommonVecPtrTy);
+  CandidateTys.clear();
+  return createAndCheckVectorTypesForPromotion(DeferredTys, CandidateTysCopy);
 }
 
+
 /// Test whether a slice of an alloca is valid for integer widening.
 ///
 /// This implements the necessary checking for the \c isIntegerWideningViable
diff --git a/llvm/test/Transforms/SROA/vector-promotion.ll b/llvm/test/Transforms/SROA/vector-promotion.ll
index e48dd5bb392082..dc70520fc5ca70 100644
--- a/llvm/test/Transforms/SROA/vector-promotion.ll
+++ b/llvm/test/Transforms/SROA/vector-promotion.ll
@@ -1392,6 +1392,38 @@ define <4 x ptr> @ptrLoadStoreTysPtr(ptr %init, i64 %val2) {
   ret <4 x ptr> %sroaval
 }
 
+define <4 x i32> @validLoadStoreTy([2 x i64] %cond.coerce) {
+; CHECK-LABEL: @validLoadStoreTy(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[COND_COERCE_FCA_0_EXTRACT:%.*]] = extractvalue [2 x i64] [[COND_COERCE:%.*]], 0
+; CHECK-NEXT:    [[COND_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x i64> undef, i64 [[COND_COERCE_FCA_0_EXTRACT]], i32 0
+; CHECK-NEXT:    [[COND_COERCE_FCA_1_EXTRACT:%.*]] = extractvalue [2 x i64] [[COND_COERCE]], 1
+; CHECK-NEXT:    [[COND_SROA_0_8_VEC_INSERT:%.*]] = insertelement <2 x i64> [[COND_SROA_0_0_VEC_INSERT]], i64 [[COND_COERCE_FCA_1_EXTRACT]], i32 1
+; CHECK-NEXT:    [[TMP0:%.*]] = bitcast <2 x i64> [[COND_SROA_0_8_VEC_INSERT]] to <4 x i32>
+; CHECK-NEXT:    ret <4 x i32> [[TMP0]]
+;
+; DEBUG-LABEL: @validLoadStoreTy(
+; DEBUG-NEXT:  entry:
+; DEBUG-NEXT:    call void @llvm.dbg.value(metadata ptr undef, metadata [[META553:![0-9]+]], metadata !DIExpression()), !dbg [[DBG557:![0-9]+]]
+; DEBUG-NEXT:    call void @llvm.dbg.value(metadata ptr undef, metadata [[META554:![0-9]+]], metadata !DIExpression()), !dbg [[DBG558:![0-9]+]]
+; DEBUG-NEXT:    [[COND_COERCE_FCA_0_EXTRACT:%.*]] = extractvalue [2 x i64] [[COND_COERCE:%.*]], 0, !dbg [[DBG559:![0-9]+]]
+; DEBUG-NEXT:    [[COND_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x i64> undef, i64 [[COND_COERCE_FCA_0_EXTRACT]], i32 0, !dbg [[DBG559]]
+; DEBUG-NEXT:    [[COND_COERCE_FCA_1_EXTRACT:%.*]] = extractvalue [2 x i64] [[COND_COERCE]], 1, !dbg [[DBG559]]
+; DEBUG-NEXT:    [[COND_SROA_0_8_VEC_INSERT:%.*]] = insertelement <2 x i64> [[COND_SROA_0_0_VEC_INSERT]], i64 [[COND_COERCE_FCA_1_EXTRACT]], i32 1, !dbg [[DBG559]]
+; DEBUG-NEXT:    call void @llvm.dbg.value(metadata ptr undef, metadata [[META555:![0-9]+]], metadata !DIExpression()), !dbg [[DBG560:![0-9]+]]
+; DEBUG-NEXT:    [[TMP0:%.*]] = bitcast <2 x i64> [[COND_SROA_0_8_VEC_INSERT]] to <4 x i32>, !dbg [[DBG561:![0-9]+]]
+; DEBUG-NEXT:    call void @llvm.dbg.value(metadata <4 x i32> [[TMP0]], metadata [[META556:![0-9]+]], metadata !DIExpression()), !dbg [[DBG561]]
+; DEBUG-NEXT:    ret <4 x i32> [[TMP0]], !dbg [[DBG562:![0-9]+]]
+;
+entry:
+  %cond = alloca <4 x i32>, align 8
+  %coerce.dive2 = getelementptr inbounds <4 x i32>, ptr %cond, i32 0, i32 0
+  store [2 x i64] %cond.coerce, ptr %coerce.dive2, align 8
+  %m5 = getelementptr inbounds <4 x i32>, ptr %cond, i32 0, i32 0
+  %0 = load <4 x i32>, ptr %m5, align 8
+  ret <4 x i32> %0
+}
+
 declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
 declare void @llvm.lifetime.end.p0(i64, ptr)
 ;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:

jrbyrnes · 2024-02-19T21:44:27Z

@topperc This is what I was referring to

github-actions · 2024-02-19T21:44:50Z

✅ With the latest revision this PR passed the C/C++ code formatter.

jrbyrnes · 2024-02-19T21:52:06Z

Formatting

arsenm · 2024-02-28T08:09:15Z

llvm/lib/Transforms/Scalar/SROA.cpp

+    // Make a copy of CandidateTys and iterate through it, because we
+    // might append to CandidateTys in the loop.
+    for (VectorType *&VTy : CandidateTysCopy) {


This isn't being mutated though? Can you just use an ArrayRef of CandidateTys?

topperc · 2024-02-28

I think CheckCandidateType modifies CandidateTys.

arsenm · 2024-02-28

But if it's append-only, the ArrayRef won't see the new elements.

jrbyrnes · 2024-02-28

For the purpose of the loop an ArrayRef is fine, I think, since it is treated as fixed size. But we need the actual copy for the second call to createAndCheckVectorTypesForPromotion, since we call CandidateTys.clear() in between.

The clear() is done to save iterations in checkVectorTypesForPromotion by not retrying things, but we still need these Tys for the size-based CandidateTys generation.

jrbyrnes · 2024-03-01 (edited)

Actually, this is causing a failure.

It looks like if push_back exceeds the capacity of the SmallVector's allocation, push_back will call grow:

llvm-project/llvm/include/llvm/ADT/SmallVector.h, line 446 in 597f976

void SmallVectorTemplateBase<T, TriviallyCopyable>::grow(size_t MinSize) {

This will malloc new memory, move the existing elements over, and release their prior memory.

Thus the ArrayRef will be working with bad data.

jrbyrnes · 2024-03-01

Besides, CheckCandidateType may call CandidateTys.clear().

arsenm · 2024-03-04

Add a test that hits this failure?

jrbyrnes · 2024-03-04

There is no good way to capture the actual failure, because it depends on the reclaimed memory being overwritten, which in turn depends on malloc as well as the hardware.

Instead, I've added an assert which enforces memory invariance to support the test. This is one of the conditions for the failure.
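The hazard discussed in this thread can be illustrated with a small standalone sketch. It uses `std::vector` as a stand-in for `SmallVector` (an assumption made for portability; the growth behaviour described above is analogous), and `appendMovesStorage` and `appendDoubled` are hypothetical helper names, not code from the patch. The first helper shows that appending past capacity relocates the storage that a non-owning view such as an ArrayRef would still point at. The second shows the copy-then-iterate pattern together with an assert in the spirit of the memory-invariance check mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if appending Extra elements moved V's storage, which is the
// situation in which a non-owning view taken beforehand would dangle.
bool appendMovesStorage(std::vector<int> &V, std::size_t Extra) {
  // Record the address as an integer so the old pointer is never reused.
  auto Before = reinterpret_cast<std::uintptr_t>(V.data());
  for (std::size_t I = 0; I < Extra; ++I)
    V.push_back(static_cast<int>(I));
  return reinterpret_cast<std::uintptr_t>(V.data()) != Before;
}

// Safe pattern: iterate over an owning snapshot while appending to the
// original container.
std::vector<int> appendDoubled(std::vector<int> &Candidates) {
  const std::vector<int> Snapshot = Candidates; // owns its own storage
  const int *SnapshotData = Snapshot.data();
  for (int C : Snapshot)
    Candidates.push_back(C * 2); // may reallocate Candidates, not Snapshot
  // The snapshot's storage must not have moved during the loop.
  assert(Snapshot.data() == SnapshotData && "snapshot storage moved");
  return Snapshot;
}
```

The snapshot costs one extra allocation and copy, but it stays valid across both the appends and a later `clear()` of the original, which is why the thread settles on a real copy rather than a view.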

jrbyrnes · 2024-03-05T16:55:40Z

1e828f8

dianqk · 2024-03-21T12:31:26Z

This seems like an addition to the previous PR, although this is not a critical bug.

/cherry-pick 1e828f8

llvmbot · 2024-03-21T12:36:38Z

/pull-request #86114

llvmbot added the llvm:transforms label Feb 19, 2024

jrbyrnes requested a review from topperc February 19, 2024 21:44

jrbyrnes added 2 commits February 19, 2024 13:48

[SROA] NFC: Extract common code to createAndCheckVectorTypesForPromotion

5ea3531

Change-Id: Iea5d60b12e2de7033fc1a71e80aa96c261e998bf

jrbyrnes force-pushed the FixSROA branch from 02e7464 to 31d5126 Compare February 19, 2024 21:51

jrbyrnes requested a review from arsenm February 23, 2024 17:19

arsenm reviewed Feb 25, 2024

fixup! convert createAndCheckVectorTypesForPromotion to a helper func…

1d95dd8

…tion Change-Id: I7f63cfe0fdb9f08fd94e40c33c060af492bdd26a

arsenm reviewed Feb 28, 2024

fixup! Clean up CandidateTysCopy

08bbc80

Change-Id: Iea1367c878118b6f1dcac3c43b53b98be0ca57e3

arsenm reviewed Feb 29, 2024

fixup! use API & minor code changes

80eede3

Change-Id: Ib00edfe8bd3517effb7f58bf9631bd2700dc84c1

arsenm approved these changes Mar 1, 2024

jrbyrnes added 2 commits March 1, 2024 12:56

fixup! Don't use ArrayRef of CandidateTys

2d3b239

Change-Id: I8f00f67fcd9db825b74ed98b260b8c720e17f653

fixup! Add test for invariant memory across loop

ea54c6f

Change-Id: Idff43d522274474a6b719548eaefcf675fdb66e2

arsenm approved these changes Mar 5, 2024

jrbyrnes referenced this pull request Mar 5, 2024

[SROA]: Only defer trying partial sized ptr or ptr vector types

1e828f8

Change-Id: Ic77f87290905addadd5819dff2d0c62f031022ab

jrbyrnes closed this Mar 5, 2024

dianqk mentioned this pull request Mar 21, 2024

Codegen regression in Ipv6Addr [u16; 8] to [u8; 16] conversion rust-lang/rust#122805

Closed

dianqk added this to the LLVM 18.X Release milestone Mar 21, 2024

jrbyrnes commented Feb 19, 2024 (edited)

llvmbot commented Feb 19, 2024

jrbyrnes commented Feb 19, 2024

github-actions bot commented Feb 19, 2024 (edited)

jrbyrnes commented Feb 19, 2024

arsenm Feb 28, 2024

topperc Feb 28, 2024

arsenm Feb 28, 2024

jrbyrnes Feb 28, 2024

jrbyrnes Mar 1, 2024 (edited)

jrbyrnes Mar 1, 2024

arsenm Mar 4, 2024

jrbyrnes Mar 4, 2024

jrbyrnes commented Mar 5, 2024

dianqk commented Mar 21, 2024

llvmbot commented Mar 21, 2024
