[SLP] Initial vectorization of non-power-of-2 ops. #77790
Merged
Changes from 40 commits
43 commits
252567a [SLP] Initial vectorization of non-power-of-2 ops. (fhahn)
0bb957b Remove stale PADDING check lines, fix POW2/NON-POW2 prefixes in test. (fhahn)
84cf9b9 !fixup Address latest comments, thanks! (fhahn)
552b8aa !fixup Add fixme to processBuildVector (fhahn)
0ee85a3 Merge branch 'main' into slp-vec3 (fhahn)
4bb53dd !fixup undo gather cos changes. (fhahn)
627c30b Merge remote-tracking branch 'origin/main' into slp-vec3 (fhahn)
cabbe05 !fixup remove escape hatch for non-power-of-2 vectors from processBV. (fhahn)
f30c753 !fixup removed with in wrong place (fhahn)
f15ddd9 !fixup also update odd_store.ll (fhahn)
82efe8a Merge branch 'main' into slp-vec3 (fhahn)
35fc0f9 Merge remote-tracking branch 'origin/main' into slp-vec3 (fhahn)
5cd569b !fixup address latest comments, thanks! (fhahn)
e189eec [SLP] Collect candidate VFs in vector in vectorizeStores (NFC). (fhahn)
e0b403a Merge branch 'users/fhahn/slp-store-vfs-in-vector' into slp-vec3 (fhahn)
b6dac7b !fixup update tests after merge. (fhahn)
13db21f Merge remote-tracking branch 'origin/main' into users/fhahn/slp-store… (fhahn)
8e7339a [SLP] Exit early . (fhahn)
3eacfa6 [SLP] Exit early if MaxVF < MinVF (NFCI). (fhahn)
0d62c2c Merge remote-tracking branch 'origin/users/fhahn/slp-early-exit' into… (fhahn)
8b6b0e8 !fixup use for_each. (fhahn)
454acf8 Merge remote-tracking branch 'origin/users/fhahn/slp-store-vfs-in-vec… (fhahn)
1576b0a Merge remote-tracking branch 'origin/main' into users/fhahn/slp-store… (fhahn)
d733a61 Merge remote-tracking branch 'origin/users/fhahn/slp-store-vfs-in-vec… (fhahn)
4d8c47d !fixup add non-power-of-2 VF correctly. (fhahn)
0103a25 Merge remote-tracking branch 'origin/main' into slp-vec3 (fhahn)
de3a7e8 Merge remote-tracking branch 'origin/main' into slp-vec3 (fhahn)
fb1c7be !fixup address latest comments, thanks! (fhahn)
4c1197a !fixup fix formatting (fhahn)
6757ddf Merge remote-tracking branch 'origin/main' into slp-vec3 (fhahn)
210210f !fixup add separate early exit for Non-power-of-2 VFs. (fhahn)
47df498 Merge remote-tracking branch 'origin/main' into slp-vec3 (fhahn)
981a3d4 !fixup adjust VF computation as suggested (fhahn)
a0155f1 Merge branch 'main' into slp-vec3 (fhahn)
6e4996a Merge branch 'main' into slp-vec3 (fhahn)
cded768 Merge remote-tracking branch 'origin/main' into slp-vec3 (fhahn)
c52b68c !fixup address comments, update after upstream changes. (fhahn)
8d1b5d4 !fixup remove newline (fhahn)
db8bb3f Merge remote-tracking branch 'origin/main' into slp-vec3 (fhahn)
b7ccdd4 !fixup address latest comments, thanks! (fhahn)
8c9627d Merge remote-tracking branch 'origin/main' into slp-vec3 (fhahn)
3919ee6 !fixup add assert (fhahn)
ad67f18 Merge branch 'main' into slp-vec3 (fhahn)
@@ -190,6 +190,10 @@ static cl::opt<bool>
     ViewSLPTree("view-slp-tree", cl::Hidden,
                 cl::desc("Display the SLP trees with Graphviz"));
 
+static cl::opt<bool> VectorizeNonPowerOf2(
+    "slp-vectorize-non-power-of-2", cl::init(false), cl::Hidden,
+    cl::desc("Try to vectorize with non-power-of-2 number of elements."));
+
 // Limit the number of alias checks. The limit is chosen so that
 // it has no negative effect on the llvm benchmarks.
 static const unsigned AliasedCheckLimit = 10;
@@ -2806,6 +2810,9 @@ class BoUpSLP {
                        SmallVectorImpl<Value *> *OpScalars = nullptr,
                        SmallVectorImpl<Value *> *AltScalars = nullptr) const;
 
+    /// Return true if this is a non-power-of-2 node.
+    bool isNonPowOf2Vec() const { return !isPowerOf2_32(Scalars.size()); }
+
 #ifndef NDEBUG
     /// Debug printer.
     LLVM_DUMP_METHOD void dump() const {
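To make the predicate above concrete outside of LLVM, here is a minimal standalone sketch. `isPow2` and `isNonPowOf2Count` are hypothetical names introduced for illustration; `isPow2` is assumed to behave like `llvm::isPowerOf2_32` (true only when exactly one bit is set, so 0 does not count as a power of two).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for llvm::isPowerOf2_32: true iff exactly one bit is
// set, so 0 is not treated as a power of two.
constexpr bool isPow2(std::uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical free-function version of TreeEntry::isNonPowOf2Vec that takes
// the number of scalars directly instead of reading Scalars.size().
inline bool isNonPowOf2Count(std::size_t NumScalars) {
  // Guard the narrowing conversion to 32 bits.
  assert(NumScalars <= UINT32_MAX && "scalar count does not fit in 32 bits");
  return !isPow2(static_cast<std::uint32_t>(NumScalars));
}

static_assert(isPow2(4) && !isPow2(3) && !isPow2(0),
              "4 is a power of two; 3 and 0 are not");
```

Under these assumptions a node with 3 or 7 scalars is reported as non-power-of-2, while nodes with 2, 4 or 8 scalars are not.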
@@ -2971,9 +2978,11 @@ class BoUpSLP {
       MustGather.insert(VL.begin(), VL.end());
     }
 
-    if (UserTreeIdx.UserTE)
+    if (UserTreeIdx.UserTE) {
       Last->UserTreeIndices.push_back(UserTreeIdx);
-
+      assert((!Last->isNonPowOf2Vec() || Last->ReorderIndices.empty()) &&
+             "Reordering isn't implemented for non-power-of-2 nodes yet");
+    }
     return Last;
   }
 
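The assertion added in this hunk encodes an implication: a non-power-of-2 node must not carry reorder indices. A minimal standalone sketch of that invariant follows. The `Node` struct and the helpers `satisfiesReorderInvariant` and `checkNode` are hypothetical and only mimic the two `TreeEntry` fields involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for BoUpSLP::TreeEntry (illustration only).
struct Node {
  std::vector<int> Scalars;             // placeholder for the list of scalars
  std::vector<unsigned> ReorderIndices; // empty means "no reordering"

  // True iff the scalar count is not a power of two (0 counts as
  // non-power-of-2, matching a "exactly one bit set" test).
  bool isNonPowOf2Vec() const {
    std::size_t N = Scalars.size();
    return N == 0 || (N & (N - 1)) != 0;
  }
};

// The implication "non-power-of-2 => no reordering", written as !A || B.
inline bool satisfiesReorderInvariant(const Node &N) {
  return !N.isNonPowOf2Vec() || N.ReorderIndices.empty();
}

// Mirrors the shape of the assertion in the hunk above.
inline void checkNode(const Node &N) {
  assert(satisfiesReorderInvariant(N) &&
         "Reordering isn't implemented for non-power-of-2 nodes yet");
}
```

In this sketch a 3-element node with an empty `ReorderIndices` passes, a 3-element node with a permutation attached fails, and a 4-element node passes either way.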
@@ -4224,6 +4233,13 @@ BoUpSLP::LoadsState BoUpSLP::canVectorizeLoads(
   auto *VecTy = FixedVectorType::get(ScalarTy, Sz);
   // Check the order of pointer operands or that all pointers are the same.
   bool IsSorted = sortPtrAccesses(PointerOps, ScalarTy, *DL, *SE, Order);
+  // FIXME: Reordering isn't implemented for non-power-of-2 nodes yet.
+  if (!Order.empty() && !isPowerOf2_32(VL.size())) {
+    assert(VectorizeNonPowerOf2 && "non-power-of-2 number of loads only "
+                                   "supported with VectorizeNonPowerOf2");
+    return LoadsState::Gather;
+  }
+
   Align CommonAlignment = computeCommonAlignment<LoadInst>(VL);
   if (!IsSorted && Sz > MinProfitableStridedLoads && TTI->isTypeLegal(VecTy) &&
       TTI->isLegalStridedLoadStore(VecTy, CommonAlignment) &&
@@ -4543,6 +4559,10 @@ static bool areTwoInsertFromSameBuildVector(
 
 std::optional<BoUpSLP::OrdersType>
 BoUpSLP::getReorderingData(const TreeEntry &TE, bool TopToBottom) {
+  // FIXME: Vectorizing is not supported yet for non-power-of-2 ops.
+  if (TE.isNonPowOf2Vec())
+    return std::nullopt;
+
   // No need to reorder if need to shuffle reuses, still need to shuffle the
   // node.
   if (!TE.ReuseShuffleIndices.empty()) {
@@ -5117,6 +5137,10 @@ bool BoUpSLP::canReorderOperands(
     TreeEntry *UserTE, SmallVectorImpl<std::pair<unsigned, TreeEntry *>> &Edges,
     ArrayRef<TreeEntry *> ReorderableGathers,
     SmallVectorImpl<TreeEntry *> &GatherOps) {
+  // FIXME: Reordering isn't implemented for non-power-of-2 nodes yet.
+  if (UserTE->isNonPowOf2Vec())
+    return false;
+
   for (unsigned I = 0, E = UserTE->getNumOperands(); I < E; ++I) {
     if (any_of(Edges, [I](const std::pair<unsigned, TreeEntry *> &OpData) {
           return OpData.first == I &&
@@ -5290,6 +5314,9 @@ void BoUpSLP::reorderBottomToTop(bool IgnoreReorder) {
     }
     auto Res = OrdersUses.insert(std::make_pair(OrdersType(), 0));
     const auto AllowsReordering = [&](const TreeEntry *TE) {
+      // FIXME: Reordering isn't implemented for non-power-of-2 nodes yet.
+      if (TE->isNonPowOf2Vec())
+        return false;
       if (!TE->ReorderIndices.empty() || !TE->ReuseShuffleIndices.empty() ||
           (TE->State == TreeEntry::Vectorize && TE->isAltShuffle()) ||
           (IgnoreReorder && TE->Idx == 0))
@@ -5805,6 +5832,9 @@ BoUpSLP::TreeEntry::EntryState BoUpSLP::getScalarsVectorizationState(
   case Instruction::ExtractValue:
   case Instruction::ExtractElement: {
     bool Reuse = canReuseExtract(VL, VL0, CurrentOrder);
+    // FIXME: Vectorizing is not supported yet for non-power-of-2 ops.
+    if (!isPowerOf2_32(VL.size()))
+      return TreeEntry::NeedToGather;
     if (Reuse || !CurrentOrder.empty())
       return TreeEntry::Vectorize;
     LLVM_DEBUG(dbgs() << "SLP: Gather extract sequence.\n");
@@ -6111,6 +6141,13 @@ void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
     if (NumUniqueScalarValues == VL.size()) {
       ReuseShuffleIndicies.clear();
     } else {
+      // FIXME: Reshuffling scalars is not supported yet for non-power-of-2 ops.
+      if (UserTreeIdx.UserTE && UserTreeIdx.UserTE->isNonPowOf2Vec()) {
+        LLVM_DEBUG(dbgs() << "SLP: Reshuffling scalars not yet supported "
+                             "for nodes with padding.\n");
+        newTreeEntry(VL, std::nullopt /*not vectorized*/, S, UserTreeIdx);
+        return false;
+      }
       LLVM_DEBUG(dbgs() << "SLP: Shuffle for reused scalars.\n");
       if (NumUniqueScalarValues <= 1 ||
           (UniquePositions.size() == 1 && all_of(UniqueValues,

Review comment (on the `UserTreeIdx.UserTE->isNonPowOf2Vec()` check): Add a FIXME here for non-power-of-2 support.
Reply: Added, thanks!
@@ -7721,7 +7758,8 @@ class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {
         for (unsigned I = 0, End = VL.size(); I < End; I += VF) {
           if (VectorizedLoads.contains(VL[I]))
             continue;
-          GatherCost += getBuildVectorCost(VL.slice(I, VF), Root);
+          GatherCost +=
+              getBuildVectorCost(VL.slice(I, std::min(End - I, VF)), Root);
         }
         // Exclude potentially vectorized loads from list of gathered
         // scalars.
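The `std::min(End - I, VF)` clamp matters once `VL.size()` is no longer a multiple of `VF`: without it the last slice would extend past the end of the list. A minimal sketch of the chunking logic, using a hypothetical helper `chunkRanges` that returns (offset, length) pairs instead of calling `ArrayRef::slice`, and ignoring the `VectorizedLoads` filter:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, End) into chunks of width VF and clamps the
// final chunk so it never extends past End.
inline std::vector<std::pair<std::size_t, std::size_t>>
chunkRanges(std::size_t End, std::size_t VF) {
  assert(VF > 0 && "chunk width must be positive");
  std::vector<std::pair<std::size_t, std::size_t>> Ranges;
  for (std::size_t I = 0; I < End; I += VF)
    Ranges.emplace_back(I, std::min(End - I, VF)); // same clamp as in the diff
  return Ranges;
}
```

For 7 elements and `VF = 4` this yields the ranges (0, 4) and (4, 3); for 8 elements the second range is (4, 4), so power-of-2 sizes behave as before.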
@@ -10528,6 +10566,9 @@ BoUpSLP::isGatherShuffledEntry(
   // No need to check for the topmost gather node.
   if (TE == VectorizableTree.front().get())
     return {};
+  // FIXME: Gathering for non-power-of-2 nodes not implemented yet.
+  if (TE->isNonPowOf2Vec())
+    return {};
   Mask.assign(VL.size(), PoisonMaskElem);
   assert(TE->UserTreeIndices.size() == 1 &&
          "Expected only single user of the gather node.");
@@ -14783,8 +14824,13 @@ bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,
   const unsigned Sz = R.getVectorElementSize(Chain[0]);
   unsigned VF = Chain.size();
 
-  if (!isPowerOf2_32(Sz) || !isPowerOf2_32(VF) || VF < 2 || VF < MinVF)
-    return false;
+  if (!isPowerOf2_32(Sz) || !isPowerOf2_32(VF) || VF < 2 || VF < MinVF) {
+    // Check if vectorizing with a non-power-of-2 VF should be considered. At
+    // the moment, only consider cases where VF + 1 is a power-of-2, i.e. almost
+    // all vector lanes are used.
+    if (!VectorizeNonPowerOf2 || (VF < MinVF && VF + 1 != MinVF))
+      return false;
+  }
 
   LLVM_DEBUG(dbgs() << "SLP: Analyzing " << VF << " stores at offset " << Idx
                     << "\n");
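To see which (Sz, VF, MinVF) combinations get past the rewritten early exit, the condition can be lifted into a standalone predicate. This is a sketch only: `passesEarlyExit` and `isPow2` are hypothetical names, `AllowNonPow2` stands in for the `VectorizeNonPowerOf2` flag, and `isPow2` is assumed to match `llvm::isPowerOf2_32`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::isPowerOf2_32 (exactly one bit set).
constexpr bool isPow2(std::uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical predicate mirroring the early-exit condition in the hunk above.
// Returns true when the chain would go on to be analyzed.
inline bool passesEarlyExit(unsigned Sz, unsigned VF, unsigned MinVF,
                            bool AllowNonPow2) {
  if (!isPow2(Sz) || !isPow2(VF) || VF < 2 || VF < MinVF) {
    // With the flag off this behaves exactly like the old unconditional exit.
    if (!AllowNonPow2 || (VF < MinVF && VF + 1 != MinVF))
      return false;
  }
  return true;
}
```

With the flag off, `passesEarlyExit(32, 3, 4, false)` is false, as before the patch. With the flag on, `VF = 3` and `MinVF = 4` passes because `VF + 1 == MinVF`, while `VF = 2` and `MinVF = 4` is still rejected. Read in isolation, the predicate also accepts `VF = 5` with `MinVF = 2` when the flag is on; in the diff shown here, the restriction to VFs where `VF + 1` is a power of two appears to come from the candidate list built in `vectorizeStores` rather than from this check.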
@@ -14883,14 +14929,22 @@ bool SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores,
         continue;
       }
 
+      unsigned NonPowerOf2VF = 0;
+      if (VectorizeNonPowerOf2) {
+        // First try vectorizing with a non-power-of-2 VF. At the moment, only
+        // consider cases where VF + 1 is a power-of-2, i.e. almost all vector
+        // lanes are used.
+        unsigned CandVF = Operands.size();
+        if (isPowerOf2_32(CandVF + 1) && CandVF <= MaxVF)
+          NonPowerOf2VF = CandVF;
+      }
+
       unsigned Sz = 1 + Log2_32(MaxVF) - Log2_32(MinVF);
-      SmallVector<unsigned> CandidateVFs(Sz);
-      // FIXME: Is division-by-2 the correct step? Should we assert that the
-      // register size is a power-of-2?
-      unsigned Size = MaxVF;
-      for_each(CandidateVFs, [&](unsigned &VF) {
-        VF = Size;
-        Size /= 2;
+      SmallVector<unsigned> CandidateVFs(Sz + (NonPowerOf2VF > 0 ? 1 : 0));
+      unsigned Size = MinVF;
+      for_each(reverse(CandidateVFs), [&](unsigned &VF) {
+        VF = Size > MaxVF ? NonPowerOf2VF : Size;
+        Size *= 2;
       });
       unsigned StartIdx = 0;
       for (unsigned Size : CandidateVFs) {
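The candidate list is now filled back to front, starting at `MinVF` and doubling, with the optional non-power-of-2 VF landing in the first slot so it is tried first. A standalone sketch of that computation follows. `candidateVFs`, `isPow2` and `log2Floor` are hypothetical names; `log2Floor` is assumed to match `llvm::Log2_32` for non-zero inputs, and the sketch additionally assumes `MinVF` and `MaxVF` are powers of two with `MinVF <= MaxVF`, which the diff itself does not assert.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for llvm::isPowerOf2_32 and llvm::Log2_32.
constexpr bool isPow2(std::uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }
inline unsigned log2Floor(unsigned V) {
  unsigned L = 0;
  while (V >>= 1)
    ++L;
  return L;
}

// Hypothetical helper mirroring the candidate-VF construction in the hunk.
inline std::vector<unsigned> candidateVFs(unsigned NumOperands, unsigned MinVF,
                                          unsigned MaxVF, bool AllowNonPow2) {
  // Assumption of this sketch, not checked in the diff.
  assert(isPow2(MinVF) && isPow2(MaxVF) && MinVF <= MaxVF);

  unsigned NonPowerOf2VF = 0;
  if (AllowNonPow2) {
    // Only consider VFs where VF + 1 is a power of two (e.g. 3, 7, 15).
    unsigned CandVF = NumOperands;
    if (isPow2(CandVF + 1) && CandVF <= MaxVF)
      NonPowerOf2VF = CandVF;
  }

  unsigned Sz = 1 + log2Floor(MaxVF) - log2Floor(MinVF);
  std::vector<unsigned> VFs(Sz + (NonPowerOf2VF > 0 ? 1 : 0));
  unsigned Size = MinVF;
  // Fill from the back: MinVF, 2*MinVF, ..., MaxVF, then the extra slot.
  for (auto It = VFs.rbegin(); It != VFs.rend(); ++It) {
    *It = Size > MaxVF ? NonPowerOf2VF : Size;
    Size *= 2;
  }
  return VFs;
}
```

With 7 stores, `MinVF = 2` and `MaxVF = 8`, the sketch yields {7, 8, 4, 2} when the flag is on and {8, 4, 2} when it is off. With 15 stores and the same bounds, the candidate exceeds `MaxVF`, so the list stays {8, 4, 2}.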
Review comment: What if ReuseShuffleIndices is not empty? Will it work? Can you add a test?

Reply: All code paths should guard against that, AFAICT. I added an assertion to make sure. I couldn't find any test case that triggers this across large code bases (SPEC2006, SPEC2017, llvm-test-suite, clang bootstrap and large internal benchmarks).