-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LoopVectorize] Allow Early-Exit Loop Vectorization with EVL #130918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like you are in general enabling tail-folding for early exit loops, which I haven't tested at all for AArch64/SVE targets. It may work, but it would be good to add variants of test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll with tail-folding enabled so that we can verify the correct behaviour. Also, have you run the LLVM test suite with tail-folding and early-exit vectorisation enabled to verify all the tests build and pass? When developing the early exit work I found it was a great test suite for exposing issues. To get better coverage you can modify this code in LoopVectorizationLegality::isVectorizableEarlyExitLoop:
to always vectorise:
It's technically unsafe, but the LLVM test suite is well behaved and tests shouldn't crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate the tip! I'll apply the modification and run the LLVM test suite locally to check for potential issues and follow up once I have results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve added a test at llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit_predication.ll.
From what I observed, the main difference is that vectorization is not considered beneficial when predication is enabled.
I also ran the LLVM test suite with march=rv64gcv_zvl256b, patched to always vectorize regardless of isDereferenceableReadOnlyLoop., All 2041 tests passed under the following flag:
-O3 -mllvm -prefer-predicate-over-epilogue=predicate-dont-vectorize
-O3 -mllvm -prefer-predicate-over-epilogue=predicate-dont-vectorize -mllvm -force-tail-folding-style=data-with-evl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So tail predication only works with some early exit loops it seems. Some of the tests in simple_early_exit_predication.ll fail to vectorise because of this:
That explains why it's not considered beneficial for many loops. This error comes from
LoopVectorizationLegality::canFoldTailByMasking
. In all likelihood a simple search loop such as std::find will have an outside use of an induction variable so I'm not sure how much value there is right now in enabling early exit vectorisation with tail-folding? I'm not against enabling it, but I wonder what loops you're specifically interested in here?Also some of the tests have an exact trip count of 64, where we know there will not be a tail so we avoid using predication. It would be good to change the tests
same_exit_block_pre_inc_use1
,same_exit_block_pre_inc_use4
,loop_contains_safe_call
,loop_contains_safe_div
andloop_contains_load_after_early_exit
to have a trip count of 63 instead of 64. Also, would be good to have at least one test that doesn't have any outside uses so we can verify it's working correctly.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I just tried out your patch and when we use tail-folding in combination with
get.active.lane.mask
for early exit loops the IR is broken. I modifiedsame_exit_block_pre_inc_use1
to have a trip count of 63 and removed outside uses of the induction variable so that I ended up with vectorised IR like this:The branch condition should be an
or
of%13
and%15
. Unfortunately, I think I have to request changes this PR for now. Sorry about that!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed feedback—it's super helpful!
I am inspired by your patch (PR #120603) and would like to extend it by leveraging the vp.load.ff intrinsics (PR #128593) to safely handle out-of-bounds accesses. My goal is to introduce a new WidenFFLoad recipe, to enable vectorization of loops like std::find using EVL-based tail-folding.
You're right that canFoldTailByMasking() currently blocks this. Since EVL-based tail-folding doesn't mask the loop body, I'll probably need to relax that check specifically for EVL tail-folding. Also, the current VPLane::getAsRuntimeExpr() implementation, which uses ElementCount to calculate the last lane index, gets in the way too.
Good catch on the tests—I now realize they're currently using multiples of VF for the trip count, meaning the tail-folding paths aren't really being tested properly. I'll fix that up.