-
Notifications
You must be signed in to change notification settings - Fork 15.5k
[llvm][loop-rotate] Allow forcing loop-rotation #82828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm][loop-rotate] Allow forcing loop-rotation #82828
Conversation
Created using spr 1.3.4
|
@llvm/pr-subscribers-llvm-transforms Author: Paul Kirth (ilovepi) ChangesMany profitable optimizations cannot be performed at -Oz, due to This patch adds an experimental, backend-only flag to allow loop header Bug: 50308 Full diff: https://github.com/llvm/llvm-project/pull/82828.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/LoopRotation.cpp b/llvm/lib/Transforms/Scalar/LoopRotation.cpp
index 7036759a4eed57..c6943695ec1549 100644
--- a/llvm/lib/Transforms/Scalar/LoopRotation.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopRotation.cpp
@@ -39,8 +39,20 @@ static cl::opt<bool> PrepareForLTOOption(
cl::desc("Run loop-rotation in the prepare-for-lto stage. This option "
"should be used for testing only."));
+// Experimentally allow loop header duplication. This should allow for better
+// optimization at Oz, since loop-idiom recognition can then recognize things
+// like memcpy. If this ends up being profitable, we should drop this flag and
+// making a code gen option that can be controled independent of the opt level
+// and exposed through clang. See
+// https://github.com/llvm/llvm-project/issues/50308 for details.
+static cl::opt<bool>
+ ForceHeaderDuplication("force-loop-header-duplication", cl::init(false),
+ cl::Hidden,
+ cl::desc("Always enable loop header duplication"));
+
LoopRotatePass::LoopRotatePass(bool EnableHeaderDuplication, bool PrepareForLTO)
- : EnableHeaderDuplication(EnableHeaderDuplication),
+ : EnableHeaderDuplication(EnableHeaderDuplication ||
+ ForceHeaderDuplication),
PrepareForLTO(PrepareForLTO) {}
void LoopRotatePass::printPipeline(
diff --git a/llvm/test/Transforms/LoopRotate/oz-disable.ll b/llvm/test/Transforms/LoopRotate/oz-disable.ll
index 6a7847ac0ff215..b933200ed80586 100644
--- a/llvm/test/Transforms/LoopRotate/oz-disable.ll
+++ b/llvm/test/Transforms/LoopRotate/oz-disable.ll
@@ -4,6 +4,9 @@
; RUN: opt < %s -S -passes='default<Os>' -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OS
; RUN: opt < %s -S -passes='default<Oz>' -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OZ
+;; Make sure -force-loop-header-duplication overrides the default behavior at Oz
+; RUN: opt < %s -S -passes='default<Oz>' -force-loop-header-duplication -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OS
+
; Loop should be rotated for -Os but not for -Oz.
; OS: rotating Loop at depth 1
; OZ-NOT: rotating Loop at depth 1
|
Created using spr 1.3.4
|
I've opted to move the flag into the pipeline builder, where several other loop related flags are used. |
Created using spr 1.3.4
petrhosek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but you might also want to wait for @fhahn and @hiraditya to take a look.
|
Thanks for enabling this. LGTM! |
fhahn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a PhaseOrdering test that shows the impact of the change? Possibly just with the reproducer from #50308
| LPM1.addPass( | ||
| LoopRotatePass(Level != OptimizationLevel::Oz, isLTOPreLink(Phase))); | ||
| LPM1.addPass(LoopRotatePass(EnableLoopHeaderDuplication || | ||
| (Level != OptimizationLevel::Oz), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: redundant ()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| cl::Hidden, | ||
| cl::desc("Enable the LoopFlatten Pass")); | ||
|
|
||
| // Experimentally allow loop header duplication. This should allow for better |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I don't think allowing loop-rotation for all loops to simplify some to memsets is the best way forward here, as I don't see any path towards having this enabled by default.
If this ends up being profitable, we should drop this flag and
// making a code gen option that can be controlled independent of the opt level
// and exposed through clang.
It is not clear to me what profitable means here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new wording should be less ambiguous. Happy to change it if you have a preference, though.
Created using spr 1.3.4
@fhahn I've added that test case. I used |
Many profitable optimizations cannot be performed at -Oz, due to unrotated loops. While this is worse for size (minimally), many of the optimizations significantly reduce code size, such as memcpy optimizations and other patterns found by loop idiom recognition. Related discussion can be found in issue llvm#50308. This patch adds an experimental, backend-only flag to allow loop header duplication, regardless of the optimization level. Downstream consumers can experiment with this flag, and if it is useful for many targets, we can adjust the compiler's defaults accordingly, and expose any useful frontend flags to opt into the new behavior. Pull Request: llvm#82828
|
@fhahn If there's no objections, I plan to land this in the afternoon. Are there any more changes you'd like to see? |
FWIW I still think this would be better fixed by improve LoopIdiomRecognize to support non-rotated loops. This would benefit all users by default. With the flag, it's not clear to me if we expect any feedback from downstream users and what we would do with the feedback. |
|
loop rotation enables other optimizations like DCE and LICM and in many cases it reduces the code size. We could potentially have a cost model where we can disable loop rotation if the cloned-blocks are greater than certain size. However, modifying loop-idiom to detect unrotated loops will result in a lot of duplicated work and only benefit the loops that are recognized in llvm (which aren't that many). Also unrotated loops come in many forms (for, while, gotos, inlined recursive calls, etc.) |
Many profitable optimizations cannot be performed at -Oz, due to
unrotated loops. While this is worse for size (minimally), many of the
optimizations significantly reduce code size, such as memcpy
optimizations and other patterns found by loop idiom recognition.
Related discussion can be found in issue #50308.
This patch adds an experimental, backend-only flag to allow loop header
duplication, regardless of the optimization level. Downstream consumers
can experiment with this flag, and if it is profitable, we can adjust
the compiler's defaults accordingly, and expose any useful frontend
flags to opt into the new behavior.