[llvm][loop-rotate] Allow forcing loop-rotation #82828

ilovepi · 2024-02-23T20:43:47Z

Many profitable optimizations cannot be performed at -Oz, due to
unrotated loops. While this is worse for size (minimally), many of the
optimizations significantly reduce code size, such as memcpy
optimizations and other patterns found by loop idiom recognition.
Related discussion can be found in issue #50308.

This patch adds an experimental, backend-only flag to allow loop header
duplication, regardless of the optimization level. Downstream consumers
can experiment with this flag, and if it is profitable, we can adjust
the compiler's defaults accordingly, and expose any useful frontend
flags to opt into the new behavior.

Created using spr 1.3.4

llvmbot · 2024-02-23T20:44:21Z

@llvm/pr-subscribers-llvm-transforms

Author: Paul Kirth (ilovepi)

Changes

Many profitable optimizations cannot be performed at -Oz, due to
unrotated loops. While this is worse for size (minimally), many of the
optimizations significantly reduce code size, such as memcpy
optimizations and other patters found by loop idiom recognition.

This patch adds an experimental, backend-only flag to allow loop header
duplication, regardless of the optimization level. Downstream consumers
can experiment with this flag, and if it is profitable, we can adjust
the compiler's defaults accordingly, and expose any useful frontend
flags to opt into the new behavior.

Bug: 50308

Full diff: https://github.com/llvm/llvm-project/pull/82828.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopRotation.cpp (+13-1)
(modified) llvm/test/Transforms/LoopRotate/oz-disable.ll (+3)

diff --git a/llvm/lib/Transforms/Scalar/LoopRotation.cpp b/llvm/lib/Transforms/Scalar/LoopRotation.cpp
index 7036759a4eed57..c6943695ec1549 100644
--- a/llvm/lib/Transforms/Scalar/LoopRotation.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopRotation.cpp
@@ -39,8 +39,20 @@ static cl::opt<bool> PrepareForLTOOption(
     cl::desc("Run loop-rotation in the prepare-for-lto stage. This option "
              "should be used for testing only."));
 
+// Experimentally allow loop header duplication. This should allow for better
+// optimization at Oz, since loop-idiom recognition can then recognize things
+// like memcpy. If this ends up being profitable, we should drop this flag and
+// making a code gen option that can be controled independent of the opt level
+// and exposed through clang. See
+// https://github.com/llvm/llvm-project/issues/50308 for details.
+static cl::opt<bool>
+    ForceHeaderDuplication("force-loop-header-duplication", cl::init(false),
+                           cl::Hidden,
+                           cl::desc("Always enable loop header duplication"));
+
 LoopRotatePass::LoopRotatePass(bool EnableHeaderDuplication, bool PrepareForLTO)
-    : EnableHeaderDuplication(EnableHeaderDuplication),
+    : EnableHeaderDuplication(EnableHeaderDuplication ||
+                              ForceHeaderDuplication),
       PrepareForLTO(PrepareForLTO) {}
 
 void LoopRotatePass::printPipeline(
diff --git a/llvm/test/Transforms/LoopRotate/oz-disable.ll b/llvm/test/Transforms/LoopRotate/oz-disable.ll
index 6a7847ac0ff215..b933200ed80586 100644
--- a/llvm/test/Transforms/LoopRotate/oz-disable.ll
+++ b/llvm/test/Transforms/LoopRotate/oz-disable.ll
@@ -4,6 +4,9 @@
 ; RUN: opt < %s -S -passes='default<Os>' -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OS
 ; RUN: opt < %s -S -passes='default<Oz>' -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OZ
 
+;; Make sure -force-loop-header-duplication overrides the default behavior at Oz
+; RUN: opt < %s -S -passes='default<Oz>' -force-loop-header-duplication -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OS
+
 ; Loop should be rotated for -Os but not for -Oz.
 ; OS: rotating Loop at depth 1
 ; OZ-NOT: rotating Loop at depth 1

llvm/lib/Transforms/Scalar/LoopRotation.cpp

Created using spr 1.3.4

ilovepi · 2024-02-23T21:56:10Z

I've opted to move the flag into the pipeline builder, where several other loop related flags are used.

Created using spr 1.3.4

petrhosek

LGTM but you might also want to wait for @fhahn and @hiraditya to take a look.

hiraditya · 2024-02-23T23:08:24Z

Thanks for enabling this. LGTM!

fhahn

Could you add a PhaseOrdering test that shows the impact of the change? Possibly just with the reproducer from #50308

fhahn · 2024-02-26T12:03:31Z

llvm/lib/Passes/PassBuilderPipelines.cpp

-  LPM1.addPass(
-      LoopRotatePass(Level != OptimizationLevel::Oz, isLTOPreLink(Phase)));
+  LPM1.addPass(LoopRotatePass(EnableLoopHeaderDuplication ||
+                                  (Level != OptimizationLevel::Oz),


nit: redundant ()

fhahn · 2024-02-26T12:07:47Z

llvm/lib/Passes/PassBuilderPipelines.cpp

                                       cl::Hidden,
                                       cl::desc("Enable the LoopFlatten Pass"));

+// Experimentally allow loop header duplication. This should allow for better


FWIW I don't think allowing loop-rotation for all loops to simplify some to memsets is the best way forward here, as I don't see any path towards having this enabled by default.

If this ends up being profitable, we should drop this flag and
// making a code gen option that can be controlled independent of the opt level
// and exposed through clang.

It is not clear to me what profitable means here?

I think the new wording should be less ambiguous. Happy to change it if you have a preference, though.

Created using spr 1.3.4

ilovepi · 2024-02-26T17:59:55Z

Could you add a PhaseOrdering test that shows the impact of the change? Possibly just with the reproducer from #50308

@fhahn I've added that test case. I used udate_test_checks.py, which seems to be the norm in this directory, but TBH it seems like some of the cases are a bit brittle, as its checking for many attributes that aren't relevant. It could probably just be a check for memset or not. Is there a preference in these tests for a hand maintained test vs auto-generated ones?

Many profitable optimizations cannot be performed at -Oz, due to unrotated loops. While this is worse for size (minimally), many of the optimizations significantly reduce code size, such as memcpy optimizations and other patterns found by loop idiom recognition. Related discussion can be found in issue llvm#50308. This patch adds an experimental, backend-only flag to allow loop header duplication, regardless of the optimization level. Downstream consumers can experiment with this flag, and if it is useful for many targets, we can adjust the compiler's defaults accordingly, and expose any useful frontend flags to opt into the new behavior. Pull Request: llvm#82828

ilovepi · 2024-02-27T19:08:10Z

@fhahn If there's no objections, I plan to land this in the afternoon. Are there any more changes you'd like to see?

fhahn · 2024-03-01T11:40:27Z

@fhahn If there's no objections, I plan to land this in the afternoon. Are there any more changes you'd like to see?

FWIW I still think this would be better fixed by improve LoopIdiomRecognize to support non-rotated loops. This would benefit all users by default.

With the flag, it's not clear to me if we expect any feedback from downstream users and what we would do with the feedback.

hiraditya · 2024-03-02T04:45:17Z

loop rotation enables other optimizations like DCE and LICM and in many cases it reduces the code size. We could potentially have a cost model where we can disable loop rotation if the cloned-blocks are greater than certain size.

However, modifying loop-idiom to detect unrotated loops will result in a lot of duplicated work and only benefit the loops that are recognized in llvm (which aren't that many). Also unrotated loops come in many forms (for, while, gotos, inlined recursive calls, etc.)

[𝘀𝗽𝗿] initial version

b34b5e6

Created using spr 1.3.4

llvmbot added the llvm:transforms label Feb 23, 2024

ilovepi requested review from fhahn, hiraditya and petrhosek February 23, 2024 20:48

petrhosek reviewed Feb 23, 2024

View reviewed changes

llvm/lib/Transforms/Scalar/LoopRotation.cpp Outdated Show resolved Hide resolved

Move flag to PassPipelineBuilder and rename Flag

b65bd96

Created using spr 1.3.4

Rename flag

23a2789

Created using spr 1.3.4

petrhosek approved these changes Feb 23, 2024

View reviewed changes

hiraditya approved these changes Feb 23, 2024

View reviewed changes

fhahn reviewed Feb 26, 2024

View reviewed changes

Add phase ordering test and address nits

312592a

Created using spr 1.3.4

ilovepi merged commit 2fef685 into main Feb 29, 2024

ilovepi deleted the users/ilovepi/spr/llvmloop-rotate-allow-forcing-loop-rotation branch February 29, 2024 21:46

[llvm][loop-rotate] Allow forcing loop-rotation #82828

[llvm][loop-rotate] Allow forcing loop-rotation #82828

Uh oh!

Conversation

ilovepi commented Feb 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Feb 23, 2024

Uh oh!

Uh oh!

ilovepi commented Feb 23, 2024

Uh oh!

petrhosek left a comment

Choose a reason for hiding this comment

Uh oh!

hiraditya commented Feb 23, 2024

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

fhahn Feb 26, 2024

Choose a reason for hiding this comment

Uh oh!

ilovepi Feb 26, 2024

Choose a reason for hiding this comment

Uh oh!

fhahn Feb 26, 2024

Choose a reason for hiding this comment

Uh oh!

ilovepi Feb 26, 2024

Choose a reason for hiding this comment

Uh oh!

ilovepi commented Feb 26, 2024

Uh oh!

ilovepi commented Feb 27, 2024

Uh oh!

fhahn commented Mar 1, 2024

Uh oh!

hiraditya commented Mar 2, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

ilovepi commented Feb 23, 2024 •

edited

Loading