Skip to content

Conversation

@ilovepi
Copy link
Contributor

@ilovepi ilovepi commented Feb 23, 2024

Many profitable optimizations cannot be performed at -Oz, due to
unrotated loops. While this is worse for size (minimally), many of the
optimizations significantly reduce code size, such as memcpy
optimizations and other patterns found by loop idiom recognition.
Related discussion can be found in issue #50308.

This patch adds an experimental, backend-only flag to allow loop header
duplication, regardless of the optimization level. Downstream consumers
can experiment with this flag, and if it is profitable, we can adjust
the compiler's defaults accordingly, and expose any useful frontend
flags to opt into the new behavior.

Created using spr 1.3.4
@llvmbot
Copy link
Member

llvmbot commented Feb 23, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Paul Kirth (ilovepi)

Changes

Many profitable optimizations cannot be performed at -Oz, due to
unrotated loops. While this is worse for size (minimally), many of the
optimizations significantly reduce code size, such as memcpy
optimizations and other patters found by loop idiom recognition.

This patch adds an experimental, backend-only flag to allow loop header
duplication, regardless of the optimization level. Downstream consumers
can experiment with this flag, and if it is profitable, we can adjust
the compiler's defaults accordingly, and expose any useful frontend
flags to opt into the new behavior.

Bug: 50308


Full diff: https://github.com/llvm/llvm-project/pull/82828.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopRotation.cpp (+13-1)
  • (modified) llvm/test/Transforms/LoopRotate/oz-disable.ll (+3)
diff --git a/llvm/lib/Transforms/Scalar/LoopRotation.cpp b/llvm/lib/Transforms/Scalar/LoopRotation.cpp
index 7036759a4eed57..c6943695ec1549 100644
--- a/llvm/lib/Transforms/Scalar/LoopRotation.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopRotation.cpp
@@ -39,8 +39,20 @@ static cl::opt<bool> PrepareForLTOOption(
     cl::desc("Run loop-rotation in the prepare-for-lto stage. This option "
              "should be used for testing only."));
 
+// Experimentally allow loop header duplication. This should allow for better
+// optimization at Oz, since loop-idiom recognition can then recognize things
+// like memcpy. If this ends up being profitable, we should drop this flag and
+// making a code gen option that can be controled independent of the opt level
+// and exposed through clang. See
+// https://github.com/llvm/llvm-project/issues/50308 for details.
+static cl::opt<bool>
+    ForceHeaderDuplication("force-loop-header-duplication", cl::init(false),
+                           cl::Hidden,
+                           cl::desc("Always enable loop header duplication"));
+
 LoopRotatePass::LoopRotatePass(bool EnableHeaderDuplication, bool PrepareForLTO)
-    : EnableHeaderDuplication(EnableHeaderDuplication),
+    : EnableHeaderDuplication(EnableHeaderDuplication ||
+                              ForceHeaderDuplication),
       PrepareForLTO(PrepareForLTO) {}
 
 void LoopRotatePass::printPipeline(
diff --git a/llvm/test/Transforms/LoopRotate/oz-disable.ll b/llvm/test/Transforms/LoopRotate/oz-disable.ll
index 6a7847ac0ff215..b933200ed80586 100644
--- a/llvm/test/Transforms/LoopRotate/oz-disable.ll
+++ b/llvm/test/Transforms/LoopRotate/oz-disable.ll
@@ -4,6 +4,9 @@
 ; RUN: opt < %s -S -passes='default<Os>' -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OS
 ; RUN: opt < %s -S -passes='default<Oz>' -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OZ
 
+;; Make sure -force-loop-header-duplication overrides the default behavior at Oz
+; RUN: opt < %s -S -passes='default<Oz>' -force-loop-header-duplication -debug -debug-only=loop-rotate 2>&1 | FileCheck %s -check-prefix=OS
+
 ; Loop should be rotated for -Os but not for -Oz.
 ; OS: rotating Loop at depth 1
 ; OZ-NOT: rotating Loop at depth 1

@ilovepi
Copy link
Contributor Author

ilovepi commented Feb 23, 2024

I've opted to move the flag into the pipeline builder, where several other loop related flags are used.

Created using spr 1.3.4
Copy link
Member

@petrhosek petrhosek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but you might also want to wait for @fhahn and @hiraditya to take a look.

@hiraditya
Copy link
Collaborator

Thanks for enabling this. LGTM!

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a PhaseOrdering test that shows the impact of the change? Possibly just with the reproducer from #50308

LPM1.addPass(
LoopRotatePass(Level != OptimizationLevel::Oz, isLTOPreLink(Phase)));
LPM1.addPass(LoopRotatePass(EnableLoopHeaderDuplication ||
(Level != OptimizationLevel::Oz),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: redundant ()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

cl::Hidden,
cl::desc("Enable the LoopFlatten Pass"));

// Experimentally allow loop header duplication. This should allow for better
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I don't think allowing loop-rotation for all loops to simplify some to memsets is the best way forward here, as I don't see any path towards having this enabled by default.

If this ends up being profitable, we should drop this flag and
// making a code gen option that can be controlled independent of the opt level
// and exposed through clang.

It is not clear to me what profitable means here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new wording should be less ambiguous. Happy to change it if you have a preference, though.

@ilovepi
Copy link
Contributor Author

ilovepi commented Feb 26, 2024

Could you add a PhaseOrdering test that shows the impact of the change? Possibly just with the reproducer from #50308

@fhahn I've added that test case. I used udate_test_checks.py, which seems to be the norm in this directory, but TBH it seems like some of the cases are a bit brittle, as its checking for many attributes that aren't relevant. It could probably just be a check for memset or not. Is there a preference in these tests for a hand maintained test vs auto-generated ones?

ilovepi added a commit to ilovepi/llvm-project that referenced this pull request Feb 26, 2024
Many profitable optimizations cannot be performed at -Oz, due to
unrotated loops. While this is worse for size (minimally), many of the
optimizations significantly reduce code size, such as memcpy
optimizations and other patterns found by loop idiom recognition.
Related discussion can be found in issue llvm#50308.

This patch adds an experimental, backend-only flag to allow loop header
duplication, regardless of the optimization level. Downstream consumers
can experiment with this flag, and if it is useful for many targets, we
can adjust the compiler's defaults accordingly, and expose any useful
frontend flags to opt into the new behavior.

Pull Request: llvm#82828
@ilovepi
Copy link
Contributor Author

ilovepi commented Feb 27, 2024

@fhahn If there's no objections, I plan to land this in the afternoon. Are there any more changes you'd like to see?

@ilovepi ilovepi merged commit 2fef685 into main Feb 29, 2024
@ilovepi ilovepi deleted the users/ilovepi/spr/llvmloop-rotate-allow-forcing-loop-rotation branch February 29, 2024 21:46
@fhahn
Copy link
Contributor

fhahn commented Mar 1, 2024

@fhahn If there's no objections, I plan to land this in the afternoon. Are there any more changes you'd like to see?

FWIW I still think this would be better fixed by improve LoopIdiomRecognize to support non-rotated loops. This would benefit all users by default.

With the flag, it's not clear to me if we expect any feedback from downstream users and what we would do with the feedback.

@hiraditya
Copy link
Collaborator

loop rotation enables other optimizations like DCE and LICM and in many cases it reduces the code size. We could potentially have a cost model where we can disable loop rotation if the cloned-blocks are greater than certain size.

However, modifying loop-idiom to detect unrotated loops will result in a lot of duplicated work and only benefit the loops that are recognized in llvm (which aren't that many). Also unrotated loops come in many forms (for, while, gotos, inlined recursive calls, etc.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants