Commit 5c7fbce
committed
[LoopUnroll] Clamp PartialThreshold for large LoopMicroOpBufferSize
The znver3/znver4 scheduler modules are outliers, specifying very
large LoopMicroOpBufferSizes at 512, while typical values for
other subtargets are on the order of ~50. Even if this information
is micro-architecturally correct (*), this does not mean that we
want to runtime unroll all loops to a size that completely fills
the loop buffer. Unless this is the single hot loop in the entire
application, the massive code size increase will bust the micro-op
and instruction caches.
Protect against this by clamping to the default PartialThreshold
of 150, which is the same as the default full-unroll threshold
and half the aggressive full-unroll threshold. Allowing more
partial unrolling than full unrolling is certainly non-sensical.
(*) I strongly doubt that this is actually correct -- I believe
this may derive from an incorrect reading of Agner Fog's
micro-architecture guide. The number 4096 that was originally
used here is the size of the general micro-op cache, not that of
a loop buffer. A separate loop buffer is not listed for the Zen
microarchitecture. Comparing this to the listing for Skylake, it
has a 1536 micro-op buffer, but only a 64 micro-op loopback buffer,
with a note that it's rarely fully utilized. Our scheduling model
specifies LoopMicroOpBufferSize of 50 in that case.1 parent 37f6ba4 commit 5c7fbce
File tree
2 files changed
+30
-742
lines changed- llvm
- include/llvm/CodeGen
- test/Transforms/LoopUnroll/X86
2 files changed
+30
-742
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
610 | 610 | | |
611 | 611 | | |
612 | 612 | | |
613 | | - | |
| 613 | + | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
| 619 | + | |
614 | 620 | | |
615 | 621 | | |
616 | 622 | | |
| |||
0 commit comments