JIT: Don't do aggressive block compaction too early #105041
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Here's another interesting consequence: Aggressive compaction would sometimes move a loop inside a try region up, such that the loop header is now the first block in the region. Loop cloning gives up if it finds the start of a try region in the loop, so with this change, I'm now seeing more loop cloning.
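To make the interaction above concrete, here is a minimal toy sketch in Python. It is not the JIT's actual code: the Block type, the is_try_start flag, and can_clone_loop are hypothetical names invented for illustration. The only behavior it models is the one described in the comment, namely that cloning is rejected when any block in the loop begins a try region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a basic block; not a real JIT type."""
    num: int
    is_try_start: bool = False  # True if this block is the first block of a try region


def can_clone_loop(loop_blocks):
    """Toy legality check: give up if the loop contains the start of a try region."""
    return not any(block.is_try_start for block in loop_blocks)


# Conservative compaction (assumed shape): the try region starts at BB02,
# which sits in front of the loop, and the loop is BB03..BB04.
conservative_loop = [Block(3), Block(4)]

# Aggressive compaction (assumed shape): the loop has moved up, so its
# header BB02 is now the first block of the try region.
aggressive_loop = [Block(2, is_try_start=True), Block(3)]

clonable_conservative = can_clone_loop(conservative_loop)
clonable_aggressive = can_clone_loop(aggressive_loop)
```

Under these assumed shapes, only the conservatively compacted loop passes the check, which matches the observation that deferring aggressive compaction exposes more loop cloning.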
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs have some pretty big code size increases. The |
I worry about this much churn this late in the cycle. Seems like it will give the usual mix of improvements and some regressions. I wonder if we could just make loop inversion (and perhaps loop cloning) smarter? Inversion is very pattern-matchy. Though perhaps that causes the same or worse level of churn.
I share your concerns about churn; for similar reasons, I'm nervous about tweaking loop inversion. The diffs for this change are large, but the problem it's trying to address is somewhat narrow: we compact blocks early on such that we get this weird loop shape that loop inversion can't fix, and the loop is now the beginning of a try region. I also think the diffs on this PR are misleadingly large. I ran SPMI on this change versus not doing aggressive compaction at all (i.e. without #103785), and this change seems to be reversing a lot of the churn I initially created: the diffs are much smaller. Here's the short summary:
Diffs are based on 2,696,572 contexts (1,091,355 MinOpts, 1,605,217 FullOpts).
MISSED contexts: base: 29,302 (1.07%), diff: 29,214 (1.07%)
Overall (-2,847 bytes)
FullOpts (-2,847 bytes)
@AndyAyersMS are you ok with including this in .NET 9, or should we hold off?
I am thinking we should hold off. But let me look at which benchmarks had regressions a bit more closely first.
Ok, some of them are pretty large, and the number of improvements from #103785 was fairly small. Can you repro some of the worst-case regressions locally, and see that they're now fixed? I built a collated table here: #103972 (comment)
Maybe also look at #50204? It might be an interesting alternative, if that's all that it takes to "fix" loop inversion.
Thanks for taking a look at the regressions! I'm not able to reproduce any improvement for the largest amd64 regressions locally, unfortunately. Looking at some of the JIT dumps, I see diffs from more conservative block compaction, but this isn't manifesting significant codegen diffs, which is odd...
Thanks for pointing that out. I've opened #105161 -- the asmdiffs looked promising locally. Maybe we can pursue that change for P7, and come back to our block compaction strategy (among other flow opts) in .NET 10.
Let's keep the new compaction on for now, and look into adjusting affected phases to handle the shapes they're struggling with.
Several of the regressions in #103972 stem from early aggressive block compaction pessimizing loop inversion. For example, consider the following block layout early in compilation for System.Linq.Tests.Perf_Enumerable.Aggregate_Seed. If we don't aggressively compact in fgUpdateFlowGraphPhase, we get the following layout right before loop inversion:
In the try region, there's an opportunity to aggressively compact, and get the following loop shape:
Loop inversion won't kick in for this, and since fgMoveHotJumps won't modify the first block of an EH region, we don't get the opportunity to flip this shape, so we end up with subpar loop layout.
If we don't aggressively compact, loop inversion kicks in, and we get the following shape:
After flow opts and block layout run, our final shape looks like this:
I don't think we are losing anything by reserving aggressive compaction for later flow opt phases, and the upside is more opportunities for loop inversion, which may unlock other loop opts.
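For readers less familiar with why the inverted shape is worth preserving, the following is a rough Python sketch of loop inversion in general. It does not model the JIT's implementation or the specific block layouts from this PR; run_top_tested and run_inverted are hypothetical helpers that count the branches a simple counting loop would execute in each shape.

```python
def run_top_tested(n):
    """Top-tested shape: while (i < n) { i++; }.

    Each iteration pays for the conditional test at the top plus an
    unconditional jump back to it.
    """
    i = 0
    branches = 0
    while True:
        branches += 1  # conditional test at the loop top
        if not (i < n):
            break
        i += 1         # loop body
        branches += 1  # unconditional jump back to the top
    return i, branches


def run_inverted(n):
    """Inverted shape: if (i < n) { do { i++; } while (i < n); }.

    The test is duplicated once as a zero-trip check; after that, each
    iteration pays only for the conditional back edge at the bottom.
    """
    i = 0
    branches = 1       # duplicated entry test (zero-trip check)
    if i < n:
        while True:
            i += 1     # loop body
            branches += 1  # conditional back edge at the loop bottom
            if not (i < n):
                break
    return i, branches


top_result = run_top_tested(3)
inverted_result = run_inverted(3)
```

In this toy model both shapes compute the same value, but the top-tested loop executes 2n + 1 branches while the inverted one executes n + 1, which is one reason a loop shape that inversion cannot handle tends to show up as a regression.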