Simplify alignment block marking #62940
Tagging subscribers to this area: @JulieLeeMSFT
|
@kunalspathak @dotnet/jit-contrib PTAL |
Thanks for the cleanup. Could you share the |
I still like this change, although it's clear the loop table is not valid this late in compilation. I've been adding more loop table checking, trying to figure out how late it is valid, or can be made valid. The new I note that |
Currently, loops that are candidates for alignment are determined immediately after loop recognition. The top blocks of these loops are marked by setting the `BBF_LOOP_ALIGN` flag. Later, just before codegen, the `placeLoopAlignInstructions` phase is called to mark the blocks where the actual align instructions will be placed. Between these two phases, the `BBF_LOOP_ALIGN` flag needs to be maintained through various optimization phases.
We can simplify this by deferring marking the loops to align until we call the `placeLoopAlignInstructions` phase. Then, we don't need to worry about maintaining the `BBF_LOOP_ALIGN` flag. The loop table is still valid, so it can be used. (Note that if we ever want to "kill" the loop table earlier in compilation, we could simply move the `BBF_LOOP_ALIGN` marking to that point, and still avoid all the flag maintenance.) This PR makes that change.
To avoid too many diffs, `DEFAULT_ALIGN_LOOP_MIN_BLOCK_WEIGHT` is reduced from 4 to 3. This is because many loops (without profile data) end up with a weight of 4, and if those loops are cloned, the weight of the hot path gets reduced to 3.96 (so the cold path gets at least some non-zero weight). We still want these hot path cloned loops to be aligned. However, decreasing this does cause some additional ASP.NET loops with profile data and weights between 3 and 4 to be aligned.
Additionally, I added a trivial phase `optClearLoopIterInfo` to clear out all the loop table info related to `LPFLG_ITER`. This data is used by loop cloning and unrolling, but not afterwards. I still want to be able to dump the loop table without hitting asserts about improperly formed `LPFLG_ITER` data, and I don't want downstream phases to take a dependency on this data, so clearing it out enables that.
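For illustration only, a clearing pass of the kind described for `optClearLoopIterInfo` might look roughly like the sketch below. This is not the code from the PR: the `LoopIterEntry` struct, its `iterVar` and `iterLimit` fields, the `LPFLG_OTHER` flag, the bit values, and the function name `clearLoopIterInfo` are all hypothetical stand-ins; only the `LPFLG_ITER` name comes from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative bit values; the real JIT flag encoding is not shown in this PR text.
constexpr std::uint32_t LPFLG_ITER  = 1u << 0; // loop has recognized iterator info
constexpr std::uint32_t LPFLG_OTHER = 1u << 1; // hypothetical unrelated flag

// Hypothetical, simplified loop-table entry.
struct LoopIterEntry
{
    std::uint32_t flags;
    int           iterVar;   // hypothetical: iterator local number, -1 if none
    int           iterLimit; // hypothetical: iteration limit
};

// Drop the iterator flag and reset the associated fields on every entry,
// leaving unrelated flags untouched.
void clearLoopIterInfo(std::vector<LoopIterEntry>& loopTable)
{
    for (LoopIterEntry& loop : loopTable)
    {
        loop.flags &= ~LPFLG_ITER;
        loop.iterVar   = -1;
        loop.iterLimit = 0;
        assert((loop.flags & LPFLG_ITER) == 0);
    }
}
```

The point of the sketch is that, once cloning and unrolling are done, a later phase or a loop table dump has no stale iterator data left to trip over.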
There are a few diffs where we align more loops even without profile data because the weight of loop top blocks increases due to various flow optimizations (like block compaction), and now we see the larger weight before making the alignment decision.
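As a rough sketch of the deferred marking and of why the threshold matters, consider the simplified model below. It is an assumption-laden illustration, not the JIT's implementation: `Block`, `LoopEntry`, `markLoopsToAlign`, the flag's bit value, and the use of `>=` for the comparison are all hypothetical; only `BBF_LOOP_ALIGN`, `DEFAULT_ALIGN_LOOP_MIN_BLOCK_WEIGHT`, and the weights 4, 3, and 3.96 come from the description.

```cpp
#include <cassert>
#include <vector>

constexpr unsigned BBF_LOOP_ALIGN = 1u << 0; // bit value is illustrative

// Threshold after this PR (previously 4).
constexpr double DEFAULT_ALIGN_LOOP_MIN_BLOCK_WEIGHT = 3.0;

// Hypothetical, simplified stand-ins for a basic block and a loop table entry.
struct Block
{
    double   weight;
    unsigned flags;
};

struct LoopEntry
{
    Block* top;     // top block of the loop
    bool   removed; // hypothetical: entry no longer describes a live loop
};

// Walk the loop table once, late in compilation, and flag the top block of
// every loop whose weight meets the threshold. Returns how many loops were marked.
// Assumption: the comparison is inclusive (>=); the real check may differ.
int markLoopsToAlign(std::vector<LoopEntry>& loopTable,
                     double minWeight = DEFAULT_ALIGN_LOOP_MIN_BLOCK_WEIGHT)
{
    assert(minWeight >= 0.0);
    int marked = 0;
    for (LoopEntry& loop : loopTable)
    {
        if (loop.removed || loop.top == nullptr)
        {
            continue;
        }
        if (loop.top->weight >= minWeight)
        {
            loop.top->flags |= BBF_LOOP_ALIGN;
            ++marked;
        }
    }
    return marked;
}
```

In this model, a cloned hot path with weight 3.96 is skipped when `minWeight` is 4 and marked when it is 3, which is the case the threshold change is meant to preserve. Because the decision runs after flow optimizations, it also sees whatever weight the top block has at that point rather than the weight it had right after loop recognition.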
self ping |
Closing this for now |