Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify alignment block marking #62940

Closed

Conversation

BruceForstall
Copy link
Member

Currently, loops that are candidates for alignment are determined
immediately after loop recognition. The top block of these loops
are marked by setting the BBF_LOOP_ALIGN flag. Later, just before
codegen, the placeLoopAlignInstructions phase is called to mark
the blocks where the actual align instructions will be placed.
Between these two phases, the BBF_LOOP_ALIGN flag needs to be maintained
through various optimization phases.

We can simplify this by deferring marking the loops to align until
we call the placeLoopAlignInstructions phase. Then, we don't need
to worry about maintaining the BBF_LOOP_ALIGN flag. The loop table
is still valid, so can be used. (Note that if we ever want to "kill"
the loop table earlier in compilation, we could simply move the
BBF_LOOP_ALIGN marking to that point, and still avoid all the
flag maintenance.) This PR makes that change.

To avoid too many diffs, DEFAULT_ALIGN_LOOP_MIN_BLOCK_WEIGHT is
reduced from 4 to 3. This is because many loops (without profile data)
end up with a weight of 4, and if those loops are cloned, the weight
of the hot path gets reduced to 3.96 (so the cold path gets at least
some non-zero weight). We still want these hot path cloned loops to
be aligned. However, decreasing this does cause some more ASP.NET
loops with profile data and weights between 3 and 4 to be aligned.

Additionally, I added a trivial phase optClearLoopIterInfo to clear
out all the loop table info related to LPFLG_ITER. This data is used
by loop cloning and unrolling, but not afterwards. I still want to be
able to dump the loop table without hitting asserts about improper
LPFLG_ITER form data, and don't want downstream phases to take a dependency
on this data, so clearing it out enables that.

There are a few diffs where we align more loops even without profile
data because the weight of loop top blocks increases due to various
flow optimizations (like block compaction), and now we see the larger
weight before making the alignment decision.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 17, 2021
@ghost ghost assigned BruceForstall Dec 17, 2021
@ghost
Copy link

ghost commented Dec 17, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Currently, loops that are candidates for alignment are determined
immediately after loop recognition. The top block of these loops
are marked by setting the BBF_LOOP_ALIGN flag. Later, just before
codegen, the placeLoopAlignInstructions phase is called to mark
the blocks where the actual align instructions will be placed.
Between these two phases, the BBF_LOOP_ALIGN flag needs to be maintained
through various optimization phases.

We can simplify this by deferring marking the loops to align until
we call the placeLoopAlignInstructions phase. Then, we don't need
to worry about maintaining the BBF_LOOP_ALIGN flag. The loop table
is still valid, so can be used. (Note that if we ever want to "kill"
the loop table earlier in compilation, we could simply move the
BBF_LOOP_ALIGN marking to that point, and still avoid all the
flag maintenance.) This PR makes that change.

To avoid too many diffs, DEFAULT_ALIGN_LOOP_MIN_BLOCK_WEIGHT is
reduced from 4 to 3. This is because many loops (without profile data)
end up with a weight of 4, and if those loops are cloned, the weight
of the hot path gets reduced to 3.96 (so the cold path gets at least
some non-zero weight). We still want these hot path cloned loops to
be aligned. However, decreasing this does cause some more ASP.NET
loops with profile data and weights between 3 and 4 to be aligned.

Additionally, I added a trivial phase optClearLoopIterInfo to clear
out all the loop table info related to LPFLG_ITER. This data is used
by loop cloning and unrolling, but not afterwards. I still want to be
able to dump the loop table without hitting asserts about improper
LPFLG_ITER form data, and don't want downstream phases to take a dependency
on this data, so clearing it out enables that.

There are a few diffs where we align more loops even without profile
data because the weight of loop top blocks increases due to various
flow optimizations (like block compaction), and now we see the larger
weight before making the alignment decision.

Author: BruceForstall
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@BruceForstall
Copy link
Member Author

@kunalspathak @dotnet/jit-contrib PTAL

@kunalspathak
Copy link
Member

Thanks for the cleanup. Could you share the PerfScore diffs as well?

@BruceForstall
Copy link
Member Author

I still like this change, although it's clear the loop table is not valid this late in compilation. I've been adding more loop table checking, trying to figure out how late it is valid, or can be made valid. The new BBF_LOOP_ALIGN setting might need to move earlier based on that investigation.

I note that placeLoopAlignInstructions is already depending on bbNatLoopNum on blocks, and if the loop table is no longer valid, there's some question about how valid these are as well. It's probably the case they are "good enough" for their purpose, but it could also be the case that someone late in compilation inserts a new block in a loop but doesn't set the bbNatLoopNum, or set it correctly.

Currently, loops that are candidates for alignment are determined
immediately after loop recognition. The top block of these loops
are marked by setting the `BBF_LOOP_ALIGN` flag. Later, just before
codegen, the `placeLoopAlignInstructions` phase is called to mark
the blocks where the actual align instructions will be placed.
Between these two phases, the `BBF_LOOP_ALIGN` flag needs to be maintained
through various optimization phases.

We can simplify this by deferring marking the loops to align until
we call the `placeLoopAlignInstructions` phase. Then, we don't need
to worry about maintaining the `BBF_LOOP_ALIGN` flag. The loop table
is still valid, so can be used. (Note that if we ever want to "kill"
the loop table earlier in compilation, we could simply move the
`BBF_LOOP_ALIGN` marking to that point, and still avoid all the
flag maintenance.) This PR makes that change.

To avoid too many diffs, `DEFAULT_ALIGN_LOOP_MIN_BLOCK_WEIGHT` is
reduced from 4 to 3. This is because many loops (without profile data)
end up with a weight of 4, and if those loops are cloned, the weight
of the hot path gets reduced to 3.96 (so the cold path gets at least
some non-zero weight). We still want these hot path cloned loops to
be aligned. However, decreasing this does cause some more ASP.NET
loops with profile data and weights between 3 and 4 to be aligned.

Additionally, I added a trivial phase `optClearLoopIterInfo` to clear
out all the loop table info related to `LPFLG_ITER`. This data is used
by loop cloning and unrolling, but not afterwards. I still want to be
able to dump the loop table without hitting asserts about improper
`LPFLG_ITER` form data, and don't want downstream phases to take a dependency
on this data, so clearing it out enables that.

There are a few diffs where we align more loops even without profile
data because the weight of loop top blocks increases due to various
flow optimizations (like block compaction), and now we see the larger
weight before making the alignment decision.
@BruceForstall
Copy link
Member Author

self ping

@BruceForstall
Copy link
Member Author

Closing this for now

@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants