Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Refactor Compiler::optRedirectBlock; remove BasicBlock::CopyTarget #98526

Merged
merged 2 commits into from
Feb 16, 2024

Conversation

amanasifkhalid
Copy link
Member

Part of #93020. When cloning a set of basic blocks (such as during loop cloning, finally cloning, etc.), the successors of the copied blocks may be different from the successors of their original counterparts, due to the successors being copied as well. We currently handle this as so:

  • Create the block copies without initializing their successors.
  • Copy the target(s) of the original block over with BasicBlock::CopyTarget; don't create any edges yet, as the new block's target(s) may change.
  • In Compiler::optRedirectBlock, determine if the new block's target needs to be redirected to the copy of the original target, using a mapping of original blocks to their copies. We may or may not also initialize/update the successor edges for the new block.

This PR simplifies this process in preparation for updating BasicBlock to track its successors via FlowEdge pointers, instead of BasicBlock pointers. This is the new process:

  • Create the block copies without initializing their successors.
  • In Compiler::optRedirectBlock, for each successor of the original block, determine if the new block should use the successor, or if the new block should be redirected to a copy of that successor, using the block map. Either way, all of the new block's successor edges must be initialized here.

@AndyAyersMS I finished refactoring BasicBlock to use flow edges to access its successors locally. That change is quite large, so I'm trying to move backwards and open PRs for the more modular bits of the refactor, starting with optRedirectBlock.

@ghost ghost assigned amanasifkhalid Feb 15, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 15, 2024
@ghost
Copy link

ghost commented Feb 15, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Part of #93020. When cloning a set of basic blocks (such as during loop cloning, finally cloning, etc.), the successors of the copied blocks may be different from the successors of their original counterparts, due to the successors being copied as well. We currently handle this as so:

  • Create the block copies without initializing their successors.
  • Copy the target(s) of the original block over with BasicBlock::CopyTarget; don't create any edges yet, as the new block's target(s) may change.
  • In Compiler::optRedirectBlock, determine if the new block's target needs to be redirected to the copy of the original target, using a mapping of original blocks to their copies. We may or may not also initialize/update the successor edges for the new block.

This PR simplifies this process in preparation for updating BasicBlock to track its successors via FlowEdge pointers, instead of BasicBlock pointers. This is the new process:

  • Create the block copies without initializing their successors.
  • In Compiler::optRedirectBlock, for each successor of the original block, determine if the new block should use the successor, or if the new block should be redirected to a copy of that successor, using the block map. Either way, all of the new block's successor edges must be initialized here.

@AndyAyersMS I finished refactoring BasicBlock to use flow edges to access its successors locally. That change is quite large, so I'm trying to move backwards and open PRs for the more modular bits of the refactor, starting with optRedirectBlock.

Author: amanasifkhalid
Assignees: amanasifkhalid
Labels:

area-CodeGen-coreclr

Milestone: -


case BBJ_EHCATCHRET:
case BBJ_EHFILTERRET:
// These block types should not need redirecting
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, the previous optRedirectBlock implementation did nothing for these types.

@ryujit-bot
Copy link

Diff results for #98526

Assembly diffs

Assembly diffs for windows/arm64 ran on linux/x64

Diffs are based on 1,454,211 contexts (555,875 MinOpts, 898,336 FullOpts).

Overall (+0 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.arm64.Release.mch 322,127,180 +0
FullOpts (+0 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.arm64.Release.mch 117,771,784 +0

Details here


Throughput diffs

Throughput diffs for osx/arm64 ran on linux/x64

Overall (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch -0.02%
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
benchmarks.run_tiered.osx.arm64.checked.mch -0.01%
libraries.crossgen2.osx.arm64.checked.mch -0.01%
realworld.run.osx.arm64.checked.mch -0.01%
MinOpts (-0.00% to +0.01%)
Collection PDIFF
realworld.run.osx.arm64.checked.mch +0.01%
FullOpts (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch -0.02%
benchmarks.run_pgo.osx.arm64.checked.mch -0.02%
benchmarks.run_tiered.osx.arm64.checked.mch -0.02%
libraries.crossgen2.osx.arm64.checked.mch -0.01%
realworld.run.osx.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on linux/x64

Overall (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch -0.02%
benchmarks.run_pgo.windows.arm64.checked.mch -0.01%
benchmarks.run_tiered.windows.arm64.checked.mch -0.01%
libraries.crossgen2.windows.arm64.checked.mch -0.01%
libraries.pmi.windows.arm64.checked.mch -0.02%
libraries_tests.run.windows.arm64.Release.mch -0.01%
realworld.run.windows.arm64.checked.mch -0.01%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.02%
MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%
realworld.run.windows.arm64.checked.mch +0.01%
FullOpts (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch -0.02%
benchmarks.run_pgo.windows.arm64.checked.mch -0.01%
benchmarks.run_tiered.windows.arm64.checked.mch -0.02%
libraries.crossgen2.windows.arm64.checked.mch -0.01%
libraries.pmi.windows.arm64.checked.mch -0.02%
libraries_tests.run.windows.arm64.Release.mch -0.01%
realworld.run.windows.arm64.checked.mch -0.01%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.02%

Throughput diffs for windows/x64 ran on linux/x64

Overall (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.windows.x64.checked.mch -0.02%
benchmarks.run_pgo.windows.x64.checked.mch -0.01%
benchmarks.run_tiered.windows.x64.checked.mch -0.01%
libraries.crossgen2.windows.x64.checked.mch -0.02%
realworld.run.windows.x64.checked.mch -0.01%
smoke_tests.nativeaot.windows.x64.checked.mch -0.02%
FullOpts (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.windows.x64.checked.mch -0.02%
benchmarks.run_pgo.windows.x64.checked.mch -0.02%
benchmarks.run_tiered.windows.x64.checked.mch -0.02%
libraries.crossgen2.windows.x64.checked.mch -0.02%
realworld.run.windows.x64.checked.mch -0.01%
smoke_tests.nativeaot.windows.x64.checked.mch -0.02%

Details here


@ryujit-bot
Copy link

Diff results for #98526

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,533,745 contexts (1,007,248 MinOpts, 1,526,497 FullOpts).

MISSED contexts: 3 (0.00%)

Overall (-4 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.arm64.checked.mch 518,962,008 -4
libraries_tests.run.linux.arm64.Release.mch 381,002,252 +0
FullOpts (-4 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.arm64.checked.mch 167,095,208 -4
libraries_tests.run.linux.arm64.Release.mch 166,087,844 +0

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,573,319 contexts (1,008,940 MinOpts, 1,564,379 FullOpts).

Overall (+49 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.x64.checked.mch 426,828,729 +36
libraries_tests.run.linux.x64.Release.mch 333,393,493 +13
FullOpts (+49 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.x64.checked.mch 133,190,581 +36
libraries_tests.run.linux.x64.Release.mch 150,726,678 +13

Details here


Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,224,699 contexts (831,156 MinOpts, 1,393,543 FullOpts).

MISSED contexts: 73,372 (3.19%)

Overall (+2 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.linux.arm.Release.mch 237,803,444 +2
FullOpts (+2 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.linux.arm.Release.mch 116,399,952 +2

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch -0.01%
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
benchmarks.run_tiered.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch -0.01%
libraries.crossgen2.linux.arm64.checked.mch -0.01%
libraries.pmi.linux.arm64.checked.mch -0.02%
libraries_tests.run.linux.arm64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.01%
realworld.run.linux.arm64.checked.mch -0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.02%
MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%
FullOpts (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch -0.01%
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
benchmarks.run_tiered.linux.arm64.checked.mch -0.02%
coreclr_tests.run.linux.arm64.checked.mch -0.01%
libraries.crossgen2.linux.arm64.checked.mch -0.01%
libraries.pmi.linux.arm64.checked.mch -0.02%
libraries_tests.run.linux.arm64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.01%
realworld.run.linux.arm64.checked.mch -0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.02%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch -0.02%
benchmarks.run_pgo.linux.x64.checked.mch -0.01%
benchmarks.run_tiered.linux.x64.checked.mch -0.01%
coreclr_tests.run.linux.x64.checked.mch -0.01%
libraries.crossgen2.linux.x64.checked.mch -0.02%
libraries.pmi.linux.x64.checked.mch -0.02%
libraries_tests.run.linux.x64.Release.mch -0.01%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch -0.01%
realworld.run.linux.x64.checked.mch -0.01%
smoke_tests.nativeaot.linux.x64.checked.mch -0.02%
FullOpts (-0.02% to -0.01%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch -0.02%
benchmarks.run_pgo.linux.x64.checked.mch -0.02%
benchmarks.run_tiered.linux.x64.checked.mch -0.02%
coreclr_tests.run.linux.x64.checked.mch -0.01%
libraries.crossgen2.linux.x64.checked.mch -0.02%
libraries.pmi.linux.x64.checked.mch -0.02%
libraries_tests.run.linux.x64.Release.mch -0.02%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch -0.01%
realworld.run.linux.x64.checked.mch -0.01%
smoke_tests.nativeaot.linux.x64.checked.mch -0.02%

Details here


@amanasifkhalid
Copy link
Member Author

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. The diffs are from slight changes in edge weights/likelihoods.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never liked RedirectBlockOption (even though it was my creation) so nice to see it go away.

You can defer the likelihood updates if that works out better for you.

}

fgAddRefPred(trueTarget, newBlk);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be passing in the "inspiring edge" here (the one from original source to original target)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was planning on doing that once we have easy access to successor edges -- something like fgAddRefPred(trueTarget, newBlk, blk->GetTrueEdge()). In the meantime, I can add calls to fgGetPredForBlock in to get the inspiring edge, since the TP impact will be temporary, or I can include this in my upcoming PRs for adding successor edge members to BasicBlock -- do you have a preference?

}

fgAddRefPred(*jumpPtr, newBlk);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here, though it can be tricker for edges that have dup counts.

Comment on lines 6842 to +6844
void optRedirectBlock(BasicBlock* blk,
BlockToBlockMap* redirectMap,
const RedirectBlockOption = RedirectBlockOption::DoNotChangePredLists);
BasicBlock* newBlk,
BlockToBlockMap* redirectMap);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the new function that well -- it requires me to have a block around that the previous function did not require me to have around, but what if I don't have that?

This function doesn't seem to be doing redirection anymore, it rather seems to be initializing a new block based on an old block and a map. Which is fine if that's what we generally need, but the old function seemed much more general than this -- basically a generalization of fgReplaceJumpTarget to replace multiple targets at once. How would we implement that functionality with the new scheme? It is conceivable to me that we are going to need it again.

Copy link
Member Author

@amanasifkhalid amanasifkhalid Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it requires me to have a block around that the previous function did not require me to have around, but what if I don't have that?

For the current callers of optRedirectBlock, newBlk is obtained from redirectMap -- for each copied block blk, we expect there to be a blk -> newBlk mapping. I could refactor optRedirectBlock slightly so that it only takes blk, and retrieves newBlk from redirectMap. We'd always expect this mapping to exist, right? Can you foresee a situation where we wouldn't have that?

This function doesn't seem to be doing redirection anymore, it rather seems to be initializing a new block based on an old block and a map. Which is fine if that's what we generally need, but the old function seemed much more general than this

In my opinion, the old version was trying to do a few too many things, making the semantics of its callers a bit awkward. Previously, callers would have to copy the old block's target(s) to the new block with BasicBlock::CopyTarget without setting up any pred edges, and then tell optRedirectBlock to add pred edges, while deciding what the real target should be. I don't think this approach will hold up well once we've replaced block targets with successor edges: Once we've done that, it won't be possible for a block to have a jump target without an edge to it, which is a state we'd previously establish before calling optRedirectBlock.

How would we implement that functionality with the new scheme? It is conceivable to me that we are going to need it again.

This would introduce some code duplication, but maybe we could have a version of optRedirectBlock that assumes newBlk's successors are initialized, and the current version (which we can rename to imply it is initializing successors, if you'd prefer) can continue to assume successors aren't initialized?

My next few PRs will be touching this method, so I'm happy to add any follow-up changes you recommend to those.

Copy link
Member

@jakobbotsch jakobbotsch Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the current callers of optRedirectBlock, newBlk is obtained from redirectMap -- for each copied block blk, we expect there to be a blk -> newBlk mapping. I could refactor optRedirectBlock slightly so that it only takes blk, and retrieves newBlk from redirectMap. We'd always expect this mapping to exist, right? Can you foresee a situation where we wouldn't have that?

I would not expect this mapping to always exist. As I mentioned above, I think optRedirectBlock should be viewed as a generalization of fgReplaceJumpTarget that allows replacing multiple targets in one operation. I.e. it was an operation similar to

for ((BasicBlock* key, BasicBlock* target) in blockMap) { fgReplaceJumpTarget(block, key, target); }

I agree the overloaded handling around preds before wasn't very pretty.

I think with the refactoring done in this PR the function should be renamed to something that indicates that it operates on a partially initialized block and is some form of initialization -- it no longer matches the above semantics. Maybe something like optInitializeDuplicatedBlockTargets?

I actually had local changes where I generalized optRedirectBlock to take a functor instead of a BlockToBlockMap, making it more easy to use for these kinds of more general block redirections. With that I think fgReplaceJumpTarget could be implemented in terms of optRedirectBlock (with a functor like [=](BasicBlock* target) { if (target == oldBlock) { return newBlock; } return target; }).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the refactoring done in this PR the function should be renamed to something that indicates that it operates on a partially initialized block and is some form of initialization -- it no longer matches the above semantics. Maybe something like optInitializeDuplicatedBlockTargets?

Sure thing, I can include that change in my next PR.

I actually had local changes where I generalized optRedirectBlock to take a functor instead of a BlockToBlockMap, making it more easy to use for these kinds of more general block redirections.

I like that idea. Are you ok with me wrapping up the successor edge refactor first, and then coming back to this? That should only take a few more PRs; I have the code ready locally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure -- also, I'm ok with leaving the function out entirely and leaving it as it is right now (with the rename). We can add the more general version when we find a need for it.

@amanasifkhalid
Copy link
Member Author

For my next few PRs, I plan to incrementally convert each block kind to use successor edges instead of block targets. With each update to optRedirectBlock, I'll add the inspiringEdge logic in.

@amanasifkhalid amanasifkhalid merged commit 9394d2e into dotnet:main Feb 16, 2024
129 checks passed
@amanasifkhalid amanasifkhalid deleted the optRedirectBlock branch February 16, 2024 16:39
@github-actions github-actions bot locked and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants