JIT: Refactor Compiler::optRedirectBlock; remove BasicBlock::CopyTarget #98526

amanasifkhalid · 2024-02-15T21:49:58Z

Part of #93020. When cloning a set of basic blocks (such as during loop cloning, finally cloning, etc.), the successors of the copied blocks may differ from the successors of their original counterparts, because the successors may have been copied as well. We currently handle this as follows:

1. Create the block copies without initializing their successors.
2. Copy the target(s) of the original block over with BasicBlock::CopyTarget; don't create any edges yet, as the new block's target(s) may change.
3. In Compiler::optRedirectBlock, determine if the new block's target needs to be redirected to the copy of the original target, using a mapping of original blocks to their copies. We may or may not also initialize/update the successor edges for the new block.

This PR simplifies this process in preparation for updating BasicBlock to track its successors via FlowEdge pointers, instead of BasicBlock pointers. This is the new process:

1. Create the block copies without initializing their successors.
2. In Compiler::optRedirectBlock, for each successor of the original block, determine if the new block should use the successor, or if the new block should be redirected to a copy of that successor, using the block map. Either way, all of the new block's successor edges must be initialized here.
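
The new process can be sketched with toy types. This is a minimal illustration, not the JIT's code: `Block`, `BlockMap`, `addEdge`, and `initClonedBlockSuccs` are hypothetical stand-ins for BasicBlock, BlockToBlockMap, fgAddRefPred, and the refactored optRedirectBlock, and a plain successor vector is assumed in place of per-kind jump targets.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy stand-ins for the JIT's types; none of these are the real definitions.
struct Block
{
    int                 id;
    std::vector<Block*> succs; // successor targets
    std::vector<Block*> preds; // predecessor list, one entry per edge
};

using BlockMap = std::unordered_map<Block*, Block*>; // original -> copy

// Hypothetical helper: record the edge pred -> target on both ends.
inline void addEdge(Block* pred, Block* target)
{
    pred->succs.push_back(target);
    target->preds.push_back(pred);
}

// Initialize every successor edge of newBlk from blk's successors,
// redirecting to the copy whenever the successor was itself cloned.
inline void initClonedBlockSuccs(Block* blk, Block* newBlk, const BlockMap& map)
{
    assert(newBlk->succs.empty()); // the copy starts with no successors
    for (Block* succ : blk->succs)
    {
        auto it = map.find(succ);
        addEdge(newBlk, (it != map.end()) ? it->second : succ);
    }
}
```

In this sketch the copy never holds a target without a matching pred entry, which is the invariant the description above is working toward.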

@AndyAyersMS I've finished refactoring BasicBlock locally so that it accesses its successors through flow edges. That change is quite large, so I'm working backwards and opening PRs for the more modular pieces of the refactor, starting with optRedirectBlock.

ghost · 2024-02-15T21:50:08Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Author:	amanasifkhalid
Assignees:	amanasifkhalid
Labels:	`area-CodeGen-coreclr`
Milestone:	-

amanasifkhalid · 2024-02-15T22:05:28Z

src/coreclr/jit/optimizer.cpp

+
+            case BBJ_EHCATCHRET:
+            case BBJ_EHFILTERRET:
+                // These block types should not need redirecting


For context, the previous optRedirectBlock implementation did nothing for these types.

ryujit-bot · 2024-02-15T23:59:01Z

Diff results for #98526

Assembly diffs

Assembly diffs for windows/arm64 ran on linux/x64

Diffs are based on 1,454,211 contexts (555,875 MinOpts, 898,336 FullOpts).

Overall (+0 bytes)

Collection	Base size (bytes)	Diff size (bytes)
libraries_tests.run.windows.arm64.Release.mch	322,127,180	+0

FullOpts (+0 bytes)

Collection	Base size (bytes)	Diff size (bytes)
libraries_tests.run.windows.arm64.Release.mch	117,771,784	+0

Throughput diffs

Throughput diffs for osx/arm64 ran on linux/x64

Overall (-0.02% to -0.01%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	-0.02%
benchmarks.run_pgo.osx.arm64.checked.mch	-0.01%
benchmarks.run_tiered.osx.arm64.checked.mch	-0.01%
libraries.crossgen2.osx.arm64.checked.mch	-0.01%
realworld.run.osx.arm64.checked.mch	-0.01%

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
realworld.run.osx.arm64.checked.mch	+0.01%

FullOpts (-0.02% to -0.01%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	-0.02%
benchmarks.run_pgo.osx.arm64.checked.mch	-0.02%
benchmarks.run_tiered.osx.arm64.checked.mch	-0.02%
libraries.crossgen2.osx.arm64.checked.mch	-0.01%
realworld.run.osx.arm64.checked.mch	-0.01%

Throughput diffs for windows/arm64 ran on linux/x64

Overall (-0.02% to -0.01%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	-0.02%
benchmarks.run_pgo.windows.arm64.checked.mch	-0.01%
benchmarks.run_tiered.windows.arm64.checked.mch	-0.01%
libraries.crossgen2.windows.arm64.checked.mch	-0.01%
libraries.pmi.windows.arm64.checked.mch	-0.02%
libraries_tests.run.windows.arm64.Release.mch	-0.01%
realworld.run.windows.arm64.checked.mch	-0.01%
smoke_tests.nativeaot.windows.arm64.checked.mch	-0.02%

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.windows.arm64.checked.mch	+0.01%
realworld.run.windows.arm64.checked.mch	+0.01%

FullOpts (-0.02% to -0.01%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	-0.02%
benchmarks.run_pgo.windows.arm64.checked.mch	-0.01%
benchmarks.run_tiered.windows.arm64.checked.mch	-0.02%
libraries.crossgen2.windows.arm64.checked.mch	-0.01%
libraries.pmi.windows.arm64.checked.mch	-0.02%
libraries_tests.run.windows.arm64.Release.mch	-0.01%
realworld.run.windows.arm64.checked.mch	-0.01%
smoke_tests.nativeaot.windows.arm64.checked.mch	-0.02%

Throughput diffs for windows/x64 ran on linux/x64

Overall (-0.02% to -0.01%)

Collection	PDIFF
benchmarks.run.windows.x64.checked.mch	-0.02%
benchmarks.run_pgo.windows.x64.checked.mch	-0.01%
benchmarks.run_tiered.windows.x64.checked.mch	-0.01%
libraries.crossgen2.windows.x64.checked.mch	-0.02%
realworld.run.windows.x64.checked.mch	-0.01%
smoke_tests.nativeaot.windows.x64.checked.mch	-0.02%

FullOpts (-0.02% to -0.01%)

Collection	PDIFF
benchmarks.run.windows.x64.checked.mch	-0.02%
benchmarks.run_pgo.windows.x64.checked.mch	-0.02%
benchmarks.run_tiered.windows.x64.checked.mch	-0.02%
libraries.crossgen2.windows.x64.checked.mch	-0.02%
realworld.run.windows.x64.checked.mch	-0.01%
smoke_tests.nativeaot.windows.x64.checked.mch	-0.02%

ryujit-bot · 2024-02-16T00:59:15Z

Diff results for #98526

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,533,745 contexts (1,007,248 MinOpts, 1,526,497 FullOpts).

MISSED contexts: 3 (0.00%)

Overall (-4 bytes)

Collection	Base size (bytes)	Diff size (bytes)
coreclr_tests.run.linux.arm64.checked.mch	518,962,008	-4
libraries_tests.run.linux.arm64.Release.mch	381,002,252	+0

FullOpts (-4 bytes)

Collection	Base size (bytes)	Diff size (bytes)
coreclr_tests.run.linux.arm64.checked.mch	167,095,208	-4
libraries_tests.run.linux.arm64.Release.mch	166,087,844	+0

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,573,319 contexts (1,008,940 MinOpts, 1,564,379 FullOpts).

Overall (+49 bytes)

Collection	Base size (bytes)	Diff size (bytes)
coreclr_tests.run.linux.x64.checked.mch	426,828,729	+36
libraries_tests.run.linux.x64.Release.mch	333,393,493	+13

FullOpts (+49 bytes)

Collection	Base size (bytes)	Diff size (bytes)
coreclr_tests.run.linux.x64.checked.mch	133,190,581	+36
libraries_tests.run.linux.x64.Release.mch	150,726,678	+13

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,224,699 contexts (831,156 MinOpts, 1,393,543 FullOpts).

MISSED contexts: 73,372 (3.19%)

Overall (+2 bytes)

Collection	Base size (bytes)	Diff size (bytes)
libraries_tests.run.linux.arm.Release.mch	237,803,444	+2

FullOpts (+2 bytes)

Collection	Base size (bytes)	Diff size (bytes)
libraries_tests.run.linux.arm.Release.mch	116,399,952	+2

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.02% to -0.01%)

Collection	PDIFF
benchmarks.run.linux.arm64.checked.mch	-0.01%
benchmarks.run_pgo.linux.arm64.checked.mch	-0.01%
benchmarks.run_tiered.linux.arm64.checked.mch	-0.01%
coreclr_tests.run.linux.arm64.checked.mch	-0.01%
libraries.crossgen2.linux.arm64.checked.mch	-0.01%
libraries.pmi.linux.arm64.checked.mch	-0.02%
libraries_tests.run.linux.arm64.Release.mch	-0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	-0.01%
realworld.run.linux.arm64.checked.mch	-0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch	-0.02%

MinOpts (-0.01% to +0.00%)

Collection	PDIFF
libraries.pmi.linux.arm64.checked.mch	-0.01%

FullOpts (-0.02% to -0.01%)

Collection	PDIFF
benchmarks.run.linux.arm64.checked.mch	-0.01%
benchmarks.run_pgo.linux.arm64.checked.mch	-0.01%
benchmarks.run_tiered.linux.arm64.checked.mch	-0.02%
coreclr_tests.run.linux.arm64.checked.mch	-0.01%
libraries.crossgen2.linux.arm64.checked.mch	-0.01%
libraries.pmi.linux.arm64.checked.mch	-0.02%
libraries_tests.run.linux.arm64.Release.mch	-0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	-0.01%
realworld.run.linux.arm64.checked.mch	-0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch	-0.02%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.02% to -0.01%)

Collection	PDIFF
benchmarks.run.linux.x64.checked.mch	-0.02%
benchmarks.run_pgo.linux.x64.checked.mch	-0.01%
benchmarks.run_tiered.linux.x64.checked.mch	-0.01%
coreclr_tests.run.linux.x64.checked.mch	-0.01%
libraries.crossgen2.linux.x64.checked.mch	-0.02%
libraries.pmi.linux.x64.checked.mch	-0.02%
libraries_tests.run.linux.x64.Release.mch	-0.01%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch	-0.01%
realworld.run.linux.x64.checked.mch	-0.01%
smoke_tests.nativeaot.linux.x64.checked.mch	-0.02%

FullOpts (-0.02% to -0.01%)

Collection	PDIFF
benchmarks.run.linux.x64.checked.mch	-0.02%
benchmarks.run_pgo.linux.x64.checked.mch	-0.02%
benchmarks.run_tiered.linux.x64.checked.mch	-0.02%
coreclr_tests.run.linux.x64.checked.mch	-0.01%
libraries.crossgen2.linux.x64.checked.mch	-0.02%
libraries.pmi.linux.x64.checked.mch	-0.02%
libraries_tests.run.linux.x64.Release.mch	-0.02%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch	-0.01%
realworld.run.linux.x64.checked.mch	-0.01%
smoke_tests.nativeaot.linux.x64.checked.mch	-0.02%

amanasifkhalid · 2024-02-16T01:09:00Z

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. The diffs are from slight changes in edge weights/likelihoods.

AndyAyersMS

I never liked RedirectBlockOption (even though it was my creation) so nice to see it go away.

You can defer the likelihood updates if that works out better for you.

AndyAyersMS · 2024-02-16T01:37:52Z

src/coreclr/jit/optimizer.cpp

            }
+
+            fgAddRefPred(trueTarget, newBlk);


Should we be passing in the "inspiring edge" here (the one from original source to original target)?

amanasifkhalid · 2024-02-16

I was planning on doing that once we have easy access to successor edges -- something like fgAddRefPred(trueTarget, newBlk, blk->GetTrueEdge()). In the meantime, I can add calls to fgGetPredForBlock to get the inspiring edge, since the TP impact will be temporary, or I can include this in my upcoming PRs for adding successor edge members to BasicBlock -- do you have a preference?
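
To illustrate the idea of an inspiring edge, here is a toy sketch, assuming simplified `Edge` and `Block` types and a hypothetical `addRefPred` helper. It does not mirror the real fgAddRefPred signature; it only shows a new edge inheriting the likelihood of the original source-to-target edge.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Block;

// Toy flow edge; the real FlowEdge carries more state.
struct Edge
{
    Block* source;
    Block* target;
    double likelihood;
};

struct Block
{
    std::vector<std::unique_ptr<Edge>> predEdges; // edges whose target is this block
};

// Hypothetical analogue of adding a pred edge. When an inspiring edge is
// supplied, the new edge inherits its likelihood; otherwise the likelihood is
// a placeholder of 1.0 (an assumption of this sketch).
inline Edge* addRefPred(Block* target, Block* pred, const Edge* inspiringEdge = nullptr)
{
    const double likelihood = (inspiringEdge != nullptr) ? inspiringEdge->likelihood : 1.0;
    target->predEdges.push_back(std::unique_ptr<Edge>(new Edge{pred, target, likelihood}));
    return target->predEdges.back().get();
}
```

With this shape, the cloned block's edge to its (possibly redirected) target would start with the same likelihood as the edge it was copied from.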

AndyAyersMS · 2024-02-16T01:38:22Z

src/coreclr/jit/optimizer.cpp

                }
+
+                fgAddRefPred(*jumpPtr, newBlk);


Ditto here, though it can be trickier for edges that have dup counts.
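
A toy sketch of why dup counts complicate this, assuming simplified `Edge` and `Block` types and a hypothetical `addRef` helper (none of these are the JIT's definitions). Here a single edge stands for every jump from one source to one target, as when several switch cases share a target, so any per-edge data taken from an inspiring edge would apply to all of those jumps at once.

```cpp
#include <cassert>
#include <vector>

struct Block;

// Toy edge with a duplicate count.
struct Edge
{
    Block*   source;
    unsigned dupCount; // number of jumps from source that this edge represents
};

struct Block
{
    std::vector<Edge> predEdges; // one edge per distinct source
};

// Hypothetical helper: add one reference from pred to target, reusing the
// existing edge (and bumping its dup count) if pred already targets target.
inline Edge& addRef(Block& target, Block& pred)
{
    for (Edge& e : target.predEdges)
    {
        if (e.source == &pred)
        {
            e.dupCount++;
            return e;
        }
    }
    target.predEdges.push_back(Edge{&pred, 1});
    return target.predEdges.back();
}
```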

jakobbotsch · 2024-02-16T07:44:24Z

src/coreclr/jit/compiler.h

    void optRedirectBlock(BasicBlock*      blk,
-                          BlockToBlockMap* redirectMap,
-                          const RedirectBlockOption = RedirectBlockOption::DoNotChangePredLists);
+                          BasicBlock*      newBlk,
+                          BlockToBlockMap* redirectMap);


I'm not sure I understand the new function that well -- it requires me to have a block around that the previous function did not require me to have around, but what if I don't have that?

This function doesn't seem to be doing redirection anymore, it rather seems to be initializing a new block based on an old block and a map. Which is fine if that's what we generally need, but the old function seemed much more general than this -- basically a generalization of fgReplaceJumpTarget to replace multiple targets at once. How would we implement that functionality with the new scheme? It is conceivable to me that we are going to need it again.

amanasifkhalid · 2024-02-16

> it requires me to have a block around that the previous function did not require me to have around, but what if I don't have that?

For the current callers of optRedirectBlock, newBlk is obtained from redirectMap -- for each copied block blk, we expect there to be a blk -> newBlk mapping. I could refactor optRedirectBlock slightly so that it only takes blk, and retrieves newBlk from redirectMap. We'd always expect this mapping to exist, right? Can you foresee a situation where we wouldn't have that?

> This function doesn't seem to be doing redirection anymore, it rather seems to be initializing a new block based on an old block and a map. Which is fine if that's what we generally need, but the old function seemed much more general than this

In my opinion, the old version was trying to do a few too many things, making the semantics of its callers a bit awkward. Previously, callers would have to copy the old block's target(s) to the new block with BasicBlock::CopyTarget without setting up any pred edges, and then tell optRedirectBlock to add pred edges, while deciding what the real target should be. I don't think this approach will hold up well once we've replaced block targets with successor edges: Once we've done that, it won't be possible for a block to have a jump target without an edge to it, which is a state we'd previously establish before calling optRedirectBlock.

> How would we implement that functionality with the new scheme? It is conceivable to me that we are going to need it again.

This would introduce some code duplication, but maybe we could have a version of optRedirectBlock that assumes newBlk's successors are initialized, and the current version (which we can rename to imply it is initializing successors, if you'd prefer) can continue to assume successors aren't initialized?

My next few PRs will be touching this method, so I'm happy to add any follow-up changes you recommend to those.

jakobbotsch · 2024-02-16

> For the current callers of optRedirectBlock, newBlk is obtained from redirectMap -- for each copied block blk, we expect there to be a blk -> newBlk mapping. I could refactor optRedirectBlock slightly so that it only takes blk, and retrieves newBlk from redirectMap. We'd always expect this mapping to exist, right? Can you foresee a situation where we wouldn't have that?

I would not expect this mapping to always exist. As I mentioned above, I think optRedirectBlock should be viewed as a generalization of fgReplaceJumpTarget that allows replacing multiple targets in one operation. I.e. it was an operation similar to

for ((BasicBlock* key, BasicBlock* target) in blockMap) { fgReplaceJumpTarget(block, key, target); }
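
The pseudocode above can be made concrete with toy types. This is only a sketch under assumptions: `Block`, `BlockMap`, `replaceJumpTarget`, and `redirectBlock` are hypothetical stand-ins, and a flat vector of targets is assumed instead of per-kind jump fields.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy block with a flat list of jump targets.
struct Block
{
    std::vector<Block*> targets;
};

using BlockMap = std::unordered_map<Block*, Block*>; // old target -> new target

// Toy analogue of replacing one jump target: every occurrence of oldTarget
// becomes newTarget.
inline void replaceJumpTarget(Block& block, Block* oldTarget, Block* newTarget)
{
    for (Block*& t : block.targets)
    {
        if (t == oldTarget)
        {
            t = newTarget;
        }
    }
}

// Replace multiple targets in one operation: each target is looked up in the
// map exactly once.
inline void redirectBlock(Block& block, const BlockMap& map)
{
    for (Block*& t : block.targets)
    {
        auto it = map.find(t);
        if (it != map.end())
        {
            t = it->second;
        }
    }
}
```

One design note on the sketch: looking each target up once avoids chained replacements. If the map held both a -> b and b -> c, calling `replaceJumpTarget` once per map entry could turn a into c depending on iteration order, whereas the single pass maps a to b and b to c.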

I agree the overloaded handling around preds before wasn't very pretty.

I think with the refactoring done in this PR the function should be renamed to something that indicates that it operates on a partially initialized block and is some form of initialization -- it no longer matches the above semantics. Maybe something like optInitializeDuplicatedBlockTargets?

I actually had local changes where I generalized optRedirectBlock to take a functor instead of a BlockToBlockMap, making it more easy to use for these kinds of more general block redirections. With that I think fgReplaceJumpTarget could be implemented in terms of optRedirectBlock (with a functor like [=](BasicBlock* target) { if (target == oldBlock) { return newBlock; } return target; }).
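
A minimal sketch of the functor idea, assuming a toy `Block` with a flat target list. The names `redirectBlock` and `replaceJumpTarget` are hypothetical here and do not reflect the signatures of the JIT's functions; the point is only that single-target replacement falls out of the general version.

```cpp
#include <cassert>
#include <vector>

// Toy block with a flat list of jump targets.
struct Block
{
    std::vector<Block*> targets;
};

// General redirection: remap is any callable taking a target and returning
// the target that should replace it (possibly the same one).
template <typename Func>
void redirectBlock(Block& block, Func remap)
{
    for (Block*& t : block.targets)
    {
        t = remap(t);
    }
}

// Single-target replacement expressed through the general version, using a
// functor of the shape suggested in the comment above.
inline void replaceJumpTarget(Block& block, Block* oldBlock, Block* newBlock)
{
    redirectBlock(block, [=](Block* target) { return (target == oldBlock) ? newBlock : target; });
}
```

A map-based redirection would be another caller of the same template, passing a functor that looks the target up in the map and returns it unchanged when no entry exists.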

amanasifkhalid · 2024-02-19

> I think with the refactoring done in this PR the function should be renamed to something that indicates that it operates on a partially initialized block and is some form of initialization -- it no longer matches the above semantics. Maybe something like optInitializeDuplicatedBlockTargets?

Sure thing, I can include that change in my next PR.

> I actually had local changes where I generalized optRedirectBlock to take a functor instead of a BlockToBlockMap, making it more easy to use for these kinds of more general block redirections.

I like that idea. Are you ok with me wrapping up the successor edge refactor first, and then coming back to this? That should only take a few more PRs; I have the code ready locally.

jakobbotsch · 2024-02-19

Sure -- also, I'm ok with leaving the function out entirely and leaving it as it is right now (with the rename). We can add the more general version when we find a need for it.

amanasifkhalid · 2024-02-16T16:22:00Z

For my next few PRs, I plan to incrementally convert each block kind to use successor edges instead of block targets. With each update to optRedirectBlock, I'll add the inspiringEdge logic in.

amanasifkhalid added 2 commits February 15, 2024 16:25

Refactor optRedirectBlock

c6c7d2a

Remove BasicBlock::CopyTarget

b11f5f2

ghost assigned amanasifkhalid Feb 15, 2024

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 15, 2024

amanasifkhalid commented Feb 15, 2024

AndyAyersMS approved these changes Feb 16, 2024

jakobbotsch reviewed Feb 16, 2024

amanasifkhalid merged commit 9394d2e into dotnet:main Feb 16, 2024
129 checks passed

amanasifkhalid deleted the optRedirectBlock branch February 16, 2024 16:39

amanasifkhalid mentioned this pull request Feb 19, 2024

JIT: Use successor edges instead of block targets for BBJ_SWITCH #98671

Merged

kunalspathak mentioned this pull request Feb 21, 2024

Diff in checked vs. release #98772

Closed

github-actions bot locked and limited conversation to collaborators Mar 21, 2024
