JIT: update block weight for uncond to cond flow opt #98324

Merged: AndyAyersMS merged 3 commits into dotnet:main on Feb 27, 2024

Conversation

AndyAyersMS
Member

@AndyAyersMS AndyAyersMS commented Feb 12, 2024

This optimization duplicates code and flow in a BBJ_COND successor into one of its preds; as a result the weight of the successor should decrease.
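
As a rough illustration of the weight update described above, here is a minimal sketch. The `Block` struct and `ReduceTargetWeight` helper are hypothetical stand-ins invented for this example; they are not the JIT's `BasicBlock` or the code in this PR. The sketch assumes that the flow the pred used to send to the successor equals the pred's own weight, and that the result is clamped at zero in case the profile data is inconsistent.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified stand-in for a flow graph block.
// This is NOT the JIT's BasicBlock type.
struct Block
{
    double weight; // profile-derived execution count
};

// Once the conditional in `target` has been duplicated into `pred`,
// executions that used to flow pred -> target no longer reach `target`.
// Subtract that flow from target's weight, clamping at zero in case the
// profile data is inconsistent (an assumption of this sketch).
inline void ReduceTargetWeight(const Block& pred, Block& target)
{
    target.weight = std::max(0.0, target.weight - pred.weight);
}
```

For example, a successor with weight 100 whose pred has weight 30 would end up with weight 70 under this model.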

Fixes some issues seen with odd perf scores in the ML/CSE experiment.

Contributes to #93020

Diffs

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 12, 2024
@ghost ghost assigned AndyAyersMS Feb 12, 2024
@ghost

ghost commented Feb 12, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@AndyAyersMS
Member Author

FYI @dotnet/jit-contrib

A few large local regressions. The ones I looked at were all additional cloning in loops with type tests.

Member

@amanasifkhalid amanasifkhalid left a comment

LGTM, thanks for the fix! I anticipate those additional fgGetPredForBlock calls will go away soon, once I replace bbTrueTarget and bbFalseTarget with flow edges.

@@ -2471,24 +2471,39 @@ bool Compiler::fgOptimizeUncondBranchToSimpleCond(BasicBlock* block, BasicBlock*
 // add an unconditional block after this block to jump to the target block's fallthrough block
 //
 assert(!target->IsLast());
-BasicBlock* next = fgNewBBafter(BBJ_ALWAYS, block, true, target->GetFalseTarget());
+BasicBlock* const next = fgNewBBafter(BBJ_ALWAYS, block, true, target->GetFalseTarget());
Member

This fixup block should also go away soon, since we no longer need to maintain fallthrough into the false target. I have a change locally for removing this and a bunch of other fixups, but the diffs were discouraging due to all the profile changes by the time we got to block reordering. Hopefully the profile maintenance work we're doing will lessen that impact.

Member Author

I had removed the fixup block originally, but put it back to reduce diffs.

@ryujit-bot

Diff results for #98324

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,520,572 contexts (999,218 MinOpts, 1,521,354 FullOpts).

MISSED contexts: base: 4 (0.00%), diff: 97 (0.00%)

Overall (+366,484 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 74,854,976 -16,084
coreclr_tests.run.linux.arm64.checked.mch 509,287,460 +21,008
libraries.pmi.linux.arm64.checked.mch 76,692,260 +136
libraries_tests.run.linux.arm64.Release.mch 380,944,972 +363,948
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 164,707,488 -2,524
FullOpts (+366,484 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 52,852,636 -16,084
coreclr_tests.run.linux.arm64.checked.mch 160,417,140 +21,008
libraries.pmi.linux.arm64.checked.mch 76,572,276 +136
libraries_tests.run.linux.arm64.Release.mch 166,327,572 +363,948
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 151,304,588 -2,524

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,542,702 contexts (985,624 MinOpts, 1,557,078 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 95 (0.00%)

Overall (+420,577 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 69,699,439 -55,836
coreclr_tests.run.linux.x64.checked.mch 403,413,227 +19,477
libraries.pmi.linux.x64.checked.mch 60,773,002 +2,467
libraries_tests.run.linux.x64.Release.mch 336,651,500 +446,207
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 132,477,813 +8,262
FullOpts (+420,577 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 46,789,254 -55,836
coreclr_tests.run.linux.x64.checked.mch 123,798,121 +19,477
libraries.pmi.linux.x64.checked.mch 60,660,145 +2,467
libraries_tests.run.linux.x64.Release.mch 153,474,826 +446,207
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 121,893,760 +8,262

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,262,128 contexts (921,087 MinOpts, 1,341,041 FullOpts).

MISSED contexts: base: 3 (0.00%), diff: 77 (0.00%)

Overall (+202,992 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 24,713,208 -9,188
coreclr_tests.run.osx.arm64.checked.mch 476,227,036 +16,244
libraries_tests.run.osx.arm64.Release.mch 312,891,008 +199,048
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 162,507,756 -3,112
FullOpts (+202,992 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 8,957,196 -9,188
coreclr_tests.run.osx.arm64.checked.mch 150,934,824 +16,244
libraries_tests.run.osx.arm64.Release.mch 111,510,444 +199,048
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 149,448,344 -3,112

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,368,064 contexts (937,277 MinOpts, 1,430,787 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 84 (0.00%)

Overall (+305,988 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 46,686,644 +1,196
coreclr_tests.run.windows.arm64.checked.mch 496,328,272 +21,196
libraries.pmi.windows.arm64.checked.mch 80,267,692 +828
libraries_tests.run.windows.arm64.Release.mch 323,036,384 +281,288
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 171,257,132 +1,480
FullOpts (+305,988 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 30,351,036 +1,196
coreclr_tests.run.windows.arm64.checked.mch 156,850,884 +21,196
libraries.pmi.windows.arm64.checked.mch 80,147,708 +828
libraries_tests.run.windows.arm64.Release.mch 119,164,376 +281,288
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 158,197,788 +1,480

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,908,360 contexts (1,240,334 MinOpts, 1,668,026 FullOpts).

MISSED contexts: base: 133 (0.00%), diff: 223 (0.01%)

Overall (+239,961 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 46,759,029 +26,975
benchmarks.run_pgo.windows.x64.checked.mch 45,780,921 -82,475
coreclr_tests.run.windows.x64.checked.mch 464,618,729 +19,542
libraries.pmi.windows.x64.checked.mch 64,172,055 +3,231
libraries_tests.run.windows.x64.Release.mch 309,629,720 +259,697
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 149,849,548 +12,991
FullOpts (+239,961 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 28,268,214 +26,975
benchmarks.run_pgo.windows.x64.checked.mch 23,454,275 -82,475
coreclr_tests.run.windows.x64.checked.mch 130,697,415 +19,542
libraries.pmi.windows.x64.checked.mch 64,058,534 +3,231
libraries_tests.run.windows.x64.Release.mch 110,763,662 +259,697
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 138,621,801 +12,991

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.01% to +0.18%)
Collection PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.01%
libraries_tests.run.linux.arm64.Release.mch +0.18%
MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%
FullOpts (-0.01% to +0.24%)
Collection PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.02%
libraries_tests.run.linux.arm64.Release.mch +0.24%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.02% to +0.21%)
Collection PDIFF
benchmarks.run_pgo.linux.x64.checked.mch -0.02%
coreclr_tests.run.linux.x64.checked.mch +0.01%
libraries_tests.run.linux.x64.Release.mch +0.21%
FullOpts (-0.02% to +0.26%)
Collection PDIFF
benchmarks.run_pgo.linux.x64.checked.mch -0.02%
coreclr_tests.run.linux.x64.checked.mch +0.01%
libraries_tests.run.linux.x64.Release.mch +0.26%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.01% to +0.14%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.14%
FullOpts (-0.01% to +0.21%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.21%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.00% to +0.19%)
Collection PDIFF
coreclr_tests.run.windows.arm64.checked.mch +0.01%
libraries_tests.run.windows.arm64.Release.mch +0.19%
MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%
FullOpts (-0.00% to +0.27%)
Collection PDIFF
coreclr_tests.run.windows.arm64.checked.mch +0.01%
libraries_tests.run.windows.arm64.Release.mch +0.27%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.02% to +0.15%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.06%
benchmarks.run_pgo.windows.x64.checked.mch -0.02%
coreclr_tests.run.windows.x64.checked.mch +0.01%
libraries_tests.run.windows.x64.Release.mch +0.15%
FullOpts (-0.02% to +0.21%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.07%
benchmarks.run_pgo.windows.x64.checked.mch -0.02%
coreclr_tests.run.windows.x64.checked.mch +0.01%
libraries_tests.run.windows.x64.Release.mch +0.21%

Details here


Throughput diffs for linux/arm ran on windows/x86

Overall (-0.01% to +0.06%)
Collection PDIFF
benchmarks.run_pgo.linux.arm.checked.mch -0.01%
libraries.pmi.linux.arm.checked.mch +0.01%
libraries_tests.run.linux.arm.Release.mch +0.06%
FullOpts (-0.01% to +0.08%)
Collection PDIFF
benchmarks.run_pgo.linux.arm.checked.mch -0.01%
libraries.pmi.linux.arm.checked.mch +0.01%
libraries_tests.run.linux.arm.Release.mch +0.08%

Throughput diffs for windows/x86 ran on windows/x86

Overall (-0.02% to +0.01%)
Collection PDIFF
benchmarks.run_pgo.windows.x86.checked.mch -0.02%
libraries_tests.run.windows.x86.Release.mch +0.01%
FullOpts (-0.02% to +0.02%)
Collection PDIFF
benchmarks.run_pgo.windows.x86.checked.mch -0.02%
libraries_tests.run.windows.x86.Release.mch +0.02%

Details here


Throughput diffs for linux/arm64 ran on linux/x64

Overall (-0.01% to +0.18%)
Collection PDIFF
libraries_tests.run.linux.arm64.Release.mch +0.18%
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.01%
FullOpts (-0.01% to +0.24%)
Collection PDIFF
libraries_tests.run.linux.arm64.Release.mch +0.24%
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.01%

Throughput diffs for linux/x64 ran on linux/x64

Overall (-0.02% to +0.20%)
Collection PDIFF
libraries_tests.run.linux.x64.Release.mch +0.20%
coreclr_tests.run.linux.x64.checked.mch +0.01%
benchmarks.run_pgo.linux.x64.checked.mch -0.02%
FullOpts (-0.02% to +0.26%)
Collection PDIFF
libraries_tests.run.linux.x64.Release.mch +0.26%
coreclr_tests.run.linux.x64.checked.mch +0.01%
benchmarks.run_pgo.linux.x64.checked.mch -0.02%

Details here


@ryujit-bot

Diff results for #98324

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,257,223 contexts (832,052 MinOpts, 1,425,171 FullOpts).

MISSED contexts: base: 73,583 (3.16%), diff: 73,599 (3.16%)

Overall (+122,712 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 65,676,478 +36,494
coreclr_tests.run.linux.arm.checked.mch 321,682,372 +4,784
libraries.pmi.linux.arm.checked.mch 50,272,220 +36
libraries_tests.run.linux.arm.Release.mch 239,445,652 +79,630
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,257,664 +1,768
FullOpts (+122,712 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 53,624,530 +36,494
coreclr_tests.run.linux.arm.checked.mch 109,216,788 +4,784
libraries.pmi.linux.arm.checked.mch 50,165,996 +36
libraries_tests.run.linux.arm.Release.mch 117,579,462 +79,630
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,227,820 +1,768

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,678,702 contexts (1,054,747 MinOpts, 1,623,955 FullOpts).

MISSED contexts: base: 11 (0.00%), diff: 656 (0.02%)

Overall (-51,776 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch 55,285,524 -117,707
coreclr_tests.run.windows.x86.checked.mch 371,677,826 +6,755
libraries.pmi.windows.x86.checked.mch 49,759,154 +3,071
libraries_tests.run.windows.x86.Release.mch 206,782,528 +44,446
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 112,705,032 +11,659
FullOpts (-51,776 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch 44,439,683 -117,707
coreclr_tests.run.windows.x86.checked.mch 119,096,783 +6,755
libraries.pmi.windows.x86.checked.mch 49,663,921 +3,071
libraries_tests.run.windows.x86.Release.mch 97,462,256 +44,446
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 103,862,275 +11,659

Details here


@AndyAyersMS
Member Author

Hmm, rather bigger diffs than I was expecting.

I will need to dig in and see if this is all attributable to more cloning, and whether it is time to at least build some kind of vague heuristic.

@AndyAyersMS
Member Author

Looks like regressions are indeed from more cloning.

@AndyAyersMS
Member Author

In particular type test cloning is driven by the likelihood of the type test succeeding, and with this profile update we now see more tests that appear successful.
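
One way to picture the effect described above is the sketch below. It is purely illustrative: the thread does not say how the JIT computes the likelihood, so the ratio of profile weights, the `SuccessLikelihood` and `ShouldCloneForTypeTest` names, and the 0.5 threshold are all assumptions made for this example. Under those assumptions, lowering the weight of the block containing the test while the success path keeps its weight makes the test look more likely to succeed.

```cpp
#include <cassert>

// Hypothetical estimate: the fraction of the test block's executions that
// take the success path. The JIT's real computation may differ.
inline double SuccessLikelihood(double successWeight, double testBlockWeight)
{
    if (testBlockWeight <= 0.0)
    {
        return 0.0; // no profile evidence that the test ever runs
    }
    const double likelihood = successWeight / testBlockWeight;
    return likelihood > 1.0 ? 1.0 : likelihood; // clamp inconsistent data
}

// Hypothetical decision rule with an illustrative threshold.
inline bool ShouldCloneForTypeTest(double likelihood, double threshold = 0.5)
{
    return likelihood >= threshold;
}
```

With a success weight of 40, a test block of weight 100 falls below the illustrative threshold, while the same test in a block whose weight was reduced to 70 rises above it and would be cloned.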

@AndyAyersMS
Member Author

@amanasifkhalid can you take another look? I removed the `next` block and now just wire up the flow directly.

TP diffs look good, and PerfScore diffs look good. Code size increases, but mainly in the libraries tests. The code size impact all comes from more or fewer clones; all the ones I saw were from the "clone for type test" heuristic, which relies on profile data.
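
The idea of wiring up the flow directly, rather than adding a fixup block that jumps to the false target, might look roughly like the sketch below. The `Kind` enum, `Block` struct, and `RetargetDirectly` helper are hypothetical and exist only for this example; the real change operates on the JIT's `BasicBlock` and its flow edges, which are not modeled here.

```cpp
#include <cassert>

// Hypothetical, simplified block kinds and block type for illustration only.
enum class Kind
{
    Always, // unconditional jump; trueTarget holds the sole successor
    Cond    // conditional branch with a true and a false successor
};

struct Block
{
    Kind   kind;
    Block* trueTarget;
    Block* falseTarget; // meaningful only when kind == Kind::Cond
};

// Turn `block` (an unconditional jump to `target`) into a copy of target's
// conditional branch. Both successors are wired directly, so no extra
// unconditional fixup block is needed to reach the false target.
inline void RetargetDirectly(Block& block, const Block& target)
{
    assert(block.kind == Kind::Always && block.trueTarget == &target);
    assert(target.kind == Kind::Cond);

    block.kind        = Kind::Cond;
    block.trueTarget  = target.trueTarget;
    block.falseTarget = target.falseTarget;
}
```

After the call, `block` branches to the same two successors as `target`, and `target` itself is left unchanged for its remaining preds.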

Member

@amanasifkhalid amanasifkhalid left a comment

LGTM, thanks for getting rid of some of the "no fallthrough" cruft.

@AndyAyersMS
Member Author

The failure is a timeout in the SPMI replay for linux arm32.

@AndyAyersMS AndyAyersMS merged commit f729653 into dotnet:main Feb 27, 2024
127 of 129 checks passed
@EgorBo
Member

EgorBo commented Feb 29, 2024

@github-actions github-actions bot locked and limited conversation to collaborators Apr 7, 2024