JIT: update block weight for uncond to cond flow opt #98324

AndyAyersMS · 2024-02-12T20:59:36Z

This optimization duplicates code and flow in a BBJ_COND successor into one of its preds; as a result the weight of the successor should decrease.
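
As a rough illustration of the weight adjustment described above, here is a minimal C++ sketch. `ToyBlock` and `ReduceTargetWeight` are hypothetical names, not the JIT's actual types or the code in this PR, and clamping at zero is an assumption about how one might guard against inconsistent profile data.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the JIT's BasicBlock; only the profile weight is modeled.
struct ToyBlock
{
    double weight; // profile-derived execution count
};

// Once the branch in `target` has been duplicated into `pred`, the flow that used
// to travel pred -> target no longer reaches `target`, so remove pred's share.
// Clamping at zero is an assumption: inconsistent profiles could otherwise
// drive the result negative.
inline void ReduceTargetWeight(const ToyBlock& pred, ToyBlock& target)
{
    assert(pred.weight >= 0.0 && target.weight >= 0.0);
    target.weight = std::max(0.0, target.weight - pred.weight);
}
```

With a pred of weight 30 and a target of weight 100, the target would end up at 70 under this model.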

Fixes some issues seen with odd perf scores in the ML/CSE experiment.

Contributes to #93020

Diffs

ghost · 2024-02-12T20:59:47Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	AndyAyersMS
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2024-02-12T21:00:26Z

FYI @dotnet/jit-contrib

A few large local regressions. The ones I looked at were all additional cloning in loops with type tests.

amanasifkhalid

LGTM, thanks for the fix! I anticipate those additional fgGetPredForBlock calls will go away soon, once I replace bbTrueTarget and bbFalseTarget with flow edges.

amanasifkhalid · 2024-02-12T21:16:11Z

src/coreclr/jit/fgopt.cpp

@@ -2471,24 +2471,39 @@ bool Compiler::fgOptimizeUncondBranchToSimpleCond(BasicBlock* block, BasicBlock*
    // add an unconditional block after this block to jump to the target block's fallthrough block
    //
    assert(!target->IsLast());
-    BasicBlock* next = fgNewBBafter(BBJ_ALWAYS, block, true, target->GetFalseTarget());
+    BasicBlock* const next = fgNewBBafter(BBJ_ALWAYS, block, true, target->GetFalseTarget());


This fixup block should also go away soon, since we no longer need to maintain fallthrough into the false target. I have a change locally for removing this and a bunch of other fixups, but the diffs were discouraging due to all the profile changes by the time we got to block reordering. Hopefully the profile maintenance work we're doing will lessen that impact.

AndyAyersMS (reply, Feb 12, 2024)

I had removed the fixup block originally, but put it back to reduce diffs.

ryujit-bot · 2024-02-13T00:11:24Z

Diff results for #98324

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,520,572 contexts (999,218 MinOpts, 1,521,354 FullOpts).

MISSED contexts: base: 4 (0.00%), diff: 97 (0.00%)

Overall (+366,484 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch	74,854,976	-16,084
coreclr_tests.run.linux.arm64.checked.mch	509,287,460	+21,008
libraries.pmi.linux.arm64.checked.mch	76,692,260	+136
libraries_tests.run.linux.arm64.Release.mch	380,944,972	+363,948
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	164,707,488	-2,524

FullOpts (+366,484 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch	52,852,636	-16,084
coreclr_tests.run.linux.arm64.checked.mch	160,417,140	+21,008
libraries.pmi.linux.arm64.checked.mch	76,572,276	+136
libraries_tests.run.linux.arm64.Release.mch	166,327,572	+363,948
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch	151,304,588	-2,524

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,542,702 contexts (985,624 MinOpts, 1,557,078 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 95 (0.00%)

Overall (+420,577 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch	69,699,439	-55,836
coreclr_tests.run.linux.x64.checked.mch	403,413,227	+19,477
libraries.pmi.linux.x64.checked.mch	60,773,002	+2,467
libraries_tests.run.linux.x64.Release.mch	336,651,500	+446,207
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch	132,477,813	+8,262

FullOpts (+420,577 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch	46,789,254	-55,836
coreclr_tests.run.linux.x64.checked.mch	123,798,121	+19,477
libraries.pmi.linux.x64.checked.mch	60,660,145	+2,467
libraries_tests.run.linux.x64.Release.mch	153,474,826	+446,207
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch	121,893,760	+8,262

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,262,128 contexts (921,087 MinOpts, 1,341,041 FullOpts).

MISSED contexts: base: 3 (0.00%), diff: 77 (0.00%)

Overall (+202,992 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch	24,713,208	-9,188
coreclr_tests.run.osx.arm64.checked.mch	476,227,036	+16,244
libraries_tests.run.osx.arm64.Release.mch	312,891,008	+199,048
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	162,507,756	-3,112

FullOpts (+202,992 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch	8,957,196	-9,188
coreclr_tests.run.osx.arm64.checked.mch	150,934,824	+16,244
libraries_tests.run.osx.arm64.Release.mch	111,510,444	+199,048
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch	149,448,344	-3,112

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,368,064 contexts (937,277 MinOpts, 1,430,787 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 84 (0.00%)

Overall (+305,988 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch	46,686,644	+1,196
coreclr_tests.run.windows.arm64.checked.mch	496,328,272	+21,196
libraries.pmi.windows.arm64.checked.mch	80,267,692	+828
libraries_tests.run.windows.arm64.Release.mch	323,036,384	+281,288
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	171,257,132	+1,480

FullOpts (+305,988 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch	30,351,036	+1,196
coreclr_tests.run.windows.arm64.checked.mch	156,850,884	+21,196
libraries.pmi.windows.arm64.checked.mch	80,147,708	+828
libraries_tests.run.windows.arm64.Release.mch	119,164,376	+281,288
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch	158,197,788	+1,480

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,908,360 contexts (1,240,334 MinOpts, 1,668,026 FullOpts).

MISSED contexts: base: 133 (0.00%), diff: 223 (0.01%)

Overall (+239,961 bytes)

Collection	Base size (bytes)	Diff size (bytes)
aspnet.run.windows.x64.checked.mch	46,759,029	+26,975
benchmarks.run_pgo.windows.x64.checked.mch	45,780,921	-82,475
coreclr_tests.run.windows.x64.checked.mch	464,618,729	+19,542
libraries.pmi.windows.x64.checked.mch	64,172,055	+3,231
libraries_tests.run.windows.x64.Release.mch	309,629,720	+259,697
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	149,849,548	+12,991

FullOpts (+239,961 bytes)

Collection	Base size (bytes)	Diff size (bytes)
aspnet.run.windows.x64.checked.mch	28,268,214	+26,975
benchmarks.run_pgo.windows.x64.checked.mch	23,454,275	-82,475
coreclr_tests.run.windows.x64.checked.mch	130,697,415	+19,542
libraries.pmi.windows.x64.checked.mch	64,058,534	+3,231
libraries_tests.run.windows.x64.Release.mch	110,763,662	+259,697
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	138,621,801	+12,991

Details here

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.01% to +0.18%)

Collection	PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch	-0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.01%
libraries_tests.run.linux.arm64.Release.mch	+0.18%

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.linux.arm64.checked.mch	+0.01%

FullOpts (-0.01% to +0.24%)

Collection	PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch	-0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.02%
libraries_tests.run.linux.arm64.Release.mch	+0.24%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.02% to +0.21%)

Collection	PDIFF
benchmarks.run_pgo.linux.x64.checked.mch	-0.02%
coreclr_tests.run.linux.x64.checked.mch	+0.01%
libraries_tests.run.linux.x64.Release.mch	+0.21%

FullOpts (-0.02% to +0.26%)

Collection	PDIFF
benchmarks.run_pgo.linux.x64.checked.mch	-0.02%
coreclr_tests.run.linux.x64.checked.mch	+0.01%
libraries_tests.run.linux.x64.Release.mch	+0.26%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.01% to +0.14%)

Collection	PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch	-0.01%
coreclr_tests.run.osx.arm64.checked.mch	+0.01%
libraries_tests.run.osx.arm64.Release.mch	+0.14%

FullOpts (-0.01% to +0.21%)

Collection	PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch	-0.01%
coreclr_tests.run.osx.arm64.checked.mch	+0.01%
libraries_tests.run.osx.arm64.Release.mch	+0.21%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.00% to +0.19%)

Collection	PDIFF
coreclr_tests.run.windows.arm64.checked.mch	+0.01%
libraries_tests.run.windows.arm64.Release.mch	+0.19%

MinOpts (-0.01% to +0.00%)

Collection	PDIFF
libraries.pmi.windows.arm64.checked.mch	-0.01%

FullOpts (-0.00% to +0.27%)

Collection	PDIFF
coreclr_tests.run.windows.arm64.checked.mch	+0.01%
libraries_tests.run.windows.arm64.Release.mch	+0.27%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.02% to +0.15%)

Collection	PDIFF
aspnet.run.windows.x64.checked.mch	+0.06%
benchmarks.run_pgo.windows.x64.checked.mch	-0.02%
coreclr_tests.run.windows.x64.checked.mch	+0.01%
libraries_tests.run.windows.x64.Release.mch	+0.15%

FullOpts (-0.02% to +0.21%)

Collection	PDIFF
aspnet.run.windows.x64.checked.mch	+0.07%
benchmarks.run_pgo.windows.x64.checked.mch	-0.02%
coreclr_tests.run.windows.x64.checked.mch	+0.01%
libraries_tests.run.windows.x64.Release.mch	+0.21%

Details here

Throughput diffs for linux/arm ran on windows/x86

Overall (-0.01% to +0.06%)

Collection	PDIFF
benchmarks.run_pgo.linux.arm.checked.mch	-0.01%
libraries.pmi.linux.arm.checked.mch	+0.01%
libraries_tests.run.linux.arm.Release.mch	+0.06%

FullOpts (-0.01% to +0.08%)

Collection	PDIFF
benchmarks.run_pgo.linux.arm.checked.mch	-0.01%
libraries.pmi.linux.arm.checked.mch	+0.01%
libraries_tests.run.linux.arm.Release.mch	+0.08%

Throughput diffs for windows/x86 ran on windows/x86

Overall (-0.02% to +0.01%)

Collection	PDIFF
benchmarks.run_pgo.windows.x86.checked.mch	-0.02%
libraries_tests.run.windows.x86.Release.mch	+0.01%

FullOpts (-0.02% to +0.02%)

Collection	PDIFF
benchmarks.run_pgo.windows.x86.checked.mch	-0.02%
libraries_tests.run.windows.x86.Release.mch	+0.02%

Details here

Throughput diffs for linux/arm64 ran on linux/x64

Overall (-0.01% to +0.18%)

Collection	PDIFF
libraries_tests.run.linux.arm64.Release.mch	+0.18%
benchmarks.run_pgo.linux.arm64.checked.mch	-0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.01%

FullOpts (-0.01% to +0.24%)

Collection	PDIFF
libraries_tests.run.linux.arm64.Release.mch	+0.24%
benchmarks.run_pgo.linux.arm64.checked.mch	-0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.01%

Throughput diffs for linux/x64 ran on linux/x64

Overall (-0.02% to +0.20%)

Collection	PDIFF
libraries_tests.run.linux.x64.Release.mch	+0.20%
coreclr_tests.run.linux.x64.checked.mch	+0.01%
benchmarks.run_pgo.linux.x64.checked.mch	-0.02%

FullOpts (-0.02% to +0.26%)

Collection	PDIFF
libraries_tests.run.linux.x64.Release.mch	+0.26%
coreclr_tests.run.linux.x64.checked.mch	+0.01%
benchmarks.run_pgo.linux.x64.checked.mch	-0.02%

Details here

ryujit-bot · 2024-02-13T01:11:31Z

Diff results for #98324

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,257,223 contexts (832,052 MinOpts, 1,425,171 FullOpts).

MISSED contexts: base: 73,583 (3.16%), diff: 73,599 (3.16%)

Overall (+122,712 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch	65,676,478	+36,494
coreclr_tests.run.linux.arm.checked.mch	321,682,372	+4,784
libraries.pmi.linux.arm.checked.mch	50,272,220	+36
libraries_tests.run.linux.arm.Release.mch	239,445,652	+79,630
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch	94,257,664	+1,768

FullOpts (+122,712 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch	53,624,530	+36,494
coreclr_tests.run.linux.arm.checked.mch	109,216,788	+4,784
libraries.pmi.linux.arm.checked.mch	50,165,996	+36
libraries_tests.run.linux.arm.Release.mch	117,579,462	+79,630
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch	84,227,820	+1,768

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,678,702 contexts (1,054,747 MinOpts, 1,623,955 FullOpts).

MISSED contexts: base: 11 (0.00%), diff: 656 (0.02%)

Overall (-51,776 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch	55,285,524	-117,707
coreclr_tests.run.windows.x86.checked.mch	371,677,826	+6,755
libraries.pmi.windows.x86.checked.mch	49,759,154	+3,071
libraries_tests.run.windows.x86.Release.mch	206,782,528	+44,446
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch	112,705,032	+11,659

FullOpts (-51,776 bytes)

Collection	Base size (bytes)	Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch	44,439,683	-117,707
coreclr_tests.run.windows.x86.checked.mch	119,096,783	+6,755
libraries.pmi.windows.x86.checked.mch	49,663,921	+3,071
libraries_tests.run.windows.x86.Release.mch	97,462,256	+44,446
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch	103,862,275	+11,659

Details here

AndyAyersMS · 2024-02-13T01:30:51Z

Hmm, rather bigger diffs than I was expecting.

I will need to dig in and see if this is all attributable to more cloning, and whether it is time to at least build some kind of vague heuristic.

AndyAyersMS · 2024-02-26T23:59:10Z

Looks like regressions are indeed from more cloning.

AndyAyersMS · 2024-02-26T23:59:51Z

In particular, type test cloning is driven by the likelihood of the type test succeeding, and with this profile update we now see more tests that appear likely to succeed.
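
To make the sensitivity concrete, here is a toy C++ sketch of a likelihood-driven gate. Everything in it is an assumption for illustration: `EdgeLikelihood`, `ShouldCloneForTypeTest`, and the threshold parameter are hypothetical, and the thread does not say how the JIT actually computes the likelihood. It only shows that if a likelihood is a ratio of an edge weight to a block weight, lowering the block weight alone raises the ratio.

```cpp
#include <cassert>

// Hypothetical helper: estimate how likely a conditional block is to take one
// edge as edgeWeight / blockWeight, capped at 1.0.
inline double EdgeLikelihood(double edgeWeight, double blockWeight)
{
    assert(edgeWeight >= 0.0);
    if (blockWeight <= 0.0)
    {
        return 0.0; // no usable profile data in this toy model
    }
    const double likelihood = edgeWeight / blockWeight;
    return likelihood > 1.0 ? 1.0 : likelihood;
}

// Hypothetical cloning gate: clone only when the test looks likely to succeed.
// The threshold is an illustrative parameter, not a value taken from the JIT.
inline bool ShouldCloneForTypeTest(double likelihood, double threshold)
{
    return likelihood >= threshold;
}
```

In this model, an edge of weight 60 out of a block of weight 100 has likelihood 0.6. If the block weight is later reduced to 64 while the edge weight is unchanged, the likelihood becomes 0.9375, which would cross an illustrative threshold of 0.9.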

AndyAyersMS · 2024-02-27T15:45:19Z

@amanasifkhalid can you take another look? I removed the Next block and now just wire up the flow directly.

TP diffs good, PerfScore diffs good. Code size increases, but mainly from libraries tests. Code size impact is all from more or fewer clones; all the ones I saw were from the "clone for type test" heuristic, which relies on profile data.
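
For readers unfamiliar with the transformation, wiring up the flow directly can be pictured with the following toy C++ sketch. The `ToyKind`, `ToyBlock`, and `RewireUncondToCond` names are hypothetical and this is not the code in the PR; copying the branch statement itself and updating pred lists, edges, and weights are all omitted.

```cpp
#include <cassert>

enum class ToyKind
{
    Always, // unconditional jump; trueTarget holds the sole successor
    Cond    // conditional branch with a true and a false successor
};

// Hypothetical stand-in for a basic block; only flow is modeled.
struct ToyBlock
{
    ToyKind   kind;
    ToyBlock* trueTarget;
    ToyBlock* falseTarget;
};

// If `block` jumps unconditionally to a conditional block, turn `block` into a
// conditional block that branches straight to that block's two successors.
// No intermediate fixup block is created for the false path.
inline bool RewireUncondToCond(ToyBlock& block)
{
    if (block.kind != ToyKind::Always || block.trueTarget == nullptr)
    {
        return false;
    }

    ToyBlock* const target = block.trueTarget;
    if (target->kind != ToyKind::Cond)
    {
        return false;
    }

    block.kind        = ToyKind::Cond;
    block.trueTarget  = target->trueTarget;
    block.falseTarget = target->falseTarget;
    return true;
}
```

In this model the original conditional block is left in place for its other preds, which is why its weight needs to drop by the amount of flow that now bypasses it.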

amanasifkhalid

LGTM, thanks for getting rid of some of the "no fallthrough" cruft.

AndyAyersMS · 2024-02-27T15:59:33Z

Failure is a timeout in the SPMI replay for linux arm32.

EgorBo · 2024-02-29T17:43:45Z

Improvements on arm64:

[Perf] Linux/arm64: 32 Improvements on 2/27/2024 9:28:58 PM perf-autofiling-issues#30175

EgorBo · 2024-03-07T17:51:53Z

Improvements on arm64:

[Perf] Linux/arm64: 35 Improvements on 2/27/2024 9:28:58 PM perf-autofiling-issues#30633
[Perf] Windows/arm64: 35 Improvements on 2/27/2024 9:28:58 PM perf-autofiling-issues#30646
[Perf] Windows/arm64: 14 Improvements on 2/27/2024 9:28:58 PM perf-autofiling-issues#30660

AndyAyersMS requested a review from amanasifkhalid February 12, 2024 20:59

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 12, 2024

ghost assigned AndyAyersMS Feb 12, 2024

amanasifkhalid approved these changes Feb 12, 2024

amanasifkhalid reviewed Feb 12, 2024

Merge branch 'main' into UncondToCondBlockWeightFix

21c44de

more ambitious fix

f329bc9

amanasifkhalid approved these changes Feb 27, 2024

AndyAyersMS merged commit f729653 into dotnet:main Feb 27, 2024
127 of 129 checks passed

EgorBo mentioned this pull request Feb 29, 2024

[Perf] Linux/arm64: 3 Regressions on 2/27/2024 9:28:58 PM #99124

Closed

DrewScoggins mentioned this pull request Mar 5, 2024

[Perf] Linux/x64: 48 Regressions on 2/27/2024 9:28:58 PM #99315

Open

EgorBo mentioned this pull request Mar 7, 2024

[Perf] Windows/arm64: 9 Regressions on 2/27/2024 9:28:58 PM #99417

Closed

github-actions bot locked and limited conversation to collaborators Apr 7, 2024
