Force 2 iterations of OptRepeat #100473

BruceForstall · 2024-03-31T20:58:04Z

No description provided.

dotnet-policy-service · 2024-03-31T20:58:24Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

EgorBo · 2024-04-01T13:55:53Z

Ouch, quite expensive (@SingleAccretion predicted similar numbers)
It's still nice to have JitOptRepeat in good shape and it can point us to missing opportunities.
Maybe there are some hidden fruits to reduce the TP? E.g.

Don't run it for huge methods (e.g. non-linear complexity for number of basic blocks, etc)
Run on demand? E.g. if RBO removes a branch and invalidates dominators or Assertprop propagates a constant and potentionally enables more opportunities for RBO?
Don't run some opts in the 2nd iter like assertprop and rangecheck if they don't contribute to the final diffs a lot

But of course that is quite non-trivial to guess whether you might benefit from the 2nd iter or not. Maybe if we study these diffs we'll figure out what we benefit the most from in most cases?

BruceForstall · 2024-04-01T18:02:29Z

Diffs

Code size improvement from about 3 to 4.5MB. TP cost of about 20%.

BruceForstall · 2024-04-01T18:04:00Z

Ouch, quite expensive (@SingleAccretion predicted similar numbers) It's still nice to have JitOptRepeat in good shape and it can point us to missing opportunities. Maybe there are some hidden fruits to reduce the TP? E.g.

Don't run it for huge methods (e.g. non-linear complexity for number of basic blocks, etc)

Run on demand? E.g. if RBO removes a branch and invalidates dominators or Assertprop propagates a constant and potentionally enables more opportunities for RBO?

Don't run some opts in the 2nd iter like assertprop and rangecheck if they don't contribute to the final diffs a lot

But of course that is quite non-trivial to guess whether you might benefit from the 2nd iter or not. Maybe if we study these diffs we'll figure out what we benefit the most from in most cases?

Lots of good ideas here. I agree that we wouldn't want to run it unconditionally. But running on big methods might be the most beneficial. And maybe we'd be willing to take the 20% TP cost for NAOT compiles.

BruceForstall · 2024-04-01T21:05:43Z

cc @dotnet/jit-contrib

Example of enabling JitOptRepeat always, with 2 iterations. Code size improvement from about 3 to 4.5MB. TP cost of about 20%.

BruceForstall · 2024-04-02T05:32:54Z

Diffs

BruceForstall · 2024-04-02T05:34:38Z

BruceForstall · 2024-04-04T17:32:47Z

Here is a run of asmdiffs on win-x64 between JitOptRepeat 2 and 3 iterations, showing there is some benefit to be had to run more than twice.

Diffs are based on 2,384,245 contexts (931,974 MinOpts, 1,452,271 FullOpts).

MISSED contexts: base: 7,980 (0.33%), diff: 8,122 (0.34%)

Base JIT options: JitOptRepeat#*

Diff JIT options: JitOptRepeat#*;JitOptRepeatCount#3

Overall (-149,570 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch	8,524,826	-2,200	-1.16%
benchmarks.run_pgo.windows.x64.checked.mch	35,113,082	-8,390	+36.54%
benchmarks.run_tiered.windows.x64.checked.mch	12,444,446	-1,131	-1.25%
coreclr_tests.run.windows.x64.checked.mch	398,753,006	-53,627	+50.20%
libraries.crossgen2.windows.x64.checked.mch	44,921,842	-7,029	-0.59%
libraries.pmi.windows.x64.checked.mch	61,240,935	-6,528	+125.54%
libraries_tests.run.windows.x64.Release.mch	274,740,189	-23,431	+20.96%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	135,902,387	-44,863	+160.81%
realworld.run.windows.x64.checked.mch	11,060,751	-1,196	-0.12%
smoke_tests.nativeaot.windows.x64.checked.mch	3,565,641	-1,175	-0.83%

FullOpts (-149,570 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch	8,524,465	-2,200	-1.16%
benchmarks.run_pgo.windows.x64.checked.mch	20,929,241	-8,390	+36.54%
benchmarks.run_tiered.windows.x64.checked.mch	3,279,500	-1,131	-1.25%
coreclr_tests.run.windows.x64.checked.mch	118,412,975	-53,627	+50.20%
libraries.crossgen2.windows.x64.checked.mch	44,920,652	-7,029	-0.59%
libraries.pmi.windows.x64.checked.mch	61,127,434	-6,528	+125.54%
libraries_tests.run.windows.x64.Release.mch	102,077,791	-23,431	+20.96%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	125,588,969	-44,863	+160.81%
realworld.run.windows.x64.checked.mch	10,647,587	-1,196	-0.12%
smoke_tests.nativeaot.windows.x64.checked.mch	3,565,548	-1,175	-0.83%

EgorBo · 2024-04-04T17:45:29Z

Sounds like 2 iterations are enough? 🙂 It'd be nice to have a breakdown TP report like @jakobbotsch did here for example to know what is the most expensive thing

BruceForstall · 2024-04-04T17:45:43Z

Here's a run of asmdiffs on win-x64 between JitOptRepeat 3 and 4 iterations. There's still diffs.

Diffs are based on 2,384,210 contexts (931,974 MinOpts, 1,452,236 FullOpts).

MISSED contexts: base: 8,122 (0.34%), diff: 8,157 (0.34%)

Base JIT options: JitOptRepeat#*;JitOptRepeatCount#3

Diff JIT options: JitOptRepeat#*;JitOptRepeatCount#4

Overall (-71,360 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch	8,522,626	-1,154	+0.26%
benchmarks.run_pgo.windows.x64.checked.mch	35,104,692	-3,889	+70.31%
benchmarks.run_tiered.windows.x64.checked.mch	12,443,315	-295	-0.03%
coreclr_tests.run.windows.x64.checked.mch	398,699,105	-64,162	+420.88%
libraries.crossgen2.windows.x64.checked.mch	44,914,813	-3,739	-0.10%
libraries.pmi.windows.x64.checked.mch	61,219,837	-8,283	+306.51%
libraries_tests.run.windows.x64.Release.mch	274,716,321	+913	+41.18%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	135,840,633	+11,773	+374.53%
realworld.run.windows.x64.checked.mch	11,059,555	-2,040	-0.08%
smoke_tests.nativeaot.windows.x64.checked.mch	3,564,466	-484	-0.19%

FullOpts (-71,360 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch	8,522,265	-1,154	+0.26%
benchmarks.run_pgo.windows.x64.checked.mch	20,920,851	-3,889	+70.31%
benchmarks.run_tiered.windows.x64.checked.mch	3,278,369	-295	-0.03%
coreclr_tests.run.windows.x64.checked.mch	118,359,074	-64,162	+420.88%
libraries.crossgen2.windows.x64.checked.mch	44,913,623	-3,739	-0.10%
libraries.pmi.windows.x64.checked.mch	61,106,336	-8,283	+306.51%
libraries_tests.run.windows.x64.Release.mch	102,053,923	+913	+41.18%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	125,527,215	+11,773	+374.53%
realworld.run.windows.x64.checked.mch	10,646,391	-2,040	-0.08%
smoke_tests.nativeaot.windows.x64.checked.mch	3,564,373	-484	-0.19%

BruceForstall · 2024-04-04T17:52:44Z

Here's a run of asmdiffs on win-x64 between JitOptRepeat 4 and 5 iterations.

Diffs are based on 2,384,205 contexts (931,974 MinOpts, 1,452,231 FullOpts).

MISSED contexts: base: 8,157 (0.34%), diff: 8,162 (0.34%)

Base JIT options: JitOptRepeat#*;JitOptRepeatCount#4

Diff JIT options: JitOptRepeat#*;JitOptRepeatCount#5

Overall (-14,538 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch	8,521,472	-388	-0.12%
benchmarks.run_pgo.windows.x64.checked.mch	35,100,803	-1,606	+79.30%
benchmarks.run_tiered.windows.x64.checked.mch	12,443,020	-318	-0.19%
coreclr_tests.run.windows.x64.checked.mch	398,634,943	-28,742	+118.37%
libraries.crossgen2.windows.x64.checked.mch	44,911,074	+2,441	-0.31%
libraries.pmi.windows.x64.checked.mch	61,206,674	+4,410	+746.88%
libraries_tests.run.windows.x64.Release.mch	274,717,234	+4,295	+46.76%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	135,838,676	+5,179	+577.00%
realworld.run.windows.x64.checked.mch	11,057,515	+452	+0.04%
smoke_tests.nativeaot.windows.x64.checked.mch	3,563,982	-261	-0.02%

FullOpts (-14,538 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch	8,521,111	-388	-0.12%
benchmarks.run_pgo.windows.x64.checked.mch	20,916,962	-1,606	+79.30%
benchmarks.run_tiered.windows.x64.checked.mch	3,278,074	-318	-0.19%
coreclr_tests.run.windows.x64.checked.mch	118,294,912	-28,742	+118.37%
libraries.crossgen2.windows.x64.checked.mch	44,909,884	+2,441	-0.31%
libraries.pmi.windows.x64.checked.mch	61,093,173	+4,410	+746.88%
libraries_tests.run.windows.x64.Release.mch	102,054,836	+4,295	+46.76%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	125,525,258	+5,179	+577.00%
realworld.run.windows.x64.checked.mch	10,644,351	+452	+0.04%
smoke_tests.nativeaot.windows.x64.checked.mch	3,563,889	-261	-0.02%

BruceForstall · 2024-04-04T18:00:48Z

And just "for fun", here is asmdiffs on win-x64 between JitOptRepeat 8 and 9 iterations! Yes, there are still diffs. One question: are optimization changes somehow monotonic, or is it possible the optimizations are oscillating (e.g., one iteration creates a CSE, another removes it)?

Diffs are based on 2,384,201 contexts (931,974 MinOpts, 1,452,227 FullOpts).

MISSED contexts: 8,166 (0.34%)

Base JIT options: JitOptRepeat#*;JitOptRepeatCount#8

Diff JIT options: JitOptRepeat#*;JitOptRepeatCount#9

Overall (+4,080 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch	8,520,257	-64	-0.02%
benchmarks.run_pgo.windows.x64.checked.mch	35,095,669	-227	+87.21%
benchmarks.run_tiered.windows.x64.checked.mch	12,442,539	-38	+0.00%
coreclr_tests.run.windows.x64.checked.mch	398,475,394	-1,406	+2445.47%
libraries.crossgen2.windows.x64.checked.mch	44,910,655	+1,299	-0.03%
libraries.pmi.windows.x64.checked.mch	61,193,216	+2,633	+2193.97%
libraries_tests.run.windows.x64.Release.mch	274,706,838	+3,977	+61.81%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	135,827,290	-2,776	+1920.18%
realworld.run.windows.x64.checked.mch	11,057,374	+600	+0.00%
smoke_tests.nativeaot.windows.x64.checked.mch	3,563,813	+82	+0.07%

FullOpts (+4,080 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch	8,519,896	-64	-0.02%
benchmarks.run_pgo.windows.x64.checked.mch	20,911,828	-227	+87.21%
benchmarks.run_tiered.windows.x64.checked.mch	3,277,593	-38	+0.00%
coreclr_tests.run.windows.x64.checked.mch	118,135,363	-1,406	+2445.47%
libraries.crossgen2.windows.x64.checked.mch	44,909,465	+1,299	-0.03%
libraries.pmi.windows.x64.checked.mch	61,079,715	+2,633	+2193.97%
libraries_tests.run.windows.x64.Release.mch	102,044,440	+3,977	+61.81%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	125,513,872	-2,776	+1920.18%
realworld.run.windows.x64.checked.mch	10,644,210	+600	+0.00%
smoke_tests.nativeaot.windows.x64.checked.mch	3,563,720	+82	+0.07%

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 31, 2024

dotnet-policy-service bot assigned BruceForstall Mar 31, 2024

build-analysis bot mentioned this pull request Apr 1, 2024

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

3 tasks

BruceForstall force-pushed the FixOptRepeat_ForceRepeat2InRelease branch from bea3f5f to 19c5d79 Compare April 1, 2024 22:39

BruceForstall changed the title ~~Force OptRepeat, 2 iterations. Enable in Release.~~ Force 2 iterations of OptRepeat Apr 2, 2024

Force 2 OptRepeat iterations

2c9dbda

BruceForstall force-pushed the FixOptRepeat_ForceRepeat2InRelease branch from 19c5d79 to 2c9dbda Compare April 2, 2024 02:04

BruceForstall added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Apr 2, 2024

BruceForstall closed this Apr 3, 2024

github-actions bot locked and limited conversation to collaborators May 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Force 2 iterations of OptRepeat #100473

Force 2 iterations of OptRepeat #100473

BruceForstall commented Mar 31, 2024

dotnet-policy-service bot commented Mar 31, 2024

EgorBo commented Apr 1, 2024 •

edited

Loading

BruceForstall commented Apr 1, 2024

BruceForstall commented Apr 1, 2024

BruceForstall commented Apr 1, 2024

BruceForstall commented Apr 2, 2024

BruceForstall commented Apr 2, 2024

BruceForstall commented Apr 4, 2024

EgorBo commented Apr 4, 2024

BruceForstall commented Apr 4, 2024

BruceForstall commented Apr 4, 2024

BruceForstall commented Apr 4, 2024

Force 2 iterations of OptRepeat #100473

Force 2 iterations of OptRepeat #100473

Conversation

BruceForstall commented Mar 31, 2024

dotnet-policy-service bot commented Mar 31, 2024

EgorBo commented Apr 1, 2024 • edited Loading

BruceForstall commented Apr 1, 2024

BruceForstall commented Apr 1, 2024

BruceForstall commented Apr 1, 2024

BruceForstall commented Apr 2, 2024

BruceForstall commented Apr 2, 2024

BruceForstall commented Apr 4, 2024

EgorBo commented Apr 4, 2024

BruceForstall commented Apr 4, 2024

BruceForstall commented Apr 4, 2024

BruceForstall commented Apr 4, 2024

EgorBo commented Apr 1, 2024 •

edited

Loading