JIT: Establish loop invariant base case based on IR #97182

Merged
merged 7 commits into from
Jan 31, 2024

jakobbotsch
Member

Avoid having a cross-phase dependency on loop inversion here. Instead, validate that the condition is an actual zero-trip test.

A few diffs expected due to the removal of the quirk in loop cloning; those are cases where we prove the loop invariant is trivially true in the base case.
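
For readers unfamiliar with the term, here is a minimal sketch of what a zero-trip test looks like at the source level. It is only an illustration in plain C++, not JIT code; the function names `sumWhile` and `sumInverted` are made up for this example. The guard in the second function is the zero-trip test: it runs once before the loop and establishes `i < n` on first entry, which is the base case of the loop invariant.

```cpp
#include <vector>

// Original shape: the test runs at the top of every iteration.
int sumWhile(const std::vector<int>& v, int n)
{
    int sum = 0;
    int i   = 0;
    while (i < n)
    {
        sum += v[i];
        i++;
    }
    return sum;
}

// Inverted shape: a guard (the zero-trip test) runs once before the loop,
// and the loop itself tests at the bottom.
int sumInverted(const std::vector<int>& v, int n)
{
    int sum = 0;
    int i   = 0;
    if (i < n) // zero-trip test: skips the loop entirely when it would run zero times
    {
        do
        {
            sum += v[i]; // "i < n" holds here on every iteration
            i++;
        } while (i < n);
    }
    return sum;
}
```

Both functions compute the same result; the second simply makes the entry condition explicit as a separate check ahead of the loop body.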

@ghost ghost assigned jakobbotsch Jan 18, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 18, 2024
@ghost

ghost commented Jan 18, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@ryujit-bot

Diff results for #97182

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,259,627 contexts (1,008,044 MinOpts, 1,251,583 FullOpts).

MISSED contexts: base: 1 (0.00%), diff: 2 (0.00%)

Overall (-2,772 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 79,928,956 -1,304
benchmarks.run_tiered.linux.arm64.checked.mch 22,277,428 -632
coreclr_tests.run.linux.arm64.checked.mch 509,755,544 -944
libraries.pmi.linux.arm64.checked.mch 76,286,724 +252
libraries_tests.run.linux.arm64.Release.mch 400,455,948 -156
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 165,114,004 +12
FullOpts (-2,772 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 54,380,584 -1,304
benchmarks.run_tiered.linux.arm64.checked.mch 4,938,464 -632
coreclr_tests.run.linux.arm64.checked.mch 160,847,688 -944
libraries.pmi.linux.arm64.checked.mch 76,166,740 +252
libraries_tests.run.linux.arm64.Release.mch 183,717,044 -156
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 151,616,728 +12

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,249,836 contexts (981,298 MinOpts, 1,268,538 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 1 (0.00%)

Overall (-1,606 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 69,179,263 -536
benchmarks.run_tiered.linux.x64.checked.mch 15,898,771 -679
coreclr_tests.run.linux.x64.checked.mch 403,326,592 -490
libraries.pmi.linux.x64.checked.mch 60,406,292 +291
libraries_tests.run.linux.x64.Release.mch 348,614,608 -192
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 132,684,821 +0
FullOpts (-1,606 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 47,837,218 -536
benchmarks.run_tiered.linux.x64.checked.mch 3,640,387 -679
coreclr_tests.run.linux.x64.checked.mch 123,835,431 -490
libraries.pmi.linux.x64.checked.mch 60,293,435 +291
libraries_tests.run.linux.x64.Release.mch 164,859,444 -192
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 122,067,035 +0

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,029,494 contexts (927,368 MinOpts, 1,102,126 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 1 (0.00%)

Overall (-2,852 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 34,557,412 -1,376
benchmarks.run_tiered.osx.arm64.checked.mch 15,509,064 -648
coreclr_tests.run.osx.arm64.checked.mch 483,595,744 -944
libraries.pmi.osx.arm64.checked.mch 80,212,804 +252
libraries_tests.run.osx.arm64.Release.mch 314,052,980 -148
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 163,157,008 +12
FullOpts (-2,852 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 18,184,064 -1,376
benchmarks.run_tiered.osx.arm64.checked.mch 4,004,792 -648
coreclr_tests.run.osx.arm64.checked.mch 153,422,976 -944
libraries.pmi.osx.arm64.checked.mch 80,091,676 +252
libraries_tests.run.osx.arm64.Release.mch 112,315,392 -148
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 150,003,316 +12

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,070,984 contexts (937,853 MinOpts, 1,133,131 FullOpts).

MISSED contexts: base: 1 (0.00%), diff: 5 (0.00%)

Overall (-2,836 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 46,635,420 -1,304
benchmarks.run_tiered.windows.arm64.checked.mch 15,506,668 -632
coreclr_tests.run.windows.arm64.checked.mch 496,311,484 -944
libraries.pmi.windows.arm64.checked.mch 79,839,104 +252
libraries_tests.run.windows.arm64.Release.mch 327,035,492 -220
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 171,570,108 +12
FullOpts (-2,836 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 30,377,228 -1,304
benchmarks.run_tiered.windows.arm64.checked.mch 4,328,920 -632
coreclr_tests.run.windows.arm64.checked.mch 156,637,080 -944
libraries.pmi.windows.arm64.checked.mch 79,719,120 +252
libraries_tests.run.windows.arm64.Release.mch 123,561,644 -220
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 158,416,396 +12

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,098,661 contexts (926,221 MinOpts, 1,172,440 FullOpts).

MISSED contexts: base: 1 (0.00%), diff: 3 (0.00%)

Overall (-1,662 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x64.checked.mch 35,808,303 -706
benchmarks.run_tiered.windows.x64.checked.mch 12,549,902 -635
coreclr_tests.run.windows.x64.checked.mch 392,970,471 -640
libraries.pmi.windows.x64.checked.mch 61,645,769 +507
libraries_tests.run.windows.x64.Release.mch 279,151,239 -188
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 137,561,629 +0
FullOpts (-1,662 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x64.checked.mch 21,776,222 -706
benchmarks.run_tiered.windows.x64.checked.mch 3,454,165 -635
coreclr_tests.run.windows.x64.checked.mch 120,248,493 -640
libraries.pmi.windows.x64.checked.mch 61,532,248 +507
libraries_tests.run.windows.x64.Release.mch 106,976,623 -188
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 126,635,563 +0

Details here


Assembly diffs for linux/arm ran on linux/x86

Diffs are based on 2,053,638 contexts (830,101 MinOpts, 1,223,537 FullOpts).

MISSED contexts: base: 71,236 (3.35%), diff: 71,237 (3.35%)

Overall (+226 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.arm.checked.mch 321,790,644 +0
libraries.pmi.linux.arm.checked.mch 49,833,584 +204
libraries_tests.run.linux.arm.Release.mch 244,212,716 +14
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,429,008 +8
FullOpts (+226 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.arm.checked.mch 109,318,208 +0
libraries.pmi.linux.arm.checked.mch 49,727,360 +204
libraries_tests.run.linux.arm.Release.mch 122,360,208 +14
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,358,270 +8

Assembly diffs for windows/x86 ran on linux/x86

Diffs are based on 2,291,556 contexts (838,165 MinOpts, 1,453,391 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 7 (0.00%)

Overall (+280 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries.crossgen2.windows.x86.checked.mch 31,717,704 +66
libraries.pmi.windows.x86.checked.mch 49,272,984 +220
libraries_tests.run.windows.x86.Release.mch 185,799,057 -3
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 103,837,385 -3
FullOpts (+280 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries.crossgen2.windows.x86.checked.mch 31,716,644 +66
libraries.pmi.windows.x86.checked.mch 49,177,751 +220
libraries_tests.run.windows.x86.Release.mch 88,499,017 -3
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 95,157,324 -3

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch -0.01%
MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch -0.02%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch -0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch -0.02%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
benchmarks.run_tiered.osx.arm64.checked.mch -0.01%
FullOpts (-0.03% to +0.00%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
benchmarks.run_tiered.osx.arm64.checked.mch -0.03%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_pgo.windows.arm64.checked.mch -0.01%
benchmarks.run_tiered.windows.arm64.checked.mch -0.01%
MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_pgo.windows.arm64.checked.mch -0.01%
benchmarks.run_tiered.windows.arm64.checked.mch -0.02%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.windows.x64.checked.mch -0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.windows.x64.checked.mch -0.02%

Details here


Throughput diffs for linux/arm64 ran on linux/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch -0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch -0.02%

Throughput diffs for linux/x64 ran on linux/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch -0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch -0.02%

Details here


@jakobbotsch jakobbotsch marked this pull request as ready for review January 29, 2024 14:02
@jakobbotsch
Member Author

jakobbotsch commented Jan 29, 2024

cc @dotnet/jit-contrib PTAL @AndyAyersMS since Bruce is out.

This passed CI but I misclicked the "Update branch" button instead of "Ready for review", so hopefully CI will pass again...

The diffs are, as mentioned above, due to the removal of the quirk in loop cloning. With IsZeroTripTest we can now prove some new IV structures legal, namely some loops with constant upper bounds that we do not invert.
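
As a rough illustration of a base case that is "trivially true": if both the initial IV value and the limit are constants, the first evaluation of the loop test can be decided up front, even without a guard block. The sketch below is hypothetical and is not the JIT's implementation; `RelOp` and `BaseCaseHoldsForConstants` are names invented for this example, and the assumption that the initial value is also a constant is mine.

```cpp
#include <cstdint>

// Hypothetical relational operators for a loop test of the form "iv <op> limit".
enum class RelOp
{
    LT,
    LE,
    GT,
    GE,
    NE
};

// Hypothetical helper: evaluates "init <op> limit" when both are known constants.
// A true result means the loop test holds on first entry, so no separate
// zero-trip test is needed to establish the invariant's base case.
bool BaseCaseHoldsForConstants(int64_t init, RelOp op, int64_t limit)
{
    switch (op)
    {
        case RelOp::LT:
            return init < limit;
        case RelOp::LE:
            return init <= limit;
        case RelOp::GT:
            return init > limit;
        case RelOp::GE:
            return init >= limit;
        case RelOp::NE:
            return init != limit;
    }
    return false; // unknown operator: be conservative
}
```

For a loop such as `for (int i = 0; i < 10; i++)`, this amounts to checking `0 < 10`.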

@ryujit-bot

Diff results for #97182

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,053,638 contexts (830,101 MinOpts, 1,223,537 FullOpts).

MISSED contexts: base: 71,236 (3.35%), diff: 71,237 (3.35%)

Overall (+226 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.arm.checked.mch 321,790,644 +0
libraries.pmi.linux.arm.checked.mch 49,833,584 +204
libraries_tests.run.linux.arm.Release.mch 244,212,716 +14
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,429,008 +8
FullOpts (+226 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.arm.checked.mch 109,318,208 +0
libraries.pmi.linux.arm.checked.mch 49,727,360 +204
libraries_tests.run.linux.arm.Release.mch 122,360,208 +14
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,358,270 +8

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,291,556 contexts (838,165 MinOpts, 1,453,391 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 7 (0.00%)

Overall (+280 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries.crossgen2.windows.x86.checked.mch 31,717,704 +66
libraries.pmi.windows.x86.checked.mch 49,272,984 +220
libraries_tests.run.windows.x86.Release.mch 185,799,057 -3
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 103,837,385 -3
FullOpts (+280 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries.crossgen2.windows.x86.checked.mch 31,716,644 +66
libraries.pmi.windows.x86.checked.mch 49,177,751 +220
libraries_tests.run.windows.x86.Release.mch 88,499,017 -3
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 95,157,324 -3

Details here


Member

@AndyAyersMS AndyAyersMS left a comment

LGTM, feel free to address comments subsequently.

return true;
}

for (FlowEdge* enterEdge : EntryEdges())
Member

If this is handling a preheader, can you add a comment? Can do this in a follow-up.

Also we'd expect that entering is BBJ_ALWAYS

The check in optExtractInitTestIncr is still using bbFallsThrough so I wonder if we're missing some cases from that.

Member Author

If this is handling a preheader, can you add a comment? Can do this in a follow-up.

Also we'd expect that entering is BBJ_ALWAYS

The check in optExtractInitTestIncr is still using bbFallsThrough so I wonder if we're missing some cases from that.

Currently FlowGraphNaturalLoop::AnalyzeIteration is general enough to be used before preheaders have been created. We don't actually use it at that point, but I still kept this similarly general (and matching optExtractInitTestIncr).
We could probably clean up these two methods at the same time. I agree we should also generalize optExtractInitTestIncr, in particular to handle loops we did not invert. We could also consider using dominators to try harder to prove the "loop invariant" to be true when initially entered, since I think we usually have them available in the places that call FlowGraphNaturalLoop::AnalyzeIteration, so we could do something similar to RBO here.

I'll add a comment. I also just noticed that I forgot the equality check on the limits, so need to add that too.
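
To make the checks discussed here concrete, the following is a simplified model of validating that a conditional entering block really is a zero-trip test for the loop. It is a sketch under stated assumptions, not the code in this PR: `Compare`, `GuardBlock`, `Reverse`, and `IsZeroTripTestSketch` are hypothetical names, and the real check works on JIT IR rather than on these structs. It covers three points raised in this thread: which side of the branch enters the loop, whether the guard tests the same local with the same operator, and the equality check on the limits.

```cpp
#include <cstdint>

enum class RelOp
{
    LT,
    LE,
    GT,
    GE,
    EQ,
    NE
};

// Hypothetical model of a compare "V<lclNum> <op> limit".
struct Compare
{
    int     lclNum;
    RelOp   op;
    int64_t limit;
};

// Hypothetical model of a conditional block that may enter the loop.
struct GuardBlock
{
    Compare cond;
    bool    entersLoopOnTrue; // false: the loop is entered when cond is false
};

// Logical negation of a relational operator.
RelOp Reverse(RelOp op)
{
    switch (op)
    {
        case RelOp::LT:
            return RelOp::GE;
        case RelOp::LE:
            return RelOp::GT;
        case RelOp::GT:
            return RelOp::LE;
        case RelOp::GE:
            return RelOp::LT;
        case RelOp::EQ:
            return RelOp::NE;
        case RelOp::NE:
            return RelOp::EQ;
    }
    return op;
}

// The guard is treated as a zero-trip test only if the condition that holds
// on loop entry is exactly the loop's own test.
bool IsZeroTripTestSketch(const GuardBlock& guard, const Compare& loopTest)
{
    Compare entryCond = guard.cond;
    if (!guard.entersLoopOnTrue)
    {
        // Entering on the false edge means the negated condition holds on entry.
        entryCond.op = Reverse(entryCond.op);
    }

    return (entryCond.lclNum == loopTest.lclNum) && (entryCond.op == loopTest.op) &&
           (entryCond.limit == loopTest.limit); // the equality check on the limits
}
```

A guard of `V01 >= 10` that enters the loop on its false edge is accepted for a loop test of `V01 < 10`, while a guard of `V01 < 20` is rejected because the limits differ.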

}

//------------------------------------------------------------------------
// CondInitBlockEnterSide: Determine whether a BBJ_COND init block enters the
Member

Naming nit -- from the name I thought initially this was checking for side entries into loops.

Maybe something like InitBlockEntersLoopOnTrue?

Member Author

Renamed it.

@ryujit-bot

Diff results for #97182

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,259,627 contexts (1,008,044 MinOpts, 1,251,583 FullOpts).

MISSED contexts: base: 1 (0.00%), diff: 2 (0.00%)

Overall (-2,772 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 79,930,032 -1,304
benchmarks.run_tiered.linux.arm64.checked.mch 22,277,444 -632
coreclr_tests.run.linux.arm64.checked.mch 509,755,716 -944
libraries.pmi.linux.arm64.checked.mch 76,286,868 +252
libraries_tests.run.linux.arm64.Release.mch 400,456,432 -156
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 165,114,112 +12
FullOpts (-2,772 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 54,381,660 -1,304
benchmarks.run_tiered.linux.arm64.checked.mch 4,938,480 -632
coreclr_tests.run.linux.arm64.checked.mch 160,847,860 -944
libraries.pmi.linux.arm64.checked.mch 76,166,884 +252
libraries_tests.run.linux.arm64.Release.mch 183,717,528 -156
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 151,616,836 +12

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,249,836 contexts (981,298 MinOpts, 1,268,538 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 1 (0.00%)

Overall (-1,606 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 69,189,191 -536
benchmarks.run_tiered.linux.x64.checked.mch 15,898,651 -679
coreclr_tests.run.linux.x64.checked.mch 403,326,918 -490
libraries.pmi.linux.x64.checked.mch 60,406,427 +291
libraries_tests.run.linux.x64.Release.mch 348,617,418 -192
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 132,685,567 +0
FullOpts (-1,606 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 47,847,146 -536
benchmarks.run_tiered.linux.x64.checked.mch 3,640,267 -679
coreclr_tests.run.linux.x64.checked.mch 123,835,757 -490
libraries.pmi.linux.x64.checked.mch 60,293,570 +291
libraries_tests.run.linux.x64.Release.mch 164,862,254 -192
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 122,067,781 +0

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,029,494 contexts (927,368 MinOpts, 1,102,126 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 1 (0.00%)

Overall (-2,852 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 34,558,040 -1,376
benchmarks.run_tiered.osx.arm64.checked.mch 15,509,076 -648
coreclr_tests.run.osx.arm64.checked.mch 483,595,856 -944
libraries.pmi.osx.arm64.checked.mch 80,212,944 +252
libraries_tests.run.osx.arm64.Release.mch 314,053,392 -148
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 163,157,116 +12
FullOpts (-2,852 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 18,184,692 -1,376
benchmarks.run_tiered.osx.arm64.checked.mch 4,004,804 -648
coreclr_tests.run.osx.arm64.checked.mch 153,423,088 -944
libraries.pmi.osx.arm64.checked.mch 80,091,816 +252
libraries_tests.run.osx.arm64.Release.mch 112,315,804 -148
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 150,003,424 +12

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,070,984 contexts (937,853 MinOpts, 1,133,131 FullOpts).

MISSED contexts: base: 1 (0.00%), diff: 5 (0.00%)

Overall (-2,836 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 46,636,208 -1,304
benchmarks.run_tiered.windows.arm64.checked.mch 15,506,676 -632
coreclr_tests.run.windows.arm64.checked.mch 496,311,480 -944
libraries.pmi.windows.arm64.checked.mch 79,839,096 +252
libraries_tests.run.windows.arm64.Release.mch 327,034,696 -220
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 171,570,192 +12
FullOpts (-2,836 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 30,378,016 -1,304
benchmarks.run_tiered.windows.arm64.checked.mch 4,328,928 -632
coreclr_tests.run.windows.arm64.checked.mch 156,637,076 -944
libraries.pmi.windows.arm64.checked.mch 79,719,112 +252
libraries_tests.run.windows.arm64.Release.mch 123,560,848 -220
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 158,416,480 +12

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,098,661 contexts (926,221 MinOpts, 1,172,440 FullOpts).

MISSED contexts: base: 1 (0.00%), diff: 3 (0.00%)

Overall (-1,662 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x64.checked.mch 35,812,172 -706
benchmarks.run_tiered.windows.x64.checked.mch 12,549,776 -635
coreclr_tests.run.windows.x64.checked.mch 392,970,524 -640
libraries.pmi.windows.x64.checked.mch 61,645,679 +507
libraries_tests.run.windows.x64.Release.mch 279,155,563 -188
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 137,562,577 +0
FullOpts (-1,662 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x64.checked.mch 21,780,091 -706
benchmarks.run_tiered.windows.x64.checked.mch 3,454,039 -635
coreclr_tests.run.windows.x64.checked.mch 120,248,546 -640
libraries.pmi.windows.x64.checked.mch 61,532,158 +507
libraries_tests.run.windows.x64.Release.mch 106,980,947 -188
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 126,636,511 +0

Details here


Assembly diffs for linux/arm ran on linux/x86

Diffs are based on 2,053,638 contexts (830,101 MinOpts, 1,223,537 FullOpts).

MISSED contexts: base: 71,236 (3.35%), diff: 71,237 (3.35%)

Overall (+226 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.arm.checked.mch 321,790,690 +0
libraries.pmi.linux.arm.checked.mch 49,833,712 +204
libraries_tests.run.linux.arm.Release.mch 244,212,956 +14
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,429,126 +8
FullOpts (+226 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.arm.checked.mch 109,318,254 +0
libraries.pmi.linux.arm.checked.mch 49,727,488 +204
libraries_tests.run.linux.arm.Release.mch 122,360,448 +14
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,358,388 +8

Assembly diffs for windows/x86 ran on linux/x86

Diffs are based on 2,291,532 contexts (838,165 MinOpts, 1,453,367 FullOpts).

MISSED contexts: base: 24 (0.00%), diff: 31 (0.00%)

Overall (+280 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries.crossgen2.windows.x86.checked.mch 31,713,486 +66
libraries.pmi.windows.x86.checked.mch 49,264,590 +220
libraries_tests.run.windows.x86.Release.mch 185,796,587 -3
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 103,830,242 -3
FullOpts (+280 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries.crossgen2.windows.x86.checked.mch 31,712,426 +66
libraries.pmi.windows.x86.checked.mch 49,169,357 +220
libraries_tests.run.windows.x86.Release.mch 88,496,547 -3
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 95,150,181 -3

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch -0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch -0.02%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch -0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch -0.02%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
benchmarks.run_tiered.osx.arm64.checked.mch -0.01%
FullOpts (-0.03% to +0.00%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch -0.01%
benchmarks.run_tiered.osx.arm64.checked.mch -0.03%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_pgo.windows.arm64.checked.mch -0.01%
benchmarks.run_tiered.windows.arm64.checked.mch -0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_pgo.windows.arm64.checked.mch -0.01%
benchmarks.run_tiered.windows.arm64.checked.mch -0.02%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.windows.x64.checked.mch -0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.windows.x64.checked.mch -0.02%

Details here


Throughput diffs for linux/arm64 ran on linux/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch -0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch -0.02%

Throughput diffs for linux/x64 ran on linux/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch -0.01%
FullOpts (-0.02% to +0.00%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch -0.02%

Details here


@ryujit-bot

Diff results for #97182

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,053,638 contexts (830,101 MinOpts, 1,223,537 FullOpts).

MISSED contexts: base: 71,236 (3.35%), diff: 71,237 (3.35%)

Overall (+226 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.arm.checked.mch 321,790,690 +0
libraries.pmi.linux.arm.checked.mch 49,833,712 +204
libraries_tests.run.linux.arm.Release.mch 244,212,956 +14
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,429,126 +8
FullOpts (+226 bytes)
Collection Base size (bytes) Diff size (bytes)
coreclr_tests.run.linux.arm.checked.mch 109,318,254 +0
libraries.pmi.linux.arm.checked.mch 49,727,488 +204
libraries_tests.run.linux.arm.Release.mch 122,360,448 +14
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,358,388 +8

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,291,532 contexts (838,165 MinOpts, 1,453,367 FullOpts).

MISSED contexts: base: 24 (0.00%), diff: 31 (0.00%)

Overall (+280 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries.crossgen2.windows.x86.checked.mch 31,713,486 +66
libraries.pmi.windows.x86.checked.mch 49,264,590 +220
libraries_tests.run.windows.x86.Release.mch 185,796,587 -3
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 103,830,242 -3
FullOpts (+280 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries.crossgen2.windows.x86.checked.mch 31,712,426 +66
libraries.pmi.windows.x86.checked.mch 49,169,357 +220
libraries_tests.run.windows.x86.Release.mch 88,496,547 -3
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 95,150,181 -3

Details here


@jakobbotsch
Member Author

Looks like I have some failures to investigate (probably some kind of conflict with #97488).

@jakobbotsch
Member Author

jakobbotsch commented Jan 29, 2024

Hmm, well, the issue seems to have been introduced by #97488. optExtractInitTestIncr needs to be updated, and presumably BasicBlock::bbFallsThrough() does as well -- it still returns true for BBJ_COND. optExtractInitTestIncr is currently picking illegal init blocks.
cc @amanasifkhalid

@ryujit-bot

Diff results for #97182

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,259,462 contexts (1,008,044 MinOpts, 1,251,418 FullOpts).

MISSED contexts: base: 159 (0.01%), diff: 160 (0.01%)

Overall (+42,496 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 14,972,060 +136
coreclr_tests.run.linux.arm64.checked.mch 509,739,552 +15,164
libraries.pmi.linux.arm64.checked.mch 76,280,108 +252
libraries_tests.run.linux.arm64.Release.mch 400,018,564 +26,788
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 165,109,668 +12
realworld.run.linux.arm64.checked.mch 15,918,288 +144
FullOpts (+42,496 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 14,712,308 +136
coreclr_tests.run.linux.arm64.checked.mch 160,831,696 +15,164
libraries.pmi.linux.arm64.checked.mch 76,160,124 +252
libraries_tests.run.linux.arm64.Release.mch 183,279,660 +26,788
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 151,612,392 +12
realworld.run.linux.arm64.checked.mch 15,336,748 +144

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,249,694 contexts (981,298 MinOpts, 1,268,396 FullOpts).

MISSED contexts: base: 134 (0.01%), diff: 135 (0.01%)

Overall (+36,024 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.x64.checked.mch 13,722,967 +71
benchmarks.run_pgo.linux.x64.checked.mch 69,144,788 +714
coreclr_tests.run.linux.x64.checked.mch 403,316,142 +12,690
libraries.pmi.linux.x64.checked.mch 60,405,130 +291
libraries_tests.run.linux.x64.Release.mch 348,249,622 +22,187
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 132,683,678 +0
realworld.run.linux.x64.checked.mch 13,212,110 +71
FullOpts (+36,024 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.x64.checked.mch 13,459,034 +71
benchmarks.run_pgo.linux.x64.checked.mch 47,802,743 +714
coreclr_tests.run.linux.x64.checked.mch 123,824,981 +12,690
libraries.pmi.linux.x64.checked.mch 60,292,273 +291
libraries_tests.run.linux.x64.Release.mch 164,494,458 +22,187
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 122,065,892 +0
realworld.run.linux.x64.checked.mch 12,823,228 +71

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,029,378 contexts (927,368 MinOpts, 1,102,010 FullOpts).

MISSED contexts: base: 109 (0.01%), diff: 110 (0.01%)

Overall (+28,984 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.osx.arm64.checked.mch 11,184,380 +136
coreclr_tests.run.osx.arm64.checked.mch 483,585,340 +10,140
libraries.pmi.osx.arm64.checked.mch 80,206,160 +396
libraries_tests.run.osx.arm64.Release.mch 313,700,180 +18,156
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 163,152,648 +12
realworld.run.osx.arm64.checked.mch 15,075,948 +144
FullOpts (+28,984 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.osx.arm64.checked.mch 11,183,752 +136
coreclr_tests.run.osx.arm64.checked.mch 153,412,572 +10,140
libraries.pmi.osx.arm64.checked.mch 80,085,032 +396
libraries_tests.run.osx.arm64.Release.mch 111,962,592 +18,156
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 149,998,956 +12
realworld.run.osx.arm64.checked.mch 14,511,996 +144

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,070,840 contexts (937,853 MinOpts, 1,132,987 FullOpts).

MISSED contexts: base: 139 (0.01%), diff: 143 (0.01%)

Overall (+36,936 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,965,692 +144
benchmarks.run_tiered.windows.arm64.checked.mch 15,506,140 +64
coreclr_tests.run.windows.arm64.checked.mch 496,297,948 +14,168
libraries.pmi.windows.arm64.checked.mch 79,832,212 +252
libraries_tests.run.windows.arm64.Release.mch 326,696,628 +22,152
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 171,564,392 +12
realworld.run.windows.arm64.checked.mch 15,891,320 +144
FullOpts (+36,936 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,965,156 +144
benchmarks.run_tiered.windows.arm64.checked.mch 4,328,392 +64
coreclr_tests.run.windows.arm64.checked.mch 156,623,544 +14,168
libraries.pmi.windows.arm64.checked.mch 79,712,228 +252
libraries_tests.run.windows.arm64.Release.mch 123,222,780 +22,152
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 158,410,680 +12
realworld.run.windows.arm64.checked.mch 15,327,340 +144

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,098,518 contexts (926,221 MinOpts, 1,172,297 FullOpts).

MISSED contexts: base: 138 (0.01%), diff: 140 (0.01%)

Overall (+33,203 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x64.checked.mch 8,736,891 +119
benchmarks.run_pgo.windows.x64.checked.mch 35,778,033 +710
coreclr_tests.run.windows.x64.checked.mch 392,964,023 +12,721
libraries.pmi.windows.x64.checked.mch 61,645,095 +507
libraries_tests.run.windows.x64.Release.mch 278,843,071 +19,044
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 137,560,760 +0
realworld.run.windows.x64.checked.mch 14,184,922 +102
FullOpts (+33,203 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x64.checked.mch 8,736,528 +119
benchmarks.run_pgo.windows.x64.checked.mch 21,745,952 +710
coreclr_tests.run.windows.x64.checked.mch 120,242,045 +12,721
libraries.pmi.windows.x64.checked.mch 61,531,574 +507
libraries_tests.run.windows.x64.Release.mch 106,668,455 +19,044
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 126,634,694 +0
realworld.run.windows.x64.checked.mch 13,798,313 +102


Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,053,494 contexts (830,101 MinOpts, 1,223,393 FullOpts).

MISSED contexts: base: 71,368 (3.36%), diff: 71,369 (3.36%)

Overall (+34,456 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 14,772,250 +140
benchmarks.run_pgo.linux.arm.checked.mch 68,605,354 +118
benchmarks.run_tiered.linux.arm.checked.mch 18,108,580 +104
coreclr_tests.run.linux.arm.checked.mch 321,785,900 +10,384
libraries.pmi.linux.arm.checked.mch 49,828,568 +204
libraries_tests.run.linux.arm.Release.mch 244,096,990 +23,358
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,425,864 +8
realworld.run.linux.arm.checked.mch 13,618,802 +140
FullOpts (+34,456 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 14,410,744 +140
benchmarks.run_pgo.linux.arm.checked.mch 55,931,552 +118
benchmarks.run_tiered.linux.arm.checked.mch 10,724,598 +104
coreclr_tests.run.linux.arm.checked.mch 109,313,464 +10,384
libraries.pmi.linux.arm.checked.mch 49,722,344 +204
libraries_tests.run.linux.arm.Release.mch 122,244,482 +23,358
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,355,126 +8
realworld.run.linux.arm.checked.mch 13,183,502 +140

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,290,734 contexts (838,165 MinOpts, 1,452,569 FullOpts).

MISSED contexts: base: 808 (0.04%), diff: 815 (0.04%)

Overall (+28,904 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x86.checked.mch 7,120,212 +80
benchmarks.run_pgo.windows.x86.checked.mch 45,137,426 +81
benchmarks.run_tiered.windows.x86.checked.mch 9,472,425 +86
coreclr_tests.run.windows.x86.checked.mch 309,367,285 +10,748
libraries.crossgen2.windows.x86.checked.mch 31,673,137 +116
libraries.pmi.windows.x86.checked.mch 49,151,797 +220
libraries_tests.run.windows.x86.Release.mch 184,751,383 +17,496
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 103,727,320 -3
realworld.run.windows.x86.checked.mch 11,283,258 +80
FullOpts (+28,904 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x86.checked.mch 7,119,931 +80
benchmarks.run_pgo.windows.x86.checked.mch 38,525,060 +81
benchmarks.run_tiered.windows.x86.checked.mch 5,202,833 +86
coreclr_tests.run.windows.x86.checked.mch 107,571,708 +10,748
libraries.crossgen2.windows.x86.checked.mch 31,672,077 +116
libraries.pmi.windows.x86.checked.mch 49,056,564 +220
libraries_tests.run.windows.x86.Release.mch 87,451,343 +17,496
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 95,047,259 -3
realworld.run.windows.x86.checked.mch 10,987,544 +80


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.00% to +0.02%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch +0.01%
libraries_tests.run.linux.arm64.Release.mch +0.02%
MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%
FullOpts (+0.00% to +0.03%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch +0.03%
libraries_tests.run.linux.arm64.Release.mch +0.03%

Throughput diffs for linux/x64 ran on windows/x64

Overall (+0.00% to +0.03%)
Collection PDIFF
coreclr_tests.run.linux.x64.checked.mch +0.02%
libraries_tests.run.linux.x64.Release.mch +0.03%
FullOpts (+0.00% to +0.03%)
Collection PDIFF
coreclr_tests.run.linux.x64.checked.mch +0.03%
libraries_tests.run.linux.x64.Release.mch +0.03%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.00% to +0.02%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.02%
FullOpts (+0.00% to +0.03%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.02%
libraries_tests.run.osx.arm64.Release.mch +0.03%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.00% to +0.03%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
coreclr_tests.run.windows.arm64.checked.mch +0.01%
libraries_tests.run.windows.arm64.Release.mch +0.03%
MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%
FullOpts (+0.00% to +0.04%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
benchmarks.run_tiered.windows.arm64.checked.mch +0.01%
coreclr_tests.run.windows.arm64.checked.mch +0.02%
libraries_tests.run.windows.arm64.Release.mch +0.04%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.00% to +0.03%)
Collection PDIFF
coreclr_tests.run.windows.x64.checked.mch +0.02%
libraries_tests.run.windows.x64.Release.mch +0.03%
FullOpts (-0.00% to +0.04%)
Collection PDIFF
coreclr_tests.run.windows.x64.checked.mch +0.03%
libraries_tests.run.windows.x64.Release.mch +0.04%


Throughput diffs for linux/arm ran on windows/x86

Overall (+0.00% to +0.03%)
Collection PDIFF
coreclr_tests.run.linux.arm.checked.mch +0.01%
libraries_tests.run.linux.arm.Release.mch +0.03%
FullOpts (+0.00% to +0.04%)
Collection PDIFF
coreclr_tests.run.linux.arm.checked.mch +0.02%
libraries_tests.run.linux.arm.Release.mch +0.04%

Throughput diffs for windows/x86 ran on windows/x86

Overall (+0.00% to +0.04%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.01%
benchmarks.run_tiered.windows.x86.checked.mch +0.01%
coreclr_tests.run.windows.x86.checked.mch +0.02%
libraries_tests.run.windows.x86.Release.mch +0.04%
FullOpts (+0.00% to +0.05%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.01%
benchmarks.run_tiered.windows.x86.checked.mch +0.01%
coreclr_tests.run.windows.x86.checked.mch +0.03%
libraries_tests.run.windows.x86.Release.mch +0.05%


Throughput diffs for linux/arm64 ran on linux/x64

Overall (+0.00% to +0.02%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch +0.01%
libraries_tests.run.linux.arm64.Release.mch +0.02%
FullOpts (+0.00% to +0.03%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch +0.02%
libraries_tests.run.linux.arm64.Release.mch +0.03%

Throughput diffs for linux/x64 ran on linux/x64

Overall (+0.00% to +0.03%)
Collection PDIFF
smoke_tests.nativeaot.linux.x64.checked.mch +0.01%
coreclr_tests.run.linux.x64.checked.mch +0.02%
libraries_tests.run.linux.x64.Release.mch +0.03%
FullOpts (+0.00% to +0.03%)
Collection PDIFF
smoke_tests.nativeaot.linux.x64.checked.mch +0.01%
coreclr_tests.run.linux.x64.checked.mch +0.03%
libraries_tests.run.linux.x64.Release.mch +0.03%


@ryujit-bot

Diff results for #97182

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.00% to +0.02%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch +0.01%
libraries_tests.run.linux.arm64.Release.mch +0.02%
FullOpts (-0.01% to +0.03%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.02%
libraries_tests.run.linux.arm64.Release.mch +0.03%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.01% to +0.03%)
Collection PDIFF
coreclr_tests.run.linux.x64.checked.mch +0.02%
libraries_tests.run.linux.x64.Release.mch +0.03%
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%
FullOpts (-0.01% to +0.04%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch -0.01%
coreclr_tests.run.linux.x64.checked.mch +0.03%
libraries_tests.run.linux.x64.Release.mch +0.04%
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.00% to +0.02%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.02%
FullOpts (-0.01% to +0.03%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
benchmarks.run_tiered.osx.arm64.checked.mch -0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.02%
libraries_tests.run.osx.arm64.Release.mch +0.03%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.01% to +0.03%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
coreclr_tests.run.windows.arm64.checked.mch +0.01%
libraries_tests.run.windows.arm64.Release.mch +0.03%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.01%
FullOpts (-0.01% to +0.04%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
coreclr_tests.run.windows.arm64.checked.mch +0.02%
libraries_tests.run.windows.arm64.Release.mch +0.04%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.01%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.00% to +0.03%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.01%
benchmarks.run.windows.x64.checked.mch +0.01%
coreclr_tests.run.windows.x64.checked.mch +0.02%
libraries_tests.run.windows.x64.Release.mch +0.03%
FullOpts (-0.00% to +0.04%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.01%
benchmarks.run.windows.x64.checked.mch +0.01%
coreclr_tests.run.windows.x64.checked.mch +0.03%
libraries_tests.run.windows.x64.Release.mch +0.04%


@ryujit-bot

Diff results for #97182

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,507,309 contexts (1,007,092 MinOpts, 1,500,217 FullOpts).

MISSED contexts: base: 8 (0.00%), diff: 1 (0.00%)

Overall (+40,840 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 15,557,684 +144
benchmarks.run_pgo.linux.arm64.checked.mch 80,094,232 -60
benchmarks.run_tiered.linux.arm64.checked.mch 24,601,160 -208
coreclr_tests.run.linux.arm64.checked.mch 508,772,900 +14,648
libraries.crossgen2.linux.arm64.checked.mch 55,844,108 +248
libraries.pmi.linux.arm64.checked.mch 76,294,184 +252
libraries_tests.run.linux.arm64.Release.mch 395,688,828 +25,700
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 165,003,208 +12
realworld.run.linux.arm64.checked.mch 15,903,660 +144
smoke_tests.nativeaot.linux.arm64.checked.mch 2,946,792 -40
FullOpts (+40,840 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 15,252,756 +144
benchmarks.run_pgo.linux.arm64.checked.mch 54,159,076 -60
benchmarks.run_tiered.linux.arm64.checked.mch 4,862,504 -208
coreclr_tests.run.linux.arm64.checked.mch 160,583,988 +14,648
libraries.crossgen2.linux.arm64.checked.mch 55,842,472 +248
libraries.pmi.linux.arm64.checked.mch 76,174,200 +252
libraries_tests.run.linux.arm64.Release.mch 180,556,876 +25,700
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 151,505,744 +12
realworld.run.linux.arm64.checked.mch 15,322,736 +144
smoke_tests.nativeaot.linux.arm64.checked.mch 2,945,804 -40

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,517,900 contexts (991,070 MinOpts, 1,526,830 FullOpts).

MISSED contexts: base: 8 (0.00%), diff: 1 (0.00%)

Overall (+36,557 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.x64.checked.mch 14,336,796 +99
benchmarks.run_pgo.linux.x64.checked.mch 71,590,604 +663
benchmarks.run_tiered.linux.x64.checked.mch 21,435,743 -218
coreclr_tests.run.linux.x64.checked.mch 403,710,882 +12,598
libraries.crossgen2.linux.x64.checked.mch 38,727,192 +237
libraries.pmi.linux.x64.checked.mch 60,419,372 +291
libraries_tests.run.linux.x64.Release.mch 337,107,943 +22,926
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 132,558,366 +0
realworld.run.linux.x64.checked.mch 13,175,050 +71
smoke_tests.nativeaot.linux.x64.checked.mch 4,234,485 -110
FullOpts (+36,557 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.x64.checked.mch 14,037,493 +99
benchmarks.run_pgo.linux.x64.checked.mch 47,790,615 +663
benchmarks.run_tiered.linux.x64.checked.mch 3,694,963 -218
coreclr_tests.run.linux.x64.checked.mch 123,956,180 +12,598
libraries.crossgen2.linux.x64.checked.mch 38,725,994 +237
libraries.pmi.linux.x64.checked.mch 60,306,515 +291
libraries_tests.run.linux.x64.Release.mch 153,348,250 +22,926
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 121,940,598 +0
realworld.run.linux.x64.checked.mch 12,789,166 +71
smoke_tests.nativeaot.linux.x64.checked.mch 4,233,536 -110

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,270,860 contexts (932,669 MinOpts, 1,338,191 FullOpts).

MISSED contexts: base: 9 (0.00%), diff: 2 (0.00%)

Overall (+28,752 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.osx.arm64.checked.mch 11,186,448 +136
benchmarks.run_pgo.osx.arm64.checked.mch 34,437,920 -60
benchmarks.run_tiered.osx.arm64.checked.mch 15,516,336 -208
coreclr_tests.run.osx.arm64.checked.mch 486,460,744 +10,180
libraries.crossgen2.osx.arm64.checked.mch 55,725,580 +248
libraries.pmi.osx.arm64.checked.mch 80,219,132 +396
libraries_tests.run.osx.arm64.Release.mch 324,580,644 +17,904
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 162,573,872 +12
realworld.run.osx.arm64.checked.mch 15,061,040 +144
FullOpts (+28,752 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.osx.arm64.checked.mch 11,185,912 +136
benchmarks.run_pgo.osx.arm64.checked.mch 18,136,248 -60
benchmarks.run_tiered.osx.arm64.checked.mch 4,011,632 -208
coreclr_tests.run.osx.arm64.checked.mch 153,807,060 +10,180
libraries.crossgen2.osx.arm64.checked.mch 55,723,952 +248
libraries.pmi.osx.arm64.checked.mch 80,098,004 +396
libraries_tests.run.osx.arm64.Release.mch 120,864,796 +17,904
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 149,420,144 +12
realworld.run.osx.arm64.checked.mch 14,497,084 +144

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,341,100 contexts (938,449 MinOpts, 1,402,651 FullOpts).

MISSED contexts: base: 8 (0.00%), diff: 9 (0.00%)

Overall (+36,536 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,965,984 +144
benchmarks.run_pgo.windows.arm64.checked.mch 45,573,984 -60
benchmarks.run_tiered.windows.arm64.checked.mch 15,587,216 -144
coreclr_tests.run.windows.arm64.checked.mch 495,312,096 +12,992
libraries.crossgen2.windows.arm64.checked.mch 59,069,324 +248
libraries.pmi.windows.arm64.checked.mch 79,845,240 +252
libraries_tests.run.windows.arm64.Release.mch 330,792,848 +23,028
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 171,580,828 +12
realworld.run.windows.arm64.checked.mch 15,904,740 +144
smoke_tests.nativeaot.windows.arm64.checked.mch 3,970,172 -80
FullOpts (+36,536 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,965,448 +144
benchmarks.run_pgo.windows.arm64.checked.mch 29,562,216 -60
benchmarks.run_tiered.windows.arm64.checked.mch 4,409,808 -144
coreclr_tests.run.windows.arm64.checked.mch 156,582,232 +12,992
libraries.crossgen2.windows.arm64.checked.mch 59,067,688 +248
libraries.pmi.windows.arm64.checked.mch 79,725,256 +252
libraries_tests.run.windows.arm64.Release.mch 127,359,252 +23,028
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 158,427,080 +12
realworld.run.windows.arm64.checked.mch 15,340,760 +144
smoke_tests.nativeaot.windows.arm64.checked.mch 3,969,160 -80

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,512,201 contexts (997,391 MinOpts, 1,514,810 FullOpts).

MISSED contexts: base: 8 (0.00%), diff: 3 (0.00%)

Overall (+34,541 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 47,041,738 +622
benchmarks.run.windows.x64.checked.mch 8,742,508 +102
benchmarks.run_pgo.windows.x64.checked.mch 36,236,407 +642
benchmarks.run_tiered.windows.x64.checked.mch 12,416,111 -110
coreclr_tests.run.windows.x64.checked.mch 393,193,248 +12,446
libraries.crossgen2.windows.x64.checked.mch 39,485,973 +438
libraries.pmi.windows.x64.checked.mch 61,661,946 +507
libraries_tests.run.windows.x64.Release.mch 282,113,061 +19,689
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 137,067,068 +0
realworld.run.windows.x64.checked.mch 14,130,922 +102
smoke_tests.nativeaot.windows.x64.checked.mch 5,083,011 +103
FullOpts (+34,541 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 28,550,689 +622
benchmarks.run.windows.x64.checked.mch 8,742,145 +102
benchmarks.run_pgo.windows.x64.checked.mch 22,065,751 +642
benchmarks.run_tiered.windows.x64.checked.mch 3,316,872 -110
coreclr_tests.run.windows.x64.checked.mch 120,404,394 +12,446
libraries.crossgen2.windows.x64.checked.mch 39,484,786 +438
libraries.pmi.windows.x64.checked.mch 61,548,425 +507
libraries_tests.run.windows.x64.Release.mch 106,254,743 +19,689
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 126,447,962 +0
realworld.run.windows.x64.checked.mch 13,744,313 +102
smoke_tests.nativeaot.windows.x64.checked.mch 5,082,064 +103


Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,239,390 contexts (829,328 MinOpts, 1,410,062 FullOpts).

MISSED contexts: base: 71,273 (3.08%), diff: 71,274 (3.08%)

Overall (+33,226 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 15,292,204 +140
benchmarks.run_pgo.linux.arm.checked.mch 63,958,726 +118
benchmarks.run_tiered.linux.arm.checked.mch 21,548,242 +104
coreclr_tests.run.linux.arm.checked.mch 321,753,238 +10,062
libraries.crossgen2.linux.arm.checked.mch 34,522,594 +100
libraries.pmi.linux.arm.checked.mch 49,856,628 +204
libraries_tests.run.linux.arm.Release.mch 243,861,550 +22,350
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,524,240 +8
realworld.run.linux.arm.checked.mch 13,606,688 +140
FullOpts (+33,226 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 14,903,002 +140
benchmarks.run_pgo.linux.arm.checked.mch 52,758,760 +118
benchmarks.run_tiered.linux.arm.checked.mch 12,895,242 +104
coreclr_tests.run.linux.arm.checked.mch 109,275,650 +10,062
libraries.crossgen2.linux.arm.checked.mch 34,521,364 +100
libraries.pmi.linux.arm.checked.mch 49,750,404 +204
libraries_tests.run.linux.arm.Release.mch 122,892,418 +22,350
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,453,504 +8
realworld.run.linux.arm.checked.mch 13,171,388 +140

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,293,488 contexts (839,658 MinOpts, 1,453,830 FullOpts).

MISSED contexts: base: 1 (0.00%), diff: 8 (0.00%)

Overall (+29,276 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x86.checked.mch 7,124,689 +77
benchmarks.run_pgo.windows.x86.checked.mch 44,988,104 +88
benchmarks.run_tiered.windows.x86.checked.mch 9,472,179 +89
coreclr_tests.run.windows.x86.checked.mch 309,393,378 +10,753
libraries.crossgen2.windows.x86.checked.mch 31,716,415 +116
libraries.pmi.windows.x86.checked.mch 49,289,080 +220
libraries_tests.run.windows.x86.Release.mch 186,679,151 +17,856
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 103,828,161 -3
realworld.run.windows.x86.checked.mch 11,355,574 +80
FullOpts (+29,276 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x86.checked.mch 7,124,408 +77
benchmarks.run_pgo.windows.x86.checked.mch 38,399,473 +88
benchmarks.run_tiered.windows.x86.checked.mch 5,202,358 +89
coreclr_tests.run.windows.x86.checked.mch 107,605,244 +10,753
libraries.crossgen2.windows.x86.checked.mch 31,715,355 +116
libraries.pmi.windows.x86.checked.mch 49,193,847 +220
libraries_tests.run.windows.x86.Release.mch 88,405,451 +17,856
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 95,148,097 -3
realworld.run.windows.x86.checked.mch 11,059,860 +80


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.00% to +0.02%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch +0.01%
libraries_tests.run.linux.arm64.Release.mch +0.02%
FullOpts (-0.01% to +0.03%)
Collection PDIFF
benchmarks.run_tiered.linux.arm64.checked.mch -0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.02%
libraries_tests.run.linux.arm64.Release.mch +0.03%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.01% to +0.03%)
Collection PDIFF
coreclr_tests.run.linux.x64.checked.mch +0.02%
libraries_tests.run.linux.x64.Release.mch +0.03%
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%
FullOpts (-0.01% to +0.04%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch -0.01%
coreclr_tests.run.linux.x64.checked.mch +0.03%
libraries_tests.run.linux.x64.Release.mch +0.04%
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.00% to +0.02%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.02%
FullOpts (-0.01% to +0.03%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
benchmarks.run_tiered.osx.arm64.checked.mch -0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.02%
libraries_tests.run.osx.arm64.Release.mch +0.03%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.01% to +0.03%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
coreclr_tests.run.windows.arm64.checked.mch +0.01%
libraries_tests.run.windows.arm64.Release.mch +0.03%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.01%
FullOpts (-0.01% to +0.04%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
coreclr_tests.run.windows.arm64.checked.mch +0.02%
libraries_tests.run.windows.arm64.Release.mch +0.04%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.01%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.00% to +0.03%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.01%
benchmarks.run.windows.x64.checked.mch +0.01%
coreclr_tests.run.windows.x64.checked.mch +0.02%
libraries_tests.run.windows.x64.Release.mch +0.03%
FullOpts (-0.00% to +0.04%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.01%
benchmarks.run.windows.x64.checked.mch +0.01%
coreclr_tests.run.windows.x64.checked.mch +0.03%
libraries_tests.run.windows.x64.Release.mch +0.04%


Throughput diffs for linux/arm ran on windows/x86

Overall (+0.00% to +0.03%)
Collection PDIFF
coreclr_tests.run.linux.arm.checked.mch +0.01%
libraries_tests.run.linux.arm.Release.mch +0.03%
FullOpts (+0.00% to +0.04%)
Collection PDIFF
coreclr_tests.run.linux.arm.checked.mch +0.02%
libraries_tests.run.linux.arm.Release.mch +0.04%

Throughput diffs for windows/x86 ran on windows/x86

Overall (+0.00% to +0.04%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.01%
benchmarks.run_tiered.windows.x86.checked.mch +0.01%
coreclr_tests.run.windows.x86.checked.mch +0.02%
libraries_tests.run.windows.x86.Release.mch +0.04%
FullOpts (+0.00% to +0.05%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.01%
benchmarks.run_tiered.windows.x86.checked.mch +0.01%
coreclr_tests.run.windows.x86.checked.mch +0.03%
libraries_tests.run.windows.x86.Release.mch +0.05%


Throughput diffs for linux/arm64 ran on linux/x64

Overall (-0.01% to +0.02%)
Collection PDIFF
smoke_tests.nativeaot.linux.arm64.checked.mch -0.01%
libraries_tests.run.linux.arm64.Release.mch +0.02%
coreclr_tests.run.linux.arm64.checked.mch +0.01%
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%
FullOpts (-0.01% to +0.02%)
Collection PDIFF
smoke_tests.nativeaot.linux.arm64.checked.mch -0.01%
libraries_tests.run.linux.arm64.Release.mch +0.02%
coreclr_tests.run.linux.arm64.checked.mch +0.02%
benchmarks.run_tiered.linux.arm64.checked.mch -0.01%
benchmarks.run_pgo.linux.arm64.checked.mch -0.01%

Throughput diffs for linux/x64 ran on linux/x64

Overall (-0.01% to +0.02%)
Collection PDIFF
benchmarks.run_pgo.linux.x64.checked.mch -0.01%
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%
libraries_tests.run.linux.x64.Release.mch +0.02%
coreclr_tests.run.linux.x64.checked.mch +0.01%
FullOpts (-0.01% to +0.03%)
Collection PDIFF
benchmarks.run_pgo.linux.x64.checked.mch -0.01%
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%
libraries_tests.run.linux.x64.Release.mch +0.03%
coreclr_tests.run.linux.x64.checked.mch +0.02%
benchmarks.run_tiered.linux.x64.checked.mch -0.01%


@jakobbotsch
Member Author

jakobbotsch commented Jan 31, 2024

With this change we no longer clone loops in some additional OSR cases. For example, in LUDecomp.DoLUIteration (link: https://github.com/dotnet/performance/blob/17d59b4925fe3ed2736ee5d7a382f745fca06ec6/src/benchmarks/micro/runtime/Bytemark/ludecomp.cs#L151) we have this flowgraph, and the change produces this diff:

 Considering loop L02 to clone for optimizations.
 Analyzing iteration for L02 with header BB04
   Preheader = BB02
   Checking exiting block BB08
   Init = [000111], test = [000005], incr = [000047]
-  Condition is established before entry at [000020]
-  IterVar = V09
-  Test is [000004] (invariant local limit )
-Checking loop L02 for optimization candidates (array bounds)
-Found ArrIndex at BB06 STMT00009 tree [000189] which is equivalent to: V01[V10], bounds check nodes: [000189]
-Induction V09 is not used as index on dim 0
-Found ArrIndex at BB06 STMT00010 tree [000202] which is equivalent to: V06[V10], bounds check nodes: [000202]
-V06 is not loop invariant
-Found ArrIndex at BB03 STMT00002 tree [000214] which is equivalent to: V02[V09], bounds check nodes: [000214]
-Loop L02 can be cloned for ArrIndex V02[V09] on dim 0
-Found ArrIndex at BB03 STMT00003 tree [000227] which is equivalent to: V03[V09], bounds check nodes: [000227]
-Loop L02 can be cloned for ArrIndex V03[V09] on dim 0
+  Init block BB14 enters the loop when condition [000020] evaluates to false
+    Relop does not involve iteration variable
+  Loop condition may not be true on the first iteration
+Loop cloning: rejecting loop L02. Could not analyze iteration.

The loop test [000005] is

------------ BB08 [0010] [052..05E) -> BB03,BB09 (cond), preds={BB04,BB07} succs={BB09,BB03}

***** BB08 [0010]
STMT00012 ( 0x052[E-] ... 0x056 )
               [000047] DA---+-----STORE_LCL_VAR int    V09 loc4         
               [000046] -----+-----                         └──▌  ADD       int   
               [000044] -----+-----                            ├──▌  LCL_VAR   int    V09 loc4         
               [000045] -----+-----                            └──▌  CNS_INT   int    1

***** BB08 [0010]
STMT00001 ( 0x058[E-] ... 0x05C )
     (  9,  7) [000005] -----------JTRUE     void  
     (  7,  5) [000004] J------N---                         └──▌  LT        int   
     (  3,  2) [000002] -----------                            ├──▌  LCL_VAR   int    V09 loc4         
     (  3,  2) [000003] -----------                            └──▌  LCL_VAR   int    V04 arg4         

while [000020], the identified dominating compare that is outside the loop, is

------------ BB14 [0005] [02D..039) -> BB10,BB02 (cond), preds={BB11,BB13} succs={BB02,BB10}

***** BB14 [0005]
STMT00029 ( 0x02D[E-] ... 0x031 )
               [000111] DA---+-----STORE_LCL_VAR int    V10 loc5         
               [000110] -----+-----                         └──▌  ADD       int   
               [000108] -----+-----                            ├──▌  LCL_VAR   int    V10 loc5         
               [000109] -----+-----                            └──▌  CNS_INT   int    1

***** BB14 [0005]
STMT00005 ( 0x033[E-] ... 0x037 )
     (  7,  6) [000021] -----------JTRUE     void  
     (  5,  4) [000020] J------N---                         └──▌  LT        int   
     (  3,  2) [000018] -----------                            ├──▌  LCL_VAR   int    V10 loc5         
     (  1,  1) [000019] -----------                            └──▌  CNS_INT   int    101

Clearly these compares are unrelated: [000020] compares V10 against the constant 101, while the loop test compares V09 against V04. So [000020] does not actually guarantee that [000005] is true on entry, and the fact that it was produced by loop inversion is not sufficient.
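
To make the point concrete, here is a minimal Python sketch of the kind of check described above: deciding whether a compare outside the loop really establishes the loop test on entry. This is not the JIT's implementation. The `Relop` type and the `establishes_base_case` helper are hypothetical names, and the model is deliberately conservative in that it only accepts a guard that is structurally identical to the loop test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relop:
    """Simplified compare `lhs oper rhs` (hypothetical model, not JIT IR)."""
    oper: str  # e.g. "LT"
    lhs: str   # local name, e.g. "V09"
    rhs: str   # local name or constant, e.g. "V04" or "101"


def establishes_base_case(guard: Relop, enters_loop_when_true: bool,
                          loop_test: Relop, iter_var: str) -> bool:
    """Return True only if taking the loop-entering edge of `guard`
    implies that `loop_test` holds on the first iteration."""
    # A guard that never mentions the iteration variable proves nothing about it.
    if iter_var not in (guard.lhs, guard.rhs):
        return False
    # Conservative assumption: accept only a guard identical to the loop test,
    # and only when the loop is entered on the edge where the guard is true.
    return enters_loop_when_true and guard == loop_test


# Operands taken from the dump above.
loop_test = Relop("LT", "V09", "V04")  # [000004], the condition of [000005]
guard = Relop("LT", "V10", "101")      # [000020]

# BB14 enters the loop when [000020] evaluates to false.
print(establishes_base_case(guard, False, loop_test, "V09"))  # False
```

In this model the guard is rejected at the first check because it does not involve V09, which mirrors the "Relop does not involve iteration variable" line in the diff.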

In both base and diff we fail to clone the inner loop L03. That's because BB06 and BB07 haven't been compacted at this point, so we don't recognize the IR as the iteration + compare:

***** BB06 [0008]
STMT00011 ( 0x046[E-] ... 0x04A )
               [000043] DA---+-----STORE_LCL_VAR int    V10 loc5         
               [000042] -----+-----                         └──▌  ADD       int   
               [000040] -----+-----                            ├──▌  LCL_VAR   int    V10 loc5         
               [000041] -----+-----                            └──▌  CNS_INT   int    1

------------ BB07 [0009] [04C..052) -> BB06,BB08 (cond), preds={BB06} succs={BB08,BB06}

***** BB07 [0009]
STMT00008 ( 0x04C[E-] ... 0x050 )
     (  7,  6) [000029] -----------JTRUE     void  
     (  5,  4) [000028] J------N---                         └──▌  LT        int   
     (  3,  2) [000026] -----------                            ├──▌  LCL_VAR   int    V10 loc5         
     (  1,  1) [000027] -----------                            └──▌  CNS_INT   int    101

I think we should be able to generalize optExtractInitTestIncr to handle cases like this with relative ease.
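
As a rough illustration of what such a generalization could look like, the Python sketch below finds the increment and the test even when they sit in two blocks that have not been compacted. It is only a toy model under stated assumptions: `Block`, `link`, and `find_incr_and_test` are hypothetical names, statements are opaque labels, and nothing here reflects how optExtractInitTestIncr is actually written.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Block:
    """Hypothetical basic block: ordered statement labels plus flow edges."""
    name: str
    stmts: List[str]
    preds: List["Block"] = field(default_factory=list)
    succs: List["Block"] = field(default_factory=list)


def link(src: Block, dst: Block) -> None:
    """Add a flow edge src -> dst."""
    src.succs.append(dst)
    dst.preds.append(src)


def find_incr_and_test(cond_block: Block) -> Optional[Tuple[str, str]]:
    """Return (incr, test): the test is the last statement of `cond_block`,
    the increment is the statement immediately before it, possibly at the
    end of a sole predecessor that flows only into `cond_block`."""
    if not cond_block.stmts:
        return None
    test = cond_block.stmts[-1]
    if len(cond_block.stmts) >= 2:
        # Compacted shape: increment and test share a block.
        return cond_block.stmts[-2], test
    if len(cond_block.preds) == 1:
        pred = cond_block.preds[0]
        # Uncompacted shape: look through a straight-line predecessor.
        if len(pred.succs) == 1 and pred.stmts:
            return pred.stmts[-1], test
    return None


# Outer loop: increment and test are both in BB08.
bb08 = Block("BB08", ["STMT00012", "STMT00001"])
print(find_incr_and_test(bb08))  # ('STMT00012', 'STMT00001')

# Inner loop: BB06 ends with the increment, BB07 holds only the test.
# Only the BB06 -> BB07 edge is modeled here.
bb06 = Block("BB06", ["STMT00011"])
bb07 = Block("BB07", ["STMT00008"])
link(bb06, bb07)
print(find_incr_and_test(bb07))  # ('STMT00011', 'STMT00008')
```

The predecessor is only looked through when it has a single successor, so the two blocks behave like one compacted block for the purpose of the search.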

@jakobbotsch
Member Author

jakobbotsch commented Jan 31, 2024

A large part of the diff comes from a single repeated case:

[11:33:08]          112 ( 3.03% of base) : 565304.dasm - System.Text.StringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):System.Text.StringBuilder:this (Instrumented Tier1)
[11:33:08]          112 ( 2.97% of base) : 482762.dasm - System.Text.StringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):System.Text.StringBuilder:this (Instrumented Tier1)
[11:33:08]          112 ( 2.61% of base) : 541847.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          112 ( 2.61% of base) : 322125.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          112 ( 2.61% of base) : 560908.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          112 ( 2.61% of base) : 425423.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          112 ( 2.61% of base) : 541534.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          112 ( 2.61% of base) : 494106.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          112 ( 2.61% of base) : 439150.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          112 ( 2.61% of base) : 520847.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          112 ( 2.61% of base) : 541843.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          112 ( 2.61% of base) : 541839.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          110 ( 2.55% of base) : 306927.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          110 ( 2.55% of base) : 453727.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          110 ( 2.55% of base) : 311282.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          110 ( 2.55% of base) : 460376.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          110 ( 2.55% of base) : 571540.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          110 ( 2.55% of base) : 452934.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          110 ( 2.55% of base) : 495294.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)
[11:33:08]          110 ( 2.55% of base) : 558635.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ReadOnlySpan`1[System.Object]):this (Instrumented Tier1)

It looks like those size increases come from new loop cloning enabled by this change. The flowgraph looks like this, and the JIT dump diff with this change is:

 Considering loop L04 to clone for optimizations.
 Analyzing iteration for L04 with header BB56
   Preheader = BB54
   Checking exiting block BB56
     Could not extract an IV
   Checking exiting block BB57
     Could not extract an IV
   Checking exiting block BB55
   Init = [001280], test = [001306], incr = [001301]
-  Loop condition is not always true
-Loop cloning: rejecting loop L04. Could not analyze iteration.
+  Init block BB53 enters the loop when condition [001284] evaluates to true
+    op1 is the iteration variable
+  Condition is established before entry at [001284]
+  IterVar = V05
+  Test is [001305] (array length limit )
+Checking loop L04 for optimization candidates (array bounds)
+Found ArrIndex at BB56 STMT00110 tree [002273] which is equivalent to: V02[V05], bounds check nodes: [002273]
+Loop L04 can be cloned for ArrIndex V02[V05] on dim 0

So it looks like we get a loop shape where we didn't do loop inversion, but we can now still prove that the loop invariant is true everywhere. The test is:

***** BB55 [0029]
STMT00305 ( INL55 @ 0x000[E-] ... ??? ) <- INLRT @ 0x177[E-]
               [001301] DA--G+-----STORE_LCL_VAR int    V05 loc1         
               [001300] ----G+-----                         └──▌  ADD       int   
               [001298] -----+-----                            ├──▌  LCL_VAR   int    V05 loc1         
               [001299] -----+-----                            └──▌  CNS_INT   int    1

***** BB55 [0029]
STMT00306 ( INL55 @ 0x006[E-] ... ??? ) <- INLRT @ 0x177[E-]
               [001306] ---XG+-----JTRUE     void  
               [001305] N--XG+-N-U-                         └──▌  GE        int   
               [001303] -----+-----                            ├──▌  LCL_VAR   int    V05 loc1         
               [001304] ---X-+-----                            └──▌  ARR_LENGTH int   
               [000451] -----+-----                               └──▌  LCL_VAR   ref    V02 arg2         

The dominating compare outside the loop is:

------------ BB53 [0028] [162..163) -> BB54,BB149 (cond), preds={BB52} succs={BB149,BB54}

***** BB53 [0028]
STMT00302 ( INL54 @ 0x000[E-] ... ??? ) <- INLRT @ 0x162[E-]
               [001280] DA--G+-----STORE_LCL_VAR int    V05 loc1         
               [001279] ----G+-----                         └──▌  ADD       int   
               [001277] -----+-----                            ├──▌  LCL_VAR   int    V05 loc1         
               [001278] -----+-----                            └──▌  CNS_INT   int    1

***** BB53 [0028]
STMT00303 ( INL54 @ 0x006[E-] ... ??? ) <- INLRT @ 0x162[E-]
               [001285] ---XG+-----JTRUE     void  
               [001284] N--XG+-N-U-                         └──▌  LT        int   
               [001282] -----+-----                            ├──▌  LCL_VAR   int    V05 loc1         
               [001283] ---X-+-----                            └──▌  ARR_LENGTH int   
               [000419] -----+-----                               └──▌  LCL_VAR   ref    V02 arg2         
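
To spell out why this pair of compares is enough: the compare in BB53 sends control into the loop only when `V05 < ARR_LENGTH(V02)` holds, and the loop keeps iterating only while that same relation holds. The second half reads the BB55 `JTRUE` as leaving the loop when `GE` is true, which is an assumption taken from the dump. A minimal Python sketch of that reasoning follows. `Compare`, `NEGATE` and both functions are hypothetical names for illustration, not the JIT's implementation; a real check would presumably also need to confirm things this model ignores, such as the iteration variable not being redefined between the entry test and the loop.

```python
from dataclasses import dataclass

# Each relational operator paired with its logical negation.
NEGATE = {"LT": "GE", "GE": "LT", "LE": "GT", "GT": "LE", "EQ": "NE", "NE": "EQ"}


@dataclass(frozen=True)
class Compare:
    oper: str  # relational operator, e.g. "LT"
    op1: str   # left operand, e.g. "V05"
    op2: str   # right operand, e.g. "ARR_LENGTH(V02)"


def relation_when(cond: Compare, taken_when_true: bool) -> Compare:
    """Relation known to hold on the edge of interest.

    If that edge is taken when the condition is false, the negated
    relation is what holds there.
    """
    if taken_when_true:
        return cond
    return Compare(NEGATE[cond.oper], cond.op1, cond.op2)


def entry_establishes_invariant(entry: Compare, enters_when_true: bool,
                                test: Compare, exits_when_true: bool) -> bool:
    """True if entering the loop implies the 'keep iterating' relation."""
    holds_on_entry = relation_when(entry, enters_when_true)
    # The loop continues on the edge that is NOT the exit edge.
    holds_while_looping = relation_when(test, not exits_when_true)
    return holds_on_entry == holds_while_looping


# The shapes from the dumps: [001284] guards entry, [001305] is the loop test.
entry_cond = Compare("LT", "V05", "ARR_LENGTH(V02)")
loop_test = Compare("GE", "V05", "ARR_LENGTH(V02)")
base_case_ok = entry_establishes_invariant(entry_cond, True, loop_test, True)
```

The model only accepts syntactically identical relations, which is deliberately conservative: an entry test that merely implies the loop condition would be rejected here.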

If I had to guess, the loop we're cloning now is the following, after inlining MoveNext and doing some tail merging:

ch = MoveNext(format, ref pos);
while (char.IsAsciiDigit(ch) && width < WidthLimit)
{
    width = width * 10 + ch - '0';
    ch = MoveNext(format, ref pos);
}
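
As a rough Python model of that guessed loop shape: the sketch below assumes `MoveNext` increments `pos`, fails when `pos` runs past the end of the format string, and otherwise returns the character at `pos`, which is what the `ADD`/compare pairs in BB53 and BB55 suggest. `move_next`, `parse_width`, `FormatError` and the value of `WIDTH_LIMIT` are illustrative assumptions, not the library's code. The call before the loop plays the role of the entry test and the call at the bottom of the body plays the role of the in-loop test, so the index is in bounds every time the body runs.

```python
WIDTH_LIMIT = 1_000_000  # assumed value, for illustration only


class FormatError(Exception):
    """Stands in for the invalid-format failure path."""


def move_next(fmt, pos):
    """Assumed shape of MoveNext: advance, bounds-check, then read."""
    pos += 1
    if pos >= len(fmt):  # same compare shape as in BB53 and BB55
        raise FormatError("ran past the end of the format string")
    return fmt[pos], pos  # the read that the bounds check guards


def parse_width(fmt, pos):
    """Parse decimal digits after index pos (pos >= -1); return (width, pos)."""
    width = 0
    ch, pos = move_next(fmt, pos)  # entry test, executed before the loop
    while ch.isascii() and ch.isdigit() and width < WIDTH_LIMIT:
        assert 0 <= pos < len(fmt)  # invariant established by the entry test
        width = width * 10 + ord(ch) - ord("0")
        ch, pos = move_next(fmt, pos)  # in-loop test, same compare again
    return width, pos
```

For example, `parse_width("{0,12}", 2)` reads the digits `1` and `2` and stops at the closing brace, while an input that ends in the middle of the digits raises `FormatError` from `move_next`.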

@jakobbotsch jakobbotsch merged commit f545368 into dotnet:main Jan 31, 2024
136 of 139 checks passed
@jakobbotsch jakobbotsch deleted the check-loop-inversion-IR branch January 31, 2024 11:26
@github-actions github-actions bot locked and limited conversation to collaborators Mar 2, 2024