Importer forward substitution #98380

Closed
wants to merge 4 commits into from

MichalPetryka
Contributor

@MichalPetryka MichalPetryka commented Feb 13, 2024

Adds a simple forward substitution of single-def locals for importer intrinsics.

Throughput (TP) and memory impact will be the most important things to check with this.

Currently only enabled for the typeof-checking intrinsic, to test codegen and TP.
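
As a rough illustration of the idea (a toy model, not the code in this PR), a per-block cache can remember the value stored to a single-def local and hand it back when a later intrinsic sees a use of that local. Every name here (`Value`, `SingleDefCache`, `RecordStore`, `Resolve`, `Reset`) is hypothetical, and the `isSingleDef` flag is assumed to be known to the caller before import:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy stand-in for an importer tree: either a use of a local or some
// other value described by text (e.g. "typeof(int)"). Not the JIT's GenTree.
struct Value
{
    bool        isLocalUse;
    unsigned    lclNum; // meaningful only when isLocalUse is true
    std::string text;   // meaningful only when isLocalUse is false
};

// Hypothetical cache of defining values, valid for one basic block.
class SingleDefCache
{
public:
    // Forget everything; meant to be called when a new block starts importing.
    void Reset() { m_defs.clear(); }

    // Record "lclNum = value". Locals with more than one def are ignored.
    void RecordStore(unsigned lclNum, const Value& value, bool isSingleDef)
    {
        assert(!value.isLocalUse || value.lclNum != lclNum); // no self-referential defs
        if (isSingleDef)
        {
            m_defs[lclNum] = value;
        }
    }

    // Look through a local use to its recorded def; otherwise return v unchanged.
    Value Resolve(const Value& v) const
    {
        if (!v.isLocalUse)
        {
            return v;
        }
        auto it = m_defs.find(v.lclNum);
        return (it == m_defs.end()) ? v : it->second;
    }

private:
    std::unordered_map<unsigned, Value> m_defs;
};
```

In terms of the example below, the store `t = typeof(int)` would be recorded, and the later `t.IsValueType` would resolve `t` back to `typeof(int)`, which is what lets the intrinsic fold to a constant.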

Codegen for:

        public static bool Test()
        {
            Type t = typeof(int);
            Helpers.NoInline();
            return t.IsValueType;
        }

Before:

G_M000_IG01:                ;; offset=0x0000
       4883EC28             sub      rsp, 40
 
G_M000_IG02:                ;; offset=0x0004
       FF150E131D00         call     [DisasmoPlayground.Helpers:NoInline()]
       48B90809C6245D010000 mov      rcx, 0x15D24C60908
       3909                 cmp      dword ptr [rcx], ecx
 
G_M000_IG03:                ;; offset=0x0016
       4883C428             add      rsp, 40
       FF25B070EEFF         tail.jmp [System.RuntimeType:IsValueTypeImpl():bool:this]

After:

G_M46022_IG01:  ;; offset=0x0000
       4883EC28             sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25
G_M46022_IG02:  ;; offset=0x0004
       FF15DE307E00         call     [DisasmoPlayground.Helpers:NoInline()]
       B801000000           mov      eax, 1
						;; size=11 bbWeight=1 PerfScore 3.25
G_M46022_IG03:  ;; offset=0x000F
       4883C428             add      rsp, 40
       C3                   ret      
						;; size=5 bbWeight=1 PerfScore 1.25

Part of a 2nd attempt at #85197.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 13, 2024
@ghost

ghost commented Feb 13, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@MichalPetryka
Contributor Author

@MihuBot

@ryujit-bot

Diff results for #98380

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,528,526 contexts (1,004,573 MinOpts, 1,523,953 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 25 (0.00%)

Overall (-388 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.linux.arm64.Release.mch 381,227,132 -240
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 164,772,920 -148
FullOpts (-388 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.linux.arm64.Release.mch 166,376,500 -240
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 151,370,120 -148

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,531,951 contexts (984,928 MinOpts, 1,547,023 FullOpts).

MISSED contexts: base: 1 (0.00%), diff: 28 (0.00%)

Overall (-154 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.linux.x64.Release.mch 328,194,239 -64
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 132,492,281 -90
FullOpts (-154 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.linux.x64.Release.mch 146,291,195 -64
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 121,920,132 -90

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,298,729 contexts (931,660 MinOpts, 1,367,069 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 24 (0.00%)

Overall (-256 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.osx.arm64.Release.mch 314,003,048 -108
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 160,846,280 -148
FullOpts (-256 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.osx.arm64.Release.mch 113,365,136 -108
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 147,787,464 -148

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,380,829 contexts (948,159 MinOpts, 1,432,670 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 23 (0.00%)

Overall (-148 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 171,283,108 -148
FullOpts (-148 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 158,223,720 -148

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,804,143 contexts (1,155,868 MinOpts, 1,648,275 FullOpts).

MISSED contexts: base: 3,198 (0.11%), diff: 3,226 (0.11%)

Overall (-427 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.x64.Release.mch 313,414,947 -271
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 149,398,279 -156
FullOpts (-427 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.x64.Release.mch 111,740,408 -271
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 138,769,766 -156

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.09% to +0.12%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.11%
benchmarks.run_pgo.linux.arm64.checked.mch +0.10%
benchmarks.run_tiered.linux.arm64.checked.mch +0.10%
coreclr_tests.run.linux.arm64.checked.mch +0.11%
libraries.crossgen2.linux.arm64.checked.mch +0.09%
libraries.pmi.linux.arm64.checked.mch +0.12%
libraries_tests.run.linux.arm64.Release.mch +0.10%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.12%
realworld.run.linux.arm64.checked.mch +0.12%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.11%
MinOpts (+0.07% to +0.55%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.08%
benchmarks.run_pgo.linux.arm64.checked.mch +0.07%
benchmarks.run_tiered.linux.arm64.checked.mch +0.08%
coreclr_tests.run.linux.arm64.checked.mch +0.14%
libraries.crossgen2.linux.arm64.checked.mch +0.13%
libraries.pmi.linux.arm64.checked.mch +0.55%
libraries_tests.run.linux.arm64.Release.mch +0.08%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.23%
realworld.run.linux.arm64.checked.mch +0.13%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.07%
FullOpts (+0.09% to +0.12%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.11%
benchmarks.run_pgo.linux.arm64.checked.mch +0.11%
benchmarks.run_tiered.linux.arm64.checked.mch +0.11%
coreclr_tests.run.linux.arm64.checked.mch +0.09%
libraries.crossgen2.linux.arm64.checked.mch +0.09%
libraries.pmi.linux.arm64.checked.mch +0.12%
libraries_tests.run.linux.arm64.Release.mch +0.11%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.12%
realworld.run.linux.arm64.checked.mch +0.12%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.11%

Throughput diffs for linux/x64 ran on windows/x64

Overall (+0.10% to +0.52%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.12%
benchmarks.run_pgo.linux.x64.checked.mch +0.11%
benchmarks.run_tiered.linux.x64.checked.mch +0.10%
coreclr_tests.run.linux.x64.checked.mch +0.52%
libraries.crossgen2.linux.x64.checked.mch +0.10%
libraries.pmi.linux.x64.checked.mch +0.12%
libraries_tests.run.linux.x64.Release.mch +0.11%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.13%
realworld.run.linux.x64.checked.mch +0.12%
smoke_tests.nativeaot.linux.x64.checked.mch +0.11%
MinOpts (+0.07% to +1.15%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.09%
benchmarks.run_pgo.linux.x64.checked.mch +0.08%
benchmarks.run_tiered.linux.x64.checked.mch +0.09%
coreclr_tests.run.linux.x64.checked.mch +1.15%
libraries.crossgen2.linux.x64.checked.mch +0.15%
libraries.pmi.linux.x64.checked.mch +0.61%
libraries_tests.run.linux.x64.Release.mch +0.09%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.26%
realworld.run.linux.x64.checked.mch +0.16%
smoke_tests.nativeaot.linux.x64.checked.mch +0.07%
FullOpts (+0.09% to +0.13%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.12%
benchmarks.run_pgo.linux.x64.checked.mch +0.11%
benchmarks.run_tiered.linux.x64.checked.mch +0.12%
coreclr_tests.run.linux.x64.checked.mch +0.09%
libraries.crossgen2.linux.x64.checked.mch +0.10%
libraries.pmi.linux.x64.checked.mch +0.12%
libraries_tests.run.linux.x64.Release.mch +0.12%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.13%
realworld.run.linux.x64.checked.mch +0.12%
smoke_tests.nativeaot.linux.x64.checked.mch +0.11%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.09% to +0.13%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.12%
benchmarks.run_pgo.osx.arm64.checked.mch +0.11%
benchmarks.run_tiered.osx.arm64.checked.mch +0.10%
coreclr_tests.run.osx.arm64.checked.mch +0.11%
libraries.crossgen2.osx.arm64.checked.mch +0.09%
libraries.pmi.osx.arm64.checked.mch +0.12%
libraries_tests.run.osx.arm64.Release.mch +0.10%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.13%
realworld.run.osx.arm64.checked.mch +0.12%
MinOpts (+0.08% to +0.55%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.13%
benchmarks.run_pgo.osx.arm64.checked.mch +0.08%
benchmarks.run_tiered.osx.arm64.checked.mch +0.09%
coreclr_tests.run.osx.arm64.checked.mch +0.14%
libraries.crossgen2.osx.arm64.checked.mch +0.14%
libraries.pmi.osx.arm64.checked.mch +0.55%
libraries_tests.run.osx.arm64.Release.mch +0.08%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.23%
realworld.run.osx.arm64.checked.mch +0.13%
FullOpts (+0.09% to +0.12%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.12%
benchmarks.run_pgo.osx.arm64.checked.mch +0.11%
benchmarks.run_tiered.osx.arm64.checked.mch +0.11%
coreclr_tests.run.osx.arm64.checked.mch +0.09%
libraries.crossgen2.osx.arm64.checked.mch +0.09%
libraries.pmi.osx.arm64.checked.mch +0.12%
libraries_tests.run.osx.arm64.Release.mch +0.11%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.12%
realworld.run.osx.arm64.checked.mch +0.12%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.09% to +0.12%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.12%
benchmarks.run_pgo.windows.arm64.checked.mch +0.11%
benchmarks.run_tiered.windows.arm64.checked.mch +0.10%
coreclr_tests.run.windows.arm64.checked.mch +0.11%
libraries.crossgen2.windows.arm64.checked.mch +0.09%
libraries.pmi.windows.arm64.checked.mch +0.12%
libraries_tests.run.windows.arm64.Release.mch +0.10%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.12%
realworld.run.windows.arm64.checked.mch +0.12%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.11%
MinOpts (+0.07% to +0.55%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.12%
benchmarks.run_pgo.windows.arm64.checked.mch +0.09%
benchmarks.run_tiered.windows.arm64.checked.mch +0.09%
coreclr_tests.run.windows.arm64.checked.mch +0.14%
libraries.crossgen2.windows.arm64.checked.mch +0.13%
libraries.pmi.windows.arm64.checked.mch +0.55%
libraries_tests.run.windows.arm64.Release.mch +0.08%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.23%
realworld.run.windows.arm64.checked.mch +0.13%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.07%
FullOpts (+0.09% to +0.12%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.12%
benchmarks.run_pgo.windows.arm64.checked.mch +0.11%
benchmarks.run_tiered.windows.arm64.checked.mch +0.11%
coreclr_tests.run.windows.arm64.checked.mch +0.09%
libraries.crossgen2.windows.arm64.checked.mch +0.09%
libraries.pmi.windows.arm64.checked.mch +0.12%
libraries_tests.run.windows.arm64.Release.mch +0.11%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.12%
realworld.run.windows.arm64.checked.mch +0.12%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.11%

Throughput diffs for windows/x64 ran on windows/x64

Overall (+0.10% to +0.53%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.12%
benchmarks.run.windows.x64.checked.mch +0.12%
benchmarks.run_pgo.windows.x64.checked.mch +0.12%
benchmarks.run_tiered.windows.x64.checked.mch +0.11%
coreclr_tests.run.windows.x64.checked.mch +0.53%
libraries.crossgen2.windows.x64.checked.mch +0.10%
libraries.pmi.windows.x64.checked.mch +0.12%
libraries_tests.run.windows.x64.Release.mch +0.11%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.13%
realworld.run.windows.x64.checked.mch +0.13%
smoke_tests.nativeaot.windows.x64.checked.mch +0.11%
MinOpts (+0.07% to +1.18%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.09%
benchmarks.run.windows.x64.checked.mch +0.15%
benchmarks.run_pgo.windows.x64.checked.mch +0.10%
benchmarks.run_tiered.windows.x64.checked.mch +0.11%
coreclr_tests.run.windows.x64.checked.mch +1.18%
libraries.crossgen2.windows.x64.checked.mch +0.16%
libraries.pmi.windows.x64.checked.mch +0.64%
libraries_tests.run.windows.x64.Release.mch +0.09%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.26%
realworld.run.windows.x64.checked.mch +0.16%
smoke_tests.nativeaot.windows.x64.checked.mch +0.07%
FullOpts (+0.09% to +0.13%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.12%
benchmarks.run.windows.x64.checked.mch +0.12%
benchmarks.run_pgo.windows.x64.checked.mch +0.12%
benchmarks.run_tiered.windows.x64.checked.mch +0.12%
coreclr_tests.run.windows.x64.checked.mch +0.09%
libraries.crossgen2.windows.x64.checked.mch +0.10%
libraries.pmi.windows.x64.checked.mch +0.12%
libraries_tests.run.windows.x64.Release.mch +0.12%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.13%
realworld.run.windows.x64.checked.mch +0.13%
smoke_tests.nativeaot.windows.x64.checked.mch +0.11%

Details here


Throughput diffs for linux/arm ran on windows/x86

Overall (+0.21% to +0.75%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.34%
benchmarks.run_pgo.linux.arm.checked.mch +0.30%
benchmarks.run_tiered.linux.arm.checked.mch +0.31%
coreclr_tests.run.linux.arm.checked.mch +0.27%
libraries.crossgen2.linux.arm.checked.mch +0.21%
libraries.pmi.linux.arm.checked.mch +0.39%
libraries_tests.run.linux.arm.Release.mch +0.33%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +0.43%
realworld.run.linux.arm.checked.mch +0.75%
MinOpts (+0.19% to +5.01%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.21%
benchmarks.run_pgo.linux.arm.checked.mch +0.19%
benchmarks.run_tiered.linux.arm.checked.mch +0.20%
coreclr_tests.run.linux.arm.checked.mch +0.25%
libraries.crossgen2.linux.arm.checked.mch +0.34%
libraries.pmi.linux.arm.checked.mch +5.01%
libraries_tests.run.linux.arm.Release.mch +0.28%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +1.58%
realworld.run.linux.arm.checked.mch +0.64%
FullOpts (+0.21% to +0.75%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.34%
benchmarks.run_pgo.linux.arm.checked.mch +0.31%
benchmarks.run_tiered.linux.arm.checked.mch +0.34%
coreclr_tests.run.linux.arm.checked.mch +0.28%
libraries.crossgen2.linux.arm.checked.mch +0.21%
libraries.pmi.linux.arm.checked.mch +0.38%
libraries_tests.run.linux.arm.Release.mch +0.35%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +0.39%
realworld.run.linux.arm.checked.mch +0.75%

Throughput diffs for windows/x86 ran on windows/x86

Overall (+0.24% to +0.49%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.43%
benchmarks.run_pgo.windows.x86.checked.mch +0.37%
benchmarks.run_tiered.windows.x86.checked.mch +0.38%
coreclr_tests.run.windows.x86.checked.mch +0.49%
libraries.crossgen2.windows.x86.checked.mch +0.24%
libraries.pmi.windows.x86.checked.mch +0.36%
libraries_tests.run.windows.x86.Release.mch +0.38%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.45%
realworld.run.windows.x86.checked.mch +0.42%
MinOpts (+0.23% to +6.66%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.34%
benchmarks.run_pgo.windows.x86.checked.mch +0.23%
benchmarks.run_tiered.windows.x86.checked.mch +0.24%
coreclr_tests.run.windows.x86.checked.mch +0.84%
libraries.crossgen2.windows.x86.checked.mch +0.35%
libraries.pmi.windows.x86.checked.mch +6.66%
libraries_tests.run.windows.x86.Release.mch +0.35%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +1.97%
realworld.run.windows.x86.checked.mch +1.01%
FullOpts (+0.24% to +0.43%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.43%
benchmarks.run_pgo.windows.x86.checked.mch +0.38%
benchmarks.run_tiered.windows.x86.checked.mch +0.41%
coreclr_tests.run.windows.x86.checked.mch +0.30%
libraries.crossgen2.windows.x86.checked.mch +0.24%
libraries.pmi.windows.x86.checked.mch +0.36%
libraries_tests.run.windows.x86.Release.mch +0.38%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.42%
realworld.run.windows.x86.checked.mch +0.42%

Details here


@ryujit-bot

Diff results for #98380

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,242,891 contexts (830,236 MinOpts, 1,412,655 FullOpts).

MISSED contexts: base: 73,620 (3.18%), diff: 73,633 (3.18%)

Overall (-78 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,374,878 -78
FullOpts (-78 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,345,052 -78

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,599,900 contexts (1,005,465 MinOpts, 1,594,435 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 26 (0.00%)

Overall (-199 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.x86.Release.mch 198,337,512 -45
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 113,331,594 -154
FullOpts (-199 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.x86.Release.mch 93,832,270 -45
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 104,401,775 -154

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64

Overall (+0.06% to +0.08%)
Collection PDIFF
realworld.run.linux.arm64.checked.mch +0.08%
coreclr_tests.run.linux.arm64.checked.mch +0.06%
benchmarks.run_tiered.linux.arm64.checked.mch +0.06%
libraries.crossgen2.linux.arm64.checked.mch +0.06%
libraries_tests.run.linux.arm64.Release.mch +0.06%
libraries.pmi.linux.arm64.checked.mch +0.08%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.08%
benchmarks.run.linux.arm64.checked.mch +0.07%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.07%
benchmarks.run_pgo.linux.arm64.checked.mch +0.07%
MinOpts (+0.03% to +0.07%)
Collection PDIFF
realworld.run.linux.arm64.checked.mch +0.04%
coreclr_tests.run.linux.arm64.checked.mch +0.04%
benchmarks.run_tiered.linux.arm64.checked.mch +0.05%
libraries.crossgen2.linux.arm64.checked.mch +0.06%
libraries_tests.run.linux.arm64.Release.mch +0.05%
libraries.pmi.linux.arm64.checked.mch +0.06%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.07%
benchmarks.run.linux.arm64.checked.mch +0.03%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.05%
benchmarks.run_pgo.linux.arm64.checked.mch +0.05%
FullOpts (+0.06% to +0.08%)
Collection PDIFF
realworld.run.linux.arm64.checked.mch +0.08%
coreclr_tests.run.linux.arm64.checked.mch +0.07%
benchmarks.run_tiered.linux.arm64.checked.mch +0.07%
libraries.crossgen2.linux.arm64.checked.mch +0.06%
libraries_tests.run.linux.arm64.Release.mch +0.07%
libraries.pmi.linux.arm64.checked.mch +0.08%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.08%
benchmarks.run.linux.arm64.checked.mch +0.07%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.07%
benchmarks.run_pgo.linux.arm64.checked.mch +0.07%

Throughput diffs for linux/x64 ran on linux/x64

Overall (+0.06% to +0.08%)
Collection PDIFF
realworld.run.linux.x64.checked.mch +0.08%
benchmarks.run.linux.x64.checked.mch +0.08%
libraries.crossgen2.linux.x64.checked.mch +0.07%
libraries_tests.run.linux.x64.Release.mch +0.07%
benchmarks.run_tiered.linux.x64.checked.mch +0.07%
libraries.pmi.linux.x64.checked.mch +0.08%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.08%
smoke_tests.nativeaot.linux.x64.checked.mch +0.08%
coreclr_tests.run.linux.x64.checked.mch +0.06%
benchmarks.run_pgo.linux.x64.checked.mch +0.07%
MinOpts (+0.04% to +0.07%)
Collection PDIFF
realworld.run.linux.x64.checked.mch +0.05%
benchmarks.run.linux.x64.checked.mch +0.04%
libraries.crossgen2.linux.x64.checked.mch +0.06%
libraries_tests.run.linux.x64.Release.mch +0.06%
benchmarks.run_tiered.linux.x64.checked.mch +0.06%
libraries.pmi.linux.x64.checked.mch +0.06%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.07%
smoke_tests.nativeaot.linux.x64.checked.mch +0.05%
coreclr_tests.run.linux.x64.checked.mch +0.05%
benchmarks.run_pgo.linux.x64.checked.mch +0.06%
FullOpts (+0.07% to +0.08%)
Collection PDIFF
realworld.run.linux.x64.checked.mch +0.08%
benchmarks.run.linux.x64.checked.mch +0.08%
libraries.crossgen2.linux.x64.checked.mch +0.07%
libraries_tests.run.linux.x64.Release.mch +0.07%
benchmarks.run_tiered.linux.x64.checked.mch +0.08%
libraries.pmi.linux.x64.checked.mch +0.08%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.08%
smoke_tests.nativeaot.linux.x64.checked.mch +0.08%
coreclr_tests.run.linux.x64.checked.mch +0.07%
benchmarks.run_pgo.linux.x64.checked.mch +0.07%

Details here


@AndyAyersMS
Member

You will still need to prove that the use(s) happen after the def, which may not be easy in the importer.

Generally the importer works in a kind of reverse postorder, so it's likely that you would see defs before uses, but when there are cycles this may not work out.

You could perhaps build the DFS tree and ask if the def is a DFS ancestor of the use (and so happens before). But I am not 100% confident the DFS query will be accurate before EH canons, which happen later, so you might need to limit this to defs and uses that are not in handler or filter regions.
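
For illustration only, the ancestor query can be answered from preorder and postorder numbers assigned during a depth-first walk. The sketch below uses a toy flow graph (successor lists indexed by block number, entry at block 0); `DfsTree`, `IsAncestor`, `DefBlockPrecedesUse` and the `inHandlerOrFilter` flags are hypothetical names, not JIT APIs, and it shows just the query itself, not whether ancestry alone is a sufficient condition:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy DFS tree over a flow graph given as successor lists. Block 0 is the entry.
struct DfsTree
{
    std::vector<int> pre;  // preorder number, -1 if unreachable from the entry
    std::vector<int> post; // postorder number, -1 if unreachable from the entry

    explicit DfsTree(const std::vector<std::vector<int>>& succs)
        : pre(succs.size(), -1), post(succs.size(), -1)
    {
        int nextPre  = 0;
        int nextPost = 0;
        if (!succs.empty())
        {
            Visit(0, succs, nextPre, nextPost);
        }
    }

    // True if block a is an ancestor of block b in the DFS tree
    // (a block counts as its own ancestor).
    bool IsAncestor(int a, int b) const
    {
        assert(a >= 0 && static_cast<std::size_t>(a) < pre.size());
        assert(b >= 0 && static_cast<std::size_t>(b) < pre.size());
        if (pre[a] < 0 || pre[b] < 0)
        {
            return false;
        }
        return pre[a] <= pre[b] && post[b] <= post[a];
    }

private:
    void Visit(int block, const std::vector<std::vector<int>>& succs, int& nextPre, int& nextPost)
    {
        pre[block] = nextPre++;
        for (int s : succs[block])
        {
            if (pre[s] < 0)
            {
                Visit(s, succs, nextPre, nextPost);
            }
        }
        post[block] = nextPost++;
    }
};

// Conservative wrapper: give up if either block is in a handler or filter region.
bool DefBlockPrecedesUse(const DfsTree& dfs, const std::vector<bool>& inHandlerOrFilter,
                         int defBlock, int useBlock)
{
    if (inHandlerOrFilter[defBlock] || inHandlerOrFilter[useBlock])
    {
        return false;
    }
    return dfs.IsAncestor(defBlock, useBlock);
}
```

The interval test works because a DFS finishes every descendant before it finishes the ancestor, so a descendant's preorder number is larger and its postorder number is smaller.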

Also you may need to watch out for reimportation; these are rare but can happen. For instance if the def block is reimported but the use is not, then the tree referred to in your cache will be the wrong one.
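
One hypothetical way to guard against stale entries (an assumption for illustration, not something this PR is known to do) is to stamp each cached def with an import counter for its block and reject the entry when the counter has moved on. `DefCache`, `BeginImport`, `Record` and `Lookup` are invented names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct CachedDef
{
    int         defBlock;
    unsigned    importCount; // import count of defBlock when the entry was recorded
    std::string tree;        // stand-in for the cached def tree
};

class DefCache
{
public:
    explicit DefCache(std::size_t blockCount) : m_importCount(blockCount, 0) {}

    // Call every time a block is imported or reimported.
    void BeginImport(int block)
    {
        assert(block >= 0 && static_cast<std::size_t>(block) < m_importCount.size());
        ++m_importCount[block];
    }

    void Record(unsigned lclNum, int defBlock, const std::string& tree)
    {
        m_defs[lclNum] = CachedDef{defBlock, m_importCount[defBlock], tree};
    }

    // Returns nullptr if there is no entry, or if the def block has been
    // reimported since the entry was recorded (the cached tree is stale).
    const std::string* Lookup(unsigned lclNum) const
    {
        auto it = m_defs.find(lclNum);
        if (it == m_defs.end())
        {
            return nullptr;
        }
        const CachedDef& def = it->second;
        if (def.importCount != m_importCount[def.defBlock])
        {
            return nullptr;
        }
        return &def.tree;
    }

private:
    std::vector<unsigned>                   m_importCount;
    std::unordered_map<unsigned, CachedDef> m_defs;
};
```

This only prevents later lookups from seeing a stale tree; a use that already consumed the old tree before the reimport would need separate handling.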

The other option is to try and enhance normal forward sub to be somewhat more aggressive for single-def locals with easily substitutable values (e.g. invariants like constants) -- currently we are cautious for two reasons:

  1. we have to search for the uses which can be costly
  2. we have to run interference checks which can be costly

For invariant nodes (2) is no longer a concern as they cannot interfere or be interfered with. You might be able to leave breadcrumbs so that (1) is only done when we have an invariant node, and then run forward sub in RPO and keep a map like you keep now so as we run across the uses we can check if the def is available and substitute it.
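
The RPO-plus-map idea can be sketched with a toy model. This is an illustration under stated assumptions, not the JIT's forward sub: blocks are assumed to be supplied already in reverse postorder, the `singleDef` vector stands in for the "breadcrumbs", and `Stmt`, `Block` and `ForwardSubInvariants` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Stmt
{
    enum Kind { Store, Use } kind;
    unsigned    lclNum;
    bool        invariant; // Store only: the stored value is a constant-like node
    std::string value;     // Store: the stored value; Use: filled in when substituted
};

using Block = std::vector<Stmt>;

// Walks the blocks in the given (assumed reverse postorder) order. Invariant
// stores to single-def locals become available; later uses of those locals
// are replaced. Returns the number of uses that were substituted.
int ForwardSubInvariants(std::vector<Block>& blocksInRpo, const std::vector<bool>& singleDef)
{
    std::map<unsigned, std::string> available;
    int                             substituted = 0;

    for (Block& block : blocksInRpo)
    {
        for (Stmt& stmt : block)
        {
            assert(stmt.lclNum < singleDef.size());
            if (stmt.kind == Stmt::Store)
            {
                // No interference check needed: only invariant values are recorded.
                if (stmt.invariant && singleDef[stmt.lclNum])
                {
                    available[stmt.lclNum] = stmt.value;
                }
            }
            else
            {
                auto it = available.find(stmt.lclNum);
                if (it != available.end())
                {
                    stmt.value = it->second;
                    ++substituted;
                }
            }
        }
    }
    return substituted;
}
```

A use that is reached before its def in the walk order (for example through a back edge) is simply left alone, which keeps the sketch conservative.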

Note morph running in RPO effectively does this already (via cross-block AP) so I'm also curious what phase ordering issues you are solving here... we will get this eventually and perhaps the fix is to just replay some of the importer opts in morph once we've done this propagation there.

@MichalPetryka
Contributor Author

> You will still need to prove that the use(s) happen after the def, which may not be easy in the importer.

Are statements not imported in order of execution in a single basic block? The current code relies on that being the case (and is restricted to assignments in the same BB because of that). I know that basic blocks aren't imported in order but I assumed statements are.

> Also you may need to watch out for reimportation; these are rare but can happen. For instance if the def block is reimported but the use is not, then the tree referred to in your cache will be the wrong one.

Does that also affect code limited to a single BB like here?

> The other option is to try and enhance normal forward sub to be somewhat more aggressive

Like I said, this was originally part of the non-PGO function pointer/delegate inlining work, and since inlining can't be done after the importer, I need to handle it here. As for why that needs it: Roslyn is really annoying and in a lot of cases spills function pointers and delegates to locals before calling them, so without such substitution the inlining would catch almost nothing. At least they should all happen to be in the same BB.

@MichalPetryka MichalPetryka marked this pull request as draft February 17, 2024 15:10
@ryujit-bot

Diff results for #98380

Assembly diffs

Assembly diffs for osx/arm64 ran on linux/x64

Diffs are based on 2,293,419 contexts (933,869 MinOpts, 1,359,550 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 24 (0.00%)

Overall (-260 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.osx.arm64.Release.mch 313,596,116 -108
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 161,380,836 -152
FullOpts (-260 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.osx.arm64.Release.mch 110,668,672 -108
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 148,332,748 -152

Assembly diffs for windows/arm64 ran on linux/x64

Diffs are based on 2,376,907 contexts (945,143 MinOpts, 1,431,764 FullOpts).

MISSED contexts: base: 5 (0.00%), diff: 29 (0.00%)

Overall (-260 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.arm64.Release.mch 321,613,268 -108
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 171,355,492 -152
FullOpts (-260 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.arm64.Release.mch 117,556,728 -108
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 158,296,296 -152

Assembly diffs for windows/x64 ran on linux/x64

Diffs are based on 2,416,952 contexts (937,064 MinOpts, 1,479,888 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 24 (0.00%)

Overall (-354 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.x64.Release.mch 281,663,453 -273
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 136,467,122 -81
FullOpts (-354 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.x64.Release.mch 106,974,625 -273
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 126,174,325 -81

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.04% to +0.08%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.07%
benchmarks.run_pgo.linux.arm64.checked.mch +0.06%
benchmarks.run_tiered.linux.arm64.checked.mch +0.05%
coreclr_tests.run.linux.arm64.checked.mch +0.04%
libraries.crossgen2.linux.arm64.checked.mch +0.06%
libraries.pmi.linux.arm64.checked.mch +0.08%
libraries_tests.run.linux.arm64.Release.mch +0.05%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.08%
realworld.run.linux.arm64.checked.mch +0.08%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.07%
MinOpts (+0.01% to +0.07%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.06%
benchmarks.run_pgo.linux.arm64.checked.mch +0.03%
benchmarks.run_tiered.linux.arm64.checked.mch +0.03%
coreclr_tests.run.linux.arm64.checked.mch +0.02%
libraries.crossgen2.linux.arm64.checked.mch +0.07%
libraries.pmi.linux.arm64.checked.mch +0.02%
libraries_tests.run.linux.arm64.Release.mch +0.03%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.04%
realworld.run.linux.arm64.checked.mch +0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.04%
FullOpts (+0.06% to +0.08%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.07%
benchmarks.run_pgo.linux.arm64.checked.mch +0.06%
benchmarks.run_tiered.linux.arm64.checked.mch +0.07%
coreclr_tests.run.linux.arm64.checked.mch +0.06%
libraries.crossgen2.linux.arm64.checked.mch +0.06%
libraries.pmi.linux.arm64.checked.mch +0.08%
libraries_tests.run.linux.arm64.Release.mch +0.06%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.08%
realworld.run.linux.arm64.checked.mch +0.08%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.07%

Throughput diffs for linux/x64 ran on windows/x64

Overall (+0.05% to +0.09%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.08%
benchmarks.run_pgo.linux.x64.checked.mch +0.06%
benchmarks.run_tiered.linux.x64.checked.mch +0.06%
coreclr_tests.run.linux.x64.checked.mch +0.05%
libraries.crossgen2.linux.x64.checked.mch +0.07%
libraries.pmi.linux.x64.checked.mch +0.08%
libraries_tests.run.linux.x64.Release.mch +0.06%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.09%
realworld.run.linux.x64.checked.mch +0.08%
smoke_tests.nativeaot.linux.x64.checked.mch +0.07%
MinOpts (+0.02% to +0.08%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.07%
benchmarks.run_pgo.linux.x64.checked.mch +0.04%
benchmarks.run_tiered.linux.x64.checked.mch +0.04%
coreclr_tests.run.linux.x64.checked.mch +0.03%
libraries.crossgen2.linux.x64.checked.mch +0.08%
libraries.pmi.linux.x64.checked.mch +0.02%
libraries_tests.run.linux.x64.Release.mch +0.04%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.04%
realworld.run.linux.x64.checked.mch +0.02%
smoke_tests.nativeaot.linux.x64.checked.mch +0.04%
FullOpts (+0.06% to +0.09%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.08%
benchmarks.run_pgo.linux.x64.checked.mch +0.06%
benchmarks.run_tiered.linux.x64.checked.mch +0.07%
coreclr_tests.run.linux.x64.checked.mch +0.06%
libraries.crossgen2.linux.x64.checked.mch +0.07%
libraries.pmi.linux.x64.checked.mch +0.08%
libraries_tests.run.linux.x64.Release.mch +0.06%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.09%
realworld.run.linux.x64.checked.mch +0.08%
smoke_tests.nativeaot.linux.x64.checked.mch +0.07%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.04% to +0.08%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.07%
benchmarks.run_pgo.osx.arm64.checked.mch +0.06%
benchmarks.run_tiered.osx.arm64.checked.mch +0.05%
coreclr_tests.run.osx.arm64.checked.mch +0.04%
libraries.crossgen2.osx.arm64.checked.mch +0.06%
libraries.pmi.osx.arm64.checked.mch +0.08%
libraries_tests.run.osx.arm64.Release.mch +0.05%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.08%
realworld.run.osx.arm64.checked.mch +0.08%
MinOpts (+0.01% to +0.08%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.08%
benchmarks.run_pgo.osx.arm64.checked.mch +0.04%
benchmarks.run_tiered.osx.arm64.checked.mch +0.04%
coreclr_tests.run.osx.arm64.checked.mch +0.02%
libraries.crossgen2.osx.arm64.checked.mch +0.07%
libraries.pmi.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.03%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.04%
realworld.run.osx.arm64.checked.mch +0.01%
FullOpts (+0.06% to +0.08%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.07%
benchmarks.run_pgo.osx.arm64.checked.mch +0.06%
benchmarks.run_tiered.osx.arm64.checked.mch +0.06%
coreclr_tests.run.osx.arm64.checked.mch +0.06%
libraries.crossgen2.osx.arm64.checked.mch +0.06%
libraries.pmi.osx.arm64.checked.mch +0.08%
libraries_tests.run.osx.arm64.Release.mch +0.06%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.08%
realworld.run.osx.arm64.checked.mch +0.08%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.04% to +0.08%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.07%
benchmarks.run_pgo.windows.arm64.checked.mch +0.06%
benchmarks.run_tiered.windows.arm64.checked.mch +0.05%
coreclr_tests.run.windows.arm64.checked.mch +0.04%
libraries.crossgen2.windows.arm64.checked.mch +0.06%
libraries.pmi.windows.arm64.checked.mch +0.08%
libraries_tests.run.windows.arm64.Release.mch +0.05%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.08%
realworld.run.windows.arm64.checked.mch +0.08%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.07%
MinOpts (+0.01% to +0.07%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.07%
benchmarks.run_pgo.windows.arm64.checked.mch +0.04%
benchmarks.run_tiered.windows.arm64.checked.mch +0.04%
coreclr_tests.run.windows.arm64.checked.mch +0.02%
libraries.crossgen2.windows.arm64.checked.mch +0.07%
libraries.pmi.windows.arm64.checked.mch +0.01%
libraries_tests.run.windows.arm64.Release.mch +0.03%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.04%
realworld.run.windows.arm64.checked.mch +0.02%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.04%
FullOpts (+0.06% to +0.08%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.07%
benchmarks.run_pgo.windows.arm64.checked.mch +0.06%
benchmarks.run_tiered.windows.arm64.checked.mch +0.06%
coreclr_tests.run.windows.arm64.checked.mch +0.06%
libraries.crossgen2.windows.arm64.checked.mch +0.06%
libraries.pmi.windows.arm64.checked.mch +0.08%
libraries_tests.run.windows.arm64.Release.mch +0.06%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.08%
realworld.run.windows.arm64.checked.mch +0.08%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.07%

Throughput diffs for windows/x64 ran on windows/x64

Overall (+0.05% to +0.09%)
Collection PDIFF
benchmarks.run.windows.x64.checked.mch +0.07%
benchmarks.run_pgo.windows.x64.checked.mch +0.06%
benchmarks.run_tiered.windows.x64.checked.mch +0.06%
coreclr_tests.run.windows.x64.checked.mch +0.05%
libraries.crossgen2.windows.x64.checked.mch +0.07%
libraries.pmi.windows.x64.checked.mch +0.08%
libraries_tests.run.windows.x64.Release.mch +0.06%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.09%
realworld.run.windows.x64.checked.mch +0.08%
smoke_tests.nativeaot.windows.x64.checked.mch +0.08%
MinOpts (+0.02% to +0.09%)
Collection PDIFF
benchmarks.run.windows.x64.checked.mch +0.09%
benchmarks.run_pgo.windows.x64.checked.mch +0.04%
benchmarks.run_tiered.windows.x64.checked.mch +0.05%
coreclr_tests.run.windows.x64.checked.mch +0.03%
libraries.crossgen2.windows.x64.checked.mch +0.09%
libraries.pmi.windows.x64.checked.mch +0.02%
libraries_tests.run.windows.x64.Release.mch +0.04%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.04%
realworld.run.windows.x64.checked.mch +0.02%
smoke_tests.nativeaot.windows.x64.checked.mch +0.04%
FullOpts (+0.06% to +0.09%)
Collection PDIFF
benchmarks.run.windows.x64.checked.mch +0.07%
benchmarks.run_pgo.windows.x64.checked.mch +0.06%
benchmarks.run_tiered.windows.x64.checked.mch +0.07%
coreclr_tests.run.windows.x64.checked.mch +0.06%
libraries.crossgen2.windows.x64.checked.mch +0.07%
libraries.pmi.windows.x64.checked.mch +0.08%
libraries_tests.run.windows.x64.Release.mch +0.07%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.09%
realworld.run.windows.x64.checked.mch +0.08%
smoke_tests.nativeaot.windows.x64.checked.mch +0.08%

Details here


@ryujit-bot

Diff results for #98380

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,544,323 contexts (1,012,486 MinOpts, 1,531,837 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 27 (0.00%)

Overall (-260 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.linux.arm64.Release.mch 383,425,724 -108
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 165,117,628 -152
FullOpts (-260 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.linux.arm64.Release.mch 167,614,840 -108
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 151,714,824 -152

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,535,346 contexts (984,660 MinOpts, 1,550,686 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 25 (0.00%)

Overall (-162 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.linux.x64.Release.mch 331,724,724 -64
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 132,521,916 -98
FullOpts (-162 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.linux.x64.Release.mch 147,893,997 -64
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 121,949,734 -98

Details here


Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,250,497 contexts (832,187 MinOpts, 1,418,310 FullOpts).

MISSED contexts: base: 73,582 (3.17%), diff: 73,597 (3.17%)

Overall (-80 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 94,233,016 -80
FullOpts (-80 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 84,203,152 -80

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,354,228 contexts (851,833 MinOpts, 1,502,395 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 24 (0.00%)

Overall (-121 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.x86.Release.mch 190,248,273 -45
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 104,013,755 -76
FullOpts (-121 bytes)
Collection Base size (bytes) Diff size (bytes)
libraries_tests.run.windows.x86.Release.mch 91,155,217 -45
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 95,338,820 -76

Details here


Throughput diffs

Throughput diffs for linux/arm ran on windows/x86

Overall (+0.09% to +0.27%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.14%
benchmarks.run_pgo.linux.arm.checked.mch +0.12%
benchmarks.run_tiered.linux.arm.checked.mch +0.13%
coreclr_tests.run.linux.arm.checked.mch +0.09%
libraries.crossgen2.linux.arm.checked.mch +0.10%
libraries.pmi.linux.arm.checked.mch +0.16%
libraries_tests.run.linux.arm.Release.mch +0.12%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +0.17%
realworld.run.linux.arm.checked.mch +0.27%
MinOpts (+0.02% to +0.13%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.13%
benchmarks.run_pgo.linux.arm.checked.mch +0.07%
benchmarks.run_tiered.linux.arm.checked.mch +0.08%
coreclr_tests.run.linux.arm.checked.mch +0.04%
libraries.crossgen2.linux.arm.checked.mch +0.12%
libraries.pmi.linux.arm.checked.mch +0.03%
libraries_tests.run.linux.arm.Release.mch +0.06%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +0.07%
realworld.run.linux.arm.checked.mch +0.02%
FullOpts (+0.10% to +0.27%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.14%
benchmarks.run_pgo.linux.arm.checked.mch +0.13%
benchmarks.run_tiered.linux.arm.checked.mch +0.14%
coreclr_tests.run.linux.arm.checked.mch +0.12%
libraries.crossgen2.linux.arm.checked.mch +0.10%
libraries.pmi.linux.arm.checked.mch +0.16%
libraries_tests.run.linux.arm.Release.mch +0.13%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +0.18%
realworld.run.linux.arm.checked.mch +0.27%

Throughput diffs for windows/x86 ran on windows/x86

Overall (+0.10% to +0.19%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.15%
benchmarks.run_pgo.windows.x86.checked.mch +0.14%
benchmarks.run_tiered.windows.x86.checked.mch +0.13%
coreclr_tests.run.windows.x86.checked.mch +0.10%
libraries.crossgen2.windows.x86.checked.mch +0.10%
libraries.pmi.windows.x86.checked.mch +0.15%
libraries_tests.run.windows.x86.Release.mch +0.13%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.19%
realworld.run.windows.x86.checked.mch +0.17%
MinOpts (+0.02% to +0.15%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.15%
benchmarks.run_pgo.windows.x86.checked.mch +0.08%
benchmarks.run_tiered.windows.x86.checked.mch +0.08%
coreclr_tests.run.windows.x86.checked.mch +0.04%
libraries.crossgen2.windows.x86.checked.mch +0.13%
libraries.pmi.windows.x86.checked.mch +0.03%
libraries_tests.run.windows.x86.Release.mch +0.07%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.07%
realworld.run.windows.x86.checked.mch +0.02%
FullOpts (+0.10% to +0.19%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.15%
benchmarks.run_pgo.windows.x86.checked.mch +0.14%
benchmarks.run_tiered.windows.x86.checked.mch +0.14%
coreclr_tests.run.windows.x86.checked.mch +0.13%
libraries.crossgen2.windows.x86.checked.mch +0.10%
libraries.pmi.windows.x86.checked.mch +0.15%
libraries_tests.run.windows.x86.Release.mch +0.14%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.19%
realworld.run.windows.x86.checked.mch +0.17%

Details here


Throughput diffs for linux/arm64 ran on linux/x64

Overall (+0.06% to +0.08%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.08%
realworld.run.linux.arm64.checked.mch +0.08%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.07%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.08%
coreclr_tests.run.linux.arm64.checked.mch +0.06%
libraries_tests.run.linux.arm64.Release.mch +0.06%
benchmarks.run_pgo.linux.arm64.checked.mch +0.07%
libraries.pmi.linux.arm64.checked.mch +0.08%
libraries.crossgen2.linux.arm64.checked.mch +0.06%
benchmarks.run_tiered.linux.arm64.checked.mch +0.06%
MinOpts (+0.03% to +0.06%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.03%
realworld.run.linux.arm64.checked.mch +0.04%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.05%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.06%
coreclr_tests.run.linux.arm64.checked.mch +0.04%
libraries_tests.run.linux.arm64.Release.mch +0.05%
benchmarks.run_pgo.linux.arm64.checked.mch +0.05%
libraries.pmi.linux.arm64.checked.mch +0.05%
libraries.crossgen2.linux.arm64.checked.mch +0.05%
benchmarks.run_tiered.linux.arm64.checked.mch +0.05%
FullOpts (+0.06% to +0.08%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.08%
realworld.run.linux.arm64.checked.mch +0.08%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.07%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.08%
coreclr_tests.run.linux.arm64.checked.mch +0.07%
libraries_tests.run.linux.arm64.Release.mch +0.07%
benchmarks.run_pgo.linux.arm64.checked.mch +0.07%
libraries.pmi.linux.arm64.checked.mch +0.08%
libraries.crossgen2.linux.arm64.checked.mch +0.06%
benchmarks.run_tiered.linux.arm64.checked.mch +0.07%

Throughput diffs for linux/x64 ran on linux/x64

Overall (+0.06% to +0.09%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch +0.07%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.09%
realworld.run.linux.x64.checked.mch +0.08%
coreclr_tests.run.linux.x64.checked.mch +0.06%
libraries_tests.run.linux.x64.Release.mch +0.07%
libraries.pmi.linux.x64.checked.mch +0.08%
smoke_tests.nativeaot.linux.x64.checked.mch +0.08%
benchmarks.run.linux.x64.checked.mch +0.08%
benchmarks.run_pgo.linux.x64.checked.mch +0.07%
libraries.crossgen2.linux.x64.checked.mch +0.07%
MinOpts (+0.04% to +0.07%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch +0.06%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.07%
realworld.run.linux.x64.checked.mch +0.04%
coreclr_tests.run.linux.x64.checked.mch +0.05%
libraries_tests.run.linux.x64.Release.mch +0.06%
libraries.pmi.linux.x64.checked.mch +0.06%
smoke_tests.nativeaot.linux.x64.checked.mch +0.05%
benchmarks.run.linux.x64.checked.mch +0.04%
benchmarks.run_pgo.linux.x64.checked.mch +0.06%
libraries.crossgen2.linux.x64.checked.mch +0.06%
FullOpts (+0.07% to +0.09%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch +0.08%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.09%
realworld.run.linux.x64.checked.mch +0.08%
coreclr_tests.run.linux.x64.checked.mch +0.07%
libraries_tests.run.linux.x64.Release.mch +0.07%
libraries.pmi.linux.x64.checked.mch +0.08%
smoke_tests.nativeaot.linux.x64.checked.mch +0.08%
benchmarks.run.linux.x64.checked.mch +0.08%
benchmarks.run_pgo.linux.x64.checked.mch +0.07%
libraries.crossgen2.linux.x64.checked.mch +0.07%

Details here


@AndyAyersMS
Member

Does that also affect code limited to a single BB like here?

Ah, this restriction wasn't obvious to me from the code, though I guess I see how you're accomplishing it now. I think you could make this clearer: the name "local version" doesn't indicate to me that this is tied to a specific block.

I'm still concerned that it may be possible to get things wrong; it isn't obvious to me that the importer will always create stores and load trees in execution order, so that when creating a load it knows the most recently created store will provide the value.
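To make the ordering concern concrete, here is a minimal sketch using a toy model, not the JIT's real data structures. `Store`, `LastCreated`, and `LastExecutedBefore` are hypothetical names introduced only for illustration. It shows that "the store created most recently" and "the store executed most recently before the load" are different questions, and they agree only when creation order matches execution order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (hypothetical, not JIT IR): a store to a local plus the position
// at which it ends up in final execution order.
struct Store
{
    int         local;
    std::string value;
    int         execPos;
};

// Shortcut under discussion: trust whichever store to the local was created
// last. The vector is assumed to be in creation order.
std::string LastCreated(const std::vector<Store>& creationOrder, int local)
{
    for (std::size_t i = creationOrder.size(); i > 0; --i)
    {
        if (creationOrder[i - 1].local == local)
        {
            return creationOrder[i - 1].value;
        }
    }
    return "";
}

// Ground truth: the store to the local that executes latest before loadPos.
std::string LastExecutedBefore(const std::vector<Store>& stores, int local, int loadPos)
{
    std::string result;
    int         best = -1;
    for (const Store& s : stores)
    {
        if (s.local == local && s.execPos < loadPos && s.execPos > best)
        {
            best   = s.execPos;
            result = s.value;
        }
    }
    return result;
}
```

If two stores to the same local are created in the opposite order from the one in which they execute, the two functions return different values, which is the failure mode the shortcut would need to rule out.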

Like I said, this was originally a part of the ftn ptr/delegate non PGO inlining and since inlining can't be done after importer, I need to handle it here for that.

There are quite a few phases that run after importation and before inlining, so this transformation doesn't have to happen during importation. Perhaps it would be better to simply detect this opportunity during importation (setting a flag on the call, the block, or globally as needed) and then, during inlining, if the flag is set, walk back a little way searching for a recent def. Alternatively, if the flag is set, run a custom forward sub pass (one that does not rely on liveness) before inlining.
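The "flag during importation, search later" idea could look roughly like the sketch below. It uses a toy statement list rather than the JIT's real IR, and every name in it (`Stmt`, `needsDef`, `FindRecentDef`, `TrySubstitute`, the `maxSteps` budget) is an assumption made for illustration, not an existing JIT API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy statement model. Illustrative stand-ins only, NOT the JIT's real IR.
enum class Kind { Store, Call, Other };

struct Stmt
{
    Kind        kind;
    int         local;    // local written by a Store or read by a Call; -1 if none
    std::string value;    // value stored by a Store, e.g. "typeof(int)"
    bool        needsDef; // hypothetical flag the importer would set on a call
};

// Walk backwards from block[use], at most maxSteps statements, and return the
// nearest store to the same local, or nullptr if none is found in range.
// The block is assumed to already be in execution order.
const Stmt* FindRecentDef(const std::vector<Stmt>& block, std::size_t use, std::size_t maxSteps)
{
    const int   local = block[use].local;
    std::size_t i     = use;
    for (std::size_t steps = 0; i > 0 && steps < maxSteps; ++steps)
    {
        --i;
        if (block[i].kind == Kind::Store && block[i].local == local)
        {
            return &block[i];
        }
    }
    return nullptr;
}

// Later-phase driver: only flagged calls pay for the backward walk.
// Returns the substituted value, or an empty string when nothing was found.
std::string TrySubstitute(const std::vector<Stmt>& block, std::size_t use, std::size_t maxSteps)
{
    if (block[use].kind != Kind::Call || !block[use].needsDef)
    {
        return "";
    }
    const Stmt* def = FindRecentDef(block, use, maxSteps);
    return def != nullptr ? def->value : "";
}
```

Because the walk runs over statements that are already in execution order and stops at the nearest store, it does not depend on the order in which the importer happened to create trees. A real pass would also need to check conditions this sketch ignores, such as address-exposed locals and side effects between the def and the use.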

Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 19, 2024