[Perf] Windows/x64: 1 Regression on 3/15/2024 7:33:56 PM #99964

Closed
performanceautofiler bot opened this issue Mar 19, 2024 · 5 comments
Labels: arch-x64; area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI); os-windows; Priority:2 (Work that is important, but not critical for the release); runtime-coreclr (specific to the CoreCLR runtime); tenet-performance (Performance related issue); tenet-performance-benchmarks (Issue from performance benchmark)

Comments

@performanceautofiler

performanceautofiler bot commented Mar 19, 2024

Run Information

| Name | Value |
| --- | --- |
| Architecture | x64 |
| OS | Windows 10.0.18362 |
| Queue | TigerWindows |
| Baseline | 453713a527ae01197b260fbc427918ee93b3cd5b |
| Compare | 52eb3ed834d9bd43d178faef2ff84a4dfd84c555 |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Runtime.InteropServices.Tests.SafeHandleTests

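| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AddRef_GetHandle_Release | 19.30 ns | 23.87 ns | 1.24 | 0.15 | False | | | |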
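
The Test/Base value appears to be the compare timing divided by the baseline timing. A minimal sketch that checks this against the two timings reported above; the variable names are illustrative and not part of the autofiler tooling:

```python
# Timings copied from the regression table (nanoseconds).
baseline_ns = 19.30
test_ns = 23.87

# Assumption: Test/Base is simply test time over baseline time.
ratio = test_ns / baseline_ns
slowdown_pct = (ratio - 1) * 100

print(f"Test/Base = {ratio:.2f}")
print(f"Slowdown  = {slowdown_pct:.1f}%")
```

Rounded to two decimals the ratio matches the 1.24 in the table, which corresponds to a slowdown of roughly 24%.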

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Runtime.InteropServices.Tests.SafeHandleTests*'

Payloads

Baseline
Compare

System.Runtime.InteropServices.Tests.SafeHandleTests.AddRef_GetHandle_Release

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

performanceautofiler bot added the arch-x64, os-windows, runtime-coreclr, and untriaged labels Mar 19, 2024
DrewScoggins removed the untriaged label Mar 19, 2024
DrewScoggins transferred this issue from dotnet/perf-autofiling-issues Mar 19, 2024
dotnet-issue-labeler bot added the needs-area-label label Mar 19, 2024
DrewScoggins added the tenet-performance and tenet-performance-benchmarks labels Mar 19, 2024
@DrewScoggins
Member

DrewScoggins commented Mar 19, 2024

Looks related to #99790
Windows AMD Regressions: dotnet/perf-autofiling-issues#31388
Linux x64 Regressions: dotnet/perf-autofiling-issues#31340

Duplicate: dotnet/perf-autofiling-issues#31923

dotnet-policy-service bot added the untriaged label Mar 19, 2024
amanasifkhalid added this to the 9.0.0 milestone Mar 19, 2024
amanasifkhalid removed the untriaged label Mar 19, 2024
jeffschwMSFT added the area-CodeGen-coreclr label Mar 20, 2024
vcsjones removed the needs-area-label label Mar 20, 2024
amanasifkhalid added the Priority:2 label May 3, 2024
@amanasifkhalid
Member

| Notes | Recent Score | Orig Score | Ubuntu 2022.04 x64 / Windows 2010.0.18362 x64 | Benchmark |
| --- | --- | --- | --- | --- |
| | 1.24 | 1.24 | 1.24, 1.24 | System.Numerics.Tests.Perf_BigInteger.Remainder(arguments: 1024,512 bits) |
| | 1.22 | 1.13 | 1.22, 1.13 | System.Tests.Perf_UInt64.ToString(value: 18446744073709551615) |
| | 1.20 | 1.17 | 1.20, 1.17 | System.Tests.Perf_Version.TryFormatL |
| | 1.19 | 1.20 | 1.19, 1.20 | System.Collections.Perf_Frozen(Int16).ToFrozenDictionary(Count: 512) |
| | 1.16 | 1.17 | 1.16, 1.17 | System.IO.Tests.Perf_Path.GetDirectoryName |
| | 1.16 | 1.21 | 1.16, 1.21 | System.Collections.CtorFromCollection(Int32).FrozenDictionaryOptimized(Size: 512) |
| | 1.16 | 1.18 | 1.16, 1.18 | System.Tests.Perf_Int64.ToString(value: 9223372036854775807) |
| | 1.15 | 1.20 | 1.15, 1.20 | System.Tests.Perf_Int32.ToString(value: 12345) |
| | 1.14 | 1.16 | 1.14, 1.16 | System.Tests.Perf_Version.ToStringL |
| | 1.14 | 1.16 | 1.14, 1.16 | System.Linq.Tests.Perf_Enumerable.Zip(input: IEnumerable) |
| | 1.13 | 1.13 | 1.13, 1.13 | System.Tests.Perf_UInt64.TryFormat(value: 18446744073709551615) |
| | 1.12 | 1.13 | 1.12, 1.13 | System.Tests.Perf_Int32.ToString(value: 2147483647) |
| | 1.12 | 1.24 | 1.12, 1.24 | System.Runtime.InteropServices.Tests.SafeHandleTests.AddRef_GetHandle_Release |
| | 1.11 | 1.11 | 1.11, 1.11 | System.Collections.Perf_Frozen(ReferenceType).ToFrozenDictionary(Count: 512) |
| | 1.10 | 1.11 | 1.13, 1.14, 1.08, 1.08 | System.Collections.Perf_Frozen(Int16).ToFrozenDictionary(Count: 64) |
| | 1.10 | 1.09 | 1.10, 1.09 | System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 1000, ItemsPerBucket: 5) |
| | 1.09 | 1.10 | 1.09, 1.10 | System.Collections.Perf_Frozen(ReferenceType).Contains_True(Count: 64) |
| | 1.09 | 1.09 | 1.09, 1.09 | System.Tests.Perf_Int64.TryFormat(value: 9223372036854775807) |
| | 1.08 | 1.14 | 1.08, 1.14 | System.Collections.Perf_Frozen(ReferenceType).ToFrozenSet(Count: 512) |
| | 1.04 | 1.17 | 1.04, 1.17 | System.Tests.Perf_Int64.ToString(value: 12345) |
| | 1.02 | 1.12 | 1.02, 1.12 | System.Tests.Perf_UInt64.ToString(value: 12345) |

@amanasifkhalid
Member

Windows data isn't all that recent. I'll run some Kusto queries to get newer data. For the Linux regression:

image
It looks like it gradually improved, and then recently went back up.

@amanasifkhalid
Member

amanasifkhalid commented Jul 25, 2024

I've looked at the Windows x64 regressions with recent scores >=1.1. The recent behavior of most of them seems to be dominated (or fixed) by changes in block layout and/or block compaction -- purple is Windows 10, blue is 11:

System.Numerics.Tests.Perf_BigInteger.Remainder(arguments: 1024,512 bits)
image

System.Tests.Perf_UInt64.ToString(value: 18446744073709551615)
image

System.Collections.Perf_Frozen<Int16>.ToFrozenDictionary(Count: 512)
image

System.IO.Tests.Perf_Path.GetDirectoryName
image

System.Collections.CtorFromCollection<Int32>.FrozenDictionaryOptimized(Size: 512)
image

System.Collections.Perf_Frozen<ReferenceType>.ToFrozenDictionary(Count: 512)
image

I'll follow up on layout regressions in #102763. A few of the other regressions look a bit more stubborn:

System.Tests.Perf_Version.TryFormatL
image

System.Tests.Perf_Int64.ToString(value: 9223372036854775807)
image

System.Tests.Perf_Int32.ToString(value: 12345)
image

System.Runtime.InteropServices.Tests.SafeHandleTests.AddRef_GetHandle_Release
image

I'll take a closer look at these to see if there's some obvious missed opportunity.

@amanasifkhalid
Member

I looked at the latter four benchmarks above, and I'm not seeing anything actionable here. My hypothesis is that by getting rid of the BBF_NONE_QUIRK exception when deciding to optimize a branch to an empty BBJ_ALWAYS block, we pessimized other flow opts that ultimately affected block layout, hence the initial regressions. The new layout seems much more resilient to previous flow opts failing to simplify the flow graph; for example, if we have a chain of BBJ_COND -> BBJ_ALWAYS (empty) -> BBJ_ALWAYS and fail to remove or compact the second block, the RPO layout tends to maintain fallthrough behavior if this path is sufficiently hot, so we don't have unnecessary branches in the final codegen. As such, I'm not seeing any layout issues in these benchmarks, and getting rid of the remaining restriction for optimizing branches to empty BBJ_ALWAYS blocks (detailed in #99790) doesn't produce any meaningful diffs, either. Removing this restriction seems like a worthwhile goal for .NET 10, though I don't see a need to introduce that churn into .NET 9 at this point.
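
To illustrate the fallthrough point, here is a toy sketch of a reverse-postorder (RPO) layout over the chain described above. It is not the JIT's implementation: the `Block` class, the likelihood values, the `rpo_layout` and `jumps_needed` helpers, and the "visit colder successors first" rule are all assumptions made for the example, and the block kinds are simplified stand-ins for BBJ_COND, BBJ_ALWAYS, and a returning block.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    kind: str  # toy stand-ins: "COND", "ALWAYS", or "RETURN"
    succs: list = field(default_factory=list)  # (target name, likelihood) pairs


def rpo_layout(blocks, entry):
    """Return block names in reverse postorder of a likelihood-aware DFS."""
    visited, postorder = set(), []

    def dfs(name):
        visited.add(name)
        # Assumption: visit colder successors first. The hottest successor then
        # finishes last, so it tends to follow its predecessor once reversed.
        for target, _likelihood in sorted(blocks[name].succs, key=lambda e: e[1]):
            if target not in visited:
                dfs(target)
        postorder.append(name)

    dfs(entry)
    return postorder[::-1]


def jumps_needed(blocks, layout):
    """List ALWAYS blocks whose target is not the next block in the layout."""
    position = {name: i for i, name in enumerate(layout)}
    needed = []
    for name in layout:
        block = blocks[name]
        if block.kind == "ALWAYS":
            target = block.succs[0][0]
            if position[target] != position[name] + 1:
                needed.append(name)
    return needed


# Hypothetical flow graph: COND -> ALWAYS (empty) -> ALWAYS on the hot path,
# with a cold exit on the other side of the conditional.
blocks = {
    "BB01": Block("BB01", "COND", [("BB02", 0.9), ("BB04", 0.1)]),
    "BB02": Block("BB02", "ALWAYS", [("BB03", 1.0)]),  # empty block left in place
    "BB03": Block("BB03", "ALWAYS", [("BB05", 1.0)]),
    "BB04": Block("BB04", "RETURN"),  # cold exit
    "BB05": Block("BB05", "RETURN"),
}

layout = rpo_layout(blocks, "BB01")
print(layout)
print(jumps_needed(blocks, layout))
```

In this toy graph the layout comes out as BB01, BB02, BB03, BB05, BB04, and no unconditional block needs an explicit jump: the uncompacted empty block simply falls through, while the cold exit is pushed to the end.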

The other benchmarks with clear layout regressions are tracked in #102763 or #103972, so I'm going to close this in favor of those issues.
