[Perf] Linux/x64: Regressions in System.Collections.Perf_LengthBucketsFrozenDictionary #105328

performanceautofiler · 2024-07-23T08:16:17Z

Run Information

Name	Value
Architecture	x64
OS	ubuntu 22.04
Queue	TigerUbuntu
Baseline	078530371f31d23945990b47314f11d6e9f1f7c1
Compare	a3fd0957239a6cf44edee68e9bc0a158f15425c9
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Perf_LengthBucketsFrozenDictionary

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector
TryGetValue_True_FrozenDictionary - Duration of single invocation ADX - Test Multi Config Graph	46.48 μs	50.63 μs	1.09	0.04	False
TryGetValue_False_FrozenDictionary - Duration of single invocation ADX - Test Multi Config Graph	1.13 μs	1.99 μs	1.76	0.22	False
TryGetValue_False_FrozenDictionary - Duration of single invocation ADX - Test Multi Config Graph	100.12 ns	181.76 ns	1.82	0.01	True
TryGetValue_False_FrozenDictionary - Duration of single invocation ADX - Test Multi Config Graph	100.14 ns	180.20 ns	1.80	0.01	True
TryGetValue_False_FrozenDictionary - Duration of single invocation ADX - Test Multi Config Graph	10.08 ns	19.76 ns	1.96	0.03	True
TryGetValue_False_FrozenDictionary - Duration of single invocation ADX - Test Multi Config Graph	1.19 μs	1.98 μs	1.66	0.05	True
TryGetValue_False_FrozenDictionary - Duration of single invocation ADX - Test Multi Config Graph	9.94 ns	20.07 ns	2.02	0.03	True

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Perf_LengthBucketsFrozenDictionary*'

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 10000, ItemsPerBucket: 1)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 1000, ItemsPerBucket: 5)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 100, ItemsPerBucket: 1)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 100, ItemsPerBucket: 5)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10, ItemsPerBucket: 1)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 1000, ItemsPerBucket: 1)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10, ItemsPerBucket: 5)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

LoopedBard3 · 2024-07-23T16:32:51Z

Likely caused by: #105131, linked improvements in PR.

LoopedBard3 · 2024-07-23T17:02:04Z

Related regressions from throughout the strength reduction story:
Linux/x64: dotnet/perf-autofiling-issues#38697, dotnet/perf-autofiling-issues#38692
Windows/x64: dotnet/perf-autofiling-issues#38716, dotnet/perf-autofiling-issues#38689, dotnet/perf-autofiling-issues#38686
Windows/arm64: dotnet/perf-autofiling-issues#39309

dotnet-policy-service · 2024-07-24T18:29:48Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

jakobbotsch · 2024-08-14T13:02:56Z

Report for the regressions in the win-x64 issues above: https://github.com/jakobbotsch/perf-diff-finder/blob/main/strengthreduction/results.md

Of the 15 benchmarks mentioned in those issues only 3 of them have actual strength reductions in a hot method:

System.Tests.Perf_String.IndexerCheckBoundCheckHoist
System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10000, ItemsPerBucket: 1)
System.Linq.Tests.Perf_Enumerable.SelectToList(input: IEnumerable)

jakobbotsch · 2024-08-14T13:20:41Z

The regressions from 7/15 are likely caused by #104603 instead.

jakobbotsch · 2024-08-14T13:51:09Z

The larger regressions in this issue (like System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10, ItemsPerBucket: 5)) seem to be only on some particular Intel CPUs, so I am going to guess it is JCC erratum (indeed it looks like there is a cmp/jge pair straddling a 32 byte boundary in a hot block in the diff).

For IndexerCheckBoundCheckHoist I am seeing some AMD CPU regressions (not totally clear, I'm seeing some bimodality after the strength reduction) while I see improvements on Intel/ARM64 CPUs on the benchmark.

stephentoub · 2024-08-14T14:44:44Z

so I am going to guess it is JCC erratum

Do we have a plan for when we're going to tackle that in general?

jakobbotsch · 2024-08-14T15:06:44Z

so I am going to guess it is JCC erratum

Do we have a plan for when we're going to tackle that in general?

It was in our initial plans for .NET 9 (#93243), but ended up not making it. Personally I hope we can solve this with new perf lab hardware instead... I believe the CPUs on these queues are i7-8700's that are 7 years old at this point. I think it's been a while since the last Intel CPU with the problem was available for sale.

jakobbotsch · 2024-08-14T15:21:13Z

Oddly the IndexerCheckBoundCheckHoist regression does not reproduce on my own AMD CPU:

BenchmarkDotNet v0.13.13-nightly.20240311.145, Windows 10 (10.0.19045.4651/22H2/2022Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-preview.6.24328.19
  [Host]     : .NET 9.0.0 (9.0.24.32707), X64 RyuJIT AVX2
  Job-QZGOXF : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-VNPDCN : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Job	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	Allocated	Alloc Ratio
IndexerCheckBoundCheckHoist	Job-QZGOXF	\41e02e5549a7954b068f342cd9bd6611ef764cc3\corerun.exe	15.44 ns	0.018 ns	0.015 ns	15.44 ns	15.41 ns	15.47 ns	0.99	-	NA
IndexerCheckBoundCheckHoist	Job-VNPDCN	\f9c08462ba81a2aacc9fd85d321e12080c17b9ee\corerun.exe	15.54 ns	0.029 ns	0.027 ns	15.53 ns	15.51 ns	15.60 ns	1.00	-	NA

The multi-config graph looks like this:

Does not seem like there is anything actionable there.

jakobbotsch · 2024-08-14T16:24:22Z

Here's a report for the 13 win-x64 regressions in dotnet/perf-autofiling-issues#38686 and dotnet/perf-autofiling-issues#38716 from July 15th around #104603: https://github.com/jakobbotsch/perf-diff-finder/blob/main/straighten/results.md

Lots of large-scale diffs there which makes it hard. Some notes from just looking at the codegen difference:

`System.Reflection.Invoke.Property_Set_int`

Large scale diffs that I haven't tried digging into yet.

`System.Collections.IterateForEach<Int32>.ImmutableHashSet(Size: 512)`

Lots of different block ordering, but haven't dug deeper.

`System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>.Enumerate(Size: 10)`

Diff here is a new IV widening leading to the removal of a zero extension. The regression is only seen on AMD EPYC CPUs it looks like.

`Perf_Enumerable.Repeat`

Only diff is removal of an unnecessary compare. Multi-config graph does not show a regression for any other config.

`System.Linq.Tests.Perf_Enumerable.SelectToList(input: IEnumerable)`

Same as above.

`System.Linq.Tests.Perf_OrderBy.OrderByCustomComparer(NumberOfPeople: 512)`

Lots of different block ordering, but haven't dug deeper.

`System.Collections.CtorFromCollection<Int32>.LinkedList(Size: 512)`

Only diff is removal of an unnecessary compare and some different registers allocated.

`System.Text.Json.Serialization.Tests.ReadJson<MyEventsListerViewModel>.DeserializeFromReader(Mode: Reflection)`

Returned back to normal.

`Microsoft.Extensions.Primitives.StringSegmentBenchmark.Equals_Object_Valid`

This looks bimodal.

`System.Text.RegularExpressions.Tests.Perf_Regex_Common.ReplaceWords(Options: None)`

Large scale differences in block layout / register allocation. Haven't dug deeper into why, but presumably that's just caused by different flow graph layouts caused by the PR. Doesn't look like there's a regression here from .NET 9.

`System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsNotMatch(Options: None)`

Like above.

`System.Tests.Perf_String.Concat_CharEnumerable`

Also large diffs here, but I don't see an overall .NET 9 regression.

jakobbotsch · 2024-08-29T11:52:11Z

System.Reflection.Invoke.Property_Set_int has returned back to normal now. Based on the report above, I believe the only benchmarkss that require some more investigation at this point are System.Collections.IterateForEach<Int32>.ImmutableHashSet(Size: 512) and System.Linq.Tests.Perf_OrderBy.OrderByCustomComparer(NumberOfPeople: 512).
For ImmutableHashSet it looks like thebenchmark did not regress on the same CPU on Ubuntu:

The first regression around 10th of June is from enabling CET which is Windows only, so that makes sense. The regression around July 15th is the one that on first view would be caused by #104603, but given no regression on Ubuntu it does not seem like the regression is inherent to the codegen.

For OrderByCustomComparer I arrive at a similar conclusion... the same CPU on other OS's did not regress (and, in fact, somewhat improved):

So seems like this isn't an inherent problem with the codegen either.

I don't think there's much more actionable here... let's revisit in .NET 10 with more flowgraph work.

performanceautofiler bot added arch-x64 os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Jul 23, 2024

performanceautofiler bot mentioned this issue Jul 23, 2024

[SENTINEL] Autofile run complete at 7/23/2024 8:35:05 AM. 17 issues filed. dotnet/perf-autofiling-issues#38714

Closed

LoopedBard3 transferred this issue from dotnet/perf-autofiling-issues Jul 23, 2024

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jul 23, 2024

LoopedBard3 changed the title ~~[Perf] Linux/x64: 11 Regressions on 7/18/2024 10:05:20 PM~~ [Perf] Linux/x64: Regressions in System.Collections.Perf_LengthBucketsFrozenDictionary Jul 23, 2024

jakobbotsch self-assigned this Jul 23, 2024

jeffschwMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 24, 2024

JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Jul 24, 2024

JulieLeeMSFT added this to the 9.0.0 milestone Jul 24, 2024

jakobbotsch added Priority:2 Work that is important, but not critical for the release and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jul 25, 2024

DrewScoggins mentioned this issue Aug 22, 2024

Performance Fundamental: Net 8 -> Net 9 Manual Comparison Report #106824

Open

jakobbotsch modified the milestones: 9.0.0, 10.0.0 Aug 29, 2024

[Perf] Linux/x64: Regressions in System.Collections.Perf_LengthBucketsFrozenDictionary #105328

[Perf] Linux/x64: Regressions in System.Collections.Perf_LengthBucketsFrozenDictionary #105328

Comments

performanceautofiler bot commented Jul 23, 2024 • edited by LoopedBard3 Loading

Run Information

Regressions in System.Collections.Perf_LengthBucketsFrozenDictionary

Repro

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 10000, ItemsPerBucket: 1)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 1000, ItemsPerBucket: 5)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 100, ItemsPerBucket: 1)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 100, ItemsPerBucket: 5)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10, ItemsPerBucket: 1)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 1000, ItemsPerBucket: 1)

ETL Files

Histogram

JIT Disasms

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_False_FrozenDictionary(Count: 10, ItemsPerBucket: 5)

ETL Files

Histogram

JIT Disasms

Docs

LoopedBard3 commented Jul 23, 2024 • edited Loading

LoopedBard3 commented Jul 23, 2024 • edited Loading

dotnet-policy-service bot commented Jul 24, 2024

jakobbotsch commented Aug 14, 2024

jakobbotsch commented Aug 14, 2024

jakobbotsch commented Aug 14, 2024

stephentoub commented Aug 14, 2024

jakobbotsch commented Aug 14, 2024

jakobbotsch commented Aug 14, 2024 • edited Loading

jakobbotsch commented Aug 14, 2024

System.Reflection.Invoke.Property_Set_int

System.Collections.IterateForEach<Int32>.ImmutableHashSet(Size: 512)

System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>.Enumerate(Size: 10)

Perf_Enumerable.Repeat

System.Linq.Tests.Perf_Enumerable.SelectToList(input: IEnumerable)

System.Linq.Tests.Perf_OrderBy.OrderByCustomComparer(NumberOfPeople: 512)

System.Collections.CtorFromCollection<Int32>.LinkedList(Size: 512)

System.Text.Json.Serialization.Tests.ReadJson<MyEventsListerViewModel>.DeserializeFromReader(Mode: Reflection)

Microsoft.Extensions.Primitives.StringSegmentBenchmark.Equals_Object_Valid

System.Text.RegularExpressions.Tests.Perf_Regex_Common.ReplaceWords(Options: None)

System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsNotMatch(Options: None)

System.Tests.Perf_String.Concat_CharEnumerable

jakobbotsch commented Aug 29, 2024

performanceautofiler bot commented Jul 23, 2024 •

edited by LoopedBard3

Loading

LoopedBard3 commented Jul 23, 2024 •

edited

Loading

LoopedBard3 commented Jul 23, 2024 •

edited

Loading

jakobbotsch commented Aug 14, 2024 •

edited

Loading

`System.Reflection.Invoke.Property_Set_int`

`System.Collections.IterateForEach<Int32>.ImmutableHashSet(Size: 512)`

`System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>.Enumerate(Size: 10)`

`Perf_Enumerable.Repeat`

`System.Linq.Tests.Perf_Enumerable.SelectToList(input: IEnumerable)`

`System.Linq.Tests.Perf_OrderBy.OrderByCustomComparer(NumberOfPeople: 512)`

`System.Collections.CtorFromCollection<Int32>.LinkedList(Size: 512)`

`System.Text.Json.Serialization.Tests.ReadJson<MyEventsListerViewModel>.DeserializeFromReader(Mode: Reflection)`

`Microsoft.Extensions.Primitives.StringSegmentBenchmark.Equals_Object_Valid`

`System.Text.RegularExpressions.Tests.Perf_Regex_Common.ReplaceWords(Options: None)`

`System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsNotMatch(Options: None)`

`System.Tests.Perf_String.Concat_CharEnumerable`