Regressions in System.Xml.Linq.Perf_XName (FullPGO) #64626

performanceautofiler · 2022-02-01T09:48:29Z

Run Information

Architecture	x64
OS	Windows 10.0.18362
Baseline	215c328e039b3a6b6bfdc6514253307105f25816
Compare	ca6716a658137b55d29b3ce100d5d14eb0edd67e
Diff	Diff

Regressions in System.Xml.Linq.Perf_XName

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
NonEmptyNameSpaceToString - Duration of single invocation	19.80 ns	24.26 ns	1.22	0.18	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*'

Payloads

Baseline
Compare

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString

Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417.
IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM.
IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

The text was updated successfully, but these errors were encountered:

ghost · 2022-02-01T16:28:41Z

Tagging subscribers to this area: @dotnet/area-system-security, @vcsjones
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture	x64
OS	Windows 10.0.18362
Baseline	215c328e039b3a6b6bfdc6514253307105f25816
Compare	ca6716a658137b55d29b3ce100d5d14eb0edd67e
Diff	Diff

Regressions in System.Xml.Linq.Perf_XName

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
NonEmptyNameSpaceToString - Duration of single invocation	19.80 ns	24.26 ns	1.22	0.18	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*'

Payloads

Baseline
Compare

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString

Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417.
IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM.
IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`area-System.Security`, `untriaged`
Milestone:	-

ghost · 2022-02-01T16:33:37Z

Tagging subscribers to this area: @dotnet/area-system-xml
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture	x64
OS	Windows 10.0.18362
Baseline	215c328e039b3a6b6bfdc6514253307105f25816
Compare	ca6716a658137b55d29b3ce100d5d14eb0edd67e
Diff	Diff

Regressions in System.Xml.Linq.Perf_XName

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
NonEmptyNameSpaceToString - Duration of single invocation	19.80 ns	24.26 ns	1.22	0.18	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*'

Payloads

Baseline
Compare

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString

Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417.
IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM.
IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`area-System.Xml`, `tenet-performance`, `tenet-performance-benchmarks`, `untriaged`, `refs/heads/main`, `x64`, `Regression`, `RunKind=micro`, `CoreClr`, `Windows 10.0.18362`, `PGOType=fullpgo`
Milestone:	-

krwq · 2022-02-17T12:18:45Z

@adamsitnik this seems a bit like a noise - looking at graph my gut tells me this is well within stdev, any clues?

adamsitnik · 2022-02-17T12:53:39Z

@krwq that was my initial impression as well, but then I've clicked on "Toggle annotations" (1), the blue line (2) and moving average (3) and it's rather an actual regression:

eiriktsarpalis · 2022-08-05T18:16:32Z

Performance for the particular benchmarks appears to have regressed substantially over the recent months:

I don't believe we've made any change to Xml over that time, codegen related perhaps?

JulieLeeMSFT · 2022-08-23T13:22:07Z

@AndyAyersMS PTAL.
Cc @dotnet/jit-contrib @mangod9.

EgorBo · 2022-08-23T13:30:12Z

Regression in FramedAllocateString might be a sign of a regression in GC e.g. GC Regions.

JulieLeeMSFT · 2022-08-23T17:28:13Z

FramedAllocateString might be a sign of a regression in GC e.g. GC Regions.

cc @Maoni0.

AndyAyersMS · 2022-08-26T20:08:35Z

@krwq in the profiles above are the two runs measuring the same amount of work (that is, same values for iterationCount and invocationCount ?

And, were you filtering the profile to just the actual stages?

Assuming so, it does not look like a codegen issue, the jit-generated code is actually a good deal faster.

AndyAyersMS · 2022-08-29T19:19:23Z

Also we haven't historically run triage on full PGO results; I'm a bit surprised we even opened this issue. If we look at perf of this test with the default settings we see

(default)

For purposes of comparison:

(full pgo)

Just for completeness here's the other bit of data we have:

(dynamic pgo)

The regressions in the PGO modes on Jan 31 (which inspired this issue) seem to well correlated with #64521. But clearly since then all modes have regressed.

The most troubling of these is the recent regression in default perf:

We have a reporting outage around the first and largest regression. Commit range there has 66 commits: bceefa9...e117479. I think @DrewScoggins is trying to backfill data for that gap.

Source of the second regression is unclear; implicated commit range is 9814b1b...e3b2af3 which is 16 commits.

For PGO there have been several regressions over the past months. It will take some time to sort through them all.

AndyAyersMS · 2022-08-30T05:06:40Z

PGO regressions (best guesses)

AndyAyersMS · 2022-09-01T22:50:00Z

@krwq in the profiles above are the two runs measuring the same amount of work (that is, same values for iterationCount and invocationCount ?

And, were you filtering the profile to just the actual stages?

Assuming so, it does not look like a codegen issue, the jit-generated code is actually a good deal faster.

@krwq any info you can add here? If not, will start looking at this more closely tomorrow.

krwq · 2022-09-01T23:32:37Z

@AndyAyersMS yes, they measured exactly same amount of work. Unfortunately I don't have more info I could add here. I only observed newer code being consistently slower on my machine which is inline with what the issue description is showing and seems the only thing the XML code is doing is string concatenation and given PerfView also showed string related stuff as being slower it seems root cause is somewhere in coreclr but I don't know that code enough to pinpoint anything in particular.

AndyAyersMS · 2022-09-07T00:57:03Z

Looking at default behavior for now (that is, product defaults, not Full PGO), I see 10% regressions vs .NET 6:

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-rc.1.22425.9
[Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2
Job-LJOSDL : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT AVX2
Job-UKYGTY : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
Job-HTVXBH : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 InvocationCount=9983200 IterationCount=25
IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True
MinIterationCount=15 WarmupCount=1

Method	Job	Runtime	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Gen 0	Allocated	Alloc Ratio
NonEmptyNameSpaceToString	Job-LJOSDL	.NET 5.0	net5.0	20.14 ns	0.253 ns	0.338 ns	20.04 ns	19.51 ns	20.77 ns	1.07	0.03	0.0127	80 B	1.00
NonEmptyNameSpaceToString	Job-UKYGTY	.NET 6.0	net6.0	18.92 ns	0.336 ns	0.448 ns	18.81 ns	18.34 ns	20.02 ns	1.00	0.00	0.0127	80 B	1.00
NonEmptyNameSpaceToString	Job-HTVXBH	.NET 7.0	net7.0	21.05 ns	0.240 ns	0.320 ns	21.01 ns	20.74 ns	22.34 ns	1.11	0.03	0.0127	80 B	1.00

Comparing filtered BDN profiles hasn't been illuminating yet, though they are similar to the ones from @krwq above in that the native part of the allocation seems slower.

Per discussion in #74771 the .NET 7 regression seems greatly reduced if we don't use region-based GC:

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-rc.1.22425.9
[Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2
Job-PYJGQF : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

EnvironmentVariables=DOTNET_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000 Runtime=.NET 7.0
Toolchain=net7.0 InvocationCount=9983200 IterationCount=25
IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True
MinIterationCount=15 WarmupCount=1

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Allocated
NonEmptyNameSpaceToString	19.93 ns	0.470 ns	0.627 ns	19.84 ns	19.10 ns	22.15 ns	0.0127	80 B

and a similar with DOTNET_GCWriteBarrier=3

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-rc.1.22425.9
[Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2
Job-HBMGZU : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

EnvironmentVariables=DOTNET_GCWriteBarrier=3 PowerPlanMode=00000000-0000-0000-0000-000000000000 Runtime=.NET 7.0
Toolchain=net7.0 InvocationCount=9983200 IterationCount=25
IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True
MinIterationCount=15 WarmupCount=1

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Allocated
NonEmptyNameSpaceToString	20.04 ns	0.630 ns	0.842 ns	19.96 ns	18.94 ns	23.21 ns	0.0127	80 B

AndyAyersMS · 2022-09-09T17:43:35Z

Moving this to area-gc since it seems to be related to recent gc changes.

Again, this is mainly about the non-PGO behavior.

ghost · 2022-09-09T17:43:57Z

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture	x64
OS	Windows 10.0.18362
Baseline	215c328e039b3a6b6bfdc6514253307105f25816
Compare	ca6716a658137b55d29b3ce100d5d14eb0edd67e
Diff	Diff

Regressions in System.Xml.Linq.Perf_XName

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
NonEmptyNameSpaceToString - Duration of single invocation	19.80 ns	24.26 ns	1.22	0.18	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*'

Payloads

Baseline
Compare

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString

Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417.
IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM.
IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author:	performanceautofiler[bot]
Assignees:	AndyAyersMS
Labels:	`blocking-release`, `tenet-performance`, `tenet-performance-benchmarks`, `arch-x64`, `area-CodeGen-coreclr`, `area-GC-coreclr`, `runtime-coreclr`, `refs/heads/main`, `RunKind=micro`, `Windows 10.0.18362`, `PGOType=fullpgo`
Milestone:	7.0.0

JulieLeeMSFT · 2022-09-12T20:12:33Z

@AndyAyersMS, are you still looking into this? Should we assign a different owner in GC team?

mrsharm · 2022-09-12T20:16:39Z

I'll take over from here since this is prospectively a GC issue. Investigation in Progress.

mrsharm · 2022-09-13T08:57:33Z

Data

Without GC Normalization

There seems to be a regression within a 5% margin:

Benchmark Name	Baseline	Comparand	Baseline Mean Duration	Comparand Mean Duration	Δ Mean Duration	Δ% Mean Duration	Baseline number of iterations	Comparand number of iterations	Δ number of iterations	Δ% number of iterations	Baseline gc count	Comparand gc count	Δ gc count	Δ% gc count	Baseline median	Comparand median	Δ median	Δ% median	Baseline non induced gc count	Comparand non induced gc count	Δ non induced gc count	Δ% non induced gc count	Baseline total allocated (mb)	Comparand total allocated (mb)	Δ total allocated (mb)	Δ% total allocated (mb)	Baseline total pause time (msec)	Comparand total pause time (msec)	Δ total pause time (msec)	Δ% total pause time (msec)	Baseline gc pause time %	Comparand gc pause time %	Δ gc pause time %	Δ% gc pause time %	Baseline JIT_WriteBarrier	Comparand JIT_WriteBarrier	Δ JIT_WriteBarrier	Δ% JIT_WriteBarrier	Baseline JIT_ByRefWriteBarrier	Comparand JIT_ByRefWriteBarrier	Δ JIT_ByRefWriteBarrier	Δ% JIT_ByRefWriteBarrier
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	writebarrier_3	19.37	19.85	0.48	2.48	18	15	-3	-16.67	2612	2497	-115	-4.4	19.27	19.62	0.35	1.8	2376	2257	-119	-5.01	25090.88	23737.18	-1353.7	-5.4	11.29	11.26	-0.03	-0.3	2.4	2.5	0.11	4.42	0	344	344	∞	0	0	0	NaN
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	regions	19.37	19.46	0.09	0.44	18	15	-3	-16.67	2612	2530	-82	-3.14	19.27	19.27	-0.01	-0.03	2376	2290	-86	-3.62	25090.88	24207.49	-883.39	-3.52	11.29	11.41	0.12	1.07	2.4	2.58	0.19	7.74	0	0	0	NaN	0	0	0	NaN

Baseline - Segments

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-FQYTMY : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

EnvironmentVariables=COMPlus_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Allocated
NonEmptyNameSpaceToString	19.37 ns	0.438 ns	0.469 ns	19.27 ns	18.87 ns	20.32 ns	0.0076	80 B

Regions

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-HLWNHM : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Allocated
NonEmptyNameSpaceToString	19.46 ns	0.352 ns	0.329 ns	19.27 ns	19.09 ns	19.97 ns	0.0076	80 B

Regions w/ More Precise Writebarrier

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-QBQSZP : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

EnvironmentVariables=COMPlus_GCWriteBarrier=3,COMPlus_EnableWriteXorExecute=0 PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Allocated
NonEmptyNameSpaceToString	19.85 ns	0.429 ns	0.401 ns	19.62 ns	19.45 ns	20.66 ns	0.0076	80 B

With GC Normalization

Benchmark Name	Baseline	Comparand	Baseline Mean Duration	Comparand Mean Duration	Δ Mean Duration	Δ% Mean Duration	Baseline number of iterations	Comparand number of iterations	Δ number of iterations	Δ% number of iterations	Baseline gc count	Comparand gc count	Δ gc count	Δ% gc count	Baseline median	Comparand median	Δ median	Δ% median	Baseline non induced gc count	Comparand non induced gc count	Δ non induced gc count	Δ% non induced gc count	Baseline total allocated (mb)	Comparand total allocated (mb)	Δ total allocated (mb)	Δ% total allocated (mb)	Baseline total pause time (msec)	Comparand total pause time (msec)	Δ total pause time (msec)	Δ% total pause time (msec)	Baseline gc pause time %	Comparand gc pause time %	Δ gc pause time %	Δ% gc pause time %	Baseline JIT_WriteBarrier	Comparand JIT_WriteBarrier	Δ JIT_WriteBarrier	Δ% JIT_WriteBarrier	Baseline JIT_ByRefWriteBarrier	Comparand JIT_ByRefWriteBarrier	Δ JIT_ByRefWriteBarrier	Δ% JIT_ByRefWriteBarrier
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	regions	108.59	322.66	214.06	197.12	128	128	0	0	4	4	0	0	100	0	-100	-100	0	0	0	NaN	1.89	1.88	-0	-0.23	1.52	1.52	0.01	0.39	1.89	1.83	-0.06	-3.41	0	0	0	NaN	0	0	0	NaN
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	writebarrier_3	108.59	178.52	69.92	64.39	128	128	0	0	4	4	0	0	100	50	-50	-50	0	0	0	NaN	1.89	1.88	-0.01	-0.34	1.52	1.52	0.01	0.41	1.89	2.02	0.13	7.11	0	0	0	NaN	0	0	0	NaN

Baseline - Segments

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-QRXOHO : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove EnvironmentVariables=COMPlus_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000
Force=False InvocationCount=1 IterationCount=128
IterationTime=250.0000 ms MaxIterationCount=20 MinIterationCount=15
UnrollFactor=1 WarmupCount=1

Method	Mean	Error	StdDev	StdErr	Median	Min	Max	Q1	Q3	Op/s	Allocated
NonEmptyNameSpaceToString	108.6 ns	45.77 ns	153.7 ns	13.59 ns	100.0 ns	0.0 ns	1,000.0 ns	100.0 ns	100.0 ns	9,208,633.1	488 B

Regions

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-VEMLHO : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove PowerPlanMode=00000000-0000-0000-0000-000000000000 Force=False
InvocationCount=1 IterationCount=128 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 UnrollFactor=1
WarmupCount=1

Method	Mean	Error	StdDev	StdErr	Median	Min	Max	Q1	Q3	Op/s	Allocated
NonEmptyNameSpaceToString	322.7 ns	431.3 ns	1,448.5 ns	128.0 ns	0.0 ns	0.0 ns	14,300.0 ns	0.0 ns	0.0 ns	3,099,273.6	488 B

Regions w/ More Precise Writebarrier

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
[Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
Job-LKBXKI : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove EnvironmentVariables=COMPlus_GCWriteBarrier=3,COMPlus_EnableWriteXorExecute=0 PowerPlanMode=00000000-0000-0000-0000-000000000000
Force=False InvocationCount=1 IterationCount=128
IterationTime=250.0000 ms MaxIterationCount=20 MinIterationCount=15
UnrollFactor=1 WarmupCount=1

Method	Mean	Error	StdDev	StdErr	Median	Min	Max	Q1	Q3	Op/s	Allocated
NonEmptyNameSpaceToString	178.5 ns	155.5 ns	522.3 ns	46.17 ns	50.00 ns	0.0 ns	2,950.0 ns	50.00 ns	50.00 ns	5,601,750.5	488 B

mrsharm · 2022-09-23T20:29:20Z

We have been working through root-causing and fixing this issue and here are our conclusions:

We root caused this issue to the introduction of a higher number and size of decommits for regions (enabled in 7.0) as opposed to segments (default in 6.0).
We are working on a fix: Gradual decommit in wks #73620 that improves the numbers for these benchmarks, however, even with this fix we are not at the same level as segments in terms of mean execution time, number and size of virtual commits and decommits as before and this is something we plan to further address.
We think the next steps here are to merge the fix that improves the regression and work on further improvements in 8.0. More specifically, we should merge the fix mentioned above and move the issue to 8.0 (cc: @mangod9)
We found that running the benchmarks above using default parameters results in an inequitable for our GC evaluation comparison purposes where:
- There are an unequal number of invocations per iteration resulting in a lob sided effect on the mean. Fixed by passing in a derived invocation count by doing a dry run and making use of that invocation count for all our runs.
- Outliers are removed. Fixed by not removing outliers by setting the following in the command line args to run the microbenchmarks --outliers DontRemove.
- There are a variable number of induced GCs resulting from the unequal number of invocations and evaluation overhead runs. Fixed by removing the evaluation overhead stage.
Using a more equitable comparison repeated a number of times, we consistently observed the following:
- The regressions for the tests for regions were never above 10%.
- More consistent results that were less volatile.

Results

Benchmark	Baseline	Comparand	ΔMean	ΔMean %
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	regions	1.56	7.62
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	decommit_fix	0.97	4.75

Details

Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
  [Host]     : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
  Job-OERVVK : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

The baseline run is based on Peter’s fix rebased on the latest main from yesterday and using the segments’ CLR GC.
The regions run is based on the latest main from yesterday.
The decommit_fix is based on Peter’s fix rebased on the latest main.
The commit hash for the main used: 82d76a4

Full comparison table with the virtual commit and decommit data: Compare.xlsx

Tracking these issues here: #73592

CC: @dotnet/gc

mangod9 · 2022-09-23T21:07:49Z

Thanks @mrsharm. @adamsitnik checking that you agree with the analysis of ensuring the GC impact is equitable across runs and with that the regression is not too high?

AndyAyersMS · 2022-09-23T21:15:56Z

I have seen this effect when comparing things too (say OSR on vs off) -- if the two variants behave differently BDN may change its measurement approach to the point where the measurement strategy becomes a contributing factor in the overall reported results.

Keeping things the "same" across the two sets of measurements often resolves this, though not always.

mangod9 · 2022-09-23T21:27:41Z

is this the .NET equivalent of the uncertainty principle :)

mrsharm · 2022-09-23T21:29:12Z

Keeping things the "same" across the two sets of measurements often resolves this, though not always.

Definitely, that's why we kept an eye on standard error and standard error normalized by the mean as well to ensure the repeatability of these experiments:

Benchmark Name	Baseline	Comparand	Δ standard error	Δ standard error / mean
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	regions	-0.2263	-0.0114
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	decommit_fix	-0.2705	-0.0133

Getting to this low standard error was a matter of discerning a high enough invocation count described above. The result was a consistent and equitable comparison i.e., we ran this experiment for 20+ times and got similar results.

mangod9 · 2022-09-30T17:31:39Z

Assume its ok to close this now since the normalized regression after the decommit fix is reasonable?

performanceautofiler bot added CoreClr untriaged New issue has not been triaged by the area owner labels Feb 1, 2022

EgorBo changed the title ~~[Perf] Changes at 1/31/2022 4:53:41 PM~~ Regressions in System.Xml.Linq.Perf_XName (FullPGO) Feb 1, 2022

dotnet-issue-labeler bot added area-System.Security untriaged New issue has not been triaged by the area owner labels Feb 1, 2022

EgorBo transferred this issue from dotnet/perf-autofiling-issues Feb 1, 2022

EgorBo added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark labels Feb 1, 2022

bartonjs added area-System.Xml and removed area-System.Security labels Feb 1, 2022

jeffhandley added runtime-coreclr specific to the CoreCLR runtime and removed CoreClr labels Mar 3, 2022

jkotas added arch-x64 and removed x64 labels Mar 14, 2022

krwq added this to the 7.0.0 milestone Mar 24, 2022

krwq removed the untriaged New issue has not been triaged by the area owner label Mar 24, 2022

krwq self-assigned this Aug 11, 2022

jeffhandley added the blocking-release label Aug 15, 2022

JulieLeeMSFT assigned AndyAyersMS Aug 23, 2022

AndyAyersMS mentioned this issue Sep 7, 2022

String.Replace(char, char) now slower in some cases #74771

Closed

AndyAyersMS added the area-GC-coreclr label Sep 9, 2022

AndyAyersMS removed the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 9, 2022

mrsharm assigned mrsharm and unassigned AndyAyersMS Sep 12, 2022

mangod9 closed this as completed Sep 30, 2022

ghost locked as resolved and limited conversation to collaborators Oct 30, 2022

jeffhandley removed refs/heads/main labels Dec 28, 2022

Regressions in System.Xml.Linq.Perf_XName (FullPGO) #64626

Regressions in System.Xml.Linq.Perf_XName (FullPGO) #64626

Comments

performanceautofiler bot commented Feb 1, 2022

Run Information

Regressions in System.Xml.Linq.Perf_XName

Repro

Payloads

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString

Description of detection logic

Docs

ghost commented Feb 1, 2022

Run Information

Regressions in System.Xml.Linq.Perf_XName

Repro

Payloads

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString

Description of detection logic

Docs

ghost commented Feb 1, 2022

Run Information

Regressions in System.Xml.Linq.Perf_XName

Repro

Payloads

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString

Description of detection logic

Docs

krwq commented Feb 17, 2022

adamsitnik commented Feb 17, 2022

eiriktsarpalis commented Aug 5, 2022

JulieLeeMSFT commented Aug 23, 2022

EgorBo commented Aug 23, 2022

JulieLeeMSFT commented Aug 23, 2022

AndyAyersMS commented Aug 26, 2022

AndyAyersMS commented Aug 29, 2022 • edited Loading

AndyAyersMS commented Aug 30, 2022

AndyAyersMS commented Sep 1, 2022

krwq commented Sep 1, 2022

AndyAyersMS commented Sep 7, 2022 • edited Loading

AndyAyersMS commented Sep 9, 2022 • edited Loading

ghost commented Sep 9, 2022

Run Information

Regressions in System.Xml.Linq.Perf_XName

Repro

Payloads

Histogram

System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString

Description of detection logic

Docs

JulieLeeMSFT commented Sep 12, 2022

mrsharm commented Sep 12, 2022

mrsharm commented Sep 13, 2022

Data

Without GC Normalization

Baseline - Segments

Regions

Regions w/ More Precise Writebarrier

With GC Normalization

Baseline - Segments

Regions

Regions w/ More Precise Writebarrier

mrsharm commented Sep 23, 2022 • edited Loading

Results

Details

mangod9 commented Sep 23, 2022

AndyAyersMS commented Sep 23, 2022

mangod9 commented Sep 23, 2022

mrsharm commented Sep 23, 2022

mangod9 commented Sep 30, 2022

AndyAyersMS commented Aug 29, 2022 •

edited

Loading

AndyAyersMS commented Sep 7, 2022 •

edited

Loading

AndyAyersMS commented Sep 9, 2022 •

edited

Loading

mrsharm commented Sep 23, 2022 •

edited

Loading