[Perf] Windows/x64: 4 Regressions on 5/2/2023 10:35:24 AM #85987

performanceautofiler · 2023-05-09T11:26:55Z

Run Information

Name	Value
Architecture	x64
OS	Windows 10.0.18362
Queue	TigerWindows
Baseline	da0aa0cb6944dd49d6c1d1859c4530fe7e38b76f
Compare	c62f69be1405a8e41b56ffc05f22d791bf4c7d2d
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in System.Memory.ReadOnlySpan

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
Trim - Duration of single invocation	1.21 ns	6.16 ns	5.08	0.06	False	25.748831262174555	32.550956279423076	1.26417218505917	Trace	Trace

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Payloads

Baseline
Compare

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.ReadOnlySpan*'

Histogram

System.Memory.ReadOnlySpan.Trim(input: "")

Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 6.1584372859667615 > 1.2741125892426048.
IsChangePoint: Marked as a change because one of 4/1/2023 8:54:51 PM, 5/2/2023 4:25:00 AM, 5/9/2023 7:24:34 AM falls between 4/30/2023 6:17:41 PM and 5/9/2023 7:24:34 AM.
IsRegressionStdDev: Marked as regression because -100.89420966987244 (T) = (0 -6.161853322392986) / Math.Sqrt((0.03688747735264881 / (16)) + (5.8223116837808174E-05 / (18))) is less than -2.03693334345674 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (16) + (18) - 2, .025) and -3.6894142377733004 = (1.3139921128654337 - 6.161853322392986) / 1.3139921128654337 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.
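The IsRegressionStdDev line is a two-sample, unequal-variance t-test plus a 5% relative-change guard. Below is a minimal Python sketch that recomputes it from the logged numbers. It rests on two assumptions:

- The first variance term in the log belongs to the baseline.
- The Student-t critical value is copied from the log rather than recomputed.

The printed formula shows 0 in place of the baseline mean. Substituting the baseline mean (the same value used in the relative-change term) reproduces the reported T for all four benchmarks in this issue. The degrees of freedom (16 + 18 - 2) follow the log, not the Welch-Satterthwaite approximation.

```python
import math


def t_statistic(base_mean, base_var, n_base, cmp_mean, cmp_var, n_cmp):
    """Unequal-variance t statistic in the form the log prints it (baseline mean in the numerator)."""
    return (base_mean - cmp_mean) / math.sqrt(base_var / n_base + cmp_var / n_cmp)


def relative_change(base_mean, cmp_mean):
    """Negative when the compare run is slower than the baseline."""
    return (base_mean - cmp_mean) / base_mean


# Critical value copied from the log: StudentT.InvCDF(0, 1, 16 + 18 - 2, .025)
T_CRIT_DF32 = -2.03693334345674


def is_regression_stddev(t, rel, t_crit, rel_threshold=-0.05):
    """Both conditions from the log must hold: a significant T and a slowdown of more than 5%."""
    return t < t_crit and rel < rel_threshold


# Values for System.Memory.ReadOnlySpan.Trim(input: "") copied from the log line above.
t = t_statistic(1.3139921128654337, 0.03688747735264881, 16,
                6.161853322392986, 5.8223116837808174e-05, 18)
rel = relative_change(1.3139921128654337, 6.161853322392986)
print(round(t, 2), round(rel, 4), is_regression_stddev(t, rel, T_CRIT_DF32))
# prints: -100.89 -3.6894 True
```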

JIT Disasms

Baseline
Compare
Diff

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Regressions in System.Tests.Perf_Boolean

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
TryParse - Duration of single invocation	8.23 ns	16.09 ns	1.95	0.08	True	107.86852800069764	103.31347656402268	0.9577721924911639	Trace	Trace
Parse - Duration of single invocation	11.15 ns	23.30 ns	2.09	0.04	True

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Payloads

Baseline
Compare

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Boolean*'

Histogram

System.Tests.Perf_Boolean.TryParse(value: "Bogus")

Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 16.09071662999389 > 8.645237780024415.
IsChangePoint: Marked as a change because one of 5/2/2023 4:25:00 AM, 5/9/2023 7:24:34 AM falls between 4/30/2023 6:17:41 PM and 5/9/2023 7:24:34 AM.
IsRegressionStdDev: Marked as regression because -434.8770093633579 (T) = (0 -16.110760034502622) / Math.Sqrt((0.001710119303328648 / (16)) + (0.0039529119334182775 / (18))) is less than -2.03693334345674 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (16) + (18) - 2, .025) and -0.9521170175578498 = (8.252968387447188 - 16.110760034502622) / 8.252968387447188 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked as regression because Edge Detector said so.

JIT Disasms

Baseline
Compare
Diff

System.Tests.Perf_Boolean.Parse(value: " True ")

Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 23.3007016972274 > 11.713417691505466.
IsChangePoint: Marked as a change because one of 5/2/2023 4:25:00 AM, 5/9/2023 7:24:34 AM falls between 4/30/2023 6:17:41 PM and 5/9/2023 7:24:34 AM.
IsRegressionStdDev: Marked as regression because -237.4715523711942 (T) = (0 -23.51061790159726) / Math.Sqrt((0.001930512359178462 / (16)) + (0.04661613103783327 / (18))) is less than -2.03693334345674 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (16) + (18) - 2, .025) and -1.1090681140482666 = (11.147396210200926 - 23.51061790159726) / 11.147396210200926 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked as regression because Edge Detector said so.

JIT Disasms

Baseline
Compare
Diff

Regressions in System.Numerics.Tests.Perf_BigInteger

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
Ctor_ByteArray - Duration of single invocation	12.28 ns	14.23 ns	1.16	0.10	False	150.3472191466558	156.214019115072	1.0390216726435988	Trace	Trace

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Payloads

Baseline
Compare

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_BigInteger*'

Histogram

System.Numerics.Tests.Perf_BigInteger.Ctor_ByteArray(numberString: -2147483648)

Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 14.229877638173855 > 12.895417745614084.
IsChangePoint: Marked as a change because one of 5/2/2023 4:25:00 AM, 5/9/2023 7:24:34 AM falls between 4/30/2023 6:17:41 PM and 5/9/2023 7:24:34 AM.
IsRegressionStdDev: Marked as regression because -17.21045122210507 (T) = (0 -14.152620211094012) / Math.Sqrt((0.013736477857531142 / (16)) + (0.21019152484066211 / (21))) is less than -2.0301079282477414 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (16) + (21) - 2, .025) and -0.14517624476570484 = (12.358464712992314 - 14.152620211094012) / 12.358464712992314 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

JIT Disasms

Baseline
Compare
Diff

cincuranet · 2023-05-09T16:30:57Z

Commit range is 3e8f17a...4772b5d. Maybe #85620, @jakobbotsch?

jakobbotsch · 2023-05-09T16:48:58Z

Very possible, I'll take a look

ghost · 2023-05-10T15:33:19Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Author:	performanceautofiler[bot]
Assignees:	jakobbotsch
Labels:	`os-windows`, `arch-x64`, `area-CodeGen-coreclr`, `untriaged`, `runtime-coreclr`, `needs-area-label`
Milestone:	-

jakobbotsch · 2023-05-15T10:58:45Z

I have missed a check for a local store in PR #85620 -- we leave

               [000092] -A-XG------                         ▌  STORE_BLK struct<System.ReadOnlySpan`1, 16> (copy)
               [000090] -----+-----                         ├──▌  LCL_VAR   byref  V00 RetBuf       
               [000091] ----G+-----                         └──▌  LCL_VAR   struct<System.ReadOnlySpan`1, 16>(AX)(P) V11 tmp9         
                                                               ▌    byref  V11.System.ReadOnlySpan`1[ushort]:_reference (offs=0x00) -> V40 tmp38        
                                                               ▌    int    V11.System.ReadOnlySpan`1[ushort]:_length (offs=0x08) -> V41 tmp39

as a block copy after that PR. Since the destination can be arbitrary heap memory, this copy now requires a helper call that we did not previously need. Not fundamentally, though: block copying in the backend could be smarter, but currently isn't.

jakobbotsch · 2023-05-15T12:01:22Z

I think #80086 tracks making it smarter. The cases that are regressing here seem to end up with helper calls for Span<T>/ReadOnlySpan<T> for the byref field, even though that is unnecessary. Let me try to fix that instead.
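Here is a toy Python model of the two block-copy shapes discussed here. It is purely illustrative: the slot kinds and the `byref_needs_barrier` switch are hypothetical names, not JIT APIs. It makes three assumptions:

- ReadOnlySpan<T> occupies two 8-byte slots, as in the tree dump above: the `_reference` byref, then the 4-byte `_length` plus padding.
- As we understand it, CORINFO_HELP_ASSIGN_BYREF copies one pointer-sized slot from [rsi] to [rdi] with a write barrier and advances both registers. That is why it can be interleaved with `movsq`.
- Setting `byref_needs_barrier=False` stands in for the #80086 fix.

```python
from typing import List

GC_REF = "gcref"   # object reference
BYREF = "byref"    # interior pointer, e.g. ReadOnlySpan<T>._reference
NON_GC = "nongc"   # plain data, e.g. ReadOnlySpan<T>._length plus padding

# Hypothetical slot layout of ReadOnlySpan<T>: two pointer-sized slots (16 bytes total).
READ_ONLY_SPAN_SLOTS = [BYREF, NON_GC]


def block_copy_sequence(slots: List[str], dest_may_be_heap: bool,
                        byref_needs_barrier: bool) -> List[str]:
    """Emit one instruction per 8-byte slot, mimicking the CpObj listings in the next comment.

    Slots treated as needing a write barrier go through the helper when the
    destination may be on the heap; everything else is copied with movsq.
    """
    sequence = []
    for kind in slots:
        needs_barrier = kind == GC_REF or (kind == BYREF and byref_needs_barrier)
        if dest_may_be_heap and needs_barrier:
            sequence.append("call CORINFO_HELP_ASSIGN_BYREF")
        else:
            sequence.append("movsq")
    return sequence


print(block_copy_sequence(READ_ONLY_SPAN_SLOTS, True, byref_needs_barrier=True))
# ['call CORINFO_HELP_ASSIGN_BYREF', 'movsq']   (shape of the base listing)
print(block_copy_sequence(READ_ONLY_SPAN_SLOTS, True, byref_needs_barrier=False))
# ['movsq', 'movsq']   (shape of the listing with #80086 fixed)
```

Neither shape matches the field-by-field copy (diff2), which avoids both the helper call and `movsq` by moving each field through a general-purpose register.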

jakobbotsch · 2023-05-15T13:02:23Z

Even with #80086 fixed, we end up falling back to movsq inside xarch's genCodeForCpObj, which appears to be very slow. Looking at the System.Memory.ReadOnlySpan benchmark and comparing base (current main):

       mov      rdi, rbx
       lea      rsi, bword ptr [rsp+30H]
       call     CORINFO_HELP_ASSIGN_BYREF
       movsq

to diff (#80086 fixed)

G_M48932_IG12:  ;; offset=00B5H
       mov      rdi, rbx
       lea      rsi, bword ptr [rsp+30H]
       movsq
       movsq

to diff2 (morphing to field-by-field copy, original codegen)

       mov      rax, bword ptr [rsp+30H]
       mov      bword ptr [rsi], rax
       mov      eax, dword ptr [rsp+38H]
       mov      dword ptr [rsi+08H], eax

gives me

Method	Job	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	Allocated	Alloc Ratio
Trim	Job-FJFRPI	base\corerun.exe	4.2079 ns	0.0115 ns	0.0107 ns	4.2093 ns	4.1878 ns	4.2257 ns	1.00	-	NA
Trim	Job-OPHUHH	diff\corerun.exe	4.4444 ns	0.0294 ns	0.0275 ns	4.4376 ns	4.4005 ns	4.4931 ns	1.06	-	NA
Trim	Job-IQSHDD	diff2\corerun.exe	0.8768 ns	0.0117 ns	0.0104 ns	0.8727 ns	0.8663 ns	0.8998 ns	0.21	-	NA

The movsq inefficiency looks like #7469... we should probably reprioritize that issue given that Span<T>/ReadOnlySpan<T> falls into the category of "needs atomic field copies but does not require write barrier".

For the time being, I will just revert parts of #85620 by allowing field-by-field morphing when the destination is potentially on the heap and the source is a local.

jakobbotsch · 2023-05-17T13:00:43Z

Keeping this open until I can verify the graphs are back to the old perf levels.

jakobbotsch · 2023-05-19T09:29:50Z

System.Memory.ReadOnlySpan.Trim is fixed, but there is still a significant regression for System.Tests.Perf_Boolean.TryParse. The codegen diff there is https://www.diffchecker.com/5Qg6aVNW/ which gives the following perf differences on my machine:

Method	Job	Toolchain	value	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Allocated	Alloc Ratio
TryParse	Job-PWSREE	base\corerun.exe	Bogus	4.382 ns	0.0249 ns	0.0233 ns	4.385 ns	4.353 ns	4.427 ns	1.00	0.00	-	NA
TryParse	Job-QMXGIG	diff\corerun.exe	Bogus	11.154 ns	0.1018 ns	0.0952 ns	11.147 ns	11.019 ns	11.312 ns	2.55	0.03	-	NA

The diff looks like what I would expect: it replaces field-by-field copies using GPR registers with a single SIMD copy. @EgorBo is it expected that this is so much slower than using multiple GPR registers?

EgorBo · 2023-05-19T09:35:52Z

> System.Memory.ReadOnlySpan.Trim is fixed, but there is still a significant regression for System.Tests.Perf_Boolean.TryParse. [...] @EgorBo is it expected that this is so much slower than using multiple GPR registers?

I don't see these regressions in the FullPGO win-x64 runs: https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows%2010.0.18362_PGOType%3Dfullpgo/AllTestindex.html so maybe it's just some Intel erratum issue or something like that?

jakobbotsch · 2023-05-19T09:40:56Z

> I don't see these regressions in the FullPGO win-x64 runs: https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows%2010.0.18362_PGOType%3Dfullpgo/AllTestindex.html so maybe it's just some Intel erratum issue or something like that?

I think you improved it subsequently, but you can see that my "fix" PR was only a minor improvement on the graph and did not return it to previous levels. My table above is from my own machine (5950X).

I think it's likely the same kind of store-forwarding problem that @AndyAyersMS saw recently. Since the Span<T> length is 4 bytes, we have a previous store like:

mov      dword ptr [rsp+68H], eax

and then

vmovdqu  xmm0, xmmword ptr [rsp+60H]
vmovdqu  xmmword ptr [rsp+50H], xmm0

is significantly slower than the original

       mov      rdx, bword ptr [rsp+60H]
       mov      bword ptr [rsp+50H], rdx
       mov      edx, dword ptr [rsp+68H]
       mov      dword ptr [rsp+58H], edx

The latter reads the length with a 4-byte load of exactly the bytes that were previously written, so there is no stall.
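The store-forwarding argument can be checked with a deliberately simplified model, sketched below in Python. The model is an assumption, not documented behavior of any specific CPU:

- A load is served from the youngest overlapping store still in the store buffer.
- Forwarding succeeds only if that store fully contains the load; otherwise the load is blocked until the store commits.

Offsets are the rsp-relative ones from the assembly above. Real x86 cores add further size and alignment restrictions. The last case models a wide zeroing store followed by the 4-byte length store.

```python
from typing import List, Tuple

Access = Tuple[int, int]  # (offset from rsp, size in bytes)


def overlaps(a: Access, b: Access) -> bool:
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def contains(outer: Access, inner: Access) -> bool:
    return outer[0] <= inner[0] and inner[0] + inner[1] <= outer[0] + outer[1]


def classify_load(pending_stores: List[Access], load: Access) -> str:
    """Simplified model: stores are listed oldest first; the youngest overlapping one must cover the load."""
    overlapping = [s for s in pending_stores if overlaps(s, load)]
    if not overlapping:
        return "no overlap"
    return "forwarded" if contains(overlapping[-1], load) else "blocked"


length_store = (0x68, 4)                              # mov dword ptr [rsp+68H], eax
print(classify_load([length_store], (0x60, 16)))      # vmovdqu xmm0, [rsp+60H] -> blocked
print(classify_load([length_store], (0x68, 4)))       # mov edx, [rsp+68H]      -> forwarded
print(classify_load([length_store], (0x60, 8)))       # mov rdx, [rsp+60H]      -> no overlap
print(classify_load([(0x60, 16), length_store], (0x60, 16)))  # wide zeroing store first -> blocked
```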

Of course this problem is not limited to structures with GC pointers, so the heuristic in block morphing was really just getting lucky here...

jakobbotsch · 2023-05-19T09:44:43Z

OTOH we do zero the full structure in the prolog, so I'm not sure if it's store-forwarding after all (can the CPU piece together two separate stores?). Will try to see if I can check some of the hardware counters.

EgorBo · 2023-05-19T10:28:54Z

Maybe for the 2 SIMD loads we receive a worse penalty for crossing a cache line boundary?

Interesting, I didn't realize store-forwarding is such a problem (if it is).
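The cache-line hypothesis can be probed with a quick alignment check, sketched below in Python. It assumes 64-byte cache lines, and that rsp is 16-byte aligned in the method body, which the Windows x64 calling convention requires outside the prolog. Under those assumptions, the 16-byte accesses at [rsp+60H] and [rsp+50H] are themselves 16-byte aligned and could never straddle a line. A crossing only becomes possible if rsp were merely 8-byte aligned. That would suggest, but not prove, that line splits are not the cause here.

```python
CACHE_LINE = 64  # bytes; assumed line size for current x64 cores


def crosses_cache_line(addr: int, size: int, line: int = CACHE_LINE) -> bool:
    """True if the access [addr, addr + size) touches two cache lines."""
    return (addr % line) + size > line


def may_cross(rsp_offset: int, size: int, rsp_align: int = 16, line: int = CACHE_LINE) -> bool:
    """Try every position of rsp within a cache line that respects the given alignment."""
    return any(crosses_cache_line(rsp + rsp_offset, size, line)
               for rsp in range(0, line, rsp_align))


print(may_cross(0x60, 16), may_cross(0x50, 16))  # rsp 16-byte aligned: False False
print(may_cross(0x60, 16, rsp_align=8))          # rsp only 8-byte aligned: True
```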

jakobbotsch · 2023-05-19T10:43:55Z

> Maybe for the 2 SIMD loads we receive a worse penalty for crossing a cache line boundary?

That's also possible. Is the penalty supposed to be this large?

I looked at the base/diff in VTune (I couldn't get µProf to work). Base is before #85620; diff is the same commit but with #86246 manually applied.

The diff shows:

[VTune screenshot not preserved]

Compared to the base:

[VTune screenshot not preserved]

However, the "loads blocked by store forwarding" event does not really show up where I would expect it; it shows up in System.Boolean.TrimWhiteSpaceAndNull:

[VTune screenshot not preserved]

The base has the exact same assembly but no "loads blocked by store forwarding", so maybe there's just some drift or misattribution going on in VTune:

[VTune screenshot not preserved]

Seems odd... let me retry some runs with memory randomization.

EgorBo · 2023-05-19T10:49:52Z

We had quite a few regressions in the past where we had no good explanation (because the codegen was the same), and we ended up blaming code layout in the loader heap (how jitted functions are located) or the GC.

jakobbotsch · 2023-05-19T11:31:36Z

It seems that with the block copy, the CPU is stuck waiting to resolve that compare/conditional branch; the CPI is awful compared to the base. If I then expand specifically that one block copy into field-by-field copies, it improves, but then the following block copy (which I left as is) becomes the bottleneck.

Sadly we don't really have the framework necessary to analyze this and make a smart decision. So I need to consider whether I should fully revert the change (and accept that we cannot really touch that heuristic) or not.

AndyAyersMS · 2023-05-19T13:39:27Z

> OTOH we do zero the full structure in the prolog, so I'm not sure if it's store-forwarding after all (can the CPU piece together two separate stores?). Will try to see if I can check some of the hardware counters.

You mean a pattern like (wide-store, narrow-store, wide-load)? It is possible the HW can merge stores I suppose, or maybe forward from multiple outstanding stores, if all this happens in a close sequence. But the commentary on https://stackoverflow.com/questions/46135766/can-modern-x86-implementations-store-forward-from-more-than-one-prior-store would suggest it is unlikely.

> Seems odd... let me retry some runs with memory randomization.

I wonder if you are seeing some kind of severe sample skid. Would not ever expect a narrow (byte) load to be impacted by a store forwarding stall.

jakobbotsch · 2023-05-19T14:12:10Z

> You mean a pattern like (wide-store, narrow-store, wide-load)? It is possible the HW can merge stores I suppose, or maybe forward from multiple outstanding stores, if all this happens in a close sequence. But the commentary on https://stackoverflow.com/questions/46135766/can-modern-x86-implementations-store-forward-from-more-than-one-prior-store would suggest it is unlikely.

Yeah, that's what I meant.

> I wonder if you are seeing some kind of severe sample skid. Would not ever expect a narrow (byte) load to be impacted by a store forwarding stall.

Seems likely to me also.

jakobbotsch · 2023-06-29T14:54:38Z

I'm going to call this last System.Tests.Perf_Boolean.TryParse benchmark fixed by improvements elsewhere (in this case enabling of tiered PGO). I think unifying the logic for types with and without GC pointers is general goodness and if we want to be smarter we should explicitly try to model some of these concerns about overlapping field and block stores.

performanceautofiler bot assigned AndyAyersMS May 9, 2023

performanceautofiler bot added the arch-x64, os-windows, runtime-coreclr, and untriaged labels May 9, 2023

cincuranet removed the untriaged label May 9, 2023

cincuranet transferred this issue from dotnet/perf-autofiling-issues May 9, 2023

dotnet-issue-labeler bot added the needs-area-label label May 9, 2023

ghost added the untriaged label May 9, 2023

jakobbotsch assigned jakobbotsch and unassigned AndyAyersMS May 9, 2023

jeffschwMSFT added the area-CodeGen-coreclr label May 10, 2023

vcsjones removed the needs-area-label label May 11, 2023

JulieLeeMSFT added this to the 8.0.0 milestone May 11, 2023

ghost removed the untriaged label May 11, 2023

jakobbotsch mentioned this issue May 15, 2023: JIT: Fix new helper calls for some block copies involving promoted locals #86246 (merged)

ghost added the in-pr label May 15, 2023

jakobbotsch closed this as completed in #86246 May 15, 2023

ghost removed the in-pr label May 15, 2023

jakobbotsch reopened this May 17, 2023

markples mentioned this issue May 19, 2023: Possible optimisation for derefencing a span pointer #80086 (closed)

kunalspathak mentioned this issue Jun 6, 2023: [Perf] Linux/arm64: 8 Regressions on 5/2/2023 12:51:17 PM dotnet/perf-autofiling-issues#17666 (closed)

jakobbotsch added the Priority:2 label Jun 20, 2023

jakobbotsch closed this as completed Jun 29, 2023

ghost locked as resolved and limited conversation to collaborators Jul 29, 2023
