[Perf -11%] System.Buffers.Tests.ReadOnlySequenceTests<Char>.IterateGetPositionTenSegments #47866

DrewScoggins · 2021-02-04T19:31:24Z

Run Information

Architecture	x64
OS	ubuntu 18.04
Baseline	2f2593177dafbe702407fe0b7ac156a7829b7ee6
Compare	6cf1b8ec012d52880d46fa4773f60ed52ddc9f3d
Diff	Link

Regressions in System.Buffers.Tests.ReadOnlySequenceTests<Char>

Benchmark	Baseline	Test	Test/Base	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
IterateGetPositionTenSegments	63.58 ns	70.75 ns	1.11

Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Buffers.Tests.ReadOnlySequenceTests<Char>*'


Payloads

Baseline
Compare

Histogram

System.Buffers.Tests.ReadOnlySequenceTests.IterateGetPositionTenSegments

[61.840 ; 64.058) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[64.058 ; 65.542) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[65.542 ; 68.087) | @@@@@@@@@
[68.087 ; 69.574) | 
[69.574 ; 71.007) | @@@@@@@@@@@

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


danmoseley · 2021-02-04T23:56:03Z

No smoking gun but this change is perhaps the most likely relevant in the diff?

f6d8e88

@dotnet/jit-contrib thoughts?

AndyAyersMS · 2021-02-05T00:15:24Z

Could be. My guess is that that PR altered inlining and from there could be a number of things that impacted perf.

Someone on codegen should follow up.

danmoseley · 2021-02-05T00:21:36Z

OK, I've moved it to that area.

SingleAccretion · 2021-02-05T12:41:54Z

I will take a look. There were some positive diffs for SequenceReader on Windows x64...

SingleAccretion · 2021-02-05T14:58:23Z

I studied this method, taken verbatim (except for the type parameter) from the benchmark, with the unix_x64_x64 AltJit:

[MethodImpl(MethodImplOptions.NoInlining)]
private int IterateGetPosition(ReadOnlySequence<char> sequence)
{
    int consume = 0;

    SequencePosition position = sequence.Start;
    int offset = (int)(sequence.Length / 10);
    SequencePosition end = sequence.GetPosition(0, sequence.End);

    while (!position.Equals(end))
    {
        position = sequence.GetPosition(offset, position);
        consume += position.GetInteger();
    }

    return consume;
}

The (new) folding kicks in 12 times during the compilation of this method but does not affect the inlining. The final assembly diff, while present, does not explain why the regression is there: https://www.diffchecker.com/aAIwTy3K.

Unfortunately, I do not have a Unix environment on which I could run the real benchmark (and get the actual assembly and perf numbers), so I will not be able to provide much more information on this. It looks like this could be related to alignment, but at the same time, the inner loop is very big.
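
For anyone who wants to step through the loop outside BenchmarkDotNet, here is a minimal sketch that builds a ten-segment sequence by hand and runs the same method on it. CharSegment, Build, and the 1,000-char segment length are assumptions made for illustration; they are not taken from the benchmark's setup code.

```csharp
using System;
using System.Buffers;

// Hypothetical helper (not from the benchmark): a minimal linked segment.
public sealed class CharSegment : ReadOnlySequenceSegment<char>
{
    public CharSegment(char[] data) => Memory = data;

    // Links a new segment after this one and returns it.
    public CharSegment Append(char[] data)
    {
        var next = new CharSegment(data) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

public static class IterateGetPositionSketch
{
    // Builds a sequence of segmentCount segments, each segmentLength chars long.
    public static ReadOnlySequence<char> Build(int segmentCount, int segmentLength)
    {
        var first = new CharSegment(new char[segmentLength]);
        CharSegment last = first;
        for (int i = 1; i < segmentCount; i++)
        {
            last = last.Append(new char[segmentLength]);
        }
        return new ReadOnlySequence<char>(first, 0, last, last.Memory.Length);
    }

    // Same loop as the method quoted earlier in this comment.
    public static int IterateGetPosition(ReadOnlySequence<char> sequence)
    {
        int consume = 0;

        SequencePosition position = sequence.Start;
        int offset = (int)(sequence.Length / 10);
        SequencePosition end = sequence.GetPosition(0, sequence.End);

        while (!position.Equals(end))
        {
            position = sequence.GetPosition(offset, position);
            consume += position.GetInteger();
        }

        return consume;
    }

    public static void Main()
    {
        // Assumed sizes: ten segments of 1,000 chars each.
        ReadOnlySequence<char> sequence = Build(segmentCount: 10, segmentLength: 1_000);
        Console.WriteLine($"Length = {sequence.Length}, consume = {IterateGetPosition(sequence)}");
    }
}
```

With ten equal segments, offset equals one segment's length, so each GetPosition call crosses exactly one segment boundary. That is why the loop body is dominated by the seek logic rather than by anything in the benchmark method itself.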

danmoseley · 2021-02-05T16:54:24Z

Thank you @SingleAccretion .

I have gotten useful results doing perf measurements on WSL2, if you are interested in that option.

SingleAccretion · 2021-02-05T17:08:04Z

I will try that option and see how far I get. It may take a considerable amount of time 😄.

SingleAccretion · 2021-02-07T00:00:04Z

After having set up WSL2 (Ubuntu LTS 20.04), I have confirmed that the assembly I have obtained via the AltJit is exactly the same as the one that BDN's from-memory disassembler produces: see the diff.

To run the benchmarks, I used the following command line (note that I "restored" BDN's defaults to cut down on noise):

dotnet run -c Release -f net6.0 --iterationTime 500 --maxIterationCount 100  --statisticalTest 3ms --disasm --filter 'System.Buffers.Tests.ReadOnlySequenceTests<char>.IterateGetPositionTenSegments' --corerun "~/source/dotnet/runtime/" "~/source/dotnet/runtime-fix/"

Here the base ("runtime") is fc48ad5 and "runtime-fix" is f6d8e88. I ran the benchmark several times, back to back; here are the results:

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	120.0 ns	2.42 ns	3.15 ns	119.1 ns	116.6 ns	128.1 ns	1.02	Same	0.03	778 B
IterateGetPositionTenSegments	/runtime/	117.2 ns	2.26 ns	2.42 ns	116.6 ns	114.0 ns	122.3 ns	1.00	Base	0.00	784 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	114.8 ns	2.27 ns	2.87 ns	115.3 ns	111.2 ns	119.1 ns	1.00	Same	0.04	778 B
IterateGetPositionTenSegments	/runtime/	114.1 ns	2.29 ns	3.14 ns	113.6 ns	110.6 ns	121.8 ns	1.00	Base	0.00	784 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	115.8 ns	2.24 ns	2.10 ns	115.8 ns	111.3 ns	119.4 ns	0.99	Same	0.03	778 B
IterateGetPositionTenSegments	/runtime/	116.4 ns	2.33 ns	2.68 ns	116.2 ns	113.4 ns	122.1 ns	1.00	Base	0.00	784 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	120.9 ns	2.45 ns	3.67 ns	120.9 ns	114.4 ns	128.5 ns	1.09	Same	0.04	778 B
IterateGetPositionTenSegments	/runtime/	112.2 ns	2.26 ns	2.69 ns	110.7 ns	109.8 ns	118.7 ns	1.00	Base	0.00	784 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	118.6 ns	2.41 ns	3.53 ns	116.8 ns	114.9 ns	127.1 ns	1.06	Same	0.02	778 B
IterateGetPositionTenSegments	/runtime/	112.7 ns	2.20 ns	2.86 ns	111.2 ns	110.6 ns	119.3 ns	1.00	Base	0.00	784 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	115.4 ns	2.32 ns	3.01 ns	115.0 ns	111.7 ns	120.5 ns	1.01	Same	0.04	778 B
IterateGetPositionTenSegments	/runtime/	112.8 ns	2.28 ns	4.16 ns	110.1 ns	108.9 ns	121.9 ns	1.00	Base	0.00	784 B

As can be seen, the regression does not reproduce reliably, only sometimes. I've swapped these two lines in the benchmark code:

- int offset = (int)(sequence.Length / 10);
- SequencePosition end = sequence.GetPosition(0, sequence.End);
+ SequencePosition end = sequence.GetPosition(0, sequence.End);
+ int offset = (int)(sequence.Length / 10);

This stabilized things somewhat:

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	110.8 ns	2.20 ns	2.16 ns	110.6 ns	108.5 ns	114.8 ns	0.99	Same	0.03	761 B
IterateGetPositionTenSegments	/runtime/	108.5 ns	2.19 ns	4.05 ns	106.3 ns	104.6 ns	117.2 ns	1.00	Base	0.00	767 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	Code Size
IterateGetPositionTenSegments	/runtime-fix/	104.3 ns	0.61 ns	0.51 ns	104.3 ns	103.5 ns	105.6 ns	0.99	Same	761 B
IterateGetPositionTenSegments	/runtime/	105.3 ns	0.49 ns	0.46 ns	105.2 ns	104.6 ns	106.1 ns	1.00	Base	767 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	109.3 ns	0.63 ns	0.49 ns	109.2 ns	108.7 ns	110.3 ns	1.01	Same	0.03	761 B
IterateGetPositionTenSegments	/runtime/	107.8 ns	2.16 ns	2.49 ns	108.7 ns	104.5 ns	112.5 ns	1.00	Base	0.00	767 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	108.3 ns	2.12 ns	1.66 ns	108.6 ns	104.8 ns	110.6 ns	0.96	Same	0.03	761 B
IterateGetPositionTenSegments	/runtime/	113.0 ns	2.27 ns	2.33 ns	113.6 ns	109.7 ns	117.3 ns	1.00	Base	0.00	767 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	108.8 ns	2.10 ns	3.07 ns	109.3 ns	105.1 ns	115.3 ns	1.01	Same	0.03	761 B
IterateGetPositionTenSegments	/runtime/	107.9 ns	2.17 ns	2.90 ns	106.3 ns	104.6 ns	113.8 ns	1.00	Base	0.00	767 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	110.7 ns	2.19 ns	1.94 ns	110.3 ns	108.4 ns	114.4 ns	0.99	Same	0.04	761 B
IterateGetPositionTenSegments	/runtime/	108.2 ns	2.19 ns	3.83 ns	105.5 ns	104.3 ns	116.0 ns	1.00	Base	0.00	767 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	111.2 ns	2.23 ns	2.48 ns	110.4 ns	108.3 ns	116.6 ns	1.03	Same	0.03	761 B
IterateGetPositionTenSegments	/runtime/	107.8 ns	2.15 ns	2.47 ns	106.8 ns	105.0 ns	111.9 ns	1.00	Base	0.00	767 B

Method	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	MannWhitney(3ms)	RatioSD	Code Size
IterateGetPositionTenSegments	/runtime-fix/	106.2 ns	2.09 ns	2.64 ns	104.9 ns	103.4 ns	111.6 ns	0.99	Same	0.03	761 B
IterateGetPositionTenSegments	/runtime/	107.7 ns	2.17 ns	2.82 ns	106.8 ns	104.5 ns	113.3 ns	1.00	Base	0.00	767 B

My conclusion, based on the above data and the fact that the code generated for the benchmark is strictly better (it has two fewer movs), is that this is not a real product regression and the issue can be closed.

This looks and feels like an alignment problem, but aligning a loop this big does not seem like a good idea for the code at large.
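
As a rough cross-check of the tables above, the per-run ratio of the Mean columns (runtime-fix over runtime) can be averaged for each variant of the benchmark. This is only a sketch: an unweighted average over a handful of runs is not a statistical test, and the RatioSummary type and its member names are made up for this example.

```csharp
using System;
using System.Linq;

public static class RatioSummary
{
    // Mean times in ns as (runtime-fix, runtime), copied from the six tables for the original benchmark.
    public static readonly (double Fix, double Base)[] Original =
    {
        (120.0, 117.2), (114.8, 114.1), (115.8, 116.4),
        (120.9, 112.2), (118.6, 112.7), (115.4, 112.8),
    };

    // Mean times in ns as (runtime-fix, runtime), copied from the eight tables after swapping the two lines.
    public static readonly (double Fix, double Base)[] Swapped =
    {
        (110.8, 108.5), (104.3, 105.3), (109.3, 107.8), (108.3, 113.0),
        (108.8, 107.9), (110.7, 108.2), (111.2, 107.8), (106.2, 107.7),
    };

    // Unweighted average of the per-run ratios.
    public static double MeanRatio((double Fix, double Base)[] runs) =>
        runs.Average(r => r.Fix / r.Base);

    // Smallest and largest per-run ratio, to show the run-to-run spread.
    public static (double Min, double Max) RatioRange((double Fix, double Base)[] runs) =>
        (runs.Min(r => r.Fix / r.Base), runs.Max(r => r.Fix / r.Base));

    public static void Main()
    {
        foreach (var (name, runs) in new[] { ("original", Original), ("swapped", Swapped) })
        {
            var (min, max) = RatioRange(runs);
            Console.WriteLine($"{name}: mean ratio {MeanRatio(runs):F3}, range [{min:F3}, {max:F3}]");
        }
    }
}
```

Comparing the range of ratios against the per-run Error column gives a quick feel for whether a difference of a few nanoseconds stands out from the noise.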

danmoseley · 2021-02-07T04:04:12Z

Cc @kunalspathak

JulieLeeMSFT · 2021-02-09T03:10:45Z

@kunalspathak please check the analysis from @SingleAccretion and see if we can close this issue.

kunalspathak · 2021-02-12T09:19:21Z

I will take a look sometime next week.

kunalspathak · 2021-04-28T00:16:37Z

I agree with @SingleAccretion . I am pasting the diff screenshot as the diff links above do not work.

We have two fewer movs after the change. These movs are not even in a loop, so they shouldn't matter much. This test is sensitive to data alignment because it operates on a char array that is allocated and passed as an input to the benchmark. Further, it seeks to various positions in that memory during the benchmark.
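
One way to get a feel for the data-alignment point is to look at where freshly allocated char arrays land relative to a fixed boundary. The sketch below pins an array and reports its data address modulo the boundary. The AlignmentProbe type, the 64-byte boundary, and the 1,000-element size are assumptions for illustration, not values taken from the benchmark or from the JIT.

```csharp
using System;
using System.Runtime.InteropServices;

public static class AlignmentProbe
{
    // Returns the address of the array's first element modulo the given boundary (in bytes).
    public static long OffsetFromBoundary(char[] buffer, int boundary)
    {
        // Pin the array so the GC cannot move it while we read its address.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            long address = handle.AddrOfPinnedObject().ToInt64();
            return address % boundary;
        }
        finally
        {
            handle.Free();
        }
    }

    public static void Main()
    {
        // Assumed buffer size and boundary; adjust to match the scenario being studied.
        for (int i = 0; i < 5; i++)
        {
            var buffer = new char[1_000];
            Console.WriteLine($"allocation {i}: address mod 64 = {OffsetFromBoundary(buffer, 64)}");
        }
    }
}
```

The offsets can differ between allocations and between process launches, so no particular output should be expected. That variability is the kind of effect that can move a benchmark by a few nanoseconds without any change in the generated code.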

The benchmark's overall history shows a slight regression around that time, but the measurement is unstable, so I won't rely too much on the numbers. The diff is 7 ns, which is within the error range. Closing the issue.

danmoseley · 2021-04-28T01:02:07Z

> This test is sensitive to data alignment because it operates on char array that is allocated and passed as an input to the benchmark.

@adamsitnik I am wondering whether this is still the case. I know you added memory randomization in BDN in Jan (dotnet/BenchmarkDotNet#1587) and I guess we pulled this in since. I also see you did dotnet/performance#1587 to move the allocs in this test into GlobalSetup so it would work. Am I right in thinking this is solved now? Not sure how to match the dates to @kunalspathak graph above though.

AndyAyersMS · 2021-04-28T02:33:02Z

We have not yet enabled data randomization -- I believe @DrewScoggins is about to turn it on for a few tests so we can get a feel for how it will impact our ability to understand perf in alignment-sensitive tests.

DrewScoggins added the os-linux, tenet-performance, tenet-performance-benchmarks, and arch-x64 labels Feb 4, 2021

dotnet-issue-labeler bot added the area-System.Threading and untriaged labels Feb 4, 2021

danmoseley added the area-CodeGen-coreclr label and removed the area-System.Threading label Feb 5, 2021

JulieLeeMSFT assigned SingleAccretion Feb 5, 2021

JulieLeeMSFT removed the untriaged label Feb 5, 2021

JulieLeeMSFT added this to the 6.0.0 milestone Feb 5, 2021

JulieLeeMSFT assigned kunalspathak and unassigned SingleAccretion Feb 9, 2021

kunalspathak closed this as completed Apr 28, 2021

ghost locked as resolved and limited conversation to collaborators May 28, 2021

JulieLeeMSFT added this to .NET Core CodeGen Jun 5, 2024

JulieLeeMSFT moved this to Done in .NET Core CodeGen Jun 5, 2024
