-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Perf -11%] System.Buffers.Tests.ReadOnlySequenceTests<Char>.IterateGetPositionTenSegments #47866
Comments
No smoking gun but this change is perhaps the most likely relevant in the diff? @dotnet/jit-contrib thoughts? |
Could be. My guess is that that PR altered inlining and from there could be a number of things that impacted perf. Someone on codegen should follow up. |
OK, I've moved it to that area. |
I will take a look. There were some positive diffs for |
I studied this method, taken verbatim (except for the type parameter) from the benchmark, with the [MethodImpl(MethodImplOptions.NoInlining)]
private int IterateGetPosition(ReadOnlySequence<char> sequence)
{
int consume = 0;
SequencePosition position = sequence.Start;
int offset = (int)(sequence.Length / 10);
SequencePosition end = sequence.GetPosition(0, sequence.End);
while (!position.Equals(end))
{
position = sequence.GetPosition(offset, position);
consume += position.GetInteger();
}
return consume;
} The (new) folding for it kicks in 12 times during the compilation, but does not affect the inlining, and the final assembly diff, while present, is decidedly non-informative for why the regression is there: https://www.diffchecker.com/aAIwTy3K. Unfortunately, I do not have a Unix environment on which I could run the real benchmark (and get the actual assembly and perf numbers), so I will not be able to provide much more information on this. It looks like this could be related to alignment, but at the same time, the inner loop is very big. |
Thank you @SingleAccretion . I have gotten useful results doing perf measurements on WSL2, if you are interested in that option. |
I will try that option and see how far do I get. It may take a considerable amount of time 😄. |
After having set up WSL2 (Ubuntu LTS 20.04), I have confirmed that the assembly I have obtained via the AltJit is exactly the same as the one that BDN's from-memory disassembler produces: see the diff. To run the benchmarks, I used the following command line (note that I "restored" BDN's defaults to cut down on noise): dotnet run -c Release -f net6.0 --iterationTime 500 --maxIterationCount 100 --statisticalTest 3ms --disasm --filter 'System.Buffers.Tests.ReadOnlySequenceTests<char>.IterateGetPositionTenSegments' --corerun "~/source/dotnet/runtime/" "~/source/dotnet/runtime-fix/" Here the "base" - "runtime", is fc48ad5, "runtime-fix" - f6d8e88. I ran a few benchmarks, back to back, here are the results:
As can be seen, the regression does not reproduce reliably, only sometimes. I've swapped these two lines in the benchmark code: - int offset = (int)(sequence.Length / 10);
- SequencePosition end = sequence.GetPosition(0, sequence.End);
+ SequencePosition end = sequence.GetPosition(0, sequence.End);
+ int offset = (int)(sequence.Length / 10); This stabilized things somewhat:
My conclusion based on the above data and the fact that the benchmark code is strictly better as it has two less This looks and feels like an alignment problem, but aligning a loop this big does not seem like a good idea for the code at large. |
@kunalspathak please check the analysis from @SingleAccretion and see if we can close this issue. |
I will take a look sometime next week. |
I agree with @SingleAccretion . I am pasting the diff screenshot as the diff links above do not work. We have 2 less The benchmark overall history shows slight regression around that time, but the measurement is instable, so I won't rely too much on the numbers. The diff is 7ns which is in the error range. Closing the issue. |
@adamsitnik I am wondering whether this is still the case. I know you added memory randomization in BDN in Jan (dotnet/BenchmarkDotNet#1587) and I guess we pulled this in since. I also see you did dotnet/performance#1587 to move the allocs in this test into GlobalSetup so it would work. Am I right in thinking this is solved now? Not sure how to match the dates to @kunalspathak graph above though. |
We have not yet enabled data randomization -- I believe @DrewScoggins is about to turn it on for a few tests so we can get a feel for how it will impact our ability to understand perf in alignment-sensitive tests. |
Run Information
Regressions in System.Buffers.Tests.ReadOnlySequenceTests<Char>
Historical Data in Reporting System
Repro
.
Payloads
Baseline
Compare
Histogram
System.Buffers.Tests.ReadOnlySequenceTests.IterateGetPositionTenSegments
Docs
Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository
The text was updated successfully, but these errors were encountered: