Measure loop alignment's performance impact on Microbenchmarks #44051
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
All measurements done on:
QuickSortSpan
This benchmark regressed after my 32B method alignment changes, as I pointed out in DrewScoggins/performance-2#2319 (comment). I thought the loop alignment changes would help, but because of #43713 we don't record the loops in the benchmarks in

IniArray
This benchmark regressed after my 32B method alignment changes, as I pointed out in DrewScoggins/performance-2#2267 (comment). I tested my loop alignment changes on this benchmark and they give a 2X speed improvement, as I wrote in DrewScoggins/performance-2#2267 (comment).

SequenceCompareTo
This benchmark shows a regression after my 32B method alignment changes, but it is within the bimodal range. I have given my analysis in DrewScoggins/performance-2#2286 (comment), but there is not much that is actionable at this point. I haven't had a chance to verify my loop alignment changes on this one yet.

LoopReturn
This benchmark shows a 20% win after my loop alignment changes. I have shared some data in DrewScoggins/performance-2#2674 (comment).
All measurements done on:
CSieve
This benchmark shows around a 6% improvement with the loop alignment changes:
Assembly code G_M3098_IG04 and G_M3098_IG08
HeapSort
I don't see much change in this benchmark. However, I do see loops getting aligned, which makes me wonder if some data alignment is going wrong. Below is a section of the assembly code of
Assembly code G_M60435_IG04 (before loop alignment)
MulMatrix
I see a slight improvement of approx. 1% (min/max improves as well).
Before:
After:
Below is a portion of the assembly code for
Assembly code after loop alignment
Base64EncodeInPlace
I noticed that the benchmark code is aligned at a 32B boundary. The loop inside the benchmark is big and doesn't fit within the current threshold of 96 bytes (3 * 32B chunks) required to get aligned. However, looking at the bimodal behavior of the test, it looks like this benchmark is hitting a data alignment issue, as I called out in DrewScoggins/performance-2#2291 (comment).

IndexerCheckPathLength
My method alignment changes affected this benchmark because the loop previously fit in one 32B chunk and got pushed into two chunks instead. As described in DrewScoggins/performance-2#2290 (comment), the loop alignment should have recovered the performance, but I am not sure if it helps because of

benchSparseMult
The loop alignment changes have no effect on the loops because they are already aligned correctly. You can see the perf numbers in DrewScoggins/performance-2#2271 (comment). That makes me wonder if this is another memory alignment related issue that depends on the alignment of the arrays created for the benchmark.

System.Collections.IterateForEach.ImmutableArray
I am still investigating this one. As seen in my comments in DrewScoggins/performance-2#2285 (comment), loop alignment won't help here because the loop is already aligned correctly.

System.Collections.ContainsTrue.Span
As seen in the data I shared at DrewScoggins/performance-2#2284 (comment), I see negligible improvement in this benchmark but was hoping to see more. However, as Andy calls out:
System.Collections.ContainsKeyFalse<Int32, Int32>.ImmutableDictionary
This one surprisingly sees a regression, as I pointed out in DrewScoggins/performance-2#2281 (comment). Andy made a good point about whether it is suffering from the JCC erratum. I am doing some experiments to test this out.
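For readers unfamiliar with the JCC erratum mentioned above: as I understand Intel's description, the affected cases are jump instructions that either cross a 32-byte boundary or end exactly on one. The Python sketch below only illustrates that address check. The function name and the sample addresses are hypothetical, and this is not how the JIT or any tool in this thread detects the problem.

```python
BOUNDARY = 32  # bytes; the 32B boundary discussed throughout this thread


def touches_boundary(start: int, length: int, boundary: int = BOUNDARY) -> bool:
    """Return True if an instruction occupying [start, start + length)
    crosses a `boundary`-byte boundary or ends exactly on one."""
    if length <= 0:
        raise ValueError("instruction length must be positive")
    end = start + length  # address of the first byte after the instruction
    crosses = (start // boundary) != ((end - 1) // boundary)
    ends_on = end % boundary == 0
    return crosses or ends_on


# Hypothetical addresses, chosen only to exercise each case.
print(touches_boundary(0x1F, 2))  # True: bytes 0x1F and 0x20 straddle the boundary
print(touches_boundary(0x1E, 2))  # True: the instruction ends exactly at 0x20
print(touches_boundary(0x10, 2))  # False: fully inside one 32B chunk
```

A check like this could be run over the jump instructions in a disassembly listing to see whether a regression lines up with jumps landing on such addresses.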
@dotnet/jit-contrib , @adamsitnik |
@kunalspathak thanks for the great analysis and being so transparent with all the data! I think that it would be valuable to run all the benchmarks we have with and without your loop alignment changes. Using the ResultsComparer could give us a good overview of how many benchmarks would improve and regress in total. The tool also allows getting top X best|worst differences so it could help us to find other benchmarks affected by the alignment.
Please be warned, running all of them takes around 5-6 hours ;) |
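As a rough illustration of the kind of comparison suggested here, the sketch below ranks benchmarks by the ratio of "after" time to "before" time and reports the top improvements and regressions. It is not the ResultsComparer tool and does not use its input format; the function name, the dictionary layout, and all numbers are made up for the example.

```python
def compare_results(base, diff, top=3):
    """Rank benchmarks by diff/base time ratio.

    base, diff: dicts mapping benchmark name -> mean time (same unit in both).
    Returns (improved, regressed), each a list of (name, ratio) with the
    largest changes first and at most `top` entries.
    """
    ratios = {}
    for name in base.keys() & diff.keys():  # only benchmarks present in both runs
        if base[name] > 0:                  # skip unusable baselines
            ratios[name] = diff[name] / base[name]
    ranked = sorted(ratios.items(), key=lambda item: item[1])
    improved = [(n, r) for n, r in ranked if r < 1.0][:top]
    regressed = [(n, r) for n, r in reversed(ranked) if r > 1.0][:top]
    return improved, regressed


# Made-up numbers purely for illustration, not measurements from this issue.
base = {"BenchA": 100.0, "BenchB": 200.0, "BenchC": 50.0}
diff = {"BenchA": 80.0, "BenchB": 260.0, "BenchC": 50.0, "BenchD": 10.0}
improved, regressed = compare_results(base, diff)
print("improved:", improved)
print("regressed:", regressed)
```

A plain ratio ignores noise, so for bimodal benchmarks like the ones discussed in this issue a real comparison would also need a statistical threshold before calling something a regression.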
Thanks a lot @adamsitnik for the suggestion. I ran the benchmarks for various heuristics that I was trying and the |
System.Collections.IndexerSet.Span(Size: 512)
This benchmark improved by approx. 36%.
The benchmark appears to be bimodal as seen below, and data alignment could also contribute to the bimodality. But on my machine, I saw consistent improvement after the loop alignment changes. Looking at the disassembly, the improvements came from the alignment of loop
Assembly code before loop alignment
Assembly code after loop alignment
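To make the 32B chunk reasoning used in this thread concrete, here is a small Python sketch that counts how many 32B chunks a loop spans and how much padding would move its start to the next 32B boundary. The 96-byte limit (3 * 32B chunks) is the threshold quoted earlier in this issue. The function names and addresses are hypothetical, and this is a simplification rather than the JIT's actual heuristic.

```python
CHUNK = 32                 # bytes per chunk, as discussed in this thread
MAX_LOOP_SIZE = 3 * CHUNK  # 96-byte threshold quoted earlier in this issue


def chunks_spanned(start: int, size: int, chunk: int = CHUNK) -> int:
    """Number of `chunk`-byte chunks touched by a loop at [start, start + size)."""
    if size <= 0:
        raise ValueError("loop size must be positive")
    first = start // chunk
    last = (start + size - 1) // chunk
    return last - first + 1


def padding_to_align(start: int, chunk: int = CHUNK) -> int:
    """Bytes of padding needed so that `start` lands on a chunk boundary."""
    return (-start) % chunk


def is_alignment_candidate(size: int) -> bool:
    """Simplified size check: only loops within the threshold are considered."""
    return size <= MAX_LOOP_SIZE


# Hypothetical 24-byte loop: it fits in one chunk at offset 0x08, but the
# same loop pushed to offset 0x18 straddles two chunks.
print(chunks_spanned(0x08, 24))   # 1
print(chunks_spanned(0x18, 24))   # 2
print(padding_to_align(0x18))     # 8 bytes of padding reach 0x20
print(chunks_spanned(0x18 + padding_to_align(0x18), 24))  # 1 again
```

This mirrors the IndexerCheckPathLength situation described above, where a method alignment change moved a loop from one chunk into two.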
PerfLabTests.GetMember
Before loop alignment:
After loop alignment:
It is surprising that Allocated differs between the 2 runs, and I am not sure why. I tried dumping the disassembly of all the methods to see which ones got loop aligned, and not many of them were benchmark related.
Methods where alignment was done
Same thing with |
In theory, it should never happen, but it can be caused by the Tiered JIT background thread allocating memory, as in dotnet/BenchmarkDotNet#1542. As long as it's rare you can ignore it (we would need to use a memory profiler to see what exactly is being allocated in both runs).
Most of the analysis is done and the "loop alignment" feature will be ON in .NET 6.0.
As mentioned in #43227, one of the tasks we have identified for stabilizing performance is performing loop alignment for hot loops. Since performance can be very sensitive to alignment, this issue tracks the investigation we will be doing to measure the impact of the loop alignment work on Microbenchmarks. All the findings will be tracked in this issue.
Currently, we are tracking the benchmarks that our performance team has identified as being affected by alignment. The list of issues can be seen at https://github.com/drewScoggins/performance-2/issues?q=is%3Aopen+is%3Aissue+label%3AAlignment