Performance regression of System.MemoryExtensions.SequenceEqual measured on .NET 8.0 Intel Core i7-7700HQ #95346
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
|
Unlikely to be related to the different |
It could potentially be related to alignment or the jcc erratum (#93243) On most modern hardware, the various |
This is very likely the JCC erratum. In .NET 8.0, the branch back to the loop start straddles a 32-byte boundary when the loop is 32-byte aligned:
;; .NET 8.0
loopstart:
vmovups ymm0, ymmword ptr[rcx+rax]
vpcmpeqb ymm0, ymm0, ymmword ptr[rdx+rax]
vpmovmskb r10d, ymm0
cmp r10d, -1
jne notequal
add rax, 16
cmp r8, rax
ja short loopstart
endofloop: ;; offset=0x0021
;; .NET 7.0
loopstart: ;; offset=0x0000
vmovdqu ymm0, ymmword ptr[rcx+rax]
vpcmpeqb ymm0, ymm0, ymmword ptr[rdx+rax]
vpmovmskb r9d, ymm0
cmp r9d, -1
jne short notequal
add rax, 16
cmp r8, rax
ja short loopstart
endofloop: ;; offset=0x001d |
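The offsets in the two listings can be checked with a small sketch. It assumes the loop starts at a 32-byte-aligned address and uses hand-derived instruction lengths that are not part of the issue (the only difference being a 6-byte near `jne` in .NET 8.0 versus a 2-byte short `jne` in .NET 7.0). It also assumes the usual description of the JCC erratum: a jump is affected when its bytes cross a 32-byte boundary or end exactly on one. The helper names are hypothetical.

```python
# Sketch: find jumps that cross or end on a 32-byte boundary.
# ASSUMPTION: the byte lengths below are hand-derived from x86-64 encoding
# rules; they do not come from the issue or from a disassembler.

BOUNDARY = 32  # bytes

NET8_LOOP = [
    ("vmovups ymm0, ymmword ptr[rcx+rax]", 5),
    ("vpcmpeqb ymm0, ymm0, ymmword ptr[rdx+rax]", 5),
    ("vpmovmskb r10d, ymm0", 4),
    ("cmp r10d, -1", 4),
    ("jne notequal", 6),        # near jump (rel32)
    ("add rax, 16", 4),
    ("cmp r8, rax", 3),
    ("ja short loopstart", 2),
]

NET7_LOOP = [
    ("vmovdqu ymm0, ymmword ptr[rcx+rax]", 5),
    ("vpcmpeqb ymm0, ymm0, ymmword ptr[rdx+rax]", 5),
    ("vpmovmskb r9d, ymm0", 4),
    ("cmp r9d, -1", 4),
    ("jne short notequal", 2),  # short jump (rel8)
    ("add rax, 16", 4),
    ("cmp r8, rax", 3),
    ("ja short loopstart", 2),
]


def layout(instructions, start=0):
    """Yield (text, first_byte_offset, end_offset) for each instruction."""
    offset = start
    for text, length in instructions:
        yield text, offset, offset + length
        offset += length


def loop_end(instructions, start=0):
    """Offset of the first byte after the loop body."""
    return start + sum(length for _, length in instructions)


def touches_boundary(first, end):
    """True if bytes [first, end) cross a boundary or end exactly on one."""
    last = end - 1
    return first // BOUNDARY != last // BOUNDARY or end % BOUNDARY == 0


def affected_jumps(instructions, start=0):
    """Jump instructions whose bytes cross or end on a 32-byte boundary."""
    return [
        text
        for text, first, end in layout(instructions, start)
        if text.startswith("j") and touches_boundary(first, end)
    ]


if __name__ == "__main__":
    for name, loop in ((".NET 8.0", NET8_LOOP), (".NET 7.0", NET7_LOOP)):
        print(name, hex(loop_end(loop)), affected_jumps(loop))
```

Under these assumed lengths the loop bodies end at 0x21 and 0x1d, matching the `endofloop` offsets in the listings, and only the .NET 8.0 `ja short loopstart` (bytes 0x1f and 0x20) is flagged.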
Let's close it then since we already have an issue for JCC mitigation - #93243 |
Description
Hello guys,
I've tested the function System.MemoryExtensions.SequenceEqual with BenchmarkDotNet v0.13.10 on my laptop with an Intel Core i7-7700HQ CPU and got about a 35% performance regression on .NET 8.0 compared to .NET 6.0 and .NET 7.0 with 4K data buffers.
I've run the test several times, and the issue is reproducible.
Configuration
BenchmarkDotNet v0.13.10, Windows 10 (10.0.19045.3693/22H2/2022Update)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.100
[Host] : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
.NET 6.0 : .NET 6.0.25 (6.0.2523.51912), X64 RyuJIT AVX2
.NET 7.0 : .NET 7.0.14 (7.0.1423.51910), X64 RyuJIT AVX2
.NET 8.0 : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
.NET Framework 4.8 : .NET Framework 4.8.1 (4.8.9181.0), X64 RyuJIT VectorSize=256
Data
I've attached the code and BenchmarkDotNet.Artifacts here
SequenceEqualBench.zip
Analysis
I've checked the asm listings.
.NET 6.0, 7.0 and 8.0 use different VMOV instructions to load the data into ymm registers in the AVX loop:
.NET 6.0 - vmovupd
.NET 7.0 - vmovdqu
.NET 8.0 - vmovups
So I wonder whether this could explain the issue.