Regressions in System.Memory.ReadOnlySpan #61679
Most likely from 7308c4f but revisit next week. |
Tagging subscribers to this area: @GrabYourPitchforks, @dotnet/area-system-memory
Issue Details
Run Information
Regressions in System.Memory.ReadOnlySpan
Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Memory.ReadOnlySpan*'
Payloads
Histogram
System.Memory.ReadOnlySpan.IndexOfString(input: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAXAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", value: "x", comparisonType: OrdinalIgnoreCase)
Docs
Profiling workflow for dotnet/runtime repository
|
Suspect: #61023 |
Tagging subscribers to this area: @JulieLeeMSFT
|
Can repro locally: BenchmarkDotNet=v0.13.1.1620-nightly, OS=Windows 10.0.22000 PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog IterationTime=250.0000 ms
|
However, I can't repro the above numbers with locally built 6.0 and 7.0; I get around 207 ns for both. |
The local 6.0 and 7.0 codegen is identical to the respective SDK versions, so it's not clear why the perf differs. The 6.0 and 7.0 codegen for
;; 6.0
cmp r13d,eax
sete al
movzx eax,al
test eax,eax
je short M02_L15
;; 7.0
cmp r13d,eax
jne short M02_L15
and there are 6 places where we no longer sign extend index values; 7.0 uses a plain mov, which zero extends, where 6.0 used movsxd (possibly from #57970):
;; 6.0
cmp r13d,[rcx+8]
jae near ptr M02_L19
movsxd rax,r13d
movzx r13d,word ptr [rcx+rax*2+10]
;; 7.0
cmp r13d,[rcx+8]
jae near ptr M02_L19
mov eax,r13d
movzx r13d,word ptr [rcx+rax*2+10]
These both happen in blocks within doubly nested loops. Source code here is:
runtime/src/libraries/System.Private.CoreLib/src/System/Globalization/OrdinalCasing.Icu.cs
Lines 292 to 354 in 1d065b6
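To see why swapping movsxd for mov should not change behavior at these sites, here is a minimal Python sketch that models the two extensions and the unsigned bounds check that precedes them. The function names are hypothetical, and the sketch assumes the length loaded from [rcx+8] is a non-negative 32-bit value; it illustrates the reasoning and is not a statement of how the JIT proves it.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def zero_extend_32_to_64(value: int) -> int:
    """Model `mov eax, r13d`: a 32-bit register write clears the upper 32 bits."""
    return value & MASK32


def sign_extend_32_to_64(value: int) -> int:
    """Model `movsxd rax, r13d`: bit 31 is copied into the upper 32 bits."""
    value &= MASK32
    if value & 0x8000_0000:
        value |= 0xFFFF_FFFF_0000_0000
    return value & MASK64


def passes_bounds_check(index: int, length: int) -> bool:
    """Model `cmp r13d,[rcx+8]` + `jae fail`: an unsigned 32-bit comparison."""
    return (index & MASK32) < (length & MASK32)


def extensions_agree_when_in_bounds(length: int, candidates) -> bool:
    """Check that both extensions give the same 64-bit index for every
    candidate that survives the bounds check (assumes 0 <= length < 2**31)."""
    return all(
        zero_extend_32_to_64(i) == sign_extend_32_to_64(i)
        for i in candidates
        if passes_bounds_check(i, length)
    )


if __name__ == "__main__":
    samples = [0, 1, 99, 100, 0x7FFF_FFFF, 0x8000_0000, 0xFFFF_FFFF]
    print(extensions_agree_when_in_bounds(100, samples))
```

Any index with bit 31 set is rejected by the unsigned compare, so the two extensions only differ on values that never reach the load.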
In particular, the test simplification happens at the end of the inner while loop:
cmp r13d,eax // simplified test
jne short M02_L15
add r12,4 // pSrc++; pDst++;
add r15,4
jmp short M02_L14
M02_L13:
cmp r13d,eax
jne short M02_L15
add r12,2
add r15,2
M02_L14:
cmp r15,rbp
jbe near ptr M02_L01 // branch to top of inner while loop
and one possible explanation for the perf impact here is that by simplifying this compare we've altered the offset of the
I don't have offsets from BDN disassembly but can get them from my local builds, so I will double-check whether the above is plausible. |
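As a sanity check that the simplified test is only a size change and not a behavior change, the sketch below models the flag logic of the 6.0 sequence (cmp, sete, movzx, test, je) and the 7.0 sequence (cmp, jne) in Python. The function names are hypothetical and the model tracks only the zero flag, which is all these two sequences depend on.

```python
def branch_taken_6_0(a: int, b: int) -> bool:
    """Model: cmp a,b ; sete al ; movzx eax,al ; test eax,eax ; je target."""
    al = 1 if a == b else 0   # sete al: AL = 1 when cmp found the operands equal
    eax = al                  # movzx eax,al: widen the 0/1 byte
    zf = (eax & eax) == 0     # test eax,eax: ZF is set when eax is zero
    return zf                 # je: jump when ZF is set


def branch_taken_7_0(a: int, b: int) -> bool:
    """Model: cmp a,b ; jne target."""
    zf = a == b               # cmp: ZF is set when the operands are equal
    return not zf             # jne: jump when ZF is clear


def sequences_equivalent(values) -> bool:
    """Compare both models over every ordered pair drawn from `values`."""
    return all(
        branch_taken_6_0(a, b) == branch_taken_7_0(a, b)
        for a in values
        for b in values
    )


if __name__ == "__main__":
    print(sequences_equivalent(range(8)))
```

Both sequences branch exactly when the operands differ, so any perf difference would have to come from code size or layout rather than from the comparison itself.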
Hmm, that doesn't seem to be holding up. There are some 32 byte jcc branches higher up in 7.0 (from the
;; 6.0 tail
G_M62321_IG18: ;; offset=02B2H
448B6C242C mov r13d, dword ptr [rsp+2CH]
443BE8 cmp r13d, eax
0F94C0 sete al
0FB6C0 movzx rax, al
; ............................... 32B boundary ...............................
85C0 test eax, eax
7420 je SHORT G_M62321_IG21
4983C404 add r12, 4
4983C704 add r15, 4
EB0D jmp SHORT G_M62321_IG20
;; bbWeight=8 PerfScore 50.00
G_M62321_IG19: ;; offset=02CEH
443BE8 cmp r13d, eax
7511 jne SHORT G_M62321_IG21
4983C402 add r12, 2
4983C702 add r15, 2
;; bbWeight=8 PerfScore 14.00
G_M62321_IG20: ;; offset=02DBH
4C3BFD cmp r15, rbp
0F867AFDFFFF jbe G_M62321_IG04
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (jbe: 4 ; jcc erratum) 32B boundary ...............................
;; bbWeight=16 PerfScore 20.00
G_M62321_IG21: ;; offset=02E4H
4C3BFD cmp r15, rbp
7723 ja SHORT G_M62321_IG24
4983C602 add r14, 2
4C3BF3 cmp r14, rbx
0F8659FDFFFF jbe G_M62321_IG03
;; 7.0 tail
G_M62321_IG18: ;; offset=02AEH
448B6C242C mov r13d, dword ptr [rsp+2CH]
443BE8 cmp r13d, eax
7520 jne SHORT G_M62321_IG21
4983C404 add r12, 4
4983C704 add r15, 4
; ............................... 32B boundary ...............................
EB0D jmp SHORT G_M62321_IG20
;; bbWeight=8 PerfScore 38.00
G_M62321_IG19: ;; offset=02C2H
443BE8 cmp r13d, eax
7511 jne SHORT G_M62321_IG21
4983C402 add r12, 2
4983C702 add r15, 2
;; bbWeight=8 PerfScore 14.00
G_M62321_IG20: ;; offset=02CFH
4C3BFD cmp r15, rbp
0F8686FDFFFF jbe G_M62321_IG04
;; bbWeight=16 PerfScore 20.00
G_M62321_IG21: ;; offset=02D8H
4C3BFD cmp r15, rbp
7723 ja SHORT G_M62321_IG24
4983C602 add r14, 2
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (add: 1) 32B boundary ...............................
4C3BF3 cmp r14, rbx
0F8665FDFFFF jbe G_M62321_IG03
;; bbWeight=4 PerfScore 11.00 |
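The boundary annotations in the two tails can be reproduced with a little arithmetic. The Python sketch below is a rough check, assuming the method body starts on a 32-byte-aligned address so that method-relative offsets map directly onto 32-byte windows. The helper names are hypothetical. The instruction offsets are derived from the group offsets and encoding lengths in the listings above: in both versions the jbe follows a 3-byte cmp at the start of IG20.

```python
BOUNDARY = 32


def touches_32b_boundary(offset: int, length: int) -> bool:
    """True if the bytes [offset, offset + length) cross a 32-byte boundary
    or end exactly on one (the condition the jcc erratum note refers to)."""
    last_byte = offset + length - 1
    crosses = offset // BOUNDARY != last_byte // BOUNDARY
    ends_on = (offset + length) % BOUNDARY == 0
    return crosses or ends_on


def bytes_past_boundary(offset: int, length: int) -> int:
    """Number of instruction bytes that land after the boundary it crosses,
    or 0 if the instruction sits entirely inside one 32-byte window."""
    last_byte = offset + length - 1
    if offset // BOUNDARY == last_byte // BOUNDARY:
        return 0
    return (offset + length) % BOUNDARY


# Offsets assumed from the listings: IG20 start + 3-byte cmp, 6-byte jbe.
JBE_6_0 = (0x2DB + 3, 6)
JBE_7_0 = (0x2CF + 3, 6)

if __name__ == "__main__":
    for name, (offset, length) in (("6.0", JBE_6_0), ("7.0", JBE_7_0)):
        print(name, hex(offset), touches_32b_boundary(offset, length),
              bytes_past_boundary(offset, length))
```

Under these assumptions the 6.0 jbe straddles the boundary at 02E0H with 4 bytes past it, matching the "(jbe: 4 ; jcc erratum)" annotation, while the 7.0 jbe stays inside one window. To treat a macro-fused cmp and jcc as a single unit, pass the offset of the cmp and the combined length instead.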
Run Information
Regressions in System.Memory.ReadOnlySpan
Test Report
Repro
Payloads
Baseline
Compare
Histogram
System.Memory.ReadOnlySpan.IndexOfString(input: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAXAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", value: "x", comparisonType: OrdinalIgnoreCase)
Docs
Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository