This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

DictionarySlim backport improvements: remove null buckets and entries checks #23003

Closed


@TylerBrinkley

TylerBrinkley commented Mar 4, 2019

Initialize Dictionary's buckets and entries with non-null values to remove null checks.

I'm having trouble running the tests locally, which I believe has something to do with installing the VS 2019 RC, so I'm going to rely on CI here instead. I also don't have any benchmarks, though the changes should only introduce improvements.

I'd suggest hiding white-space changes when reviewing.

Part of https://github.com/dotnet/corefx/issues/33392.
Alternative to #22599.

@TylerBrinkley
Author

@dotnet-bot test Ubuntu x64 Checked CoreFX Tests please

@TylerBrinkley
Author

Build timed out after 6 hours. Not sure what the issue is.

@TylerBrinkley
Author

@dotnet-bot test Ubuntu x64 Checked CoreFX Tests please

@safern
Member

safern commented Mar 12, 2019

@TylerBrinkley do you have any diff for the performance before and after this change?

cc: @danmosemsft

@danmoseley
Member

I thought @MarcoRossignoli already tried this change and found it hurt perf in Dictionary. @MarcoRossignoli?

As @safern suggests any change to Dictionary needs performance profiling with great care.

@MarcoRossignoli
Member

MarcoRossignoli commented Mar 12, 2019

In my attempt I used a dummy entry, which increased the static footprint. This change is slightly different and maybe a bit better (no static footprint), only more constructor code for the defaults (we always pay that cost in the constructor, and should gain perf only on possible TryAdd calls).
BTW, I agree with @safern that we should measure whether it's worth it, because my conclusion (after measuring perf) is that removing the null check isn't useful: as @jkotas said, after the first pass that branch should be "well predicted".
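The "well predicted" argument can be illustrated with the textbook 2-bit saturating counter. Real x64 branch predictors are far more elaborate, so this Python toy (the function name and starting states are arbitrary choices) only shows the principle: a branch that always goes the same way, like a null check on an already-populated dictionary, costs at most a couple of mispredictions during warm-up and none afterwards.

```python
def mispredictions(outcomes, state=1):
    """Count mispredictions of a single 2-bit saturating counter.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    `outcomes` is an iterable of booleans (True = branch taken).
    """
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# A null check on a populated dictionary is never taken.
print(mispredictions([False] * 1000, state=3))  # 2: starts strongly "taken", then adapts
print(mispredictions([True, False] * 500))      # 1000: alternating defeats this predictor
```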

Regarding "it hurt perf": in my case it increased the static footprint with no observable perf gain.

@TylerBrinkley
Author

Here are my benchmark results.

BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17134.590 (1803/April2018Update/Redstone4)
Intel Core i7-7820HQ CPU 2.90GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
Frequency=2835936 Hz, Resolution=352.6173 ns, Timer=TSC
.NET Core SDK=3.0.100-preview-009812
  [Host]     : .NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT
  Job-OMNCIS : .NET Core e1310cc1-e735-45a3-8dc3-5bf6f690008e (CoreCLR 4.6.27513.0, CoreFX 4.7.19.16301), 64bit RyuJIT
  Job-NCDQEB : .NET Core 94335f94-8974-48bb-a0e8-8be3fe21e22d (CoreCLR 4.6.27513.0, CoreFX 4.7.19.16301), 64bit RyuJIT

Runtime=Core  IterationTime=250.0000 ms  MaxIterationCount=20  
MinIterationCount=15  WarmupCount=1  
| Method | Before Mean (us) | Before Error (us) | After Mean (us) | After Error (us) | Time Improvement |
|---|---|---|---|---|---|
| Add<int> | 14.73 | 0.2206 | 12.27 | 0.078 | 1.20x |
| Add<string> | 22.91 | 0.1968 | 24.26 | 0.2941 | 0.94x |
| ContainsKeyFalse<int> | 6.122 | 0.0825 | 3.872 | 0.0771 | 1.58x |
| ContainsKeyFalse<string> | 12.8 | 0.0733 | 10.29 | 0.0858 | 1.24x |
| ContainsKeyTrue<int> | 4.201 | 0.0737 | 3.485 | 0.0394 | 1.21x |
| ContainsKeyTrue<string> | 11.6 | 0.1233 | 12.15 | 0.0721 | 0.95x |
| CtorDefault<int> | 7508 | 184.1 | 11966 | 390.9 | 0.63x |
| CtorDefault<string> | 63170 | 616.5 | 70520 | 279 | 0.90x |
| IndexerSet<int> | 5.548 | 0.0536 | 4.786 | 0.0482 | 1.16x |
| IndexerSet<string> | 13.88 | 0.102 | 14.52 | 0.0915 | 0.96x |
| Remove<int> | 9.592 | 1.292 | 9.493 | 1.118 | 1.01x |
| Remove<string> | 16.06 | 0.3089 | 17.42 | 0.4715 | 0.92x |
| TryGetValueFalse<int> | 7.964 | 0.0756 | 6.591 | 0.0688 | 1.21x |
| TryGetValueFalse<string> | 12.73 | 0.1001 | 11.04 | 0.0848 | 1.15x |
| TryGetValueTrue<int> | 4.838 | 0.0951 | 4.939 | 0.0597 | 0.98x |
| TryGetValueTrue<string> | 15.53 | 0.1238 | 15.92 | 0.1565 | 0.98x |
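The "Time Improvement" column appears to be Before Mean divided by After Mean, so values above 1 mean the change is faster. A short Python check over a few rows of the table above reproduces the reported factors; the `improvement` helper name is just for illustration.

```python
# Before/after means (us) copied from a few rows of the table above.
results = {
    "Add<int>": (14.73, 12.27),
    "ContainsKeyFalse<int>": (6.122, 3.872),
    "CtorDefault<int>": (7508, 11966),
    "Remove<string>": (16.06, 17.42),
}


def improvement(before, after):
    """Speed-up factor: greater than 1 means the 'after' build is faster."""
    return before / after


for name, (before, after) in results.items():
    print(f"{name}: {improvement(before, after):.2f}x")
# Add<int>: 1.20x
# ContainsKeyFalse<int>: 1.58x
# CtorDefault<int>: 0.63x
# Remove<string>: 0.92x
```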

@MarcoRossignoli
Member

Wow, I wonder why ContainsKeyFalse is 1.58x faster with only a null check removed... Out of interest, have you tried dumping the asm?

@TylerBrinkley
Author

@MarcoRossignoli How does one go about that?

@MarcoRossignoli
Member

MarcoRossignoli commented Mar 14, 2019

There is an official guide https://github.com/dotnet/coreclr/blob/master/Documentation/building/viewing-jit-dumps.md
And my personal guide https://github.com/MarcoRossignoli/marcorossignoli.github.io/blob/jitdump/docs/corefx/viewing-jit-dumps.md

You need to deploy a custom app using your local coreclr binaries and a debug-compiled JIT (one that recognizes the knobs and emits the dump to the console; I usually redirect it to a .txt file and then use Tool.Diff in VS).

If you're in trouble, DM me on Twitter; I can help you on some chat (Gitter, etc.).

@TylerBrinkley
Author

I ran the dump of FindEntry as suggested and got the following 2 method dumps.

; Assembly listing for method Dictionary`2:FindEntry(ref):int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 this         [V00,T10] (  8,  7   )     ref  ->  rdi         this class-hnd
;  V01 arg1         [V01,T07] (  8,  8.50)     ref  ->  rsi         ld-addr-op class-hnd
;  V02 loc0         [V02,T00] (  9, 26   )     int  ->  registers
;  V03 loc1         [V03,T22] (  5,  3   )     ref  ->  rbx         class-hnd
;  V04 loc2         [V04,T04] (  7, 17.50)     ref  ->  rbp         class-hnd
;  V05 loc3         [V05,T01] (  7, 25   )     int  ->  r14
;  V06 loc4         [V06,T16] (  4,  4.50)     ref  ->  r15         class-hnd
;  V07 loc5         [V07,T13] (  3,  5   )     int  ->  r15
;* V08 loc6         [V08    ] (  0,  0   )     ref  ->  zero-ref    ld-addr-op class-hnd
;  V09 loc7         [V09,T17] (  3,  4.50)     ref  ->  r12         class-hnd
;  V10 loc8         [V10,T14] (  3,  5   )     int  ->  [rsp+0x4C]
;  V11 OutArgs      [V11    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;  V12 tmp1         [V12,T23] (  3,  2.50)    long  ->  rcx         "impRuntimeLookup slot"
;* V13 tmp2         [V13    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
;  V14 tmp3         [V14,T15] (  5,  4.50)    long  ->  r11         "impRuntimeLookup typehandle"
;* V15 tmp4         [V15    ] (  0,  0   )    long  ->  zero-ref    "VirtualCall with runtime lookup"
;  V16 tmp5         [V16,T12] (  2,  6   )    long  ->  rcx         "impRuntimeLookup slot"
;  V17 tmp6         [V17,T11] (  2,  8   )     ref  ->  [rsp+0x38]   class-hnd "impAppendStmt"
;* V18 tmp7         [V18    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "bubbling QMark1"
;  V19 tmp8         [V19,T03] (  5, 18   )    long  ->  r11         "impRuntimeLookup typehandle"
;* V20 tmp9         [V20    ] (  0,  0   )    long  ->  zero-ref    "VirtualCall with runtime lookup"
;  V21 tmp10        [V21,T24] (  3,  2.50)    long  ->  rcx         "impRuntimeLookup slot"
;  V22 tmp11        [V22,T20] (  4,  3.50)    long  ->  rax         "impRuntimeLookup typehandle"
;* V23 tmp12        [V23    ] (  0,  0   )    long  ->  zero-ref    "impRuntimeLookup slot"
;* V24 tmp13        [V24    ] (  0,  0   )    long  ->  zero-ref    "impRuntimeLookup typehandle"
;* V25 tmp14        [V25    ] (  0,  0   )     ref  ->  zero-ref    "argument with side effect"
;  V26 cse0         [V26,T05] (  3, 12   )   byref  ->  [rsp+0x30]   "ValNumCSE"
;  V27 cse1         [V27,T06] (  3, 12   )   byref  ->  [rsp+0x28]   "ValNumCSE"
;  V28 cse2         [V28,T25] (  3,  1.50)     int  ->  rdx         "ValNumCSE"
;  V29 cse3         [V29,T26] (  3,  1.50)     int  ->  rdx         "ValNumCSE"
;  V30 cse4         [V30,T18] (  5,  4   )    long  ->  registers   "ValNumCSE"
;  V31 cse5         [V31,T08] (  3, 10   )    long  ->  rdx         "ValNumCSE"
;  V32 cse6         [V32,T09] (  3, 10   )    long  ->  rcx         "ValNumCSE"
;  V33 cse7         [V33,T02] (  6, 20.50)     int  ->  [rsp+0x48]   "ValNumCSE"
;  V34 cse8         [V34,T21] (  6,  3   )     int  ->  [rsp+0x44]   "ValNumCSE"
;  V35 cse9         [V35,T19] (  5,  4   )    long  ->  r12         "ValNumCSE"
;
; Lcl frame size = 88

G_M16827_IG01:
       push     r15
       push     r14
       push     r13
       push     r12
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 88
       mov      qword ptr [rsp+50H], rcx
       mov      rdi, rcx
       mov      rsi, rdx

G_M16827_IG02:
       test     rsi, rsi
       je       G_M16827_IG19

G_M16827_IG03:
       mov      rbx, gword ptr [rdi+8]
       mov      rbp, gword ptr [rdi+16]
       xor      r14d, r14d
       mov      r15, gword ptr [rdi+24]
       test     r15, r15
       jne      G_M16827_IG08
       mov      rcx, rsi
       mov      rax, qword ptr [rsi]
       mov      rax, qword ptr [rax+64]
       call     qword ptr [rax+24]Object:GetHashCode():int:this
       mov      r15d, eax
       and      r15d, 0xD1FFAB1E
       mov      r12d, dword ptr [rbx+8]
       mov      eax, r15d
       cdq
       idiv     edx:eax, r12d
       cmp      edx, r12d
       jae      G_M16827_IG22
       movsxd   rcx, edx
       mov      r13d, dword ptr [rbx+4*rcx+16]
       dec      r13d
       mov      r12, qword ptr [rdi]
       mov      rcx, r12
       mov      rdx, qword ptr [rcx+48]
       mov      rbx, qword ptr [rdx]
       mov      rax, qword ptr [rbx+16]
       test     rax, rax
       jne      SHORT G_M16827_IG04
       mov      rdx, 0xD1FFAB1E
       call     CORINFO_HELP_RUNTIMEHANDLE_CLASS

G_M16827_IG04:
       mov      rcx, rax
       call     EqualityComparer`1:get_Default():ref
       mov      r12, rax
       mov      ebx, dword ptr [rbp+8]

G_M16827_IG05:
       cmp      ebx, r13d
       jbe      G_M16827_IG17
       movsxd   rdx, r13d
       lea      rdx, [rdx+2*rdx]
       lea      rax, bword ptr [rbp+8*rdx+16]
       mov      bword ptr [rsp+30H], rax
       cmp      dword ptr [rax+16], r15d
       jne      SHORT G_M16827_IG06
       mov      rdx, gword ptr [rbp+8*rdx+16]
       mov      rcx, r12
       mov      r8, rsi
       mov      r9, qword ptr [r12]
       mov      r9, qword ptr [r9+64]
       call     qword ptr [r9+48]EqualityComparer`1:Equals(ref,ref):bool:this
       test     eax, eax
       jne      G_M16827_IG17

G_M16827_IG06:
       mov      rax, bword ptr [rsp+30H]
       mov      r13d, dword ptr [rax+20]
       cmp      ebx, r14d
       jle      G_M16827_IG20

G_M16827_IG07:
       inc      r14d
       jmp      SHORT G_M16827_IG05

G_M16827_IG08:
       mov      r12, qword ptr [rdi]
       mov      rcx, r12
       mov      rdx, qword ptr [rcx+48]
       mov      rdx, qword ptr [rdx]
       mov      r13, rdx
       mov      r11, qword ptr [r13+56]
       test     r11, r11
       jne      SHORT G_M16827_IG09
       mov      rdx, 0xD1FFAB1E
       call     CORINFO_HELP_RUNTIMEHANDLE_CLASS
       mov      r11, rax

G_M16827_IG09:
       mov      rcx, r15
       mov      rdx, rsi
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]
       mov      r8d, eax
       and      r8d, 0xD1FFAB1E
       mov      eax, dword ptr [rbx+8]
       mov      dword ptr [rsp+44H], eax
       mov      ecx, dword ptr [rsp+44H]
       mov      eax, r8d
       cdq
       idiv     edx:eax, ecx
       cmp      edx, ecx
       jae      G_M16827_IG22
       movsxd   rcx, edx
       mov      ecx, dword ptr [rbx+4*rcx+16]
       dec      ecx
       mov      ebx, ecx

G_M16827_IG10:
       mov      ecx, dword ptr [rbp+8]
       mov      eax, ecx
       mov      dword ptr [rsp+48H], eax
       cmp      eax, ebx
       jbe      G_M16827_IG16
       movsxd   rcx, ebx
       lea      rcx, [rcx+2*rcx]
       lea      r9, bword ptr [rbp+8*rcx+16]
       mov      bword ptr [rsp+28H], r9
       mov      dword ptr [rsp+4CH], r8d
       cmp      dword ptr [r9+16], r8d
       jne      SHORT G_M16827_IG12
       mov      r10, gword ptr [rbp+8*rcx+16]
       mov      gword ptr [rsp+38H], r10
       mov      rcx, r12
       mov      r11, qword ptr [r13+40]
       test     r11, r11
       jne      SHORT G_M16827_IG15
       mov      rdx, 0xD1FFAB1E
       call     CORINFO_HELP_RUNTIMEHANDLE_CLASS
       mov      r11, rax
       mov      r10, gword ptr [rsp+38H]

G_M16827_IG11:
       mov      rcx, r15
       mov      rdx, r10
       mov      r8, rsi
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]
       test     eax, eax
       jne      SHORT G_M16827_IG14

G_M16827_IG12:
       mov      r9, bword ptr [rsp+28H]
       mov      ebx, dword ptr [r9+20]
       mov      eax, dword ptr [rsp+48H]
       cmp      eax, r14d
       jle      SHORT G_M16827_IG21

G_M16827_IG13:
       inc      r14d
       mov      r8d, dword ptr [rsp+4CH]
       jmp      G_M16827_IG10

G_M16827_IG14:
       mov      r13d, ebx
       jmp      SHORT G_M16827_IG17

G_M16827_IG15:
       mov      r10, gword ptr [rsp+38H]
       jmp      SHORT G_M16827_IG11

G_M16827_IG16:
       mov      r13d, ebx
       jmp      SHORT G_M16827_IG17

G_M16827_IG17:
       mov      eax, r13d

G_M16827_IG18:
       add      rsp, 88
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       ret

G_M16827_IG19:
       mov      ecx, 4
       call     ThrowHelper:ThrowArgumentNullException(int)
       int3

G_M16827_IG20:
       call     ThrowHelper:ThrowInvalidOperationException_ConcurrentOperationsNotSupported()
       int3

G_M16827_IG21:
       call     ThrowHelper:ThrowInvalidOperationException_ConcurrentOperationsNotSupported()
       int3

G_M16827_IG22:
       call     CORINFO_HELP_RNGCHKFAIL
       int3

; Total bytes of code 556, prolog size 27 for method Dictionary`2:FindEntry(ref):int:this
; ============================================================

and

; Assembly listing for method Dictionary`2:FindEntry(int):int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 this         [V00,T08] (  5,  5   )     ref  ->  rcx         this class-hnd
;  V01 arg1         [V01,T06] (  6,  7   )     int  ->  rsi         ld-addr-op
;  V02 loc0         [V02,T00] (  9, 26   )     int  ->  r12
;  V03 loc1         [V03,T15] (  5,  3   )     ref  ->  rdi         class-hnd
;  V04 loc2         [V04,T02] (  9, 22   )     ref  ->  rbx         class-hnd
;  V05 loc3         [V05,T01] (  7, 25   )     int  ->  rbp
;  V06 loc4         [V06,T11] (  4,  4.50)     ref  ->  r14         class-hnd
;  V07 loc5         [V07,T09] (  3,  5   )     int  ->  rcx
;* V08 loc6         [V08    ] (  0,  0   )     int  ->  zero-ref    ld-addr-op
;* V09 loc7         [V09    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;  V10 loc8         [V10,T10] (  3,  5   )     int  ->  r13
;  V11 OutArgs      [V11    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;* V12 tmp1         [V12    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
;* V13 tmp2         [V13    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
;  V14 tmp3         [V14,T12] (  2,  4   )    bool  ->  r11         "Inline return value spill temp"
;  V15 tmp4         [V15,T07] (  2,  8   )     int  ->  r11         ld-addr-op "Inlining Arg"
;* V16 tmp5         [V16    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
;* V17 tmp6         [V17,T13] (  0,  0   )     int  ->  zero-ref    "Inlining Arg"
;* V18 tmp7         [V18    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact "Single-def Box Helper"
;  V19 cse0         [V19,T16] (  3,  1.50)     int  ->  rdx         "ValNumCSE"
;  V20 cse1         [V20,T17] (  3,  1.50)     int  ->  rdx         "ValNumCSE"
;  V21 cse2         [V21,T04] (  4, 14   )    long  ->  rdx         "ValNumCSE"
;  V22 cse3         [V22,T05] (  4, 14   )    long  ->  r15         "ValNumCSE"
;  V23 cse4         [V23,T03] (  6, 17   )     int  ->  registers   "ValNumCSE"
;  V24 cse5         [V24,T14] (  6,  3   )     int  ->  r15         "ValNumCSE"
;
; Lcl frame size = 40

G_M32795_IG01:
       push     r15
       push     r14
       push     r13
       push     r12
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 40
       mov      esi, edx

G_M32795_IG02:
       mov      rdi, gword ptr [rcx+8]
       mov      rbx, gword ptr [rcx+16]
       xor      ebp, ebp
       mov      r14, gword ptr [rcx+24]
       test     r14, r14
       jne      SHORT G_M32795_IG06
       mov      ecx, esi
       and      ecx, 0xD1FFAB1E
       mov      r15d, dword ptr [rdi+8]
       mov      eax, ecx
       cdq
       idiv     edx:eax, r15d
       cmp      edx, r15d
       jae      G_M32795_IG14
       movsxd   rdx, edx
       mov      r12d, dword ptr [rdi+4*rdx+16]
       dec      r12d
       mov      r13d, dword ptr [rbx+8]

G_M32795_IG03:
       cmp      r13d, r12d
       jbe      G_M32795_IG10
       movsxd   rdx, r12d
       shl      rdx, 4
       cmp      dword ptr [rbx+rdx+16], ecx
       jne      SHORT G_M32795_IG04
       mov      r11d, dword ptr [rbx+rdx+24]
       cmp      r11d, esi
       sete     r11b
       movzx    r11, r11b
       test     r11d, r11d
       jne      G_M32795_IG10

G_M32795_IG04:
       mov      r12d, dword ptr [rbx+rdx+20]
       cmp      r13d, ebp
       jle      G_M32795_IG12

G_M32795_IG05:
       inc      ebp
       jmp      SHORT G_M32795_IG03

G_M32795_IG06:
       mov      rcx, r14
       mov      edx, esi
       mov      r11, 0xD1FFAB1E
       cmp      dword ptr [rcx], ecx
       call     [IEqualityComparer`1:GetHashCode(int):int:this]
       mov      r13d, eax
       and      r13d, 0xD1FFAB1E
       mov      r15d, dword ptr [rdi+8]
       mov      eax, r13d
       cdq
       idiv     edx:eax, r15d
       cmp      edx, r15d
       jae      SHORT G_M32795_IG14
       movsxd   rdx, edx
       mov      r12d, dword ptr [rdi+4*rdx+16]
       dec      r12d
       mov      edx, dword ptr [rbx+8]
       mov      edi, edx

G_M32795_IG07:
       cmp      edi, r12d
       jbe      SHORT G_M32795_IG10
       movsxd   r15, r12d
       shl      r15, 4
       cmp      dword ptr [rbx+r15+16], r13d
       jne      SHORT G_M32795_IG08
       mov      edx, dword ptr [rbx+r15+24]
       mov      rcx, r14
       mov      r8d, esi
       mov      r11, 0xD1FFAB1E
       cmp      dword ptr [rcx], ecx
       call     [IEqualityComparer`1:Equals(int,int):bool:this]
       test     eax, eax
       jne      SHORT G_M32795_IG10

G_M32795_IG08:
       mov      r12d, dword ptr [rbx+r15+20]
       cmp      edi, ebp
       jle      SHORT G_M32795_IG13

G_M32795_IG09:
       inc      ebp
       jmp      SHORT G_M32795_IG07

G_M32795_IG10:
       mov      eax, r12d

G_M32795_IG11:
       add      rsp, 40
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       ret

G_M32795_IG12:
       call     ThrowHelper:ThrowInvalidOperationException_ConcurrentOperationsNotSupported()
       int3

G_M32795_IG13:
       call     ThrowHelper:ThrowInvalidOperationException_ConcurrentOperationsNotSupported()
       int3

G_M32795_IG14:
       call     CORINFO_HELP_RNGCHKFAIL
       int3

; Total bytes of code 312, prolog size 18 for method Dictionary`2:FindEntry(int):int:this
; ============================================================
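Both listings walk the same loop. Below is a Python rendering of what the int listing appears to do, under stated assumptions: the hash mask shows up only as the placeholder 0xD1FFAB1E in the dump, and 0x7FFFFFFF (the usual non-negative hash mask) is assumed here; the function, tuple layout and exception names are hypothetical. The loop exits when the unsigned compare finds the index out of range, compares stored hash codes before calling Equals, and throws once the walk takes more steps than there are entries, which would indicate a chain corrupted by concurrent writes.

```python
class ConcurrentOperationsNotSupported(Exception):
    """Stand-in for ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported."""


def find_entry(buckets, entries, key, get_hash=hash):
    """Return the index of `key` in `entries`, or -1 if it is absent.

    buckets: list of 1-based entry indices (0 = empty bucket)
    entries: list of (hash_code, next, key) tuples; next == -1 ends a chain
    """
    hash_code = get_hash(key) & 0x7FFFFFFF     # assumed mask (placeholder in the dump)
    i = buckets[hash_code % len(buckets)] - 1  # idiv, range check, then "dec"
    collision_count = 0
    while 0 <= i < len(entries):               # unsigned "cmp/jbe": i == -1 exits
        h, nxt, k = entries[i]
        if h == hash_code and k == key:        # cheap hash compare before Equals
            return i
        i = nxt
        if len(entries) <= collision_count:    # more steps than entries: a cycle
            raise ConcurrentOperationsNotSupported()
        collision_count += 1
    return -1
```

For the int instantiation the Equals call is inlined into a plain compare (the `sete` sequence), while the ref listing has to make a virtual call through EqualityComparer.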

@MarcoRossignoli
Member

Before/After?

@TylerBrinkley
Author

TylerBrinkley commented Mar 15, 2019

After. Why is it generating two versions of the method?

@MarcoRossignoli
Member

MarcoRossignoli commented Mar 15, 2019

Dictionary is generic, so there is one copy of the code shared by all reference types and one for every value type (see https://alexandrnikitin.github.io/blog/dotnet-generics-under-the-hood/). Can you confirm your tests used string/int?

Dictionary`2:FindEntry(ref):int:this -> string
Dictionary`2:FindEntry(int):int:this -> int

Value types do not share anything and each value type has its own Method Table and EEClass and its own JITted code. In other words, for each value type used as a generic type parameter, CLR will produce a different piece of code. This could lead to what is known as “code bloat” or code explosion and increase the memory footprint of the program. But that’s inevitable because the compiler has to know the size of the value type and the layout of its fields during the compilation process.
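A toy Python model of that rule follows, under loud assumptions: the type names, the `REFERENCE_TYPES` set and the `code_key` helper are hypothetical, and only the idea of collapsing reference-type arguments to a canonical placeholder (System.__Canon in CoreCLR) comes from the runtime. It shows why the string and int benchmarks produce exactly two FindEntry bodies. The shared body cannot hard-code type-specific details, which is presumably why the (ref) listing above contains impRuntimeLookup temporaries and CORINFO_HELP_RUNTIMEHANDLE_CLASS calls and is larger (556 vs 312 bytes).

```python
# Hypothetical toy model of generic code sharing: reference-type arguments
# collapse to one canonical placeholder, every value type gets its own body.
# The set below is illustrative, not a real type system.
REFERENCE_TYPES = {"string", "object", "int[]"}


def code_key(method, *type_args):
    """Name of the compiled body that an instantiation would run."""
    canon = ["__Canon" if t in REFERENCE_TYPES else t for t in type_args]
    return f"{method}<{','.join(canon)}>"


def distinct_bodies(method, instantiations):
    """Set of compiled bodies needed for a list of instantiations."""
    return {code_key(method, *args) for args in instantiations}


uses = [("string", "string"), ("object", "string"), ("int", "int")]
print(sorted(distinct_bodies("Dictionary.FindEntry", uses)))
# ['Dictionary.FindEntry<__Canon,__Canon>', 'Dictionary.FindEntry<int,int>']
```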

@MarcoRossignoli
Member

We could try to compare ContainsKeyFalse<int>

@TylerBrinkley
Author

For those wondering, this is how the benchmarks were performed.

| Version | CoreCLR Directory | CoreCLR Branch | CoreCLR Build Command | CoreFX Directory | CoreFX Branch | CoreFX Build Command |
|---|---|---|---|---|---|---|
| Before | c:\dev\coreclr2 | TylerBrinkley/coreclr master | .\build.cmd -release -skiptests | c:\dev\corefx2 | TylerBrinkley/corefx master | .\build.cmd -c Release /p:CoreCLROverridePath=c:\dev\coreclr2\bin\Product\Windows_NT.x64.Release\ |
| After | c:\dev\coreclr | TylerBrinkley/coreclr dictionary-improvements | .\build.cmd -release -skiptests | c:\dev\corefx | TylerBrinkley/corefx master | .\build.cmd -c Release /p:CoreCLROverridePath=c:\dev\coreclr\bin\Product\Windows_NT.x64.Release\ |

Benchmarks command

dotnet run -c Release -f netcoreapp3.0 -- --corerun "C:\Dev\corefx2\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe" "C:\Dev\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe" --filter *.Dictionary

@MarcoRossignoli
Member

MarcoRossignoli commented Mar 15, 2019

@jkotas I did some "pair performance programming" with @TylerBrinkley on Gitter (great!) about his performance results, which to me seem huge for a single null check.
The test is https://github.com/dotnet/performance/blob/2296c92879ed0744cec9b983ac28ac208e7018bd/src/benchmarks/micro/corefx/System.Collections/Contains/ContainsKeyFalse.cs#L56 (int)
We investigated the asm emitted by the JIT, and the difference is as expected (ContainsKey calls FindEntry):
before

...
; Tier-1 compilation
...
G_M32795_IG02:
       mov      edi, -1 <--REMOVED IN AFTER
       mov      rbx, gword ptr [rcx+8]
       mov      rbp, gword ptr [rcx+16]
       xor      r14d, r14d
       test     rbx, rbx   <--REMOVED IN AFTER
       je       G_M32795_IG10 <-- REMOVED IN AFTER
       mov      r15, gword ptr [rcx+24]
       test     r15, r15
       jne      SHORT G_M32795_IG06
       mov      ecx, esi
       and      ecx, 0xD1FFAB1E
       mov      r12d, dword ptr [rbx+8]
       mov      eax, ecx
       cdq
       idiv     edx:eax, r12d
       cmp      edx, r12d
       jae      G_M32795_IG14
       movsxd   rdx, edx
       mov      edi, dword ptr [rbx+4*rdx+16]
       dec      edi
       mov      r13d, dword ptr [rbp+8]
...
; Total bytes of code 323, prolog size 18 for method Dictionary`2:FindEntry(int):int:this
; ============================================================

after

...
; Tier-1 compilation
...
G_M32795_IG02:
       mov      rdi, gword ptr [rcx+8]
       mov      rbx, gword ptr [rcx+16]
       xor      ebp, ebp
       mov      r14, gword ptr [rcx+24]
       test     r14, r14
       jne      SHORT G_M32795_IG06
       mov      ecx, esi
       and      ecx, 0xD1FFAB1E
       mov      r15d, dword ptr [rdi+8]
       mov      eax, ecx
       cdq
       idiv     edx:eax, r15d
       cmp      edx, r15d
       jae      G_M32795_IG14
       movsxd   rdx, edx
       mov      r12d, dword ptr [rdi+4*rdx+16]
       dec      r12d
       mov      r13d, dword ptr [rbx+8]
...
; Total bytes of code 312, prolog size 18 for method Dictionary`2:FindEntry(int):int:this
; ============================================================

Does a ~1.5x boost from this seem possible to you?

@jkotas
Member

jkotas commented Mar 15, 2019

This looks like another one of those rare cases where the improvement gets amplified by the processor micro-architecture, as I mentioned in #22832 (comment). You would have to run under Intel VTune to figure out what is going on.

@MarcoRossignoli
Member

I'll play with it! Too curious!

@MarcoRossignoli
Member

MarcoRossignoli commented Mar 18, 2019

BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17134.648 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2727536 Hz, Resolution=366.6313 ns, Timer=TSC
.NET Core SDK=3.0.100-preview3-010431
  [Host]     : .NET Core 3.0.0-preview3-27503-5 (CoreCLR 4.6.27422.72, CoreFX 4.7.19.12807), 64bit RyuJIT
  Job-ZWTXZW : .NET Core 35da75ce-57d9-4fd2-8136-0b709d016534 (CoreCLR 4.6.27615.0, CoreFX 4.7.19.16801), 64bit RyuJIT
  Job-XQDVUB : .NET Core 6c76f4a5-4bfa-47fc-83b0-d4e97369a2ef (CoreCLR 4.6.27616.0, CoreFX 4.7.19.16801), 64bit RyuJIT
  Job-VUANFM : .NET Core 35da75ce-57d9-4fd2-8136-0b709d016534 (CoreCLR 4.6.27615.0, CoreFX 4.7.19.16801), 64bit RyuJIT
  Job-HOPUNA : .NET Core 6c76f4a5-4bfa-47fc-83b0-d4e97369a2ef (CoreCLR 4.6.27616.0, CoreFX 4.7.19.16801), 64bit RyuJIT

Runtime=Core  IterationTime=250.0000 ms  MaxIterationCount=20  
MinIterationCount=15  WarmupCount=1  
Namespace Type Method Toolchain InvocationCount UnrollFactor Size Count Mean Error StdDev Median Min Max Ratio RatioSD Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
System.Collections CtorDefaultSize<Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 21.815 ns 0.2402 ns 0.2247 ns 21.869 ns 21.334 ns 22.171 ns 1.42 0.02 0.0172 - - 72 B
System.Collections CtorDefaultSize<Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 15.349 ns 0.1782 ns 0.1667 ns 15.393 ns 15.002 ns 15.542 ns 1.00 0.00 0.0172 - - 72 B
System.Collections CtorDefaultSize<String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 128.457 ns 4.0996 ns 4.5567 ns 126.316 ns 123.780 ns 138.512 ns 1.17 0.05 0.0172 - - 72 B
System.Collections CtorDefaultSize<String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? ? 110.161 ns 1.1410 ns 1.0673 ns 110.411 ns 108.260 ns 111.344 ns 1.00 0.00 0.0171 - - 72 B
System.Collections TryAddDefaultSize<Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 32,924.131 ns 609.3213 ns 569.9596 ns 32,895.109 ns 31,745.027 ns 33,960.268 ns 1.00 0.02 6.3425 3.1712 - 34488 B
System.Collections TryAddDefaultSize<Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 33,006.353 ns 242.9810 ns 227.2845 ns 33,012.316 ns 32,703.115 ns 33,411.249 ns 1.00 0.00 6.3025 3.1513 - 34488 B
System.Collections TryAddDefaultSize<String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 52,749.620 ns 1,848.3470 ns 2,128.5591 ns 51,956.761 ns 50,151.302 ns 57,477.254 ns 1.04 0.05 7.6861 3.8430 - 48088 B
System.Collections TryAddDefaultSize<String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 50,237.372 ns 852.7540 ns 797.6666 ns 50,051.914 ns 49,107.854 ns 52,366.569 ns 1.00 0.00 7.6367 3.8183 - 48088 B
System.Collections TryAddGiventSize<Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 13,684.181 ns 139.2281 ns 130.2341 ns 13,683.944 ns 13,284.541 ns 13,843.677 ns 0.99 0.01 2.5171 0.0536 - 10544 B
System.Collections TryAddGiventSize<Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 13,759.875 ns 81.5787 ns 76.3088 ns 13,758.598 ns 13,651.617 ns 13,893.956 ns 1.00 0.00 2.4649 0.0548 - 10544 B
System.Collections TryAddGiventSize<String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 31,824.597 ns 360.5556 ns 337.2639 ns 31,860.260 ns 31,169.846 ns 32,538.425 ns 0.94 0.01 3.3028 1.0163 - 14712 B
System.Collections TryAddGiventSize<String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 33,845.118 ns 279.7196 ns 247.9641 ns 33,809.036 ns 33,354.795 ns 34,223.386 ns 1.00 0.00 3.3458 1.0707 - 14712 B
System.Collections AddDefaultSize<Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 32,615.291 ns 533.5167 ns 499.0519 ns 32,639.838 ns 31,760.124 ns 33,428.997 ns 0.99 0.02 6.2762 3.1381 - 34488 B
System.Collections AddDefaultSize<Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 33,056.564 ns 493.8030 ns 461.9037 ns 33,062.298 ns 32,151.169 ns 33,723.558 ns 1.00 0.00 6.2367 3.0520 - 34488 B
System.Collections AddDefaultSize<String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 50,524.098 ns 759.5907 ns 710.5217 ns 50,595.916 ns 49,324.605 ns 51,756.622 ns 0.99 0.02 7.6122 3.8061 - 48088 B
System.Collections AddDefaultSize<String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 ? 512 51,290.153 ns 654.9175 ns 612.6102 ns 51,435.342 ns 50,270.497 ns 52,192.810 ns 1.00 0.00 7.7025 3.7412 - 48088 B
System.Collections Remove<Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 512 ? 17,020.985 ns 756.3876 ns 742.8738 ns 17,216.308 ns 15,154.924 ns 17,866.529 ns 0.98 0.09 - - - -
System.Collections Remove<Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 512 ? 17,337.466 ns 1,328.7589 ns 1,476.9120 ns 16,982.471 ns 15,437.120 ns 20,483.799 ns 1.00 0.00 - - - -
System.Collections Remove<String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 512 ? 34,066.156 ns 2,740.9782 ns 3,046.5900 ns 33,137.729 ns 30,881.481 ns 40,262.842 ns 1.07 0.09 - - - -
System.Collections Remove<String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 512 ? 32,162.724 ns 758.0103 ns 744.4675 ns 32,344.138 ns 30,495.766 ns 33,072.817 ns 1.00 0.00 - - - -
System.Collections Clear<Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 512 ? 1,834.271 ns 261.9801 ns 280.3158 ns 1,932.385 ns 1,467.680 ns 2,387.191 ns 1.00 0.24 - - - -
System.Collections Clear<Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 512 ? 1,844.870 ns 300.8720 ns 346.4847 ns 1,738.914 ns 1,453.125 ns 2,746.966 ns 1.00 0.00 - - - -
System.Collections Clear<String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 512 ? 2,267.010 ns 59.2739 ns 65.8828 ns 2,252.326 ns 2,129.138 ns 2,377.714 ns 1.00 0.03 - - - -
System.Collections Clear<String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1000 1 512 ? 2,262.779 ns 45.2746 ns 48.4433 ns 2,264.571 ns 2,189.045 ns 2,358.062 ns 1.00 0.00 - - - -
System.Collections ContainsKeyFalse<Int32, Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 12,369.335 ns 82.7140 ns 77.3707 ns 12,393.089 ns 12,211.844 ns 12,488.977 ns 0.96 0.03 - - - -
System.Collections ContainsKeyFalse<Int32, Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 12,763.571 ns 325.8649 ns 348.6718 ns 12,689.099 ns 12,393.787 ns 13,563.181 ns 1.00 0.00 - - - -
System.Collections ContainsKeyFalse<String, String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 27,591.543 ns 239.0810 ns 223.6365 ns 27,548.807 ns 27,197.371 ns 27,926.493 ns 0.99 0.01 - - - -
System.Collections ContainsKeyFalse<String, String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 27,871.967 ns 153.6038 ns 128.2662 ns 27,895.713 ns 27,635.355 ns 28,152.549 ns 1.00 0.00 - - - -
System.Collections ContainsKeyTrue<Int32, Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 10,581.839 ns 121.2560 ns 113.4229 ns 10,599.723 ns 10,297.464 ns 10,689.356 ns 0.98 0.02 - - - -
System.Collections ContainsKeyTrue<Int32, Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 10,763.141 ns 185.2447 ns 164.2146 ns 10,712.409 ns 10,547.245 ns 11,088.108 ns 1.00 0.00 - - - -
System.Collections ContainsKeyTrue<String, String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 27,944.541 ns 404.7164 ns 378.5720 ns 27,867.408 ns 27,398.930 ns 28,414.455 ns 0.99 0.02 - - - -
System.Collections ContainsKeyTrue<String, String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 28,100.774 ns 311.5359 ns 291.4108 ns 28,088.232 ns 27,567.453 ns 28,478.724 ns 1.00 0.00 - - - -
System.Collections TryGetValueFalse<Int32, Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 13,362.002 ns 320.3854 ns 284.0133 ns 13,330.927 ns 13,022.782 ns 14,033.842 ns 1.00 0.02 - - - -
System.Collections TryGetValueFalse<Int32, Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 13,329.907 ns 121.5754 ns 107.7734 ns 13,347.254 ns 13,050.198 ns 13,494.177 ns 1.00 0.00 - - - -
System.Collections TryGetValueFalse<String, String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 28,466.702 ns 318.2735 ns 297.7132 ns 28,536.652 ns 27,891.744 ns 28,834.164 ns 0.99 0.01 - - - -
System.Collections TryGetValueFalse<String, String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 28,655.151 ns 264.0356 ns 246.9791 ns 28,755.246 ns 28,041.748 ns 28,949.169 ns 1.00 0.00 - - - -
System.Collections TryGetValueTrue<Int32, Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 11,982.685 ns 474.9386 ns 508.1790 ns 11,805.154 ns 11,479.659 ns 13,395.740 ns 1.03 0.05 - - - -
System.Collections TryGetValueTrue<Int32, Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 11,690.238 ns 70.7106 ns 62.6831 ns 11,686.835 ns 11,614.059 ns 11,838.795 ns 1.00 0.00 - - - -
System.Collections TryGetValueTrue<String, String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 30,202.409 ns 216.0964 ns 202.1367 ns 30,228.992 ns 29,768.358 ns 30,458.048 ns 1.01 0.01 - - - -
System.Collections TryGetValueTrue<String, String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 29,853.314 ns 324.6100 ns 287.7583 ns 29,870.874 ns 29,092.414 ns 30,404.332 ns 1.00 0.00 - - - -
System.Collections AddGivenSize<Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 13,605.407 ns 163.3799 ns 152.8257 ns 13,603.134 ns 13,362.133 ns 13,843.756 ns 0.96 0.03 2.5087 0.0545 - 10544 B
System.Collections AddGivenSize<Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 14,066.399 ns 329.6820 ns 366.4407 ns 14,058.976 ns 13,509.975 ns 14,883.917 ns 1.00 0.00 2.4656 0.0573 - 10544 B
System.Collections AddGivenSize<String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 32,203.190 ns 455.7726 ns 426.3300 ns 32,173.893 ns 31,478.446 ns 33,040.753 ns 0.99 0.01 3.3231 1.0225 - 14712 B
System.Collections AddGivenSize<String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 32,511.824 ns 333.7594 ns 312.1988 ns 32,581.977 ns 31,900.689 ns 32,839.372 ns 1.00 0.00 3.3784 1.0395 - 14712 B
System.Collections CtorFromCollection<Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 14,127.818 ns 134.0445 ns 125.3853 ns 14,097.558 ns 13,917.860 ns 14,390.705 ns 0.98 0.01 2.4955 0.0567 - 10544 B
System.Collections CtorFromCollection<Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 14,374.089 ns 195.0084 ns 182.4110 ns 14,402.162 ns 13,887.485 ns 14,597.157 ns 1.00 0.00 2.4770 0.0576 - 10544 B
System.Collections CtorFromCollection<String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 34,231.104 ns 990.4472 ns 1,100.8795 ns 33,914.245 ns 32,530.264 ns 36,739.675 ns 1.02 0.03 3.3708 1.1236 - 14712 B
System.Collections CtorFromCollection<String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 33,764.522 ns 257.5896 ns 240.9494 ns 33,687.732 ns 33,411.179 ns 34,186.468 ns 1.00 0.00 3.3675 1.0776 - 14712 B
System.Collections CtorGivenSize<Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 1,083.812 ns 7.9334 ns 7.4209 ns 1,086.113 ns 1,069.331 ns 1,095.499 ns 0.99 0.01 2.5179 0.0043 - 10544 B
System.Collections CtorGivenSize<Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 1,091.339 ns 6.8781 ns 6.4337 ns 1,090.991 ns 1,083.150 ns 1,103.838 ns 1.00 0.00 2.5166 0.0044 - 10544 B
System.Collections CtorGivenSize<String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 2,874.944 ns 224.4094 ns 249.4305 ns 2,827.830 ns 2,111.252 ns 3,347.562 ns 1.04 0.07 3.5067 - - 14712 B
System.Collections CtorGivenSize<String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 2,771.098 ns 169.7099 ns 181.5877 ns 2,809.528 ns 2,058.405 ns 2,877.783 ns 1.00 0.00 3.4643 0.0110 - 14712 B
System.Collections IndexerSet<Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 12,732.056 ns 148.7102 ns 139.1037 ns 12,740.382 ns 12,528.177 ns 12,938.366 ns 1.00 0.01 - - - -
System.Collections IndexerSet<Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 12,786.379 ns 123.2714 ns 115.3082 ns 12,820.163 ns 12,598.507 ns 12,917.563 ns 1.00 0.00 - - - -
System.Collections IndexerSet<String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 31,854.845 ns 241.1103 ns 225.5348 ns 31,866.961 ns 31,379.290 ns 32,231.098 ns 0.90 0.02 - - - -
System.Collections IndexerSet<String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 35,220.089 ns 981.9315 ns 1,091.4143 ns 35,321.883 ns 33,481.532 ns 37,867.556 ns 1.00 0.00 - - - -
System.Collections IterateForEach<Int32> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 4,670.192 ns 40.6562 ns 38.0299 ns 4,669.529 ns 4,623.388 ns 4,758.523 ns 0.99 0.01 - - - -
System.Collections IterateForEach<Int32> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 4,697.273 ns 51.2225 ns 47.9136 ns 4,690.445 ns 4,619.711 ns 4,794.552 ns 1.00 0.00 - - - -
System.Collections IterateForEach<String> Dictionary C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 4,622.255 ns 58.6887 ns 54.8974 ns 4,629.527 ns 4,491.054 ns 4,696.100 ns 1.00 0.02 - - - -
System.Collections IterateForEach<String> Dictionary c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe 1 16 512 ? 4,644.691 ns 62.4886 ns 58.4519 ns 4,654.142 ns 4,513.939 ns 4,760.660 ns 1.00 0.00 - - - -

These are my results.

C:\git\performance\src\benchmarks\micro (master -> origin)
λ dotnet.exe run -c release -f netcoreapp3.0 -- --filter System.Collections*.Dictionary System.Collections.Tests.Perf_Dictionary --coreRun c:\git\corefxupstream\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe C:\git\corefx\artifacts\bin\testhost\netcoreapp-Windows_NT-Release-x64\shared\Microsoft.NETCore.App\9.9.9\CoreRun.exe --join

I see a regression in the default constructor. I re-ran the benchmarks several times, so it may be due to a difference in microarchitecture.
I used VTune during the week to look into the ContainsKeyFalse difference, but I didn't see a large gain with the current test. I do see a large difference for an empty dictionary (calls on an empty dictionary are probably unusual, so this may be negligible), because the null check is a shortcut in that case. Without it, more instructions have to run (enumerating buckets, idiv, etc.).
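To make the empty-dictionary argument concrete, here is a hedged Java analog of the two lookup strategies. It is not the corefx C# code; every name in it (`NullCheckMap`, `SentinelMap`, `walkChain`, `hashCalls`) is hypothetical. The first map leaves its arrays null until the first insert, so a lookup on an empty map is a single compare. The second installs size-1 arrays in the constructor so the null check can be removed. That costs extra work in every constructor, and it means an empty-map lookup still has to hash the key and take a modulo. The `hashCalls` counter exists only to make that difference observable.

```java
// Hypothetical Java analog of the two Dictionary lookup strategies discussed above.
// Not the corefx implementation: names and layout are illustrative assumptions.
public class EmptyLookupSketch {

    // Entries are chained through 'next'; buckets store 1-based entry indices (0 = empty bucket).
    static final class Entry {
        int hashCode;
        int next = -1;
        String key;
        int value;
    }

    // Shared chain walk: pick a bucket, then follow 'next' links until the key is found.
    static int walkChain(int[] buckets, Entry[] entries, String key, int hash) {
        // A modulo by a length known only at run time typically needs a hardware divide (idiv on x64).
        int i = buckets[hash % buckets.length] - 1;
        while (i >= 0) {
            Entry e = entries[i];
            if (e.hashCode == hash && e.key.equals(key)) return i;
            i = e.next;
        }
        return -1;
    }

    // Strategy 1: arrays stay null until the first insert.
    static final class NullCheckMap {
        int[] buckets;      // null while the map is empty
        Entry[] entries;
        int hashCalls;      // counts how often a lookup had to hash the key

        int findEntry(String key) {
            if (buckets == null) return -1; // shortcut: an empty map costs one compare
            hashCalls++;
            int hash = key.hashCode() & 0x7FFFFFFF;
            return walkChain(buckets, entries, key, hash);
        }
    }

    // Strategy 2: the constructor installs non-null size-1 arrays, so no null check is needed.
    static final class SentinelMap {
        int[] buckets = new int[1];      // extra constructor work for every instance
        Entry[] entries = new Entry[1];
        int hashCalls;

        int findEntry(String key) {
            hashCalls++;                 // always hashes, even when the map is empty
            int hash = key.hashCode() & 0x7FFFFFFF;
            return walkChain(buckets, entries, key, hash); // always takes the modulo
        }
    }

    public static void main(String[] args) {
        NullCheckMap a = new NullCheckMap();
        SentinelMap b = new SentinelMap();
        System.out.println(a.findEntry("x") + " " + a.hashCalls);
        System.out.println(b.findEntry("x") + " " + b.hashCalls);
    }
}
```

Running `main` prints `-1 0` and then `-1 1`: both maps correctly report a miss, but only the sentinel variant pays for hashing and the bucket modulo on an empty map. Once a map holds data, both variants take the same path, which fits the observation that the branch is cheap after the first pass.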

@TylerBrinkley
Author

/azp run

@TylerBrinkley
Author

I will close this now and re-open in dotnet/platform when that's ready.
