-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Improve deserialization perf with changes to property name lookup #40998
Conversation
abfd511
to
a3c8a26
Compare
4a22ac4
to
f65b726
Compare
Wow, those are significant gains particularly for missing properties and when folks use case-insensitive comparison (which is on by default within aspnet). The default setting for deserializer is case-sensitive, correct (i.e.
How is it feasible for us to do this when we previously reserved 2 bytes for length?
Did this actually show perf wins, on it's own? I would have thought that would be a wash.
Can you pinpoint/highlight, in the code, where these issue were to help with the code review? |
src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs
Show resolved
Hide resolved
// Start with the current property index, and then go forwards\backwards. | ||
int propertyIndex = frame.PropertyIndex; | ||
|
||
int count = localPropertyRefsSorted.Length; | ||
int iForward = Math.Min(propertyIndex, count); | ||
int iBackward = iForward - 1; | ||
|
||
while (iForward < count || iBackward >= 0) | ||
for (;;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have some evidence to suggest that this forward/backwards search will be beneficial in practice, for perf?
src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs
Outdated
Show resolved
Hide resolved
...ystem.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.HandlePropertyName.cs
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs
Show resolved
Hide resolved
Yes the default is case-sensitive and ASP changes it to case-insensitive. Using the same test class turning on case-insensitivity is 30% slower due to this issue: Case-insensitive with correct JSON (e.g. POCO has Pascal-cased properties and JSON is camel-cased)
Case-sensitive with correct JSON (e.g. POCO has Pascal-cased properties and JSON is Pascal-cased)
Also supporting case insensitive is a big deal w.r.t. serializer design. Some high-performance serializers don't support it because of the way they perform property-name lookup e.g. by generating IL representing a custom B-tree based on keys determined statically and thus can't represent all casing variants.
Embedding the length in the key only prevents us from the perf hit of comparing lengths for the property names that fit within the key (previously <=6 now its <=7 characters). So we really only need one byte and only need a length of 8 (for all properties of length >= 8) to tell the compare code it needs to compare contents, not just the key.
Not really. I didn't game the test data for property names of 7 -- but those are now faster since we don't need to compare the contents. It also helps more with non-ascii since now you have an extra byte in the key to prevent false hits.
I added a comment to the location. |
If I am reading the data correctly, I see case-sensitive being faster than case-insensitive, not slower. Which is it?
|
Looks like the first one had a large error (so the benchmarks aren't very stable). Json.NET numbers shouldn't change in the before/after, correct? |
|
||
namespace System.Text.Json | ||
{ | ||
public static partial class JsonSerializer | ||
{ | ||
// AggressiveInlining used although a large method it is only called from one locations and is on a hot path. | ||
[MethodImpl(MethodImplOptions.AggressiveInlining)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub, @jkotas - Given our previous discussions on the use of this attribute, what are your thoughts on the use of AggressiveInlining
in cases like these, where there are very few (1-2) callers of somewhat large methods, and benchmarks show improvements for adding them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For cases like this, it may be ok if you know what you are doing. There are a lot of cases where it can fire back - regressing the rest of the calling method, hitting JIT complexity thresholds and disabling optimizations, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for this method and similar ones is for readability \ understandability. Otherwise the method would not exist and code would just be inline, so I suppose that is another option -- to just push the implementation down to the caller.
src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than pending/unresolved feedback - looks good.
src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs
Outdated
Show resolved
Hide resolved
@@ -359,3 +485,4 @@ public class EmptyClassWithExtensionProperty | |||
public IDictionary<string, JsonElement> MyOverflow { get; set; } | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove extra line.
src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs
Outdated
Show resolved
Hide resolved
{ | ||
key |= (ulong) propertyName[4] << (4 * BitsInByte) | ||
| (ulong) propertyName[5] << (5 * BitsInByte) | ||
| (ulong) propertyName[6] << (6 * BitsInByte) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can write the highest index at first, so the JIT will emit only one bound check, instead of one for every indexed access to the span.
So
key |= (ulong)propertyName[6] << (6 * BitsInByte)
| (ulong)propertyName[5] << (5 * BitInByte)
// ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, there is something strange happening, as this optimization won't kick in here.
It seems that MemoryMarshal.Read<uint>(propertyName)
causes the JIT not to elide the subsequent bound checks after the access to index 6.
(at least when I remove the call to MemoryMarshal the optimization kicks in as expected).
asm
G_M53623_IG04:
83F803 cmp eax, 3
0F8EFD000000 jle G_M53623_IG10
488D7DF0 lea rdi, bword ptr [rbp-10H]
488B37 mov rsi, bword ptr [rdi]
8B7F08 mov edi, dword ptr [rdi+8]
83FF04 cmp edi, 4
0F8C74010000 jl G_M53623_IG18
G_M53623_IG05:
8B3E mov edi, dword ptr [rsi]
83F807 cmp eax, 7
755D jne SHORT G_M53623_IG06
837DF806 cmp dword ptr [rbp-08H], 6 ; bound check
0F8679010000 jbe G_M53623_IG20
488B45F0 mov rax, bword ptr [rbp-10H]
0FB64006 movzx rax, byte ptr [rax+6]
48C1E030 shl rax, 48
480BF8 or rdi, rax
837DF805 cmp dword ptr [rbp-08H], 5 ; bound check
0F8660010000 jbe G_M53623_IG20
488B45F0 mov rax, bword ptr [rbp-10H]
0FB64005 movzx rax, byte ptr [rax+5]
48C1E028 shl rax, 40
480BF8 or rdi, rax
837DF804 cmp dword ptr [rbp-08H], 4 ; bound check
0F8647010000 jbe G_M53623_IG20
488B45F0 mov rax, bword ptr [rbp-10H]
0FB64004 movzx rax, byte ptr [rax+4]
48C1E020 shl rax, 32
480BF8 or rdi, rax
48B80000000000000007 mov rax, 0x700000000000000
480BF8 or rdi, rax
E981000000 jmp G_M53623_IG09
This can be circumvented with a local like
// ...
ulong key = MemoryMarshal.Read<uint>(propertyName);
ReadOnlySpan<byte> tmp = propertyName;
if (length == 7)
{
key |= (ulong)tmp[6] << (6 * BitsInByte)
| (ulong)tmp[5] << (5 * BitsInByte)
| (ulong)tmp[4] << (4 * BitsInByte)
| (ulong)7 << (7 * BitsInByte);
}
// ...
asm
G_M53623_IG04:
83F803 cmp eax, 3
0F8ED1000000 jle G_M53623_IG10
488D7DF0 lea rdi, bword ptr [rbp-10H]
488B37 mov rsi, bword ptr [rdi]
8B7F08 mov edi, dword ptr [rdi+8]
83FF04 cmp edi, 4
0F8C48010000 jl G_M53623_IG18
G_M53623_IG05:
8B3E mov edi, dword ptr [rsi]
488D75F0 lea rsi, bword ptr [rbp-10H]
488B16 mov rdx, bword ptr [rsi]
8B7608 mov esi, dword ptr [rsi+8]
83F807 cmp eax, 7
753D jne SHORT G_M53623_IG06
83FE06 cmp esi, 6
0F8644010000 jbe G_M53623_IG20
0FB64206 movzx rax, byte ptr [rdx+6]
8BF0 mov esi, eax
48C1E630 shl rsi, 48
480BFE or rdi, rsi
0FB64205 movzx rax, byte ptr [rdx+5]
48C1E028 shl rax, 40
480BF8 or rdi, rax
0FB65204 movzx rdx, byte ptr [rdx+4]
8BC2 mov eax, edx
48C1E020 shl rax, 32
480BF8 or rdi, rax
48B80000000000000007 mov rax, 0x700000000000000
480BF8 or rdi, rax
EB6B jmp SHORT G_M53623_IG09
@AndyAyersMS is this known?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No...
cc @dotnet/jit-contrib
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end this code just needs to copy from a variable length byte array (up to length 7) to a ulong
. I could write a native method using memcpy, but that is overkill since we don't have any native code for System.Text.Json...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, there is something strange happening, as this optimization won't kick in here.
I tried all variants
- Original (acsending order).
- In descending order (same assembly as original)
- In descending order + temp variable (didn't see a difference in assembly w.r.t. boundary check)
- Using switch\case (overhead for setting up switch)
The original and descending order performed best (by inspecting assembly and somewhat verifying benchmark -- close to margin of error). However, I'll use descending order based on possible bounds check optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x64 Windows code has no bounds checks in this example. x64 Linux has bounds checks because we are not "promoting" propertyName struct. We are running into this issue:
Incoming multi-reg structs with more than one field are not getting promoted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are not promoting the struct we are not tracking the _length field properly and can't eliminate the bounds checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to understand it: when I use the tmp
, so it is local and the bound checks for [5]
and [4]
can be elided, as the one check for [6]
is done (as expected)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct. The bounds check for [6]
is not needed either once we are inside if (length==7)
but we don't realize that length
and tmp._length
have the same values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened https://github.com/dotnet/coreclr/issues/26710 to track this. The item is also in the list in first-class-structs roadmap document https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/first-class-structs.md#improve-struct-promotion
{ | ||
key = MemoryMarshal.Read<uint>(propertyName); | ||
if (length > 4) | ||
|
||
if (length == 7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beside the other comment, maybe the code can be written as
key = MemoryMarshal.Read<uint>(propertyName);
ReadOnlySpan<byte> tmp = propertyName;
switch (length)
{
case 7: key |= (ulong)tmp[6] << (6 * BitsInByte); goto case 6;
case 6: key |= (ulong)tmp[5] << (5 * BitsInByte); goto case 5;
case 5: key |= (ulong)tmp[4] << (4 * BitsInByte); goto default;
default: key |= (ulong)length << (7 * BitsInByte); break;
}
as this avoids the repeated steps for the lower indices, the dasm looks quite good, but I haven't perf-tested this variant.
dasm
; Assembly listing for method Program:GetKey(struct):long
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; optimized code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00 ] ( 11, 6.25) struct (16) [rbp-0x10] do-not-enreg[XSFB] addr-exposed ld-addr-op
; V01 loc0 [V01,T02] ( 8, 5 ) int -> rdi
; V02 loc1 [V02,T03] ( 6, 3 ) long -> rax
; V03 loc2 [V03,T01] ( 10, 5 ) long -> rsi
;* V04 loc3 [V04 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op
; V05 loc4 [V05,T04] ( 6, 3 ) long -> rax
; V06 loc5 [V06,T18] ( 2, 1 ) long -> rax
;# V07 OutArgs [V07 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
;* V08 tmp1 [V08 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V09 tmp2 [V09 ] ( 0, 0 ) int -> zero-ref "impAppendStmt"
;* V10 tmp3 [V10 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
; V11 tmp4 [V11,T09] ( 2, 2 ) byref -> rax "Inlining Arg"
;* V12 tmp5 [V12 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V13 tmp6 [V13 ] ( 0, 0 ) int -> zero-ref "impAppendStmt"
;* V14 tmp7 [V14 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
; V15 tmp8 [V15,T10] ( 2, 2 ) byref -> rsi "Inlining Arg"
;* V16 tmp9 [V16 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V17 tmp10 [V17 ] ( 0, 0 ) int -> zero-ref "impAppendStmt"
;* V18 tmp11 [V18 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
; V19 tmp12 [V19,T11] ( 2, 2 ) byref -> r8 "Inlining Arg"
; V20 tmp13 [V20,T00] ( 6, 7 ) long -> rax "Single return block return value"
; V21 tmp14 [V21,T07] ( 4, 2 ) byref -> rdx V04._pointer(offs=0x00) P-INDEP "field V04._pointer (fldOffset=0x0)"
; V22 tmp15 [V22,T08] ( 4, 2 ) int -> rcx V04._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
; V23 tmp16 [V23,T19] ( 2, 0.75) byref -> rax V08._pointer(offs=0x00) P-INDEP "field V08._pointer (fldOffset=0x0)"
; V24 tmp17 [V24,T22] ( 2, 0.50) int -> rdi V08._length(offs=0x08) P-INDEP "field V08._length (fldOffset=0x8)"
; V25 tmp18 [V25,T15] ( 2, 1 ) byref -> rax V10._pointer(offs=0x00) P-INDEP "field V10._pointer (fldOffset=0x0)"
;* V26 tmp19 [V26 ] ( 0, 0 ) int -> zero-ref V10._length(offs=0x08) P-INDEP "field V10._length (fldOffset=0x8)"
; V27 tmp20 [V27,T20] ( 2, 0.75) byref -> rsi V12._pointer(offs=0x00) P-INDEP "field V12._pointer (fldOffset=0x0)"
; V28 tmp21 [V28,T23] ( 2, 0.50) int -> rax V12._length(offs=0x08) P-INDEP "field V12._length (fldOffset=0x8)"
; V29 tmp22 [V29,T16] ( 2, 1 ) byref -> rsi V14._pointer(offs=0x00) P-INDEP "field V14._pointer (fldOffset=0x0)"
;* V30 tmp23 [V30 ] ( 0, 0 ) int -> zero-ref V14._length(offs=0x08) P-INDEP "field V14._length (fldOffset=0x8)"
; V31 tmp24 [V31,T21] ( 2, 0.75) byref -> r8 V16._pointer(offs=0x00) P-INDEP "field V16._pointer (fldOffset=0x0)"
; V32 tmp25 [V32,T24] ( 2, 0.50) int -> rax V16._length(offs=0x08) P-INDEP "field V16._length (fldOffset=0x8)"
; V33 tmp26 [V33,T17] ( 2, 1 ) byref -> r8 V18._pointer(offs=0x00) P-INDEP "field V18._pointer (fldOffset=0x0)"
;* V34 tmp27 [V34 ] ( 0, 0 ) int -> zero-ref V18._length(offs=0x08) P-INDEP "field V18._length (fldOffset=0x8)"
; V35 tmp28 [V35,T12] ( 3, 1.50) byref -> rdi "BlockOp address local"
; V36 tmp29 [V36,T13] ( 3, 1.50) byref -> rax "BlockOp address local"
; V37 tmp30 [V37,T05] ( 3, 3 ) byref -> rax "BlockOp address local"
; V38 tmp31 [V38,T14] ( 3, 1.50) byref -> rax "BlockOp address local"
; V39 rat0 [V39,T06] ( 3, 3 ) int -> r8 "ReplaceWithLclVar is creating a new local variable"
;
; Lcl frame size = 16
G_M53623_IG01:
55 push rbp
4883EC10 sub rsp, 16
488D6C2410 lea rbp, [rsp+10H]
48897DF0 mov bword ptr [rbp-10H], rdi
488975F8 mov qword ptr [rbp-08H], rsi
G_M53623_IG02:
8B7DF8 mov edi, dword ptr [rbp-08H]
83FF07 cmp edi, 7
7E35 jle SHORT G_M53623_IG04
488D7DF0 lea rdi, bword ptr [rbp-10H]
488B07 mov rax, bword ptr [rdi]
8B7F08 mov edi, dword ptr [rdi+8]
83FF08 cmp edi, 8
0F8C3B010000 jl G_M53623_IG17
G_M53623_IG03:
488B00 mov rax, qword ptr [rax]
48BFFFFFFFFFFFFFFF00 mov rdi, 0xFFFFFFFFFFFFFF
4823C7 and rax, rdi
48BF0000000000000008 mov rdi, 0x800000000000000
480BC7 or rax, rdi
E913010000 jmp G_M53623_IG16
G_M53623_IG04:
83FF03 cmp edi, 3
0F8E91000000 jle G_M53623_IG10
488D45F0 lea rax, bword ptr [rbp-10H]
488B30 mov rsi, bword ptr [rax]
8B4008 mov eax, dword ptr [rax+8]
83F804 cmp eax, 4
0F8C08010000 jl G_M53623_IG18
G_M53623_IG05:
8B06 mov eax, dword ptr [rsi]
8BF0 mov esi, eax
488D45F0 lea rax, bword ptr [rbp-10H]
488B10 mov rdx, bword ptr [rax]
8B4808 mov ecx, dword ptr [rax+8]
448D47FB lea r8d, [rdi-5]
4183F802 cmp r8d, 2
7757 ja SHORT G_M53623_IG09
418BC0 mov eax, r8d
4C8D0503010000 lea r8, [reloc @RWD00]
458B0480 mov r8d, dword ptr [r8+4*rax]
4C8D0D7AFFFFFF lea r9, G_M53623_IG02
4D03C1 add r8, r9
41FFE0 jmp r8
G_M53623_IG06:
83F906 cmp ecx, 6
0F86E2000000 jbe G_M53623_IG20
0FB64206 movzx rax, byte ptr [rdx+6]
48C1E030 shl rax, 48
480BF0 or rsi, rax
G_M53623_IG07:
83F905 cmp ecx, 5
0F86CE000000 jbe G_M53623_IG20
0FB64205 movzx rax, byte ptr [rdx+5]
48C1E028 shl rax, 40
480BF0 or rsi, rax
G_M53623_IG08:
83F904 cmp ecx, 4
0F86BA000000 jbe G_M53623_IG20
0FB64204 movzx rax, byte ptr [rdx+4]
48C1E020 shl rax, 32
480BF0 or rsi, rax
G_M53623_IG09:
4863FF movsxd rdi, edi
48C1E738 shl rdi, 56
480BF7 or rsi, rdi
488BC6 mov rax, rsi
EB79 jmp SHORT G_M53623_IG16
G_M53623_IG10:
83FF01 cmp edi, 1
7E51 jle SHORT G_M53623_IG14
488D45F0 lea rax, bword ptr [rbp-10H]
4C8B00 mov r8, bword ptr [rax]
8B4008 mov eax, dword ptr [rax+8]
83F802 cmp eax, 2
0F8C7D000000 jl G_M53623_IG19
G_M53623_IG11:
410FB700 movzx rax, word ptr [r8]
83FF03 cmp edi, 3
7526 jne SHORT G_M53623_IG12
837DF802 cmp dword ptr [rbp-08H], 2
7679 jbe SHORT G_M53623_IG20
488B7DF0 mov rdi, bword ptr [rbp-10H]
0FB64F02 movzx rcx, byte ptr [rdi+2]
8BD1 mov edx, ecx
48C1E210 shl rdx, 16
480BC2 or rax, rdx
48BE0000000000000003 mov rsi, 0x300000000000000
480BC6 or rax, rsi
EB0D jmp SHORT G_M53623_IG13
G_M53623_IG12:
48BF0000000000000002 mov rdi, 0x200000000000000
480BC7 or rax, rdi
G_M53623_IG13:
EB23 jmp SHORT G_M53623_IG16
G_M53623_IG14:
83FF01 cmp edi, 1
751C jne SHORT G_M53623_IG15
837DF800 cmp dword ptr [rbp-08H], 0
763F jbe SHORT G_M53623_IG20
488B45F0 mov rax, bword ptr [rbp-10H]
0FB600 movzx rax, byte ptr [rax]
48BF0000000000000001 mov rdi, 0x100000000000000
480BC7 or rax, rdi
EB02 jmp SHORT G_M53623_IG16
G_M53623_IG15:
33C0 xor rax, rax
G_M53623_IG16:
488D6500 lea rsp, [rbp]
5D pop rbp
C3 ret
G_M53623_IG17:
BF28000000 mov edi, 40
E8DEAEFFFF call ThrowHelper:ThrowArgumentOutOfRangeException(int)
CC int3
G_M53623_IG18:
BF28000000 mov edi, 40
E8D3AEFFFF call ThrowHelper:ThrowArgumentOutOfRangeException(int)
CC int3
G_M53623_IG19:
BF28000000 mov edi, 40
E8C8AEFFFF call ThrowHelper:ThrowArgumentOutOfRangeException(int)
CC int3
G_M53623_IG20:
E8526C3679 call CORINFO_HELP_RNGCHKFAIL
CC int3
RWD00 dd 000000B4h ; case G_M53623_IG08
dd 000000A0h ; case G_M53623_IG07
dd 0000008Ch ; case G_M53623_IG06
; Total bytes of code 399, prolog size 10 for method Program:GetKey(struct):long
; ============================================================
Fixed that typo. Yes case sensitvity is faster due to the linked issue. If that linked issue is fixed then case insensitivity is just as fast as case sensitive iff the JSON is always consistent w.r.t. casing meaning case insensitivity can become slower when multiple JSON inputs have different casing because each variant is cached (up to 64 entries). |
Test failure on Windows Build x86_Release not related:
|
Significant end-to-end deserialization perf gains:
Basic changes (for about 1/3 of the gains)
TryIsPropertyRefEqual()
, add[AggressiveInlining]
.GetKey()
, minimizeif
comparisons.GetKey()
, optimize for property names >= 8 bytes.SequenceEqual
when property name is 7 bytes and otherwise reduces collisions for other scenarios.Basic changes (for about 2/3 of the gains)
[AggressiveInlining]
toGetProperty()
,HandlePropertyName()
andHandleValue()
.Benchmarks
Running the ReadJson<> benchmarks showed results in the 11-17% range where % = (original-current) / original. One anomaly is the first one with a much higher% -- that was due to high Error.
Perf impact for missing JSON properties
Previously we didn't cache misses.
The test POCO used for missing properties and case-insensitive is a simple, flat object with 9 primitive properties (
string
andint
). Having more properties increases the % gain while fewer properties decreases it (due to general overhead).BEFORE (missing properties)
AFTER (missing properties)
Case-insensitive properties
There was an issue here regarding the cache not adding the correct case-insensitive key.
BEFORE (with two sets of incoming JSON of different casing to trigger extra cache entries)
AFTER (with two sets of incoming JSON of different casing to trigger extra cache entries)
BEFORE (with only case insensitive JSON)
AFTER (with only case insensitive JSON)