Improve deserialization perf with changes to property name lookup #40998

steveharter · 2019-09-10T21:44:54Z

Significant end-to-end deserialization perf gains:

~11-17% on normal simple objects (no collections or child objects).
~40% on missing properties (when a JSON property is not found on the POCO)
~40% on case-insensitive properties (when a JSON property only matches because case-insensitive is enabled)

Basic changes (for about 1/3 of the gains)

In TryIsPropertyRefEqual(), add [AggressiveInlining].
In GetKey(), minimize if comparisons.
In GetKey(), optimize for property names >= 8 bytes.
Also (not impacting perf in this specific case): expanded the key from 6 to 7 bytes. This avoids calling SequenceEqual when the property name is 7 bytes and otherwise reduces collisions in other scenarios.

Basic changes (for about 2/3 of the gains)

Apply [AggressiveInlining] to GetProperty(), HandlePropertyName() and HandleValue().

Benchmarks

Running the ReadJson<> benchmarks showed results in the 11-17% range, where % = (original - current) / original. One anomaly is the first row, with a much higher % -- that was due to high Error.

Faster	base/diff	Base Median (ns)	Diff Median (ns)	Modality
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr	1.56	852.92	548.18	bimodal
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Bytes	1.21	1624.47	1341.85
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf	1.21	593.28	491.52
System.Text.Json.Serialization.Tests.ReadJson.Deseriali	1.18	462955.27	393372.96
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString	1.17	1722.79	1466.58
System.Text.Json.Serialization.Tests.ReadJson.Deseriali	1.14	470687.99	412734.27
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr	1.14	42914.59	37686.85
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf	1.13	41069.02	36241.96
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream	1.13	1915.02	1694.84
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr	1.13	43991.16	38970.73
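As a rough check of how the quoted percentage relates to the base/diff column, the arithmetic can be sketched as below. The class and method names are hypothetical and exist only for illustration; the only thing taken from the description is the formula % = (original - current) / original.

```csharp
using System;

static class BenchmarkMath
{
    // Fractional improvement as defined in the description:
    // (original - current) / original.
    public static double Improvement(double baseMedianNs, double diffMedianNs)
        => (baseMedianNs - diffMedianNs) / baseMedianNs;

    // The "base/diff" column of the table: base median divided by diff median.
    public static double Ratio(double baseMedianNs, double diffMedianNs)
        => baseMedianNs / diffMedianNs;
}
```

With the DeserializeFromStream row (1915.02 ns to 1694.84 ns) this gives an improvement of about 11.5% and a ratio of about 1.13; with the DeserializeFromUtf8Bytes row (1624.47 ns to 1341.85 ns) it gives about 17.4% and 1.21. The first row (852.92 ns to 548.18 ns) works out to about 35.7%, which is the anomaly mentioned above.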

Perf impact for missing JSON properties
Previously we didn't cache misses.

The test POCO used for missing properties and case-insensitive is a simple, flat object with 9 primitive properties (string and int). Having more properties increases the % gain while fewer properties decreases it (due to general overhead).

BEFORE (missing properties)

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
JSON.NET	2,391.7 ns	31.698 ns	29.650 ns	2,374.2 ns	2,368.3 ns	2,456.8 ns	0.5126	-	-	3240 B
SystemTextJson	1,575.4 ns	8.670 ns	7.686 ns	1,574.4 ns	1,563.7 ns	1,589.4 ns	0.0878	-	-	552 B

AFTER (missing properties)

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
JSON.NET	2,241.0 ns	12.956 ns	11.485 ns	2,242.9 ns	2,216.2 ns	2,256.6 ns	0.5096	-	-	3240 B
SystemTextJson	946.3 ns	8.291 ns	7.755 ns	942.8 ns	936.5 ns	960.2 ns	0.0337	-	-	224 B

Case-insensitive properties
There was an issue here regarding the cache not adding the correct case-insensitive key.

BEFORE (with two sets of incoming JSON of different casing to trigger extra cache entries)

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
JSON.NET	3.192 us	0.0149 us	0.0132 us	3.190 us	3.172 us	3.223 us	0.5331	-	-	3.32 KB
SystemTextJson	4.441 us	0.0131 us	0.0116 us	4.439 us	4.421 us	4.460 us	0.2314	-	-	1.47 KB

AFTER (with two sets of incoming JSON of different casing to trigger extra cache entries)

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
JSON.NET	3.224 us	0.0359 us	0.0336 us	3.229 us	3.159 us	3.271 us	0.5387	-	-	3.32 KB
SystemTextJson	2.787 us	0.0244 us	0.0228 us	2.782 us	2.755 us	2.828 us	0.2101	-	-	1.3 KB

BEFORE (with only case insensitive JSON)

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
JSON.NET	18.315 us	4.1845 us	4.8189 us	17.100 us	13.100 us	30.800 us	-	-	-	3.32 KB
SystemTextJson	5.318 us	0.5276 us	0.6076 us	5.085 us	4.631 us	6.696 us	0.2314	-	-	1.47 KB

AFTER (with only case insensitive JSON)

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
JSON.NET	14.205 us	0.8944 us	0.9941 us	14.000 us	12.900 us	16.500 us	-	-	-	3.32 KB
SystemTextJson	2.880 us	0.0364 us	0.0323 us	2.871 us	2.844 us	2.961 us	0.2019	-	-	1.3 KB

ahsonkhan · 2019-09-11T01:05:40Z

Significant end-to-end deserialization perf gains

Wow, those are significant gains particularly for missing properties and when folks use case-insensitive comparison (which is on by default within aspnet).

The default setting for the deserializer is case-sensitive, correct (i.e. PropertyNameCaseInsensitive = false)? How does its perf compare? Any low-hanging fruit that we can investigate to get similar wins there?

Expand the key from 6 to 7 bytes;

How is it feasible for us to do this when we previously reserved 2 bytes for length?

avoids calling SequenceEqual when property name is 7 bytes and otherwise reduces collisions.

Did this actually show perf wins on its own? I would have thought that would be a wash.

Previously we didn't cache misses.
There was an issue here regarding the cache not adding the correct case-insensitive key.

Can you pinpoint/highlight, in the code, where these issues were, to help with the code review?

ahsonkhan · 2019-09-11T01:13:09Z

src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs

                // Start with the current property index, and then go forwards\backwards.
                int propertyIndex = frame.PropertyIndex;

                int count = localPropertyRefsSorted.Length;
                int iForward = Math.Min(propertyIndex, count);
                int iBackward = iForward - 1;

-                while (iForward < count || iBackward >= 0)
+                for (;;)


Do we have some evidence to suggest that this forward/backwards search will be beneficial in practice, for perf?
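For readers following the question, the forward/backward search in the diff above can be sketched roughly as follows. This is only an illustration under assumptions: the cache is modelled as a plain ulong[] of keys, the names PropertySearchSketch, FindFromHint and cachedKeys are hypothetical, and the hint is assumed to be non-negative. The idea is that when JSON properties arrive in roughly the same order as on previous payloads, the entry at or near the hint index matches after very few probes.

```csharp
using System;

static class PropertySearchSketch
{
    // Returns the index of 'key' in 'cachedKeys', probing outward from 'hint'
    // (alternating forward and backward), or -1 if the key is not present.
    public static int FindFromHint(ulong[] cachedKeys, ulong key, int hint)
    {
        int count = cachedKeys.Length;
        int iForward = Math.Min(hint, count);
        int iBackward = iForward - 1;

        while (iForward < count || iBackward >= 0)
        {
            if (iForward < count)
            {
                if (cachedKeys[iForward] == key)
                {
                    return iForward;
                }
                iForward++;
            }

            if (iBackward >= 0)
            {
                if (cachedKeys[iBackward] == key)
                {
                    return iBackward;
                }
                iBackward--;
            }
        }

        return -1; // a miss: every entry was probed once
    }
}
```

A hit at the hint costs one comparison, while a miss still visits every entry once, so the approach only pays off when the incoming property order is reasonably stable.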

steveharter · 2019-09-11T22:01:39Z

The default setting for the deserializer is case-sensitive, correct (i.e. PropertyNameCaseInsensitive = false)? How does its perf compare? Any low-hanging fruit that we can investigate to get similar wins there?

Yes, the default is case-sensitive, and ASP.NET changes it to case-insensitive. With the same test class, turning on case-insensitivity is 30% slower due to this issue:

Case-insensitive with correct JSON (e.g. POCO has Pascal-cased properties and JSON is camel-cased)

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
JSON.NET	14.795 us	1.3507 us	1.5555 us	14.450 us	13.000 us	18.800 us	-	-	-	3.32 KB
SystemTextJson	2.733 us	0.0152 us	0.0142 us	2.724 us	2.714 us	2.765 us	0.2065	-	-	1.3 KB

Case-sensitive with correct JSON (e.g. POCO has Pascal-cased properties and JSON is Pascal-cased)

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0/1k Op	Gen 1/1k Op	Gen 2/1k Op	Allocated Memory/Op
JSON.NET	21.710 us	5.4886 us	6.3206 us	20.500 us	13.200 us	36.600 us	-	-	-	3056 B
SystemTextJson	1.853 us	0.0103 us	0.0096 us	1.848 us	1.845 us	1.875 us	0.1336	-	-	856 B

Also, supporting case-insensitivity is a big deal w.r.t. serializer design. Some high-performance serializers don't support it because of the way they perform property-name lookup, e.g. by generating IL representing a custom B-tree based on keys determined statically, and thus can't represent all casing variants.

Expand the key from 6 to 7 bytes;
How is it feasible for us to do this when we previously reserved 2 bytes for length?

Embedding the length in the key only saves us the perf hit of comparing lengths for the property names that fit within the key (previously <= 6, now <= 7 characters). So we really only need one byte, and only need a length value of 8 (for all properties of length >= 8) to tell the compare code that it needs to compare contents, not just the key.
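The key layout described here can be sketched roughly as below. This is a simplified illustration, not the code in JsonClassInfo.cs: it uses a plain loop instead of MemoryMarshal.Read, and the names PropertyKeySketch, LongNameMarker and KeyAloneDecides are hypothetical. The low 7 bytes of the ulong hold the first 7 bytes of the UTF-8 name, and the top byte holds the length, capped at 8.

```csharp
using System;
using System.Text;

static class PropertyKeySketch
{
    private const int BitsInByte = 8;
    private const int MaxKeyBytes = 7;       // name bytes stored in the low 56 bits
    private const ulong LongNameMarker = 8;  // means "8 or more bytes: compare contents too"

    public static ulong GetKey(ReadOnlySpan<byte> propertyName)
    {
        int length = propertyName.Length;
        int bytesToCopy = Math.Min(length, MaxKeyBytes);

        // Pack up to 7 name bytes, first byte in the lowest position.
        ulong key = 0;
        for (int i = 0; i < bytesToCopy; i++)
        {
            key |= (ulong)propertyName[i] << (i * BitsInByte);
        }

        // The top byte is the exact length for names of 0..7 bytes, else 8.
        ulong lengthByte = length > MaxKeyBytes ? LongNameMarker : (ulong)length;
        return key | (lengthByte << (MaxKeyBytes * BitsInByte));
    }

    // True when equal keys imply equal names, so no content comparison is needed.
    public static bool KeyAloneDecides(ulong key)
        => (key >> (MaxKeyBytes * BitsInByte)) < LongNameMarker;
}
```

Under this sketch, two 7-byte names such as "Address" and "Addresz" get different keys and never need a content comparison, while "PropertyA" and "PropertyB" share a key (same first 7 bytes, length marker 8) and still need one.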

avoids calling SequenceEqual when property name is 7 bytes and otherwise reduces collisions.
Did this actually show perf wins on its own? I would have thought that would be a wash.

Not really. I didn't game the test data for property names of length 7 -- but those are now faster since we don't need to compare the contents. It also helps more with non-ASCII since now you have an extra byte in the key to prevent false hits.

There was an issue here regarding the cache not adding the correct case-insensitive key.
Can you pinpoint/highlight, in the code, where these issues were, to help with the code review?

I added a comment to the location.

ahsonkhan · 2019-09-12T01:45:33Z

Using the same test class turning on case-sensitivity is 30% slower

If I am reading the data correctly, I see case-sensitive being faster than case-insensitive, not slower. Which is it?

Case-insensitive
SystemTextJson | 2.733 us

Case-sensitive
SystemTextJson | 1.853 us

ahsonkhan · 2019-09-12T01:49:06Z

BEFORE (with only case insensitive JSON)
JSON.NET | 18.315 us

AFTER (with only case insensitive JSON)
JSON.NET | 14.205 us

Looks like the first one had a large error (so the benchmarks aren't very stable). Json.NET numbers shouldn't change in the before/after, correct?

ahsonkhan · 2019-09-12T05:39:31Z

...ystem.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.HandlePropertyName.cs


 namespace System.Text.Json
 {
    public static partial class JsonSerializer
    {
+        // AggressiveInlining used although a large method it is only called from one locations and is on a hot path.
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]


@stephentoub, @jkotas - Given our previous discussions on the use of this attribute, what are your thoughts on the use of AggressiveInlining in cases like these, where there are very few (1-2) callers of somewhat large methods, and benchmarks show improvements for adding them?

jkotas · 2019-09-12

For cases like this, it may be ok if you know what you are doing. There are a lot of cases where it can backfire: regressing the rest of the calling method, hitting JIT complexity thresholds and disabling optimizations, etc.

steveharter · 2019-09-12

The reason for this method and similar ones is readability \ understandability. Otherwise the method would not exist and the code would just be inline, so I suppose that is another option -- to just push the implementation down to the caller.

ahsonkhan

Other than pending/unresolved feedback - looks good.

ahsonkhan · 2019-09-12T06:25:58Z

src/System.Text.Json/tests/Serialization/PropertyNameTests.cs

@@ -359,3 +485,4 @@ public class EmptyClassWithExtensionProperty
        public IDictionary<string, JsonElement> MyOverflow { get; set; }
    }
 }
+


nit: remove extra line.

gfoidl · 2019-09-12T12:20:01Z

src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs

+                {
+                    key |= (ulong) propertyName[4] << (4 * BitsInByte)
+                        | (ulong) propertyName[5] << (5 * BitsInByte)
+                        | (ulong) propertyName[6] << (6 * BitsInByte)


You can access the highest index first, so the JIT emits only one bounds check instead of one for every indexed access to the span.

So

    key |= (ulong)propertyName[6] << (6 * BitsInByte)
        | (ulong)propertyName[5] << (5 * BitsInByte) // ...

Hm, there is something strange happening, as this optimization won't kick in here.

It seems that MemoryMarshal.Read<uint>(propertyName) causes the JIT not to elide the subsequent bound checks after the access to index 6.
(at least when I remove the call to MemoryMarshal the optimization kicks in as expected).

asm

G_M53623_IG04:
    83F803               cmp      eax, 3
    0F8EFD000000         jle      G_M53623_IG10
    488D7DF0             lea      rdi, bword ptr [rbp-10H]
    488B37               mov      rsi, bword ptr [rdi]
    8B7F08               mov      edi, dword ptr [rdi+8]
    83FF04               cmp      edi, 4
    0F8C74010000         jl       G_M53623_IG18
G_M53623_IG05:
    8B3E                 mov      edi, dword ptr [rsi]
    83F807               cmp      eax, 7
    755D                 jne      SHORT G_M53623_IG06
    837DF806             cmp      dword ptr [rbp-08H], 6 ; bound check
    0F8679010000         jbe      G_M53623_IG20
    488B45F0             mov      rax, bword ptr [rbp-10H]
    0FB64006             movzx    rax, byte ptr [rax+6]
    48C1E030             shl      rax, 48
    480BF8               or       rdi, rax
    837DF805             cmp      dword ptr [rbp-08H], 5 ; bound check
    0F8660010000         jbe      G_M53623_IG20
    488B45F0             mov      rax, bword ptr [rbp-10H]
    0FB64005             movzx    rax, byte ptr [rax+5]
    48C1E028             shl      rax, 40
    480BF8               or       rdi, rax
    837DF804             cmp      dword ptr [rbp-08H], 4 ; bound check
    0F8647010000         jbe      G_M53623_IG20
    488B45F0             mov      rax, bword ptr [rbp-10H]
    0FB64004             movzx    rax, byte ptr [rax+4]
    48C1E020             shl      rax, 32
    480BF8               or       rdi, rax
    48B80000000000000007 mov      rax, 0x700000000000000
    480BF8               or       rdi, rax
    E981000000           jmp      G_M53623_IG09

This can be circumvented with a local like

    // ...
    ulong key = MemoryMarshal.Read<uint>(propertyName);
    ReadOnlySpan<byte> tmp = propertyName;

    if (length == 7)
    {
        key |= (ulong)tmp[6] << (6 * BitsInByte)
            | (ulong)tmp[5] << (5 * BitsInByte)
            | (ulong)tmp[4] << (4 * BitsInByte)
            | (ulong)7 << (7 * BitsInByte);
    }
    // ...

asm

G_M53623_IG04:
    83F803               cmp      eax, 3
    0F8ED1000000         jle      G_M53623_IG10
    488D7DF0             lea      rdi, bword ptr [rbp-10H]
    488B37               mov      rsi, bword ptr [rdi]
    8B7F08               mov      edi, dword ptr [rdi+8]
    83FF04               cmp      edi, 4
    0F8C48010000         jl       G_M53623_IG18
G_M53623_IG05:
    8B3E                 mov      edi, dword ptr [rsi]
    488D75F0             lea      rsi, bword ptr [rbp-10H]
    488B16               mov      rdx, bword ptr [rsi]
    8B7608               mov      esi, dword ptr [rsi+8]
    83F807               cmp      eax, 7
    753D                 jne      SHORT G_M53623_IG06
    83FE06               cmp      esi, 6
    0F8644010000         jbe      G_M53623_IG20
    0FB64206             movzx    rax, byte ptr [rdx+6]
    8BF0                 mov      esi, eax
    48C1E630             shl      rsi, 48
    480BFE               or       rdi, rsi
    0FB64205             movzx    rax, byte ptr [rdx+5]
    48C1E028             shl      rax, 40
    480BF8               or       rdi, rax
    0FB65204             movzx    rdx, byte ptr [rdx+4]
    8BC2                 mov      eax, edx
    48C1E020             shl      rax, 32
    480BF8               or       rdi, rax
    48B80000000000000007 mov      rax, 0x700000000000000
    480BF8               or       rdi, rax
    EB6B                 jmp      SHORT G_M53623_IG09

@AndyAyersMS is this known?

AndyAyersMS · 2019-09-12

No...

cc @dotnet/jit-contrib

steveharter · 2019-09-12

In the end this code just needs to copy from a variable-length byte array (up to length 7) to a ulong. I could write a native method using memcpy, but that is overkill since we don't have any native code for System.Text.Json...

Hm, there is something strange happening, as this optimization won't kick in here.

I tried all variants

Original (ascending order).

In descending order (same assembly as original)

In descending order + temp variable (didn't see a difference in assembly w.r.t. boundary check)

Using switch\case (overhead for setting up switch)

The original and descending order performed best (by inspecting assembly and somewhat verifying benchmark -- close to margin of error). However, I'll use descending order based on possible bounds check optimization.

erozenfeld · 2019-09-13

x64 Windows code has no bounds checks in this example. x64 Linux has bounds checks because we are not "promoting" the propertyName struct. We are running into this issue:

https://github.com/dotnet/coreclr/blob/9479f67577bbb02ea611777b00308f42252fb2bc/src/jit/lclvars.cpp#L1914-L1926

Incoming multi-reg structs with more than one field are not getting promoted.

Because we are not promoting the struct we are not tracking the _length field properly and can't eliminate the bounds checks.

gfoidl · 2019-09-13

Just to understand it: when I use tmp, it is a local, so the bounds checks for [5] and [4] can be elided once the one check for [6] is done (as expected)?

erozenfeld · 2019-09-13

Yes, that's correct. The bounds check for [6] is not needed either once we are inside if (length == 7), but we don't realize that length and tmp._length have the same values.

I opened https://github.com/dotnet/coreclr/issues/26710 to track this. The item is also in the list in first-class-structs roadmap document https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/first-class-structs.md#improve-struct-promotion

gfoidl · 2019-09-12T14:16:36Z

src/System.Text.Json/src/System/Text/Json/Serialization/JsonClassInfo.cs

            {
                key = MemoryMarshal.Read<uint>(propertyName);
-                if (length > 4)
+
+                if (length == 7)


Besides the other comment, maybe the code can be written as

    key = MemoryMarshal.Read<uint>(propertyName);
    ReadOnlySpan<byte> tmp = propertyName;

    switch (length)
    {
        case 7:
            key |= (ulong)tmp[6] << (6 * BitsInByte);
            goto case 6;
        case 6:
            key |= (ulong)tmp[5] << (5 * BitsInByte);
            goto case 5;
        case 5:
            key |= (ulong)tmp[4] << (4 * BitsInByte);
            goto default;
        default:
            key |= (ulong)length << (7 * BitsInByte);
            break;
    }

This avoids the repeated steps for the lower indices. The dasm looks quite good, but I haven't perf-tested this variant.

dasm

; Assembly listing for method Program:GetKey(struct):long
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; optimized code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00 ] ( 11, 6.25) struct (16) [rbp-0x10] do-not-enreg[XSFB] addr-exposed ld-addr-op
; V01 loc0 [V01,T02] ( 8, 5 ) int -> rdi
; V02 loc1 [V02,T03] ( 6, 3 ) long -> rax
; V03 loc2 [V03,T01] ( 10, 5 ) long -> rsi
;* V04 loc3 [V04 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op
; V05 loc4 [V05,T04] ( 6, 3 ) long -> rax
; V06 loc5 [V06,T18] ( 2, 1 ) long -> rax
;# V07 OutArgs [V07 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
;* V08 tmp1 [V08 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V09 tmp2 [V09 ] ( 0, 0 ) int -> zero-ref "impAppendStmt"
;* V10 tmp3 [V10 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
; V11 tmp4 [V11,T09] ( 2, 2 ) byref -> rax "Inlining Arg"
;* V12 tmp5 [V12 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V13 tmp6 [V13 ] ( 0, 0 ) int -> zero-ref "impAppendStmt"
;* V14 tmp7 [V14 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
; V15 tmp8 [V15,T10] ( 2, 2 ) byref -> rsi "Inlining Arg"
;* V16 tmp9 [V16 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V17 tmp10 [V17 ] ( 0, 0 ) int -> zero-ref "impAppendStmt"
;* V18 tmp11 [V18 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
; V19 tmp12 [V19,T11] ( 2, 2 ) byref -> r8 "Inlining Arg"
; V20 tmp13 [V20,T00] ( 6, 7 ) long -> rax "Single return block return value"
; V21 tmp14 [V21,T07] ( 4, 2 ) byref -> rdx V04._pointer(offs=0x00) P-INDEP "field V04._pointer (fldOffset=0x0)"
; V22 tmp15 [V22,T08] ( 4, 2 ) int -> rcx V04._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
; V23 tmp16 [V23,T19] ( 2, 0.75) byref -> rax V08._pointer(offs=0x00) P-INDEP "field V08._pointer (fldOffset=0x0)"
; V24 tmp17 [V24,T22] ( 2, 0.50) int -> rdi V08._length(offs=0x08) P-INDEP "field V08._length (fldOffset=0x8)"
; V25 tmp18 [V25,T15] ( 2, 1 ) byref -> rax V10._pointer(offs=0x00) P-INDEP "field V10._pointer (fldOffset=0x0)"
;* V26 tmp19 [V26 ] ( 0, 0 ) int -> zero-ref V10._length(offs=0x08) P-INDEP "field V10._length (fldOffset=0x8)"
; V27 tmp20 [V27,T20] ( 2, 0.75) byref -> rsi V12._pointer(offs=0x00) P-INDEP "field V12._pointer (fldOffset=0x0)"
; V28 tmp21 [V28,T23] ( 2, 0.50) int -> rax V12._length(offs=0x08) P-INDEP "field V12._length (fldOffset=0x8)"
; V29 tmp22 [V29,T16] ( 2, 1 ) byref -> rsi V14._pointer(offs=0x00) P-INDEP "field V14._pointer (fldOffset=0x0)"
;* V30 tmp23 [V30 ] ( 0, 0 ) int -> zero-ref V14._length(offs=0x08) P-INDEP "field V14._length (fldOffset=0x8)"
; V31 tmp24 [V31,T21] ( 2, 0.75) byref -> r8 V16._pointer(offs=0x00) P-INDEP "field V16._pointer (fldOffset=0x0)"
; V32 tmp25 [V32,T24] ( 2, 0.50) int -> rax V16._length(offs=0x08) P-INDEP "field V16._length (fldOffset=0x8)"
; V33 tmp26 [V33,T17] ( 2, 1 ) byref -> r8 V18._pointer(offs=0x00) P-INDEP "field V18._pointer (fldOffset=0x0)"
;* V34 tmp27 [V34 ] ( 0, 0 ) int -> zero-ref V18._length(offs=0x08) P-INDEP "field V18._length (fldOffset=0x8)"
; V35 tmp28 [V35,T12] ( 3, 1.50) byref -> rdi "BlockOp address local"
; V36 tmp29 [V36,T13] ( 3, 1.50) byref -> rax "BlockOp address local"
; V37 tmp30 [V37,T05] ( 3, 3 ) byref -> rax "BlockOp address local"
; V38 tmp31 [V38,T14] ( 3, 1.50) byref -> rax "BlockOp address local"
; V39 rat0 [V39,T06] ( 3, 3 ) int -> r8 "ReplaceWithLclVar is creating a new local variable"
;
; Lcl frame size = 16

G_M53623_IG01:
    55                   push     rbp
    4883EC10             sub      rsp, 16
    488D6C2410           lea      rbp, [rsp+10H]
    48897DF0             mov      bword ptr [rbp-10H], rdi
    488975F8             mov      qword ptr [rbp-08H], rsi
G_M53623_IG02:
    8B7DF8               mov      edi, dword ptr [rbp-08H]
    83FF07               cmp      edi, 7
    7E35                 jle      SHORT G_M53623_IG04
    488D7DF0             lea      rdi, bword ptr [rbp-10H]
    488B07               mov      rax, bword ptr [rdi]
    8B7F08               mov      edi, dword ptr [rdi+8]
    83FF08               cmp      edi, 8
    0F8C3B010000         jl       G_M53623_IG17
G_M53623_IG03:
    488B00               mov      rax, qword ptr [rax]
    48BFFFFFFFFFFFFFFF00 mov      rdi, 0xFFFFFFFFFFFFFF
    4823C7               and      rax, rdi
    48BF0000000000000008 mov      rdi, 0x800000000000000
    480BC7               or       rax, rdi
    E913010000           jmp      G_M53623_IG16
G_M53623_IG04:
    83FF03               cmp      edi, 3
    0F8E91000000         jle      G_M53623_IG10
    488D45F0             lea      rax, bword ptr [rbp-10H]
    488B30               mov      rsi, bword ptr [rax]
    8B4008               mov      eax, dword ptr [rax+8]
    83F804               cmp      eax, 4
    0F8C08010000         jl       G_M53623_IG18
G_M53623_IG05:
    8B06                 mov      eax, dword ptr [rsi]
    8BF0                 mov      esi, eax
    488D45F0             lea      rax, bword ptr [rbp-10H]
    488B10               mov      rdx, bword ptr [rax]
    8B4808               mov      ecx, dword ptr [rax+8]
    448D47FB             lea      r8d, [rdi-5]
    4183F802             cmp      r8d, 2
    7757                 ja       SHORT G_M53623_IG09
    418BC0               mov      eax, r8d
    4C8D0503010000       lea      r8, [reloc @RWD00]
    458B0480             mov      r8d, dword ptr [r8+4*rax]
    4C8D0D7AFFFFFF       lea      r9, G_M53623_IG02
    4D03C1               add      r8, r9
    41FFE0               jmp      r8
G_M53623_IG06:
    83F906               cmp      ecx, 6
    0F86E2000000         jbe      G_M53623_IG20
    0FB64206             movzx    rax, byte ptr [rdx+6]
    48C1E030             shl      rax, 48
    480BF0               or       rsi, rax
G_M53623_IG07:
    83F905               cmp      ecx, 5
    0F86CE000000         jbe      G_M53623_IG20
    0FB64205             movzx    rax, byte ptr [rdx+5]
    48C1E028             shl      rax, 40
    480BF0               or       rsi, rax
G_M53623_IG08:
    83F904               cmp      ecx, 4
    0F86BA000000         jbe      G_M53623_IG20
    0FB64204             movzx    rax, byte ptr [rdx+4]
    48C1E020             shl      rax, 32
    480BF0               or       rsi, rax
G_M53623_IG09:
    4863FF               movsxd   rdi, edi
    48C1E738             shl      rdi, 56
    480BF7               or       rsi, rdi
    488BC6               mov      rax, rsi
    EB79                 jmp      SHORT G_M53623_IG16
G_M53623_IG10:
    83FF01               cmp      edi, 1
    7E51                 jle      SHORT G_M53623_IG14
    488D45F0             lea      rax, bword ptr [rbp-10H]
    4C8B00               mov      r8, bword ptr [rax]
    8B4008               mov      eax, dword ptr [rax+8]
    83F802               cmp      eax, 2
    0F8C7D000000         jl       G_M53623_IG19
G_M53623_IG11:
    410FB700             movzx    rax, word ptr [r8]
    83FF03               cmp      edi, 3
    7526                 jne      SHORT G_M53623_IG12
    837DF802             cmp      dword ptr [rbp-08H], 2
    7679                 jbe      SHORT G_M53623_IG20
    488B7DF0             mov      rdi, bword ptr [rbp-10H]
    0FB64F02             movzx    rcx, byte ptr [rdi+2]
    8BD1                 mov      edx, ecx
    48C1E210             shl      rdx, 16
    480BC2               or       rax, rdx
    48BE0000000000000003 mov      rsi, 0x300000000000000
    480BC6               or       rax, rsi
    EB0D                 jmp      SHORT G_M53623_IG13
G_M53623_IG12:
    48BF0000000000000002 mov      rdi, 0x200000000000000
    480BC7               or       rax, rdi
G_M53623_IG13:
    EB23                 jmp      SHORT G_M53623_IG16
G_M53623_IG14:
    83FF01               cmp      edi, 1
    751C                 jne      SHORT G_M53623_IG15
    837DF800             cmp      dword ptr [rbp-08H], 0
    763F                 jbe      SHORT G_M53623_IG20
    488B45F0             mov      rax, bword ptr [rbp-10H]
    0FB600               movzx    rax, byte ptr [rax]
    48BF0000000000000001 mov      rdi, 0x100000000000000
    480BC7               or       rax, rdi
    EB02                 jmp      SHORT G_M53623_IG16
G_M53623_IG15:
    33C0                 xor      rax, rax
G_M53623_IG16:
    488D6500             lea      rsp, [rbp]
    5D                   pop      rbp
    C3                   ret
G_M53623_IG17:
    BF28000000           mov      edi, 40
    E8DEAEFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)
    CC                   int3
G_M53623_IG18:
    BF28000000           mov      edi, 40
    E8D3AEFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)
    CC                   int3
G_M53623_IG19:
    BF28000000           mov      edi, 40
    E8C8AEFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)
    CC                   int3
G_M53623_IG20:
    E8526C3679           call     CORINFO_HELP_RNGCHKFAIL
    CC                   int3

RWD00 dd 000000B4h ; case G_M53623_IG08
      dd 000000A0h ; case G_M53623_IG07
      dd 0000008Ch ; case G_M53623_IG06

; Total bytes of code 399, prolog size 10 for method Program:GetKey(struct):long
; ============================================================

steveharter · 2019-09-12T18:29:22Z

If I am reading the data correctly, I see case-sensitive being faster than case-insensitive, not slower. Which is it?

Fixed that typo. Yes, case sensitivity is faster due to the linked issue. If that linked issue is fixed, then case insensitivity is just as fast as case sensitivity iff the JSON is always consistent w.r.t. casing. Case insensitivity can become slower when multiple JSON inputs have different casing, because each variant is cached (up to 64 entries).

steveharter · 2019-09-12T20:47:26Z

Test failure on Windows Build x86_Release not related:

  Starting:    System.IO.FileSystem.Watcher.Tests (parallel test collections = on, max threads = 4)
Fatal error. Internal CLR error. (0x80131506)

steveharter added tenet-performance Performance related issue area-System.Text.Json labels Sep 10, 2019

steveharter added this to the 5.0 milestone Sep 10, 2019

steveharter requested review from ahsonkhan and layomia September 10, 2019 21:44

steveharter self-assigned this Sep 10, 2019

steveharter force-pushed the PropertyNameNits branch from abfd511 to a3c8a26 Compare September 10, 2019 22:16

Improve perf of property name lookup

f65b726

steveharter force-pushed the PropertyNameNits branch from 4a22ac4 to f65b726 Compare September 10, 2019 22:24

ahsonkhan reviewed Sep 11, 2019

steveharter commented Sep 11, 2019

Refactor GetKey(), use AggressiveInlining and improve tests

d912a91

ahsonkhan reviewed Sep 12, 2019

ahsonkhan approved these changes Sep 12, 2019

gfoidl reviewed Sep 12, 2019

Misc feedback

491b466

steveharter merged commit d3c6628 into dotnet:master Sep 12, 2019

steveharter deleted the PropertyNameNits branch September 12, 2019 20:48

steveharter added a commit that referenced this pull request Oct 10, 2019

Improve deserialization perf with changes to property name lookup (#40998)

c6c12ce

steveharter added a commit to steveharter/dotnet_corefx that referenced this pull request Oct 14, 2019

Improve deserialization perf with changes to property name lookup (dotnet#40998)

e707e2c

steveharter added a commit to steveharter/dotnet_corefx that referenced this pull request Oct 14, 2019

Improve deserialization perf with changes to property name lookup (dotnet#40998)

657f6a5

erozenfeld mentioned this pull request Jan 31, 2020

Implement struct promotion for incoming multireg structs dotnet/runtime#13417

Closed

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Improve deserialization perf with changes to property name lookup (dotnet/corefx#40998) Commit migrated from dotnet/corefx@d3c6628

64e7991
