Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Improve deserialization perf with changes to property name lookup #40998

Merged
merged 3 commits into from
Sep 12, 2019

Conversation

steveharter
Copy link
Member

@steveharter steveharter commented Sep 10, 2019

Significant end-to-end deserialization perf gains:

  • ~11-17% on normal simple objects (no collections or child objects).
  • ~40% on missing properties (when a JSON property is not found on the POCO)
  • ~40% on case-insensitive properties (when a JSON property only matches because case-insensitive is enabled)

Basic changes (for about 1/3 of the gains)

  • In TryIsPropertyRefEqual(), add [AggressiveInlining].
  • In GetKey(), minimize if comparisons.
  • In GetKey(), optimize for property names >= 8 bytes.
  • Also not impacting perf in this specific case: expanded the key from 6 to 7 bytes; avoids calling SequenceEqual when property name is 7 bytes and otherwise reduces collisions for other scenarios.

Basic changes (for about 2/3 of the gains)

  • Apply [AggressiveInlining] to GetProperty(), HandlePropertyName() and HandleValue().

Benchmarks

Running the ReadJson<> benchmarks showed results in the 11-17% range where % = (original-current) / original. One anomaly is the first one with a much higher% -- that was due to high Error.

Faster base/diff Base Median (ns) Diff Median (ns) Modality
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr 1.56 852.92 548.18 bimodal
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf8Bytes 1.21 1624.47 1341.85
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf 1.21 593.28 491.52
System.Text.Json.Serialization.Tests.ReadJson.Deseriali 1.18 462955.27 393372.96
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromString 1.17 1722.79 1466.58
System.Text.Json.Serialization.Tests.ReadJson.Deseriali 1.14 470687.99 412734.27
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr 1.14 42914.59 37686.85
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromUtf 1.13 41069.02 36241.96
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStream 1.13 1915.02 1694.84
System.Text.Json.Serialization.Tests.ReadJson.DeserializeFromStr 1.13 43991.16 38970.73

Perf impact for missing JSON properties
Previously we didn't cache misses.

The test POCO used for missing properties and case-insensitive is a simple, flat object with 9 primitive properties (string and int). Having more properties increases the % gain while fewer properties decreases it (due to general overhead).

BEFORE (missing properties)

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
JSON.NET 2,391.7 ns 31.698 ns 29.650 ns 2,374.2 ns 2,368.3 ns 2,456.8 ns 0.5126 - - 3240 B
SystemTextJson 1,575.4 ns 8.670 ns 7.686 ns 1,574.4 ns 1,563.7 ns 1,589.4 ns 0.0878 - - 552 B

AFTER (missing properties)

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
JSON.NET 2,241.0 ns 12.956 ns 11.485 ns 2,242.9 ns 2,216.2 ns 2,256.6 ns 0.5096 - - 3240 B
SystemTextJson 946.3 ns 8.291 ns 7.755 ns 942.8 ns 936.5 ns 960.2 ns 0.0337 - - 224 B

Case-insensitive properties
There was an issue here regarding the cache not adding the correct case-insensitive key.

BEFORE (with two sets of incoming JSON of different casing to trigger extra cache entries)

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
JSON.NET 3.192 us 0.0149 us 0.0132 us 3.190 us 3.172 us 3.223 us 0.5331 - - 3.32 KB
SystemTextJson 4.441 us 0.0131 us 0.0116 us 4.439 us 4.421 us 4.460 us 0.2314 - - 1.47 KB

AFTER (with two sets of incoming JSON of different casing to trigger extra cache entries)

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
JSON.NET 3.224 us 0.0359 us 0.0336 us 3.229 us 3.159 us 3.271 us 0.5387 - - 3.32 KB
SystemTextJson 2.787 us 0.0244 us 0.0228 us 2.782 us 2.755 us 2.828 us 0.2101 - - 1.3 KB

BEFORE (with only case insensitive JSON)

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
JSON.NET 18.315 us 4.1845 us 4.8189 us 17.100 us 13.100 us 30.800 us - - - 3.32 KB
SystemTextJson 5.318 us 0.5276 us 0.6076 us 5.085 us 4.631 us 6.696 us 0.2314 - - 1.47 KB

AFTER (with only case insensitive JSON)

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
JSON.NET 14.205 us 0.8944 us 0.9941 us 14.000 us 12.900 us 16.500 us - - - 3.32 KB
SystemTextJson 2.880 us 0.0364 us 0.0323 us 2.871 us 2.844 us 2.961 us 0.2019 - - 1.3 KB

@ahsonkhan
Copy link
Member

ahsonkhan commented Sep 11, 2019

Significant end-to-end deserialization perf gains

Wow, those are significant gains particularly for missing properties and when folks use case-insensitive comparison (which is on by default within aspnet).

The default setting for deserializer is case-sensitive, correct (i.e. PropertyNameCaseInsensitive = false)? How does it's perf compare? Any low-hanging fruit there that we can investigate to get similar wins there?

Expand the key from 6 to 7 bytes;

How is it feasible for us to do this when we previously reserved 2 bytes for length?

avoids calling SequenceEqual when property name is 7 bytes and otherwise reduces collisions.

Did this actually show perf wins, on it's own? I would have thought that would be a wash.

Previously we didn't cache misses.
There was an issue here regarding the cache not adding the correct case-insensitive key.

Can you pinpoint/highlight, in the code, where these issue were to help with the code review?

// Start with the current property index, and then go forwards\backwards.
int propertyIndex = frame.PropertyIndex;

int count = localPropertyRefsSorted.Length;
int iForward = Math.Min(propertyIndex, count);
int iBackward = iForward - 1;

while (iForward < count || iBackward >= 0)
for (;;)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have some evidence to suggest that this forward/backwards search will be beneficial in practice, for perf?

@steveharter
Copy link
Member Author

steveharter commented Sep 11, 2019

The default setting for deserializer is case-sensitive, correct (i.e. PropertyNameCaseInsensitive = false)? How does it's perf compare? Any low-hanging fruit there that we can investigate to get similar wins there?

Yes the default is case-sensitive and ASP changes it to case-insensitive. Using the same test class turning on case-insensitivity is 30% slower due to this issue:

Case-insensitive with correct JSON (e.g. POCO has Pascal-cased properties and JSON is camel-cased)

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
JSON.NET 14.795 us 1.3507 us 1.5555 us 14.450 us 13.000 us 18.800 us - - - 3.32 KB
SystemTextJson 2.733 us 0.0152 us 0.0142 us 2.724 us 2.714 us 2.765 us 0.2065 - - 1.3 KB

Case-sensitive with correct JSON (e.g. POCO has Pascal-cased properties and JSON is Pascal-cased)

Method Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
JSON.NET 21.710 us 5.4886 us 6.3206 us 20.500 us 13.200 us 36.600 us - - - 3056 B
SystemTextJson 1.853 us 0.0103 us 0.0096 us 1.848 us 1.845 us 1.875 us 0.1336 - - 856 B

Also supporting case insensitive is a big deal w.r.t. serializer design. Some high-performance serializers don't support it because of the way they perform property-name lookup e.g. by generating IL representing a custom B-tree based on keys determined statically and thus can't represent all casing variants.

Expand the key from 6 to 7 bytes;
How is it feasible for us to do this when we previously reserved 2 bytes for length?

Embedding the length in the key only prevents us from the perf hit of comparing lengths for the property names that fit within the key (previously <=6 now its <=7 characters). So we really only need one byte and only need a length of 8 (for all properties of length >= 8) to tell the compare code it needs to compare contents, not just the key.

avoids calling SequenceEqual when property name is 7 bytes and otherwise reduces collisions.
Did this actually show perf wins, on it's own? I would have thought that would be a wash.

Not really. I didn't game the test data for property names of 7 -- but those are now faster since we don't need to compare the contents. It also helps more with non-ascii since now you have an extra byte in the key to prevent false hits.

There was an issue here regarding the cache not adding the correct case-insensitive key.
Can you pinpoint/highlight, in the code, where these issue were to help with the code review?

I added a comment to the location.

@ahsonkhan
Copy link
Member

ahsonkhan commented Sep 12, 2019

Using the same test class turning on case-sensitivity is 30% slower

If I am reading the data correctly, I see case-sensitive being faster than case-insensitive, not slower. Which is it?

Case-insensitive
SystemTextJson | 2.733 us

Case-sensitive
SystemTextJson | 1.853 us

@ahsonkhan
Copy link
Member

ahsonkhan commented Sep 12, 2019

BEFORE (with only case insensitive JSON)
JSON.NET | 18.315 us

AFTER (with only case insensitive JSON)
JSON.NET | 14.205 us

Looks like the first one had a large error (so the benchmarks aren't very stable). Json.NET numbers shouldn't change in the before/after, correct?


namespace System.Text.Json
{
public static partial class JsonSerializer
{
// AggressiveInlining used although a large method it is only called from one locations and is on a hot path.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub, @jkotas - Given our previous discussions on the use of this attribute, what are your thoughts on the use of AggressiveInlining in cases like these, where there are very few (1-2) callers of somewhat large methods, and benchmarks show improvements for adding them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For cases like this, it may be ok if you know what you are doing. There are a lot of cases where it can fire back - regressing the rest of the calling method, hitting JIT complexity thresholds and disabling optimizations, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this method and similar ones is for readability \ understandability. Otherwise the method would not exist and code would just be inline, so I suppose that is another option -- to just push the implementation down to the caller.

Copy link
Member

@ahsonkhan ahsonkhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than pending/unresolved feedback - looks good.

@@ -359,3 +485,4 @@ public class EmptyClassWithExtensionProperty
public IDictionary<string, JsonElement> MyOverflow { get; set; }
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove extra line.

{
key |= (ulong) propertyName[4] << (4 * BitsInByte)
| (ulong) propertyName[5] << (5 * BitsInByte)
| (ulong) propertyName[6] << (6 * BitsInByte)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write the highest index at first, so the JIT will emit only one bound check, instead of one for every indexed access to the span.

So

key |= (ulong)propertyName[6] << (6 * BitsInByte)
    | (ulong)propertyName[5] << (5 * BitInByte)
      // ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, there is something strange happening, as this optimization won't kick in here.

It seems that MemoryMarshal.Read<uint>(propertyName) causes the JIT not to elide the subsequent bound checks after the access to index 6.
(at least when I remove the call to MemoryMarshal the optimization kicks in as expected).

asm
G_M53623_IG04:
       83F803               cmp      eax, 3
       0F8EFD000000         jle      G_M53623_IG10
       488D7DF0             lea      rdi, bword ptr [rbp-10H]
       488B37               mov      rsi, bword ptr [rdi]
       8B7F08               mov      edi, dword ptr [rdi+8]
       83FF04               cmp      edi, 4
       0F8C74010000         jl       G_M53623_IG18

G_M53623_IG05:
       8B3E                 mov      edi, dword ptr [rsi]
       83F807               cmp      eax, 7
       755D                 jne      SHORT G_M53623_IG06
       837DF806             cmp      dword ptr [rbp-08H], 6 ; bound check
       0F8679010000         jbe      G_M53623_IG20
       488B45F0             mov      rax, bword ptr [rbp-10H]
       0FB64006             movzx    rax, byte  ptr [rax+6]
       48C1E030             shl      rax, 48
       480BF8               or       rdi, rax
       837DF805             cmp      dword ptr [rbp-08H], 5 ; bound check
       0F8660010000         jbe      G_M53623_IG20
       488B45F0             mov      rax, bword ptr [rbp-10H]
       0FB64005             movzx    rax, byte  ptr [rax+5]
       48C1E028             shl      rax, 40
       480BF8               or       rdi, rax
       837DF804             cmp      dword ptr [rbp-08H], 4 ; bound check
       0F8647010000         jbe      G_M53623_IG20
       488B45F0             mov      rax, bword ptr [rbp-10H]
       0FB64004             movzx    rax, byte  ptr [rax+4]
       48C1E020             shl      rax, 32
       480BF8               or       rdi, rax
       48B80000000000000007 mov      rax, 0x700000000000000
       480BF8               or       rdi, rax
       E981000000           jmp      G_M53623_IG09

This can be circumvented with a local like

// ...
ulong key = MemoryMarshal.Read<uint>(propertyName);
ReadOnlySpan<byte> tmp = propertyName;

if (length == 7)
{
	key |= (ulong)tmp[6] << (6 * BitsInByte)
		| (ulong)tmp[5] << (5 * BitsInByte)
		| (ulong)tmp[4] << (4 * BitsInByte)
		| (ulong)7 << (7 * BitsInByte);
}
// ...
asm
G_M53623_IG04:
       83F803               cmp      eax, 3
       0F8ED1000000         jle      G_M53623_IG10
       488D7DF0             lea      rdi, bword ptr [rbp-10H]
       488B37               mov      rsi, bword ptr [rdi]
       8B7F08               mov      edi, dword ptr [rdi+8]
       83FF04               cmp      edi, 4
       0F8C48010000         jl       G_M53623_IG18

G_M53623_IG05:
       8B3E                 mov      edi, dword ptr [rsi]
       488D75F0             lea      rsi, bword ptr [rbp-10H]
       488B16               mov      rdx, bword ptr [rsi]
       8B7608               mov      esi, dword ptr [rsi+8]
       83F807               cmp      eax, 7
       753D                 jne      SHORT G_M53623_IG06
       83FE06               cmp      esi, 6
       0F8644010000         jbe      G_M53623_IG20
       0FB64206             movzx    rax, byte  ptr [rdx+6]
       8BF0                 mov      esi, eax
       48C1E630             shl      rsi, 48
       480BFE               or       rdi, rsi
       0FB64205             movzx    rax, byte  ptr [rdx+5]
       48C1E028             shl      rax, 40
       480BF8               or       rdi, rax
       0FB65204             movzx    rdx, byte  ptr [rdx+4]
       8BC2                 mov      eax, edx
       48C1E020             shl      rax, 32
       480BF8               or       rdi, rax
       48B80000000000000007 mov      rax, 0x700000000000000
       480BF8               or       rdi, rax
       EB6B                 jmp      SHORT G_M53623_IG09

@AndyAyersMS is this known?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No...

cc @dotnet/jit-contrib

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end this code just needs to copy from a variable length byte array (up to length 7) to a ulong. I could write a native method using memcpy, but that is overkill since we don't have any native code for System.Text.Json...

Copy link
Member Author

@steveharter steveharter Sep 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, there is something strange happening, as this optimization won't kick in here.

I tried all variants

  • Original (acsending order).
  • In descending order (same assembly as original)
  • In descending order + temp variable (didn't see a difference in assembly w.r.t. boundary check)
  • Using switch\case (overhead for setting up switch)

The original and descending order performed best (by inspecting assembly and somewhat verifying benchmark -- close to margin of error). However, I'll use descending order based on possible bounds check optimization.

Copy link
Member

@erozenfeld erozenfeld Sep 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x64 Windows code has no bounds checks in this example. x64 Linux has bounds checks because we are not "promoting" propertyName struct. We are running into this issue:

https://github.com/dotnet/coreclr/blob/9479f67577bbb02ea611777b00308f42252fb2bc/src/jit/lclvars.cpp#L1914-L1926

Incoming multi-reg structs with more than one field are not getting promoted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we are not promoting the struct we are not tracking the _length field properly and can't eliminate the bounds checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to understand it: when I use the tmp, so it is local and the bound checks for [5] and [4] can be elided, as the one check for [6] is done (as expected)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. The bounds check for [6] is not needed either once we are inside if (length==7) but we don't realize that length and tmp._length have the same values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
key = MemoryMarshal.Read<uint>(propertyName);
if (length > 4)

if (length == 7)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beside the other comment, maybe the code can be written as

key = MemoryMarshal.Read<uint>(propertyName);
ReadOnlySpan<byte> tmp = propertyName;

switch (length)
{
    case 7: key |= (ulong)tmp[6] << (6 * BitsInByte); goto case 6;
    case 6: key |= (ulong)tmp[5] << (5 * BitsInByte); goto case 5;
    case 5: key |= (ulong)tmp[4] << (4 * BitsInByte); goto default;
    default: key |= (ulong)length << (7 * BitsInByte); break;
}

as this avoids the repeated steps for the lower indices, the dasm looks quite good, but I haven't perf-tested this variant.

dasm
; Assembly listing for method Program:GetKey(struct):long
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; optimized code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] ( 11,  6.25)  struct (16) [rbp-0x10]   do-not-enreg[XSFB] addr-exposed ld-addr-op
;  V01 loc0         [V01,T02] (  8,  5   )     int  ->  rdi
;  V02 loc1         [V02,T03] (  6,  3   )    long  ->  rax
;  V03 loc2         [V03,T01] ( 10,  5   )    long  ->  rsi
;* V04 loc3         [V04    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op
;  V05 loc4         [V05,T04] (  6,  3   )    long  ->  rax
;  V06 loc5         [V06,T18] (  2,  1   )    long  ->  rax
;# V07 OutArgs      [V07    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;* V08 tmp1         [V08    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg"
;* V09 tmp2         [V09    ] (  0,  0   )     int  ->  zero-ref    "impAppendStmt"
;* V10 tmp3         [V10    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg"
;  V11 tmp4         [V11,T09] (  2,  2   )   byref  ->  rax         "Inlining Arg"
;* V12 tmp5         [V12    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg"
;* V13 tmp6         [V13    ] (  0,  0   )     int  ->  zero-ref    "impAppendStmt"
;* V14 tmp7         [V14    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg"
;  V15 tmp8         [V15,T10] (  2,  2   )   byref  ->  rsi         "Inlining Arg"
;* V16 tmp9         [V16    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg"
;* V17 tmp10        [V17    ] (  0,  0   )     int  ->  zero-ref    "impAppendStmt"
;* V18 tmp11        [V18    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg"
;  V19 tmp12        [V19,T11] (  2,  2   )   byref  ->   r8         "Inlining Arg"
;  V20 tmp13        [V20,T00] (  6,  7   )    long  ->  rax         "Single return block return value"
;  V21 tmp14        [V21,T07] (  4,  2   )   byref  ->  rdx         V04._pointer(offs=0x00) P-INDEP "field V04._pointer (fldOffset=0x0)"
;  V22 tmp15        [V22,T08] (  4,  2   )     int  ->  rcx         V04._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
;  V23 tmp16        [V23,T19] (  2,  0.75)   byref  ->  rax         V08._pointer(offs=0x00) P-INDEP "field V08._pointer (fldOffset=0x0)"
;  V24 tmp17        [V24,T22] (  2,  0.50)     int  ->  rdi         V08._length(offs=0x08) P-INDEP "field V08._length (fldOffset=0x8)"
;  V25 tmp18        [V25,T15] (  2,  1   )   byref  ->  rax         V10._pointer(offs=0x00) P-INDEP "field V10._pointer (fldOffset=0x0)"
;* V26 tmp19        [V26    ] (  0,  0   )     int  ->  zero-ref    V10._length(offs=0x08) P-INDEP "field V10._length (fldOffset=0x8)"
;  V27 tmp20        [V27,T20] (  2,  0.75)   byref  ->  rsi         V12._pointer(offs=0x00) P-INDEP "field V12._pointer (fldOffset=0x0)"
;  V28 tmp21        [V28,T23] (  2,  0.50)     int  ->  rax         V12._length(offs=0x08) P-INDEP "field V12._length (fldOffset=0x8)"
;  V29 tmp22        [V29,T16] (  2,  1   )   byref  ->  rsi         V14._pointer(offs=0x00) P-INDEP "field V14._pointer (fldOffset=0x0)"
;* V30 tmp23        [V30    ] (  0,  0   )     int  ->  zero-ref    V14._length(offs=0x08) P-INDEP "field V14._length (fldOffset=0x8)"
;  V31 tmp24        [V31,T21] (  2,  0.75)   byref  ->   r8         V16._pointer(offs=0x00) P-INDEP "field V16._pointer (fldOffset=0x0)"
;  V32 tmp25        [V32,T24] (  2,  0.50)     int  ->  rax         V16._length(offs=0x08) P-INDEP "field V16._length (fldOffset=0x8)"
;  V33 tmp26        [V33,T17] (  2,  1   )   byref  ->   r8         V18._pointer(offs=0x00) P-INDEP "field V18._pointer (fldOffset=0x0)"
;* V34 tmp27        [V34    ] (  0,  0   )     int  ->  zero-ref    V18._length(offs=0x08) P-INDEP "field V18._length (fldOffset=0x8)"
;  V35 tmp28        [V35,T12] (  3,  1.50)   byref  ->  rdi         "BlockOp address local"
;  V36 tmp29        [V36,T13] (  3,  1.50)   byref  ->  rax         "BlockOp address local"
;  V37 tmp30        [V37,T05] (  3,  3   )   byref  ->  rax         "BlockOp address local"
;  V38 tmp31        [V38,T14] (  3,  1.50)   byref  ->  rax         "BlockOp address local"
;  V39 rat0         [V39,T06] (  3,  3   )     int  ->   r8         "ReplaceWithLclVar is creating a new local variable"
;
; Lcl frame size = 16

G_M53623_IG01:
       55                   push     rbp
       4883EC10             sub      rsp, 16
       488D6C2410           lea      rbp, [rsp+10H]
       48897DF0             mov      bword ptr [rbp-10H], rdi
       488975F8             mov      qword ptr [rbp-08H], rsi

G_M53623_IG02:
       8B7DF8               mov      edi, dword ptr [rbp-08H]
       83FF07               cmp      edi, 7
       7E35                 jle      SHORT G_M53623_IG04
       488D7DF0             lea      rdi, bword ptr [rbp-10H]
       488B07               mov      rax, bword ptr [rdi]
       8B7F08               mov      edi, dword ptr [rdi+8]
       83FF08               cmp      edi, 8
       0F8C3B010000         jl       G_M53623_IG17

G_M53623_IG03:
       488B00               mov      rax, qword ptr [rax]
       48BFFFFFFFFFFFFFFF00 mov      rdi, 0xFFFFFFFFFFFFFF
       4823C7               and      rax, rdi
       48BF0000000000000008 mov      rdi, 0x800000000000000
       480BC7               or       rax, rdi
       E913010000           jmp      G_M53623_IG16

G_M53623_IG04:
       83FF03               cmp      edi, 3
       0F8E91000000         jle      G_M53623_IG10
       488D45F0             lea      rax, bword ptr [rbp-10H]
       488B30               mov      rsi, bword ptr [rax]
       8B4008               mov      eax, dword ptr [rax+8]
       83F804               cmp      eax, 4
       0F8C08010000         jl       G_M53623_IG18

G_M53623_IG05:
       8B06                 mov      eax, dword ptr [rsi]
       8BF0                 mov      esi, eax
       488D45F0             lea      rax, bword ptr [rbp-10H]
       488B10               mov      rdx, bword ptr [rax]
       8B4808               mov      ecx, dword ptr [rax+8]
       448D47FB             lea      r8d, [rdi-5]
       4183F802             cmp      r8d, 2
       7757                 ja       SHORT G_M53623_IG09
       418BC0               mov      eax, r8d
       4C8D0503010000       lea      r8, [reloc @RWD00]
       458B0480             mov      r8d, dword ptr [r8+4*rax]
       4C8D0D7AFFFFFF       lea      r9, G_M53623_IG02
       4D03C1               add      r8, r9
       41FFE0               jmp      r8

G_M53623_IG06:
       83F906               cmp      ecx, 6
       0F86E2000000         jbe      G_M53623_IG20
       0FB64206             movzx    rax, byte  ptr [rdx+6]
       48C1E030             shl      rax, 48
       480BF0               or       rsi, rax

G_M53623_IG07:
       83F905               cmp      ecx, 5
       0F86CE000000         jbe      G_M53623_IG20
       0FB64205             movzx    rax, byte  ptr [rdx+5]
       48C1E028             shl      rax, 40
       480BF0               or       rsi, rax

G_M53623_IG08:
       83F904               cmp      ecx, 4
       0F86BA000000         jbe      G_M53623_IG20
       0FB64204             movzx    rax, byte  ptr [rdx+4]
       48C1E020             shl      rax, 32
       480BF0               or       rsi, rax

G_M53623_IG09:
       4863FF               movsxd   rdi, edi
       48C1E738             shl      rdi, 56
       480BF7               or       rsi, rdi
       488BC6               mov      rax, rsi
       EB79                 jmp      SHORT G_M53623_IG16

G_M53623_IG10:
       83FF01               cmp      edi, 1
       7E51                 jle      SHORT G_M53623_IG14
       488D45F0             lea      rax, bword ptr [rbp-10H]
       4C8B00               mov      r8, bword ptr [rax]
       8B4008               mov      eax, dword ptr [rax+8]
       83F802               cmp      eax, 2
       0F8C7D000000         jl       G_M53623_IG19

G_M53623_IG11:
       410FB700             movzx    rax, word  ptr [r8]
       83FF03               cmp      edi, 3
       7526                 jne      SHORT G_M53623_IG12
       837DF802             cmp      dword ptr [rbp-08H], 2
       7679                 jbe      SHORT G_M53623_IG20
       488B7DF0             mov      rdi, bword ptr [rbp-10H]
       0FB64F02             movzx    rcx, byte  ptr [rdi+2]
       8BD1                 mov      edx, ecx
       48C1E210             shl      rdx, 16
       480BC2               or       rax, rdx
       48BE0000000000000003 mov      rsi, 0x300000000000000
       480BC6               or       rax, rsi
       EB0D                 jmp      SHORT G_M53623_IG13

G_M53623_IG12:
       48BF0000000000000002 mov      rdi, 0x200000000000000
       480BC7               or       rax, rdi

G_M53623_IG13:
       EB23                 jmp      SHORT G_M53623_IG16

G_M53623_IG14:
       83FF01               cmp      edi, 1
       751C                 jne      SHORT G_M53623_IG15
       837DF800             cmp      dword ptr [rbp-08H], 0
       763F                 jbe      SHORT G_M53623_IG20
       488B45F0             mov      rax, bword ptr [rbp-10H]
       0FB600               movzx    rax, byte  ptr [rax]
       48BF0000000000000001 mov      rdi, 0x100000000000000
       480BC7               or       rax, rdi
       EB02                 jmp      SHORT G_M53623_IG16

G_M53623_IG15:
       33C0                 xor      rax, rax

G_M53623_IG16:
       488D6500             lea      rsp, [rbp]
       5D                   pop      rbp
       C3                   ret

G_M53623_IG17:
       BF28000000           mov      edi, 40
       E8DEAEFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)
       CC                   int3

G_M53623_IG18:
       BF28000000           mov      edi, 40
       E8D3AEFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)
       CC                   int3

G_M53623_IG19:
       BF28000000           mov      edi, 40
       E8C8AEFFFF           call     ThrowHelper:ThrowArgumentOutOfRangeException(int)
       CC                   int3

G_M53623_IG20:
       E8526C3679           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

RWD00  dd   000000B4h ; case G_M53623_IG08
       dd   000000A0h ; case G_M53623_IG07
       dd   0000008Ch ; case G_M53623_IG06

; Total bytes of code 399, prolog size 10 for method Program:GetKey(struct):long
; ============================================================

@steveharter
Copy link
Member Author

If I am reading the data correctly, I see case-sensitive being faster than case-insensitive, not slower. Which is it?

Fixed that typo. Yes case sensitvity is faster due to the linked issue. If that linked issue is fixed then case insensitivity is just as fast as case sensitive iff the JSON is always consistent w.r.t. casing meaning case insensitivity can become slower when multiple JSON inputs have different casing because each variant is cached (up to 64 entries).

@steveharter
Copy link
Member Author

Test failure on Windows Build x86_Release not related:

  Starting:    System.IO.FileSystem.Watcher.Tests (parallel test collections = on, max threads = 4)
Fatal error. Internal CLR error. (0x80131506)

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants