This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Improve deserialization perf with changes to property name lookup #40998
Improve deserialization perf with changes to property name lookup #40998
Changes from 2 commits
f65b726
d912a91
491b466
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have some evidence to suggest that this forward/backwards search will be beneficial in practice, for perf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we init key to 0 up front, can we lose the last else branch? Might be worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that would cause an extra initialization for the other cases. (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's better to avoid branches, and assignment isn't very expensive.
Also, without the extra branch, we produce smaller assembly:
sharplab.io - with ints
sharplab.io - with ulong
sharplab.io - whole method
It's 1 fewer instruction if you init up front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beside the other comment, maybe the code can be written as
as this avoids the repeated steps for the lower indices, the dasm looks quite good, but I haven't perf-tested this variant.
dasm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can write the highest index at first, so the JIT will emit only one bound check, instead of one for every indexed access to the span.
So
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, there is something strange happening, as this optimization won't kick in here.
It seems that
MemoryMarshal.Read<uint>(propertyName)
causes the JIT not to elide the subsequent bound checks after the access to index 6.(at least when I remove the call to MemoryMarshal the optimization kicks in as expected).
asm
This can be circumvented with a local like
asm
@AndyAyersMS is this known?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No...
cc @dotnet/jit-contrib
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end this code just needs to copy from a variable length byte array (up to length 7) to a
ulong
. I could write a native method using memcpy, but that is overkill since we don't have any native code for System.Text.Json...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried all variants
The original and descending order performed best (by inspecting assembly and somewhat verifying benchmark -- close to margin of error). However, I'll use descending order based on possible bounds check optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x64 Windows code has no bounds checks in this example. x64 Linux has bounds checks because we are not "promoting" propertyName struct. We are running into this issue:
https://github.com/dotnet/coreclr/blob/9479f67577bbb02ea611777b00308f42252fb2bc/src/jit/lclvars.cpp#L1914-L1926
Incoming multi-reg structs with more than one field are not getting promoted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are not promoting the struct we are not tracking the _length field properly and can't eliminate the bounds checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to understand it: when I use the
tmp
, so it is local and the bound checks for[5]
and[4]
can be elided, as the one check for[6]
is done (as expected)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct. The bounds check for
[6]
is not needed either once we are insideif (length==7)
but we don't realize thatlength
andtmp._length
have the same values.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened https://github.com/dotnet/coreclr/issues/26710 to track this. The item is also in the list in first-class-structs roadmap document https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/first-class-structs.md#improve-struct-promotion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub, @jkotas - Given our previous discussions on the use of this attribute, what are your thoughts on the use of
AggressiveInlining
in cases like these, where there are very few (1-2) callers of somewhat large methods, and benchmarks show improvements for adding them?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For cases like this, it may be ok if you know what you are doing. There are a lot of cases where it can fire back - regressing the rest of the calling method, hitting JIT complexity thresholds and disabling optimizations, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for this method and similar ones is for readability \ understandability. Otherwise the method would not exist and code would just be inline, so I suppose that is another option -- to just push the implementation down to the caller.