Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize asm size for the biggest corelib method - WebUtility..cctor #48906

Merged
merged 4 commits into from
Mar 3, 2021

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Mar 1, 2021

According to the jit-diff tool WebUtility..cctor is the largest method (with a noticeable delay to compile it) - it's triggered when WebUtility.HtmlDecode public API is called.

This PR shrinks s_lookupTable codegen from 17605 bytes to 10083 277 bytes.

I could use source gen here since we already use them for eventpipe stuff but I don't think it worth the complexity.

@ghost
Copy link

ghost commented Mar 1, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

According to the jit-diff tool WebUtility..cctor is the largest method (with a noticeable delay to compile it) - it's triggered when WebUtility.HtmlDecode public API is called.

This PR shrinks s_lookupTable codegen from to 17605 bytes to 10083 bytes which is still a lot for an immutable compile-time known dictionary - I wonder how we can improve the performance for such cases in general.

I could use source gen here since we already use them for eventpipe stuff but I don't think it worth the complexity.

Author: EgorBo
Assignees: -
Labels:

area-System.Net

Milestone: -

@MichalStrehovsky
Copy link
Member

MichalStrehovsky commented Mar 1, 2021

Splitting the data into two static ReadOnlySpans (one for keys, one for values) would probably be the most efficient, space-wise, and codegen-wise. Then just do a for loop, indexing into both spans and adding into dictionary.

@EgorBo
Copy link
Member Author

EgorBo commented Mar 1, 2021

Splitting the data into two static ReadOnlySpans (one for keys, one for values) would probably be the most efficient, space-wise, and codegen-wise. Then just do a for loop, indexing into both spans and adding into dictionary.

Nice idea, wonder if it's possible to hide those two spans under the hood of the source gen then and let the user to specify any type of such immutable dictionaries

@MichalStrehovsky
Copy link
Member

Ah, I just realized that the static ReadOnlySpan trick won't work because these are ints (the static ReadOnlySpan currently only works for bytes due to endianness issues with anything bigger than that). Might as well just make them arrays. The storage for these will be similarly optimal but we'll have a useless allocation and a memcpy operation into the array.

@stephentoub
Copy link
Member

Yeah, we need #24961, if anyone wants to tackle that in the runtime and compiler. :-)

@jkotas
Copy link
Member

jkotas commented Mar 1, 2021

You can have just one Span of bytes with the data and do the endianness conversion yourself.

         ReadOnlySpan<byte> data = ...

         var d = new Dictionary<long, char>(data.Length / (sizeof(long) + sizeof(char)));
         while (!data.IsEmpty)
         {
              d[BinaryPrimitives.ReadInt64LittleEndian(data)] = (char)BinaryPrimitives.ReadInt16LittleEndian(data.Slice(sizeof(long)));
              data = data.Slice(sizeof(long) + sizeof(char));
         }

@jkotas
Copy link
Member

jkotas commented Mar 1, 2021

Or if you wanted to optimize the heck out of this, you can create a perfect or nearly perfect hash function for this set and use the ReadOnlySpan<byte> directly without copying it into Dictionary. The lookup function would be then just:

         ReadOnlySpan<byte> data = ...

         long key = ToUInt64Key(entity);
         int index = (sizeof(long) + sizeof(char)) * PerfectHash(key);
         if (BinaryPrimitives.ReadInt64LittleEndian(data.Slice(index)) == key) return BinaryPrimitives.ReadInt64LittleEndian(data.Slice(index + sizeof(long)));

0x73, 0x6D, 0x61, 0x69, 0x64, 0x00, 0x00, 0x00 /*ToUInt64Key("diams")*/ , 0x66, 0x26, /*'\x2666'*/
};

var dictionary = new Dictionary<ulong, char>(tableData.Length / (sizeof(ulong) + sizeof(char)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Jan was suggesting not using Dictionary at all, and instead a perfect hash over a buffer.

If you use what you're doing here, looks like you can remove the byte in the 8th column as it's always zero.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this code is only executed once, my goal was to decrease the native size (from 17605 bytes to only 277 bytes!) using the first Jan's suggestion. If you insist on the second more performant one I can change!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I think this is fine, I was secretly hoping you would do it only because it would be interesting 😁

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And kudos for the huge improvemnet. Are there any more of these big static dictionaries I wonder?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found another, but likely not on a hot path:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take a look, thanks!

@EgorBo EgorBo merged commit 4586bc8 into dotnet:main Mar 3, 2021
@danmoseley danmoseley added the size-reduction Issues impacting final app size primary for size sensitive workloads label Mar 3, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 2, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net size-reduction Issues impacting final app size primary for size sensitive workloads
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants