
Speed up named type lookups in managed type system #84285

Merged: 6 commits into dotnet:main on Apr 5, 2023

Conversation

MichalStrehovsky
Member

@MichalStrehovsky MichalStrehovsky commented Apr 4, 2023

When the type system needs to resolve a named type in a module, it will do a `foreach` loop over all types in the module looking for the type. This can get mildly hot and I've seen it in CPU profiles, but it never looked too important to address (despite the TODO).

But when MIBC files are passed to the compiler, this gets ridiculously hot. Compiling Hello World by default: 0.98 seconds. Compiling Hello World with 5 MIBC files: 9.1 seconds.

This adds a hashtable to the lookup and drops the MIBC case to 1.4 seconds (we'll want to parallelize the MIBC loading on a background thread to get rid of the last mile, but first things first).

Cc @dotnet/ilc-contrib @dotnet/crossgen-contrib
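
The shape of the change, roughly: replace a per-lookup linear scan over all type definitions with a lazily built name-to-handle map. A minimal sketch of the idea, with hypothetical stand-in types (the real code works against `System.Reflection.Metadata` handles):

```csharp
using System.Collections.Generic;

class Module
{
    private readonly List<(string Namespace, string Name, object Type)> _types = new();
    private Dictionary<(string, string), object> _typeLookup; // built on first use

    public void AddType(string ns, string name, object type) => _types.Add((ns, name, type));

    // Before: O(n) scan over every type definition in the module, per lookup.
    public object FindTypeSlow(string ns, string name)
    {
        foreach (var t in _types)
            if (t.Namespace == ns && t.Name == name)
                return t.Type;
        return null;
    }

    // After: one O(n) pass to build the table, O(1) lookups afterwards.
    public object FindType(string ns, string name)
    {
        if (_typeLookup == null)
        {
            var lookup = new Dictionary<(string, string), object>(_types.Count);
            foreach (var t in _types)
                lookup[(t.Namespace, t.Name)] = t.Type;
            _typeLookup = lookup;
        }
        return _typeLookup.TryGetValue((ns, name), out object result) ? result : null;
    }
}
```

With MIBC files in play the same names get resolved over and over, which is why the one-time build cost pays for itself so quickly.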
@ghost

ghost commented Apr 4, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details


I'm opening this as a draft because I'd like to get feedback. Here are the TODO aspects:

  • Currently using the HashCode struct to compute hash codes. This might not be optimal, especially because we feed it a byte at a time. I'd like to avoid code duplication if possible: we need to compute this for both UTF-8 and UTF-16 encoded strings and come up with the same numbers.
  • The hash buckets are done with a valuetype array and a "linked list" in it. This is either clever or stupid. I don't know if there's a built-in dictionary type that could express what we need.
  • We'll want to do the same hashtable optimization for exported type lookups. I'm thinking of making things generic, adding two structs that implement an interface, and then doing `where T : struct, IInterface`.
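
On the first bullet, one possible shape (an illustration of the constraint, not what the PR ends up doing) is to hash character values rather than encoded bytes, so the UTF-8 and UTF-16 entry points agree by construction:

```csharp
using System;
using System.Text;

static class NameHash
{
    private const uint FnvOffset = 2166136261;
    private const uint FnvPrime = 16777619;

    private static uint Mix(uint hash, char c)
    {
        // Hash both bytes of the UTF-16 code unit, so the result depends
        // only on the character value, not on the encoding it came from.
        hash = (hash ^ (byte)c) * FnvPrime;
        hash = (hash ^ (byte)(c >> 8)) * FnvPrime;
        return hash;
    }

    public static uint HashUtf16(string s)
    {
        uint hash = FnvOffset;
        foreach (char c in s)
            hash = Mix(hash, c);
        return hash;
    }

    public static uint HashUtf8(ReadOnlySpan<byte> utf8)
    {
        // Decoding to UTF-16 chars first keeps the two entry points in sync.
        // A real implementation would decode inline to avoid the allocation.
        string s = Encoding.UTF8.GetString(utf8);
        return HashUtf16(s);
    }
}
```

The code-duplication worry in the bullet is exactly the inline-decoding part: a zero-allocation UTF-8 path has to re-implement the char-at-a-time mixing without drifting from the UTF-16 one.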

Cc @dotnet/ilc-contrib @dotnet/crossgen-contrib

Author: MichalStrehovsky
Assignees: -
Labels:

area-NativeAOT-coreclr

Milestone: -

@jkotas
Member

jkotas commented Apr 4, 2023

Would it be simpler and faster to just materialize all strings as string and use regular Dictionary for the lookup? It uses a bit more memory, but not that much more memory.

@jkotas
Member

jkotas commented Apr 4, 2023

Also, would it be better to use the split IBC data per assembly, like what we do for R2R?

Reading raw MIBC files makes sense when they closely match the app. It makes less sense when the MIBC files represent some generic profile and 90%+ of the data is irrelevant to the app being compiled.

Member

@trylek trylek left a comment


Overall looks great to me, thanks Michal! The only thing I would be somewhat concerned about is the initial perf cost of building the hashtable, that might pop up as startup slowdown. Could we possibly amortize the cost by building the hashtable incrementally during the first FindType so that we'd stop once we've found the type and only continue the next time FindType fails for the same module? Not sure if the more complex parallelism wouldn't make matters even worse but it's something we could consider if we hit perf issues with your change in its current form.

@MichalStrehovsky
Member Author

Would it be simpler and faster to just materialize all strings as string and use regular Dictionary for the lookup? It uses a bit more memory, but not that much more memory.

It would be simpler, but it doesn't look faster. After adding an antivirus exclusion for ilc.exe and the surrounding folders, and carefully listening to the CPU fans to avoid thermal throttling, I got somewhat stable perf numbers for compiling BasicWebApi:

Baseline: 10.451 s, 10.542 s, 10.513 s, 10.330 s, 10.292 s
PR: 10.131 s, 10.262 s, 10.242 s, 10.276 s, 10.287 s
Dictionary<(string, string), TypeDefinitionHandle>: 10.365 s, 10.309 s, 10.200 s, 10.364 s, 10.245 s

And this is for the PR version that hashes things a lot worse than what we could do if it were done properly (not byte-by-byte).

We can go with dictionary, but it's going to materialize more strings than we typically need. Right now this is amortized by NameMangler also materializing all the strings in the assembly, but that's something on my todo list to fix at some point (i.e. it's only going to get worse).

Maybe go with the dictionary now but leave a TODO that this is leaving perf on the table?

The only thing I would be somewhat concerned about is the initial perf cost of building the hashtable, that might pop up as startup slowdown

Yeah, there's a risk that if there's an assembly with lots of types and we only need a small handful, building the hashtable might not pay for itself. On average the type we're looking for is going to be in the middle, so on average we can only save half of the work, even if only one type was accessed. But this would add more synchronization concerns - this needs to be threadsafe. And at the point we'd add locking, it might end up being a wash.
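
A sketch of the thread-safety angle being discussed: a lock-free lazy publish via `Interlocked.CompareExchange` keeps the read path lock-free, at the cost of occasionally building the table twice when two threads race (illustrative code with placeholder contents, not the PR's implementation):

```csharp
using System.Collections.Generic;
using System.Threading;

class ModuleTypeLookup
{
    private Dictionary<string, int> _lookup; // null until first use

    private Dictionary<string, int> BuildLookup()
    {
        // Stand-in for walking the module's type definitions.
        return new Dictionary<string, int> { ["System.Object"] = 1, ["System.String"] = 2 };
    }

    public bool TryFind(string fullName, out int handle)
    {
        Dictionary<string, int> lookup = Volatile.Read(ref _lookup);
        if (lookup == null)
        {
            // Two threads may race to build; CompareExchange keeps one winner,
            // so readers never observe a partially built table.
            var fresh = BuildLookup();
            lookup = Interlocked.CompareExchange(ref _lookup, fresh, null) ?? fresh;
        }
        return lookup.TryGetValue(fullName, out handle);
    }
}
```

The incremental-build variant trylek suggests would need a lock (or a more elaborate scheme) around the partially built table, which is the "might end up being a wash" concern.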

@MichalStrehovsky
Member Author

Also, would it be better to use the split IBC data per assembly, like what we do for R2R?

Can you send me a pointer? I don't know what this refers to.

@jkotas
Member

jkotas commented Apr 5, 2023

Can you send me a pointer? I don't know what this refers to.

`--embed-pgo-data` command line option for crossgen2 (`public Option<bool> EmbedPgoData { get; }`). Implemented by https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/InstrumentationDataTableNode.cs .

It indexes the profile data to allow us to quickly find the profile data for a given method without reading everything.

@jkotas
Member

jkotas commented Apr 5, 2023

Maybe go with the dictionary now but leave a TODO that this is leaving perf on the table?

Sounds good to me.

```csharp
foreach (ExportedTypeHandle exportedTypeHandle in metadataReader.ExportedTypes)
{
    ExportedType exportedType = metadataReader.GetExportedType(exportedTypeHandle);
    if (exportedType.Implementation.Kind == HandleKind.ExportedType)
```
Member Author


This line is a bugfix.

@MichalStrehovsky
Member Author

It indexes the profile data to allow us to quickly find the profile data for a given method without reading everything.

We'll be able to read this information when #67080 is done - but currently the framework that ILC sees is not R2R compiled.

@jkotas
Member

jkotas commented Apr 5, 2023

We'll be able to read this information when #67080 is done - but currently the framework that ILC sees is not R2R compiled.

The most impactful profile data are attached to CoreLib, which won't be automatically covered by this.

```csharp
// so we can obtain the bytes again when needed.
// E.g. see the scheme explored in the first commit of https://github.com/dotnet/runtime/pull/84285.

var result = new Dictionary<(string Name, string Namespace), EntityHandle>();
```
Member

@jkotas jkotas Apr 5, 2023


If we care enough, this can be made more efficient by:

  • Pre-sizing the dictionary - either conservatively using total counts of TypeDefinitionHandle and ExportedTypeHandle tables, or precisely by doing a quick pass over these tables to see how many entries we are actually going to need.
  • Using a custom struct with faster non-randomized GetHashCode algorithm (e.g. use the FNV hashcode implementation from Roslyn)
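
A hedged sketch of the two suggestions together: pre-size from table row counts and plug in a non-randomized FNV-style comparer. Names here are illustrative, not the actual ILCompiler code, and the hash is merely in the spirit of Roslyn's `Hash.GetFNVHashCode`:

```csharp
using System.Collections.Generic;

// Illustrative comparer: non-randomized FNV-1a over both name parts.
sealed class NameComparer : IEqualityComparer<(string Name, string Namespace)>
{
    public bool Equals((string Name, string Namespace) x, (string Name, string Namespace) y)
        => x.Name == y.Name && x.Namespace == y.Namespace;

    public int GetHashCode((string Name, string Namespace) key)
        => (int)Fnv(Fnv(2166136261u, key.Namespace), key.Name);

    private static uint Fnv(uint hash, string s)
    {
        foreach (char c in s)
            hash = (hash ^ c) * 16777619;
        return hash;
    }
}

// Conservative pre-size: the table can never hold more entries than there are
// rows in the TypeDef and ExportedType metadata tables combined, e.g.
//   int capacity = reader.GetTableRowCount(TableIndex.TypeDef)
//                + reader.GetTableRowCount(TableIndex.ExportedType);
//   var result = new Dictionary<(string, string), EntityHandle>(capacity, new NameComparer());
```

The pre-size avoids rehash-and-copy churn while filling; the custom comparer skips the randomized string hashing `Dictionary` uses by default.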

Member


Pre-sizing the dictionary can help a lot with GC throughput in my experience. That may matter in the case that we keep the code that does a GC before handing off to the object writer.

Member Author


If we go the custom hasher route, it's not that much extra work to do the thing I did in the first commit and also get rid of the dictionary in the first place.

We could also explore FrozenDictionary although it probably won't work for netstandard2.0 right now.
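
For context, `FrozenDictionary` (in `System.Collections.Frozen`, .NET 8+, which is why it clashes with a netstandard2.0 target) trades a one-time optimization pass for faster reads, which fits a build-once, read-many lookup table like this one. A toy sketch:

```csharp
using System.Collections.Frozen;
using System.Collections.Generic;

var types = new Dictionary<(string Name, string Namespace), int>
{
    [("Object", "System")] = 1,
    [("String", "System")] = 2,
};

// The builder analyzes the keys up front to pick a specialized layout;
// lookups afterwards are typically faster than a plain Dictionary.
FrozenDictionary<(string Name, string Namespace), int> frozen = types.ToFrozenDictionary();
```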

I don't have big concerns based on the perf numbers I collected. This is an improvement even when not reading MIBC (the perf numbers with Dictionary were collected without MIBC).

@MichalStrehovsky MichalStrehovsky merged commit 18e2c5f into dotnet:main Apr 5, 2023
@MichalStrehovsky MichalStrehovsky deleted the gettypecache branch April 5, 2023 23:58
@ghost ghost locked as resolved and limited conversation to collaborators May 6, 2023