
Speed up named type lookups in managed type system #84285

Merged: 6 commits into dotnet:main on Apr 5, 2023

Conversation

MichalStrehovsky
Member

@MichalStrehovsky MichalStrehovsky commented Apr 4, 2023

When the type system needs to resolve a named type in a module, it will do a `foreach` loop over all types in the module looking for the type. This can get mildly hot and I've seen it in CPU profiles, but it never looked too important to address (despite the TODO).

But when MIBC files are passed to the compiler, this gets ridiculously hot. Compiling Hello World by default: 0.98 seconds. Compiling Hello World with 5 MIBC files: 9.1 seconds.

This adds a hashtable to the lookup and drops the MIBC case to 1.4 seconds (we'll want to parallelize the MIBC loading on a background thread to get rid of the last mile, but first things first).

Cc @dotnet/ilc-contrib @dotnet/crossgen-contrib
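
The shape of the change, roughly: replace a per-lookup linear scan over all type definitions with a lazily built name-to-handle map. A minimal sketch of the idea, with hypothetical stand-in types (the real code works against `System.Reflection.Metadata` handles):

```csharp
using System.Collections.Generic;

class Module
{
    private readonly List<(string Namespace, string Name, object Type)> _types = new();
    private Dictionary<(string, string), object> _typeLookup; // built on first use

    public void AddType(string ns, string name, object type) => _types.Add((ns, name, type));

    // Before: O(n) scan over every type definition in the module, per lookup.
    public object FindTypeSlow(string ns, string name)
    {
        foreach (var t in _types)
            if (t.Namespace == ns && t.Name == name)
                return t.Type;
        return null;
    }

    // After: one O(n) pass to build the table, O(1) lookups afterwards.
    public object FindType(string ns, string name)
    {
        if (_typeLookup == null)
        {
            var lookup = new Dictionary<(string, string), object>(_types.Count);
            foreach (var t in _types)
                lookup[(t.Namespace, t.Name)] = t.Type;
            _typeLookup = lookup;
        }
        return _typeLookup.TryGetValue((ns, name), out object result) ? result : null;
    }
}
```

With MIBC files in play the same names get resolved over and over, which is why the one-time build cost pays for itself so quickly.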
@ghost

ghost commented Apr 4, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details


I'm opening this as a draft because I'd like to get feedback. Here are the TODO aspects:

  • Currently using the HashCode struct to compute hash codes. This might not be optimal, especially because we feed it a byte at a time. I'd like to avoid code duplication if possible: we need to compute this for both UTF-8 and UTF-16 encoded strings and come up with the same numbers.
  • The hash buckets are done with a valuetype array and a "linked list" in it. This is either clever or stupid. I don't know if there's a built-in dictionary type that could express what we need.
  • We'll want to do the same hashtable optimization for exported type lookups. I'm thinking of making things generic, adding two structs that implement an interface, and then doing `where T : struct, IInterface`.
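
On the first bullet, one possible shape (an illustration of the constraint, not what the PR ends up doing) is to hash character values rather than encoded bytes, so the UTF-8 and UTF-16 entry points agree by construction:

```csharp
using System;
using System.Text;

static class NameHash
{
    private const uint FnvOffset = 2166136261;
    private const uint FnvPrime = 16777619;

    private static uint Mix(uint hash, char c)
    {
        // Hash both bytes of the UTF-16 code unit, so the result depends
        // only on the character value, not on the encoding it came from.
        hash = (hash ^ (byte)c) * FnvPrime;
        hash = (hash ^ (byte)(c >> 8)) * FnvPrime;
        return hash;
    }

    public static uint HashUtf16(string s)
    {
        uint hash = FnvOffset;
        foreach (char c in s)
            hash = Mix(hash, c);
        return hash;
    }

    public static uint HashUtf8(ReadOnlySpan<byte> utf8)
    {
        // Decoding to UTF-16 chars first keeps the two entry points in sync.
        // A real implementation would decode inline to avoid the allocation.
        string s = Encoding.UTF8.GetString(utf8);
        return HashUtf16(s);
    }
}
```

The code-duplication worry in the bullet is exactly the inline-decoding part: a zero-allocation UTF-8 path has to re-implement the char-at-a-time mixing without drifting from the UTF-16 one.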

Cc @dotnet/ilc-contrib @dotnet/crossgen-contrib

Author: MichalStrehovsky
Assignees: -
Labels:

area-NativeAOT-coreclr

Milestone: -

@jkotas
Member

jkotas commented Apr 4, 2023

Would it be simpler and faster to just materialize all strings as string and use regular Dictionary for the lookup? It uses a bit more memory, but not that much more memory.

@jkotas
Member

jkotas commented Apr 4, 2023

Also, would it be better to use the split IBC data per assembly, like what we do for R2R?

Reading raw MIBC files makes sense when they closely match the app. It makes less sense when the MIBC files represent some generic profile and 90%+ of the data is irrelevant to the app being compiled.

Member

@trylek trylek left a comment


Overall looks great to me, thanks Michal! The only thing I would be somewhat concerned about is the initial perf cost of building the hashtable, that might pop up as startup slowdown. Could we possibly amortize the cost by building the hashtable incrementally during the first FindType so that we'd stop once we've found the type and only continue the next time FindType fails for the same module? Not sure if the more complex parallelism wouldn't make matters even worse but it's something we could consider if we hit perf issues with your change in its current form.

@MichalStrehovsky
Member Author

Would it be simpler and faster to just materialize all strings as string and use regular Dictionary for the lookup? It uses a bit more memory, but not that much more memory.

It would be simpler, but it doesn't look faster. After adding an antivirus exclusion for ilc.exe and the surrounding folders, and carefully listening to the CPU fans to avoid thermal throttling, I got somewhat stable perf numbers for compiling BasicWebApi:

Baseline: 10.451 s, 10.542 s, 10.513 s, 10.330 s, 10.292 s
PR: 10.131 s, 10.262 s, 10.242 s, 10.276 s, 10.287 s
Dictionary<(string, string), TypeDefinitionHandle>: 10.365 s, 10.309 s, 10.200 s, 10.364 s, 10.245 s

And this is for the PR version that hashes things a lot worse than what we could do if it were done properly (not byte-by-byte).

We can go with dictionary, but it's going to materialize more strings than we typically need. Right now this is amortized by NameMangler also materializing all the strings in the assembly, but that's something on my todo list to fix at some point (i.e. it's only going to get worse).

Maybe go with the dictionary now but leave a TODO that this is leaving perf on the table?

The only thing I would be somewhat concerned about is the initial perf cost of building the hashtable, that might pop up as startup slowdown

Yeah, there's a risk that if there's an assembly with lots of types and we only need a small handful, building the hashtable might not pay for itself. On average the type we're looking for is going to be in the middle, so on average we can only save half of the work, even if only one type was accessed. But this would add more synchronization concerns - this needs to be threadsafe. And at the point we'd add locking, it might end up being a wash.
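
A sketch of the thread-safety angle being discussed: a lock-free lazy publish via `Interlocked.CompareExchange` keeps the read path lock-free, at the cost of occasionally building the table twice when two threads race (illustrative code with placeholder contents, not the PR's implementation):

```csharp
using System.Collections.Generic;
using System.Threading;

class ModuleTypeLookup
{
    private Dictionary<string, int> _lookup; // null until first use

    private Dictionary<string, int> BuildLookup()
    {
        // Stand-in for walking the module's type definitions.
        return new Dictionary<string, int> { ["System.Object"] = 1, ["System.String"] = 2 };
    }

    public bool TryFind(string fullName, out int handle)
    {
        Dictionary<string, int> lookup = Volatile.Read(ref _lookup);
        if (lookup == null)
        {
            // Two threads may race to build; CompareExchange keeps one winner,
            // so readers never observe a partially built table.
            var fresh = BuildLookup();
            lookup = Interlocked.CompareExchange(ref _lookup, fresh, null) ?? fresh;
        }
        return lookup.TryGetValue(fullName, out handle);
    }
}
```

The incremental-build variant trylek suggests would need a lock (or a more elaborate scheme) around the partially built table, which is the "might end up being a wash" concern.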

@MichalStrehovsky
Member Author

Also, would it be better to use the split IBC data per assembly, like what we do for R2R?

Can you send me a pointer? I don't know what this refers to.

@jkotas
Member

jkotas commented Apr 5, 2023

Can you send me a pointer? I don't know what this refers to.

`--embed-pgo-data` command line option for crossgen2 (`public Option<bool> EmbedPgoData { get; }`). Implemented by https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/InstrumentationDataTableNode.cs .

It indexes the profile data to allow us to quickly find the profile data for a given method without reading everything.

@jkotas
Member

jkotas commented Apr 5, 2023

Maybe go with the dictionary now but leave a TODO that this is leaving perf on the table?

Sounds good to me.

```csharp
foreach (ExportedTypeHandle exportedTypeHandle in metadataReader.ExportedTypes)
{
    ExportedType exportedType = metadataReader.GetExportedType(exportedTypeHandle);
    if (exportedType.Implementation.Kind == HandleKind.ExportedType)
```
Member Author


This line is a bugfix.

@MichalStrehovsky
Member Author

It indexes the profile data to allow us to quickly find the profile data for a given method without reading everything.

We'll be able to read this information when #67080 is done - but currently the framework that ILC sees is not R2R compiled.

@jkotas
Member

jkotas commented Apr 5, 2023

We'll be able to read this information when #67080 is done - but currently the framework that ILC sees is not R2R compiled.

The most impactful profile data are attached to CoreLib, which won't be automatically covered by this.

```csharp
// so we can obtain the bytes again when needed.
// E.g. see the scheme explored in the first commit of https://github.com/dotnet/runtime/pull/84285.

var result = new Dictionary<(string Name, string Namespace), EntityHandle>();
```
Member

@jkotas jkotas Apr 5, 2023


If we care enough, this can be made more efficient by:

  • Pre-sizing the dictionary - either conservatively using total counts of TypeDefinitionHandle and ExportedTypeHandle tables, or precisely by doing a quick pass over these tables to see how many entries we are actually going to need.
  • Using a custom struct with faster non-randomized GetHashCode algorithm (e.g. use the FNV hashcode implementation from Roslyn)
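
A hedged sketch of the two suggestions together: pre-size from table row counts and plug in a non-randomized FNV-style comparer. Names here are illustrative, not the actual ILCompiler code, and the hash is merely in the spirit of Roslyn's `Hash.GetFNVHashCode`:

```csharp
using System.Collections.Generic;

// Illustrative comparer: non-randomized FNV-1a over both name parts.
sealed class NameComparer : IEqualityComparer<(string Name, string Namespace)>
{
    public bool Equals((string Name, string Namespace) x, (string Name, string Namespace) y)
        => x.Name == y.Name && x.Namespace == y.Namespace;

    public int GetHashCode((string Name, string Namespace) key)
        => (int)Fnv(Fnv(2166136261u, key.Namespace), key.Name);

    private static uint Fnv(uint hash, string s)
    {
        foreach (char c in s)
            hash = (hash ^ c) * 16777619;
        return hash;
    }
}

// Conservative pre-size: the table can never hold more entries than there are
// rows in the TypeDef and ExportedType metadata tables combined, e.g.
//   int capacity = reader.GetTableRowCount(TableIndex.TypeDef)
//                + reader.GetTableRowCount(TableIndex.ExportedType);
//   var result = new Dictionary<(string, string), EntityHandle>(capacity, new NameComparer());
```

The pre-size avoids rehash-and-copy churn while filling; the custom comparer skips the randomized string hashing `Dictionary` uses by default.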

Member


Pre-sizing the dictionary can help a lot with GC throughput in my experience. That may matter in the case that we keep the code that does a GC before handing off to the object writer.

Member Author


If we go the custom hasher route, it's not that much extra work to do the thing I did in the first commit and also get rid of the dictionary in the first place.

We could also explore FrozenDictionary although it probably won't work for netstandard2.0 right now.
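
For context, `FrozenDictionary` (in `System.Collections.Frozen`, .NET 8+, which is why it clashes with a netstandard2.0 target) trades a one-time optimization pass for faster reads, which fits a build-once, read-many lookup table like this one. A toy sketch:

```csharp
using System.Collections.Frozen;
using System.Collections.Generic;

var types = new Dictionary<(string Name, string Namespace), int>
{
    [("Object", "System")] = 1,
    [("String", "System")] = 2,
};

// The builder analyzes the keys up front to pick a specialized layout;
// lookups afterwards are typically faster than a plain Dictionary.
FrozenDictionary<(string Name, string Namespace), int> frozen = types.ToFrozenDictionary();
```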

I don't have big concerns based on the perf numbers I collected. This is an improvement even when not reading MIBC (the perf numbers with Dictionary were collected without MIBC).

@MichalStrehovsky MichalStrehovsky merged commit 18e2c5f into dotnet:main Apr 5, 2023
@MichalStrehovsky MichalStrehovsky deleted the gettypecache branch April 5, 2023 23:58
@ghost ghost locked as resolved and limited conversation to collaborators May 6, 2023