Speed up named type lookups in managed type system #84285
Conversation
When the type system needs to resolve a named type in a module, it does a `foreach` loop over all types in the module looking for the type. This can get mildly hot and I've seen it in CPU profiles, but it never looked important enough to address (despite the TODO). When MIBC files are passed to the compiler, however, this gets ridiculously hot. Compiling Hello World by default: 0.98 seconds. Compiling Hello World with 5 MIBC files: 9.1 seconds. This adds a hashtable to the lookup and drops the MIBC case to 1.4 seconds (we'll want to parallelize the MIBC loading on a background thread to close the remaining gap, but first things first). I'm opening this as a draft because I'd like to get feedback.
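For illustration, here is a minimal standalone sketch of the change's shape (the class and method names are hypothetical, not the PR's actual code): build a name-to-handle map once, lazily, and answer subsequent lookups from it instead of rescanning the whole type table.

```csharp
using System.Collections.Generic;
using System.Reflection.Metadata;

// Hypothetical sketch of a hashtable-backed named type lookup.
class TypeLookupSketch
{
    private readonly MetadataReader _reader;
    private Dictionary<(string Name, string Namespace), EntityHandle> _nameLookup;

    public TypeLookupSketch(MetadataReader reader) => _reader = reader;

    public EntityHandle FindType(string name, string ns)
    {
        // Before: a foreach over every type in the module, per lookup.
        // After: build the map once, then answer each lookup in O(1).
        _nameLookup ??= BuildLookup();
        return _nameLookup.TryGetValue((name, ns), out EntityHandle handle) ? handle : default;
    }

    private Dictionary<(string Name, string Namespace), EntityHandle> BuildLookup()
    {
        var result = new Dictionary<(string Name, string Namespace), EntityHandle>();
        foreach (TypeDefinitionHandle typeDefHandle in _reader.TypeDefinitions)
        {
            TypeDefinition typeDef = _reader.GetTypeDefinition(typeDefHandle);
            // Nested types are resolved through their enclosing type, not by
            // module-level name lookup, so only index top-level types.
            if (!typeDef.GetDeclaringType().IsNil)
                continue;
            result.Add((_reader.GetString(typeDef.Name), _reader.GetString(typeDef.Namespace)), typeDefHandle);
        }
        // The real change also indexes the ExportedType table (type
        // forwarders); elided here for brevity.
        return result;
    }
}
```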
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
Cc @dotnet/ilc-contrib @dotnet/crossgen-contrib
Would it be simpler and faster to just materialize all strings as `System.String` and use a regular `Dictionary`?
Also, would it be better to use the split IBC data per assembly, like what we do for R2R? Reading raw MIBC files makes sense when they closely match the app. It makes less sense when the MIBC files represent some generic profile and 90%+ of the data is irrelevant to the app being compiled.
Overall looks great to me, thanks Michal! The only thing I would be somewhat concerned about is the initial perf cost of building the hashtable, which might pop up as a startup slowdown. Could we possibly amortize the cost by building the hashtable incrementally during the first FindType, so that we'd stop once we've found the type and only continue the next time FindType fails for the same module? I'm not sure the more complex scheme wouldn't make matters even worse, but it's something we could consider if we hit perf issues with your change in its current form.
It would be simpler, but it doesn't look faster. After adding an antivirus exclusion for ilc.exe and the surrounding folders, and carefully listening to the CPU fans to avoid thermal throttling, I got somewhat stable perf numbers for compiling BasicWebApi. Baseline: 10.451 s, 10.542 s, 10.513 s, 10.330 s, 10.292 s. And this is for the PR version that hashes things a lot worse than what we can do if it's done properly (not byte-by-byte).

We can go with a dictionary, but it's going to materialize more strings than we typically need. Right now this is amortized by NameMangler also materializing all the strings in the assembly, but fixing that is on my todo list (i.e. it's only going to get worse). Maybe go with the dictionary now but leave a TODO that this is leaving perf on the table?
Yeah, there's a risk that if there's an assembly with lots of types and we only need a small handful, we might not be able to pay off the hashtable building. On average the type we're looking for is going to be in the middle, so on average we can only save half of the work, even if only one type was accessed. But this would add more synchronization concerns - this needs to be threadsafe. And at the point we'd add locking, it might end up being a wash.
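For concreteness, the amortized scheme discussed above could look roughly like this (a hypothetical sketch, not code from the PR). The lock is the synchronization cost that might make it a wash:

```csharp
using System.Collections.Generic;
using System.Reflection.Metadata;

// Hypothetical incremental lookup: index only as far as needed, resume the
// scan on a later miss. Nested-type handling is omitted for brevity.
class IncrementalTypeLookup
{
    private readonly MetadataReader _reader;
    private readonly Dictionary<(string Name, string Namespace), TypeDefinitionHandle> _map = new();
    private readonly object _lock = new object();
    private IEnumerator<TypeDefinitionHandle> _cursor;

    public IncrementalTypeLookup(MetadataReader reader)
    {
        _reader = reader;
        _cursor = reader.TypeDefinitions.GetEnumerator();
    }

    public bool TryFindType(string name, string ns, out TypeDefinitionHandle handle)
    {
        lock (_lock) // every lookup pays for this, even after full indexing
        {
            if (_map.TryGetValue((name, ns), out handle))
                return true;

            // Keep indexing until we hit the requested type or run out of rows.
            while (_cursor != null && _cursor.MoveNext())
            {
                TypeDefinition typeDef = _reader.GetTypeDefinition(_cursor.Current);
                var key = (_reader.GetString(typeDef.Name), _reader.GetString(typeDef.Namespace));
                _map[key] = _cursor.Current;
                if (key == (name, ns))
                {
                    handle = _cursor.Current;
                    return true;
                }
            }
            _cursor = null; // fully indexed; future misses are definitive
            return false;
        }
    }
}
```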
Can you send me a pointer? I don't know what this refers to.
It indexes the profile data to allow us to quickly find the profile data for a given method without reading everything.
Sounds good to me. |
This reverts commit 2e40a84.
foreach (ExportedTypeHandle exportedTypeHandle in metadataReader.ExportedTypes)
{
    ExportedType exportedType = metadataReader.GetExportedType(exportedTypeHandle);
    if (exportedType.Implementation.Kind == HandleKind.ExportedType)
This line is a bugfix.
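My reading of the check (an inference from the diff, not stated in the PR): a nested forwarded type's `Implementation` handle points at the enclosing `ExportedType` row rather than at an `AssemblyReference`, so such rows must not be indexed as top-level names. A hypothetical helper capturing that distinction:

```csharp
using System.Reflection.Metadata;

static class ExportedTypeSketch
{
    // Illustrative only: true for forwarders that are safe to index by
    // (name, namespace) at module level.
    public static bool IsTopLevelForwarder(MetadataReader reader, ExportedTypeHandle handle)
    {
        ExportedType exportedType = reader.GetExportedType(handle);
        // A nested exported type's name is only meaningful relative to its
        // enclosing type (which lives in the same table), so skip it here.
        return exportedType.Implementation.Kind != HandleKind.ExportedType;
    }
}
```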
We'll be able to read this information when #67080 is done, but currently the framework that ILC sees is not R2R-compiled.
The most impactful profile data is attached to CoreLib, which won't be automatically covered by this.
// so we can obtain the bytes again when needed.
// E.g. see the scheme explored in the first commit of https://github.com/dotnet/runtime/pull/84285.
var result = new Dictionary<(string Name, string Namespace), EntityHandle>(); |
If we care enough, this can be made more efficient by:
- Pre-sizing the dictionary - either conservatively using the total counts of the `TypeDefinitionHandle` and `ExportedTypeHandle` tables, or precisely by doing a quick pass over these tables to see how many entries we are actually going to need.
- Using a custom struct with a faster non-randomized GetHashCode algorithm (e.g. use the FNV hashcode implementation from Roslyn).
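A sketch of how both suggestions could be applied (illustrative names and code, not the PR's; the FNV-1a constants are the standard 32-bit ones):

```csharp
using System;
using System.Collections.Generic;
using System.Reflection.Metadata;

// Custom key struct with a deterministic, non-randomized FNV-1a hash
// (string.GetHashCode is randomized per process on modern runtimes).
readonly struct TypeNameKey : IEquatable<TypeNameKey>
{
    public readonly string Name;
    public readonly string Namespace;

    public TypeNameKey(string name, string ns)
    {
        Name = name;
        Namespace = ns;
    }

    public bool Equals(TypeNameKey other) => Name == other.Name && Namespace == other.Namespace;
    public override bool Equals(object obj) => obj is TypeNameKey other && Equals(other);

    public override int GetHashCode()
    {
        unchecked
        {
            // FNV-1a over the namespace, then the name.
            const int prime = 16777619;
            int hash = (int)2166136261;
            foreach (char c in Namespace)
                hash = (hash ^ c) * prime;
            foreach (char c in Name)
                hash = (hash ^ c) * prime;
            return hash;
        }
    }
}

static class TypeLookupHelpers
{
    // Conservative pre-sizing: the map can never hold more entries than the
    // TypeDef and ExportedType tables have rows combined, so this avoids all
    // rehashing at the cost of possibly over-allocating.
    public static Dictionary<TypeNameKey, EntityHandle> CreateLookup(MetadataReader reader)
    {
        int capacity = reader.TypeDefinitions.Count + reader.ExportedTypes.Count;
        return new Dictionary<TypeNameKey, EntityHandle>(capacity);
    }
}
```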
Pre-sizing the dictionary can help a lot with GC throughput in my experience. That may matter in the case that we keep the code that does a GC before handing off to the object writer.
If we go the custom hasher route, it's not that much extra work to do the thing I did in the first commit and also get rid of the dictionary in the first place.
We could also explore `FrozenDictionary`, although it probably won't work for netstandard2.0 right now.
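For reference, a toy example of what that could look like on .NET 8+ (`FrozenDictionary` lives in `System.Collections.Frozen` and is not available for netstandard2.0 targets; `int` stands in for the real handle type here):

```csharp
using System.Collections.Frozen;
using System.Collections.Generic;

// Build a mutable dictionary once, then freeze it: a one-time construction
// cost in exchange for faster reads afterwards.
var builder = new Dictionary<(string Name, string Namespace), int>
{
    [("Object", "System")] = 1,
    [("String", "System")] = 2,
};

FrozenDictionary<(string Name, string Namespace), int> lookup = builder.ToFrozenDictionary();
bool found = lookup.TryGetValue(("String", "System"), out int row); // true, row == 2
```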
I don't have big concerns based on the perf numbers I collected. This is an improvement even when not reading MIBC (the perf numbers with Dictionary were collected without MIBC).