Reduce frozen collection creation overheads for ignore-case with ASCII values #100998
Two changes:
1) We have a routine for computing ordinal ignore-case hash codes when we don't know whether the inputs are all ASCII. It's written to work regardless of target platform, but on .NET Core it can use string.GetHashCode(span, OrdinalIgnoreCase), which is faster than the existing implementation, which converts the input to upper case first and then gets the hash code of the result.
2) When analyzing keys, a HashSet is constructed from those keys to determine the uniqueness of the relevant substrings being analyzed. Adding each substring to the HashSet involves getting a hash code for it. With ignore-case, we currently use a comparer that has to work with non-ASCII input, so it hits the code path described in (1). Change (1) helps on .NET Core, but on .NET Framework this comparer still ends up allocating strings as part of computing the hash codes. We can instead check up front whether all of the values are ASCII, and if they are, use a better comparer that doesn't need to allocate on .NET Framework and is also a tad faster on .NET Core.
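To make the two code paths concrete, here is a minimal sketch, not the code in Hashing.cs or KeyAnalyzer.cs. The class and method names are hypothetical, and the ASCII hash is a simple djb2-style stand-in chosen only to show a case-folding hash that needs no intermediate string. The general path assumes a .NET Core target, where the static string.GetHashCode(ReadOnlySpan<char>, StringComparison) overload is available.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration; names are not taken from the PR.
static class IgnoreCaseHashingSketch
{
    // Up-front check: true only if every char of every key is ASCII (< 0x80).
    public static bool AllAscii(IEnumerable<string> keys)
    {
        foreach (string key in keys)
        {
            foreach (char c in key)
            {
                if (c >= 0x80) return false;
            }
        }
        return true;
    }

    // Allocation-free ignore-case hash, valid only for all-ASCII input.
    // Stand-in hash function (djb2-style), assumed for illustration.
    public static int AsciiIgnoreCaseHash(ReadOnlySpan<char> s)
    {
        unchecked
        {
            uint hash = 5381;
            foreach (char c in s)
            {
                char u = c;
                if (u >= 'a' && u <= 'z') u = (char)(u - 0x20); // fold a-z to A-Z
                hash = ((hash << 5) + hash) ^ u;
            }
            return (int)hash;
        }
    }

    // General path for inputs that may contain non-ASCII (.NET Core only).
    public static int GeneralIgnoreCaseHash(ReadOnlySpan<char> s) =>
        string.GetHashCode(s, StringComparison.OrdinalIgnoreCase);

    // Pick the cheaper hash when the up-front check has already succeeded.
    public static int IgnoreCaseHash(ReadOnlySpan<char> s, bool allAscii) =>
        allAscii ? AsciiIgnoreCaseHash(s) : GeneralIgnoreCaseHash(s);
}
```

A caller would run AllAscii once over the key set and pass the result to IgnoreCaseHash for each substring, so the per-key cost of the check is paid only once during construction.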
Would it be good to have some tests that cover different combinations of ASCII/non-ASCII keys, and then lookups with ASCII/non-ASCII arguments?
There are some already. We could add more if there's a gap.
Potential regression: Seemingly associated improvement:
I don't believe either of those is related. For the possible regression, this change only affects the code paths that construct the dictionary, not the ones that look things up in it, and the test only exercises lookups. And the improvements aren't using frozen collections.
Benchmark (code and the .NET Core and .NET Framework result tables are not preserved here): s_inputs is ~1500 file paths (the particular set of values seen in dotnet/roslyn#72995).