
Reduce frozen collection creation overheads for ignore-case with ASCII values #100998

Merged · 3 commits · Apr 21, 2024

Conversation

stephentoub
Member

Two changes:

  1. We have a routine for computing ordinal ignore-case hash codes when we don't know whether the inputs are all ASCII. It's written to work regardless of target platform, but on .NET Core it can instead use string.GetHashCode(span, StringComparison.OrdinalIgnoreCase), which is faster than the existing implementation, which first converts the input to upper case and then hashes the result.
  2. When analyzing keys, a HashSet is constructed from those keys to determine the uniqueness of the relevant substrings being analyzed. Adding each substring to the HashSet involves computing its hash code. With ignore-case, we're currently using a comparer that has to handle non-ASCII input, so it hits the code path above. (1) helps on .NET Core, but on .NET Framework this comparer will still allocate strings as part of computing the hash codes. We can do an up-front check for whether all of the values are ASCII, and if they are, use a better comparer that doesn't need to allocate on .NET Framework and is also a tad faster on .NET Core.
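The two changes can be sketched roughly as follows. The type and member names here are illustrative, not the actual System.Collections.Frozen internals, and the `#if NET` split and the ASCII case-fold are assumptions about the shape of the change:

```csharp
using System;
using System.Collections.Generic;

// Rough sketch of the two ideas; names are illustrative, not the actual
// System.Collections.Frozen internals.
internal static class IgnoreCaseHelpers
{
    // (1) On .NET Core, string.GetHashCode(span, OrdinalIgnoreCase) hashes the
    // span directly; the portable fallback must allocate an upper-cased copy.
    public static int GetHashCodeOrdinalIgnoreCase(ReadOnlySpan<char> span)
    {
#if NET
        return string.GetHashCode(span, StringComparison.OrdinalIgnoreCase);
#else
        return span.ToString().ToUpperInvariant().GetHashCode();
#endif
    }

    // (2) Up-front check: are all of the values pure ASCII?
    public static bool AllAscii(IEnumerable<string> values)
    {
        foreach (string value in values)
            foreach (char c in value)
                if (c > 0x7F)
                    return false;
        return true;
    }
}

// If AllAscii returns true, an allocation-free comparer along these lines can
// be used when deduplicating substrings; ASCII letters case-fold with simple
// arithmetic, so no upper-cased string is ever materialized.
internal sealed class AsciiIgnoreCaseComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) =>
        string.Equals(x, y, StringComparison.OrdinalIgnoreCase);

    public int GetHashCode(string s)
    {
        int hash = 17;
        foreach (char c in s)
        {
            // Fold 'a'..'z' onto 'A'..'Z'; all other ASCII chars pass through.
            char folded = (c >= 'a' && c <= 'z') ? (char)(c - 0x20) : c;
            hash = hash * 31 + folded;
        }
        return hash;
    }
}
```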

Benchmark:

```csharp
[Benchmark]
public FrozenSet<string> Create() => FrozenSet.ToFrozenSet(s_inputs, StringComparer.OrdinalIgnoreCase);
```

where `s_inputs` is ~1500 file paths (the particular set of values seen in dotnet/roslyn#72995).
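For context, the collection being measured behaves like this (a minimal sketch; the sample values here are stand-ins, not the Roslyn file paths from the original measurement):

```csharp
using System;
using System.Collections.Frozen;

string[] inputs = { @"C:\src\App.cs", @"C:\src\Util.cs" };

// All of the expensive key analysis happens once, at creation time; that is
// the cost this PR reduces. Lookups afterwards are case-insensitive.
FrozenSet<string> set = inputs.ToFrozenSet(StringComparer.OrdinalIgnoreCase);

Console.WriteLine(set.Contains(@"c:\SRC\app.CS")); // True
```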

.NET Core

| Method | Toolchain | Mean | Allocated |
|--------|-----------|------|-----------|
| Create | \main\corerun.exe | 4.663 ms | 150.48 KB |
| Create | \pr\corerun.exe | 3.328 ms | 150.48 KB |

.NET Framework

| Method | Toolchain | Mean | Allocated |
|--------|-----------|------|-----------|
| Create | \main\corerun.exe | 26.49 ms | 9.98 MB |
| Create | \pr\corerun.exe | 9.226 ms | 185.54 KB |


Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

@CyrusNajmabadi
Member

Would it be good to have some tests that have different combos of ascii/non-ascii keys, and then lookups with ascii/non-ascii args?

@stephentoub
Member Author

> Would it be good to have some tests that have different combos of ascii/non-ascii keys, and then lookups with ascii/non-ascii args?

There are some already. We could add more if there's a gap.

@LoopedBard3
Member

Potential regression:
Windows x64: dotnet/perf-autofiling-issues#33052

Seemingly associated improvement:
Windows x64: dotnet/perf-autofiling-issues#33072

@stephentoub
Member Author

> Potential regression
> Seemingly associated improvement

I don't believe either of those is related. For the possible regression: this change only affected the code paths for constructing the dictionary, not for then looking things up in it, and that test only exercises the latter. And the improvements aren't using frozen collections.

matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 27, 2024
6 participants