Reduce frozen collection creation overheads for ignore-case with ASCII values #100998

stephentoub · 2024-04-13T01:48:41Z

Two changes:

1. We have a routine for computing ordinal ignore-case hash codes when we don't know whether the inputs are all ASCII or not. It's written to work regardless of target platform, but on .NET Core it can use string.GetHashCode(span, OrdinalIgnoreCase), which is faster than the implementation that's there, which converts the input to upper case first and then gets the hashcode of that.
2. When analyzing keys, a HashSet is constructed of those keys to determine uniqueness of the relevant substrings being analyzed. Adding each substring to the HashSet involves getting a hash code for it. With ignore-case, we're currently using a comparer that has to work with non-ASCII, and it hits that above code path. (1) helps for .NET Core, but for .NET Framework this comparer will still end up allocating strings as part of computing the hash codes. We can do an up-front check for whether all of the values are ASCII, and if they are, we can use a better comparer that doesn't need to allocate on .NET Framework and which is also a tad faster on .NET Core.
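
For readers who want a concrete picture, the two changes above can be sketched roughly as follows. This is an illustrative sketch only, not the code in Hashing.cs or KeyAnalyzer.cs: the type name IgnoreCaseHashingSketch, the method names, and the FNV-1a hash are all assumptions made for the example, and the non-.NET Core fallback simply stands in for "an implementation that allocates".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper type; names and hash function are illustrative assumptions.
static class IgnoreCaseHashingSketch
{
    // Change (1), sketched: hash a substring ignoring case without assuming ASCII.
    public static int GeneralIgnoreCaseHash(string s, int start, int length)
    {
#if NET
        // .NET Core: hash the span directly, no intermediate string.
        return string.GetHashCode(s.AsSpan(start, length), StringComparison.OrdinalIgnoreCase);
#else
        // Elsewhere: this stand-in allocates a substring per call.
        return StringComparer.OrdinalIgnoreCase.GetHashCode(s.Substring(start, length));
#endif
    }

    // Change (2), sketched: up-front check that every char of every key is ASCII.
    public static bool AllAscii(IEnumerable<string> keys)
    {
        foreach (string key in keys)
        {
            foreach (char c in key)
            {
                if (c > 0x7F) return false;
            }
        }
        return true;
    }

    // Allocation-free ignore-case hash that is only valid for ASCII input.
    // Uses FNV-1a purely for illustration.
    public static int AsciiIgnoreCaseHash(string s, int start, int length)
    {
        unchecked
        {
            uint hash = 2166136261u; // FNV-1a offset basis
            for (int i = start; i < start + length; i++)
            {
                char c = s[i];
                // Fold 'a'..'z' to 'A'..'Z' by clearing bit 0x20; leave other chars alone.
                uint folded = (uint)(c - 'a') <= 'z' - 'a' ? (uint)(c & ~0x20) : c;
                hash = (hash ^ folded) * 16777619u; // FNV-1a prime
            }
            return (int)hash;
        }
    }
}
```

A caller would run AllAscii once over the keys and, only if it returns true, build the uniqueness HashSet with a comparer based on AsciiIgnoreCaseHash; otherwise it would fall back to the general path.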

Benchmark:

[Benchmark]
public FrozenSet<string> Create() => FrozenSet.ToFrozenSet(s_inputs, StringComparer.OrdinalIgnoreCase);

where s_inputs is ~1500 file paths (the particular set of values seen in dotnet/roslyn#72995).

.NET Core

Method	Toolchain	Mean	Allocated
Create	\main\corerun.exe	4.663 ms	150.48 KB
Create	\pr\corerun.exe	3.328 ms	150.48 KB

.NET Framework

Method	Toolchain	Mean	Allocated
Create	\main\corerun.exe	26.49 ms	9.98 MB
Create	\pr\corerun.exe	9.226 ms	185.54 KB

dotnet-policy-service · 2024-04-13T01:49:05Z

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

CyrusNajmabadi · 2024-04-13T01:57:34Z

Would it be good to have some tests that have different combos of ascii/non-ascii keys, and then lookups with ascii/non-ascii args?

stephentoub · 2024-04-13T02:08:22Z

> Would it be good to have some tests that have different combos of ascii/non-ascii keys, and then lookups with ascii/non-ascii args?

There are some already. We could add more if there's a gap.

LoopedBard3 · 2024-04-25T06:16:46Z

Potential regression:
Windows x64: dotnet/perf-autofiling-issues#33052

Seemingly associated improvement:
Windows x64: dotnet/perf-autofiling-issues#33072

stephentoub · 2024-04-25T13:53:07Z

> Potential regression
> Seemingly associated improvement

I don't believe either of those is related. For the possible regression, this change only affected the code paths that construct the dictionary, not the ones that then look things up in it, and the test exercises only the latter. And the improvements aren't using frozen collections.

stephentoub requested a review from adamsitnik April 13, 2024 01:48

dotnet-issue-labeler bot added the area-System.Collections label Apr 13, 2024

dotnet-policy-service bot assigned stephentoub Apr 13, 2024

CyrusNajmabadi reviewed Apr 13, 2024

CyrusNajmabadi reviewed Apr 13, 2024

danmoseley reviewed Apr 13, 2024

andrewjsaid reviewed Apr 17, 2024

stephentoub added 2 commits April 18, 2024 23:25

Merge branch 'main' into avoidallocationnonascii

9e3c4dd

Address PR feedback

195819c

stephentoub requested a review from eiriktsarpalis April 19, 2024 03:41

This was referenced Apr 19, 2024

CI error: System.Net.Quic.QuicException: The connection timed out from inactivity #91757

Closed

Timeout in System.Text.Json.Tests.Utf8JsonWriterTests.WriteNumbers #101193

Closed

eiriktsarpalis reviewed Apr 19, 2024

eiriktsarpalis approved these changes Apr 19, 2024

stephentoub merged commit 7e7c1f0 into dotnet:main Apr 21, 2024
82 of 87 checks passed

stephentoub deleted the avoidallocationnonascii branch April 21, 2024 01:23

github-actions bot locked and limited conversation to collaborators May 27, 2024
