Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make FilePathNormalizer.GetHashCode case insensitive #10050

Merged
merged 6 commits into from
Mar 12, 2024

Conversation

davidwengier
Copy link
Contributor

@davidwengier davidwengier commented Mar 8, 2024

Fixes https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1957989 and hopefully also #10007

When I added FilePathNormalizingComparer I failed to test GetHashCode and it has a bug, meaning the comparer didn't work in dictionaries. For some as-yet unknown reason this seems to only be a problem on the first load of a new project in VS Code. 🤷‍♂️

@davidwengier davidwengier requested a review from a team as a code owner March 8, 2024 00:46
@@ -19,9 +19,13 @@ internal static class FilePathNormalizer

private class FilePathNormalizingComparer : IEqualityComparer<string>
{
private Func<char, char> _charConverter = RuntimeInformation.IsOSPlatform(OSPlatform.Linux)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more for my education, but why introduce a new function here / duplicate the operating system logic for each character? FilePathComparison is an existing static (factory?) that returns the case sensitivity to use. Is there any value is lowering each character individually / vs lowering the entire path outright. rsandbach@d6515d7

Copy link
Contributor Author

@davidwengier davidwengier Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is performance. This method can be called a lot, so allocating a string and calling ToLower on it (which allocates another) just gives more for the garbage collector to clean up.

I guess I could have checked if FilePathComparison.Instance returned an Ordinal comparison, rather than copy the code, but it didn't seem like a big deal. Unfortunately the StringComparison type it returns isn't directly useful though. If there was a CharComparison type I would have used it :)

@@ -19,9 +19,13 @@ internal static class FilePathNormalizer

private class FilePathNormalizingComparer : IEqualityComparer<string>
{
private Func<char, char> _charConverter = RuntimeInformation.IsOSPlatform(OSPlatform.Linux)
? c => c
: char.ToLowerInvariant;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidwengier: Is invariant the right approach? After all, file paths are human-readable strings with a potential for important culture details being included in them. Asking because I'm unsure. 🤷

Copy link
Contributor Author

@davidwengier davidwengier Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I picked invariant because our existing conversions are Ordinal, which is culture-less, and given there is no ToLowerOrdinal(), avoiding quirks of specific cultures seemed workwhile. I don't claim to be sure of anything though either :)

@@ -147,7 +151,7 @@ public static int GetHashCode(string filePath)

foreach (var ch in normalizedSpan)
{
hashCombiner.Add(ch);
hashCombiner.Add(charConverter(ch));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of suggestions:

  1. On .NET Core, consider using the StringComparer.OrdinalIgnoreCase.GetHashCode(ReadOnlySpan<char>) or string.GetHashCode(ReadOnlySpan<char>, StringComparison.OrdinalIgnoreCase) overloads.
  2. If converting the span to lower invariant is the way to go, consider calling ReadOnlySpan<char>.ToLowerInvariant(...) rather than calling into a delegate for every char.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. I looked for a method on a comparer that would be useful, but didn't find one. string.GetHashCode will be very helpful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like on .NET Framework, ReadOnlySpan<char>.ToLower..() just does a ToString() then calls ToLower() on it. Since we have to loop anyway, I'm leaning towards leaving this for that TFM. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird because I think CultureInvariant is different from an ordinal comparison. It still does some modifications. I think it still works for our needs though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely different, but for hash code I don't think we really care much, because there will still be an Equals check once the right bucket is found, and check does use ordinal comparison. In the case of hash code in a dictionary, I think its actually better to err on the side of more collisions, because we can trust Equals.

davidwengier and others added 2 commits March 9, 2024 09:47
…st/FilePathNormalizerTest.cs

Co-authored-by: Dustin Campbell <dustin@teamcampbell.org>
@davidwengier
Copy link
Contributor Author

@DustinCampbell I'm merging this so I can move on to doing a VS Code insertion, but if you have any comments about the last commit (28c3b6e) I'm more than happy to do a follow up. I don't love it because it could hide a product bug (document add notification before project add?) but on the other hand, I hope other tests would cover that.

@davidwengier davidwengier merged commit af1bc0f into dotnet:main Mar 12, 2024
12 checks passed
@davidwengier davidwengier deleted the CaseInsensitiveGetHashCode branch March 12, 2024 01:44
@dotnet-policy-service dotnet-policy-service bot added this to the Next milestone Mar 12, 2024
@DustinCampbell
Copy link
Member

@DustinCampbell I'm merging this so I can move on to doing a VS Code insertion, but if you have any comments about the last commit (28c3b6e) I'm more than happy to do a follow up.

Looks good to me!

davidwengier added a commit that referenced this pull request Mar 15, 2024
Port of #10050 to a new branch,
which is at the point of
dotnet/vscode-csharp@bda22d3,
which is the current version of Razor bits in the `release` branch of
the C# extension

@phil-allen-msft
arunchndr added a commit to dotnet/vscode-csharp that referenced this pull request Mar 15, 2024
@RikkiGibson RikkiGibson modified the milestones: Next, 17.10 P3 Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants