Handle non-ASCII strings in GetNonRandomizedHashCodeOrdinalIgnoreCase #44688

EgorBo · 2020-11-14T18:31:55Z

Dotnet-GitSync-Bot · 2020-11-14T18:31:58Z

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs

EgorBo · 2020-11-14T22:08:11Z

@GrabYourPitchforks I'm not sure how to fix the https://github.com/dotnet/runtime/blob/master/src/libraries/System.Collections/tests/Generic/Dictionary/HashCollisionScenarios/OutOfBoundsRegression.cs#L194-L196 test now. It relies on the fact that we can easily generate collisions for a specific hash, but nearly all (99.9..%) of the generated strings are non-ASCII, so they take the slow path and don't produce collisions.

So I need to somehow generate 100 ASCII strings with the same hash code 🤔

GrabYourPitchforks · 2020-11-14T22:14:19Z

@EgorBo I think the only reason they're not producing the same hash code is that we're calling the Marvin routine. If we update the fallback logic to perform an uppercase conversion but still use the naïve bit-shifting routines already present in GetNonRandomizedHashCodeOrdinalIgnoreCase, the unit tests should pass.
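
A minimal sketch of the fallback described above, under stated assumptions: `HashSketch` and `OrdinalIgnoreCaseHash` are hypothetical names, the loop works one char at a time rather than on packed `uint` pairs, and `ToUpperInvariant` stands in for whatever uppercasing routine the runtime uses internally. It is not the CoreLib implementation; it only illustrates "uppercase first, then reuse the same naïve mixing".

```csharp
using System;
using System.Numerics;

// Illustrative sketch only; these names are hypothetical, not CoreLib APIs.
public static class HashSketch
{
    public static int OrdinalIgnoreCaseHash(string s)
    {
        // Decide up front whether the ASCII fast path applies.
        bool isAscii = true;
        foreach (char ch in s)
        {
            if (ch > 0x7F) { isAscii = false; break; }
        }

        // Fallback: uppercase the whole string, then reuse the same mixing.
        // ToUpperInvariant is an assumed stand-in for the internal routine.
        string input = isAscii ? s : s.ToUpperInvariant();

        // Both paths start from the same initial state.
        uint hash1 = (5381 << 16) + 5381;
        uint hash2 = hash1;

        unchecked
        {
            for (int i = 0; i < input.Length; i++)
            {
                // OR-ing 0x20 folds ASCII 'A'..'Z' onto 'a'..'z'. On other chars
                // it only sets a bit, which may add collisions but never
                // separates two strings that were equal before folding.
                uint folded = (uint)(input[i] | 0x20);
                if ((i & 1) == 0)
                    hash1 = (BitOperations.RotateLeft(hash1, 5) + hash1) ^ folded;
                else
                    hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ folded;
            }
            return (int)(hash1 + (hash2 * 1566083941));
        }
    }
}
```

Because the non-ASCII path feeds the same non-randomized mixing as the ASCII path, a collision generator that targets the naïve routine should keep working, which is the property the regression test depends on.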

EgorBo · 2020-11-14T22:22:48Z

> @EgorBo I think the only reason they're not producing the same hash code is that we're calling the Marvin routine. If we update the fallback logic to perform an uppercase conversion but still use the naïve bit-shifting routines already present in GetNonRandomizedHashCodeOrdinalIgnoreCase, the unit tests should pass.

ah ok, let me rewrite it then, thanks!

EgorBo · 2020-11-14T23:31:55Z

@GrabYourPitchforks updated, could you please take a look and check whether it's what you meant?

src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs

GrabYourPitchforks

LGTM! I left some perf nit comments. These don't need to be addressed; they're mainly to point out areas of low-hanging fruit in case we did end up harming performance and are looking for some easy ways to win back a few percentage points here and there.

src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs

GrabYourPitchforks · 2020-11-15T02:27:30Z

I've reproed the unit test issue, one sec and I'll get a workaround out to you.

GrabYourPitchforks · 2020-11-15T02:34:32Z

@EgorBo The commit GrabYourPitchforks@113d1e6 in my private branch includes three changes that will be of interest here:

The file RandomizedStringEqualityComparer.cs includes a fix for #44695 ("Dictionary sometimes uses Ordinal hash code calculation instead of OrdinalIgnoreCase").
The file OutOfBoundsRegression.cs contains a fix for the Assert.Equal(0x24716ca0, ...) that's failing in this PR.
The file OutOfBoundsRegression.cs contains additional regression tests for both #44681 ("StringComparer.OrdinalIgnoreCase does not ignore case for several Cyrillic letters when used with Dictionary") and #44695 ("Dictionary sometimes uses Ordinal hash code calculation instead of OrdinalIgnoreCase").

Might be worth merging the patch into this PR so that both issues can be closed at once?

(I was going to submit my patch as its own PR to address #44695, but since that patch relies on this PR being committed, if I were to submit it prematurely all of the unit tests would fail.)

danmoseley · 2020-11-15T23:32:06Z

Is there a more limited version of the change that might be lower risk to backport?

GrabYourPitchforks · 2020-11-16T00:09:07Z

The least code churn version of this change for servicing purposes would be:

Remove all of the code from GetNonRandomizedHashCodeOrdinalIgnoreCase and have it forward to the method Marvin.ComputeHash32OrdinalIgnoreCase; and
Take the RandomizedStringEqualityComparer.cs fix as-is from this PR; and
Suppress the tests in OutOfBoundsRegression.cs, as they'll start failing.

Together, these will essentially disable the perf optimization that was done in #36252, and Dictionary<...>(OrdinalIgnoreCase) performance will revert to what it was in netcoreapp3.1.

danmoseley · 2020-11-16T00:57:30Z

> Dictionary<...>(OrdinalIgnoreCase) performance will revert to what it was in netcoreapp3.1.

Hmm, that would be a significant takeback.

GrabYourPitchforks · 2020-11-16T21:20:02Z

/azp run runtime

azure-pipelines · 2020-11-16T21:20:34Z

Azure Pipelines successfully started running 1 pipeline(s).

src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs

jkotas · 2020-11-18T05:00:49Z

src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs

+ return (int)(hash1 + (hash2 * 1566083941));
+
+ NotAscii:
+ return GetNonRandomizedHashCodeOrdinalIgnoreCaseSlow(this, hash1, hash2);


I do not think it is correct to pass hash1 and hash2 into the slow method. We could have processed some number of characters already and so hash1 and hash2 may not have their original values.

It would be nice to add a test case that covers this case.

EgorBo · 2020-11-18

Oops, indeed.
I am not sure it can be tested in any way other than by hardcoding the expected hash code value 🤔

jkotas · 2020-11-18T20:15:03Z

The new test is failing on Mono. Could you please investigate?

GrabYourPitchforks · 2020-11-19T06:33:41Z

I used runfo to pull the logs from the helix machine, but there don't seem to be any failures recorded? The zip file I downloaded contains the script used to run the tests, but I don't see anything that resembles output from the test run. Will dig in further tomorrow.

jkotas · 2020-11-19T07:54:22Z

I see these failures when I go to Details / View more details on Azure Pipelines / "go back" / Tests / select iOS failure:

https://dev.azure.com/dnceng/public/_build/results?buildId=890570&view=ms.vss-test-web.build-test-results-tab&runId=28497198&paneView=debug&resultId=138723

System.Collections.Generic.KeyNotFoundException : The given key 'А' was not present in the dictionary.


Stack trace
   at System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].get_Item(String key)
   at System.Collections.Tests.Dictionary_Tests.DictionaryOrdinalIgnoreCaseCyrillicKeys()
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)

EgorBo · 2020-11-20T23:53:01Z

> The new test is failing on Mono. Could you please investigate?

Ah, it's because iOS is always invariant atm (I'm integrating ICU there).

"Ф".Equals("ф", StringComparison.OrdinalIgnoreCase); // "false" in the Invariant mode

Will ignore that test.
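
For reference, a small probe of the behaviour discussed here, written as a sketch: `CyrillicLookupSketch` and `Probe` are hypothetical names, not the test added in this PR. It compares what `String.Equals` reports for a Cyrillic case pair with what an `OrdinalIgnoreCase` dictionary lookup reports. The concrete values depend on the runtime's globalization mode, so no output is asserted; the only expectation is that the two answers agree once hashing is consistent with equality.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; the type and method names are hypothetical.
public static class CyrillicLookupSketch
{
    public static (bool EqualIgnoringCase, bool FoundInDictionary) Probe()
    {
        const string upper = "Ф";
        const string lower = "ф";

        // What the comparison itself says about the two strings.
        bool equalIgnoringCase = upper.Equals(lower, StringComparison.OrdinalIgnoreCase);

        // What a dictionary keyed with the same comparison says.
        var dict = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            [upper] = 1
        };
        bool found = dict.TryGetValue(lower, out _);

        return (equalIgnoringCase, found);
    }
}
```

Calling `CyrillicLookupSketch.Probe()` where the comparison treats the pair as different (as reported above for invariant mode) should yield a miss in both positions, which matches the `KeyNotFoundException` in the iOS run.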

jkotas · 2020-11-21T23:00:53Z

Passed CI on Linux before. CI failures are #45061

jkotas · 2020-11-21T23:03:58Z

/backport to release/5.0

github-actions · 2020-11-21T23:04:50Z

Started backporting to release/5.0: https://github.com/dotnet/runtime/actions/runs/376611056

EgorBo added 2 commits November 14, 2020 21:24

Fix GetNonRandomizedHashCodeOrdinalIgnoreCase

085963c

Add a test

dbf4227

EgorBo closed this Nov 14, 2020

EgorBo added 2 commits November 14, 2020 22:16

correct (but slow) fix

c4de34f

clean up

adbb85a

EgorBo reopened this Nov 14, 2020

EgorBo added 2 commits November 14, 2020 22:20

Update String.Comparison.cs

093412d

Update String.Comparison.cs

a645884

EgorBo changed the title ~~GetNonRandomizedHashCodeOrdinalIgnoreCase: ignore second byte in chars~~ Handle non-ASCII strings in GetNonRandomizedHashCodeOrdinalIgnoreCase Nov 14, 2020

Update String.Comparison.cs

101ce22

GrabYourPitchforks reviewed Nov 14, 2020

src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs (outdated)

GrabYourPitchforks reviewed Nov 14, 2020

src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs (outdated)

Address feedback and fix test

d9cd23c

EgorBo marked this pull request as ready for review November 14, 2020 20:40

undo change in tests

641fcbf

EgorBo added 2 commits November 15, 2020 02:08

Address feedback

842b980

Clean up

b0e7e86

GrabYourPitchforks reviewed Nov 14, 2020

src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs (outdated)

Address feedback

7e007d6

GrabYourPitchforks reviewed Nov 15, 2020

GrabYourPitchforks approved these changes Nov 15, 2020

Address feedback

8004abd

GrabYourPitchforks mentioned this pull request Nov 15, 2020

StringComparer.OrdinalIgnoreCase does not ignore case for several Cyrillic letters when used with Dictionary #44681

Closed

danmoseley requested a review from stephentoub November 16, 2020 00:57

Merge branch 'master' of github.com:dotnet/runtime into fix-GetNonRan…

77a96bd

…domizedHashCodeOrdinalIgnoreCase

jkotas reviewed Nov 17, 2020

src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs (outdated)

jkotas reviewed Nov 17, 2020

src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs (outdated)

runfoapp bot mentioned this pull request Nov 17, 2020

Inability to unzip assets during build on Unix x64 #32805

Closed

Address Jan's feedback

bef2782

jkotas reviewed Nov 18, 2020

don't pass hash1 and hash2

561847f

jkotas mentioned this pull request Nov 18, 2020

Critical section level violation in Interop codebase #44114

Closed

Update Dictionary.Tests.cs

f699c0f

jkotas mentioned this pull request Nov 21, 2020

🔥🔥🔥 "Libraries Test Run checked coreclr Linux" timing out on all PRs #45061

Closed

jkotas approved these changes Nov 21, 2020

jkotas merged commit eb5df0d into dotnet:master Nov 21, 2020

github-actions bot mentioned this pull request Nov 21, 2020

[release/5.0] Handle non-ASCII strings in GetNonRandomizedHashCodeOrdinalIgnoreCase #45062

Merged

jkotas added the area-System.Runtime label Nov 22, 2020

GrabYourPitchforks mentioned this pull request Nov 22, 2020

Dictionary sometimes uses Ordinal hash code calculation instead of OrdinalIgnoreCase #44695

Closed

ghost locked as resolved and limited conversation to collaborators Dec 22, 2020
