Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle non-ASCII strings in GetNonRandomizedHashCodeOrdinalIgnoreCase #44688

Merged

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Nov 14, 2020

Fixes #44681

@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@EgorBo EgorBo closed this Nov 14, 2020
@EgorBo EgorBo reopened this Nov 14, 2020
@EgorBo EgorBo changed the title GetNonRandomizedHashCodeOrdinalIgnoreCase: ignore second byte in chars Handle non-ASCII strings in GetNonRandomizedHashCodeOrdinalIgnoreCase Nov 14, 2020
@EgorBo EgorBo marked this pull request as ready for review November 14, 2020 20:40
@EgorBo
Copy link
Member Author

EgorBo commented Nov 14, 2020

@GrabYourPitchforks I'm not sure how to fix https://github.com/dotnet/runtime/blob/master/src/libraries/System.Collections/tests/Generic/Dictionary/HashCollisionScenarios/OutOfBoundsRegression.cs#L194-L196 test now - it relies on fact that we can easily generate collisions for a specific hash but actually 99.9.. % of such generated strings are non-ASCII so they go the slow path and don't produce collisions.

So I need to somehow generate 100 ascii string with the same hashcode 🤔

@GrabYourPitchforks
Copy link
Member

@EgorBo I think the only reason they're not producing the same hash code is that we're calling the Marvin routine. If we update the fallback logic to perform an uppercase conversion but still use the naïve bit-shifting routines already present in GetNonRandomizedHashCodeOrdinalIgnoreCase, the unit tests should pass.

@EgorBo
Copy link
Member Author

EgorBo commented Nov 14, 2020

@EgorBo I think the only reason they're not producing the same hash code is that we're calling the Marvin routine. If we update the fallback logic to perform an uppercase conversion but still use the naïve bit-shifting routines already present in GetNonRandomizedHashCodeOrdinalIgnoreCase, the unit tests should pass.

ah ok, let me rewrite it then, thanks!

@EgorBo
Copy link
Member Author

EgorBo commented Nov 14, 2020

@GrabYourPitchforks updated, could you please take a look if it's what you meant

Copy link
Member

@GrabYourPitchforks GrabYourPitchforks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I left some perf nit comments. These don't need to be addressed; they're mainly to point out areas of low hanging fruit just in case we did end up harming performance and we're looking for some easy ways to knock out a few percentage here and there.

@GrabYourPitchforks
Copy link
Member

I've reproed the unit test issue, one sec and I'll get a workaround out to you.

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Nov 15, 2020

@EgorBo The commit GrabYourPitchforks@113d1e6 in my private branch includes three changes that will be of interest here:

Might be worth merging the patch into this PR so that both issues can be closed at once?

(I was going to submit my patch as its own PR to address #44695, but since that patch relies on this PR being committed, if I were to submit it prematurely all of the unit tests would fail.)

@danmoseley
Copy link
Member

Is there a more limited version of the change that might be lower risk to backport?

@GrabYourPitchforks
Copy link
Member

The least code churn version of this change for servicing purposes would be:

  • Remove all of the code from GetNonRandomizedHashCodeOrdinalIgnoreCase and have it forward to the method Marvin.ComputeHash32OrdinalIgnoreCase; and
  • Take the RandomizedStringEqualityComparer.cs fix as-is from this PR; and
  • Suppress the tests in OutOfBoundsRegression.cs, as they'll start failing.

Together, these will essentially disable the perf optimization that was done in #36252, and Dictionary<...>(OrdinalIgnoreCase) performance will revert to what it was in netcoreapp3.1.

@danmoseley
Copy link
Member

Dictionary<...>(OrdinalIgnoreCase) performance will revert to what it was in netcoreapp3.1.

Hmm, that would be a significant takeback.

@GrabYourPitchforks
Copy link
Member

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

return (int)(hash1 + (hash2 * 1566083941));

NotAscii:
return GetNonRandomizedHashCodeOrdinalIgnoreCaseSlow(this, hash1, hash2);
Copy link
Member

@jkotas jkotas Nov 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it is correct to pass hash1 and hash2 into the slow method. We could have processed some number of characters already and so hash1 and hash2 may not have their original values.

It would be nice to add a test case that covers this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, indeed
I am not sure it can be tested anyhow other than hardcoding the expected hashcode value 🤔

@jkotas
Copy link
Member

jkotas commented Nov 18, 2020

The new test is failing on Mono. Could you please investigate?

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Nov 19, 2020

I used runfo to pull the logs from the helix machine, but there don't seem to be any failures recorded? The zip file I downloaded contains the script used to run the tests, but I don't see anything that resembles output from the test run. Will dig in further tomorrow.

@jkotas
Copy link
Member

jkotas commented Nov 19, 2020

I see this failures when I go to Details / View more details on Azure Pipelines / "go back" / Tests / select iOS failure:

https://dev.azure.com/dnceng/public/_build/results?buildId=890570&view=ms.vss-test-web.build-test-results-tab&runId=28497198&paneView=debug&resultId=138723

System.Collections.Generic.KeyNotFoundException : The given key 'А' was not present in the dictionary.


Stack trace
   at System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Int32, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].get_Item(String key)
   at System.Collections.Tests.Dictionary_Tests.DictionaryOrdinalIgnoreCaseCyrillicKeys()
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)

@EgorBo
Copy link
Member Author

EgorBo commented Nov 20, 2020

The new test is failing on Mono. Could you please investigate?

Ah, it's because iOS is always invariant atm (I'm integrating ICU there).

"Ф".Equals("ф", StringComparison.OrdinalIgnoreCase); // "false" in the Invariant mode

Will ignore that test.

@jkotas
Copy link
Member

jkotas commented Nov 21, 2020

Passed CI on Linux before. CI failures are #45061

@jkotas jkotas merged commit eb5df0d into dotnet:master Nov 21, 2020
@jkotas
Copy link
Member

jkotas commented Nov 21, 2020

/backport to release/5.0

@github-actions
Copy link
Contributor

Started backporting to release/5.0: https://github.com/dotnet/runtime/actions/runs/376611056

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StringComparer.OrdinalIgnoreCase does not ignore case for several Cyrillic letters when used with Dictionary
5 participants