Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[browser] HybridGlobalization allows different hashes for strings that return true for Equals #96400

Closed
ilonatommy opened this issue Jan 2, 2024 · 3 comments
Labels

Comments

@ilonatommy
Copy link
Member

ilonatommy commented Jan 2, 2024

We do not have a native, locale sensitive algorithm for HybridGlobalization, that would replace the ICU data. Implementing it manually in JS would slow down hashing mechanism. From this reason, we decided to use Invariant hasing and sort key functions in HybridGlobalization mode (see: #96354).
This approach has a few downsides, including a situation where two strings, e.g. s1 = "igloo", s2 = "İGLOO"
image
that are equal under "tr-TR" linguistic comparer (ICU or HybridGlobalization), produce:

bool linguisticComparer = new CultureInfo("tr-TR").CompareInfo.Compare(s1, s2, CompareOptions.IgnoreCase); // true

The HasCodes in ICU mode and HybridGlobalization mode are produced differently:

int hashCode1 = new CultureInfo("tr-TR").CompareInfo.GetHashCode(s1, CompareOptions.IgnoreCase);
int hashCode2 = new CultureInfo("tr-TR").CompareInfo.GetHashCode(s2, CompareOptions.IgnoreCase);
// true when ICU loaded (Hybrid mode off), false when HybridGlobalization mode on (reduced ICU loaded)

Equality function should have the same effect as GetHashCode comparison, however for HybridGlobalization this does not stand.

There are HashCodeLocalized test cases excluded for HG marked with this issue.

Possible fix:

  • switching HybridGlobalization off (this will increase the application size by loading additional ICU data)
  • implementing JS-based, locale-sensitive hashing (this will be much slower than Invariant or ICU hashing).
@ghost
Copy link

ghost commented Jan 2, 2024

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

We do not have a native, locale sensitive algorithm for HybridGlobalization, that would replace the ICU data. Implementing it manually in JS would slow down hashing mechanism. From this reason, we decided to use Invariant hasing and sort key functions in HybridGlobalization mode (see: #96354).
This approach has a few downsides, including a situation where two strings, e.g. s1 = "Hello", s2 = "He\u200dllo"
image
that are equal under linguistic comparer (ICU or HybridGlobalization), produce:

bool linguisticComparer = StringComparer.InvariantCulture.Equals(s1, s2); // true

The HasCodes in ICU mode and HybridGlobalization mode are produced differently:

bool areHasCodesSame = StringComparer.InvariantCulture.GetHashCode(s1) == StringComparer.InvariantCulture.GetHashCode(s2);
// true when ICU loaded (Hybrid mode off), false when HybridGlobalization mode on (reduced ICU loaded)

Equality function should have the same effect as GetHashCode comparison, however for HybridGlobalization this does not stand.

Possible fix:

  • switching HybridGlobalization off (this will increase the application size by loading additional ICU data)
  • implementing JS-based, locale-sensitive hashing (this will be much slower than Invariant or ICU hashing).
Author: ilonatommy
Assignees: -
Labels:

arch-wasm, area-System.Globalization

Milestone: -

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 2, 2024
@pavelsavara
Copy link
Member

I think this breaks contract between Equals and GetHashCode in bad way.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 5, 2024
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 16, 2024
@ilonatommy
Copy link
Member Author

Closing - the PR that was going to introduce this behavior got changed.

@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jan 18, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants