Non-ASCII chars shouldn't compare equal to ASCII chars under OrdinalIgnoreCase comparison #32247
Comments
I would recommend doing that for ordinal casing but not for invariant. The reason is that invariant casing is still treated as cultural casing, not ordinal, and it makes sense for it to do the right casing according to the Unicode standard, since most people use it in scenarios that are not specifically about comparing two strings securely. Ordinal comparison should be used for that instead. |
Just to clarify, you're suggesting:
string.Equals("administrator", "adminiſtrator", StringComparison.OrdinalIgnoreCase) // <-- with this change, will return FALSE on all platforms
"administrator".ToUpperInvariant() == "adminiſtrator".ToUpperInvariant() // <-- will continue to return FALSE on Windows, TRUE on Linux
This means that I'm ok with your suggestion here if you think we can swing the breaking change. :) |
@GrabYourPitchforks yes you have articulated it accurately. |
@tarekgh I've updated the proposal text to match your suggestions. Thanks! Do you suppose we'd need similar APIs |
Char already has ToUpper/ToLower which should match the ordinal casing. Also, it has ToUpperInvariant/ToLowerInvariant for invariant case. I think the current APIs are enough for doing any needed operation. |
There's no such API, unfortunately. :( After this change, on Linux:
char.ToUpperInvariant('ſ') // <-- returns 'S'
"ſ".ToUpperInvariant() // <-- returns "S"
string.Equals("ſ", "S", StringComparison.OrdinalIgnoreCase) // <-- returns FALSE |
Right, but I expect:
string.Equals("ſ", "S", StringComparison.InvariantCultureIgnoreCase) // <-- returns true
Is that right? |
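The calls discussed in the comments above can be compared side by side on a given machine. This is only a diagnostic sketch: the results depend on the operating system, the globalization backend, and the .NET version, so no expected output is claimed. The class name `CasingProbe` and the `Describe` helper are illustrative and not part of any API.

```csharp
using System;

static class CasingProbe
{
    // Hypothetical helper: formats a char as its code point so the output
    // does not depend on the console encoding.
    static string Describe(char c) => "U+" + ((int)c).ToString("X4");

    static void Main()
    {
        const char longS = '\u017F'; // LATIN SMALL LETTER LONG S ('ſ')

        // Char-level invariant upper-casing.
        Console.WriteLine("char.ToUpperInvariant: " + Describe(char.ToUpperInvariant(longS)));

        // String-level invariant upper-casing, printed one code unit at a time.
        foreach (char c in longS.ToString().ToUpperInvariant())
        {
            Console.WriteLine("string.ToUpperInvariant: " + Describe(c));
        }

        // The two comparisons the thread contrasts.
        Console.WriteLine("OrdinalIgnoreCase: " +
            string.Equals("\u017F", "S", StringComparison.OrdinalIgnoreCase));
        Console.WriteLine("InvariantCultureIgnoreCase: " +
            string.Equals("\u017F", "S", StringComparison.InvariantCultureIgnoreCase));
    }
}
```

Running it on Windows and on Linux, and on runtimes before and after the change, is one way to see which of the behaviors described in this thread applies.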
I guess my question is that in theory it should be possible to write the following code:
public static bool AreStringsEqualOrdinalCase(string a, string b)
{
    if (a.Length != b.Length) { return false; }
    for (int i = 0; i < a.Length; i++)
    {
        char charA = a[i];
        char charB = b[i];
        if (!CharsAreEqualOrdinalIgnoreCase(charA, charB)) { return false; }
    }
    return true;
}
And that it should behave identically to What is the |
CharsAreEqualOrdinalIgnoreCase should be equivalent to |
runtime/src/libraries/System.Private.CoreLib/src/System/Char.cs Lines 350 to 353 in 0e0e852
We have no "ordinal" equivalent API. The closest we have is |
Ah, got it. I didn't know char is calling into the current culture. That is sad, as we already have methods on TextInfo that do the cultural casing. Thanks for the clarification. I agree with the proposal. |
@GrabYourPitchforks I marked this issue with the tag api-suggestion. Do you think we need to finish this in 5.0, or can it be moved to the future? |
There's no API being proposed here. I think we should still tackle this in the 5.0 timeframe. It would dovetail nicely with the ICU work you're already doing. |
@GrabYourPitchforks this comment #32247 (comment) is kind of suggesting new APIs too. I removed the API suggestion label anyway. |
@tarekgh Ah, got it. I'll open a separate API proposal for those. |
@GrabYourPitchforks if we are going to expose new APIs, shouldn't we break the current behavior? and we ask devs to use the new APIs for correct scenarios? |
@tarekgh, not quite sure I follow. You're suggesting that both the current APIs and the new APIs get the new behavior? |
I am trying to say: don't change the current APIs' behavior, and have the new behavior in the newly exposed APIs. |
So the current APIs would continue to have the behavior described below?
Console.WriteLine(string.Equals("administrator", "adminiſtrator", StringComparison.OrdinalIgnoreCase)); // prints True (on Linux) |
Right. I think this can be acceptable, and the new proposed APIs can provide more control over the comparison as desired. |
I guess one of the things bugging me is that Unicode defines 4 ways to perform case-insensitive text comparison, and Our If we wanted to expose the 4 case-insensitive text comparison algorithms as their own separate APIs I can be sold on that idea. Edit - For reference, the four kinds of case-insensitive matching defined by Unicode are:
|
I agree, then, with fixing them as you described in #32247 (comment). |
@tarekgh One more quick question on On Windows, the behavior of these APIs is that they iterate through the string char-by-char, changing the case of each char in isolation. Each input char maps exactly to one output char (or to itself, if no case conversion exists). So the result string has the exact same length as the input string. Today's behavior for runtime/src/libraries/Native/Unix/System.Globalization.Native/pal_casing.c Lines 40 to 49 in 75036ff
ICU's normal case changing APIs (such as I think at the moment quite a bit of code relies on |
I would expose new APIs to support a different output length, for compat reasons. I have seen usage before where consumers of the current APIs always assumed the output has the same length as the input. |
Do you have examples? We should look at how bad this is. It is not pretty to have old broken and new correct versions of the same APIs. It has negative value in the long run. We had the same problem with the floating point formatting changes, and we chose to fix the existing APIs. |
Within the runtime and libraries: runtime/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.Globalization.cs Lines 317 to 319 in 1124c1a
runtime/src/libraries/System.Private.CoreLib/src/System/Marvin.OrdinalIgnoreCase.cs Lines 81 to 82 in d5be855
runtime/src/libraries/System.Private.CoreLib/src/System/Globalization/TextInfo.cs Line 397 in d5be855
Within application code, I assume the biggest pit of failure would be people who call |
|
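As an illustration of the same-length assumption discussed above, here is a minimal sketch of a caller that sizes its output buffer to the input length before upper-casing. It uses the span-based `ToUpperInvariant` extension method, which returns the number of chars written or -1 when the destination is too small. The class name and the sample input are made up for this example, and no output is claimed because it depends on the casing behavior of the runtime in use.

```csharp
using System;

static class SameLengthAssumption
{
    static void Main()
    {
        // "straße": U+00DF has no single-char uppercase under simple case mapping,
        // while a full case mapping would expand it to two chars.
        ReadOnlySpan<char> source = "stra\u00DFe".AsSpan();

        // The buffer is sized to the input length, which is only safe if every
        // input char maps to exactly one output char.
        Span<char> destination = new char[source.Length];

        // Returns the number of chars written, or -1 if destination is too small.
        int written = source.ToUpperInvariant(destination);

        Console.WriteLine("input length: " + source.Length + ", chars written: " + written);
    }
}
```

If the casing APIs ever produced output longer than the input, a caller written like this would hit the "destination too small" path, which is the kind of breakage the comments above are weighing.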
This is fixed by the PR #40910. |
Under ICU, there are some non-ASCII code points that become ASCII code points after a simple case mapping transformation.
Since it's common for applications to use StringComparison.OrdinalIgnoreCase when comparing things like usernames, this could be a pit of failure for those applications, as it could lead to the following behavior at runtime.
A fairly straightforward fix would be to prevent non-ASCII chars and ASCII chars from being equal under an OrdinalIgnoreCase comparison. It would mean that OrdinalIgnoreCase is no longer a direct wrapper around ICU's case mapping / case folding APIs, but it would bring the behavior more in line with what developers have come to expect over .NET's history.
With this proposal, ToUpperInvariant and ToLowerInvariant would be a direct wrapper around ICU's underlying simple case mapping APIs, and it wouldn't special-case any characters.
Related: #27540
The following APIs would be affected:
string.Equals, string.Compare, string.GetHashCode, and any other APIs which might accept StringComparison.OrdinalIgnoreCase as a parameter
StringComparer.OrdinalIgnoreCase.Equals and StringComparer.OrdinalIgnoreCase.GetHashCode
TextInfo.Compare and similar APIs which might accept CompareOptions.OrdinalIgnoreCase
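The rule proposed in this issue (an ASCII char and a non-ASCII char never compare equal under OrdinalIgnoreCase) can be sketched as follows. This is a hedged illustration, not the runtime's implementation: the helper names are hypothetical, and handling non-ASCII pairs with `char.ToUpperInvariant` is an assumption made only to keep the example short. It also compares UTF-16 code units one at a time and does not treat surrogate pairs specially.

```csharp
using System;

static class OrdinalIgnoreCaseSketch
{
    static bool IsAscii(char c) => c <= '\u007F';

    // Upper-cases only 'a'..'z'; every other char is returned unchanged.
    static char ToUpperAscii(char c) => (c >= 'a' && c <= 'z') ? (char)(c - 0x20) : c;

    // Hypothetical helper, not a runtime API.
    static bool CharsAreEqualOrdinalIgnoreCase(char a, char b)
    {
        if (a == b) { return true; }

        // Core of the proposal: ASCII never equals non-ASCII.
        if (IsAscii(a) != IsAscii(b)) { return false; }

        if (IsAscii(a)) { return ToUpperAscii(a) == ToUpperAscii(b); }

        // Assumption for this sketch: non-ASCII pairs use invariant simple casing.
        return char.ToUpperInvariant(a) == char.ToUpperInvariant(b);
    }

    static bool AreStringsEqualOrdinalIgnoreCase(string a, string b)
    {
        if (a.Length != b.Length) { return false; }
        for (int i = 0; i < a.Length; i++)
        {
            if (!CharsAreEqualOrdinalIgnoreCase(a[i], b[i])) { return false; }
        }
        return true;
    }

    static void Main()
    {
        // ASCII-only case difference: prints True.
        Console.WriteLine(AreStringsEqualOrdinalIgnoreCase("administrator", "ADMINISTRATOR"));

        // 's' versus U+017F (long s): prints False regardless of platform,
        // because the ASCII/non-ASCII check runs before any case mapping.
        Console.WriteLine(AreStringsEqualOrdinalIgnoreCase("administrator", "admini\u017Ftrator"));
    }
}
```

Because the ASCII check happens before any call into the platform's casing tables, the username example from this issue gives the same answer on Windows and Linux in this sketch.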
/cc @tarekgh