-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Add support for invariant casing in PAL #24597
Conversation
unicodedata.cpp based on UnicodeData.txt v11.0.
Cc @sergiy-k |
src/utilcode/sstring.cpp
Outdated
@@ -102,12 +102,12 @@ static WCHAR MapChar(WCHAR wc, DWORD dwFlags, LocaleID lcid) | |||
|
|||
if (dwFlags == LCMAP_UPPERCASE) | |||
{ | |||
wTmp = toupper(wc); | |||
wTmp = ToUpperInvariant(wc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the upper casing actually used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used for invariant SString
comparison and hashing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tarekgh Does it matter whether case-insensitive comparison and hashing normalizes to lower-case or upper-case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the full globalization support situation, usually use the sort keys of the strings. Considering this is not available here, I prefer upper case the strings (and not the lower casing). This is because of the Greek sigma cases which is we have 2 lower cases sigma mapped to one upper case character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas, if you insist on having a simpler table, in light of what Tarek says, it kind of looks like most places that call into LowerCase
(including the reflection path) should actually be doing UpperCase
.
We can fix all those places up and do a simpler table - the thing I actually started with in #21169 before the wild goose chase.
My preference would be not to embark on another wild goose chase though and just go with this table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should actually be doing
UpperCase
Yes, we would be useful to fix this for consistency. It is an improvement to go from 3 clusters of potential bugs to 2 clusters of potential bugs.
I am less worried about the size of the table. Of course, smaller is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we would be useful to fix this for consistency. It is an improvement to go from 3 clusters of potential bugs to 2 clusters of potential bugs.
It kind of looks like we have uses of SString::LowerCase that we won't be able to get rid of. So the table to do lower casing needs to stay.
I looked at switching reflection to use UpperCase, but this still does nothing for the Greek sigma because apparently, LCMapStringEx
we call here
coreclr/src/utilcode/sstring.cpp
Line 93 in a28b25a
int iRet = ::LCMapStringEx(lcid, dwFlags, &wc, 1, &wTmp, 1, NULL, NULL, 0); |
(with LOCALE_NAME_INVARIANT
no less) says that the upper case form of the lower case final sigma is lower case final sigma.
I don't know why it's the case because we even document the final sigma handling here: https://docs.microsoft.com/en-us/windows/desktop/intl/handling-sorting-in-your-applications. Alas, the final sigma would only work when using the table added in this pull request, so it will only work on Unix.
I'm tempted to call this out of scope and leave the reflection at LowerCase
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@MichalStrehovsky, thank you! |
unicodedata.cpp based on UnicodeData.txt v11.0. Commit migrated from dotnet/coreclr@c553192
unicodedata.cpp based on UnicodeData.txt v11.0.
Fixes #20616.