This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Add support for invariant casing in PAL #24597

Merged: 6 commits, May 17, 2019

MichalStrehovsky
Member

unicodedata.cpp based on UnicodeData.txt v11.0.

Fixes #20616.

@MichalStrehovsky
Member Author

Cc @sergiy-k

@@ -102,12 +102,12 @@ static WCHAR MapChar(WCHAR wc, DWORD dwFlags, LocaleID lcid)

     if (dwFlags == LCMAP_UPPERCASE)
     {
-        wTmp = toupper(wc);
+        wTmp = ToUpperInvariant(wc);

Member

Is the upper casing actually used anywhere?

Member Author

It's used for invariant SString comparison and hashing.
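
To illustrate why comparison and hashing both depend on the casing routine, here is a minimal sketch. It is not the SString code: `ToUpperSketch`, `EqualsIgnoreCase`, `HashIgnoreCase`, and `CheckIgnoreCase` are hypothetical names, the casing helper only handles ASCII, and FNV-1a is an assumed hash chosen for brevity. The point it shows is that strings which compare equal ignoring case must also hash equal, so both operations have to normalize through the same mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an invariant upper-casing routine.
// It only handles ASCII a-z; a real routine would be table-driven.
inline char16_t ToUpperSketch(char16_t c)
{
    return (c >= u'a' && c <= u'z') ? static_cast<char16_t>(c - (u'a' - u'A')) : c;
}

// Case-insensitive equality: compare the upper-cased code units pairwise.
inline bool EqualsIgnoreCase(const std::u16string& a, const std::u16string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (ToUpperSketch(a[i]) != ToUpperSketch(b[i]))
            return false;
    }
    return true;
}

// Case-insensitive hash: 32-bit FNV-1a over the upper-cased code units,
// feeding the low byte of each unit first, then the high byte.
inline std::uint32_t HashIgnoreCase(const std::u16string& s)
{
    std::uint32_t h = 2166136261u;
    for (char16_t c : s)
    {
        const char16_t u = ToUpperSketch(c);
        h = (h ^ (u & 0xFFu)) * 16777619u;
        h = (h ^ (u >> 8)) * 16777619u;
    }
    return h;
}

// Usage: strings that differ only by case compare equal and hash equal.
inline void CheckIgnoreCase()
{
    assert(EqualsIgnoreCase(u"SString", u"sstring"));
    assert(HashIgnoreCase(u"SString") == HashIgnoreCase(u"sstring"));
}
```

If the comparison used one mapping and the hash another, two strings could compare equal yet land in different hash buckets.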

Member

@tarekgh Does it matter whether case-insensitive comparison and hashing normalizes to lower-case or upper-case?

Member

With full globalization support, we would usually compare the strings' sort keys. Since that is not available here, I prefer upper-casing the strings rather than lower-casing them. The reason is the Greek sigma: there are two lower-case sigmas that map to a single upper-case character.
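
The sigma argument can be made concrete with a small sketch. The code points are the simple mappings from UnicodeData.txt (U+03C2 and U+03C3 both upper-case to U+03A3, and U+03A3 lower-cases to U+03C3); the function names are hypothetical and cover only these three characters.

```cpp
#include <cassert>

constexpr char16_t kCapitalSigma = 0x03A3; // Σ
constexpr char16_t kFinalSigma   = 0x03C2; // ς
constexpr char16_t kSmallSigma   = 0x03C3; // σ

// Hypothetical helpers: simple case mappings restricted to the sigma family.
constexpr char16_t SigmaToUpper(char16_t c)
{
    return (c == kFinalSigma || c == kSmallSigma) ? kCapitalSigma : c;
}

constexpr char16_t SigmaToLower(char16_t c)
{
    return (c == kCapitalSigma) ? kSmallSigma : c;
}

inline void CheckSigmaFolding()
{
    // Upper-casing folds both lower-case sigmas onto the same character...
    assert(SigmaToUpper(kFinalSigma) == SigmaToUpper(kSmallSigma));
    // ...while lower-casing leaves them distinct, so a comparison that
    // normalizes to lower case treats them as different characters.
    assert(SigmaToLower(kFinalSigma) != SigmaToLower(kSmallSigma));
}
```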

Member Author

@jkotas, if you insist on having a simpler table, in light of what Tarek says, it kind of looks like most places that call into LowerCase (including the reflection path) should actually be doing UpperCase.

We can fix all those places up and do a simpler table - the thing I actually started with in #21169 before the wild goose chase.

My preference would be not to embark on another wild goose chase though and just go with this table.

Member

> should actually be doing UpperCase

Yes, it would be useful to fix this for consistency. It is an improvement to go from 3 clusters of potential bugs to 2 clusters of potential bugs.

I am less worried about the size of the table. Of course, smaller is better.

Member Author

> Yes, it would be useful to fix this for consistency. It is an improvement to go from 3 clusters of potential bugs to 2 clusters of potential bugs.

It kind of looks like we have uses of SString::LowerCase that we won't be able to get rid of. So the table to do lower casing needs to stay.

I looked at switching reflection to use UpperCase, but this still does nothing for the Greek sigma because, apparently, the LCMapStringEx call we make here

int iRet = ::LCMapStringEx(lcid, dwFlags, &wc, 1, &wTmp, 1, NULL, NULL, 0);

(with LOCALE_NAME_INVARIANT, no less) says that the upper-case form of the lower-case final sigma is the lower-case final sigma.

I don't know why it's the case because we even document the final sigma handling here: https://docs.microsoft.com/en-us/windows/desktop/intl/handling-sorting-in-your-applications. Alas, the final sigma would only work when using the table added in this pull request, so it will only work on Unix.
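
For contrast with the Windows behavior described above, here is a minimal sketch of a table-driven upper-casing lookup in which the final sigma does map to the capital sigma. The table layout, the four sample entries, and the names `CaseMapping`, `kUpperTable`, `ToUpperFromTable`, and `CheckFinalSigma` are assumptions for illustration; this is not the layout of the table added in unicodedata.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical table entry: a code unit and its simple upper-case mapping.
struct CaseMapping
{
    char16_t from;
    char16_t upper;
};

// Tiny illustrative excerpt, kept sorted by `from` so it can be binary-searched.
static const CaseMapping kUpperTable[] = {
    { 0x0061, 0x0041 }, // a -> A
    { 0x00E9, 0x00C9 }, // é -> É
    { 0x03C2, 0x03A3 }, // ς (final sigma) -> Σ
    { 0x03C3, 0x03A3 }, // σ -> Σ
};

// Returns the mapped character, or the input unchanged when it has no entry.
inline char16_t ToUpperFromTable(char16_t c)
{
    const CaseMapping* first = std::begin(kUpperTable);
    const CaseMapping* last = std::end(kUpperTable);
    const CaseMapping* it = std::lower_bound(
        first, last, c,
        [](const CaseMapping& entry, char16_t value) { return entry.from < value; });
    return (it != last && it->from == c) ? it->upper : c;
}

// Usage: with this table, the final sigma upper-cases to the capital sigma.
inline void CheckFinalSigma()
{
    assert(ToUpperFromTable(0x03C2) == 0x03A3);
}
```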

I'm tempted to call this out of scope and leave the reflection at LowerCase.

Member

Ok

src/pal/inc/pal.h (outdated, resolved)

Member

jkotas left a comment

Thanks!

@sergiy-k

@MichalStrehovsky, thank you!

jkotas merged commit c553192 into dotnet:master on May 17, 2019
MichalStrehovsky deleted the invariant branch on May 17, 2019 at 03:46
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
unicodedata.cpp based on UnicodeData.txt v11.0.


Commit migrated from dotnet/coreclr@c553192