-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix creating cultures with extensions in the name #87114
Fix creating cultures with extensions in the name #87114
Conversation
Tagging subscribers to this area: @dotnet/area-system-globalization Issue DetailsThe .NET supports culture names that closely follow the BCP 47 format specifications, which are in the format of language-script-region-extensions-collation. However, there are slight differences between the two. For example, the culture name Currently, when we interact with ICU, we transform the .NET culture name to a format that ICU understands, and then .NET normalizes the final culture name back to the .NET format from the name returned by ICU. However, we currently only handle the collation extension and do not handle any other extensions, resulting in the returned name not being normalized. For example, if we use a name like To fix this issue, we need to normalize the culture names, taking into account the extension parts
|
src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
Outdated
Show resolved
Hide resolved
Span<char> buffer = stackalloc char[ICU_ULOC_FULLNAME_CAPACITY]; | ||
int bufferIndex = 0; | ||
|
||
for (int i = 0; i < name.Length && bufferIndex < ICU_ULOC_FULLNAME_CAPACITY; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if bufferIndex ends up being >= ICU_ULOC_FULLNAME_CAPACITY? If that's possible, won't we end up truncating some data? And if it's not possible, should we instead throw an exception and/or assert if it occurs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost impossible as ICU restricts the names to that length, and we'll not grow the names more than ICU lengths. I have this check only to tolerate if ICU does anything wrong. The worst case we'll truncate the name, but nothing will affect the behavior more than showing the culture name is truncated. The reason is because we don't use this truncated name in interop with ICU, instead we use the ICU well-formed name stored in another field.
src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
Show resolved
Hide resolved
endOfCollation = name.Length; | ||
} | ||
|
||
int length = Math.Min(8, endOfCollation - collationIndex); // Windows doesn't allow collation names longer than 8 characters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the collation name is > 8? We end up truncating it, and that's ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that will be ok, this is not going to affect the behavior because we don't use the truncated name when interop with ICU.
src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
Outdated
Show resolved
Hide resolved
/backport to release/7.0-staging |
Started backporting to release/7.0-staging: https://github.com/dotnet/runtime/actions/runs/5182408638 |
/backport to release/6.0-staging |
Started backporting to release/6.0-staging: https://github.com/dotnet/runtime/actions/runs/5182443774 |
@tarekgh backporting to release/6.0-staging failed, the patch most likely resulted in conflicts: $ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: Fix creating cultures with extensions in the name
Using index info to reconstruct a base tree...
M src/libraries/System.Globalization/tests/CultureInfo/CultureInfoCtor.cs
M src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
CONFLICT (content): Merge conflict in src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
Auto-merging src/libraries/System.Globalization/tests/CultureInfo/CultureInfoCtor.cs
CONFLICT (content): Merge conflict in src/libraries/System.Globalization/tests/CultureInfo/CultureInfoCtor.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Fix creating cultures with extensions in the name
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128 Please backport manually! |
@tarekgh an error occurred while backporting to release/6.0-staging, please check the run log for details! Error: git am failed, most likely due to a merge conflict. |
The .NET supports culture names that closely follow the BCP 47 format specifications, which are in the format of language-script-region-extensions-collation. However, there are slight differences between the two. For example, the culture name
en-US
conforms to the standards and is supported by .NET. Things become interesting when we use the extension parts in the culture name. In BCP 47, extensions are defined by starting them with either-u-
or-t-
. However, .NET only supports one type of extension, which is collation, and it uses a different format for it. For instance, the standard formatde-DE-u-co-phonebk
defines thede-DE
culture with the phone book collation behavior. In .NET, the culture name for the same collation would bede-DE_phonebk
. On the other hand, ICU uses the name formatde_DE@collation=phonebk
. As a result, BCP 47 uses the-u-co-
prefix to define the collation, .NET uses_
as the collation name prefix, and ICU uses@collation=
as the prefix.Currently, when we interact with ICU, we transform the .NET culture name to a format that ICU understands, and then .NET normalizes the final culture name back to the .NET format from the name returned by ICU. However, we currently only handle the collation extension and do not handle any other extensions, resulting in the returned name not being normalized. For example, if we use a name like
xx-u-XX
, ICU normalizes it toxx@XX=Yes
, but we do not normalize this name to a format that works with .NET. On Windows, when we interoperate with the Windows OS using such a culture name to retrieve the default user settings, the namexx@XX=Yes
will be rejected and an exception will be thrown. On Linux/MacOS, we continue to create a culture with such a name, but eventually, an exception will be thrown when using this culture with the resource manager, as it is an invalid name for resources.To fix this issue, we need to normalize the culture names, taking into account the extension parts
-u-
and-t-
. For the collation extension, we will continue to normalize it using_
while for other extensions, we will keep the original form. For example, in the case ofxx-u-XX
, we will keep the name as it is.#86441