Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix creating cultures with extensions in the name #87114

Merged
merged 6 commits into from
Jun 6, 2023

Conversation

tarekgh
Copy link
Member

@tarekgh tarekgh commented Jun 4, 2023

The .NET supports culture names that closely follow the BCP 47 format specifications, which are in the format of language-script-region-extensions-collation. However, there are slight differences between the two. For example, the culture name en-US conforms to the standards and is supported by .NET. Things become interesting when we use the extension parts in the culture name. In BCP 47, extensions are defined by starting them with either -u- or -t-. However, .NET only supports one type of extension, which is collation, and it uses a different format for it. For instance, the standard format de-DE-u-co-phonebk defines the de-DE culture with the phone book collation behavior. In .NET, the culture name for the same collation would be de-DE_phonebk. On the other hand, ICU uses the name format de_DE@collation=phonebk. As a result, BCP 47 uses the -u-co- prefix to define the collation, .NET uses _ as the collation name prefix, and ICU uses @collation= as the prefix.

Currently, when we interact with ICU, we transform the .NET culture name to a format that ICU understands, and then .NET normalizes the final culture name back to the .NET format from the name returned by ICU. However, we currently only handle the collation extension and do not handle any other extensions, resulting in the returned name not being normalized. For example, if we use a name like xx-u-XX, ICU normalizes it to xx@XX=Yes, but we do not normalize this name to a format that works with .NET. On Windows, when we interoperate with the Windows OS using such a culture name to retrieve the default user settings, the name xx@XX=Yes will be rejected and an exception will be thrown. On Linux/MacOS, we continue to create a culture with such a name, but eventually, an exception will be thrown when using this culture with the resource manager, as it is an invalid name for resources.

To fix this issue, we need to normalize the culture names, taking into account the extension parts -u- and -t-. For the collation extension, we will continue to normalize it using _ while for other extensions, we will keep the original form. For example, in the case of xx-u-XX, we will keep the name as it is.

#86441

@ghost
Copy link

ghost commented Jun 4, 2023

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details

The .NET supports culture names that closely follow the BCP 47 format specifications, which are in the format of language-script-region-extensions-collation. However, there are slight differences between the two. For example, the culture name en-US conforms to the standards and is supported by .NET. Things become interesting when we use the extension parts in the culture name. In BCP 47, extensions are defined by starting them with either -u- or -t-. However, .NET only supports one type of extension, which is collation, and it uses a different format for it. For instance, the standard format de-DE-u-co-phonebk defines the de-DE culture with the phone book collation behavior. In .NET, the culture name for the same collation would be de-DE_phonebk. On the other hand, ICU uses the name format de_DE@collation=phonebk. As a result, BCP 47 uses the -u-co- prefix to define the collation, .NET uses _ as the collation name prefix, and ICU uses @collation= as the prefix.

Currently, when we interact with ICU, we transform the .NET culture name to a format that ICU understands, and then .NET normalizes the final culture name back to the .NET format from the name returned by ICU. However, we currently only handle the collation extension and do not handle any other extensions, resulting in the returned name not being normalized. For example, if we use a name like xx-u-XX, ICU normalizes it to xx@XX=Yes, but we do not normalize this name to a format that works with .NET. On Windows, when we interoperate with the Windows OS using such a culture name to retrieve the default user settings, the name xx@XX=Yes will be rejected and an exception will be thrown. On Linux/MacOS, we continue to create a culture with such a name, but eventually, an exception will be thrown when using this culture with the resource manager, as it is an invalid name for resources.

To fix this issue, we need to normalize the culture names, taking into account the extension parts -u- and -t-. For the collation extension, we will continue to normalize it using _ while for other extensions, we will keep the original form. For example, in the case of xx-u-XX, we will keep the name as it is.

#86441

Author: tarekgh
Assignees: tarekgh
Labels:

area-System.Globalization

Milestone: -

@tarekgh tarekgh added this to the 8.0.0 milestone Jun 4, 2023
Span<char> buffer = stackalloc char[ICU_ULOC_FULLNAME_CAPACITY];
int bufferIndex = 0;

for (int i = 0; i < name.Length && bufferIndex < ICU_ULOC_FULLNAME_CAPACITY; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if bufferIndex ends up being >= ICU_ULOC_FULLNAME_CAPACITY? If that's possible, won't we end up truncating some data? And if it's not possible, should we instead throw an exception and/or assert if it occurs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost impossible as ICU restricts the names to that length, and we'll not grow the names more than ICU lengths. I have this check only to tolerate if ICU does anything wrong. The worst case we'll truncate the name, but nothing will affect the behavior more than showing the culture name is truncated. The reason is because we don't use this truncated name in interop with ICU, instead we use the ICU well-formed name stored in another field.

endOfCollation = name.Length;
}

int length = Math.Min(8, endOfCollation - collationIndex); // Windows doesn't allow collation names longer than 8 characters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the collation name is > 8? We end up truncating it, and that's ok?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that will be ok, this is not going to affect the behavior because we don't use the truncated name when interop with ICU.

@tarekgh
Copy link
Member Author

tarekgh commented Jun 5, 2023

/backport to release/7.0-staging

@github-actions
Copy link
Contributor

github-actions bot commented Jun 5, 2023

Started backporting to release/7.0-staging: https://github.com/dotnet/runtime/actions/runs/5182408638

@tarekgh
Copy link
Member Author

tarekgh commented Jun 5, 2023

/backport to release/6.0-staging

@github-actions
Copy link
Contributor

github-actions bot commented Jun 5, 2023

Started backporting to release/6.0-staging: https://github.com/dotnet/runtime/actions/runs/5182443774

@github-actions
Copy link
Contributor

github-actions bot commented Jun 5, 2023

@tarekgh backporting to release/6.0-staging failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Fix creating cultures with extensions in the name
Using index info to reconstruct a base tree...
M	src/libraries/System.Globalization/tests/CultureInfo/CultureInfoCtor.cs
M	src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
CONFLICT (content): Merge conflict in src/libraries/System.Private.CoreLib/src/System/Globalization/CultureData.Icu.cs
Auto-merging src/libraries/System.Globalization/tests/CultureInfo/CultureInfoCtor.cs
CONFLICT (content): Merge conflict in src/libraries/System.Globalization/tests/CultureInfo/CultureInfoCtor.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Fix creating cultures with extensions in the name
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@github-actions
Copy link
Contributor

github-actions bot commented Jun 5, 2023

@tarekgh an error occurred while backporting to release/6.0-staging, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

@tarekgh tarekgh merged commit 2248ebd into dotnet:main Jun 6, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants