Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extra locales to MS-ICU from CLDR-MS from NLS "MS-Only" culture data #91

Merged
merged 6 commits into from
Apr 9, 2021

Conversation

jefgen
Copy link
Member

@jefgen jefgen commented Apr 9, 2021

Summary

This change adds additional/extra locales to MS-ICU from CLDR-MS.

PR Checklist

  • I have verified that my change is specific to this fork and cannot be made upstream.
  • I am making a maintenance related change.
  • I am making a change that is related to usage internal to Microsoft.
  • I am making a change that is related to the Windows OS build of ICU.
  • CLA signed. If not, please see here to sign the CLA.

Detailed Description

This change rebuilds the MS-ICU data using CLDR-MS in order to add additional/extra locales from the updated CLDR-MS data.
These extra locales were the "MS-Only" locales from the NLS culture.xml data, which were ported to CLDR-MS.

I wrote a tool to dump out the list of MS-ICU and NLS locales, and then used the output to create a diff of the two sets of locales.

I uploaded an annotated diff of the locale lists here:
https://gist.github.com/jefgen/fee0eb88560f55993b90de72a3e250e9

Comparing the sets of locales, there were only 13 missing locales in MS-ICU compared to NLS.

The NLS specific locales that were missing (and which this change adds) are listed below:

  • bin-NG
  • en-029
  • en-ID
  • fr-029
  • ibb-NG
  • jv-Java-ID
  • kr-Latn-NG
  • ks-Deva-IN
  • la-VA
  • pap-029
  • tzm-Arab-MA
  • tzm-Latn-DZ
  • tzm-Tfng-MA

This change adds the missing locales to MS-ICU, meaning that all of the locales in NLS will now be present in MS-ICU as well (though they might have slightly different names though), which will address issue #64 about missing locales.

This change also adds additional aliases to MS-ICU as well in order to help with naming differences compared to the NLS locale names.

Since these locales had limited data, I had to patch the ICU tests to skip various tests for them, similar to some of the existing CLDR Seed locales. I added a new MSFT-Patch file for these changes.

The extra locale data also apparently pushed us just over the limit in icu/icu4c/source/tools/toolutil/package.h for the fixed-size STRING_STORE_SIZE buffer. I had to increase the size in order to get the data to build. I added another new MSFT-Patch file for this change as well.

@jefgen jefgen requested review from erik0686, axelandrejs, daniel-ju and a team April 9, 2021 00:06
@jefgen jefgen changed the title Add additional locales from CLDR-MS for previously MS-Only NLS locales Add extra locales to MS-ICU from CLDR-MS from NLS "MS-Only" culture data Apr 9, 2021
@daniel-ju
Copy link
Collaborator

Thanks for adding the patch files! Maybe we could consolidate the test patches together the next time we do an ingestion. :)

@jefgen
Copy link
Member Author

jefgen commented Apr 9, 2021

Maybe we could consolidate the test patches together the next time we do an ingestion. :)

Yeah, that's a good idea / point. We could/should consolidate the test patches, and I think we can likely also consolidate some of the header patches as well. There are 3 patches for header changes that could be one patch file for example.

I suppose if we wanted to, we could try to clean them up now, but it might be easier to wait until the next ingestion.
(We'd need to diff the original file versus our changes, etc.)

@jefgen jefgen merged commit 6eeeadb into master Apr 9, 2021
@jefgen jefgen deleted the user/jefgen/adding-extra-cldr-ms-locales branch April 9, 2021 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants