Implements ICU version of getDefaultLocale, fixes up ICU extension handling #3820

jackhorton · 2017-09-27T06:58:45Z

dilijev

dilijev · 2017-09-27T07:00:44Z

lib/Runtime/Library/InJavascript/Intl.js

+                locale = callInstanceFunc(StringInstanceReplace, locale, match[2], "");
+            }
+        } else {
+            // Windows' getDefaultLocale() will return a weird RFC4646 langtag


lol -- "weird" is a funny way of saying something from an outdated standard, but yeah

nit: this is a quirk of WinGlob not Windows per se, but the program logic makes that clear

dilijev · 2017-09-27T07:03:03Z

Tell me more about the fix to supportedLocalesOf?

jackhorton · 2017-09-27T19:02:44Z

Sorry, it was actually an issue in resolveLocaleHelper that only surfaced in supportedLocalesOf. The fix changes the ICU path to append only "-" to the string instead of "-u-", because the getExtensions JS fallback returns ["u", "co", "phonebk"] for "de-u-co-phonebk", whereas I think the WinGlob native implementation returns just ["co", "phonebk"].

dilijev · 2017-09-27T22:57:36Z

@jackhorton I think maybe the WinGlob platform implementation of getExtensions actually returns an array of "key-value" sequences like "co-phonebk" and ignores the singletons like "-u" entirely.

I noticed this issue as well and added a comment to the code in my PR. Maybe it is now effectively handled by your change -- we can clean up later.
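To make the difference in return shapes concrete, here is a minimal sketch of how the separator choice interacts with a fallback that keeps the "u" singleton. It assumes a simplified extension regex (SIMPLE_EXT_RE) and two hypothetical helpers, fallbackGetExtensions and rebuildLocale; none of these names come from Intl.js, and the real LANG_TAG_EXT_RE is likely more thorough.

```javascript
// Simplified stand-in for the extension regex: a singleton subtag followed
// by one or more 2-8 character subtags, anchored at the end of the tag.
// Assumption: the real LANG_TAG_EXT_RE in Intl.js is more complete.
const SIMPLE_EXT_RE = /-[0-9a-wy-z](?:-[0-9a-z]{2,8})+$/i;

// Hypothetical fallback: split the extension part on "-" and drop empties.
// Note that the "u" singleton stays in the result.
function fallbackGetExtensions(locale) {
    const match = SIMPLE_EXT_RE.exec(locale);
    if (!match) {
        return [];
    }
    return match[0].split("-").filter(x => !!x);
}

// Hypothetical helper: re-attach extensions to a base tag with a separator.
function rebuildLocale(base, extensions, separator) {
    return extensions.length === 0 ? base : base + separator + extensions.join("-");
}

const exts = fallbackGetExtensions("de-u-co-phonebk");
console.log(exts);                             // [ 'u', 'co', 'phonebk' ]
console.log(rebuildLocale("de", exts, "-"));   // de-u-co-phonebk
console.log(rebuildLocale("de", exts, "-u-")); // de-u-u-co-phonebk (duplicate singleton)
```

If the platform instead returned only ["co", "phonebk"], the "-u-" separator would be the right one, which is why the two code paths need different handling.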

dilijev · 2017-09-27T23:12:17Z

lib/Runtime/Library/InJavascript/Intl.js

-        return platform.builtInRegexMatch(GetDefaultLocale(), /([^_]*).*/)[1];
+        if (isPlatformUsingICU) {
+            const def = platform.getDefaultLocale();
+            const match = platform.builtInRegexMatch(def, LANG_TAG_EXT_RE);


LANG_TAG_EXT_RE

Ensure this regex is not undefined first. If it is undefined, run the initializer function. This pattern is used elsewhere.

e.g.:
if (!LANG_TAG_RE) {
    InitializeLangTagREs();
}
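A self-contained sketch of that guard, applied to the regex used in this hunk, might look like the following. The getLangTagExtRE wrapper, the initializerRuns counter, and the simplified regex body are assumptions for illustration; only the names LANG_TAG_EXT_RE and InitializeLangTagREs appear in this review.

```javascript
// Module-level regex, left undefined until first use.
let LANG_TAG_EXT_RE;
// Hypothetical counter, only here to show the initializer runs once.
let initializerRuns = 0;

function InitializeLangTagREs() {
    // Assumption: a simplified extension regex, not the one Intl.js builds.
    LANG_TAG_EXT_RE = /-[0-9a-wy-z](?:-[0-9a-z]{2,8})+$/i;
    initializerRuns++;
}

// Hypothetical accessor wrapping the "initialize if undefined" guard.
function getLangTagExtRE() {
    if (!LANG_TAG_EXT_RE) {
        InitializeLangTagREs();
    }
    return LANG_TAG_EXT_RE;
}

console.log(getLangTagExtRE().test("de-u-co-phonebk")); // true
console.log(getLangTagExtRE().test("en-US"));           // false
console.log(initializerRuns);                           // 1
```

Routing every use through one accessor is one way to avoid forgetting the guard at a new call site; the inline if shown above works equally well.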

dilijev

revoking review

dilijev

First resolve that initialization issue, regenerate the bytecode, and ensure that the IntlICU build passes all tests except for Intl.

dilijev

LGTM

boingoing

👍

dilijev · 2017-09-28T07:29:52Z

lib/Runtime/Library/InJavascript/Intl.js

+        if (isPlatformUsingICU) {
+            // ICU's getDefaultLocale() will return a valid BCP-47/RFC 5646 langtag
+            locale = GetDefaultLocale();
+            const match = platform.builtInRegexMatch(locale, /-u(-[^\-][^\-]?-[^\-]+)*-co-([^\-]+).*/);


(-[^\-][^\-]?-[^\-]+)

nits (TODO in a future update):
You don't use the first capture group, so make it non-capturing with ?:
Add comments explaining what this is matching (or rename match to something more descriptive). (Edit: eh, if (match) collation = is pretty clear, never mind. A comment might still be nice.)
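For reference, here is a small sketch of the same pattern with the unused group written as (?:...). The constant CO_EXT_RE and the helper splitCollation are hypothetical names, and the sketch uses plain String.prototype.replace rather than the callInstanceFunc wrapper that Intl.js uses. Note that the collation value moves from match[2] to match[1] once the first group no longer captures.

```javascript
// Same pattern as in the diff, with the first group made non-capturing.
// "-u", then zero or more "-key-value" pairs, then "-co-<value>".
const CO_EXT_RE = /-u(?:-[^\-][^\-]?-[^\-]+)*-co-([^\-]+).*/;

// Hypothetical helper: pull the -co- value out of a BCP 47 tag.
function splitCollation(locale) {
    const match = CO_EXT_RE.exec(locale);
    if (!match) {
        return { locale, collation: undefined };
    }
    const collation = match[1];
    // Only the "-co-<value>" pair is removed; other subtags are kept.
    return { locale: locale.replace(`-co-${collation}`, ""), collation };
}

console.log(splitCollation("de-u-nu-latn-co-phonebk"));
// { locale: 'de-u-nu-latn', collation: 'phonebk' }
console.log(splitCollation("en-US"));
// { locale: 'en-US', collation: undefined }
```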

dilijev · 2017-09-28T07:30:32Z

lib/Runtime/Library/InJavascript/Intl.js

+            const match = platform.builtInRegexMatch(locale, /-u(-[^\-][^\-]?-[^\-]+)*-co-([^\-]+).*/);
+            if (match) {
+                collation = match[2];
+                locale = callInstanceFunc(StringInstanceReplace, locale, `-co-${match[2]}`, "");


match[2]

nit (TODO): you just set collation = match[2], so use ${collation} instead

Here, do we intentionally still replace -co-value instead of stripping all subtags? (I'm guessing this mapping possibly involves other subtags?)
nit (TODO): (Come to think of it, reverseLocaleAcceptingCollationValues needs a comment or example explaining what the resulting object looks like, because I hate having to use a debugger or reason about that code every time I want to answer that question.)
nit (TODO): (Come to think of it, running a bunch of JS code for something that is clearly something we could pre-compute is a bad idea. We should just cache the result directly in the script.)

In reply to: 141544228

As far as I know (and as far as its variable name suggests), localesAcceptingCollationValues is only for collation values. As for whether the other extensions need to be stripped, as of right now, this function is only used as the defaultLocaleFunc for resolveLocales when initializing the Collator (initializing NumberFormat and DateTimeFormat uses strippedDefaultLocale instead). That internally calls platform.getExtensions, which falls back to getExtensionSubtags(locale), which is basically just

locale
    .match(LANG_TAG_EXT_RE)[0] // get extensions on the locale string
    .split('-')                // split them into an array
    .filter(x => !!x)          // remove empty elements

So, should we strip all subtags? No, probably not. Also, thinking about it, we probably shouldn't be stripping all subtags for NumberFormat or DateTimeFormat either. I'd imagine the original reason for the different functions was that Collator needed the system collation value, so it went through the process of stripping "_collation" and mapping and re-formatting it, while NumberFormat and DateTimeFormat didn't need the collation, so they could blindly throw away the "_collation" part of the string, because they knew they could get -ca and -nu formats later from platform.getExtensions. In ICU, we can't do that right now and possibly don't need to do that ever. Also, since we are caching the results now, the perf cost of calculating the mapped collation value (if we even need to, re: the comment below) is 0 for the Number and Date case.

TL;DR: Stripping the locale for ICU is at best useless and at worst harmful, so I think we should actually ditch the ICU implementation of strippedDefaultLocale regardless

dilijev · 2017-09-28T07:36:13Z

lib/Runtime/Library/InJavascript/Intl.js

+                locale = callInstanceFunc(StringInstanceReplace, locale, `-co-${match[2]}`, "");
+            }
+        } else {
+            // Windows' getDefaultLocale() will return a weird RFC4646 langtag


weird

nit (TODO): I think we should remove "weird" since we're just editorializing. It simply is an RFC4646 langtag, and "weird" here makes it sound like something about the tags is not quite RFC4646. Maybe make the distinction as: "RFC4646 (not RFC5646/BCP47) langtag"

dilijev · 2017-09-28T07:44:27Z

lib/Runtime/Library/InJavascript/Intl.js

+        const collationMapForLocale = reverseLocaleAcceptingCollationValues[locale];
+        if (collationMapForLocale === undefined) {
+            // Assume the system wouldn't give us back a bad collation value
+            __mappedDefaultLocale = `${locale}-u-co-${collation}`;


${locale}-u-co-${collation};

Have you exercised this path with a hypothetical default locale containing -u- ?
For some reason I think we'd be adding a duplicate singleton -u which (at least according to Intl spec) is not allowed in langtags.

https://tc39.github.io/ecma402/#sec-isstructurallyvalidlanguagetag 6.2.2. (bullet 3) "does not include duplicate singleton subtags."

That's correct. I think this should be resolved by making __mappedDefaultLocale = `${locale}-co-${collation}`
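One way to guard against the duplicate singleton regardless of what the default locale looks like is to check for an existing "-u" before appending. This is a sketch of an alternative, not the fix proposed above: hasUnicodeExtension and appendCollation are hypothetical names, and the check assumes the tag has no private-use ("-x-") section and no other extension after "-u".

```javascript
// Hypothetical helper: true if the tag already carries a "-u" singleton.
// Assumption: private-use ("-x-") sections are not considered here.
function hasUnicodeExtension(locale) {
    return /-u(?:-|$)/i.test(locale);
}

// Append a collation value without introducing a second "-u" singleton.
function appendCollation(locale, collation) {
    return hasUnicodeExtension(locale)
        ? `${locale}-co-${collation}`
        : `${locale}-u-co-${collation}`;
}

console.log(appendCollation("de", "phonebk"));           // de-u-co-phonebk
console.log(appendCollation("de-u-nu-latn", "phonebk")); // de-u-nu-latn-co-phonebk
console.log(appendCollation("en-US", "phonebk"));        // en-US-u-co-phonebk
```

The "en-US" case shows why the check requires a hyphen or end of string after "-u": the region subtag "US" must not be mistaken for the singleton.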

dilijev · 2017-09-28T07:54:58Z

lib/Runtime/Library/InJavascript/Intl.js

-        var bcpTag = availableBcpTags[collation];
-        if (bcpTag !== undefined) {
-            return locale + "-u-co-" + bcpTag;
+        const collationMapForLocale = reverseLocaleAcceptingCollationValues[locale];


reverseLocaleAcceptingCollationValues

I think the whole point of these mappings is to translate RFC 4646 (or associated standard/database/list) collation values into RFC 5646 (or associated) collation values. See the Intl explainer for the RFC 5646-associated list.
IOW this shouldn't be necessary for ICU because the langtag should already be RFC 5646.
However, since this value is in theory reported by the system, it might still be a problem. I'd like to think that ICU would convert or ignore anything it gets from Windows that isn't valid according to ICU. But I guess we can't assume that...

Doing a lookup in a dictionary once during the lifetime of the program is worth not needing to worry about this, I think. We may want to re-evaluate that decision if we make these dictionaries into a native-calculated, hardcoded thing (then, we could just not have this dictionary at all if ICU is enabled).

dilijev · 2017-09-28T07:56:57Z

lib/Runtime/Library/InJavascript/Intl.js

+        if (mappedCollation !== undefined) {
+            __mappedDefaultLocale = `${locale}-u-co-${mappedCollation}`;
+        } else {
+            __mappedDefaultLocale = `${locale}-u-co-${collation}`;


Same -- have you exercised these paths? The Intl harness should be helpful here.

dilijev · 2017-09-28T07:58:27Z

lib/Runtime/Library/InJavascript/Intl.js

+        } else {
+            resolved = undefined;
+        }
        return setPrototype({


nit (TODO): add a blank line before the return.

dilijev · 2017-09-28T08:00:39Z

lib/Runtime/Base/WindowsGlobalizationAdapter.cpp

-        if (lpLocaleName)
+        UErrorCode error = UErrorCode::U_ZERO_ERROR;
+        char bcp47[ULOC_FULLNAME_CAPACITY];
+        char defaultLocaleId[ULOC_FULLNAME_CAPACITY];


I mused about this in comments in other places -- it's not too expensive to zero out these arrays (= { 0 }), and it's safer in case something goes wrong.
Now that we run this function only once on Intl initialization (thanks to caching the returned value in Intl.js), it seems worth it to pay that cost here.

dilijev · 2017-09-28T08:03:15Z

lib/Runtime/Base/WindowsGlobalizationAdapter.cpp

+        char defaultLocaleId[ULOC_FULLNAME_CAPACITY];
+
+        int32_t written = uloc_getName(nullptr, defaultLocaleId, ULOC_FULLNAME_CAPACITY, &error);
+        if (U_FAILURE(error) || written <= 0 || written >= ULOC_FULLNAME_CAPACITY)


if (U_FAILURE(error) || written <= 0 || written >= ULOC_FULLNAME_CAPACITY)

nit: success branch first for readability (as you did below), unless it's easier to express the failure case in an if conditional, or if you only do something in case of failure (basically if the else is the success path, it hurts readability).
To work around this, personally I will sometimes declare a boolean named (something like) success or failed and then negate if necessary depending on what I actually want to express for the if condition.

In this case, personally I think the success case reads better: (U_SUCCESS(error) && written > 0 && written < CAPACITY)

dilijev

Took a closer look at a few places. (Sorry for adding yet another review.)

Also note there's an incoming change from @boingoing that I'm merging from release/1.7 into intl-icu that we'll need to rebase on, and then do a proper bytecode regen (with build) to avoid nasty merge issues later.

dilijev · 2017-09-28T20:02:02Z

@dotnet-bot test this please

dilijev · 2017-09-28T20:03:04Z

@mmitche is something broken?

mmitche · 2017-09-28T20:38:25Z

@dilijev Jenkins upgrade hit a bunch of github api rate limit issues

dilijev · 2017-09-28T20:57:17Z

lib/Runtime/Library/InJavascript/Intl.js

-    var reverseLocaleAcceptingCollationValues = (function () {
-        var toReturn = setPrototype({}, null);
+    // reverses the keys and values in each locale's sub-object in localesAcceptingCollationValues
+    // localesAcceptingCollationValues[locale][key] = value -> reverseLocalesAcceptingCollationValues[locale][value] = key


Thanks for adding the explainer comment!
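As a worked illustration of what that comment describes, the reversal can be sketched as below. The sample data and the function name reverseCollationValues are made up for this example and are not copied from Intl.js; Object.create(null) stands in for the setPrototype({}, null) call seen in the removed lines.

```javascript
// Illustrative sample data only; the real table lives in Intl.js.
const localesAcceptingCollationValues = {
    "es-ES": { trad: "tradnl" },
};

// Hypothetical helper: for each locale, swap keys and values of its sub-object.
// map[locale][key] = value  ->  result[locale][value] = key
function reverseCollationValues(map) {
    const result = Object.create(null);
    for (const locale of Object.keys(map)) {
        const reversed = Object.create(null);
        for (const key of Object.keys(map[locale])) {
            reversed[map[locale][key]] = key;
        }
        result[locale] = reversed;
    }
    return result;
}

const reversedValues = reverseCollationValues(localesAcceptingCollationValues);
console.log(reversedValues["es-ES"]["tradnl"]); // trad
console.log(reversedValues["es-ES"]["trad"]);   // undefined
```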

@dilijev

…Format and node-cc build integration into branch `release/1.7` Merge pull request #3840 from dilijev:intl-icu Includes the following PRs merged to branch `intl-icu` * [MERGE #3809 @dilijev] Intl-ICU: implement Intl.NumberFormat under ICU * [MERGE #3820 @jackhorton] Implements ICU version of getDefaultLocale, fixes up ICU extension handling * [MERGE #3822 @obastemur] xplat: fix the path for i18n * [MERGE #3750 @jackhorton] Expose platform.winglob to Intl.js to detect which i18n lib we are using in the native code.

jackhorton added the Intl-ICU label Sep 27, 2017

jackhorton requested a review from dilijev September 27, 2017 06:58

dilijev reviewed Sep 27, 2017

jackhorton force-pushed the icu-defaults branch from 46112c9 to 7a66537 Compare September 27, 2017 19:03

dilijev requested a review from boingoing September 27, 2017 20:30

jackhorton changed the title ~~Implements ICU version of getDefaultLocale, fixes up ICU supportedLocalesOf fallback~~ Implements ICU version of getDefaultLocale, fixes up ICU extension handling Sep 27, 2017

dilijev reviewed Sep 27, 2017

dilijev previously approved these changes Sep 27, 2017

dilijev suggested changes Sep 27, 2017

jackhorton force-pushed the icu-defaults branch 2 times, most recently from 429a3ec to d779273 Compare September 28, 2017 00:13

dilijev approved these changes Sep 28, 2017

boingoing approved these changes Sep 28, 2017

dilijev reviewed Sep 28, 2017

jackhorton force-pushed the icu-defaults branch from d779273 to da28d6e Compare September 28, 2017 19:33

Initial stab at ICU implementation of getDefaultLocale

235eeb3

jackhorton added 6 commits September 28, 2017 12:38

JS implementation of getDefaultLocale

37eefda

Full ICU defaultLocale support

6083530

Remove hard-coded default

36ff60b

Addressing review comments

96525b8

Add caches for default locales

9d986e9

Addressing further review comments

061081b

jackhorton force-pushed the icu-defaults branch from da28d6e to a045bda Compare September 28, 2017 19:43

dilijev closed this Sep 28, 2017

dilijev reopened this Sep 28, 2017

dilijev reviewed Sep 28, 2017

dilijev approved these changes Sep 28, 2017

dilijev and others added 3 commits September 28, 2017 16:42

Adding EngineInterface.Intl.useCaches flag for testing

523417e

Fixing mappedDefaultLocale on WinGlob

0130ca4

Updating bytecode

dba6a9d

jackhorton force-pushed the icu-defaults branch from e1fbd73 to dba6a9d Compare September 29, 2017 00:04

chakrabot merged commit dba6a9d into chakra-core:intl-icu Sep 29, 2017

chakrabot pushed a commit that referenced this pull request Sep 29, 2017

[MERGE #3820 @jackhorton] Implements ICU version of getDefaultLocale,…

f083f98

… fixes up ICU extension handling Merge pull request #3820 from jackhorton:icu-defaults Closes #3658

dilijev mentioned this pull request Sep 29, 2017

Intl-ICU: Merge branch intl-icu: Intl.NumberFormat and node-cc build integration into branch release/1.7 #3840

Merged

jackhorton deleted the icu-defaults branch September 29, 2017 16:54
