Skip to content

Conversation

@jackhorton
Copy link
Contributor

@jackhorton jackhorton commented Sep 27, 2017

Closes #3658

Copy link
Contributor

@dilijev dilijev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at jackhorton#1

locale = callInstanceFunc(StringInstanceReplace, locale, match[2], "");
}
} else {
// Windows' getDefaultLocale() will return a weird RFC4646 langtag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol -- "weird" is a funny way of saying something from an outdated standard, but yeah

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is a quirk of WinGlob not Windows per se, but the program logic makes that clear

@dilijev
Copy link
Contributor

dilijev commented Sep 27, 2017

Tell me more about the fix to supportedLocalesOf?

@jackhorton
Copy link
Contributor Author

Sorry, it was actually an issue in resolveLocaleHelper that only surfaced in supportedLocalesOf -- it was the change to ICU appending only "-" to the string instead of "-u-" because the getExtensions JS fallback returned ["u", "co", "phonebk"] for "de-u-co-phonebk", whereas I think the WinGlob native implementation returns just ["co", "phonebk"].

@dilijev dilijev requested a review from boingoing September 27, 2017 20:30
@jackhorton jackhorton changed the title Implements ICU version of getDefaultLocale, fixes up ICU supportedLocalesOf fallback Implements ICU version of getDefaultLocale, fixes up ICU extension handling Sep 27, 2017
@dilijev
Copy link
Contributor

dilijev commented Sep 27, 2017

@jackhorton I think maybe the WinGlob platform implementation of getExtensions actually returns an array of "key-value" sequences like "co-phonebk" and ignores the singletons like "-u" entirely.

I noticed this issue as well and added a comment to the code in my PR. Maybe handling it is now effectively fixed in your change -- we can clean up later.

return platform.builtInRegexMatch(GetDefaultLocale(), /([^_]*).*/)[1];
if (isPlatformUsingICU) {
const def = platform.getDefaultLocale();
const match = platform.builtInRegexMatch(def, LANG_TAG_EXT_RE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LANG_TAG_EXT_RE) [](start = 58, length = 16)

Ensure this regex is not undefined first. If it is undefined, run the initializer function function. This pattern is used elsewhere.

e.g.:
if (!LANG_TAG_RE) {
InitializeLangTagREs();
}

dilijev
dilijev previously approved these changes Sep 27, 2017
Copy link
Contributor

@dilijev dilijev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@dilijev dilijev dismissed their stale review September 27, 2017 23:13

revoking review

Copy link
Contributor

@dilijev dilijev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First resolve that initialization issue and regenerate the bytecode and ensure that the IntlICU build passes all tests except for Intl.

@jackhorton jackhorton force-pushed the icu-defaults branch 2 times, most recently from 429a3ec to d779273 Compare September 28, 2017 00:13
Copy link
Contributor

@dilijev dilijev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@boingoing boingoing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if (isPlatformUsingICU) {
// ICU's getDefaultLocale() will return a valid BCP-47/RFC 5646 langtag
locale = GetDefaultLocale();
const match = platform.builtInRegexMatch(locale, /-u(-[^\-][^\-]?-[^\-]+)*-co-([^\-]+).*/);
Copy link
Contributor

@dilijev dilijev Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(-[^\-][^\-]?-[^\-]+) [](start = 64, length = 21)

nits (TODO in a future update):
You don't use the first capture group so make it non-matching with ?:
Add comments explaining what this is matching for (or rename match to something more descriptive). (Edit: eh, if (match) collation = is pretty clear, nevermind. A comment might still be nice.

const match = platform.builtInRegexMatch(locale, /-u(-[^\-][^\-]?-[^\-]+)*-co-([^\-]+).*/);
if (match) {
collation = match[2];
locale = callInstanceFunc(StringInstanceReplace, locale, `-co-${match[2]}`, "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match[2] [](start = 80, length = 8)

nit (TODO): you just set collation = match[2], so use ${collation} instead

Copy link
Contributor

@dilijev dilijev Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, do we intentionally still replace -co-value instead of stripping all subtags? (I'm guessing this mapping possibly involves other subtags?)
nit (TODO): (Come to think of it reverseLocaleAcceptingCollationValues needs a comment or example explaining what the resulting object looks like, because I hate having to use a debugger or reasonable about that code every time I want to answer that question.)
nit (TODO): (Come to think of it, running a bunch of JS code for something that is clearly something we could pre-compute is a bad idea. We should just cache the result directly in the script.)


In reply to: 141544228 [](ancestors = 141544228)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know (and as far as its variable name suggests), localesAcceptingCollationValues is only for collation values. As for whether the other extensions need to be stripped, as of right now, this function is only used as the defaultLocaleFunc for resolveLocales when initializing the Collator (initializing NumberFormat and DateTimeFormat uses strippedDefaultLocale instead). That internally calls platform.getExtensions, which falls back to getExtensionSubtags(locale), which is basically just

locale
  .match(LANG_TAG_EXT_RE)[0] // get extensions on the locale string
  .split('-') // split them into an array
  .filter(x => !!x) // remove empty elements

So, should we strip all subtags? No, probably not. Also, thinking about it, we probably shouldn't be stripping all subtags for NumberFormat or DateTimeFormat either. Id imagine the original reason for the different functions was that Collator needed the system collation value, so it went through the process of stripping "_collation" and mapping and re-formatting it, while NumberFormat and DateTimeFormat didn't need the collation, so they could blindly throw away the "_collation" part of the string, because they knew they could get -ca and -nu formats later from platform.getExtensions. In ICU, we cant do that right now and possibly don't need to do that ever. Also, since we are caching the results now, the perf cost of calculating the mapped collation value (if we even need to, re: the comment below) is 0 for the Number and Date case.

TL;DR: Stripping the locale for ICU is at best useless and at worst harmful, so I think we should actually ditch the ICU implementation of strippedDefaultLocale regardless

locale = callInstanceFunc(StringInstanceReplace, locale, `-co-${match[2]}`, "");
}
} else {
// Windows' getDefaultLocale() will return a weird RFC4646 langtag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird [](start = 57, length = 5)

nit (TODO): I think we should remove "weird" since we're just editorializing. It simply is an RFC4646 langtag, and weird here makes it sound like something about the tags are not quite RFC4646. Maybe maybe the distinction as: "RFC4646 (not RFC5646/BCP47) langtag"

const collationMapForLocale = reverseLocaleAcceptingCollationValues[locale];
if (collationMapForLocale === undefined) {
// Assume the system wouldn't give us back a bad collation value
__mappedDefaultLocale = `${locale}-u-co-${collation}`;
Copy link
Contributor

@dilijev dilijev Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${locale}-u-co-${collation}; [](start = 36, length = 30)

Have you exercised this path with a hypothetical default locale containing -u- ?
For some reason I think we'd be adding a duplicate singleton -u which (at least according to Intl spec) is not allowed in langtags.

https://tc39.github.io/ecma402/#sec-isstructurallyvalidlanguagetag 6.2.2. (bullet 3) "does not include duplicate singleton subtags."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. I think this should be resolved by making __mappedDefaultLocale = `${locale}-co-${collation}`

var bcpTag = availableBcpTags[collation];
if (bcpTag !== undefined) {
return locale + "-u-co-" + bcpTag;
const collationMapForLocale = reverseLocaleAcceptingCollationValues[locale];
Copy link
Contributor

@dilijev dilijev Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverseLocaleAcceptingCollationValues [](start = 38, length = 37)

I think the whole point of these mappings is to translate RFC 4646 (or associated standard/database/list) collation values into RFC 5646 (or associated) collation values. See the Intl explainer for the RFC 5646-associated list.
IOW this shouldn't be necessary for ICU because the langtag should already be RFC 5646.
However, since this value is in theory reported by the system, it might still be a problem. I'd like to think that ICU would convert or ignore anything it gets from Windows that isn't valid according to ICU. But I guess we can't assume that...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing a lookup in a dictionary once during the lifetime of the program is worth not needing to worry about this, I think. We may want to re-evaluate that decision if we make these dictionaries into a native-calculated, hardcoded thing (then, we could just not have this dictionary at all if ICU is enabled).

if (mappedCollation !== undefined) {
__mappedDefaultLocale = `${locale}-u-co-${mappedCollation}`;
} else {
__mappedDefaultLocale = `${locale}-u-co-${collation}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same -- have you exercised these paths? The Intl harness should be helpful here.

} else {
resolved = undefined;
}
return setPrototype({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (TODO): add a blank line before the return.

if (lpLocaleName)
UErrorCode error = UErrorCode::U_ZERO_ERROR;
char bcp47[ULOC_FULLNAME_CAPACITY];
char defaultLocaleId[ULOC_FULLNAME_CAPACITY];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mused about this in comments in other places -- it's not too expensive to zero-out these arrays = { 0 } and it's safer in case something goes wrong.
Now that we run this function only once on Intl initialization (thanks to caching the returned value in Intl.js), it seems worth it to pay that cost here.

char defaultLocaleId[ULOC_FULLNAME_CAPACITY];

int32_t written = uloc_getName(nullptr, defaultLocaleId, ULOC_FULLNAME_CAPACITY, &error);
if (U_FAILURE(error) || written <= 0 || written >= ULOC_FULLNAME_CAPACITY)
Copy link
Contributor

@dilijev dilijev Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (U_FAILURE(error) || written <= 0 || written >= ULOC_FULLNAME_CAPACITY) [](start = 7, length = 75)

nit: success branch first for readability (as you did below), unless it's easier to express the failure case in an if conditional, or if you only do something in case of failure (basically if the else is the success path, it hurts readability).
To work around this, personally I will sometimes declare a boolean named (something like) success or failed and then negate if necessary depending on what I actually want to express for the if condition.

In this case, personally I think the success case reads better: (U_SUCCESS(error) && written > 0 && written < CAPACITY)

Copy link
Contributor

@dilijev dilijev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a closer look at a few places. (Sorry for adding yet another review.)

Also note there's an incoming change from @boingoing that I'm merging from release/1.7 into intl-icu that we'll need to rebase on, and then do a proper bytecode regen (with build) to avoid nasty merge issues later.

@dilijev
Copy link
Contributor

dilijev commented Sep 28, 2017

@dotnet-bot test this please

@dilijev dilijev closed this Sep 28, 2017
@dilijev dilijev reopened this Sep 28, 2017
@dilijev
Copy link
Contributor

dilijev commented Sep 28, 2017

@mmitche is something broken?

@mmitche
Copy link
Contributor

mmitche commented Sep 28, 2017

@dilijev Jenkins upgrade hit a bunch of github api rate limit issues

var reverseLocaleAcceptingCollationValues = (function () {
var toReturn = setPrototype({}, null);
// reverses the keys and values in each locale's sub-object in localesAcceptingCollationValues
// localesAcceptingCollationValues[locale][key] = value -> reverseLocalesAcceptingCollationValues[locale][value] = key
Copy link
Contributor

@dilijev dilijev Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the explainer comment!

@chakrabot chakrabot merged commit dba6a9d into chakra-core:intl-icu Sep 29, 2017
chakrabot pushed a commit that referenced this pull request Sep 29, 2017
… fixes up ICU extension handling

Merge pull request #3820 from jackhorton:icu-defaults

Closes #3658
@jackhorton jackhorton deleted the icu-defaults branch September 29, 2017 16:54
chakrabot pushed a commit that referenced this pull request Oct 2, 2017
…Format and node-cc build integration into branch `release/1.7`

Merge pull request #3840 from dilijev:intl-icu

Includes the following PRs merged to branch `intl-icu`

* [MERGE #3809 @dilijev] Intl-ICU: implement Intl.NumberFormat under ICU
* [MERGE #3820 @jackhorton] Implements ICU version of getDefaultLocale, fixes up ICU extension handling
* [MERGE #3822 @obastemur] xplat: fix the path for i18n
* [MERGE #3750 @jackhorton] Expose platform.winglob to Intl.js to detect which i18n lib we are using in the native code.
chakrabot pushed a commit that referenced this pull request Oct 3, 2017
…: Intl.NumberFormat and node-cc build integration into branch `release/1.7`

Merge pull request #3840 from dilijev:intl-icu

Includes the following PRs merged to branch `intl-icu`

* [MERGE #3809 @dilijev] Intl-ICU: implement Intl.NumberFormat under ICU
* [MERGE #3820 @jackhorton] Implements ICU version of getDefaultLocale, fixes up ICU extension handling
* [MERGE #3822 @obastemur] xplat: fix the path for i18n
* [MERGE #3750 @jackhorton] Expose platform.winglob to Intl.js to detect which i18n lib we are using in the native code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants