Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editorial: Clarify operations related to merging locale data #804

Merged
merged 3 commits into from
May 29, 2024

Conversation

gibson042
Copy link
Contributor

  • Rename ApplyOptionsToTag to UpdateLanguageId, move its locale id syntax validation out to the (sole) call site, and handle each overriding datum in its own single block.
  • Rename ApplyUnicodeExtensionToTag to MergeLocaleData.
  • Refactor InsertUnicodeExtensionAndCanonicalize(locale, extension) into InsertUnicodeExtensionAndCanonicalize(locale, attributes, keywords) (constructing the new -u- extension inline).
  • Fix a spec bug in CanonicalizeUnicodeLocaleId.

@gibson042 gibson042 requested a review from ryzokuken July 11, 2023 02:48
Copy link
Contributor

@ben-allen ben-allen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent -- I've got some questions about the refactored InsertUnicodeExtensionAndCanonicalize, but that may be because I'm missing what's going on there.

1. Let _components_ be UnicodeExtensionComponents(_extension_).
1. For each element _attr_ of _components_.[[Attributes]], do
1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _attr_.
1. For each Record { [[Key]], [[Value]] } _keyword_ in _components_.[[Keywords]], do
1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _keyword_.[[Key]].
1. If _keyword_.[[Value]] is not the empty String, then
1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _keyword_.[[Value]].
1. Assert: _newExtension_ is not equal to *"u"*.
1. Assert: _newExtension_ is not equal to *"-u"*.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dang, nice catch!

spec/locale.html Outdated
1. Let _language_ be ? GetOption(_options_, *"language"*, ~string~, ~empty~, *undefined*).
1. If _language_ is not *undefined*, then
1. If _language_ cannot be matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal, throw a *RangeError* exception.
1. Set _languageId_ to _languageId_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal replaced with _language_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this set of changes greatly improves readability. thank you!


<emu-clause id="sec-updatelanguageid" type="abstract operation" oldids="sec-apply-options-to-tag">
<h1>
UpdateLanguageId (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a big fan of getting LanguageId into the name of this AO, given that the unicode_language_id is what it works with

spec/locale.html Outdated
@@ -51,7 +52,7 @@ <h1>Intl.Locale ( _tag_ [ , _options_ ] )</h1>
1. If _numberingSystem_ is not *undefined*, then
1. If _numberingSystem_ cannot be matched by the <code>type</code> Unicode locale nonterminal, throw a *RangeError* exception.
1. Set _opt_.[[nu]] to _numberingSystem_.
1. Let _r_ be ! ApplyUnicodeExtensionToTag(_tag_, _opt_, _relevantExtensionKeys_).
1. Let _r_ be MergeLocaleData(_tag_, _opt_, _relevantExtensionKeys_).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I don't want to like this name, but that I nevertheless like it because it's such a big improvement over the old name and because I can't think of anything that seems better.

_locale_: a Unicode canonicalized locale identifier,
_extension_: a Unicode locale extension sequence,
_locale_: a language tag,
_attributes_: a List of Strings,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing I'm wondering about is what this gets us over the old version -- is it just so that InsertUnicodeExtensionAndCanonicalize can do in a more explicit way what this line from the old ApplyUnicodeExtensionsToLocale did?

8. Let newExtension be a Unicode BCP 47 U Extension based on attributes and keywords.

If all the changes in ResolveLocale were just made to account for the change in the parameters to InsertUnicodeExtensionAndCanonicalize I feel like all else being equal leaving InsertUnicodeExtensionAndCanonicalize unchanged is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing I'm wondering about is what this gets us over the old version -- is it just so that InsertUnicodeExtensionAndCanonicalize can do in a more explicit way what this line from the old ApplyUnicodeExtensionsToLocale did?

8. Let newExtension be a Unicode BCP 47 U Extension based on attributes and keywords.

That is a goal, but so is isolating construction of Unicode locale extension sequences to one place—and both calling algorithms already have a list of attributes (trivially in the case of ResolveLocale) and handle keywords one by one. I also foresee similar convergence between CanonicalizeUnicodeLocaleId and InsertUnicodeExtensionAndCanonicalize, although I'm not yet taking that step here.

If all the changes in ResolveLocale were just made to account for the change in the parameters to InsertUnicodeExtensionAndCanonicalize I feel like all else being equal leaving InsertUnicodeExtensionAndCanonicalize unchanged is better.

No, I do also find that replacing iterative string concatenation in ResolveLocale with use of keyword Records increases its comprehensibility. For example, compare:

-1. Let _supportedExtensionAddition_ be the string-concatenation of *"-"*, _key_, *"-"*, and _value_.
+1. Set _supportedKeyword_ to the Record { [[Key]]: _key_, [[Value]]: _value_ }.

spec/locale.html Outdated
1. Return CanonicalizeUnicodeLocaleId(_tag_).
1. Set _languageId_ to _languageId_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_region_subtag</code> Unicode locale nonterminal replaced with _region_.
1. Set _tag_ to _tag_ with the substring matched by the <code>unicode_language_id</code> Unicode locale nonterminal replaced with _languageId_.
1. Return _tag_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making sure I'm getting this one right: it's not necessary to canonicalize here, because the new version of MergeLocaleData is guaranteed to canonicalize it anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

spec/locale.html Show resolved Hide resolved
spec/locale.html Outdated Show resolved Hide resolved
@sffc
Copy link
Contributor

sffc commented Aug 23, 2023

For possible inspiration: unicode-org/icu4x#1833

@sffc
Copy link
Contributor

sffc commented Aug 23, 2023

@gibson042 Is this ready for @anba to re-review?

@gibson042
Copy link
Contributor Author

Yes.

@gibson042 gibson042 requested a review from anba August 24, 2023 22:14
@ryzokuken ryzokuken added editorial Involves an editorial fix needs review labels Feb 22, 2024
@sffc
Copy link
Contributor

sffc commented May 2, 2024

I suggest @ben-allen to verify that @gibson042's latest changes address @anba's comments.

Copy link
Member

@ryzokuken ryzokuken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, confirmed that @anba's two comments were appropriately addressed, marking them as resolved.

@ryzokuken
Copy link
Member

@gibson042 please rebase this whenever you could but this looks ready to merge.

@gibson042 gibson042 force-pushed the 2023-07-merge-locale-data branch from 31768c1 to 0de405d Compare May 29, 2024 17:40
@gibson042 gibson042 merged commit 53c382c into tc39:main May 29, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
editorial Involves an editorial fix needs review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants