-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Editorial: Clarify operations related to merging locale data #804
Conversation
gibson042
commented
Jul 11, 2023
- Rename ApplyOptionsToTag to UpdateLanguageId, move its locale id syntax validation out to the (sole) call site, and handle each overriding datum in its own single block.
- Rename ApplyUnicodeExtensionToTag to MergeLocaleData.
- Refactor InsertUnicodeExtensionAndCanonicalize(locale, extension) into InsertUnicodeExtensionAndCanonicalize(locale, attributes, keywords) (constructing the new -u- extension inline).
- Fix a spec bug in CanonicalizeUnicodeLocaleId.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is excellent -- I've got some questions about the refactored InsertUnicodeExtensionAndCanonicalize, but that may be because I'm missing what's going on there.
spec/locales-currencies-tz.html
Outdated
1. Let _components_ be UnicodeExtensionComponents(_extension_). | ||
1. For each element _attr_ of _components_.[[Attributes]], do | ||
1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _attr_. | ||
1. For each Record { [[Key]], [[Value]] } _keyword_ in _components_.[[Keywords]], do | ||
1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _keyword_.[[Key]]. | ||
1. If _keyword_.[[Value]] is not the empty String, then | ||
1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _keyword_.[[Value]]. | ||
1. Assert: _newExtension_ is not equal to *"u"*. | ||
1. Assert: _newExtension_ is not equal to *"-u"*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dang, nice catch!
spec/locale.html
Outdated
1. Let _language_ be ? GetOption(_options_, *"language"*, ~string~, ~empty~, *undefined*). | ||
1. If _language_ is not *undefined*, then | ||
1. If _language_ cannot be matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal, throw a *RangeError* exception. | ||
1. Set _languageId_ to _languageId_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal replaced with _language_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this set of changes greatly improves readability. thank you!
|
||
<emu-clause id="sec-updatelanguageid" type="abstract operation" oldids="sec-apply-options-to-tag"> | ||
<h1> | ||
UpdateLanguageId ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a big fan of getting LanguageId into the name of this AO, given that the unicode_language_id
is what it works with
spec/locale.html
Outdated
@@ -51,7 +52,7 @@ <h1>Intl.Locale ( _tag_ [ , _options_ ] )</h1> | |||
1. If _numberingSystem_ is not *undefined*, then | |||
1. If _numberingSystem_ cannot be matched by the <code>type</code> Unicode locale nonterminal, throw a *RangeError* exception. | |||
1. Set _opt_.[[nu]] to _numberingSystem_. | |||
1. Let _r_ be ! ApplyUnicodeExtensionToTag(_tag_, _opt_, _relevantExtensionKeys_). | |||
1. Let _r_ be MergeLocaleData(_tag_, _opt_, _relevantExtensionKeys_). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I don't want to like this name, but that I nevertheless like it because it's such a big improvement over the old name and because I can't think of anything that seems better.
_locale_: a Unicode canonicalized locale identifier, | ||
_extension_: a Unicode locale extension sequence, | ||
_locale_: a language tag, | ||
_attributes_: a List of Strings, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing I'm wondering about is what this gets us over the old version -- is it just so that InsertUnicodeExtensionAndCanonicalize can do in a more explicit way what this line from the old ApplyUnicodeExtensionsToLocale did?
8. Let newExtension be a Unicode BCP 47 U Extension based on attributes and keywords.
If all the changes in ResolveLocale were just made to account for the change in the parameters to InsertUnicodeExtensionAndCanonicalize I feel like all else being equal leaving InsertUnicodeExtensionAndCanonicalize unchanged is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing I'm wondering about is what this gets us over the old version -- is it just so that InsertUnicodeExtensionAndCanonicalize can do in a more explicit way what this line from the old ApplyUnicodeExtensionsToLocale did?
8. Let newExtension be a Unicode BCP 47 U Extension based on attributes and keywords.
That is a goal, but so is isolating construction of Unicode locale extension sequences to one place—and both calling algorithms already have a list of attributes (trivially in the case of ResolveLocale) and handle keywords one by one. I also foresee similar convergence between CanonicalizeUnicodeLocaleId and InsertUnicodeExtensionAndCanonicalize, although I'm not yet taking that step here.
If all the changes in ResolveLocale were just made to account for the change in the parameters to InsertUnicodeExtensionAndCanonicalize I feel like all else being equal leaving InsertUnicodeExtensionAndCanonicalize unchanged is better.
No, I do also find that replacing iterative string concatenation in ResolveLocale with use of keyword Records increases its comprehensibility. For example, compare:
-1. Let _supportedExtensionAddition_ be the string-concatenation of *"-"*, _key_, *"-"*, and _value_.
+1. Set _supportedKeyword_ to the Record { [[Key]]: _key_, [[Value]]: _value_ }.
spec/locale.html
Outdated
1. Return CanonicalizeUnicodeLocaleId(_tag_). | ||
1. Set _languageId_ to _languageId_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_region_subtag</code> Unicode locale nonterminal replaced with _region_. | ||
1. Set _tag_ to _tag_ with the substring matched by the <code>unicode_language_id</code> Unicode locale nonterminal replaced with _languageId_. | ||
1. Return _tag_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making sure I'm getting this one right: it's not necessary to canonicalize here, because the new version of MergeLocaleData is guaranteed to canonicalize it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
For possible inspiration: unicode-org/icu4x#1833 |
@gibson042 Is this ready for @anba to re-review? |
Yes. |
I suggest @ben-allen to verify that @gibson042's latest changes address @anba's comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, confirmed that @anba's two comments were appropriately addressed, marking them as resolved.
@gibson042 please rebase this whenever you could but this looks ready to merge. |
31768c1
to
0de405d
Compare