-
-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First draft of some registry functions #420
Conversation
The |
Some comments from a very quick scan:
The number could be a BigDecimal, and you don't want to list all of those possible values! Maybe just \d+?
should be something like: Match a formatted numerical value against CLDR plural categories or against a number literal. That is, I'm looking for some way we can make it harder (or at least alert people to )make the following mistake: match {$count} That is, when $count = 1.1, this fails, with *You have 1 books |
Thank you!
The info comes from ECMAScript, which limits the possible values: The intention was to start with something that is common between ICU and JS. |
<!-- The time zone to use. The only value implementations must recognize | ||
is "UTC"; the default is the runtime's default time zone. | ||
Implementations may also recognize the time zone names of the IANA | ||
time zone database, such as "Asia/Shanghai", "Asia/Kolkata", | ||
"America/New_York". | ||
--> | ||
<option name="timeZone" pattern="timeZoneId"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget offset time zones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is 100% what ECMA says:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/DateTimeFormat/DateTimeFormat#timezone
I tried to not include anything that is not currently supported by ECMAScript, as we know there is strong opposition to that.
I'm for it, happy to discuss, but I didn't change this xml.
Added a topic in #422
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what the function accepts as validated input. It doesn't mean that every possible value is supported
spec/registry.xml
Outdated
the default for currency formatting is the number of minor unit digits provided by | ||
the ISO 4217 currency code list (2 if the list doesn't provide that information). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... unless there is an override that says otherwise, e.g. some curriences with the cash format (e.g. technically COP
has two digits but practically it has none)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. But this is 100% ECMAScript quote.
spec/registry.xml
Outdated
A value with a smaller number of integer digits than this number will be | ||
left-padded with zeros (to the specified length) when formatted. | ||
--> | ||
<option name="minimumIntegerDigits" values="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21" default="1"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need a list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created issue #422 for this (and more)
spec/registry.xml
Outdated
<option name="maximumSignificantDigits" values="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21" default="21"/> | ||
|
||
<match pattern="anyNumber"/> | ||
<match values="zero one two few many"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does * need to appear here? Some locales produce other
as output from plural, which is effectively *
in our syntax. That is, do we want to be able to say "this selector can request the default message"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our spec makes the presence of *
mandatory.
So I didn't include other
for plural and the other selectors.
Of course, open for discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is an interesting minor issue for the registry spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our spec makes the presence of
*
mandatory. So I didn't includeother
for plural and the other selectors.
Why omit other
if the spec only requires *
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking (why I excluded it): because they mean the same thing?
If I see other
in the registry, I expect to also see other
handled in the when
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option: we add it here as a possible value, we also add default=other
, and update the registry spec to say that the default
value for selectors maps to *
I think I like that option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumption is unfortunately false, likely on accident in CLDR, although I miss the context. For Polish, other
is specifically used for fractions, while many
is what you'd expect to be a good other
in most other languages.
There is, however, no need to have that limitation here, and artificially limit more powerful implementations. Instead, what I think we should do is document that if the value exceeds what the implementation is capable of, the maximum supported by the implementation is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a much bigger subset of the JS Intl.NumberFormat & Intl.DateTimeFormat formatters' options than I expected, which is great! Besides the line comments, I would find it easier to consider exactly which parts are being left out: Could a list of such options with their reasonings be provided?
Some of the group members strongly opposed including anything that is not supported by ECMAScript. I anticipate (and hope) that we will discuss PR in the next meeting. |
Updated
ACK, and agree. I don't think we can capture this in the registry.xml, at least not with the current registry dtd. Added it as a discussion point in #422 |
Reverted most changes to the .dtd, except for the ones that were clearly unintentional, and prevented a valid .xml. For example:
This means that the registry can contain (0 or more This makes the examples in the .md invalid, and the registry can't cover the use cases we need. |
So this is ECMAScriptMessageFormatting only?
More seriously, if the restriction is removed, it simply allows clients to
pass in more digits of accuracy; doesn't mean that the implementations have
to support that in formatting.
…On Thu, Jul 13, 2023 at 8:45 AM Mihai Nita ***@***.***> wrote:
There is, however, no need to have that limitation here, and artificially
limit more powerful implementations.
Instead, what I think we should do is document that if the value exceeds
what the implementation is capable of, the maximum supported by the
implementation is used.
Some of the group members strongly opposed including anything that is not
supported by ECMAScript.
So I tried to start as non-controversial as possible.
I anticipate (and hope) that we will discuss PR in the next meeting.
—
Reply to this email directly, view it on GitHub
<#420 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACJLEMCIGHGOMNQULQ7KN23XQAJYXANCNFSM6AAAAAA2IAO7QA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Co-authored-by: Addison Phillips <addisonI18N@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Thanks for kicking this off.
@@ -90,7 +90,7 @@ For the sake of brevity, only `locales="en"` is considered. | |||
<function name="number"> | |||
<description> | |||
Format a number. | |||
Match a numerical value against CLDR plural categories or against a number literal. | |||
Match a **formatted** numerical value against CLDR plural categories or against a number literal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the imply that the input is a localized number that'd be parsed and then matched? Also since this function includes both selection and formatting, should this also say something about the formatting part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I've merged a suggestion from Addison which does not mention formatting.
There is an ongoing discussion in issue #425
Especially this comment #425 (comment)
<pattern id="anyNumber" regex="-?[0-9]+(\.[0-9]+)"/> | ||
<pattern id="positiveInteger" regex="[0-9]+"/> | ||
<pattern id="currencyCode" regex="[A-Z]{3}"/> | ||
<pattern id="timeZoneId" regex="[a-zA-Z/]+"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use a more restricted grammar that only allows valid IANA timezone IDs, but I don't feel very strongly about it. Exhibit A: https://tc39.es/proposal-temporal/#prod-TimeZoneIANAName
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current registry spec does not allow us to do this, only regex and list of values.
I suggested changing the spec to allow for URLs to external specs.
Added it as a discussion point in #422
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Temporal polyfill includes a Regexp that works for this purpose, but unfortunately it isn't a literal but I imagine that if we agreed on having a restrictive regex here that only allows valid IANA tzids, then we could come up with a simplified regex that works in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in the longer run (not first commit) we should also add support for Unicode Time Zone Identifiers
The main reason would be stability, which the Olson database IDs does not offer (see linked)
spec/registry.xml
Outdated
<!-- The minimum number of significant digits to use. --> | ||
<option name="minimumSignificantDigits" values="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21" default="1"/> | ||
<!-- The maximum number of significant digits to use. --> | ||
<option name="maximumSignificantDigits" values="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21" default="21"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all of these *Digits
<option>
s, I wonder if we could add a convenient shorthand in the DTD... Perhaps a simple <range start="x" end="y"/>
definition could work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to positiveInteger
.
But I think the range option might also be handy in other cases.
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Co-authored-by: Ujjwal Sharma <ryzokuken@disroot.org>
Co-authored-by: Ujjwal Sharma <ryzokuken@disroot.org>
Between several people commenting on this, and the news that ECMAScript considers removing the restriction, I changed this to positiveInteger. I hope it will not be controversial :-) |
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
The JS limit isn't being removed; it's being increased from 20 to 100. Attempting to format a number with a |
…, matchSignature A spin-off from unicode-org#420, in which @mihnita noticed that `registry.dtd` uses alternatives in a wrong way. For example, `<!ELEMENT formatSignature (input?|option*)>` means that `formatSignature` is allowed to have as children either: at most one `input` OR any number of `options`, but not both. Instead, @mihnita suggested using sequences: `<!ELEMENT formatSignature (input?,option*)>`, which this PR does. Note that sequences require the children to appear in a specific order, which isn't something that's useful to us. However, I'm not aware of any way of lifting this requirement that also allows to enforce at most on `input` or exactly one `description`.
@@ -1,6 +1,6 @@ | |||
<!ELEMENT registry (function*|pattern*)> | |||
<!ELEMENT registry (function,pattern)*> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing this. I filed #434 to discuss and fix this independently of this PR, so that you can focus it on registry.xml
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a decent starting point for us.
I'd prefer the DTD changes to be handled separately, but I'm fine with them being included here and then iterated on in other PRs.
<!-- The formatting style to use. --> | ||
<option name="style" values="decimal currency percent unit" default="decimal"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To consider: If we were to leave out currency
and unit
formatting from the default, then we wouldn't need to say how compound units need to work.
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
…, matchSignature (#434) A spin-off from #420, in which @mihnita noticed that `registry.dtd` uses alternatives in a wrong way. For example, `<!ELEMENT formatSignature (input?|option*)>` means that `formatSignature` is allowed to have as children either: at most one `input` OR any number of `options`, but not both. Instead, @mihnita suggested using sequences: `<!ELEMENT formatSignature (input?,option*)>`, which this PR does. Note that sequences require the children to appear in a specific order, which isn't something that's useful to us. However, I'm not aware of any way of lifting this requirement that also allows to enforce at most on `input` or exactly one `description`. Co-authored-by: Addison Phillips <addison@unicode.org>
No description provided.