From 8f57d63a055d0d066a06a64d983696c040c90c70 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Tue, 13 Aug 2024 09:35:21 -0700 Subject: [PATCH 1/9] [DESIGN] Number selection design refinements This is to build up and capture technical considerations for how to address the issues raised by @eemeli's PR #842. --- exploration/number-selection.md | 93 ++++++++++++++++++++++++++++++++- 1 file changed, 92 insertions(+), 1 deletion(-) diff --git a/exploration/number-selection.md b/exploration/number-selection.md index 8453142cc..a7c9bcfe6 100644 --- a/exploration/number-selection.md +++ b/exploration/number-selection.md @@ -1,6 +1,6 @@ # Selection on Numerical Values -Status: **Accepted** +Status: **Accepted** (moving back to **Proposed**)
Metadata @@ -53,6 +53,21 @@ Both JS and ICU PluralRules implementations provide for determining the plural c of a range based on its start and end values. Range-based selectors are not initially considered here. +In PR #842 +@eemeli points out a number of gaps or infelicities in the current specification +and there was extensive discussion of how to address these gaps. + +The `key` for exact numeric match in a variant has to be a string. +The format of such strings, therefore, has to be specified if messages are to be portable and interoperable. +In LDML45 Tech Preview we selected JSON's number serialization as a source for `key` values. +The JSON serialization is ambiguous, in that a given number value might be serialized validly in more than one way: +``` +123 +123.0 +1.23E2 +... etc... +``` + ## Use-Cases As a user, I want to write messages that use the correct plural for @@ -75,6 +90,64 @@ either plural or ordinal selection in a single message. > * {{You have {$numRemaining} chances remaining (plural)}} >``` +As a user, I want the selector to match the options specified: +``` +.local $num = {123.456 :number maximumSignificantDigits=2 maximumFractionDigits=2 minimumFractionDigits=2} +.match {$num} +120.00 {{This matches}} +120 {{This does not match}} +123.47 {{This does not match}} +123.456 {{This does not match}} +1.2E2 {{Does this match?}} +* {{ ... }} +``` + +Note that badly written keys just don't match, but we want users to be able to intuit whether a given set of keys will work or not. + +``` +.local $num = {123.456 :integer} +.match {$num} +123.456 {{Should not match?}} +123 {{Should match}} +123.0 {{Should not match?}} +* {{ ... }} +``` + +There can be complications, which we might need to define. 
Consider: + +``` +.local $num = {123.002 :number maximumFractionDigits=1 minimumFractionDigits=0} +.match {$num} +123.002 {{Should not match?}} +123.0 {{Does minimumFractionDigits make this not match?}} +123 {{Does minimumFractionDigits make this match?}} +* {{ ... }} +``` + +As an implementer, I am concerned about the cost of incorporating _options_ into the selector. +This might be accomplished by building a "second formatter". +Some implementations, such as ICU4J's, might use interfaces like `FormattedNumber` to feed the selector. +Implementations might also apply options by modifying the number value of the _operand_ +(or shadowing the options effect on the value) + +As a user, I want to be able to perform exact match using arbitrary digit numeric types where they are available. +As an implementer, I do **not** want to be required to provide or implement arbitrary precision +numeric types not available in my platform. +Programming/runtime environments vary widely in support of these types. +MF2 should not prevent the implementation of e.g. `BigDecimal` or `BigInt` types +and permit their use in MF2 messages. +MF2 should not _require_ implementations to support such types where they do not exist. +The problem of numeric type precision, +which is implementation dependent, +should not affect how message `key` values are specified. + +> For example: +>``` +>.local $num = {11111111111111.11111111111111 :number} +>.match {$num} +>11111111111111.11111111111111 {{This works on some implementations.}} +>* {{... but not on others? ...}} +>``` ## Requirements @@ -460,3 +533,21 @@ and they _might_ converge on some overlap that users could safely use across pla #### Cons - No guarantees about interoperability for a relatively core feature. 
+ +## Alternatives Considered (`key` matching) + +### Standardize the Serialization Forms + +Using the design above, remove the integer-only and no-sig-digits restrictions from LDML45 +and specify numeric matching by specifying the form of matching `key` values. +Comparison is as-if by string comparison of the serialized forms, just as in LDML45. + +### Compare numeric values + +This is the design proposed in #842. + +This modifies the key-match algorithm to use implementation-defined numeric value exact match: + +> 1. Let `exact` be the numeric value represented by `key`. +> 1. If `value` and `exact` are numerically equal, then + From 6b27f3236e6e4cdd1b337180c8c704a27f7f453d Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Wed, 11 Sep 2024 07:48:36 -0700 Subject: [PATCH 2/9] Update examples to match changes to syntax Also responds to the long discussion with @eemeli about significant digits by removing from the example. --- exploration/number-selection.md | 51 +++++++++++++++++++-------------- 1 file changed, 30 insertions(+), 21 deletions(-) diff --git a/exploration/number-selection.md b/exploration/number-selection.md index a7c9bcfe6..60be2bd50 100644 --- a/exploration/number-selection.md +++ b/exploration/number-selection.md @@ -83,30 +83,29 @@ As a user, I want to write messages that mix exact matching and either plural or ordinal selection in a single message. 
> For example: >``` ->.match {$numRemaining} ->0 {{You have no more chances remaining (exact match)}} ->1 {{You have one more chance remaining (exact match)}} +>.match $numRemaining +>0 {{You have no more chances remaining (exact match)}} +>1 {{You have one more chance remaining (exact match)}} >one {{You have {$numRemaining} chance remaining (plural)}} -> * {{You have {$numRemaining} chances remaining (plural)}} +>* {{You have {$numRemaining} chances remaining (plural)}} >``` As a user, I want the selector to match the options specified: ``` -.local $num = {123.456 :number maximumSignificantDigits=2 maximumFractionDigits=2 minimumFractionDigits=2} -.match {$num} -120.00 {{This matches}} -120 {{This does not match}} -123.47 {{This does not match}} -123.456 {{This does not match}} -1.2E2 {{Does this match?}} -* {{ ... }} +.local $num = {123.123 :number maximumFractionDigits=2 minimumFractionDigits=2} +.match $num +123.12 {{This matches}} +120 {{This does not match}} +123.123 {{This does not match}} +1.23123E2 {{Does this match?}} +* {{ ... }} ``` Note that badly written keys just don't match, but we want users to be able to intuit whether a given set of keys will work or not. ``` .local $num = {123.456 :integer} -.match {$num} +.match $num 123.456 {{Should not match?}} 123 {{Should match}} 123.0 {{Should not match?}} @@ -117,7 +116,7 @@ There can be complications, which we might need to define. Consider: ``` .local $num = {123.002 :number maximumFractionDigits=1 minimumFractionDigits=0} -.match {$num} +.match $num 123.002 {{Should not match?}} 123.0 {{Does minimumFractionDigits make this not match?}} 123 {{Does minimumFractionDigits make this match?}} @@ -131,10 +130,11 @@ Implementations might also apply options by modifying the number value of the _o (or shadowing the options effect on the value) As a user, I want to be able to perform exact match using arbitrary digit numeric types where they are available. 
+ As an implementer, I do **not** want to be required to provide or implement arbitrary precision numeric types not available in my platform. Programming/runtime environments vary widely in support of these types. -MF2 should not prevent the implementation of e.g. `BigDecimal` or `BigInt` types +MF2 should not prevent the implementation using, for example, `BigDecimal` or `BigInt` types and permit their use in MF2 messages. MF2 should not _require_ implementations to support such types where they do not exist. The problem of numeric type precision, @@ -144,7 +144,7 @@ should not affect how message `key` values are specified. > For example: >``` >.local $num = {11111111111111.11111111111111 :number} ->.match {$num} +>.match $num >11111111111111.11111111111111 {{This works on some implementations.}} >* {{... but not on others? ...}} >``` @@ -351,7 +351,8 @@ but can cause problems in target locales that the original developer is not cons > considering other locale's need for a `one` plural: > > ``` -> .match {$var} +> .input {$var :integer} +> .match $var > 1 {{You have one last chance}} > one {{You have {$var} chance remaining}} // needed by languages such as Polish or Russian > // such locales typically require other keywords @@ -365,6 +366,12 @@ but can cause problems in target locales that the original developer is not cons When implementing `style=percent`, the numeric value of the operand MUST be divided by 100 for the purposes of formatting. 
+> For example, +> ``` +> .local $percent = {1000 :integer style=percent} +> {{This formats as '10%' in the en-US locale: {$percent}}} +> ``` + ### Selection When implementing [`MatchSelectorKeys`](spec/formatting.md#resolve-preferences), @@ -489,7 +496,9 @@ To expand on the last of these, consider this message: ``` -.match {$count :plural minimumFractionDigits=1} +.input {$count :number minimumFractionDigits=1} +.local $selector = {$count :plural} +.match $selector 0 {{You have no apples}} 1 {{You have exactly one apple}} * {{You have {$count :number minimumFractionDigits=1} apples}} @@ -504,9 +513,9 @@ With the proposed design, this message would much more naturally be written as: ``` .input {$count :number minimumFractionDigits=1} -.match {$count} -0 {{You have no apples}} -1 {{You have exactly one apple}} +.match $count +0.0 {{You have no apples}} +1.0 {{You have exactly one apple}} one {{You have {$count} apple}} * {{You have {$count} apples}} ``` From dfcaa10d4e574d73b5de0a00fa533a9804b14806 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Mon, 16 Sep 2024 11:28:44 -0700 Subject: [PATCH 3/9] Address 2024-09-16 call comments This changes the status to "Re-Opened" and adds a link to the PR. Expect to merge this imminently, although discussion on number selection remains. --- exploration/number-selection.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/exploration/number-selection.md b/exploration/number-selection.md index 60be2bd50..d48aee2f7 100644 --- a/exploration/number-selection.md +++ b/exploration/number-selection.md @@ -1,6 +1,6 @@ # Selection on Numerical Values -Status: **Accepted** (moving back to **Proposed**) +Status: **Re-Opened**
Metadata @@ -13,6 +13,7 @@ Status: **Accepted** (moving back to **Proposed**)
Pull Request
#471
#621
+
#859
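Note on the `key` matching alternatives introduced in PATCH 1/9: the difference between "Standardize the Serialization Forms" (string comparison) and "Compare numeric values" can be sketched as below. This is only an illustrative sketch, not the algorithm from the spec or from #842. The function names `matches_by_string` and `matches_by_value` are hypothetical, and Python's `Decimal` stands in for whatever implementation-defined numeric type a platform might use.

```python
from decimal import Decimal, InvalidOperation


def matches_by_string(serialized_operand: str, key: str) -> bool:
    """String alternative: the key must equal the operand's serialized form."""
    return serialized_operand == key


def matches_by_value(operand: Decimal, key: str) -> bool:
    """Numeric alternative: parse the key and compare numeric values."""
    try:
        return operand == Decimal(key)
    except InvalidOperation:
        # Keys that are not numbers (such as plural keywords) never match exactly.
        return False


# The same number written three valid ways, plus a plural keyword.
keys = ["123", "123.0", "1.23E2", "one"]

# Assumes the operand serializes as "123" (an assumption for this sketch).
string_hits = [k for k in keys if matches_by_string("123", k)]
value_hits = [k for k in keys if matches_by_value(Decimal("123"), k)]

print(string_hits)
print(value_hits)
```

Under string comparison only the key spelled exactly like the serialized operand matches, which is why the serialization form would have to be specified for messages to be portable. Under numeric comparison all three spellings match, at the cost of depending on the precision of the platform's numeric types.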
From 9bd4097d90ba8bf72a52900db28e212efc549527 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Mon, 16 Sep 2024 13:33:43 -0700 Subject: [PATCH 4/9] Update exploration/number-selection.md Co-authored-by: Eemeli Aro --- exploration/number-selection.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/exploration/number-selection.md b/exploration/number-selection.md index d48aee2f7..d60909632 100644 --- a/exploration/number-selection.md +++ b/exploration/number-selection.md @@ -365,12 +365,12 @@ but can cause problems in target locales that the original developer is not cons ### Percent Style When implementing `style=percent`, the numeric value of the operand -MUST be divided by 100 for the purposes of formatting. +MUST be multiplied by 100 for the purposes of formatting. > For example, > ``` -> .local $percent = {1000 :integer style=percent} -> {{This formats as '10%' in the en-US locale: {$percent}}} +> .local $percent = {1 :integer style=percent} +> {{This formats as '100%' in the en-US locale: {$percent}}} > ``` ### Selection From da9377b5c48128a42749d49dd3869c55a1483682 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Sat, 26 Oct 2024 08:37:02 -0700 Subject: [PATCH 5/9] Update from main (#914) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Create notes-2024-08-19.md * Accept attributes design & remove spec note (#845) * Accept attributes design & remove spec note * Disallow duplicate attribute names (closes #756) * Add link to contextual options PR * Add more prose to tag example text Co-authored-by: Addison Phillips * Mention attribute validity condition in the **_valid_** definition --------- Co-authored-by: Addison Phillips * Update selection-declaration design doc based on mtg / issue discussion (#867) * Add tests for pattern selection (#863) * Add tests for pattern selection * Add missing errors * Apply suggestions from code review Co-authored-by: Addison Phillips --------- 
Co-authored-by: Addison Phillips * Add Duplicate Variant to table in test/README.md (#861) * Add new selection-declaration alternative: Require annotation of selector variables in placeholders (#860) * Add new selection-declaration alternative: Require annotation of selector variables in placeholders * Improve examples * Switch example order * Update the stability policy (#834) * Update the stability policy Based on discussion in the 2024-07-22 call and in PR #829, update the stability policy. * A deeper, more thorough rewrite - Standardizes the phrasing completely. - Moves all potential future changes (which are not, after all, stability policies) to an "important" block - Removes duplication - Separates functions, options, and option values into separate guarantees - Clarifies the note about formatting changing over time * Update spec/README.md Co-authored-by: Tim Chevalier * Update spec/README.md Co-authored-by: Eemeli Aro * remove well-formed * Update spec/README.md --------- Co-authored-by: Tim Chevalier Co-authored-by: Eemeli Aro * Refine error handling text (#816) * Refine error handling text * Apply suggestions from code review Co-authored-by: Addison Phillips * Update fallback text * Turn bullet point list into paragraphs * Be more mighty Co-authored-by: Addison Phillips --------- Co-authored-by: Addison Phillips * Create notes-2024-08-26.md * Select "Match on variables instead of expressions" for selection-declarations (#824) * Select "Match on variables instead of expressions" for selection-declarations * Add hybrid option to selection-declaration.md (#870) * Add hybrid option to selection-declaration.md * Update selection-declaration.md fixed glitch in original edit * Update selection-declaration.md * Apply suggestions from code review Fixing typos Co-authored-by: Addison Phillips * Update selection-declaration.md * Update exploration/selection-declaration.md Co-authored-by: Eemeli Aro * Update exploration/selection-declaration.md Co-authored-by: Eemeli 
Aro * Update exploration/selection-declaration.md Co-authored-by: Eemeli Aro --------- Co-authored-by: Addison Phillips Co-authored-by: Eemeli Aro * Update selection-declaration.md --------- Co-authored-by: Mark Davis Co-authored-by: Addison Phillips * Fix "Allow immutable input declarative selectors" example (#874) * Update README.md (#875) * Update README.md * Update README.md * [DESIGN] Update bidi design document to show proposed design (#871) * [DESIGN] Update bidi design document to show proposed design The design I actually think we should adopt is the "hybrid approaches" one. This is a necessary first step on the highway to UAX31 compliance and I think is responsibly contained/managed. It is a hybrid approach, in that it permits testable strict implementations to be created (particularly for message serialization). This PR consists of moving text around. I added one "pro" to one option also. * Address comments * Miscellaneous test fixes (#862) * Add missing expected bad-selector errors * Fix expected parts for unsupported-statement test * Add a few new tests for leading-whitespace and duplicate-variant * Add tests for escaped-char changes made in #743 * Fix tests for attributes with variable values * Update contributing and joining info (#876) * Update contributing and joining info * Update README.md * Update CONTRIBUTING.md * Restore CLA copy * Clarify error & fallback handling (#879) * Clarify error & fallback handling * Apply suggestions from code review Co-authored-by: Addison Phillips * Select last rather than first attribute * Drop mention of "starting with Pattern Selection" * Attributes can't change the formatted output * Use "nor" instead of "or" regarding attribute restrictions --------- Co-authored-by: Addison Phillips * Clarify rule selection (#878) * Clarify rule selection Fixes #868 This adds normative SHOULD language to using CLDR plural and ordinal data, which was intended originally. 
- clarifies that keyword selection follows exact match - clarifies the purpose of rule-based selection - makes non-CLDR-based implementation permitted * Update spec/registry.md Co-authored-by: Eemeli Aro * Update spec/registry.md Co-authored-by: Eemeli Aro * Update spec/registry.md Co-authored-by: Eemeli Aro --------- Co-authored-by: Eemeli Aro * [DESIGN] Maintaining the Standard, Optional and Unicode Namespace Function Sets (#634) * Design doc to capture registry maintenance * Update maintaining-registry.md * Update exploration/maintaining-registry.md Co-authored-by: Tim Chevalier * Update exploration/maintaining-registry.md Co-authored-by: Tim Chevalier * Add user stories, small updates to RGI * Update exploration/maintaining-registry.md * Adding additional detail * Remove machine readable registry; update prose * Update maintaining-registry.md * Further development work * Update to change format and naming Per the 2024-08-19 call, we decided to switch towards a specification-per-function model, with statuses. This commit includes the initial set of changes to try and implement this. * Address some comments. 
--------- Co-authored-by: Tim Chevalier * Create notes-2024-09-09.md * Fix a typo in an example (#880) The upcoming work to implement resolved value might make this patch unnecessary or obsolete, but fixing the typo (missing `{`/`}` around the variable in the pattern) just in case * Remove forward-compatibility promise and all reserved & private syntax (#883) * Remove forwards compatibility from stability guarantee * Drop reserved statements and expressions * Drop private-use annotations * Update tests * Clarify that deprecation is not removal * Match on variables instead of expressions (#877) * Match on variables instead of expressions * Apply suggestions from code review Co-authored-by: Addison Phillips * Apply suggestions from code review * Add missing test changes noticed during implementation * Empty commit to re-trigger CLA check --------- Co-authored-by: Addison Phillips * Create notes-2024-09-10.md * Add bidi support and address UAX31/UTS55 requirements (#884) * Add bidi support and address UAX31/UTS55 requirements Adds the bidi strong marks ALM, RLM, and LRM plus the bidi isolate controls LRI, RLI, FSI, and PDI to the syntax. Formally defines optional vs. non-optional whitespace. Non-optional whitespace must include at least one whitespace character. Optional whitespace may contain only bidi marks (which are invisible) * Update syntax.md including text from previous PR * Repair the guidance on strongly directional marks Include ALM and better specify how to use the marks. * Fix formatting of the "important" * Add bidi characters to description of whitespace. * Permit bidi in a few more places Add optional whitespace at the start of `variant` Add optional whitespace around `quoted-pattern` These changes result in allowing bidi around keys and quoted patterns as intended. * Update syntax.md ABNF * Update formatting.md - Add a note about the difference between formatting and message syntax. - Clarify the sentence about message directionality. 
* Address comment about name/identifier * Address comments related to bidi in `name` * Fix variable's location * Address comment about the list of LRI/PDI targets * One character typo :-P * Update spec/syntax.md Co-authored-by: Eemeli Aro * Address comments about rule R3a-1 * Update spec/syntax.md Co-authored-by: Eemeli Aro * Address comment about U+061C * Change [o]wsp => `o` or `s` * Match syntax spec to abnf * Remove * * Update syntax.md * Update spec/syntax.md Co-authored-by: Eemeli Aro * Update spec/message.abnf Co-authored-by: Eemeli Aro * Update spec/message.abnf Co-authored-by: Eemeli Aro * Update syntax.md * Update spec/message.abnf Co-authored-by: Eemeli Aro * Update spec/syntax.md Co-authored-by: Eemeli Aro * Update spec/syntax.md Co-authored-by: Eemeli Aro --------- Co-authored-by: Eemeli Aro * Specify `bad-option` for bad digit size option values (#882) * Specify `bad-option` for bad digit size option values Fixes #739 * adopt 'non-negative integer' * Create notes-2024-09-16.md * Address name and literal equality (#885) * Address name and literal equality This change defines equality as discussed in the 2024-09-09 teleconference in the following ways: - It defines _name_ equality as being under NFC - It defines _literal_ equality as explicitly **not** under NFC - It moves _name_ before _identifier_ in that section of text to avoid a forward definition. Note that this deviates from discussion in 2024-09-09's call in that we didn't discuss literals at length. It also doesn't discuss non-name/non-literal values, which I'll point out are limited to ASCII sequences such as keywords. * Typo fix * Add a note about not requiring implementations to actually normalize * Implement changes dicussed in 2024-09-16 call. 
- Make _key_ require NFC for uniqueness/comparison - Add a note about NFC - Make _literal_ **_not_** define equality - Make text in _name_ identical to that in _key_ for consistency * Update formatting.md to include keys in NFC * Address comments * Update spec/syntax.md Co-authored-by: Eemeli Aro * Update spec/syntax.md Co-authored-by: Eemeli Aro --------- Co-authored-by: Eemeli Aro * Update list of normative changes during the LDML45 period (#890) * Fix typos in data-model-errors tests (#892) Fix #886 * Update note on exact numeric match for v46 (#891) Addresses #887 Non-normative changes to the notes specifically part of LDML46 * Fix attribute value to be literal (#894) Fixes #893 * Create notes-2024-09-30.md * Add Resolved Values and Function Handler sections to formatting (#728) * Add Resolved Values section to formatting * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Tim Chevalier * Linkify "resolved value" * Add some examples & explicitly allow wrapping input values * No throw, only emit Co-authored-by: Tim Chevalier * Add section on Function Handlers, defining the term * Apply suggestions from code review * Rephrase initial resolved value definition * Update spec/formatting.md Co-authored-by: Eemeli Aro * Update resolved value definition again Co-authored-by: Addison Phillips --------- Co-authored-by: Tim Chevalier Co-authored-by: Addison Phillips * Define function composition for :number and :integer values (#823) * Define function composition for :number and :integer values * Apply suggestions from code review Co-authored-by: Addison Phillips * Add operand option priority example * Add apostrophes' Co-authored-by: Tim Chevalier * Update spec/registry.md Co-authored-by: Eemeli Aro * Update spec/registry.md Co-authored-by: Eemeli Aro --------- Co-authored-by: Addison Phillips Co-authored-by: Tim Chevalier * Create notes-2024-10-07.md * Apply NFC normalization during :string key 
comparison (#905) * Apply NFC normalization during :string key comparison * Add link to UAX#15 Co-authored-by: Addison Phillips --------- Co-authored-by: Addison Phillips * Add tests for changes due to bidi/whitespace (#902) * Add tests for changes due to bidi/whitespace * Correct output * Make erroneous test a syntax error * Define function composition for date/time values (#814) * Define function composition for date/time values * Apply suggestions from code review Co-authored-by: Stanisław Małolepszy * Drop the "only" * Update spec/registry.md * Update spec/registry.md Co-authored-by: Eemeli Aro * Update spec/registry.md Co-authored-by: Eemeli Aro * Update spec/registry.md Co-authored-by: Eemeli Aro * Make :date and :time composition implementation-defined --------- Co-authored-by: Stanisław Małolepszy Co-authored-by: Addison Phillips * DESIGN: Add alternative designs to the design doc on function composition (#806) * DESIGN: Add a sequel to the design doc on function composition This document sketches out some alternatives for the machinery provided to enable function composition. The goal is to provide an exhaustive list of alternatives. * Remove 'part 2' document and move contents to the end of part 1 * Revise introduction to reflect the changed goal * Edited for conciseness * Further edits for conciseness * Give a name to InputType and use it * Refer to motivating examples * Update function-composition-part-1.md status Per 2024-10-14 telecon * Create notes-2024-10-14.md * Add test for :integer and :number composition (#907) * Fix `:integer` option `useGrouping` values (#912) I noticed that `:integer` does not include the "never" value for the option `useGrouping`. This is a bug. 
* Drop syntax note on additional bidi changes (#910) Drop syntax note on addition bidi changes * Add tests for changes due to #885 (name/literal equality) (#904) * Add tests for changes due to #885 (name/literal equality) * Update test/tests/functions/string.json Co-authored-by: Eemeli Aro * Update test/tests/syntax.json Co-authored-by: Eemeli Aro * Update test/tests/functions/string.json Co-authored-by: Eemeli Aro * Added tests for reordering and special case mapping * Add another selection test --------- Co-authored-by: Eemeli Aro * Add u: options namespace (#846) * Move spec/registry.md -> spec/registry/default.md * Add Unicode Registry definition * Refer to BCP47, add note about only requiring normal tags * Call it a namespace * Apply suggestions from code review Co-authored-by: Addison Phillips * Fix test file reference Co-authored-by: Tim Chevalier * Apply suggestions from code review * Update spec/u-namespace.md Co-authored-by: Eemeli Aro * Apply suggestions from code review Co-authored-by: Addison Phillips * Apply suggestions from code review Co-authored-by: Addison Phillips * Add mention of functions to namespace description --------- Co-authored-by: Addison Phillips Co-authored-by: Tim Chevalier * Define function composition for :string values (#798) * Define function composition for :string values * Update spec/registry.md as suggested by @stasm in #814 * Drop the "only" * Update text following code review comments --------- Co-authored-by: Addison Phillips * Drop data model request for feedback on "name" (#909) * Allow surrogates in content, issue #895 (#906) * Allow surrogates in content, issue #895 * Grammar and typos, linkify terms, make into a note, and fix 2119 keywords Thanks Addison! 
Co-authored-by: Addison Phillips * Not using "localizable elements" Co-authored-by: Addison Phillips * Keep syntax.md in sync with message.abnf * Added note about surrogates to quoted literals * Moved the note about surrogates from Security Considerations to The Message * Update spec/syntax.md * Update spec/syntax.md * Italicize in a couple of places * Implemeted more (all?) feedback from review --------- Co-authored-by: Addison Phillips --------- Co-authored-by: Eemeli Aro Co-authored-by: Elango Cheran Co-authored-by: Tim Chevalier Co-authored-by: Mark Davis Co-authored-by: Danny Gleckler Co-authored-by: Steven R. Loomis Co-authored-by: Stanisław Małolepszy Co-authored-by: Eemeli Aro Co-authored-by: Mihai Nita --- CONTRIBUTING.md | 9 +- README.md | 47 +- exploration/bidi-usability.md | 71 ++- exploration/expression-attributes.md | 4 +- exploration/function-composition-part-1.md | 258 ++++++++-- exploration/maintaining-registry.md | 316 ++++++++++++ exploration/registry-xml/README.md | 3 +- exploration/selection-declaration.md | 150 +++++- meetings/2024/notes-2024-08-19.md | 272 ++++++++++ meetings/2024/notes-2024-08-26.md | 151 ++++++ meetings/2024/notes-2024-09-09.md | 167 +++++++ meetings/2024/notes-2024-09-10.md | 361 ++++++++++++++ meetings/2024/notes-2024-09-16.md | 398 +++++++++++++++ meetings/2024/notes-2024-09-30.md | 216 ++++++++ meetings/2024/notes-2024-10-07.md | 396 +++++++++++++++ meetings/2024/notes-2024-10-14.md | 298 +++++++++++ spec/README.md | 84 ++-- spec/appendices.md | 6 +- spec/data-model/README.md | 92 +--- spec/data-model/message.dtd | 22 +- spec/data-model/message.json | 80 +-- spec/errors.md | 116 ++--- spec/formatting.md | 484 +++++++++--------- spec/message.abnf | 91 ++-- spec/registry.md | 208 ++++++-- spec/syntax.md | 549 ++++++++++----------- spec/u-namespace.md | 87 ++++ test/README.md | 49 +- test/schemas/v0/tests.schema.json | 5 +- test/tests/bidi.json | 145 ++++++ test/tests/data-model-errors.json | 24 +- 
test/tests/functions/date.json | 4 +- test/tests/functions/integer.json | 6 +- test/tests/functions/number.json | 167 ------- test/tests/functions/string.json | 33 +- test/tests/functions/time.json | 4 +- test/tests/pattern-selection.json | 120 +++++ test/tests/syntax-errors.json | 109 +++- test/tests/syntax.json | 319 +++--------- test/tests/u-options.json | 126 +++++ test/tests/unsupported-expressions.json | 53 -- test/tests/unsupported-statements.json | 18 - 42 files changed, 4626 insertions(+), 1492 deletions(-) create mode 100644 exploration/maintaining-registry.md create mode 100644 meetings/2024/notes-2024-08-19.md create mode 100644 meetings/2024/notes-2024-08-26.md create mode 100644 meetings/2024/notes-2024-09-09.md create mode 100644 meetings/2024/notes-2024-09-10.md create mode 100644 meetings/2024/notes-2024-09-16.md create mode 100644 meetings/2024/notes-2024-09-30.md create mode 100644 meetings/2024/notes-2024-10-07.md create mode 100644 meetings/2024/notes-2024-10-14.md create mode 100644 spec/u-namespace.md create mode 100644 test/tests/bidi.json create mode 100644 test/tests/pattern-selection.json create mode 100644 test/tests/u-options.json delete mode 100644 test/tests/unsupported-expressions.json delete mode 100644 test/tests/unsupported-statements.json diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d28236c05..1b2bb58bf 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,13 +1,6 @@ # Contributing to this project -## Joining the Working Group - -We are looking for participation from software developers, localization engineers and others with experience -in Internationalization (I18N) and Localization (L10N). If you wish to contribute to this work, please review -the information on the Contributor License Agreement below. In addition, you should: - -1. Apply to join our [mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -2. 
Watch this repository (use the "Watch" button in the upper right corner) +To join this Working Group, please read the information in the [README.md](./README.md) as well as the Contributor License Agreement information just below: diff --git a/README.md b/README.md index 8323f49e6..a0fe16304 100644 --- a/README.md +++ b/README.md @@ -76,7 +76,8 @@ Functions can optionally take _options_: Messages can use a _selector_ to choose between different _variants_, which correspond to the grammatical (or other) requirements of the language: - .match {$count :integer} + .input {$count :integer} + .match $count 0 {{You have no notifications.}} one {{You have {$count} notification.}} * {{You have {$count} notifications.}} @@ -105,6 +106,23 @@ The `main` branch of this repository contains changes implemented since the tech Implementers should be aware of the following normative changes during the tech preview period. See the [commit history](https://github.com/unicode-org/message-format-wg/commits) after 2024-04-13 for a list of all commits (including non-normative changes). +- [#885](https://github.com/unicode-org/message-format-wg/issues/885) Address equality of `name` and `literal` values, including requiring keys to use NFC +- [#884](https://github.com/unicode-org/message-format-wg/issues/884) Add support for bidirectional isolates and strong marks in syntax and address UAX31/UTS55 requirements +- [#883](https://github.com/unicode-org/message-format-wg/issues/883) Remove forward-compatibility promise and all reserved/private syntax. +- [#882](https://github.com/unicode-org/message-format-wg/issues/882) Specify `bad-option` error for bad digit size options in `:number` and `:integer` functions +- [#878](https://github.com/unicode-org/message-format-wg/issues/878) Clarify "rule" selection in `:number` and `:integer` functions +- [#877](https://github.com/unicode-org/message-format-wg/issues/877) Match on variables instead of expressions. 
+- [#854](https://github.com/unicode-org/message-format-wg/issues/854) Allow whitespace at complex message start +- [#853](https://github.com/unicode-org/message-format-wg/issues/853) Add a "duplicate-variant" error +- [#845](https://github.com/unicode-org/message-format-wg/issues/845) Define "attributes" feature +- [#834](https://github.com/unicode-org/message-format-wg/issues/834) Modify the stability policy (not currently in effect due to Tech Preview) +- [#816](https://github.com/unicode-org/message-format-wg/issues/816) Refine error handling +- [#815](https://github.com/unicode-org/message-format-wg/issues/815) Removed machine-readable function registry as a deliverable +- [#813](https://github.com/unicode-org/message-format-wg/issues/813) Change default of `:date` and `:datetime` date formatting from `short` to `medium` +- [#812](https://github.com/unicode-org/message-format-wg/issues/812) Allow trailing whitespace for complex messages +- [#793](https://github.com/unicode-org/message-format-wg/issues/793) Recommend the use of escapes only when necessary +- [#775](https://github.com/unicode-org/message-format-wg/issues/775) Add formal definitions for variable, external variable, and local variable +- [#774](https://github.com/unicode-org/message-format-wg/issues/774) Refactor errors, adding Message Function Errors - [#771](https://github.com/unicode-org/message-format-wg/issues/771) Remove inappropriate normative statement from errors.md - [#767](https://github.com/unicode-org/message-format-wg/issues/767) Add a test schema and [#778](https://github.com/unicode-org/message-format-wg/issues/778) validate tests against it @@ -113,7 +131,9 @@ after 2024-04-13 for a list of all commits (including non-normative changes). 
- [#769](https://github.com/unicode-org/message-format-wg/issues/769) Add `:test:function`, `:test:select` and `:test:format` functions for implementation testing - [#743](https://github.com/unicode-org/message-format-wg/issues/743) Collapse all escape sequence rules into one (affects the ABNF) -- _more to be added as they are merged_ + +In addition to the above, the test suite is significantly modified and updated. + ## Implementations @@ -137,18 +157,27 @@ We invite feedback about the current syntax draft, as well as the real-life use- - General questions and thoughts → [post a discussion thread](https://github.com/unicode-org/message-format-wg/discussions). - Actionable feedback (bugs, feature requests) → [file a new issue](https://github.com/unicode-org/message-format-wg/issues). -## Participation +## Participation / Joining the Working Group + +We are looking for participation from software developers, localization engineers and others with experience +in Internationalization (I18N) and Localization (L10N). +If you wish to contribute to this work, please review the information about the Contributor License Agreement below. -To join in: +To follow this work: +1. Apply to join our [mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) +2. Watch this repository (use the "Watch" button in the upper right corner) -1. Review [CONTRIBUTING.md](./CONTRIBUTING.md) -2. Apply to join our [mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -3. Watch this repository (use the "Watch" button in the upper right corner) +To contribute to this work, in addition to the above: +1. Each individual MUST have a copy of the CLA on file. See below. +2. Individuals who are employees of Unicode Member organizations SHOULD contact their member representative. + Individuals who are not employees of Unicode Member organizations MUST contact the chair to request Invited Expert status. 
+ Employees of Unicode Member organizations MAY also apply for Invited Expert status, + subject to approval from their member representative. ### Copyright & Licenses Copyright © 2019-2024 Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries. -The project is released under [LICENSE](./LICENSE). - A CLA is required to contribute to this project - please refer to the [CONTRIBUTING.md](./CONTRIBUTING.md) file (or start a Pull Request) for more information. + +The contents of this repository are governed by the Unicode [Terms of Use](https://www.unicode.org/copyright.html) and are released under [LICENSE](./LICENSE). diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 3f70ed700..49bfcc1aa 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -273,6 +273,39 @@ Not allowing these to mix could produce annoying parse errors. _Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._ +I propose adopting a hybrid approach in which we permit "super-loose isolation". +This allows users to include isolates and strongly directional characters in the whitespace +portions of the syntax in order to make messages appear correctly. + +The second part of the hybrid approach would be to recommend ("SHOULD") the "strict isolation" +design for serializers. +(Note that "strict" and "super-loose" use non-identical productions with the name `bidi`. +These serve different purposes and are consistent with strict being narrower than super-loose.) +This syntax is a subset of the super-loose syntax and can be applied selectively to messages that +have RTL sequences or which have problematic display. + + +## Alternatives Considered + +_What other solutions are available?_ +_How do they compare against the requirements?_ +_What other properties they have?_ + +### Nothing +We could do nothing.
+ +A likely outcome of doing nothing is that RTL users would insert bidi controls into +_messages_ in an attempt to make the _pattern_ and/or _placeholders_ display correctly. +These controls would become part of the output of the _message_, +showing up inappropriately at runtime. +Because these characters are invisible, users might be very frustrated trying to manage +the results or debug what is wrong with their messages. + +By contrast, if users insert too many or the wrong controls using the recommended design, +the _message_ would still be functional and would emit no undesired characters. + +### LTR Messages with isolating sequences + The syntax of a _message_ assumes a left-to-right base direction both for the complete text of the _message_ as well as for each line (paragraph) contained therein. @@ -383,7 +416,7 @@ ns-separator = [bidi] ":" bidi = [ %x200E-200F / %x061C ] ``` -### Open Issues with Proposed Design +**Open Issues** The ABNF changes found above put isolates and strongly directional marks into specific locations, such as directly next to `{`/`}`/`{{`/`}}` markers @@ -393,24 +426,6 @@ A more permissive design would add the isolates and strongly directional marks t whitespace in the syntax and depend on users/editors to appropriately pair or position the marks to get optimal display. -## Alternatives Considered - -_What other solutions are available?_ -_How do they compare against the requirements?_ -_What other properties they have?_ - -### Nothing -We could do nothing. - -A likely outcome of doing nothing is that RTL users would insert bidi controls into -_messages_ in an attempt to make the _pattern_ and/or _placeholders_ display correctly. -These controls would become part of the output of the _message_, -showing up inappropriately at runtime. -Because these characters are invisible, users might be very frustrated trying to manage -the results or debug what is wrong with their messages. 
- -By contrast, if users insert too many or the wrong controls using the recommended design, -the _message_ would still be functional and would emit no undesired characters. ### Super-loose isolation @@ -418,7 +433,17 @@ Add isolates and strongly directional marks to required and optional whitespace This would permit users to get the effects described by the above design, as long as they use isolates/marks in a "responsible" way. -(Omitting other changes found in #673) +The exception to this is the namespace separator, used in `identifier`. +This requires the ability to insert isolates or strongly directional marks +between the namespace and name portions, where whitespace is not permitted. +This is the only location in the syntax where such characters might be needed +but whitespace is not at least optional. +This could be defined as: +```abnf +ns-separator = [bidi] ":" [bidi] +``` + +Here are the other ABNF changes: ```abnf ; strongly directional marks and bidi isolates @@ -447,7 +472,7 @@ s = ( SP / HTAB / CR / LF / %x3000 ) ### Strict isolation all the time Apply bidi isolates in a strict way. -The main differences to the proposed solution is: +In this design: 1. The open/close isolate characters are syntactically required to be paired. This introduces parse errors for unpaired invisible characters, which could lead to bad user experiences. @@ -467,7 +492,7 @@ markup = "{" [s] "#" identifier [bidi] *(s option) *(s attribute) [s] [" / "{" [s] "/" identifier [bidi] *(s option) *(s attribute) [s] "}" ; close / "{" LRI [s] "/" identifier [bidi] *(s option) *(s attribute) [s] close-isolate "}" ; close identifier = [(namespace ns-separator)] name -ns-separator = [bidi] ":" +ns-separator = [bidi] ":" [bidi] bidi = [ %x200E-200F / %x061C ] ``` @@ -610,6 +635,8 @@ adherence to the stricter grammar. 
syntax errors - Provides a foundation for tools to claim strict conformance and message normalization as well as guidance to implementers to make them want to adopt it +- Messages are valid while being edited (such as when the open or close isolate has been + inserted but the corresponding opposite isolate hasn't been entered yet) **Cons** - Requires additional effort to maintain the grammar diff --git a/exploration/expression-attributes.md b/exploration/expression-attributes.md index 2edde5613..0253fc49e 100644 --- a/exploration/expression-attributes.md +++ b/exploration/expression-attributes.md @@ -1,6 +1,6 @@ # Expression Attributes -Status: **Proposed** +Status: **Accepted**
Metadata @@ -15,6 +15,8 @@ Status: **Proposed**
#772
#780
#792
+
#845
+
#846
diff --git a/exploration/function-composition-part-1.md b/exploration/function-composition-part-1.md index ca392386f..3fb267713 100644 --- a/exploration/function-composition-part-1.md +++ b/exploration/function-composition-part-1.md @@ -1,6 +1,6 @@ # Function Composition -Status: **Proposed** +Status: **Obsolete**
Metadata @@ -11,22 +11,20 @@ Status: **Proposed**
2024-03-26
Pull Requests
#753
+
#806
-## Objective +## Objectives -_What is this proposal trying to achieve?_ +* Present a complete list of alternative designs for how to +provide the machinery for function composition. +* Create a shared vocabulary for discussing these alternatives. -### Non-goal - -The objective of this design document is not to make -a concrete proposal, but rather to explore a problem space. -This space is complicated enough that agreement on vocabulary -is desired before defining a solution. - -Instead of objectives, we present a primary problem -and a set of subsidiary problems. +> [!NOTE] +> This design document is preserved as part of a valuable conversation about +> function composition, but it is not the basis for the design eventually +> accepted. ### Problem statement: defining resolved values @@ -838,7 +836,10 @@ so that functions can be passed the values they need. It also needs to provide a mechanism for declaring when functions can compose with each other. -Other requirements: +### Guarantee portability + +A message that has a valid result in one implementation +should not result in an error in a different implementation. ### Identify a set of use cases that must be supported @@ -975,26 +976,217 @@ Hence, revisiting the extensibility of the runtime model now that the data model is settled may result in a more workable solution. -## Proposed design and alternatives considered - -These sections are omitted from this document and will be added in -a future follow-up document, -given the length so far and need to agree on a common vocabulary. - -We expect that any proposed design -would fall into one of the following categories: - -1. Provide a general mechanism for custom function authors -to specify how functions compose with each other. -1. Specify composition rules for built-in functions, -but not in general, allowing custom functions -to cooperate in an _ad hoc_ way. -1. 
Recommend a rich representation of resolved values -without specifying any constraints on how these values -are used. -(This is the approach in [PR 645](https://github.com/unicode-org/message-format-wg/pull/645).) -1. Restrict function composition for built-in functions -(in order to prevent unintuitive behavior). +## Alternatives to be considered + +The goal of this section is to present a _complete_ list of +alternatives that may be considered by the working group. + +Each alternative corresponds to a different concrete +definition of "resolved value". + +## Introducing type names + +It's useful to be able to refer to three types: + +* `InputType`: This type encompasses strings, numbers, date/time values, +all other possible implementation-specific types that input variables can be +assigned to. The details are implementation-specific. +* `MessageValue`: The "resolved value" type; see [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). +* `ValueType`: This type is the union of an `InputType` and a `MessageValue`. + +It's tagged with a string tag so functions can do type checks. + +``` +interface ValueType { + type(): string + value(): unknown +} +``` + +## Alternatives to consider + +In lieu of the usual "Proposed design" and "Alternatives considered" sections, +we offer some alternatives already considered in separate discussions. + +Because of our constraints, implementations are **not required** +to use the `MessageValue` interface internally as described in +any of the sections. +The purpose of defining the interface is to guide implementors. +An implementation that uses different types internally +but allows the same observable behavior for composition +is compliant with the spec. + +Five alternatives are presented: +1. Typed functions +2. Formatted value model +3. Preservation model +4. Allow both kinds of composition +5. 
Don't allow composition + +### Typed functions + +Types are a way for users of a language +to reason about the kinds of data +that functions can operate on. +The most ambitious solution is to specify +a type system for MessageFormat functions. + +In this solution, `ValueType` is not what is defined above, +but instead is the most general type +in a system of user-defined types. +(The internal definitions are omitted.) +Using the function registry, +each custom function could declare its own argument type +and result type. +This does not imply the existence of any static typechecking. + +Example B1: +``` + .local $age = {$person :getAge} + .local $y = {$age :duration skeleton=yM} + .local $z = {$y :uppercase} +``` + +In an informal notation, +the three custom functions in this example +have the following type signatures: + +``` +getAge : Person -> Number +duration : Number -> String +uppercase : String -> String +``` + +The [function registry data model](https://github.com/unicode-org/message-format-wg/blob/main/spec/registry.md) +could be extended to define `Number` and `String` +as subtypes of `MessageValue`. +A custom function author could use the custom +registry they define to define `Person` as +a subtype of `MessageValue`. + +An optional static typechecking pass (linting) +would then detect any cases where functions are composed in a way that +doesn't make sense. The advantage of this approach is documentation. + +### Formatted value model (Composition operates on output) + +To implement the "formatted value" model, +the `MessageValue` definition would look as in [PR 728](https://github.com/unicode-org/message-format-wg/pull/728), but without +the `resolvedOptions()` method: + +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getValue(): ValueType + selectKeys(keys: string[]): string[] +} +``` + +`MessageValue` is effectively a `ValueType` with methods. 
+ +Using this definition would make some of the use cases +impractical. For example, the result of Example A4 +might be surprising. Also, Example 1.3 from +[the dataflow composability design doc](https://github.com/unicode-org/message-format-wg/blob/main/exploration/dataflow-composability.md) +wouldn't work because options aren't preserved. + +### Preservation model (Composition can operate on input and options) + +In the preservation model, +functions "pipeline" the input through multiple calls. + +The `ValueType` definition is different: + +```ts +interface ValueType { + type(): string + value(): InputType | MessageValue +} +``` + +The resolved value interface would include both "input" +and "output" methods: + +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getInput(): ValueType + getOutput(): ValueType + properties(): { [key: string]: ValueType } + selectKeys(keys: string[]): string[] +} +``` + +Compared to PR 728: +The `resolvedOptions()` method is renamed to `properties`. +Individual function implementations +choose which options to pass through into the resulting +`MessageValue`. + +Instead of using `unknown` as the result type of `getValue()`, +we use `ValueType`, mentioned previously. +Instead of using `unknown` as the value type for the +`properties()` object, we use `ValueType`, +since options can also be full `MessageValue`s with their own options. +(The motivation for this is Example 1.3 from +[the "dataflow composability" design doc](https://github.com/unicode-org/message-format-wg/blob/main/exploration/dataflow-composability.md).) + +This solution allows functions to pipeline input, +operate on output, or both; as well as to examine +previously passed options. Any example from this +document can be implemented. 
+ +Without a mechanism for type signatures, +it may be hard for users to tell which combinations +of functions compose without errors, +and for implementors to document that information +for users. + +### Allow both kinds of composition (with different syntax) + +By introducing new syntax, the same function could have +either "preservation" or "formatted value" behavior. + +Consider (this suggestion is from Elango Cheran): + +``` + .local $x = {$num :number maxFrac=2} + .pipeline $y = {$x :number maxFrac=5 padStart=3} + {{$x} {$y}} +``` + +`.pipeline` would be a new keyword that acts like `.local`, +except that if its expression has a function annotation, +the formatter would apply the "preservation model" semantics +to the function. + +### Don't allow composition for built-in functions + +Another option is to define the built-in functions this way, +notionally: + +``` +number : Number -> FormattedNumber +date : Date -> FormattedDate +``` + +The `MessageValue` type would be defined the same way +as in the formatted value model. + +The difference is that built-in functions +would not accept a "formatted result" +(would signal a runtime error in these cases). + +As with the formatted value model, this restricts the +behavior of custom functions. + +### Non-alternative: Allow composition in some implementations + +Allow composition only if the implementation requires functions to return a resolved value as defined in [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). + +This violates the portability requirement. ## Acknowledgments diff --git a/exploration/maintaining-registry.md b/exploration/maintaining-registry.md new file mode 100644 index 000000000..be2d141dc --- /dev/null +++ b/exploration/maintaining-registry.md @@ -0,0 +1,316 @@ +# Maintaining and Registering Functions + +Status: **Proposed** + +
+ Metadata +
+
Contributors
+
@aphillips
+
First proposed
+
2024-02-12
+
Pull Requests
+
#634
+
+
+ +## Objective + +_What is this proposal trying to achieve?_ + +Describe how to manage the registration of functions and options under the +auspices of MessageFormat 2.0. +This includes the Standard Functions which are normatively required by MF2.0, +functions or options in the Unicode `u:` namespace, +and functions/options that are recommended for interoperability. + +## Background + +_What context is helpful to understand this proposal?_ + +MessageFormat v2 originally included the concept of "function registries", +including a "default function registry" required of conformant implementations. + +The terms "registry" and "default registry" suggest machine-readability +and various relationships between function sets that the working group decided +were not appropriate. + +MessageFormat v2 includes a standard set of functions. +Implementations are required to implement all of the _selectors_ +and _formatters_ in this set, +including _operands_, _options_, and option values. +Our goal is to be as universal as possible, +making MFv2's message syntax available to developers in many different +runtimes in a wholly consistent manner. +Because we want broad adoption in many different programming environments +and because the capabilities +and functionality available in these environments vary widely, +this standard set of functions must be conservative in its requirements +such that every implementation can reasonably implement it. + +Promoting message interoperability can and should go beyond this. +Even when a given feature or function cannot be adopted by all platforms, +diversity in the function names, operands, options, error behavior, +and so forth remains undesirable. +Another way to say this is that, ideally, there should be only one way to +do a given formatting or selection operation in terms of the syntax of a message. + +This suggests that there exists a set of functions and options that
+Such a set contains the "templates" for functions that go beyond those every implementation +must provide or which contain additional, optional features (options, option values) +that implementations can provide if they are motivated and capable of doing so. +These specifications are normative for the functionality that they provide, +but are optional for implementaters. + +There also needs to be a mechanism and process by which functions in the default namespace +can be incubated for future inclusion in either the standard set of functions +or in this extended, optional set. + +### Examples + +_Function Incubation_ + +CLDR and ICU have defined locale data and formatting for personal names. +This functionality is new in CLDR and ICU. +Because it is new, few, if any, non-ICU implementations are currently prepared to implement +a function such as a `:person` formatter or selector. +Implementation and usage experience is limited in ICU. +Where functionality is made available, we don't want it to vary from +platform to platform. + +_Option Incubation_ + +In the Tech Preview (LDML45) release, options for `:number` (and friends) +and `:datetime` (and friends) were omitted, including `currency` for `:number` +and `timeZone` for `:datetime`. +The options and their values were reserved, possibly for the LDML46 release as required, +but they also might be retained at a lower level of maturity. + +## Use-Cases + +_What use-cases do we see? Ideally, quote concrete examples._ + +As an implementer, I want to know what functions, options, and option values are +required to claim support for MF2: +- I want to know what options I am required to implement. +- I want to know what the values of each option are. +- I want to know what the options and their values mean. +- I want to be able to implement all of the required functions using my runtime environment + without difficulty. 
+- I want to be able to use my local I18N APIs, which might use an older release of CLDR + or might not be based on CLDR data at all. + This could mean that my output might not match that of a CLDR-based implementation. + +As an implementer, user, translator, or tools author I expect functions, options +and option values to be stable. +The meaning and use of these, once established, should never change. +Messages that work today must work tomorrow. +This doesn't mean that the output is stabilized or that selectors won't +produce different results for a given input or locale. + +As an implementer, I want to track best practices for newer I18N APIs +(such as implementing personal name formatting/selection) +without being required to implement any such APIs that I'm not ready for. + +As an implementer, I want to be assured that functions or options added in the future +will not conflict with functions or options that I have created for my local users. + +As a developer, I want to be able to implement my own local functions or local options +and be assured that these do not conflict with future additions by the core standard. + +As a tools developer, I want to track both required and optional function development +so that I can produce consistent support for messages that use these features. + +As a translator, I want messages to be consistent in their meaning. +I want functions and options to work consistently. +I want selection and formatting rules to be consistent so that I only have +to learn them once and so that there are no local quirks. + +As a user, I want to be able to use required functions and their options in my messages. +I want to be able to quickly adopt new additions as my implementation supports them +or be able to choose plug-in or shim implementations. +I never want to have to rewrite a message because a function or its options have changed.
+ +As an implementer or user, I want to be able to suggest useful additions to MF2 functionality +so that users can benefit from consistent, standardized features. +I want to understand the status of my proposal (and those of others) and know that a public, +structured, well-managed process has been applied. + +## Requirements + +_What properties does the solution have to manifest to enable the use-cases above?_ + +The Standard Function Set needs to describe the minimum set of selectors and formatters +needed to create messages effectively. +This must be compatible with ICU MessageFormat 1 messages. + +There must be a clear process for the creation of new selectors that are required +by the Standard Function Set, +which includes a maturation process that permits implementer feedback. + +There must be a clear process for the creation of new formatters that are required +by the Standard Function Set, +which includes a maturation process that permits implementer feedback. + +There must be a clear process for the addition of options or option values that are required +by the Standard Function Set, +which includes a maturation process that permits implementer feedback. + +There must be a clear process for the deprecation of any functions, options, or option values +that are no longer I18N best practices. +The stability guarantees of our standard do not permit removal of any of these. + +## Constraints + +_What prior decisions and existing conditions limit the possible design?_ + +## Proposed Design + +_Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._ + +The MessageFormat WG will release a set of specifications +that standardize the implementation of functions and options in the default namespace of +MessageFormat v2 beginning with the LDML46 release. 
+Implementations and users are strongly discouraged from defining +their own functions or options that use the default namespace. +Future updates to these sets of functions and options will coincide with LDML releases. + +Each _function_ is described by a single specification document. +Each such document will use a common template. +A _function_ can be a _formatting function_, +a _selector_, +or both. + +The specification will indicate if the _formatting function_, +the _selector function_, or, where applicable, both are `Standard` or `Optional`. +The specification must describe operands, including literal representations. + +The specification includes all defined _options_ for the function. +Each _option_ must define which values it accepts. +An _option_ is either `Standard` or `Optional`. + +_Functions_ or _options_ that have an `Optional` status +must have a maturity level assigned. +The maturity levels are: +- **Proposed** +- **Accepted** +- **Released** +- **Deprecated** + +_Functions_ and _options_ that have a `Standard` status have only the +`Released` and `Deprecated` statuses. + +* An _option_ can be `Standard` for an `Optional` function. + This means that the function is optional to implement, but that, when implemented, it must include the option. +* An _option_ can be `Optional` for a `Standard` function. + This means that the function is required, but implementations are not required to implement the option. +* An _option_ can be `Optional` for an `Optional` function. + This means that the function is optional to implement and the option is optional when implementing the function. + +A function specification describes the function's _operand_ or _operands_, +its formatting options (if any) and their values, +its selection options (if any) and their values, +its formatting behavior (if any), +its selection behavior (if any), +and its resolved value behavior. + +`Standard` functions are stable and subject to stability guarantees.
+Such entries will be limited in scope to functions that can reasonably be +implemented in nearly any programming environment. +> Examples: `:string`, `:number`, `:datetime`, `:date`, `:time` + + +`Optional` functions are stable and subject to stability guarantees once they +reach the status of **Released**. +Implementations are not required to implement _functions_ or _options_ with an `Optional` status +when claiming MF2 conformance. +Implementations MUST NOT implement functions or options that conflict with `Optional` functions or options. + +`Optional` values may have their status changed to `Standard`, +but not vice-versa. + +> Option Examples: `:datetime` might have a `timezone` option in LDML46. +> Function Examples: We don't currently have any, but potential work here +> might include personal name formatting, gender-based selectors, etc. + +The CLDR-TC reserves the `u:` namespace for use by the Unicode Consortium. +This namespace can contain _functions_ or _options_. +Implementations are not required to implement these _functions_ or _options_ +and may adopt or ignore them at their discretion, +but are encouraged to implement these items. + +Items in the Unicode Reserved Namespace are stable and subject to stability guarantees. +This namespace might sometimes be used to incubate functionality before +promotion to the default namespace in a future release. +In such cases, the `u:` namespace version is retained, but deprecated. +> Examples: Number and date skeletons are an example of Unicode extension +> possibilities. +> Providing a well-documented shorthand to augment "option bags" is +> popular with some developers, +> but it is not universally available and could represent a barrier to adoption +> if normatively required. + +All `Standard`, `Optional`, and Unicode namespace function or option specifications go through +a development process that includes these levels of maturity: + +1.
**Proposed** The _function_ or _option_, along with necessary documentation, + has been proposed for inclusion in a future release. +2. **Accepted** The _function_ or _option_ has been accepted but is not yet released. + During this period, changes can still be made. +3. **Released** The _function_ or _option_ is accepted as of a given LDML release that MUST be specified. +4. **Deprecated** The _function_ or _option_ was previously _released_ but has been deprecated. + Implementations are still required to support `Standard` functions or options that are deprecated. +5. **Rejected** The _function_ or _option_ was considered and rejected by the MF2 WG and/or the CLDR-TC. + Such items are not part of any standard, but might be maintained for historical reference. + +A proposal can seek to modify an existing function. +For example, if a _function_ `:foo` were an `Optional` function in the LDMLxx release, +a proposal to add an _option_ `bar` to this function would take the form +of a proposal to alter the existing specification of `:foo`. +Multiple proposals can exist for a given _function_ or _option_. + +### Process + +Proposals for additions are made via pull requests in a unicode-org github repo +using a specific template TBD. +Proposals for changes are made via pull requests in a unicode-org github repo +using a specific template TBD against the existing specification for the function or option. + +Proposals must be made at least _x months_ prior to the release date to be included +in a specific LDML release. +The CLDR-TC will consider each proposal using _process details here_ and make a determination. +The CLDR-TC may delegate approval to the MF2 WG. +Decisions by the MF2 WG may be appealed to the CLDR-TC. +Decisions by the CLDR-TC may be appealed using _existing process_. + +Technical discussion during the approval process is strongly encouraged. 
+Changes to the proposal, +such as in response to comments or implementation experience, are permitted +until the proposal has been approved. +Once approved, changes require re-approval (how?) + + +The timing of official releases of the Standard Function Set and Optional Set is the same as CLDR/LDML. +Each LDML release will include: +- **Released** specifications in the Standard Function Set +- **Released** specifications in the Unicode reserved namespace +- a section of the MF2 specification specifically incorporating versions of the above +- **Accepted** entries for each of the above available for testing and feedback + +Proposals for additions to any of the above include the following: +- a design document, which MUST contain: + - the exact text to include in the MF2 specification using a template to be named later + +Each proposal is stored in a directory indicating its maturity level. +The maturity levels are: +- **Accepted** Items waiting for the next CLDR release. +- **Released** Complete designs that are released. +- **Proposed** Proposals that have not yet been considered by the MFWG or which are under active development. +- **Rejected** Proposals that have been rejected by the MFWG in the past. + +## Alternatives Considered + +_What other solutions are available?_ +_How do they compare against the requirements?_ +_What other properties they have?_ diff --git a/exploration/registry-xml/README.md b/exploration/registry-xml/README.md index 75b049041..a3a3a6890 100644 --- a/exploration/registry-xml/README.md +++ b/exploration/registry-xml/README.md @@ -163,7 +163,8 @@ For the sake of brevity, only `locales="en"` is considered. 
Given the above description, the `:number` function is defined to work both in a selector and a placeholder: ``` -.match {$count :number} +.input {$count :number} +.match $count 1 {{One new message}} * {{{$count :number} new messages}} ``` diff --git a/exploration/selection-declaration.md b/exploration/selection-declaration.md index 0e2c8abc7..39215cc18 100644 --- a/exploration/selection-declaration.md +++ b/exploration/selection-declaration.md @@ -1,6 +1,6 @@ # Effect of Selectors on Subsequent Placeholders -Status: **Proposed** +Status: **Accepted**
Metadata @@ -10,7 +10,14 @@ Status: **Proposed**
First proposed
2024-03-27
Pull Requests
-
#000
+
#755
+
#824
+
#860
+
#867
+
#877
+
Ballot
+
#872 (discussion)
+
#873 (voting)
@@ -139,6 +146,16 @@ _What use-cases do we see? Ideally, quote concrete examples._ * {{...}} ``` + As another example of where the selection function and formatting functions differ, consider a person object provided as a formatting input. + A `:gender` function can return the person's gender, + but a `:personName` person name formatter function formats the name. + ``` + .match {$person :gender} + male {{Bienvenido {$person :personName}}} + female {{Bienvenida {$person :personName}}} + other {{Le damos la bienvenida {$person :personName}}} + ``` + ## Requirements _What properties does the solution have to manifest to enable the use-cases above?_ @@ -149,14 +166,11 @@ _What prior decisions and existing conditions limit the possible design?_ ## Proposed Design -_Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._ +The design alternative [Match on variables instead of expressions](#match-on-variables-instead-of-expressions) +described below is selected. ## Alternatives Considered -_What other solutions are available?_ -_How do they compare against the requirements?_ -_What other properties they have?_ - ### Do nothing In this alternative, selectors are independent of declarations. @@ -175,6 +189,7 @@ Examples: **Pros** - No changes required. - `.local` can be used to solve problems with variations in selection and formatting +- No confusion or overlap of keywords' behavior (ex: `.match`, `.input`) - Supports multiple selectors on the same operand **Cons** @@ -182,6 +197,52 @@ Examples: - Can produce a mismatch between formatting and selection, since the operand's formatting isn't visible to the selector. +### Require annotation of selector variables in placeholders + +In this alternative, the pre-existing validity requirement + +> Each _selector_ MUST have an _annotation_, +> or contain a _variable_ that directly or indirectly references a _declaration_ with an _annotation_. 
+ +is expanded to also require later uses of a variable that's used as a selector to be annotated: + +> In a _complex message_, +> each _placeholder_ _expression_ using the same _operand_ as a _selector_ MUST have an _annotation_, +> or contain a _variable_ that directly or indirectly references a _declaration_ with an _annotation_. + +Example invalid message with this alternative: +``` +.match {$n :number minimumFractionDigits=2} +* {{Data model error: {$n}}} +``` + +Valid, recommended form for the above message: +``` +.input {$n :number minimumFractionDigits=2} +.match {$n} +* {{Formats '$n' as a number with fraction digits: {$n}}} +``` + +Technically valid but not recommended: +``` +.input {$n :integer} +.match {$n :number minimumFractionDigits=2} +* {{Formats '$n' as an integer: {$n}}} + +.match {$n :number minimumFractionDigits=2} +* {{Formats '$n' as an integer: {$n :integer}}} +``` + +**Pros** +- No syntax changes required. +- `.local` can be used to solve problems with variations in selection and formatting +- Supports multiple selectors on the same operand +- Avoids mismatches between formatting and selection by requiring their annotation. + +**Cons** +- May require the user to annotate the operand for both formatting and selection, + unless they use a declaration. + ### Allow both local and input declarative selectors with immutability In this alternative, we modify the syntax to allow selectors to @@ -217,6 +278,8 @@ declaration = s variable [s] "=" [s] expression - Produces an error when users inappropriately annotate some items **Cons** +- Complexity: `.match` means more than one thing +- Complexity: `.match` implicitly creates a new lexical scope - Selectors can't provide additional selection-specific options if the variable name is already in scope - Doesn't allow multiple selection on the same operand, e.g. @@ -249,6 +312,8 @@ Instead the selector's annotation replaces what came before. - Shorthand version works intuitively with minimal typing. 
**Cons** +- Complexity: `.match` means more than one thing +- Complexity: `.match` implicitly creates a new lexical scope - Violates immutability that we've established everywhere else ### Allow _immutable_ input declarative selectors @@ -270,7 +335,7 @@ This implies that multiple selecton on the same operand is pointless. .match {$num :number maximumFractionDigits=0} * {{This message produces a Duplicate Declaration error}} -.input {$num :integer} {$num :number} +.match {$num :integer} {$num :number} * * {{This message produces a Duplicate Declaration error}} ``` @@ -280,6 +345,8 @@ This implies that multiple selecton on the same operand is pointless. - Produces an error when users inappropriately annotate some items **Cons** +- Complexity: `.match` means more than one thing +- Complexity: `.match` implicitly creates a new lexical scope - Selectors can't provide additional selection-specific options if the value has already been annotated - Doesn't allow multiple selection on the same operand, e.g. @@ -321,6 +388,7 @@ The ABNF change would look like: - Preserves immutability. **Cons** +- Complicates the situations where selection != formatting due to the strictness's design nudges - A separate declaration is required for each selector. ### Provide a `#`-like Feature @@ -358,3 +426,69 @@ and a data model error otherwise. Removes some self-documentation from the pattern. - Requires the pattern to change if the selectors are modified. - Limits number of referenceable selectors to 10 (in the current form) + +### Hybrid approach: Match may mutate, no duplicates + +In this alternative, in a `.match` statement: + +1. variables are mutated by their annotation +2. no variable can be the operand in two selectors + +This keeps most messages more concise, producing the expected results in Example 1. 
+ +#### Example 1 + +``` +.match {$count :integer} +one {{You have {$count} whole apple.}} +* {{You have {$count} whole apples.}} +``` +is precisely equivalent to: + +#### Example 2 +``` +.local $count2 = {$count :integer} +.match {$count2} +one {{You have {$count2} whole apple.}} +* {{You have {$count2} whole apples.}} +``` + +This avoids the serious problems with mismatched selection and formats +as in Example 1 under "Do Nothing", whereby the input of `count = 1.2` +results in the malformed "You have 1.2 whole apple." + +Due to clause 2, this requires users to declare any selector using a `.input` or `.local` declaration +before writing the `.match`. That is, the following is illegal. + +#### Example 3 +``` +.match {$count }{$count } +``` +It would need to be rewritten as something along the lines of: + +#### Example 4 +``` +.local $count3 = {$count} +.match {$count }{$count3 } +``` +Notes: +- Cases where the same variable is used twice in a match (or the older Select) are vanishingly rare. Since it is an error, and the advice to fix is easy, that will prevent misbehavior. +- There would be no change to the ABNF; but there would be an additional constraint in the spec, and relaxation of immutability within the .match statement. 
+ +**Pros** +- No new syntax is required +- Preserves immutability before and after the .match statement +- Avoids the serious problem of mismatch of selector and format of option "Do Nothing" +- Avoids the extra syntax of option "Allow both local and input declarative selectors with immutability" +- Avoids the problem of multiple variables in "Allow immutable input declarative selectors" +- Is much more concise than "Match on variables instead of expressions", since it doesn't require a .local or .input for every variable with options +- Avoids the readability issues with "Provide a #-like Feature" + +**Cons** +- Complexity: `.match` means more than one thing +- Complexity: `.match` implicitly creates a new lexical scope +- Violates immutability that we've established everywhere else +- Requires additional `.local` declarations in cases where a variable would occur twice + such as `.match {$date :date option=monthOnly} {$date :date option=full}` + + diff --git a/meetings/2024/notes-2024-08-19.md b/meetings/2024/notes-2024-08-19.md new file mode 100644 index 000000000..f7a704243 --- /dev/null +++ b/meetings/2024/notes-2024-08-19.md @@ -0,0 +1,272 @@ +# 19 August 2024 | MessageFormat Working Group Teleconference + +### Attendees +- Addison Phillips - Unicode (APP) - chair +- Eemeli Aro - Mozilla (EAO) +- Mihai Niță - Google (MIH) +- Elango Cheran - Google (ECH) +- Richard Gibson - OpenJSF (RGN) + +Scribe: MIH, help from ECH + +## Topic: Info Share + +## Topic: Issue review +https://github.com/unicode-org/message-format-wg/issues +Currently we have 58 open (was 64 last time). +15 are Preview-Feedback +0 are resolve-candidate and proposed for close. +0 are Agenda+ and proposed for discussion. +0 are ballots + + + +### #859 [DESIGN] Number selection design refinements #859 + +EAO : do you get my point about why I find it problematic to take some of the options instead of all the options + +APP: I think we are all grappling with the same problems. 
+As a message writer, if the UX asks me for certain values, how do I write them. +For integers it is obvious, but decimals are problematic. +Problems with different notations, scientific, etc. +How do I select 1.2345 ? +In code how does the string get into a number? +There is no easy way out of that. + +EAO: I would say that fundamentally if we need a match for 1.2345 the dev would create a value from data outside MF2 that is testable. + +We should offer no support to make it easy in MF2 to do rounding in code, etc. + +MIH: I cannot find any good use case to provide `|1.00|` as a key in a `.match` statement. This existed in MF1 for a long time, and it does matching numbers, and nobody complained. Matching ` |$1.00|` is already handled by plural rules. + +APP: Yes, I semi-agree with that. The only case is when the exact match value also matches a plural category that is specified. Ex: if you have driving directions that say + + +Think “in half a mile” or situations like that. +Maybe we should stick to our guns in regard to decimal matching. + +To EAO’s point about formatting outside, occasionally we need selections on non-numbers. + +MIH: NumberFormat does selection on fractionals. It’s not like it only works on integers. If you do numeric match on 0.5, it’s fine, it results in “in half a mile”. There’s not a case where you match on `0.5` to say “in half a mile”, but you say something different for 0.500. + +EAO: The “in half a mile” example sounds like choice format. You’re not looking for a specific value, but instead, a range of values. + +One can write a custom selection function. + +APP: time-box this. +I will go back to the design doc and make some changes. +To simplify the selection, to integer selection. + +We can discuss some more. But I think we can have a path forward. +Without making this too complicated. + +EAO: Technically, what we have is not matching on integers. It’s matching on all numbers, but we haven’t defined how the matching is to be done. 
+ +APP: Understood, I’ll revise the design +At least document better where the “sharp edges” are. + +ECH: Is there a way to solve this in the registry? +If we need to do selection on decimal numbers? + +APP: exact match on decimal numbers, we can look at the serialization and match against serialization. +If I give you a big decimal 1.00000001 + options, how do I know if this matches? + +ECH: ICU had to do this already, both in C++ and Java. + +MIH: I’m still debating whether it is good to match on a formatted number. I still think it is worth considering whether it is beneficial to have the keys as strings and then match as numbers. +If the keys are numbers, and we compare numbers, it is a lot easier. + +EAO: Should we consider the options for number formatting. We have an example in ICU MessageFormat. You can offset for a plurals / ordinal selector. That option is not taken into account for exact matching, but it is taken into account for plural rule category matching. + +APP: I will rehash the design document? +### #845 Accept attributes design & remove spec note #845 + +APP: Has merge conflicts. +I think it only implements things that we already agreed on. +### #824: Select "Match on variables instead of expressions" for selection-declarations #824 + +APP: EAO, you were not here. +I think we should go to ballot with this. +You might want to go look at the notes + +EAO: can we narrow down the options to a smaller set? + +APP: personally I would like to see an emerging consensus. + +ECH: RGN, since you are here, how do you feel about this PR? +Because I couldn't really quite tell. +Also refers to the discussion in #736. + +ECH: we have a .match and we put the function in the selectors. +Do we or do we not say that the options we provide to the selectors are “sticky” to the arguments we are giving to the selection. + +RGN: people can have intuition in both directions. +Existing tools can eliminate ambiguity with an extra declaration. + +APP: I hear that. 
Doing nothing is confusing, at least to me. +I look at the message in the example, and it feels messed up. +When we write date selections, we would want to get to the calendar. +What can I do so that if I write an `.input` I don’t repeat myself again in `.match` +We can disallow any kind of expressions in the selector. + +EAO: as we discovered, when we look at the syntax of `.match` we don’t all agree whether the expression there is modifying the argument or not. +Do you agree that this is a problem, and we should not allow this confusion to arise? +With the cost that when we select on an expression that does not show in formatting we still have to declare with an `.input` or `.local`. + +ECH: when you select on a person object, the selection might be on the gender of the person. +But the formatting of the name depends on the whole `Person` object. Ex: + +``` +.match {$person :gender} +male Bienvenido {$person :person} +female Bienvenida {$person :person} +other Bienvenidos {$person :person} +``` + +I think we are conflating these operations. +If someone needs to repeat, they have the option to declare, today. +There is no need to force it on them. +Legislating things adds friction. + +APP: when you select on numbers, you want to select on the same things as what you format. +I can imagine selectors where you want a different selector than the formatter. +That means “double annotation”. Some selectors might be better not-annotated. + + +RGN: concision and simplicity are not the same thing. +Extra verbosity is well worth it for extra clarity. +I mean to require an explicit declaration. + +ECH: you don’t need to require it, but if you want to match selection and formatting, you need to declare. +This is compatible with “do nothing”, I think. + +RGN: I am in favor of forbidding the possibility that the formatting options used for selection differ from those used in formatting the pattern. 
+ +APP: look carefully at the examples in the “leave as is” + +ECH: I understand the plural selection example very well, in which the formatting required for selection should be reused when formatting the input number for the message pattern. However, I gave you a case right above where the formatting for selection is not the same as the formatting for the message. That’s a counter example proving that we can’t say in all cases that selection “formatting” is the same as pattern formatting. + +Because it’s not universal in all cases, I can’t support the alternative based on an assumption that these concepts are connected in all cases. That alternative to “match on variables instead of expressions” is legislating that formatting is done in a `.local` and referenced as a variable in `.match`, which is motivated by the idea that the formatting of the placeholder in the pattern will be the same as the selector, and that’s not always true. + +MIH: I would like to not change the syntax, in which we support assignments in the `.match`. But we can make it a data model error as EAO suggested if there is a different function used in the selector vs. the function used in a placeholder of a pattern. If this is what EAO described, I would be fine with it. + +EAO: this alternative says that you need to annotate a selector. +The new wording would be that when you use a variable in a selector, everywhere where you use that should be annotated. + +APP: let’s add that to the doc, read the doc afterwards. +Let’s see if we can close in and decide what we do here. + +### #834 Update the stability policy #834 + +EAO: what we promise (too much) is that a format will always format the same, forever. +What I think we should promise is that it will format to something valid. + +APP: the output might change (because locale data changes). 
+But the output will not “break” + +EAO: if it formats “fine” with v2.0 then it will format “fine” with v2.1 + +APP: I was thinking “would not produce an error” +I think you suggest something stronger. + +EAO: if an implementation changes nothing, only moves from v2.0 to v2.1 the result will still be formatted without error and without fallback, if v2.0 was without error and without fallback. +“No error” is less than what I want. + +APP: I added some extra requirements. +I am trying to make all of these promises, making them properly enumerated, rather than one portmanteau promise. + +EAO: I think I want that portmanteau promise. +I would not want to go from a non-error to an error when we update. + +APP: a stability policy is a strait jacket we agree to put on ourselves. +As opposed to things that we do outside of policy. +I don’t want to overpromise, because that would prevent honest corrections. +### #634 [DESIGN] registry maintenance #634 + +EAO: … + +APP: this PR is a design doc, on how to manage the registries. +I’m open to suggestions. +But we are starting to play with the registries. +How will we work with these? +This is a swipe at that. +I would like to submit as proposed, so that we can make it easier to read it. +And open to update afterward. + +EAO: I’m entirely on board describing functions and options one is allowed (but not required) to implement. +“If you do this, do it this way” + +I’m on board with `u:` for the “global” options described. +What would a `u:` function do? Why would it exist? Why put it in the `u:` space and not in the standard one? + +APP: I didn’t want to say “we will never put a function in the `u:` namespace”. +We already have functions for testing (for conformance) that might live in the `u:` space. + +EAO: is “registry” the right word to use for these things? +Should we be calling them default functions? +Sounds as if we can choose between registries? + +APP: everyone is required to have the default one. 
+And there can be extras, at all kinds of levels (proposed, company, application) +We’ve been using “registry” for years now. + +MIH: You touched on the fact that IANA might want to use that. If we don’t like “default”, then we can call it “standard” to say that you cannot take out things from it or change it. The word “registry” does not bother me that much. + +EAO: in our communication we often talked about “the registry” +Maybe we should not use “registry” when we talk about the standard functions. +And I think that “RGI” is too clunky. + +APP: specifications, what forms “the registry”. +It is not machine readable. It used to be, but we took it away. +It is the same as when you register various URI prefixes. You write a spec. +And have a status, maybe a namespace. +So it is a collection of specs. +That might be a way to think about it. +The RGI was intentionally “clunky” + +EAO: built-in functions sounds like a name that is user friendly. +“Registry” is an implementation detail. + +MIH: The registry is growing over time. Also, all functions are on an equal footing: functions that are in the standard registry are no more special than functions that are in private registries. That is why I prefer calling them the same thing. + +APP: maybe we can talk about “registration process” +With “recommended addons” or something like that. +Might not be a registry file that one can download. + +ECH: are we bikeshedding the name? + +APP: yes + +ECH: I am not “wedded” to the word “registry” +I’m cool with “registration process”, but consistency (standard functions, company functions, custom functions) + +EAO: need to talk about how the registries will be versioned. +Is there a way to implement an updated version of the registered functions. + +ECH: a hassle to deal with different versions of various parts. + +MIH: the main reason we separated the spec proper from the registry was to be able to add functions without changing the spec. +Changing the spec version sounds scary. 
+Will all my tooling break (if the spec changed), when all we did was add an option to a function? +We can maybe version “the whole registry”, separate from the MF2 spec proper (syntax, data model, etc) + +APP: I think the function registries can be LDML versioned, and my expectation is that we might change in time. We can try to stabilize, as each implementation is required to implement those functions. +We can use the “twice per year” LDML version. +Gives us some kind of predictability. + +EAO: biannual tied to LDML sounds like a good idea. +But for a user reading the spec, I am interested in options not required, but recommended. +Wouldn't it be more readable if they were in the same document? +For example if I want to implement the `:number` function I have to read 3 documents. +Isn’t that clunky? + +APP: OK + +I was thinking RGI as a bucket for “this is done, but not required” + +MIH: Can we do something like what ICU does for public APIs, where we tag them with statuses “draft”, “technical preview”, “beta”, “final”? + +APP: the things in the `u:` namespace live in a different place than the main registry. +So it needs rules for that too. + + diff --git a/meetings/2024/notes-2024-08-26.md b/meetings/2024/notes-2024-08-26.md new file mode 100644 index 000000000..dbf27974e --- /dev/null +++ b/meetings/2024/notes-2024-08-26.md @@ -0,0 +1,151 @@ +# 26 August 2024 | MessageFormat Working Group Teleconference + +### Attendees +- Addison Phillips - Unicode (APP) - chair +- Eemeli Aro - Mozilla (EAO) +- Elango Cheran - Google (ECH) +- Mihai Niță - Google (MIH) +- Mark Davis - Google (MED) +- Richard Gibson - OpenJSF (RGN) + +* Scribe: ECH + +## Agenda +MED: The final date for the spec is Sept 25. Last time it was delayed a bit. This time, we need to not update the spec past Sept 25 in ways that would make ICU4C and ICU4J implementations invalid. + +APP: Yes, and regarding deadlines, we either need to make faster progress, or we will slip into next spring. 
+ +MED: One question for MIH is how much have the changes in the spec so far affected ICU? + +MIH: I haven’t tried to implement them as they happened. + +MED: You’re caught up, right? + +MIH: No, not yet. We have a PR for ICU4C and updated ICU4J to make the shared MF2 tests all pass, which helps, but we’re not there yet. + +EAO: Even if we don’t get out of Technical Preview for LDML 46, we should still get out a release. I have updated the JS implementation to the current version of the spec. I’ve added changes to the tests. + +APP: A thing that I’m concerned about is that some of the things that are coming soon might have impact. Some will make changes to whitespace or Bidi controls. That may not have major impact but we need to make sure the details work. Other things may have more impact like how we deal with selectors and related declarations. +I promise to set up meetings instead of having face to face meetings. We have to come to a call of whether we’re exiting Tech Preview for CLDR 46 or not. + +EAO: A comment from the work I did is that the biggest changes I’ve made are reification of attributes. Also, changing the data model representation of options as a mapping. Duplicate options and attribute usage are called errors, but they can’t show up in the data model, so I call them “syntax errors”. + +MED: Allowing things that we previously disallowed is much easier for implementers, because old messages are still valid. + +APP: Dates are coming up. + +## Topic: Info Share + +ECH: Glad we’re talking about testing, feedback. Another thing I mentioned before is conformance testing. Hasn’t been brought up. Did some improvements. Lot of concern because setup wasn’t good before. Testing the tests we had. Testing against ICU74… surprise! There are going to be errors. Still some things to resolve. 
Link to dashboard to see the green-ness: + +https://unicode-org.github.io/conformance/ + +EAO: On Dec 13 I will be talking with the Finnish National Committee to talk about how Finnish gets represented in CLDR, and whether it works for Finnish. There could be similar entities out there. + +### #863 Add tests for pattern selection + +EAO: There is somewhere in the spec where a lowercase “should” should be a capital SHOULD or MUST. + +MED: I want to say that it should be a capital SHOULD. The reason that it cannot be a capital MUST is that we do have to adjust plural rules. An example is that we realized recently that pluralization for French changes for compact notation when you get up into the millions. Also, there is a change for Wolof that we had to do. + +EAO: I am also in favor of making the recommendation be a capital SHOULD. We should separate the SHOULD recommendations back to MUST recommendations. + +APP: Let’s have a PR for that. Any objection to merging this PR, and then we can make followup + +### #862 Miscellaneous test fixes + +## Topic: Disallow “whitespace or special char” prefixed `.` in reserved-statement’s body (#840) +RGN: I would like to see a more significant change to be made, if we are to make a change. + +EAO: I would like to see a small change, and move from there. + +APP: I don’t understand the change because it moves part of `nameChar` to `name`, and maybe `nameChar` is used somewhere else apart from `name`. What’s the benefit here? + +RGN: There is a description of the benefit in the commentary of the Markdown. The approximate motivation is to recover in a parse error, such that the statement body should not be so broad that something that is truncated should not be a reserved truncated statement. + +MIH: I’m not against it, but it seems wrong to tinker with small things here and there. We are changing one production in the grammar in reserved statements, and redefine what reserved means. How often are we going to see this? 
I think we put too much emphasis on the whole reserved thing, and it feels like busy work. + +APP: I think the purpose is that if we ever added another keyword, we want to recognize the end of the statement. The purpose of reserved statements is to have it be possible to `.foo` and see that even when that statement is broken, we could still proceed to the `.match` and see that it was valid. If this PR helps us with that, I think it’s useful. + +EAO: This happens in practice when you’re editing a message. It creates a bad user experience for an author of a message using tools. A further step from this proposal is if we don’t allow whitespace to show up in a statement, it allows us to drop the requirement for a ___ statement to end in an expression. + +MIH: I think + +APP: I think we need to study it more as a group. + +ECH: Just observing that the discussion makes it sound like we’re over-optimizing based on future use cases that we haven’t seen. + +RGN: I think this has relevance to more than just syntax highlighting during editing. Reserved statement syntax affects which declarations would be valid in the future. + +https://github.com/unicode-org/message-format-wg/issues/547 + +The change in this PR seems to address at least one bullet point from that issue (`.strict true .local $var = {|val|}`), which gives me more confidence that it is a step in the right direction. + +APP: One thing to note is that our grammar says that it is consistent with `ncName`, although this change would make that not true. + +MIH: The fact that this change fixes an issue that RGN brought up a while ago by accident makes me worried that it’s breaking something by accident because we’re just tinkering. + +EAO: This PR doesn’t relax the requirement of having an expression at the end of a reserved statement. So what it does is intentional, and what RGN wants would be a further change. + +APP: Let’s study this seriously and discuss next week. 
+ +## Topic: Selection-declaration (#824) +_Discuss the design options seeking WG consensus. Timeboxed to 15 minutes or will go to ballot._ + +https://github.com/unicode-org/message-format-wg/blob/main/exploration/selection-declaration.md + +MED: I put in a comment that I think could solve the problem. https://github.com/unicode-org/message-format-wg/pull/824#discussion_r1731496159 It would put in a restriction that would make EAO’s `$count` work right. + +APP: We don’t have a selector for `:date` or `:person`. It’s possible that they would want to produce different + +MED: It could happen more often, but it affects a relatively small percentage of cases. + +APP: I’ve been saying that we would ballot this a few weeks in a row without doing it. MED, I should check if your alternative is not already covered. If you want the alternative to include that proposal, please make a PR to the design doc. And then we can ballot this next week, but we need to ballot it because this is an important issue. + +## Topic: Bidi design (#811) +_Bidi and whitespace options need to be discussed in light of the design document._ + +https://github.com/unicode-org/message-format-wg/blob/main/exploration/bidi-usability.md + +APP: We merged this design document so that you all could read it. This has a lot of impact if we go down this path. + +EAO: I still think that we should have `name` be isolated. It would have the same effect that some of the parts of this PR would have. I would like to get input from other implementers. + +APP: I thought a lot about the implementation aspect and also people editing, including translators and message authors. Bidi controls are invisible. Moving curly brackets around can cause the controls to do funky unexpected things. My proposal is to make the Bidi controls and strong markers optional for super loose isolation. Parsers that parse messages would just ignore things because they’re ignorable syntactically. It allows mirrored symbols to be unpaired. 
And messages could be tightly wrapped. I’m open to getting feedback, but I don’t want to make our syntax so fiddly because they need to work with Bidi things that they don’t understand. + +EAO: I’m concerned that we can get the right behavior without having isolation. For example, in a RTL context, interspersing a placeholder that has RTL content, I don’t see how we can get that to work right without isolation. Within `name` and identifiers, I can see high value for allowing just maybe the RTL mark, then that ensures the LTR doesn’t bleed to the end. + +APP: You would put that at the end because the `$` sign is already strongly LTR. I agree, that is an area where we have vagueness. Another thing is to have key lists because what you see in visual order may not be what is written in logical order, so we have to be careful. + +EAO: I would like the LTR mark to be in the `name` construct and not all the places where `name` shows up. + +APP: NFC doesn’t interact with that. + +MIH: It really feels like we are micro optimizing, but I’m not sure for what. I expect translators to use professional tools to edit messages. It might be useful only once in a while where someone uses a text editor to fix stuff. It feels like we’re designing a programming language and worry about what happens if people edit their source code in Notepad. No one edits code in Notepad. When it comes to inserting the marks, we don’t know how it will negatively affect messages. And lastly, we cannot say “I think it works” without trying it because there are lots of text editors that don’t support Bidi correctly. + +EAO: Look at #847 to see why we are considering things as a corner case. I think we should be conformant with UAX 31 and UTS 55. + +MIH: I already understand what we’re trying to do. If we try to follow specs, then we should make our text reflect that we introduce a feature to solve which statement in which spec, but not because we feel like it or think it would be good. 
+ +APP: I think we should be conformant with Unicode specs. There are some things whitespace-wise that come to the front. I pushed the Bidi discussion because that issue is not covered by that. + +## Topic: Standard, Optional, and Unicode Namespace Function Set maintenance (#634) [was “registry maintenance”] +_This is the function registry maintenance procedure design. Let’s review with an eye towards using as a template for other work._ + +https://github.com/unicode-org/message-format-wg/pull/634 + +APP: Based on last week’s discussion where we would move from “registries” to specifications of “standard” and “optional” functions. + +EAO: This is leading to spec language where we need labels on functions and options. I would like feedback from others on that. APP, you proposed `accepted`, `released`, and `deprecated`. Some iteration on names would be helpful. + +ECH: Why not reuse the terms that ICU uses for APIs? + +APP: I pulled that from somewhere that seemed reasonable, but I’m happy to match what ICU does, which sounds reasonable. + +EAO: I would be very happy to reuse something else as a starting point. Can you find a link to the ICU API states and add it to the PR. + + + +## Topic: AOB? + +(discussion of process) diff --git a/meetings/2024/notes-2024-09-09.md b/meetings/2024/notes-2024-09-09.md new file mode 100644 index 000000000..0af122093 --- /dev/null +++ b/meetings/2024/notes-2024-09-09.md @@ -0,0 +1,167 @@ +# 9 September 2024 | MessageFormat Working Group Teleconference + +### Attendees +- Addison Phillips - Unicode (APP) - chair +- Mihai Niță - Google (MIH) +- Eemeli Aro - Mozilla (EAO) +- Mark Davis - Google (MED) +- Tim Chevalier - Igalia (TIM) +- Richard Gibson - OpenJSF (RGN) +- Harmit Goswami - Mozilla (HGO) + +Scribe: HGO + +## Topic: LDML46 and the end of Technical Preview +_The v46 release is upcoming. There is also a desire to finish the 2.0 release (exit technical preview). 
Let’s discuss the practical considerations for doing both, including the possibility of a 46.1. This is also the section of the meeting in which we’ll set out the goals for the next 2-3 days._ + +[APP]: Current plan for #46 is to bookmark where we’re at and run the spec out. We still call it a technical preview but release out to-date work + +[MED]: Deadline is 25th for tech preview, we need time for back-and-forth and review, I don’t see time for that so we should target end of November to be done in this community + +[EAO]: Why do we need to complete this in this calendar year? + +[MED]: Funding issues, also without a forcing factor, this group might take ages. A deadline helps us to get done + +[EAO]: My concern with finishing the tech preview is that we will need to await on external inputs (Although I like the deadline) + +[MED]: If this is done properly, we can fix problems later (if it’s done properly). Trying to perfect it now is risky. + +[APP]: I think we can do enough to go ask the larger community prior to finishing the core issues remaining. We can run off a copy of #46 as a ‘stake in the ground’ + +[MED]: Sounds good, we don't want to force things into the tech preview since there’s only a week. + +[EAO]: Wanted to clarify the parts of the spec that are not able to be complete within the week. If people outside this group have different thoughts, I’m concerned the balance between opinions and decisions we can make will get out of hand, and worst-case can lead to a v3 release + +[APP]: Most of the concerns are regarding syntax. I agree, but people who don’t like the syntax will either have to live with it or create their own standard. We’ve reached our goals with what we wanted to accomplish with the syntax, other people can discuss whitespace, etc., but that won’t be in MF2. We can’t keep opening that box. + +[APP]: On monday, we’ll finalize what to add in, and submit on wednesday. + +## Topic: PR Review +Timeboxed review of items ready for merge. 
+ + +## Topic: … (#879) +[Merged] + +## Topic: … (#878) +[MED and EAO approve, merged] + + +## Topic: Selection-declaration (#824, #873, #872) +_Discuss the design options seeking WG consensus. Timeboxed to 15 minutes._ + +- https://github.com/unicode-org/message-format-wg/blob/main/exploration/selection-declaration.md +- https://github.com/unicode-org/message-format-wg/issues/873 +- https://github.com/unicode-org/message-format-wg/issues/872 + +[APP]: There wasn’t a consensus on #873, but solution F seems to be getting an emergent consensus. I think that’s the proposal on the table, any challenges? + +[MED]: I think it’s suboptimal, but can be extensively modified in the future (see solution E). I think it’s good for release 46. + +[APP]: I’m also unhappy with it currently + +[EAO]: If there’s a desire to make this backwards extensible, then we need to reserve the space in the syntax, opposed to what we currently do + +[APP]: Or we look at our stability guarantee to see if we can make that change + +[MED]: The key thing people want is backwards compatibility + +[EAO]: In our current stability policy, the 2.0 parser should parse without syntax error a message made in 2.1 … 2.n version. So then I feel we must reserve the space + +[MED]: I think it’s a mistake to promise the syntax is forwards and backwards compatible, since that ties our hands for the future. Changing forward compatibility needs a good reason, but tying our hands now can be bad, as I’ve seen in my career + +[EAO]: I’d be okay with no forwards compatibility. This also lets us drop all the reserved structures from the syntax. + +[MIH]: I have mixed feelings about dropping. L10n tools would work, which is the main benefit. On the other hand, currently having reserved structures is clunky, so I’m okay with removing forwards compatibility + +[APP]: I think it’s a reasonable evil. I doubt we’ll use the structure but I could be wrong. 
+ +[EAO]: I’d be okay with losing forwards compatibility, partly due to this, but also because it’ll help us simplify a lot, and can get rid of all the reserved stuff. Effectively, everything that’s an error can be fixed later. It’d also make me less unhappy about rushing this out of the tech preview, since we have more options in the future + +[APP]: We’re suggesting that we can make additions to the syntax that won’t break your compatibility? [All: yes] + +[APP]: Okay so we need to rework our guarantee (MED: to guarantee backwards compat, but not forwards), remove reserved structures, and move forward with solution F? [MED]: Yup [APP]: EAO, we should do a PR for solution F first, then make 2 additional PRs + +## Topic: Disallow “whitespace or special char” prefixed `.` in reserved-statement’s body (#840) +_Discuss making this technical change in the reserved-statement syntax._ + +[APP]: Now out of scope! + +## Topic: Bidi design (#811) +_Bidi and whitespace options need to be discussed in light of the design document._ + +https://github.com/unicode-org/message-format-wg/blob/main/exploration/bidi-usability.md + + +[APP]: A piece of homework for this topic was to review the ALM mark, which has an effect when used BEFORE a sequence of characters, but not when you add it to the end of a token. The way we use strong characters in the syntax, there’s not many ways you can incorporate ALM into it. + +[EAO]: So you propose we drop ALM from the allowed things? + +[APP]: It’s an allowed character but not allowed in the syntax + +… + + +## Topic: Standard, Optional, and Unicode Namespace Function Set maintenance (#634) [was “registry maintenance”] +_This is the function registry maintenance procedure design. Let’s review with an eye towards using as a template for other work._ + +[APP]: Should I add this in as proposed and we iterate later? 
[No objections] + +## Topic: Uniqueness (#869, #847) +_String equality (used in key matching or operand uniqueness) is affected by Unicode Normalization concerns. We need to decide whether to require a specific normalization form (typically NFC) or whether we warn users about the consequences of using denormalized values._ + +[APP]: We should address string equality, given the nature of Unicode. + +[EAO]: Mentioning that we have option and attribute names checking for uniqueness + +[MIH]: My take is that I strongly favor comparing strings as they are without normalization. If you want to normalize, you are free to do it outside, but in terms of preprocessing, what gets to MF2 is processed as is. + +[MED]: Almost every process nowadays has access to NFC normalization, if the dataset is small. You can do a very quick check to see if a text looks suspicious or not. I’m more worried about odd errors hitting people, since one implementation normalizes and another does not. This won’t affect European languages as much, but it’ll hit other languages a lot + +[APP]: I’ve always wanted people to check for normalization. If we want broad adoption, not insisting on normalization will help, but then we have to warn people that naming variables “options” and “operand”, etc. is a bad idea. + +[MED]: I see two issues. One, if all comparisons are within MF2 itself, and the second, if it depends on parameters and whether or not the parameters are normalized. I think it’s a mistake not to have a ‘SHOULD’ that comparisons should be done with normalization if possible. + +[EAO]: Agree with MED, SHOULD is good but MUST is too much of a fight + +[APP]: Should is hard to test though + +[MED]: I don’t think it’s too hard, you can easily provide such test cases. 
You can mark the test cases as they’re SHOULD + +[APP]: If we give authoring guidance that you should use normalized values, but the implementation doesn’t require normalization, then you can get yourself into trouble since it may sometimes work and other times won't. If you write a normalization-sensitive message, then it’s liable to cause problems, and there should be a warning + +[EAO]: I still think we should have a SHOULD. In the spec, you can get noticeable differences in behavior between normalized and non-normalized messages. + +[APP]: Agreed, it should be given to the author. + +[MED]: If we don’t go for MUST, then we should go with ‘MF2 text should be normalized with NFC, and parameters should be compared with normalization’. There can also be a section of the site that talks about implementation features, and this isn’t as formal so can be modified easily over time. + +[APP]: Normalizing the whole message is a bad idea since we have quoted text pieces that we promise as verbatim. That’s why I say it should be inside the comparison. I understand EAOs point, but if some messages behave differently in different environments, then I think it’s okay to just put a warning sticker there. + +[EAO]: It’s either we enforce with a MUST, or recommend with SHOULD, and handle the diverging corner cases + +[MED]: Agreed. The SHOULD should be put on building the message and comparisons. + +[MIH]: We still have to say the comparison should be normalized away. The comparison should be there no matter what. As an implementer, I don't really care since I implement on top of ICU. 
I’m still reluctant to ask for normalization behavior at runtime, but whatever + +[EAO]: Comparisons is the only place we should put the SHOULD, since that’s the only thing we control [All agree] + +[EAO]: We might also want to include a definition for ‘unique’ and ‘duplicate’, so we can point to those definitions in the PR + +[MIH]: I’m reluctant to claim a user should normalize an ArgMap, it’s just not that obvious. There might be use-cases where I want the denormalized form, and I can imagine a use-case + +[EAO]: My implementation plan won’t include normalizing the ArgMap, since it’ll be ASCII only. + + +## Topic: Issue review +https://github.com/unicode-org/message-format-wg/issues +Currently we have 56 open (was 60 last time). +- 14 are Preview-Feedback +- 1 is resolve-candidate and proposed for close. +- 3 are Agenda+ and proposed for discussion. +- 1 is a ballot + + + +## Topic: AOB? + diff --git a/meetings/2024/notes-2024-09-10.md b/meetings/2024/notes-2024-09-10.md new file mode 100644 index 000000000..96023890e --- /dev/null +++ b/meetings/2024/notes-2024-09-10.md @@ -0,0 +1,361 @@ +# Sep 10, 2024 | [MFWG: Virtual F2F](https://www.google.com/calendar/event?eid=MGw2M2M5czZzYWw4ZnRwMTlhZG01N2dyYWZfMjAyNDA5MTBUMTYzMDAwWiBhZGRpc29uQHVuaWNvZGUub3Jn) + +### Attendees + +- Addison Phillips \- Unicode (APP) \- chair +- Eemeli Aro \- Mozilla (EAO) +- Mark Davis \- Google (MED) +- Mihai Niță \- Google (MIH) +- Elango Cheran \- Google (ECH) +- Staś Małolepszy \- Google (STA) + + +**Scribe:** MIH + +To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. + +## [**Agenda**](https://github.com/unicode-org/message-format-wg/wiki#agenda) + +To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. 
+ +## Topic: Tech Preview + +Let’s review the Task List: + +[https://github.com/unicode-org/message-format-wg/wiki/Things-That-Need-Doing](https://github.com/unicode-org/message-format-wg/wiki/Things-That-Need-Doing) + +## Topic: PR Review + +*Timeboxed review of items ready for merge.* + +| PR | Description | Recommendation | +| ----- | ----- | ----- | +| #883 | [Remove forward-compatibility promise and all reserved & private syntax](https://github.com/unicode-org/message-format-wg/pull/883) | Merge | +| #882 | Specify bad-option for bad digit size option values | Discuss | +| #877 | [Match on variables instead of expressions](https://github.com/unicode-org/message-format-wg/pull/877) | Merge | +| #869 | Add section on Uniqueness and Equality | Discuss | +| #859 | \[DESIGN\] Number selection design refinements | Discuss | +| #846 | Add Unicode Registry definition | Discuss (\#634) | +| #842 | Match numbers numerically | Discuss | +| #840 | Disallow whitespace and special char prefixed . 
in reserved-statement’s body | Reject (Out-of-scope) | +| #823 | Define function composition for :number and :integer values | Discuss | +| #814 | Define function composition for date/time values | Discuss | +| #806 | DESIGN: Add alternative designs to the design doc on function composition | Discuss | +| #799 | Unify input and local declarations in model | Discuss | +| #798 | Define function composition for :string values | Discuss | +| #728 | Add "resolved values" section to formatting | Blocked by \#806 and \#798 | +| #673 | Fix whitespace conformance to match UAX31 | Discuss | +| #646 | Update spec as if PR \#645 were accepted | Discuss | +| #634 | [\[DESIGN\] Maintaining the Standard, Optional and Unicode Namespace Function Sets](https://github.com/unicode-org/message-format-wg/pull/634) | Discuss (Agenda+) | +| #584 | Add new terms to glossary | Discuss | + +## Topic: Resolved Values (646, 728, 798, 806, 814, 823, 842, 859) + +_This is the most controversial topic in Tech Preview and blocks a large number of our PRs as well as our exit from preview. The resolution to this should be achievable._ + +## Topic: Bidi design (#811) + +_Bidi and whitespace options need to be discussed in light of the design document._ + +[https://github.com/unicode-org/message-format-wg/blob/main/exploration/bidi-usability.md](https://github.com/unicode-org/message-format-wg/blob/main/exploration/bidi-usability.md) + +## Topic: Standard, Optional, and Unicode Namespace Function Set maintenance (#634) \[was “registry maintenance”\] + +_This is the function registry maintenance procedure design. Let’s review with an eye towards using as a template for other work._ + +## Topic: Issue review +[https://github.com/unicode-org/message-format-wg/issues](https://github.com/unicode-org/message-format-wg/issues) + +Currently we have 61 open (was 60 last time). + +* 15 are `Preview-Feedback` +* 0 are `resolve-candidate` and proposed for close. +* 2 are `Agenda+` and proposed for discussion. 
+* 1 is a ballot + +| Issue | Description | Recommendation | +| ----- | ----- | ----- | +| #865 | TC39-TG2 would like to see completion of the TG5 study | Discuss | +| #881 | Should we drop private-use annotations? | Discuss | +| #847 | Conformance with UAX\#31 and UTS\#55 | Discuss | +| #735 | Recovery from data model errors | Resolve | + +## Topic: Design Status Review + +| Doc | Description | Status | +| ----- | ----- | ----- | +| bidi-usability | Manage bidi isolation | Proposed, Discuss | +| dataflow-composability | Data Flow for Composable Functions | Proposed | +| function-composition-part-1 | Function Composition | Proposed | +| maintaining-registry | Maintaining the function registry | Proposed (\#624), Discuss | +| number-selection | Define how selection on numbers happens | Revision Proposed, Discuss | +| selection-declaration | Define what effect (if any) the annotation of a selector has on subsequent placeholders | Proposed, Discuss (Agenda+) | +| beauty-contest | Choose between syntax options | Obsolete | +| selection-matching-options | Selection Matching Options (ballot) | Obsolete | +| syntax-exploration-2 | Balloting of the revised syntax used in the Tech Preview | Obsolete | +| variants | A collection of message examples which require a branching logic to handle grammatical variations | Obsolete | +| formatted-parts | Define how format-to-parts works | Rejected | +| quoted-literals | Document the rationale for including quoted literals in MF and for choosing the \| as the quote symbol | Accepted | +| builtin-registry-capabilities | Tech Preview default registry definition | Accepted | +| code-mode-introducer | Choose the pattern for complex messages | Accepted | +| data-driven-tests | Capture the planned approach for the test suite | Accepted | +| default-registry-and-mf1-compatibility | Default Registry and MF1 Compatibility | Accepted | +| delimiting-variant-patterns | Delimiting of Patterns in Complex Messages (Ballot) | Accepted | +| 
error-handling | Decide whether and what implementations do after a runtime error | Accepted | +| exact-match-selector-options | Choose the name for the “exact match” selector function (this is \`:string\`) | Accepted | +| expression-attributes | Define how attributes may be attached to expressions | Accepted | +| open-close-placeholders | Describe the use cases and requirements for placeholders that enclose parts of a pattern | Accepted | +| overriding-extending-namespacing | Defines how externally-authored functions can appear in a message; how externally authored options can appear; and effect of namespacing | Accepted | +| pattern-exterior-whitespace | Specify how whitespace inside of a pattern (at the start/end) works | Accepted | +| string-selection-formatting | Define how selection and formatting of string values takes place. | Accepted | +| variable-mutability | Describe how variables are named and how externally passed variables and internally defined variables interact | Accepted | + +## Topic: AOB? + +=== + +#603 omitting the `*` key when the msg authors think they are exhaustive + +EAO: an example would be French, where the number of options went up over time. So what was exhaustive then is not exhaustive now. +It can be exhaustive for a boolean. + +APP: a fallback option if nothing matches, which would be different from \* as the most likely option. + +MED: there are 2 conflicting things. The reason plurals work is because there is a default. If there is a default value, then that’s identical to \`\*\`. + +EAO: I am happier to leave it open. Now that we don’t have a guarantee for forward compatibility. + +APP: this was also working before we changed the +`*` is technically different from `other`, in the matching algorithm. Technically you can write a plural algorithm that recognizes \`other\` as a keyword. + +EAO: if we leave out the `*`, with the current algorithm when nothing matches the selector is going to end up at `*`. 
+Maybe we should reconsider `other` in `:number`. +I don’t think we need that, with the + +MED: Guides constructing things by hand, because you don’t need to write entries for both \`other\` and \`\*\`. +We can put in a note that it is a tech preview, and might be relaxed in the future. +APP: I think we should stay with what we have and keep it for the future. + +MED: if we really care about conciseness we can invent some kind of fall-through. + +EAO: what I was saying about `:number` is … \[reading from spec\]: +> Apply the rules to the resolved value of the operand and the relevant function options, and return the resulting keyword. If no rules match, return \`other\`. +They return `other`, which I don’t think we need. + +MED: I think we should leave this alone. +I have some strong opinions about how things are resolved, but I would leave it as is for now. + +--- + +APP: Error handling. I think we are now done with error handling. +EAO, are we now done with all the tests? + +EAO: I did not check whether all the tests for cases where errors are expected are updated. + +APP: Bidi / whitespace handling, we discussed. We have a design. We need to discuss, we also discussed a bit yesterday. +I wait for the EAO change that removes the resolved. + +Interchange data model: informative. + +EAO: PR #799 + +APP: since it is not a deliverable, we can put it aside until we release. +This becomes even more interesting because of what we did with \`.match\`. It might be easier if we unify. +We should do this for 2.0, not necessarily for LDML 46\. + +APP: other things on the data model? + +EAO: should namespaces be part of the variable or keep them as one. + +EAO: XLIFF + +APP & MED: nice to have. + +EAO: For 2.0 as well? + +EAO: I can present what I have. It does not require extensions to XLIFF. + +APP: XLIFF is not a deliverable anymore. 
+ +EAO: XLIFF is still listed, see [https://github.com/unicode-org/message-format-wg/blob/main/docs/goals.md\#goals](https://github.com/unicode-org/message-format-wg/blob/main/docs/goals.md#goals) +If we drop XLIFF, we have to make an explicit note about that decision. + +EAO: I would still be happy to present my XLIFF mapping later. + +MED: I don’t think that XLIFF is needed for MF 2.0. Needs a lot of testing. It is binding to another standard, and we need to make sure it works for people who already use XLIFF. + +APP: I agree. +I want to push past this. This is interesting and important, but we need to solve what we must release now. + +EAO: at Mozilla I find the data model the most useful part. + +MIH: the data model is in ICU4C and ICU4J, and it is a public API. + +MED: we need to put a pin in this, and it will not be there in LDML 46\. + +EAO: I think it should be published somewhere. +Given that we have several implementations. + +MED: I am not against having the data model, but tbd if we do something with it. + +--- + +APP: the function registry is not a registry anymore. We have “function sets”. +We need to update everything. + +EAO: namespace \`u:\`. Introduces \`locale\`, \`id\`, and \`dir\`. +Would be good to finalize this. Affects the \`syntax.md\` +There should be a note to discourage rolling your own implementation of such functionality. + +MIH: do we really need to change the name from “function registry”? Because now it is public API in ICU. + +APP: now we don’t have a machine readable format. + +MED: for ICU we will have to have all function capabilities of MF1. 
+ +APP: we did that in 45 + +EAO: back to “don’t roll your own locale”, this is what we have: +[https://github.com/unicode-org/message-format-wg/pull/845/files\#diff-dd0b88aaa872a181a51fffcc6c3ba8a005b84075c053b70b6693e92e41ea00c9L738](https://github.com/unicode-org/message-format-wg/pull/845/files#diff-dd0b88aaa872a181a51fffcc6c3ba8a005b84075c053b70b6693e92e41ea00c9L738) + +APP: Would be good to land 846 (the \`u:\` namespace). And make sure that the text “don’t roll your own” is there. + +EAO: if we land 846 we don’t need a note. + +APP: for function sets this can be post LDML 46\. +We probably need an update to \`registry.md\` + +APP: markup, \#650. + +MED: no need to fix now. + +APP: but we need to close this somehow. + +APP: expression attributes. We did (?) + +APP: tests, we need to make sure that we have them. + +APP: we have a PR list, and maybe a couple that we can merge. + +- PR: Remove forward-compatibility promise and all reserved & private syntax #883 + +APP: are we ready to merge? + +MED: very important to have a note saying what “deprecated” means. It means it should not be used, but it will never be removed. Because (for example) ISO just removes stuff when it is deprecated. + +EAO: I would be interested if STA has an opinion on this. + +STA: I don’t know :-) + +APP: summary: yesterday we decided that we remove all reserved parts, because we drop the request for forward stability. +So in the future we can do whatever we want, as long as we don’t break the old stuff. So MF 2.1 can read and understand 2.0, but a MF 2.0 engine might fail to read MF 2.1 syntax. + +STA: in Seville we didn’t yet have namespaces. + +APP: we also envisioned being able to write one parser that is future proof. + +EAO: also for STA: another aspect from yesterday is that there is strong pressure for us to deliver 2.0 by the end of this year. +It is much harder to make sure that what we release is also future-proof. 
+ +- PR: Match on variables instead of expressions #877 + +--- + +Housekeeping. + +Issue #735: Recovery from data model errors #735 +APP: I intend to close this. I think that the decisions on error handling cover this. + +Issue #881, Should we drop private-use annotations? #881 +APP: we just discussed + +Issue #673 (WRONG number) +APP: whitespace conformance. I have a PR for this (#673) +Also related to the BiDi design document. +I am waiting for the other big changes before trying to merge this. + +MIH: we have several issues for function compositions, separate for :number / :integer, or :string, or :date / :time +But that is not very useful, or interesting. All it does is combine options bags for formatting / selection. It saves typing; that’s all it does. Instead of typing 3 options and 8 options more, I can type only the updates. + +MIH: The more useful one is transforming functions. +Take a person and get the date of birth. So formatting a person is completely different than formatting a date. The option bags are different, they don’t merge. + +MED: yes, transforming functions are very handy. Think uppercase transforms, or normalization. + +MED: by “mutating” I don’t mean mutating the input value. + +STA: I am very happy that this topic shows up when I join meetings. +Options: +* save typing +* extract option (get a field, like a date, or gender) +* inspect a value. This is how I imagined the grammatical accord. We should accept that certain functions only work with other functions. As a translator I can make sure things work together. + +STA: I don’t claim to have all the answers, or how to say it in the spec. +EAO: one of the reasons for the series of PRs is that we’ve been covering the same ground over several conversations, and with explicit functions we can work on concrete functions. + +MIH: the inputs now can be all kinds of types. A \`:date\` formatter can take a Java Date, or Temporal, or Calendar, or even a long (as epoch time). 
And we return a formatted-to-parts list of objects, which can be passed to the next function in the chain. + +MED: for the functionality we need right now we don’t need the concept of a “resolved value” +... +We don’t need to decide this for 2.0. + +EAO: is this a valid message, or not? +``` +.local $x = {2.1 :integer} +{{{12.3 :number minimumFractionDigits=$x}}} +``` +This is internal to MF2, and the behavior should be the same in all implementations. + +APP: I agree that this is a good illustration. +There is a tension between the idea of immutability, and that the annotation does something to the variable. +We should resolve the above, if the above is an assignment, or we just put \`$x\` in a map? + +MED: my inclination is that \`:integer\` and \`:number\` don’t change the value. +They only format and select. +If you want a mutating, returning a number, we need another kind of function. + +EAO: the case of a person that interacts with MF2, when they see something like the above, they will presume that \`$x\` is assigned, and it is an integer. +If not, should it have a string value? Or some kind of number? We processed the input a bit, but not much. +I would argue that it should be an integer type, with an integer value. + +STA: I like the example. And I have 2 obs. +One, nobody should do messages like this. +We should yield control to the function itself. +``` +.local $x = {2.1 :integer signDisplay=always} +{{{12.3 :number signDisplay=$x}}} +``` + +APP: I find this example weird. I see what you are doing, but I can see myself spending time explaining this to localization engineers. +Every function should say: these are the types it can take, and what can be put out. +And we can dodge the question, somewhat. +Especially now that we don’t have a match repeating the expression. + +MIH: I would argue that right now, for a plural implementation, the :number does return some kind of numeric value. 
Because when one does \`.match {$foo :number}\`, to make the decision one has to do the operations described in CLDR. Which is do \`$foo\` modulo 100, and if the result is between 10 and 20 then the plural is \`many\` (for example). +But to do this kind of modulo operation it means that the \`$foo\` is some kind of numeric value, not a string. + +EAO: I think I agree with Stas, to say that each function can define its own resolved value, with resolved options. +A function is allowed to do anything it wants. + +APP: as a function author I can implement a \`:number\` function that returns a string. But it is not mandated to return a string. + +EAO: if we describe a resolved value in spec we can help an implementer understand how this would work. + +STA: implementations should allow functions that return something other than string. + +APP: a function might return a resolved value that can be something other than string. + +APP: I can imagine a “part of speech” class, a subclass of string, but would have attributes other than strings. It would be implementation specific. +We can describe that, the trouble is how to do it. + +APP: +1. What are we going to do here +2. Next meeting? + +CONSENSUS: + +* A function MUST define its resolved value. The resolved value MAY be different from the value of the operand of the function. It MAY be an implementation specific type. It is not required to be the same type as the operand. + +* A function MUST define its resolved options. The resolved options MAY be different from the options of the function. 
+ +Timebox discussion of :u and whether it’s discoverable or handled at the processor level diff --git a/meetings/2024/notes-2024-09-16.md b/meetings/2024/notes-2024-09-16.md new file mode 100644 index 000000000..752a28301 --- /dev/null +++ b/meetings/2024/notes-2024-09-16.md @@ -0,0 +1,398 @@ +# 16 September 2024 | MessageFormat Working Group Teleconference + +### Attendees + +- Addison Phillips \- Unicode (APP) \- chair +- Mark Davis \- Google (MED) (first 30 min) +- Mihai Niță \- Google (MIH) +- Eemeli Aro \- Mozilla (EAO) +- Richard Gibson \- OpenJSF (RGN) +- Harmit Goswami \- Mozilla (HGO) +- Matt Radbourne \- Bloomberg (MRR) +- Elango Cheran \- Google (ECH) +- Tim Chevalier \- Igalia (TIM) + + + +**Scribe:** MRR + + +## [**Agenda**](https://github.com/unicode-org/message-format-wg/wiki#agenda) + +**Next week: cancel call because TPAC and LDML46 spec beta?** + +## Topic: Info Share + +Addison: [https://github.com/tc39/tg5/issues/3\#issuecomment-2350218930](https://github.com/tc39/tg5/issues/3#issuecomment-2350218930) + +You may want to look at the comments I’ve made. I’ve made them without my chair hat on. I’d appreciate others looking at them. + +EAO: The JS implementation has a PR open. Then it will be up to date with the current state of the spec. +Second thing: I’m talking at the Unicode Tech Workshop about message resources. + +APP: I think you and I will tag team. + +MED: Myself and Elango (ECH) will be there so we can meet some of you in person. + +## Topic: LDML46 Final Touches + +*\_Let’s make sure we address open issues for LDML46 and reach consensus of what is included in our milestone Tech Preview release.\_* + +- Syntax freeze? +- Add a note about renaming the function registry or should we change it now? 
See [https://github.com/unicode-org/message-format-wg/blob/main/exploration/maintaining-registry.md](https://github.com/unicode-org/message-format-wg/blob/main/exploration/maintaining-registry.md) +- Composition + +APP: One of the open PRs has changes to whitespace and bi-di. I don’t know how much churn that would introduce for implementers. We’re getting close to behaving as if we have a syntax freeze. We’ll want to discuss what syntax freezes we have in 46\. + +APP: We did agree that we’d get rid of the idea of a function registry. The section is still called “Registry” so we either want to fast-track some renaming of this or provide some explanatory text. + +MED: Leaving a note is perfectly fine. Section headings and things can change before the .1 release. +If a note is easy and there’s a lot of stuff piled up, a note is fine. + +APP: I’ll fast-track a note and will be looking for approvals on that. + +APP: The other thing is function composition. We have a rough consensus but the devil is in the detail and we’re not going to do this for 46\. Do we want to say something in 46 about ‘this is the shape of what we’re doing’. + +MED: We don’t need to prematurely signal where we might be going until it’s really solid. + +APP: There is a note in there. + +EAO: It would be good to know if TIM will be participating in this discussion. + +APP: Yes, but not at this moment, + +APP: If we don’t agree to merge it today, it’s not going into 46\. EAO, I saw you raised a PR with a typo, we can fast-track that, + +MED: Typos can come in afterwards. Clear obvious small changes. 
+ +## Topic: PR Review + +*Timeboxed review of items ready for merge.* + +| PR | Description | Recommendation | +| ----- | ----- | ----- | +| \#885 | [Address name and literal equality](https://github.com/unicode-org/message-format-wg/pull/885) | Discuss | +| \#884 | [Add bidi support and address UAX31/UTS55 requirements](https://github.com/unicode-org/message-format-wg/pull/884) | Discuss | +| \#882 | [Specify `bad-option` for bad digit size option values](https://github.com/unicode-org/message-format-wg/pull/882) | Merge | +| \#869 | [Add section on Uniqueness and Equality](https://github.com/unicode-org/message-format-wg/pull/869) | Competes with \#885 | +| \#859 | \[DESIGN\] Number selection design refinements | Merge (Proposed) | +| \#846 | Add Unicode Registry definition | Discuss (\#634) | +| \#842 | Match numbers numerically | Discuss (Reject) | +| \#823 | Define function composition for :number and :integer values | Discuss | +| \#814 | Define function composition for date/time values | Discuss | +| \#806 | DESIGN: Add alternative designs to the design doc on function composition | Discuss | +| \#799 | Unify input and local declarations in model | Discuss | +| \#798 | Define function composition for :string values | Discuss | +| \#728 | Add "resolved values" section to formatting | Blocked by \#806 and \#798 | +| \#673 | Fix whitespace conformance to match UAX31 | Discuss | +| \#646 | Update spec as if PR \#645 were accepted | Discuss | +| \#584 | Add new terms to glossary | Discuss | + +## Topic: String Equality (#885, #869) + +*Addison proposed changes to address string equality. There is one controversial detail: whether literals require NFC for equality or not.* + +APP: \#885 I don’t think theres any disagreement around name equality. +The only place where our spec does literal matching is with key values. +Literals themselves are not constrained \- they’re just strings. 
+The question is \- do we want to require a key comparison to be done under NFC or roco\[?\] points? + +EAO: I don’t think we need to address this but we can: Are the keys normalized? +We _need_ to define equality for key lists. + +APP: We don’t have to require implementations to do any normalization. +If we do equality, we have to do NFC on the values or at least check that they aren’t normalized. + +MED: I think EAO is right that we separate these two items \- you generate a ‘duplicate key’ error if the key list is equal according to canonical \[?\]. Secondly, whether or not this is done before we pass a literal to a function or we leave it up to the functions. + +EAO: My preference is to allow normalization before, but use as if normalized. + +MIH: Do we want to leave the freedom to functions? I don’t find good use cases but it’s a custom function so it can do what it wants. + +APP: I agree with that. If we say we don’t normalize and then allow option values or operands. If denormalization works only some of the time, that’s cognitively tricky, versus saying ‘we’re not going to normalize these for you. + +``` +.local $angstromsAreCool = {Å :string} +.match $angstromsAreCool +Å {{U+212B is the only way to be cool}} +Å {{I'm U+00C5, so almost cool}} +Å {{I'm A + U+030A, so I combine with cool}} +* {{I'm not cool}} +``` +We’re not going to _stop_ you normalizing your data. + +MED: The implementation load is minimal for normalizing literals. +I think it’s far far more likely that people will have errors because of denormalized text rather than them wanting to do something with denormalized text. If we want this feature, we can think of a syntax but I don’t see a need. + +EAO: I would be fine with us handling key values differently from literal values because they feel syntax-y from a user point-of-view. When thinking about implementation and the requirement of match selector keys returning exact inputs. 
If I want to enable normalization to happen with my custom matching function, it becomes weird (e.g. hanging on to original values, comparing on normalized values but returning unnormalized ones). My \_key\_ point here is that we can define behaviour for keys separately from what we define for literals elsewhere in the spec. + +APP: I agree. Since we require key list uniqueness. Having a note about not requiring keys to be normalized. I can imagine lots of places where I want to use a denormalized string as an operand. It behaves like text and we know how to handle text. For the operations that we control, saying this makes sense. + +MED: I’d prefer to not jump on this for literals before release. I think this takes a little more thought. + +APP: They already can be. What we’re saying is that the function will get whatever is there. We should clarify key equality and key comparison. + +MED: Requiring the comparison to be canonically equivalent (normalizing NFC) is good. I think normalizing before passing to selector. I certainly would want to make it possible to do. It’s a tricky subject. + +\[MED leaves\] + +MIH: I think if they are passed normalized then they should be returned normalized. + +EAO: We should do the same with option names. They’re compared as NFC but their values are not NFC. At the point of passing to a custom function, we don’t have any language saying what will happen. The normalizations should correspond with each other. + +ECH: \+1 to the idea that they are canonically equivalent. + +EAO: Happy to leave further consideration until later. + +APP: So implementations should normalize key names? I’m cool with that. Do we want to require a name to be NFC? + +EAO: Again, we should not talk about ‘name’ in general, but with option names. We have the same non-duplication. We end up passing the option name uncanonicalized. If what we’re doing with option names and key values and attribute values would all match each other. 
If we talk about those specifically and not ‘names’ it would help. + +ECH: It's an interesting discussion but I'm not sure if we need to force people to provide things in NFC composed form. I think it’s good for checking equality. When it comes to function option names, checking for duplicates is useful. I think you just need to know what the contract is with the function. Maybe also we can revisit this and not worry for the time being. + +APP: It’s not so much that people are going to use denormalized Latin script, it’s that they’ll use the domain things in their own language. When we say comparisons are done (see note in PR) and the name is not normalized, we treat them as equal so you can’t have the same names. In practice most people are going to choose rational values (things that don’t change because of encoding etc.). I’ve seen plenty of code written in different languages (e.g. variable names in Russian). + +MIH: I would be inclined to really normalize the names as if they were equal. In ICU, I think I put them in a map. If there’s a requirement to pass the real thing, it’s just weird. I can’t think of a good use case where people really care about this. + +EAO: I agree with MIH. I think we ought to normalize the string values of keys, option names and attribute names. I don’t know that we need to normalize anything else. Anything else can be normalized with a function, but this can be a later discussion. As a side note, I believe MF2 will lead us to localize our variable names. In this sort of use with Finnish, it made more sense for me to use Finnish variable names. + +APP: We would like all of our comparisons to be under normalization. +Permit literals that are not compared internally. +We think we might impose NFC on identifier name in future but not in 46\. Is that a fair summary? Can I make changes to the PR? + +EAO: I support that. I’m not hearing objections to requiring the normalization with keys, option names and attribute names. 
We could do that in 46 or later. + +APP: I will not merge today. + +## Topic: Whitespace Handling (\#884, \#847) + +*This pull request implements the design discussion from \#811 (“bidi-usability.md” design) and addresses UAX31/UTS55 requirements. Discuss merging.* + +APP: This implements the loose part of the bi-di design. It also changes whitespace handling \- as a result, it replaces S-production with an O-production for required whitespace. There’s text in the spec to deal with UAX requirements which are not a material change. The biggest kicker is to allow some of the bi-di markers into the syntax outside of text. + +EAO: I think I approved this PR. + +APP: You did. +Re. syntax stabilization, I’d like to say this is pretty close to the people who are tracking our progress. It makes a lot of very small changes to the optionality of whitespace (removing some square brackets). + +EAO: I propose we merge. + +APP: Any objection? + +TIM: There’s no spec tests, since there are a fair amount of changes being made to the ABNF. + +APP: I agree \- there are spec changes. The tests would need to be updated to have a bunch of the bi-di controls. + +EAO: Could we add the tests as a separate further change. + +TIM: Fine with me. + +APP: Any objection to merging this? I see none. I see some agreement. \[Merged\] + +APP: Anything else on whitespace. + +EAO: Track issue for tests. Separately, adding the recommended text “if you’re emitting message format 2, this is how you should be doing the bi-di output. Like with the data model, we could have a recommended part \- “these are instructions that you should be following but we’re not requiring you to do so. + +## 882 + +EAO: For boolean values that expect “true” and get a different literal string value, we’d expect them to behave the same as digit size options. + +APP: We have a task to specify additional places for this option. I’m going to squash and merge this one. + +MIH: I can’t find any boolean type of thing. 
\[Merged\] + +## Topic: Number Selection (#859, #842, #823) + +_Let’s resolve how number selection is described. We have some PRs loosely coupled to this, notably the design doc in 859 and @eemeli’s proposal to use number value selection in 842._ + +APP: \#859 is a change to the design document based on comments by EAO around matching numerically. It changes the status of the design from ‘approved’ back to ‘proposed’. Does anyone mind if we merge, knowing that we’ve captured this in the design document? + +EAO: We could iterate on the PR with the changes. Reopening is fine with me as well. I’d prefer using a different term than ‘proposed’ like ‘reopened’ to indicate that it might have a more colorful history. + +APP: I’ll do that. Do we want to talk about number selection today? + +EAO: I’d be happy to talk about that. + +APP: The current state is that we say something about using a serialization of a number as the thing that gets compared. EAO’s proposal is to change it to actual numeric comparison. + +EAO: There are two really viable options: +Do the selection, ignoring all of the options on :number, because different implementations will understand the options differently (e.g. rounding \- we don’t define how that happens). I think the only really reasonable way to get consistent behaviour is to ignore all options. +Or leave as-is but clarify that exact value selection is implementation-defined. +I’m not aware of other satisfactory options. + +MIH: I’d be happy to introduce something that looks like a numeric type that can be platform-specific. What we have now is just for exact keys, which are relatively rarely used. + +APP: 0 and 1 are used a lot. + +MIH: I’ve never seen an exact key that looks like an arbitrary-precision number. If somebody needs something like that, it’s a custom function. It’s easy to say that the values in this function are strings. It’s something that we can add to the plural later on. 
If we discover it’s not enough, number can also accept the arbitrary precision value. + +APP: I would urge people to read through the long thread on \#842. E.g. in plurals, having fraction digits selects a different value. I would want our key definition to be as clear as we can make it. And that certain kinds of matching may have idiosyncrasies. I think there are corner cases where people want to do integer matching. Occasionally fractional values get matched \- the most memorable example for me: 0.00 gets turned into ‘free’ when you have a currency value. We shouldn’t make it impossible but maybe we don’t have to specify all of the rounding etc. that EAO mentions. We’ll conflict with different programming languages. + +MIH: With plural $1 vs $1.00, it’s not about exact values. You really make it a number and apply the rules from CLDR. You make it a number anyway. It’s true that we care about the decimals for plural selection but not for exact match. We’re not blocking ourselves and can bring it back when people need it. If we treat everything as strings, people have to parse a string to a number. Libraries for MF2 should implement string \-\> number, which feels very clunky. + +APP: Where are we at? I don’t think we want to merge EAO’s proposal today. Our current wording attempts to solve this problem in a specific way but it doesn’t sound like we’re happy with it. It doesn’t sound like we’re going to fix this in 46\. + +EAO: I can imagine custom functions desiring 0.00 to indicate how formatting should happen. If it’s parsed as a numerical value, this information is lost. Behaving differently depending on whether it’s quoted or not is weird. + +MIH: foo=|=0.00| +I would argue that, if somebody needs to make that distinction they can use quotes, etc. +If people want a string, they should treat it as a string. +I can have the options in JSON and we’re back to where we started \- I cannot convert to JSON. 
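
The concern raised above, that parsing a literal such as `0.00` as a number loses the trailing zeros, can be reproduced with a minimal Python sketch. The helper names `roundtrip_as_float` and `roundtrip_as_decimal` are hypothetical, and treating options as JSON text is an assumption made for illustration. Other languages and JSON parsers may behave differently.

```python
import json
from decimal import Decimal


def roundtrip_as_float(literal: str) -> str:
    """Parse a numeric literal as a binary float and serialize it again."""
    return str(float(literal))


def roundtrip_as_decimal(literal: str) -> str:
    """Parse the same literal as a Decimal, which keeps its fraction digits."""
    return str(Decimal(literal))


print(roundtrip_as_float("0.00"))    # the two fraction digits are not preserved
print(roundtrip_as_decimal("0.00"))  # the literal's fraction digits survive

# Assumption: options arrive as JSON text, e.g. from a data-model interchange.
options_json = '{"maxFractionalDigits": 1.00}'
as_float = json.loads(options_json)                         # default float parsing
as_decimal = json.loads(options_json, parse_float=Decimal)  # keep the digits
print(as_float["maxFractionalDigits"] == 1)
print(as_decimal["maxFractionalDigits"])
```

Whether an implementation sees `1`, `1.0`, or `1.00` therefore depends on the numeric type it parses into, which is the portability question under discussion.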
+ +EAO: With an option value of 1.3, everyone agrees on “1.3” but I can think of three distinct numeric values of this. I think that we’d be imposing a high cost by requiring this within MF2. + +MIH: +``` +"options" : { + "maxFractionalDigits": 1.00 +} +``` +\=\> this parses as a 1 (number) + +APP: we just disallowed that earlier in the call +I think the damage is limited to exact match keys. If we can contain it \[to this\] it’s easier. In either case, we don’t have text to merge today. I think a change can be made to the integer text to propose changes. + +EAO: What might be achievable is seeking consensus on whether comparison should be implementation-dependent. + +MIH: I would not merge this as-is. We argued that precision is going to screw you over. Either we care about precision or we don’t care about precision. + +APP: I disagree with actual numeric comparison. I think MED and I are coming from a similar place \- the number you are going to format later is what you are going to compare. EAO, you called out gaps in the current text. I don’t think it’s perfect. For 46, we could put in a note that we’re studying this problem and that comments are welcome. I don’t think we’ve solved it yet. + +EAO: Do we have consensus on it being implementation-defined? + +APP: I _might_. I think we should have clear guidance for authors. It wouldn’t be implementation-dependent and it would enhance portability. I would be open to introducing implementation-defined stuff. We could say, e.g. floating-point is somewhat implementation-defined. I would prefer if we could define it well and define the boundaries. + +MIH: Considering we have a code-freeze in 3 days, we should leave it as implementation-defined. + +APP: It’s not defined as that now. + +EAO: It’s currently implicitly implementation-defined. I don’t remember the exact text but it’s leaving wiggle-room for the implementers. 
Going from a number to its JSON representation, there’s not one JSON number representation that can be used. APP, you might need to write the proposal text. I would welcome being shown that I am wrong, but someone else will need to propose text. + +APP: In 45, we proposed only integer matching is required: +“Only integer matching is required in the Technical Preview. Feedback describing use cases for fractional and significant digits-based selection would be helpful. Otherwise, users should avoid using matching with fractional numbers or significant digits.” + +EAO: Might be good to review if the note satisfies some of the conditions we’ve mentioned here. +E.g. 1234 \-\> 12.34 or 12.00 in JSON + +APP: I’ll propose text for 46 that clarifies our note. I think we could fast-track that. + +EAO: It sounds like none of the function composition stuff is going to be merged for 46\. + +APP: That’s accurate. Although I believe we’re now near consensus on what we’re going to do. Am I hearing that we want to sit on :string for the time being? + +MIH: I think that last time we reached a generic way to do compositions. Without being 100% sure that those are bringing us closer to what we decided last week, I’d rather not do this now. + +EAO: The two tasks for function composition are in line with what we talked about last week. In addition, we’ll still need to define how this stuff works for the functions that we define. + +APP: I think you guys are in violent agreement but not on timing. + +MIH: Timing and order. Since we didn’t fully agree on the generic rule in writing. We can’t say that we agree. + +EAO: We have pretty exact language in the notes from last week but it can be returned to later. + +APP: We’re going to skip next week. I’ll use the email list and GitHub to communicate as we go through the 46 stuff. It’ll be effectively what we’ve merged now plus the fast-tracked items discussed in the call, then we’ll resume in 2 weeks. + +EAO: The skip next week is for W3C TPAC. 
+ +APP: Fighting a good fight, but with a different hat\! + + +## Topic: Issue review** +[https://github.com/unicode-org/message-format-wg/issues](https://github.com/unicode-org/message-format-wg/issues) + +Currently we have 50 open (was 56 last time). + +* 14 are `Preview-Feedback` +* 3 are `resolve-candidate` and proposed for close. +* 2 are `Agenda+` and proposed for discussion. +* None are ballots + +| Issue | Description | Recommendation | +| ----- | ----- | ----- | +| \#865 | TC39-TG2 would like to see completion of the TG5 study | Discuss | +| \#847 | [Conformance with UAX \#31 & UTS \#55](https://github.com/unicode-org/message-format-wg/issues/847) | Discuss | +| | | | + + +## Topic: Design Status Review + +| Doc | Description | Status | +| ----- | ----- | ----- | +| bidi-usability | Manage bidi isolation | Accepted | +| dataflow-composability | Data Flow for Composable Functions | Proposed | +| function-composition-part-1 | Function Composition | Proposed | +| maintaining-registry | Maintaining the function registry | Proposed, Discuss | +| number-selection | Define how selection on numbers happens | Revision Proposed, Discuss | +| selection-declaration | Define what effect (if any) the annotation of a selector has on subsequence placeholders | Proposed, Discuss (Agenda+) | +| beauty-contest | Choose between syntax options | Obsolete | +| selection-matching-options | Selection Matching Options (ballot) | Obsolete | +| syntax-exploration-2 | Balloting of the revised syntax used in the Tech Preview | Obsolete | +| variants | A collection of message examples which require a branching logic to handle grammatical variations | Obsolete | +| formatted-parts | Define how format-to-parts works | Rejected | +| quoted-literals | Document the rationale for including quoted literals in MF and for choosing the | as the quote symbol | Accepted | +| builtin-registry-capabilities | Tech Preview default registry definition | Accepted | +| code-mode-introducer | Choose the 
pattern for complex messages | Accepted | +| data-driven-tests | Capture the planned approach for the test suite | Accepted | +| default-registry-and-mf1-compatibility | Default Registry and MF1 Compatibility | Accepted | +| delimiting-variant-patterns | Delimiting of Patterns in Complex Messages (Ballot) | Accepted | +| error-handling | Decide whether and what implementations do after a runtime error | Accepted | +| exact-match-selector-options | Choose the name for the “exact match” selector function (this is \`:string\`) | Accepted | +| expression-attributes | Define how attributes may be attached to expressions | Accepted | +| open-close-placeholders | Describe the use cases and requirements for placeholders that enclose parts of a pattern | Accepted | +| overriding-extending-namespacing | Defines how externally-authored functions can appear in a message; how externally authored options can appear; and effect of namespacing | Accepted | +| pattern-exterior-whitespace | Specify how whitespace inside of a pattern (at the start/end) works | Accepted | +| string-selection-formatting | Define how selection and formatting of string values takes place. | Accepted | +| variable-mutability | Describe how variables are named and how externally passed variables and internally defined variables interact | Accepted | + +## Topic: AOB? + +— + +Chat stuff: + +You +9:34 AM +[https://docs.google.com/document/d/1zofxbu8PdxEpHbRVA1EtHnbPyrmEAPv4\_jqjFL4hx5o/edit](https://docs.google.com/document/d/1zofxbu8PdxEpHbRVA1EtHnbPyrmEAPv4_jqjFL4hx5o/edit?authuser=2) +*keep*Pinned +Mihai ⦅U⦆ Niță +9:36 AM +ICU has code freeze Sept 19\. 
So what's in by then, that's it (implementation wise) +Elango Cheran +9:51 AM +FYI to those new to Unicode string normalization: [https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html](https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html) +You +10:05 AM +\> \[\!NOTE\] \> Implementations are not required to normalize \_names\_. \> Comparisons of \_name\_ values only need be done "as-if" normalization \> has occured. \> Since most text in the wild is already in NFC \> and since checking for NFC is fast and efficient, \> implementations can often substitute checking for actually applying normalization \> to \_name\_ values. +Elango Cheran +10:09 AM +French and German have combining marks (umlaut, cedilla, accent, etc.) +You +10:09 AM +... but nobody types them denormalized +Mihai ⦅U⦆ Niță +10:20 AM +\> ... but nobody types them denormalized Vietnamese might type them denormalized +The Windows Vietnamese code page is denormalized. And legacy keyboards produced that form. I don't know if they are still widely used or not. +Mihai ⦅U⦆ Niță +10:38 AM +foo=|=0.00| +Mihai ⦅U⦆ Niță +10:41 AM +"options" : { "maxFractionalDigits": 1.00 } \=\> this parses as a 1 (number) +You +10:41 AM +we just disallowed that earlier in the call +Mihai ⦅U⦆ Niță +10:43 AM +I am not asking for treating quoted / not-quoted numbers differently \! +Mihai ⦅U⦆ Niță +10:47 AM +\> we just disallowed that earlier in the call What I'm saying is that the example I show is json And it is parsed as a number by the json parser. Which does not care about what we disallowed or not +You +10:49 AM +Only integer matching is required in the Technical Preview. Feedback describing use cases for fractional and significant digits-based selection would be helpful. Otherwise, users should avoid using matching with fractional numbers or significant digits. +^ is a note +Mihai ⦅U⦆ Niță +10:57 AM +\> French and German have combining marks (umlaut, cedilla, accent, etc.) 
Yes. But nobody types them in decomposed form. Vienamese does (some older keyboards) +MessageFormat Working Group teleconference diff --git a/meetings/2024/notes-2024-09-30.md b/meetings/2024/notes-2024-09-30.md new file mode 100644 index 000000000..38b5d1845 --- /dev/null +++ b/meetings/2024/notes-2024-09-30.md @@ -0,0 +1,216 @@ +# 30 September 2024 | MessageFormat Working Group Teleconference + +### Attendees + +- Addison Phillips - Unicode (APP) - chair +- Eemeli Aro - Mozilla (EAO) +- Elango Cheran - Google (ECH) +- Mihai Niță - Google (MIH) +- Richard Gibson - OpenJSF (RGN) +- Tim Chevalier - Igalia (TIM) +- + +**Scribe:** EAO + +## Topic: Info Share + +### TPAC Fallout + +APP: Physically present for half the conference; remoted in for the latter due to a cold. + +EAO: I filed [this issue](https://github.com/w3c/webextensions/issues/698) after talking to webextension CG, which has FF, WK, Chrome support for adopting MF2 as soon as we adopt. Kind of discussed a year ago. Had an hour to present to them. Reception was very positive. Solves a real problem. Issue has more details about what’s involved, and what the state of play is… I think notes have been published if more interested. + +… otherwise had good conversations with interesting people. Github, tiktok, others. Tiktok is potentially interesting, more than any other in US/EU, they have development in Chinese. Probably dealing somehow with sourcing in Chinese and then getting translate. Maybe hacking at it? Interesting problem? Dunno, hope to find out more. Will share. + +EAO: Mention JS implementation is up to date with spec. Maybe missing a minor detail. NPM was down. Will update it. + +ECH: program for UTW is now available. At least a couple sessions. Slots available. [https://www.unicode.org/events/utw/2024/](https://www.unicode.org/events/utw/2024/) + +### LDML 46 tag, branch, publication status + +APP: Updated as of last week. 
+ +## Topic: LDML46 and Beyond + +- Review by ICU-TC and CLDR-TC +- Final work + +APP: Obviously we’re not finishing tech preview quite yet. Mark has mooted finishing our work this calendar year, and proposed a 46.1 release for MF 2.0 (e.g. 20 Nov). Both ICU & CLDR committees have expressed interest in reviewing the spec. Somewhat worried about receiving comments after finishing the work, rather than before. Approval for a 46.1 release is not certain, though. + +EAO: Reminds me of TG5 work. Ought to connect or addison, you, with the guy organizing the user study. + +ECH: there was a meeting on wednesday. Did they talk survey? + +EAO: I was there, yes, discussed survey and next steps. Gathering questions of content. Mentioned what APP proposed. Left on me to chase up. ECH, shall I include you? + +ECH: Yes, that sounds good. + +## Topic: PR Review + +*Timeboxed review of items ready for merge.* + +| PR | Description | Recommendation | +| ----- | ----- | ----- | +| 859 | \[DESIGN\] Number selection design refinements | Merge (Proposed) | +| 846 | Add Unicode Registry definition | Discuss (634) | +| 842 | Match numbers numerically | Discuss (Reject) | +| 823 | Define function composition for :number and :integer values | Discuss | +| 814 | Define function composition for date/time values | Discuss | +| 806 | DESIGN: Add alternative designs to the design doc on function composition | Discuss | +| 799 | Unify input and local declarations in model | Discuss | +| 798 | Define function composition for :string values | Discuss | +| 728 | Add "resolved values" section to formatting | Blocked by 806 and 798 | +| 646 | Update spec as if PR 645 were accepted | Discuss | +| 584 | Add new terms to glossary | Discuss | + +859 + +APP: Action on me to write some prose describing how this should happen. + +842 + +APP: Leaving open while 859 is in flight. 
+ +### Number Selection + + + +### Resolved Value Implementation + +From [2024-09-10 call](https://github.com/unicode-org/message-format-wg/blob/main/meetings/2024/notes-2024-09-10.md): quote: + +> CONSENSUS: +> +> * A function MUST define its resolved value. The resolved value MAY be different from the value of the operand of the > function. It MAY be an implementation specific type. It is not required to be the same type as the operand. +> +> * A function MUST define its resolved options. The resolved options MAY be different from the options of the function. + +APP: Any concerns or objections? Is this still our consensus? + +…: \[tumbleweed\] + +ECH: Do we define “resolved value” in the spec? + +EAO: It would be added by PR 728. + +EAO: We should have a better place in the spec for providing these instructions to function authors. + +APP: Maybe in the syntax’s function definition? + +EAO: Would be more appropriately under “resolved value” in formatting, if we introduce that. + +EAO: With this consensus, could we look again at 728 today, or later? + +MIH: Add this for next week’s agenda? + +APP: A solid read-through makes sense before considering it. + +EAO: I’ll update 728 to include the above consensus for review during this week & approval next week. + +#### 823 + +… + +MIH: We should not include currencies and units in :number formatting. + +APP: Functions should say what they use, what they consume, what they emit. + +MIH: Also add options. Are we being too specific? + +EAO: With the proposed :string, :number, and :integer we’re covering this whole spectrum, as :string eats everything, :number passes everything through, and :integer filters out a few specific named options. + +MIH: We should be lax with the restrictions we impose. + +APP: A function should be specific about its side effects. + +MIH: Worried about nailing this down for :number and :integer. + +EAO: \[reads changes from PR\] + +… + +APP: Will review the PR again. 
+ +## Topic: Issue review + +[https://github.com/unicode-org/message-format-wg/issues](https://github.com/unicode-org/message-format-wg/issues) + +Currently we have 49 open (was 50 last time). + +* 3 are (late for) LDML46 +* 15 are for 46.1 +* 14 are `Preview-Feedback` +* 4 are `resolve-candidate` and proposed for close. +* 4 are `Agenda+` and proposed for discussion. +* None are ballots + +| Issue | Description | Recommendation | +| ----- | ----- | ----- | +| 865 | TC39-TG2 would like to see completion of the TG5 study | Discuss, Agenda+ | +| 847 | [Conformance with UAX 31 & UTS 55](https://github.com/unicode-org/message-format-wg/issues/847) | Discuss, Agenda+ | +| 650 | Extra spaces in markup | Discuss, Agenda+ | +| 895 | The standard as is right now is unfriendly / unusual for tech stacks that are "native utf-16" | Discuss, Agenda+ | +| 837, 721, 650, 635 | (resolve candidates) | Close | + +### 847 + +EAO: We should have Someone™ check if we’re now conformant. + +APP: After discussion with Robin Berjon, we may be conformant now. I’ll do a check-through. + +### 650 + +APP: Are you satisfied with the resolution, after our prior discussions? + +MIH: It’s just an eyesore, if you ask me. HTML does not allow spaces before the tag identifier. The / is not a sigil like the others. It logically attaches to the {}, not the identifier. + +EAO: For me, the analogy with HTML/XML breaks because we introduced options on closing markup, \`{/foo opt=bar}\`. + +EAO: At the moment, the syntax uses sigils \`$ : / @\` as prefixes to the subsequent part of code, and allows whitespace (including newlines) quite liberally. Breaking this balance seems unnecessary. + +… + +MIH: Ok, let’s close it. + +APP: We could ballot this. + +… + +MIH: I’m fine to let it be. + +TIM: No issues implementing spec as is, no strong opinions on usability. + +RGN: Does not look like a significant benefit or hindrance for usability. 
+ +## Topic: Design Status Review + +| Doc | Description | Status | +| ----- | ----- | ----- | +| bidi-usability | Manage bidi isolation | Accepted | +| dataflow-composability | Data Flow for Composable Functions | Proposed | +| function-composition-part-1 | Function Composition | Proposed | +| maintaining-registry | Maintaining the function registry | Proposed, Discuss | +| number-selection | Define how selection on numbers happens | Revision Proposed, Discuss | +| selection-declaration | Define what effect (if any) the annotation of a selector has on subsequent placeholders | Proposed, Discuss (Agenda+) | +| beauty-contest | Choose between syntax options | Obsolete | +| selection-matching-options | Selection Matching Options (ballot) | Obsolete | +| syntax-exploration-2 | Balloting of the revised syntax used in the Tech Preview | Obsolete | +| variants | A collection of message examples which require a branching logic to handle grammatical variations | Obsolete | +| formatted-parts | Define how format-to-parts works | Rejected | +| quoted-literals | Document the rationale for including quoted literals in MF and for choosing the \| as the quote symbol | Accepted | +| builtin-registry-capabilities | Tech Preview default registry definition | Accepted | +| code-mode-introducer | Choose the pattern for complex messages | Accepted | +| data-driven-tests | Capture the planned approach for the test suite | Accepted | +| default-registry-and-mf1-compatibility | Default Registry and MF1 Compatibility | Accepted | +| delimiting-variant-patterns | Delimiting of Patterns in Complex Messages (Ballot) | Accepted | +| error-handling | Decide whether and what implementations do after a runtime error | Accepted | +| exact-match-selector-options | Choose the name for the “exact match” selector function (this is \`:string\`) | Accepted | +| expression-attributes | Define how attributes may be attached to expressions | Accepted | +| open-close-placeholders | Describe the use cases 
and requirements for placeholders that enclose parts of a pattern | Accepted | +| overriding-extending-namespacing | Defines how externally-authored functions can appear in a message; how externally authored options can appear; and effect of namespacing | Accepted | +| pattern-exterior-whitespace | Specify how whitespace inside of a pattern (at the start/end) works | Accepted | +| string-selection-formatting | Define how selection and formatting of string values takes place. | Accepted | +| variable-mutability | Describe how variables are named and how externally passed variables and internally defined variables interact | Accepted | + +## Topic: AOB? + diff --git a/meetings/2024/notes-2024-10-07.md b/meetings/2024/notes-2024-10-07.md new file mode 100644 index 000000000..869e3c57c --- /dev/null +++ b/meetings/2024/notes-2024-10-07.md @@ -0,0 +1,396 @@ +# 7 October 2024 | MessageFormat Working Group Teleconference + + +### Attendees + +- Addison Phillips \- Unicode (APP) \- chair +- Eemeli Aro \- Mozilla (EAO) +- Mihai Niță \- Google (MIH) +- Tim Chevalier \- Igalia (TIM) +- Elango Cheran \- Google (ECH) +- Richard Gibson \- OpenJSF (RGN) +- Matt Radbourne \- Bloomberg (MRR) + +### Previous Attendees + +- Addison Phillips \- Unicode (APP) \- chair +- Eemeli Aro \- Mozilla (EAO) +- Elango Cheran \- Google (ECH) +- Mihai Niță \- Google (MIH) +- Richard Gibson \- OpenJSF (RGN) +- Tim Chevalier \- Igalia (TIM) +- + + + +**Scribe:** TIM + +To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. + +## [**Agenda**](https://github.com/unicode-org/message-format-wg/wiki#agenda) + +To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. 
+ +## Topic: Info Share + +(discussion about EAO's upcoming talk about locale identifiers) + +## Topic: Schedule for Release + +*The CLDR-TC, ICU-TC and MFWG discussed a schedule for completing the 2.0 release. We propose to complete a dot-release of CLDR called 46.1 with balloting complete on 25 November. Stable (Draft) API in v47. The terminology here needs to be discussed to be clear.* + +*This means that we have just six weeks following this one to complete our work.* + +APP: EAO and I met with Mark Davis, Annemarie Apple, and a few others, about the possibilities for/schedules for doing an official release of MF2. To summarize, we would like to shoot for doing our release in this calendar year as an LDML 46.1, and then a stable draft release – draft is a specific status in ICU – in version 77 of ICU, which would be March 2025\. This means we need to be done with our work for 46.1, not 47\. A date that was suggested would be balloting complete on the spec by the 25th of November. Not counting this meeting, that leaves six more of these calls before we’d need to be done. I want to throw that out as a proposal and see if we are willing to commit to trying to make these dates. + +EAO: We would aim to be done with the spec by mid-November and we would declare our job done and have the spec be in a state where we can and will and should pass it on to the ICU TC, the CLDR TC, and probably the W3C TAG and TC39 TG2 to review and comment on and validate that this is suitable for the stated purposes, so that we can include it in next spring’s release? + +APP: We would want to be done in our own minds. One of my side goals is to indoctrinate CLDR and ICU TC so they would rubber stamp our work rather than spending a lot of time commenting. The other reviews would be external in the Technical Preview time frame. They would be post-us-saying-we’re-done. We would respond to feedback, but would be in a position of saying this isn’t going to change. 
+ +EAO: On behalf of Unicode, there would not be a block for W3C TAG or TC39 TG2 to review and accept MF2 as a spec, but any input we would get could and should be taken into account, either in the 2.0 release or in future work that we do on the spec? + +APP: We would have an opportunity, because the draft version wouldn’t be until 47 / 77\. We would not persist in having weekly meetings working to resolve things. + +TIM: Do we have a list of what really needs to be resolved before mid-November. I’m wondering if we know what absolutely needs to be done. + +APP: I’ve updated `Things that Need Doing`. It’s relatively short. There are 47 issues. There are some housekeeping issues beyond the main important issues. That’s assuming we get through main issues like function composition + +ECH: Are we close to done? I guess so. Maybe it’s not a question of being close to done so much as: is what we have good enough? Is it a good place to put a stake in the sand and say “here’s a release”? + +EAO: I’m relatively confident that we are nearly done in the work needed for 2.0. At least from my point of view, a big change of us relaxing the stability policy to allow for later changes that we were previously not supporting makes it much easier to consider some issues in a post-2.0 world, rather than needing to get absolutely everything nailed down and fully agreed on before 2.0. The biggest things we need to figure out – there’s the u-options stuff, some questions around that, and then there’s the composition of `:date` and `:time` values specifically, and the point that Shane raised about wanting to get semantic skeleton considerations into the date/time stuff. One way to resolve that would be to leave it not required but optional, the `:datetime` field formatting options. If we resolve these things to some resolution, then I think we should have this thing sorted. Assuming we agree to the “easy” parts of resolved values and function composition. 
+ +APP: I’d add the concept of standard or required and optional functions and options. I think that’s going to be an interesting thing we need to go through. We’ll have to invest some thought to make that concrete. So do we shoot for finishing balloting in the meeting on the 25th? + +EAO: Or sooner + +APP: If we’re finishing it there, then we have to be done sooner + +## Topic: `resolve-candidate` + +*The following issues are proposed for resolve:* +837 + +APP: Closed two resolve candidates this morning because they related to the reserved syntax we removed from the ABNF. The other one I have marked as resolved-feedback is feedback from Luca Casonato about “dot cannot be escaped”. This is also a problem because of reserved-statement, but we removed reserved-statement and so I think we can also close this one. Any objection? \[no objections\] That one’s closed. + +## Topic: UTF-16 unpaired surrogate handling (895) + +*Timeboxed discussion of how to handle unpaired surrogates.* + +APP: During the run-up to 46, Tim and Mihai ran into a potential infelicity because `content-char` does not allow unpaired surrogates, but string types in ICU4C/ICU4J do allow it, and their code was checking for unpaired surrogates in text. Seems like substantial overhead. They are asking whether we should change at least the `content-char` in text to allow for unpaired surrogate values in there. I counter-suggested that we add a note permitting implementations to not check for these, even though when we talk about the grammar of a message, we don’t permit it. That’s maybe to help some tools; I can’t think of a case where an unpaired surrogate is any kind of valid data that people would want to have in a message. I think it’s an error. Mihai or Tim, do you want to comment? + +MIH: I agree with you that there’s no good use case and it should be an error. The thing is, it does happen. The existing APIs that I know of don’t care, they just pass them through. 
A lot of string functions in those platforms consider strings to be a bunch of code units, not code points. I’ve seen cases with translated messages that had unpaired surrogates by accident and I don’t think you want to bring down a whole application because of something like that. On the other hand, I’ve seen people abusing unpaired surrogates by putting special markers in the strings. I don’t think these are good use cases, but people do that, and if you want to move between versions of MF2, you’d expect stuff like that to not explode in your face. We should have linters, but reality is what it is. + +ECH: Isn’t this a discussion we had a couple years ago? This is where it initially got introduced. I found RGN’s PR, 290, that introduced the change. I know that we talked about this stuff. + +APP: We did. There’s a couple of things here. There’s a practical consideration: do we need to require UTF-16-based implementations to write a bunch of code to check for this? I think my reaction there is that we probably don’t, for text. But disallowing them in names and other things is responsible. I don’t think those things work reliably. I think it probably makes more sense to keep the restriction in some places and allow for implementations to go “this bag of code units, I’m not going to check it”. If you think about a bunch of other places, like encoding, the unpaired surrogate’s going to be a replacement character. I hear you, Mihai, about people abusing code points for bad things, but Unicode has a bazillion private-use and other special things that you can use for that stuff. + +EAO: My preference order on solving this is first, to keep the restrictions we currently have; second, to allow for unpaired surrogates in `content-char` but only there; and beyond that, have this suggested text where implementations are free to vary on this. That sets up a bad situation, where switching between implementations breaks someone’s code. 
This is GIGO and I’m fine with that for content. I’d prefer us to not allow it, but we should do one or the other. + +APP: I will briefly note that `content-char` serves as the basis for `quoted-char` and `text-char`, so – + +EAO: We would need to change the inheritance between the chars to make this apply only to text content and nothing else. Not literal content either, probably. + +APP: I think what you’re suggesting is that `text-char` would allow surrogates. + +EAO: That’s probably what I meant to say, yes. + +MIH: Would we be okay to say something like “unpaired surrogates are converted by MF2 to the replacement character”? I’m not going to explode in your face, but if we see this, that’s what we’re going to do; it’s in the spec, it’s not optional. + +APP: We would be a USV(?) string, then. You’d have to check for unpaired. + +MIH: It’s in the spec right now; we check for the characters to be in those ranges. It’s not about it being difficult to implement. Accounting for reality, not what we would like necessarily. + +APP: A few proposals. One to permit them in `text-char`. One to allow them to be replaced with the replacement char. A third is not to do anything. Do we want to make a choice here? + +EAO: I’m interested to hear what RGN thinks, given the preceding iteration of this discussion had participation from him. + +RGN: Mostly I wanted it nailed down. As long as it’s clear and the ability to output strings that are not expressible in a transformation format remains, then it’s fine. Nailing down names is acceptable to me, I don’t know why someone would want the names to be non-conforming, and they don’t affect the output anyway. + +EAO: If we are to not error on unpaired surrogates in text, my preference is to just pass them through as they are. Needing to treat them as a special escaped or replacement thing would add complexity that ought to be unnecessary. 
+ +RGN: I agree. + +APP: Would my suggestion work better, which is to say our syntax is rigorous but we allow implementations to ignore it for text? + +EAO: No, that’s worse, because we end up with inconsistent implementations and that’s going to be bad. It’s sounding like the least bad option is to allow for unpaired surrogates in text and pass them through as they are. + +APP: For all implementations? If we have a UTF-8 implementation, it won’t work. + +EAO: Isn’t that handled before the content gets to the MF2 parser? + +MIH: Yes, it’s lost before. + +RGN: There are implementations where it wouldn’t be possible to express the text content including an unpaired surrogate. + +APP: We don’t want to require them to support it. + +MIH: The surrogates are lost already before that, so… + +APP: We should have very careful wording about the handling of unpaired surrogates. Who would like to write the PR? + +MIH: I can do that. I raised the issue and asked for it, kind of. + +EAO: `text-char` and only `text-char`. `text-char` currently inherits from `content-char`; it might be easier to define them separately. + +APP: No, you just OR on the unpaired range. + +EAO: Let’s see what MIH comes up with and go from there. + +## Topic: Resolved Value Implementation (728) + +APP: This has spawned several additional bits of work, which we should not consider here. This is the main thing to make “resolved value” a formal term and define it in the way we’ve been discussing, which is to say the value from a function that also includes options and annotation. I have said okay, Tim has said okay, everyone else is sitting on the sidelines. Is this ready to go in? Anyone object to it going in? All right, we’re resolving resolved value. 
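
For the unpaired surrogate discussion (895) above, a minimal Python sketch of the three options that were floated for text: reject, pass through unchanged, or substitute U+FFFD. It models a UTF-16 string as a list of 16-bit code units. All function names here are hypothetical and are not taken from ICU or from the spec.

```python
from typing import List, Sequence

REPLACEMENT_CHARACTER = 0xFFFD


def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def unpaired_surrogate_indices(units: Sequence[int]) -> List[int]:
    """Return the index of every surrogate code unit that is not part of a pair."""
    unpaired = []
    i = 0
    while i < len(units):
        unit = units[i]
        if is_high_surrogate(unit):
            if i + 1 < len(units) and is_low_surrogate(units[i + 1]):
                i += 2  # well-formed pair, skip both halves
                continue
            unpaired.append(i)
        elif is_low_surrogate(unit):
            unpaired.append(i)  # low surrogate with no preceding high surrogate
        i += 1
    return unpaired


def replace_unpaired(units: Sequence[int]) -> List[int]:
    """The 'replacement character' option: swap each unpaired surrogate for U+FFFD."""
    unpaired = set(unpaired_surrogate_indices(units))
    return [REPLACEMENT_CHARACTER if i in unpaired else u for i, u in enumerate(units)]


def to_utf16_units(text: str) -> List[int]:
    """Helper for experiments: encode a Python str as UTF-16 code units.

    'surrogatepass' lets lone surrogates in the str survive the encoding step.
    """
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
```

Under the "keep the restriction" option, an implementation would raise a syntax error whenever `unpaired_surrogate_indices` returns a non-empty list for text. Under the "pass through" option it would skip the check for text entirely, which is the overhead concern MIH and TIM raised.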
+ +## Topic: PR Review + +*Timeboxed review of items ready for merge.* + +| PR | Description | Recommendation | +| ----- | ----- | ----- | +| 859 | \[DESIGN\] Number selection design refinements | Discuss | +| 846 | Add Unicode Registry definition | Discuss (634) | +| 842 | Match numbers numerically | Discuss (Reject) | +| 823 | Define function composition for :number and :integer values | Discuss | +| 814 | Define function composition for date/time values | Discuss | +| 806 | DESIGN: Add alternative designs to the design doc on function composition | Discuss | +| 799 | Unify input and local declarations in model | Discuss (for 14 Oct) | +| 798 | Define function composition for :string values | Discuss | +| 728 | Add "resolved values" section to formatting | Discuss (Merge, Revise summary) | +| 646 | Update spec as if PR 645 were accepted | Discuss | +| 584 | Add new terms to glossary | Discuss | + +### #799 (data model) + +APP: Hasn’t received a lot of love lately. + +EAO: I just refreshed this so it doesn’t have any merge conflicts and it’s easier to see the diff. The last comment there is from me replying to a bunch of stuff from Mihai, Elango and Stas about their concerns with respect to this. I think that was in July or something, and it hasn’t advanced from there. I would be very happy to actively ask Mihai and Elango to look at this and discuss it more on that thread during this week. + +MIH: Just one question to clarify. The last comment there is from July 28\. What changed since then? + +EAO: There’s a merge from main to that branch, accounting for changes done in the interim. + +MIH: The argument we all tried to make is: what’s the point of doing this? The debate is that there’s no good reason to do this. + +EAO: My request here is for you to review my last comment there and reply to it in the thread, and for us to discuss this next week. 
+ +APP: So if I’m hearing correctly, there may be a disagreement about whether to do this and we’re going to have a technical discussion next week about it. + +APP: I think all of the other PRs have to do with resolved value or function composition, which is resolved value. I think an ask for the various authors is to go through and ensure those are consistent. Tim, I don’t know if 646 is germane anymore. I’ll close it. The other one is Simon Clark had some terms he wanted to add to the glossary. I think there are open comments against it. He’s not here to defend himself, so I will ping him. + +EAO: I was going to note that I have gone through the number and integer function composition and the date/time/datetime composition PRs, and in order to align them with the text that now landed, the only ones are the ones I proposed today, linkifying the “resolved value” term. Otherwise, these correspond with how we currently define resolved values. + +APP: If you’re interested, go through, everyone, and check to see if these are merge-ready. Then I will work on the number selection design piece. + +### Number Selection (#842, #859) + + APP: The outstanding thing we have left is non-integer number selection. And/or any changes to integer number selection. The thing that’s missing there is I have a proposal… + +### bidi changes + +APP: Has anyone worked on tests for those? + +EAO: I do not have automated tests that validate it. + +MIH: I didn’t have time to do anything, due to CLDR/ICU release cycle. + +MRR: I can write some tests for that + +### Function composition for number and integer + +EAO: As we’ve discussed number and integer function composition for a bit, the text there should align with our current understanding of what a resolved value is, would it be possible to consider that for merging today? + +APP: I have some wording things. Maybe that could be considered separately. Do others have a feeling? Any objections? 
We’ll be back here soon if we change the number selection. \[No objections\] Merging + +EAO: So the next thing is the date and time and datetime composition thing. There, I think the biggest question is whether – what do we do when you have a `:time` value and you feed it to a `:datetime`, and what’s supposed to happen there? The argument I’m proposing in the current PR is to consider it an error. From a `:time` you get a “time-like thing” and the input requirements for a `:datetime` must be a “datetime-like thing”. + +APP: I think that’s too stringent, because – there’s classical timekeeping of the milliseconds since epoch/calendar variety, and there’s Temporal-type time types, and a subset of the Temporal-like time types are restricted in that way. But most of the classical ones have ?? for this kind of thing. I think there’s a tripping hazard where if I knowingly pass a `java.util.Date` in my arguments array, and the first time I touch it I annotate it with `:time`, I’m still thinking it’s a `java.util.Date` so I can touch it a second time with a `:datetime`. I can support the idea that a `:time` may throw an error, because it might only be a time. I’m reticent to break classical timekeeping. + +EAO: As whatever `:time` can do is a strict subset of what you can do with a `:datetime`, in order to get the effect of what you're looking for, you could and probably should use a `:datetime` on the input. Even if we allow for an error to occur, it means that a reader of a message who doesn’t know how the value from the outside is coming in – it becomes quite dangerous to presume that you could use a `:time` thing in the resolved value of a `:time` annotated expression and then do `:date` operations on it. Where what you ought to be using is `:datetime`. + +MIH: I have two arguments. One is – I agree with EAO that that would feel like the correct behavior. On the other hand, there are PLs that don’t even have any special types for date and time, like C. 
There is in libraries and whatnot, but the language doesn’t have anything in the standard libraries. The other argument is that one can imagine something like `time`… imagine something that takes a time and gives you back a datetime by gluing today’s date to it. I don’t know if that’s the current time function. Similarly for the other way around. I can imagine a function that takes a `time` and gives you back a `datetime`. If you say that’s not the current function `time`, then you’re probably right. I would tend to be tolerant the way APP described. + +APP: I would be okay with saying that a `:time` annotated value or a `:date` annotated value may throw a Bad Operand error, or with other function types, because it’s using an implementation-defined type that isn’t supported. For example, a zoned time would throw a Bad Operand error if you tried to `:date` it. That’s an explainable thing and there’s a developer on the end of that stick who would understand why it happens, so the usage pattern is clear. There’s a bunch of operations that we’re kind of ignoring. Coercing time zone on and off values to float and unfloat the value, other things people commonly want to do with time values – MF2 should have a clear story. I built a whole bunch of things for that in past lives that are effective and that I can explain to developers. What I’m afraid of is that there’s a lot of developers in the world and they’re going to be passing in values and are not thinking of annotations as having an effect on the value. We want to make it simple for them to do the right things and possible for them to do the hard things, and that’s why I tend to be reticent about making a hard limit on that when it may just be an expression thing. 
+ +EAO: Sounds like there could be a consensus position here where a `:datetime` is always fine with an operand that is coming from a `:datetime`; a `:time` is always fine with an operand coming from `:datetime` or `:time`; and a `:date` is always fine with an operand coming from `:datetime` or `:date`. And if you otherwise combine these resolved values with such annotations, the behavior is implementation-defined and that behavior may be to complain about a bad operand. Does this match what you are proposing? + +MIH: I think that would be a good way to put it in the spec. On the other side, I think I would leave this kind of stuff to a linter. In the early days of MF2, we tried not to be opinionated about things that aren’t really i18n. PLs are catching up; JS has a Temporal proposal, Java added something… it’s a stretch for us to be opinionated. Leave this to a linter, enforce what EAO described, but not in the spec. + +APP: I don’t know that I agree with linting. EAO’s proposal makes sense because it’s an enumerable thing to say that some implementation-defined types may cause Bad Operand. Suppose I have a local time value to use a specific type. Does `:datetime` format it or is that a bad operand? + +EAO: That’s an implementation-defined behavior. + +APP: In your implementation, how would you handle it? + +EAO: That would depend on what `Intl.DateTimeFormat` does with whatever value you end up giving it. Given that `Intl.DateTimeFormat` does not currently support such a value, it might depend on exactly what options are declared there. + +APP: And I know that that’s how Java works. DateTimeFormat works fine on that unless you ask for a year. + +EAO: Just to clarify, we are talking here about the behavior when combining resolved values rather than formatted values. …That’s behavior we can entirely control in the spec. I want to modify the PR to match what I presented earlier and there’s certainly space there for linters around it. 
We should be recommending against messages that feed in a `:time` to a `:date` or a `:date` to a `:datetime`. Fundamentally, because the words we’re using imply to a reader that they’re not quite sure what might happen. Even if we leave it as an implementation-defined behavior, we should recommend against it, given that with `:datetime` we can make it happen in a way that’s clear to the reader. + +MIH: If the proposal is changed in the way EAO described, I won’t oppose, but I think it’s overreaching. We should be opinionated about i18n, but this isn’t i18n, it’s bad programming practice. Not my business to handle that. + +APP: I understand about “we’re not going to actually call the function” but I think there’s still room to say “implementation-defined types”. We do say that the resolved value is an implementation-defined type, and that’s generally narrower than the ones that it accepts. Potentially an implementation could say “here’s the list of types I will emit as a resolved value” and if you mix and match, it could result in a bad operand. + +EAO: I would like to push back at MIH, I think it’s relevant to translation and l10n. If we have a message with an input that has a `:date`, and the resolved value of this input is then used as an operand for a `:datetime`, a translator looking at this can either reasonably presume that the value being formatted is the full original date/time passed in, or it could also be the date with a 00 time on it for the beginning of the day, because it was passed through a `:date` and therefore it’s lost the time. If we allow for this, and particularly if linters don’t complain, we’ll end up with messages that are valid but confusing. This confusion is what I’m seeking most to avoid here. 
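
The consensus position EAO outlined above for combining `:date`, `:time`, and `:datetime` annotations can be sketched as a small lookup, for instance as the core of a linter check. This is a hypothetical illustration in Python: the table and function names are assumptions, and it reflects the proposal as stated in the discussion, not adopted spec text.

```python
from typing import Dict, List, Set, Tuple

# For each function, the set of source annotations whose resolved value is
# always acceptable as its operand, per the position stated in the discussion.
ALWAYS_ACCEPTED_SOURCES: Dict[str, Set[str]] = {
    "datetime": {"datetime"},
    "time": {"datetime", "time"},
    "date": {"datetime", "date"},
}


def composition_status(source: str, target: str) -> str:
    """Classify feeding the resolved value of :source into :target."""
    if source in ALWAYS_ACCEPTED_SOURCES[target]:
        return "ok"
    # Anything else is left to the implementation, which may emit Bad Operand.
    return "implementation-defined"


def check_chain(functions: List[str]) -> List[Tuple[str, str, str]]:
    """Check each consecutive pair in a chain of annotations on one value."""
    return [
        (source, target, composition_status(source, target))
        for source, target in zip(functions, functions[1:])
    ]


if __name__ == "__main__":
    # A value annotated with :datetime, then :time, then :date.
    for source, target, status in check_chain(["datetime", "time", "date"]):
        print(f":{source} -> :{target}: {status}")
```

A linter built on this could warn on every "implementation-defined" pair, which matches the recommendation against feeding a `:time` to a `:date` or a `:date` to a `:datetime`.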
+ +APP: One observation: the option bag conversation will become interesting here, because that’s one of the other things that composes, and as you mentioned earlier, Shane wants us to lean towards the nascent semantic skeleton thing, and maybe make some of these option bags optional. We want to carefully consider what the options are. That might have an influence there. You’re right that it’s possible to write a message that would effectively filter information out of a date and time value. That is potentially antithetical to our idea of immutability. Translators will generally see placeholders that say what they want to do. They’re not thinking about whether the numbers are going to be 0 or not, they’re thinking about what values are going to appear here. + +``` +.input {$date :datetime} +.local $t = {$date :time} +.local $d = {$t :date} +{{What does {$d} at {$t} say?}} +``` + +EAO: I’d be happy for us to move on to that discussion and specifically a proposal I’d like to make on the topic, which is that I think we should make for the initial release of the default functions the field options of `:datetime` optional rather than required. So that implementations can implement those, but they are not required to do so. + +MIH: So you mean the whole option bag that we have now would be optional? + +EAO: Not the whole option bag, the field options. So that excludes some of the options – do we call them locale options? – and the timestyle and datestyle options, which I do think should be required. + +MIH: I’m very reluctant to do that. One of the big requirements from Mark Davis, and I agree with it, is to have a way to migrate existing messages to MF2. Existing messages do have equivalent things to what we have here with option bags. MF1 has option bags and the JS formatter has something like this. Even if semantic skeletons land sooner or later, this is kind of well-established stuff that I think would be good to support. 
People do that today; they use it with existing native APIs. + +APP: Let me present Shane’s argument. The best practice at some near-term future moment would be to use skeletons and in particular, the semantic skeletons that aren’t programmable with the weird pattern language ICU has. If that were the best practice, then you want it to be standard and built-in. Any of the existing implementations should be able to handle that because they are going to feed it through the datetime pattern generator behind the scenes. They would have a way to generate that option bag or generate the pattern through local functionality. This would push people toward good things, so therefore it should be standard. There would be these optional options, where we would say how they’re implemented and what the valid values are, and our definition of optional is that you’re not required to implement them, but if you do, do it like this. I could see implementing this as optional and I can see ICU as having it. People have programmed wacky patterns in the past. We don’t currently have picture strings at all. We should address those requirements in the right way, and it might be through optional options. Or if we require it, then everyone has to write that code. + +EAO: I was just going to mention that Mihai, I think the requirement for migrating from MF1 content into MF2 is already going to require some set of extensions to the default functions. Skeletons come to mind, picture strings is another, which is entirely valid for MF1. Also the spellout and other functions for number, and the plural offset, which we also do not have. All of these things are required for having MF1-to-MF2 transformability. So us making these options as optional rather than required is not going to increase the burden for any such migration. In particular, as none of these options are directly supported by MF1. + +MIH: I’m very split. 
Picture strings are bad i18n, we rejected them from very early on, and that’s part of the area where we’re entitled to be opinionated. We know it’s bad i18n. This is not about bad i18n, it’s something that – soon it’s going to be best practices, but what’s the definition of soon? Soon can be five years or more. Stuff like this – I don’t know. You mentioned skeletons. Yes, but the skeletons can be mapped 1:1 to the existing option bags. It’s just syntactic sugar. So for MF1, skeletons are supported. I can do the same thing you used to do then today. + +EAO: I’m pretty sure for the majority of cases, that is true, but on the edges, there is functionality in semantic skeleta that’s supported in date/time formatting that is not supported in JS at all. I’ve written a parser for those formats so I could build exactly those option bags, and needing to leave some of the values on the edges, unsupported. + +APP:`Intl.DateTimeFormat` is a subset of the functionality present in ICU. So – ICU is more capable of representing a bunch of things, so I’d be unsurprised by that assertion. Two interesting things: one, one of Shane’s things is that the semantic skeleta are limited in what you can represent. They don’t let you do some things that the current skeleton lets you do, like year-month-hour. You can’t say that in a semantic skeleton. That’s maybe an interesting thing. Mihai might be interested to note that when you do the resolved value thing, will ICU skeleton result in resolved options that look like year/month/hour/minute field option bags/ Or will it look like ICU skeleton as the option? + +MIH: Everything looks like option bags. They get converted to an ICU skeleton in order to do the formatting, only when you do format-to-string things. So the resolved value would contain option bags. + +EAO: I would also like to note that the thing I’m asking for is specifically and only downgrading these field options from required to optional in the initial release. 
Doing so and still defining them and saying which values they’re supposed to take in makes it possible for us to later change our minds and make them required. The intent with this change would be to give a little time for the work on semantic skeleta to proceed and see if it is on a track to becoming a widely adopted standard. Allowing near-future implementations to not need to implement also the field options if they go the other way out. This is a concern for the ICU4X implementation. + +MIH: I don’t know. We’ve been pushing skeletons for many years and people are starting to adopt them. I would be reluctant to push something out and have people say “you can’t even do date and time now.” If I look at the spec and say “I can’t even do this basic stuff I’ve been doing for ten years”, it feels like a bummer. So I think the semantic skeletons are going the right direction, but the thing is, we have existing things in current languages/frameworks that do it a certain way, not just in ICU, in ECMAScript, with Java.time. So you want as little friction as you can. It’s my problem if I want “December at 5 PM”, it’s not an i18n problem. + +EAO: I don’t think people are going to make decisions at that sort of level are going to be looking at the spec. They’ll be looking at the implementation that they’re going to be using. For the JS implementation, I’m still going to opt into all of the field options if we make them optional. I’m in a position where I can do that and trust that the situation is going to resolve one way or the other before the `Intl.MessageFormat` part of the language is locked down. I kind of trust and believe that the ICU impl might choose to opt into these options. The ICU impl might include an `icu:skeleton` option directly. These are going to be the interfaces that people need to look at to choose what they’re doing. Rather than us saying in the spec for `:datetime` that these specific options are optional. 
+ +MIH: I would say that a big selling point of MF2 is being cross-platform. I can write a bunch of messages and use them in GMail Android, web, and iOS. That’s a big selling point. Having extensions is one thing, another one is icu: options, it’s not portable anymore. You say you’re in a position to do that as optional, I don’t think you are. You might be able to put it in Firefox but not in Chrome. We can’t even guarantee we have a JS implementation that is consistent everywhere. If we have some kind of “draft” namespace that’s the same everywhere, that would help, but I don’t think it’s a good idea. + +APP: I think maybe there’s a gap in the phrasing that we’re using. EAO and I have been discussing that in refactoring the function registry, I think we discussed it in previous calls, instead of having a built-in registry and proto-registry, that we have `:number`, which has required things and optional things. Optional options are part of the `:number` spec and if you are an implementor, you are not required to implement them. If you do, then you have to implement them like that. Different than the optional registry. What we’re saying is that every implementation absolutely has to have this set of options, `datestyle` and `timestyle`, and you may have these other ones, and because they’re standardized, toolchains would know what those things meant. They would be built in, but not every implementation would accept those options. The current thing that we have is a brief window in which we could leave out some set of options and therefore not have a whole bunch of options that are ?? deprecated, sort of the way some of the early date stuff in Java is. It’s been deprecated for 30 years and it would be good not to reinvent at-deprecating some of these things if we can. If we think we have to have the option bags, so be it, but then everyone will have to implement it. 
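APP's distinction between required and optional options can be sketched as follows. This is a hypothetical illustration only, not spec text: the option names (`dateStyle`, `timeStyle`, and the field options listed) and the `check_options` helper are assumptions made for the example. The discussion itself only establishes that the style options would be required and that the field options would be standardized but optional to implement.

```python
# Hypothetical sketch: the option names and this helper are illustrative
# assumptions, not taken from the MessageFormat specification.
REQUIRED_OPTIONS = {"dateStyle", "timeStyle"}  # every implementation accepts these
OPTIONAL_OPTIONS = {"year", "month", "day", "hour", "minute", "second"}  # standardized, may be unsupported


def check_options(options, supports_field_options):
    """Classify each option name as 'ok', 'unsupported', or 'unknown'."""
    result = {}
    for name in options:
        if name in REQUIRED_OPTIONS:
            result[name] = "ok"
        elif name in OPTIONAL_OPTIONS:
            # Defined by the spec, but this implementation may have opted out.
            result[name] = "ok" if supports_field_options else "unsupported"
        else:
            result[name] = "unknown"
    return result


full = check_options(["dateStyle", "year"], supports_field_options=True)
minimal = check_options(["dateStyle", "year"], supports_field_options=False)
print(full)
print(minimal)
```

A linter could use the same classification to warn that a message is valid but not portable, which is the concern MIH raises above.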
+ +EAO: Just thought I’d clarify that when I say “JS implementation” I mean the npm-installable library that is an OpenJS Foundation project, that is in part a polyfill for the JS spec for `Intl.MessageFormat`. So the spec for `Intl.MessageFormat` will need some definition of what it supports. That’s currently at stage 1 and it will take some time to advance through standardization. Separately, the package on npm, which is entirely controlled by me, I can make it accept all of the current options of the formatters. The key is that later on, I can do a major version update to that library where I drop features and switch to a different sort of option bag if semantic skeletons advance sufficiently that they become available on `Intl.DateTimeFormat` in JS, and it starts to make sense for the `Intl.MessageFormat` implementation to only support semantic skeletons and not these field options. This is what I mean by me being able to control what I do in my implementation, and the spec later when it finalizes may say something else. + +MIH: Then I want to ask a question. You said these are optional the same way we have certain options on the number and integer formatters. If that’s the case, then this is not in the same bucket with skeletons in ICU, because that’s in a namespace that’s implementation-specific. I’m not sure what we’re proposing yet. Leave them out completely, or say “you can implement this in a namespace”? + +EAO: No; the proposal specifically is that we leave them as they are with the names they currently have, which are namespaced, and say “you may implement these options on `:datetime`”. + +MIH: Then we can never take them away + +APP: That’s right + +EAO: We can never take them away from the spec, but an implementation would not need to support them + +MIH: Meaning they’re not portable + +EAO: At the moment they’re not portable, correct + +APP: We’ve been talking about this a while. 
I think we’ve talked about the abstract aspects of it and I think we should work on a concrete proposal or maybe even a design doc that says “here are the options”. As we’ve got six weeks to agree. We should have a clearer understanding – bringing this up is good because we should have some level of policy here. We should be parsimonious about what we put in, because everything we put in is required forever. At the same time, we should put in everything that we think is necessary for meaningful adoption. + +EAO: For an example of an optional formatter that I think we should define, maybe add on later, is `:list`. List formatting is something that is actively supported in multiple places; we have a decent idea of what it looks like, and we should allow for it to be supported. At the same time, I don’t think we’re in a position where we want to require all implementations to support it. + +MIH: I agree with you and I think I even have a list as a proof of concept in one of the unit tests, just to make sure that my implementation can support stuff like that. Certain things will be under the icu namespace, like durations. But list is not in MF1, so not a strong requirement from ICU to say “you have to support that in MF2”. The whole idea of dropping these option bags, I think I would like to take this up with the ICU TC to ask them how they feel about it. In the end, I have to land that thing in ICU itself. + +APP: Let’s see how much we can resolve within the WG in a week. It may be a no-op. + +EAO: Two things. `:duration` like `:list` is another one I’d be happy for us to define as an optional formatter. And then say, if you’re going to do it, do it this way. But we can return to this later as we can expand and work on the core set of functions. 
Another point is that the intent with what I proposed here is not to drop the field options, but to make them optional, so the question to ICU TC would be whether to support field options or not, as they are spec’d but as optional. + +MIH: I really don’t like the idea of making them optional without a namespace. I see there that I can use it in ICU, I will assume it’s standard and portable and I can use it. People don’t use the spec, they’ll be in their editor and copy/paste examples, they’ll see it works on three platforms but the fourth one doesn’t. I’d feel better with the namespace. `icu:` is a big warning that it’s not portable. When it becomes final, you drop the `icu:`. They don’t read the spec and notice that this stuff they’ve copy/pasted that works everywhere else doesn’t work in one place. + +EAO: Are you also arguing against defining `:list` and `:duration` as formatters that would be optional? + +MIH: At this point, we don’t have time for it, so I’m opposing it based on – + +EAO: What you’re proposing about these options is also an argument that could be made about having optional-but-not-required formatters defined at all in the spec. + +APP: I think we have to define functions that some implementations are not required to implement. PHP will implement this, perl, awk… they don’t have a list formatter, so they’re not going to do that. Would you be happier, Mihai, if we used the `u:` namespace? + +MIH: Kind of; you say it will be deprecated, but it will never really be deprecated + +APP: If we specify them, they will always be there, but as you well know, there will be things we can say “but best practices say…” That’s documentation, not implementation. Implementations have to do what the spec says. With `list` as an example, if we specify list formatters, then we want people to do it like X. If we use the `u:` namespace, we can always remove that to make it required by every implementation. 
Which I assume we would version MF if we did that, because we’d be breaking a bunch of implementations. + +MIH: We version the registry, but not MF + +APP: We don’t have a registry anymore, but we version specs. There’s that, and things like some of these optional options which we might never promote. We would still say, if you write one, then it looks like this. + +MIH: One of the ideas with the machine-readable registry was that you can use it to implement a linter or tooling like IDEs, or integrate it with translation tools. So translators know not to scrub stuff… even `u:`, if I lint, all I can say is “warning: this is not portable.” + +APP: I’m going to timebox this. Somebody should take the action item to put the options together in a design doc. Adding a machine-readable registry description is a fine task for us to do in the preview period after 46.1, as something we consider adding on. Unless we think that suddenly becomes a requirement again, I don’t see us doing it now. Does what I’m suggesting sound like the right outcome? + +EAO: I’m here to say that if we’re going to define the `u:` namespace as stuff that might or might not work, we should consider whether the `u` letter is useful or if some other prefix would be better, if `x` is appropriate or otherwise. I think we should stick to the plan that Addison has been advancing, which would allow for optional things to be in the root namespace. It sounds like a conversation we’ll need to continue later. + +APP: Who wants the action to write a design doc? + +EAO: On what part of this? + +APP: The options – enumerating them to consider in technical arguments. + +EAO: I nominate Shane + +APP: He’s not here + +MIH: I will try to take some temperature readings in the ICU TC + +APP: Are you going to write the design doc? + +EAO: I think we really want Shane to do it; because he’s the one who originally wants this. + +EAO: Next actions on me are to update the date/time function composition as we agreed on here. 
Making the changes sooner will make the later discussion easier. Separately I’ll look at the string composition one. If we could get that to land next week, it would be really good. With an explicitly defined resolved value, we can do much better at defining what a fallback value is. + +EAO: We should send to the mailing list a note about this upcoming deadline + +APP: I will do that when we hang up + +## Topic: Issue review + +[https://github.com/unicode-org/message-format-wg/issues](https://github.com/unicode-org/message-format-wg/issues) + +Currently we have 48 open (was 50 last time). + +* 3 are (late for) LDML46 +* 15 are for 46.1 +* 15 are `Preview-Feedback` +* 1 is `resolve-candidate` and proposed for close. +* 2 are `Agenda+` and proposed for discussion. +* None are ballots + +| Issue | Description | Recommendation | +| ----- | ----- | ----- | +| 865 | TC39-TG2 would like to see completion of the TG5 study | Discuss, Agenda+ | +| 895 | The standard as is right now is unfriendly / unusual for tech stacks that are "native utf-16" | Discuss, Agenda+ | +| 837 | (resolve candidates) | Close | + +## Topic: Design Status Review + +| Doc | Description | Status | +| ----- | ----- | ----- | +| bidi-usability | Manage bidi isolation | Accepted | +| dataflow-composability | Data Flow for Composable Functions | Proposed | +| function-composition-part-1 | Function Composition | Proposed | +| maintaining-registry | Maintaining the function registry | Proposed, Discuss | +| number-selection | Define how selection on numbers happens | Revision Proposed, Discuss | +| selection-declaration | Define what effect (if any) the annotation of a selector has on subsequence placeholders | Proposed, Discuss (Agenda+) | +| beauty-contest | Choose between syntax options | Obsolete | +| selection-matching-options | Selection Matching Options (ballot) | Obsolete | +| syntax-exploration-2 | Balloting of the revised syntax used in the Tech Preview | Obsolete | +| variants | A collection of 
message examples which require a branching logic to handle grammatical variations | Obsolete | +| formatted-parts | Define how format-to-parts works | Rejected | +| quoted-literals | Document the rationale for including quoted literals in MF and for choosing the | as the quote symbol | Accepted | +| builtin-registry-capabilities | Tech Preview default registry definition | Accepted | +| code-mode-introducer | Choose the pattern for complex messages | Accepted | +| data-driven-tests | Capture the planned approach for the test suite | Accepted | +| default-registry-and-mf1-compatibility | Default Registry and MF1 Compatibility | Accepted | +| delimiting-variant-patterns | Delimiting of Patterns in Complex Messages (Ballot) | Accepted | +| error-handling | Decide whether and what implementations do after a runtime error | Accepted | +| exact-match-selector-options | Choose the name for the “exact match” selector function (this is `:string`) | Accepted | +| expression-attributes | Define how attributes may be attached to expressions | Accepted | +| open-close-placeholders | Describe the use cases and requirements for placeholders that enclose parts of a pattern | Accepted | +| overriding-extending-namespacing | Defines how externally-authored functions can appear in a message; how externally authored options can appear; and effect of namespacing | Accepted | +| pattern-exterior-whitespace | Specify how whitespace inside of a pattern (at the start/end) works | Accepted | +| string-selection-formatting | Define how selection and formatting of string values takes place. | Accepted | +| variable-mutability | Describe how variables are named and how externally passed variables and internally defined variables interact | Accepted | + +## Topic: AOB? 
+ diff --git a/meetings/2024/notes-2024-10-14.md b/meetings/2024/notes-2024-10-14.md new file mode 100644 index 000000000..3ad42c97c --- /dev/null +++ b/meetings/2024/notes-2024-10-14.md @@ -0,0 +1,298 @@ +# 14 October 2024 | MessageFormat Working Group Teleconference + +### Attendees + +- Addison Phillips - Unicode (APP) -chair +- Eemeli Aro - Mozilla (EAO) +- Mihai Niță - Google (MIH) +- Tim Chevalier - Igalia (TIM) +- Richard Gibson - OpenJSF (RGN) +- Matt Radbourne - Bloomberg (MRR) +- Mark Davis - Google (MED) + + +**Scribe:** MIH + + +To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. + +## [**Agenda**](https://github.com/unicode-org/message-format-wg/wiki#agenda) + +To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. + +## Topic: Info Share + +(none) + +## Topic: Schedule for Release + +(none) + +## Topic: `resolve-candidate` + +*The following issues are proposed for resolve:* +797 +786 +752 +703 + +## ** Topic: Agenda+ Topics** + +### Bag of options vs. semantic skeletons + +### + +### Topic: Allow surrogates in content + +*The previous consensus was to allow unpaired surrogate code points in text but not in literal or other constructs. Mihai points out some issues with this.* + +MIH: My initial understanding was that we should allow this in localizable text, and literals are localizable text + +### Topic: Add alternative designs to the design doc on function composition + +*This topic should take only a minute. 
The discussion here is whether to merge PR 806, marking the design as “obsolete” or just close the PR.* + +### : Topic: 799/786 Possible simplification of the data model/unify input/local definitions + +***This was homework for this week.** The PR proposes to unify local and input declarations in the data model. We should accept or reject this proposal.* + +### Topic: 603 We should not require \* if the variant keys exhaust all possibilities + +*We should review this proposal and categorically accept or reject it for 46.1* + +## ** Topic: PR Review** + +*Timeboxed review of items ready for merge.* + +| PR | Description | Recommendation | +| ----- | ----- | ----- | +| 906 | Allow surrogates in content | Discuss, Agenda+ | +| 905 | Apply NFC normalization during :string key comparison | Merge | +| 904 | Add tests for changes due to 885 (name/literal equality) | Merge | +| 903 | Fix fallback value definition and use | Discuss | +| 902 | Add tests for changes due to bidi/whitespace | Merge | +| 901 | Clarify note about eager vs. lazy evaluation | Discuss | +| 859 | \[DESIGN\] Number selection design refinements | Discuss | +| 846 | Add u: options namespace | Discuss (634) | +| 842 | Match numbers numerically | Discuss (Reject) | +| 814 | Define function composition for date/time values | Discuss | +| 806 | DESIGN: Add alternative designs to the design doc on function composition | Merge as Obsolete, Agenda+ | +| 799 | Unify input and local declarations in model | Discuss (for 14 Oct) | +| 798 | Define function composition for :string values | Discuss | +| 584 | Add new terms to glossary | Discuss | + +## Topic: Issue review + +[https://github.com/unicode-org/message-format-wg/issues](https://github.com/unicode-org/message-format-wg/issues) + +Currently we have 46 open (was 48 last time). + +* 3 are (late for) LDML46 +* 15 are for 46.1 +* 11 are `Preview-Feedback` +* 4 are `resolve-candidate` and proposed for close. +* 3 are `Agenda+` and proposed for discussion. 
+* None are ballots + +| Issue | Description | Recommendation | +| ----- | ----- | ----- | +| | | | +| | | | +| | | | + +## ** Topic: Design Status Review** + +| Doc | Description | Status | +| ----- | ----- | ----- | +| bidi-usability | Manage bidi isolation | Accepted | +| dataflow-composability | Data Flow for Composable Functions | Proposed | +| function-composition-part-1 | Function Composition | Proposed | +| maintaining-registry | Maintaining the function registry | Proposed, Discuss | +| number-selection | Define how selection on numbers happens | Revision Proposed, Discuss | +| selection-declaration | Define what effect (if any) the annotation of a selector has on subsequence placeholders | Proposed, Discuss (Agenda+) | +| beauty-contest | Choose between syntax options | Obsolete | +| selection-matching-options | Selection Matching Options (ballot) | Obsolete | +| syntax-exploration-2 | Balloting of the revised syntax used in the Tech Preview | Obsolete | +| variants | A collection of message examples which require a branching logic to handle grammatical variations | Obsolete | +| formatted-parts | Define how format-to-parts works | Rejected | +| quoted-literals | Document the rationale for including quoted literals in MF and for choosing the | as the quote symbol | Accepted | +| builtin-registry-capabilities | Tech Preview default registry definition | Accepted | +| code-mode-introducer | Choose the pattern for complex messages | Accepted | +| data-driven-tests | Capture the planned approach for the test suite | Accepted | +| default-registry-and-mf1-compatibility | Default Registry and MF1 Compatibility | Accepted | +| delimiting-variant-patterns | Delimiting of Patterns in Complex Messages (Ballot) | Accepted | +| error-handling | Decide whether and what implementations do after a runtime error | Accepted | +| exact-match-selector-options | Choose the name for the “exact match” selector function (this is `:string`) | Accepted | +| expression-attributes 
| Define how attributes may be attached to expressions | Accepted | +| open-close-placeholders | Describe the use cases and requirements for placeholders that enclose parts of a pattern | Accepted | +| overriding-extending-namespacing | Defines how externally-authored functions can appear in a message; how externally authored options can appear; and effect of namespacing | Accepted | +| pattern-exterior-whitespace | Specify how whitespace inside of a pattern (at the start/end) works | Accepted | +| string-selection-formatting | Define how selection and formatting of string values takes place. | Accepted | +| variable-mutability | Describe how variables are named and how externally passed variables and internally defined variables interact | Accepted | + +## ** Topic: AOB?** + +EAO: I will probably not be available in the next two meetings + +### Make bag of options for `` `:date` `` and `` `:time` `` optional in wait for semantic skeletons + +MED: do we go out with nothing, or with an interim + +EAO: can we have some time with these non-required, and make them required later + +APP: we are talking about required options. Non required means you can still implement them. + +APP: we decided early on to go with a bag of options because they can go back and forth to string skeletons. They are equivalent. + +APP: what are we going to do with semantic skeletons when they come? + +APP: we can’t really ship only with date / time style. We can’t say we are complete without something more flexible. + +MED: I feel strongly that semantic skeletons are where we want to go. +The current skeletons / bag of options would be a migration path. +We can make them optional for now, and that gives us freedom to make them required, or keep them optional forever. + +APP: but we do them as a package. If you implement, we implement all. 
+ +APP: anything else you are interested in on the agenda + +### 603 We should not require \* if the variant keys exhaust all possibilities + +MED: touching on the star, the issue of not requiring it means that things are not that robust. +With messages built without a star you get into problems. It is kind of ugly to mix `\*` and `other`, but it is more robust. + +EAO: the other case is the booleans. If you define true / false you will have nothing else ever. + +APP: you need to know how to “explode” the cases. + +MED: I think that we can back away from it if we require selectors to identify a default value. +So at least the default value should be there. +But it has the downside that implementations need to know about all the selectors. + +MIH: you mentioned we discussed it. Thought we reached a decision. Mentioning booleans. Seems like they have only two values, but some languages, like Java, can have a null there. Localization tools have to know the functions. No way for tools to know without machine readable registry for now. + +MED: eventually we need a machine readable registry. + +MIH: for a while we don’t have it. + +EAO: how an implementation communicates about custom functions is the language server work. +When we have a selector like `:boolean` if there is a `{$x :boolean}`, if `$x` is not provided then the selection fails. + +APP: probably best we can do. + +EAO: with `\*` the selection would use that. + +APP: in the end plural will be a pointer to CLDR +Other selectors will likely behave the same. +Machine readability needs to be able to include a “hey, look there” + +MED: a lot of tools will take the messages in a source language, expand, translate, then compact. +So in theory it can compact to `\* \* \*`. +The star makes the tooling much more reliable. + +APP: this is also a thing we can examine in the tech preview. We asked, we had no feedback. +This can be tightened in the future, if we need to. +We have a proposal on the table. 
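The check under discussion, whether a message has a catch-all variant regardless of what the selectors are, can be sketched without any knowledge of the selector functions. This is a minimal sketch under stated assumptions: variants are modeled as tuples of key strings with `"*"` standing for the catch-all key, and `has_fallback_variant` is a hypothetical helper name, not an API of any implementation.

```python
# Hypothetical sketch: variants are modeled as tuples of key strings, with "*"
# as the catch-all key. This representation is an assumption for illustration.
def has_fallback_variant(variant_keys, selector_count):
    """Return True if some variant uses the catch-all key for every selector."""
    fallback = ("*",) * selector_count
    return any(tuple(keys) == fallback for keys in variant_keys)


# A plural-style message with a catch-all variant passes the check.
with_star = has_fallback_variant([("one",), ("*",)], selector_count=1)
# A boolean-style message listing only true/false does not, even though the
# keys might exhaust the selector's possible values.
without_star = has_fallback_variant([("true",), ("false",)], selector_count=1)
print(with_star, without_star)
```

Because the check looks only at the keys, it can run on the data model before function resolution, which is the property APP points to. Dropping the requirement would mean the check needs to know each selector's full set of values.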
+ +EAO: we can’t loosen it in the future. + +APP: this is a data model. It is checked before we do function resolution. +Which makes it tricky. + +MED: requiring it is backward compatible. If we relax it in the future, the old messages are still valid. + +EAO: I wanted to note that it looks like the proposal is rejected. Maybe for future consideration. + +APP: any other topics you want to touch. + +### 797 Create a PR for function interaction + +Can I close this? Objections? + +### 786 Possible simplification of the data model + +APP: Fine to resolve? + +### 752 Improve test coverage for built-in function options + +TIM: fine to close it? + +### 793 Recommend not escaping all the things + +TIM: no objections to close it + +### 905 Apply NFC normalization during :string key comparison + + +APP: Closing, approved by MED, TIM, APP + +### 904 Add tests for changes due to 885 (name/literal equality) + +APP: EAO approved, I have some minor comments + +EAO: I left a comment. + +### 902 Tests for bidi and whitespace + +APP: EAO and me already approved. Comments? + +### 806 DESIGN: Add alternative designs to the design doc on function composition + +APP: we already did a lot of that work +Do we want to merge? +Some good work here. I can merge but mark it as obsolete. + +### 895 Allowing surrogates + +APP: there are areas that are localizable. +One of the examples was with text in a placeholder. +I tend to agree that the first pass through UTF-8 will break those characters. + +APP: the proposal as you make it means we can use one in a key. + +EAO: can I jump into this? +Bad tooling can make mistakes in the text. Bot in literals. + +APP: I tend to agree. If MF2 implementation would break in unpaired surrogates it might be a feature. + +MIH: I don’t see a difference between text and localizable literals. +If a tool is bad then it is bad in both. + +TIM: for implementation I didn’t know what the correct behavior is when we find invalid surrogates. 
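The behavior of an unpaired surrogate under UTF-8 encoding and NFC normalization, both raised in this topic, can be observed directly. The following is a small sketch using only the Python standard library. The helper names `has_surrogate_code_point` and `utf8_roundtrip_ok` are hypothetical, and the results describe CPython's behavior, not what a MessageFormat implementation is required to do.

```python
import unicodedata

LONE = "\ud800"  # a surrogate code point with no partner


def has_surrogate_code_point(text):
    """Return True if the string contains any code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def utf8_roundtrip_ok(text):
    """Return True if the text survives encoding to UTF-8 and decoding back."""
    try:
        return text.encode("utf-8").decode("utf-8") == text
    except UnicodeEncodeError:
        # Strict UTF-8 has no representation for surrogate code points.
        return False


sample = "a" + LONE
found = has_surrogate_code_point(sample)
roundtrip = utf8_roundtrip_ok(sample)
# Surrogates have no decomposition, so NFC leaves the string unchanged.
nfc_unchanged = unicodedata.normalize("NFC", sample) == sample
print(found, roundtrip, nfc_unchanged)
```

The failed round trip is the breakage APP describes for a pass through UTF-8. A tool that wants to warn about such content only needs the range check.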
+ +APP: is the proposal to allow unpaired surrogates everywhere? + +MIH: no, only in localizable text + +EAO: is NFC well defined for unpaired surrogates? + +APP: yes + +RGN: I am 90% confident it normalizes to replacement character. + +APP: I checked, NFC normalizes as itself + +EAO: when you update this make sure to change all mentions of code units to code points. + +EAO: will you include a warning to not use unpaired surrogates? + +MIH: yes + +### 814 Define function composition for date/time values + +EAO: can we merge that? + +APP: that is not permanent? Is it a solution for now? + +EAO: it allows us to change later. + +APP: I think we will be back here when we get to semantic skeletons + +MIH: we are introducing a strong type system, even when the underlying programming language does not do that. We basically say that ``:date`` returns a date kind of type, and it is an error to feed that into ``:time``, because it is a bad type. + +### 799, 786 Unify input and local declarations in data model / \[FEEDBACK\] Possible simplification of the data model + +MIH: Long discussion, unfortunately I was involved in it and didn’t manage to take notes. +But the final decision was to drop it + +APP: drop diff --git a/spec/README.md b/spec/README.md index a631901c6..c603282ca 100644 --- a/spec/README.md +++ b/spec/README.md @@ -17,6 +17,7 @@ 1. [Resolution Errors](errors.md#resolution-errors) 1. [Message Function Errors](errors.md#message-function-errors) 1. [Default Function Registry](registry.md) +1. [`u:` Namespace](u-namespace.md) 1. [Formatting](formatting.md) 1. [Interchange data model](data-model/README.md) @@ -79,41 +80,33 @@ A reference to a _term_ looks like this. > The provisions of the stability policy are not in effect until > the conclusion of the technical preview and adoption of this specification. 
-Updates to this specification will not change -the syntactical meaning, the runtime output, or other behaviour -of valid messages written for earlier versions of this specification -that only use functions defined in this specification. +Updates to this specification will not make any valid _message_ invalid. + Updates to this specification will not remove any syntax provided in this version. -Future versions MAY add additional structure or meaning to existing syntax. -Updates to this specification will not remove any reserved keywords or sigils. +Updates to this specification MUST NOT specify an error for any message +that previously did not specify an error. -> [!NOTE] -> Future versions may define new keywords. +Updates to this specification MUST NOT specify the use of a fallback value for any message +that previously did not specify a fallback value. + +Updates to this specification will not change the syntactical meaning +of any syntax defined in this specification. -Updates to this specification will not reserve or assign meaning to -any character "sigils" except for those in the `reserved` production. +Updates to this specification will not remove any functions defined in the default registry. -Updates to this specification -will not remove any functions defined in the default registry nor -will they remove any options or option values. -Additional options or option values MAY be defined. +Updates to this specification will not remove any options or option values +defined in the default registry. > [!NOTE] -> This does not guarantee that the results of formatting will never change. -> Even when the specification doesn't change, +> The foregoing policies are _not_ a guarantee that the results of formatting will never change. +> Even when this specification or its implementation do not change, > the functions for date formatting, number formatting and so on -> will change their results over time. 
- -Later specification versions MAY make previously invalid messages valid. - -Updates to this specification will not introduce message syntax that, -when parsed according to earlier versions of this specification, -would produce syntax or data model errors. -Such messages MAY produce errors when formatted -according to an earlier version of this specification. +> can change their results over time or behave differently due to local runtime +> differences in implementation or changes to locale data +> (such as due to the release of new CLDR versions). -From version 2.0, MessageFormat will only reserve, define, or require +Updates to this specification will only reserve, define, or require function names or function option names consisting of characters in the ranges a-z, A-Z, and 0-9. All other names in these categories are reserved for the use of implementations or users. @@ -121,28 +114,31 @@ All other names in these categories are reserved for the use of implementations > [!NOTE] > Users defining custom names SHOULD include at least one character outside these ranges > to ensure that they will be compatible with future versions of this specification. +> They SHOULD also use the namespace feature to avoid collisions with other implementations. -Later versions of this specification will not introduce changes +Future versions of this specification will not introduce changes to the data model that would result in a data model representation based on this version being invalid. > For example, existing interfaces or fields will not be removed. -Later versions of this specification MAY introduce changes -to the data model that would result in future data model representations -not being valid for implementations of this version of the data model. - -> For example, a future version could introduce a new keyword, -> whose data model representation would be a new interface -> that is not recognized by this version's data model. 
- -Later specification versions will not introduce syntax that cannot be -represented by this version of the data model. - -> For example, a future version could introduce a new keyword. -> The future version's data model would provide an interface for that keyword -> while this version of the data model would parse the value into -> the interface `UnsupportedStatement`. -> Both data models would be "valid" in their context, -> but this version's would be missing any functionality for the new statement type. +> [!IMPORTANT] +> This stability policy allows any of the following, non-exhaustive list, of changes +> in future versions of this specification: +> - Future versions may define new syntax and structures +> that would not be supported by this version of the specification. +> - Future versions may add additional structure or meaning to existing syntax. +> - Future versions may define new keywords. +> - Future versions may make previously invalid messages valid. +> - Future versions may define additional functions in the default registry +> or may reserve the names of functions for the purposes of interoperability. +> - Future versions may define additional options to existing functions. +> - Future versions may define additional option values for existing options. +> - Future versions may deprecate (but not remove) keywords, functions, options, or option values. +> - Future versions of this specification may introduce changes +> to the data model that would result in future data model representations +> not being valid for implementations of this version of the data model. +> - For example, a future version could introduce a new keyword, +> whose data model representation would be a new interface +> that is not recognized by this version's data model. 
diff --git a/spec/appendices.md b/spec/appendices.md index e94544596..b65036c6c 100644 --- a/spec/appendices.md +++ b/spec/appendices.md @@ -14,12 +14,10 @@ host environments, their serializations and resource formats, that might be sufficient to prevent most problems. However, MessageFormat itself does not supply such a restriction. -MessageFormat _messages_ permit nearly all Unicode code points, -with the exception of surrogates, +MessageFormat _messages_ permit nearly all Unicode code points to appear in _literals_, including the text portions of a _pattern_. This means that it can be possible for a _message_ to contain invisible characters -(such as bidirectional controls, -ASCII control characters in the range U+0000 to U+001F, +(such as bidirectional controls, ASCII control characters in the range U+0000 to U+001F, or characters that might be interpreted as escapes or syntax in the host format) that abnormally affect the display of the _message_ when viewed as source code, or in resource formats or translation tools, diff --git a/spec/data-model/README.md b/spec/data-model/README.md index 517596f1c..bd7028df0 100644 --- a/spec/data-model/README.md +++ b/spec/data-model/README.md @@ -17,11 +17,10 @@ Implementations that expose APIs supporting the production, consumption, or tran _message_ as a data structure are encouraged to use this data model. 
This data model provides these capabilities: -- any MessageFormat 2 message (including future versions) - can be parsed into this representation +- any MessageFormat 2.0 message can be parsed into this representation - this data model representation can be serialized as a well-formed -MessageFormat 2 message -- parsing a MessageFormat 2 message into a data model representation +MessageFormat 2.0 message +- parsing a MessageFormat 2.0 message into a data model representation and then serializing it results in an equivalently functional message This data model might also be used to: @@ -59,10 +58,6 @@ declarations, options, and attributes to be optional rather than required proper > In the MessageFormat 2 [syntax](/spec/syntax.md), the source for these `name` fields > sometimes uses the production `identifier`. > This happens when the named item, such as a _function_, supports namespacing. -> -> In the Tech Preview, feedback on whether to separate the `namespace` from the `name` -> and represent both separately, or just, as here, use an opaque single field `name` -> is desired. ## Messages @@ -85,7 +80,7 @@ interface PatternMessage { interface SelectMessage { type: "select"; declarations: Declaration[]; - selectors: Expression[]; + selectors: VariableRef[]; variants: Variant[]; } ``` @@ -98,21 +93,8 @@ The `name` does not include the initial `$` of the _variable_. The `name` of an `InputDeclaration` MUST be the same as the `name` in the `VariableRef` of its `VariableExpression` `value`. -An `UnsupportedStatement` represents a statement not supported by the implementation. -Its `keyword` is a non-empty string name (i.e. not including the initial `.`). -If not empty, the `body` is the "raw" value (i.e. escape sequences are not processed) -starting after the keyword and up to the first _expression_, -not including leading or trailing whitespace. -The non-empty `expressions` correspond to the trailing _expressions_ of the _reserved statement_. 
- -> [!NOTE] -> Be aware that future versions of this specification -> might assign meaning to _reserved statement_ values. -> This would result in new interfaces being added to -> this data model. - ```ts -type Declaration = InputDeclaration | LocalDeclaration | UnsupportedStatement; +type Declaration = InputDeclaration | LocalDeclaration; interface InputDeclaration { type: "input"; @@ -125,13 +107,6 @@ interface LocalDeclaration { name: string; value: Expression; } - -interface UnsupportedStatement { - type: "unsupported-statement"; - keyword: string; - body?: string; - expressions: Expression[]; -} ``` In a `SelectMessage`, @@ -173,45 +148,35 @@ type Pattern = Array; type Expression = | LiteralExpression | VariableExpression - | FunctionExpression - | UnsupportedExpression; + | FunctionExpression; interface LiteralExpression { type: "expression"; arg: Literal; - annotation?: FunctionAnnotation | UnsupportedAnnotation; + function?: FunctionRef; attributes: Attributes; } interface VariableExpression { type: "expression"; arg: VariableRef; - annotation?: FunctionAnnotation | UnsupportedAnnotation; + function?: FunctionRef; attributes: Attributes; } interface FunctionExpression { type: "expression"; arg?: never; - annotation: FunctionAnnotation; - attributes: Attributes; -} - -interface UnsupportedExpression { - type: "expression"; - arg?: never; - annotation: UnsupportedAnnotation; + function: FunctionRef; attributes: Attributes; } - -type Attributes = Map; ``` ## Expressions The `Literal` and `VariableRef` correspond to the the _literal_ and _variable_ syntax rules. When they are used as the `body` of an `Expression`, -they represent _expression_ values with no _annotation_. +they represent _expression_ values with no _function_. `Literal` represents all literal values, both _quoted literal_ and _unquoted literal_. The presence or absence of quotes is not preserved by the data model. 
@@ -231,14 +196,14 @@ interface VariableRef { } ``` -A `FunctionAnnotation` represents a _function_ _annotation_. +A `FunctionRef` represents a _function_. The `name` does not include the `:` starting sigil. `Options` is a key-value mapping containing options, -and is used to represent the _annotation_ and _markup_ _options_. +and is used to represent the _function_ and _markup_ _options_. ```ts -interface FunctionAnnotation { +interface FunctionRef { type: "function"; name: string; options: Options; @@ -247,31 +212,13 @@ interface FunctionAnnotation { type Options = Map; ``` -An `UnsupportedAnnotation` represents a -_private-use annotation_ not supported by the implementation or a _reserved annotation_. -The `source` is the "raw" value (i.e. escape sequences are not processed), -including the starting sigil. - -When parsing the syntax of a _message_ that includes a _private-use annotation_ -supported by the implementation, -the implementation SHOULD represent it in the data model -using an interface appropriate for the semantics and meaning -that the implementation attaches to that _annotation_. - -```ts -interface UnsupportedAnnotation { - type: "unsupported-annotation"; - source: string; -} -``` - ## Markup A `Markup` object has a `kind` of either `"open"`, `"standalone"`, or `"close"`, each corresponding to _open_, _standalone_, and _close_ _markup_. The `name` in these does not include the starting sigils `#` and `/` or the ending sigil `/`. -The `options` for markup use the same key-value mapping as `FunctionAnnotation`. +The `options` for markup use the same key-value mapping as `FunctionRef`. ```ts interface Markup { @@ -283,6 +230,17 @@ interface Markup { } ``` +## Attributes + +`Attributes` is a key-value mapping +used to represent the _expression_ and _markup_ _attributes_. + +_Attributes_ with no value are represented by `true` here. 
+ +```ts +type Attributes = Map; +``` + ## Extensions Implementations MAY extend this data model with additional interfaces, diff --git a/spec/data-model/message.dtd b/spec/data-model/message.dtd index 33be40df2..bc51dd159 100644 --- a/spec/data-model/message.dtd +++ b/spec/data-model/message.dtd @@ -1,5 +1,5 @@ @@ -10,13 +10,7 @@ name NMTOKEN #REQUIRED > - - - - + @@ -24,8 +18,8 @@ @@ -33,15 +27,13 @@ - - + + - - - + diff --git a/spec/data-model/message.json b/spec/data-model/message.json index 77fc3a4f4..b669af462 100644 --- a/spec/data-model/message.json +++ b/spec/data-model/message.json @@ -32,11 +32,11 @@ "attributes": { "type": "object", "additionalProperties": { - "oneOf": [{ "$ref": "#/$defs/literal-or-variable" }, { "const": true }] + "oneOf": [{ "$ref": "#/$defs/literal" }, { "const": true }] } }, - "function-annotation": { + "function": { "type": "object", "properties": { "type": { "const": "function" }, @@ -45,65 +45,17 @@ }, "required": ["type", "name"] }, - "unsupported-annotation": { - "type": "object", - "properties": { - "type": { "const": "unsupported-annotation" }, - "source": { "type": "string" } - }, - "required": ["type", "source"] - }, - "annotation": { - "oneOf": [ - { "$ref": "#/$defs/function-annotation" }, - { "$ref": "#/$defs/unsupported-annotation" } - ] - }, - - "literal-expression": { - "type": "object", - "properties": { - "type": { "const": "expression" }, - "arg": { "$ref": "#/$defs/literal" }, - "annotation": { "$ref": "#/$defs/annotation" }, - "attributes": { "$ref": "#/$defs/attributes" } - }, - "required": ["type", "arg"] - }, - "variable-expression": { - "type": "object", - "properties": { - "type": { "const": "expression" }, - "arg": { "$ref": "#/$defs/variable" }, - "annotation": { "$ref": "#/$defs/annotation" }, - "attributes": { "$ref": "#/$defs/attributes" } - }, - "required": ["type", "arg"] - }, - "function-expression": { - "type": "object", - "properties": { - "type": { "const": "expression" }, - "annotation": { 
"$ref": "#/$defs/function-annotation" }, - "attributes": { "$ref": "#/$defs/attributes" } - }, - "required": ["type", "annotation"] - }, - "unsupported-expression": { + "expression": { "type": "object", "properties": { "type": { "const": "expression" }, - "annotation": { "$ref": "#/$defs/unsupported-annotation" }, + "arg": { "$ref": "#/$defs/literal-or-variable" }, + "function": { "$ref": "#/$defs/function" }, "attributes": { "$ref": "#/$defs/attributes" } }, - "required": ["type", "annotation"] - }, - "expression": { "oneOf": [ - { "$ref": "#/$defs/literal-expression" }, - { "$ref": "#/$defs/variable-expression" }, - { "$ref": "#/$defs/function-expression" }, - { "$ref": "#/$defs/unsupported-expression" } + { "required": ["type", "arg"] }, + { "required": ["type", "function"] } ] }, @@ -148,26 +100,12 @@ }, "required": ["type", "name", "value"] }, - "unsupported-statement": { - "type": "object", - "properties": { - "type": { "const": "unsupported-statement" }, - "keyword": { "type": "string" }, - "body": { "type": "string" }, - "expressions": { - "type": "array", - "items": { "$ref": "#/$defs/expression" } - } - }, - "required": ["type", "keyword", "expressions"] - }, "declarations": { "type": "array", "items": { "oneOf": [ { "$ref": "#/$defs/input-declaration" }, - { "$ref": "#/$defs/local-declaration" }, - { "$ref": "#/$defs/unsupported-statement" } + { "$ref": "#/$defs/local-declaration" } ] } }, @@ -201,7 +139,7 @@ "declarations": { "$ref": "#/$defs/declarations" }, "selectors": { "type": "array", - "items": { "$ref": "#/$defs/expression" } + "items": { "$ref": "#/$defs/variable" } }, "variants": { "type": "array", diff --git a/spec/errors.md b/spec/errors.md index 7a6375ee9..5782622b2 100644 --- a/spec/errors.md +++ b/spec/errors.md @@ -24,22 +24,36 @@ or _Message Function Errors_ in _expressions_ that are not otherwise used by the such as _placeholders_ in unselected _patterns_ or _declarations_ that are never referenced during _formatting_. 
-In all cases, when encountering a runtime error, -a message formatter MUST provide some representation of the message. -An informative error or errors MUST also be separately provided. +When formatting a _message_ with one or more errors, +an implementation MUST provide a mechanism to discover and identify +at least one of the errors. +The exact form of error signaling is implementation defined. +Some examples include throwing an exception, +returning an error code, +or providing a function or method for enumerating any errors. + +For all _valid_ _messages_, +an implementation MUST enable a user to get a formatted result. +The formatted result might include _fallback values_ +such as when a _placeholder_'s _expression_ produced an error +during formatting. + +The two above requirements MAY be fulfilled by a single formatting method, +or separately by more than one such method. When a message contains more than one error, or contains some error which leads to further errors, an implementation which does not emit all of the errors SHOULD prioritise _Syntax Errors_ and _Data Model Errors_ over others. -When an error occurs within a _selector_, +When an error occurs while resolving a _selector_ +or calling MatchSelectorKeys with its resolved value, the _selector_ MUST NOT match any _variant_ _key_ other than the catch-all `*` -and a _Resolution Error_ or a _Message Function Error_ MUST be emitted. +and a _Bad Selector_ error MUST be emitted. ## Syntax Errors -**_Syntax Errors_** occur when the syntax representation of a message is not well-formed. +**_Syntax Errors_** occur when the syntax representation of a message is not _well-formed_. > Example invalid messages resulting in a _Syntax Error_: > @@ -61,7 +75,7 @@ and a _Resolution Error_ or a _Message Function Error_ MUST be emitted. 
## Data Model Errors -**_Data Model Errors_** occur when a message is invalid due to +**_Data Model Errors_** occur when a message is not _valid_ due to violating one of the semantic requirements on its structure. ### Variant Key Mismatch @@ -72,13 +86,16 @@ does not equal the number of _selectors_. > Example invalid messages resulting in a _Variant Key Mismatch_ error: > > ``` -> .match {$one :func} +> .input {$one :func} +> .match $one > 1 2 {{Too many}} > * {{Otherwise}} > ``` > > ``` -> .match {$one :func} {$two :func} +> .input {$one :func} +> .input {$two :func} +> .match $one $two > 1 2 {{Two keys}} > * {{Missing a key}} > * * {{Otherwise}} @@ -92,13 +109,16 @@ does not include a _variant_ with only catch-all keys. > Example invalid messages resulting in a _Missing Fallback Variant_ error: > > ``` -> .match {$one :func} +> .input {$one :func} +> .match $one > 1 {{Value is one}} > 2 {{Value is two}} > ``` > > ``` -> .match {$one :func} {$two :func} +> .input {$one :func} +> .input {$two :func} +> .match $one $two > 1 * {{First is one}} > * 1 {{Second is one}} > ``` @@ -106,27 +126,27 @@ does not include a _variant_ with only catch-all keys. ### Missing Selector Annotation A **_Missing Selector Annotation_** error occurs when the _message_ -contains a _selector_ that does not have an _annotation_, -or contains a _variable_ that does not directly or indirectly reference a _declaration_ with an _annotation_. +contains a _selector_ that does not +directly or indirectly reference a _declaration_ with a _function_. 
> Examples of invalid messages resulting in a _Missing Selector Annotation_ error: > > ``` -> .match {$one} +> .match $one > 1 {{Value is one}} > * {{Value is not one}} > ``` > > ``` > .local $one = {|The one|} -> .match {$one} +> .match $one > 1 {{Value is one}} > * {{Value is not one}} > ``` > > ``` > .input {$one} -> .match {$one} +> .match $one > 1 {{Value is one}} > * {{Value is not one}} > ``` @@ -186,13 +206,16 @@ same list of _keys_ is used for more than one _variant_. > Examples of invalid messages resulting in a _Duplicate Variant_ error: > > ``` -> .match {$var :string} +> .input {$var :string} +> .match $var > * {{The first default}} > * {{The second default}} > ``` > > ``` -> .match {$x :string} {$y :string} +> .input {$x :string} +> .input {$y :string} +> .match $x $y > * foo {{The first "foo" variant}} > bar * {{The "bar" variant}} > * |foo| {{The second "foo" variant}} @@ -217,7 +240,8 @@ An **_Unresolved Variable_** error occurs when a variable reference c > ``` > > ``` -> .match {$var :func} +> .input {$var :func} +> .match $var > 1 {{The value is one.}} > * {{The value is not one.}} > ``` @@ -236,67 +260,33 @@ a reference to a function which cannot be resolved. > ``` > > ``` -> .match {|horse| :func} +> .local $horse = {|horse| :func} +> .match $horse > 1 {{The value is one.}} > * {{The value is not one.}} > ``` -### Unsupported Expression - -An **_Unsupported Expression_** error occurs when an expression uses -syntax reserved for future standardization, -or for private implementation use that is not supported by the current implementation. - -> For example, attempting to format this message -> would result in an _Unsupported Expression_ error -> because it includes a _reserved annotation_. -> -> ``` -> The value is {!horse}. 
-> ``` -> -> Attempting to format this message would result in an _Unsupported Expression_ error -> if done within a context that does not support the `^` private use sigil: -> -> ``` -> .match {|horse| ^private} -> 1 {{The value is one.}} -> * {{The value is not one.}} -> ``` - -### Unsupported Statement - -An **_Unsupported Statement_** error occurs when a message includes a _reserved statement_. - -> For example, attempting to format this message -> would result in an _Unsupported Statement_ error: -> -> ``` -> .some {|horse|} -> {{The message body}} -> ``` - ### Bad Selector A **_Bad Selector_** error occurs when a message includes a _selector_ -with a resolved value which does not support selection. +with a _resolved value_ which does not support selection. > For example, attempting to format this message > would result in a _Bad Selector_ error: > > ``` > .local $day = {|2024-05-01| :date} -> .match {$day} +> .match $day > * {{The due date is {$day}}} > ``` ## Message Function Errors A **_Message Function Error_** is any error that occurs -when calling a message function implementation +when calling a _function handler_ or which depends on validation associated with a specific function. -Implementations SHOULD provide a way for _functions_ to emit +Implementations SHOULD provide a way for _function handlers_ to emit (or cause to be emitted) any of the types of error defined in this section. Implementations MAY also provide implementation-defined _Message Function Error_ types. @@ -310,7 +300,7 @@ Implementations MAY also provide implementation-defined _Message Function Error_ > 3. Uses a `:get` message function which requires its argument to be an object and > an option `field` to be provided with a string value. > -> The exact type of _Message Function Error_ is determined by the message function implementation. +> The exact type of _Message Function Error_ is determined by the _function handler_. > > ``` > Hello, {horse :get field=name}! 
@@ -348,7 +338,8 @@ for that specific _function_. > ``` > > ``` -> .match {|horse| :number} +> .local $horse = {|horse| :number} +> .match $horse > 1 {{The value is one.}} > * {{The value is not one.}} > ``` @@ -385,7 +376,8 @@ does not match the expected implementation-defined format. > which is a requirement of the `:number` function: > > ``` -> .match {42 :number} +> .local $answer = {42 :number} +> .match $answer > 1 {{The value is one.}} > horse {{The value is a horse.}} > * {{The value is not one.}} diff --git a/spec/formatting.md b/spec/formatting.md index dc3719b10..f1a12cae0 100644 --- a/spec/formatting.md +++ b/spec/formatting.md @@ -7,16 +7,26 @@ when formatting a message for display in a user interface, or for some later pro To start, we presume that a _message_ has either been parsed from its syntax or created from a data model description. -If this construction has encountered any _Syntax Errors_ or _Data Model Errors_, -an appropriate error MUST be emitted and a _fallback value_ MAY be used as the formatting result. +If the resulting _message_ is not _well-formed_, a _Syntax Error_ is emitted. +If the resulting _message_ is _well-formed_ but is not _valid_, a _Data Model Error_ is emitted. -Formatting of a _message_ is defined by the following operations: +The formatting of a _message_ is defined by the following operations: + +- **_Pattern Selection_** determines which of a message's _patterns_ is formatted. + For a message with no _selectors_, this is simple as there is only one _pattern_. + With _selectors_, this will depend on their resolution. + +- **_Formatting_** takes the _resolved values_ of + the _text_ and _placeholder_ parts of the selected _pattern_, + and produces the formatted result for the _message_. + Depending on the implementation, this result could be a single concatenated string, + an array of objects, an attributed string, or some other locally appropriate data type. 
- **_Expression and Markup Resolution_** determines the value of an _expression_ or _markup_, with reference to the current _formatting context_. This can include multiple steps, such as looking up the value of a variable and calling formatting functions. - The form of the resolved value is implementation defined and the + The form of the _resolved value_ is implementation defined and the value might not be evaluated or formatted yet. However, it needs to be "formattable", i.e. it contains everything required by the eventual formatting. @@ -24,6 +34,15 @@ Formatting of a _message_ is defined by the following operations: The resolution of _text_ is rather straightforward, and is detailed under _literal resolution_. +Implementations are not required to expose +the _expression resolution_ and _pattern selection_ operations to their users, +or even use them in their internal processing, +as long as the final _formatting_ result is made available to users +and the observable behavior of the _formatting_ matches that described here. + +_Attributes_ MUST NOT have any effect on the formatted output of a _message_, +nor be made available to _function handlers_. + > [!IMPORTANT] > > **This specification does not require either eager or lazy _expression resolution_ of _message_ @@ -35,28 +54,9 @@ Formatting of a _message_ is defined by the following operations: > value of a given _expression_ until it is actually used by a > selection or formatting process. > However, when an _expression_ is resolved, it MUST behave as if all preceding -> _declarations_ and _selectors_ affecting _variables_ referenced by that _expression_ +> _declarations_ affecting _variables_ referenced by that _expression_ > have already been evaluated in the order in which the relevant _declarations_ -> and _selectors_ appear in the _message_. - -- **_Pattern Selection_** determines which of a message's _patterns_ is formatted. 
- For a message with no _selectors_, this is simple as there is only one _pattern_. - With _selectors_, this will depend on their resolution. - - At the start of _pattern selection_, - if the _message_ contains any _reserved statements_, - emit an _Unsupported Statement_ error. - -- **_Formatting_** takes the resolved values of the selected _pattern_, - and produces the formatted result for the _message_. - Depending on the implementation, this result could be a single concatenated string, - an array of objects, an attributed string, or some other locally appropriate data type. - -Formatter implementations are not required to expose -the _expression resolution_ and _pattern selection_ operations to their users, -or even use them in their internal processing, -as long as the final _formatting_ result is made available to users -and the observable behavior of the formatter matches that described here. +> appear in the _message_. ## Formatting Context @@ -78,61 +78,92 @@ At a minimum, it includes: This is often determined by a user-provided argument of a formatting function call. - The _function registry_, - providing the implementations of the functions referred to by message _functions_. + providing the _function handlers_ of the functions referred to by message _functions_. -- Optionally, a fallback string to use for the message - if it contains any _Syntax Errors_ or _Data Model Errors_. +- Optionally, a fallback string to use for the message if it is not _valid_. Implementations MAY include additional fields in their _formatting context_. -## Expression and Markup Resolution +## Resolved Values -_Expressions_ are used in _declarations_, _selectors_, and _patterns_. -_Markup_ is only used in _patterns_. +A **_resolved value_** is the result of resolving a _text_, _literal_, _variable_, _expression_, or _markup_. +The _resolved value_ is determined using the _formatting context_. +The form of the _resolved value_ is implementation-defined. 
-In a _declaration_, the resolved value of the _expression_ is bound to a _variable_, -which is available for use by later _expressions_. -Since a _variable_ can be referenced in different ways later, -implementations SHOULD NOT immediately fully format the value for output. +In a _declaration_, the _resolved value_ of an _expression_ is bound to a _variable_, +which makes it available for use in later _expressions_ and _markup_ _options_. + +> For example, in +> ``` +> .input {$a :number minimumFractionDigits=3} +> .local $b = {$a :integer notation=compact} +> .match $a +> 0 {{The value is zero.}} +> * {{In compact form, the value {$a} is rendered as {$b}.}} +> ``` +> the _resolved value_ bound to `$a` is used as the _operand_ +> of the `:integer` _function_ when resolving the value of the _variable_ `$b`, +> as a _selector_ in the `.match` statement, +> as well as for formatting the _placeholder_ `{$a}`. In an _input-declaration_, the _variable_ operand of the _variable-expression_ identifies not only the name of the external input value, -but also the _variable_ to which the resolved value of the _variable-expression_ is bound. +but also the _variable_ to which the _resolved value_ of the _variable-expression_ is bound. -In _selectors_, the resolved value of an _expression_ is used for _pattern selection_. +In a _pattern_, the _resolved value_ of an _expression_ or _markup_ is used in its _formatting_. -In a _pattern_, the resolved value of an _expression_ or _markup_ is used in its _formatting_. - -The form that resolved values take is implementation-dependent, +The form that _resolved values_ take is implementation-dependent, and different implementations MAY choose to perform different levels of resolution. 
-> For example, the resolved value of the _expression_ `{|0.40| :number style=percent}` -> could be an object such as +> While this specification does not require it, +> a _resolved value_ could be implemented by requiring each _function handler_ to +> return a value matching the following interface: > -> ``` -> { value: Number('0.40'), -> formatter: NumberFormat(locale, { style: 'percent' }) } +> ```ts +> interface MessageValue { +> formatToString(): string +> formatToX(): X // where X is an implementation-defined type +> getValue(): unknown +> resolvedOptions(): { [key: string]: MessageValue } +> selectKeys(keys: string[]): string[] +> } > ``` > -> Alternatively, it could be an instance of an ICU4J `FormattedNumber`, -> or some other locally appropriate value. +> With this approach: +> - An _expression_ could be used as a _placeholder_ if +> calling the `formatToString()` or `formatToX()` method of its _resolved value_ +> did not emit an error. +> - A _variable_ could be used as a _selector_ if +> calling the `selectKeys(keys)` method of its _resolved value_ +> did not emit an error. +> - Using a _variable_, the _resolved value_ of an _expression_ +> could be used as an _operand_ or _option_ value if +> calling the `getValue()` method of its _resolved value_ did not emit an error. +> In this use case, the `resolvedOptions()` method could also +> provide a set of option values that could be taken into account by the called function. +> +> Extensions of the base `MessageValue` interface could be provided for different data types, +> such as numbers or strings, +> for which the `unknown` return type of `getValue()` and +> the generic `MessageValue` type used in `resolvedOptions()` +> could be narrowed appropriately. +> An implementation could also allow `MessageValue` values to be passed in as input variables, +> or automatically wrap each variable as a `MessageValue` to provide a uniform interface +> for custom functions. 
-Depending on the presence or absence of a _variable_ or _literal_ operand -and a _function_, _private-use annotation_, or _reserved annotation_, -the resolved value of the _expression_ is determined as follows: +## Expression and Markup Resolution -If the _expression_ contains a _reserved annotation_, -an _Unsupported Expression_ error is emitted and -a _fallback value_ is used as the resolved value of the _expression_. +_Expressions_ are used in _declarations_ and _patterns_. +_Markup_ is only used in _patterns_. -Else, if the _expression_ contains a _private-use annotation_, -its resolved value is defined according to the implementation's specification. +Depending on the presence or absence of a _variable_ or _literal_ operand and a _function_, +the _resolved value_ of the _expression_ is determined as follows: -Else, if the _expression_ contains an _annotation_, -its resolved value is defined by _function resolution_. +If the _expression_ contains a _function_, +its _resolved value_ is defined by _function resolution_. Else, if the _expression_ consists of a _variable_, -its resolved value is defined by _variable resolution_. +its _resolved value_ is defined by _variable resolution_. An implementation MAY perform additional processing when resolving the value of an _expression_ that consists only of a _variable_. @@ -151,13 +182,13 @@ that consists only of a _variable_. > the pattern included the function `:datetime` with some set of default options. Else, the _expression_ consists of a _literal_. -Its resolved value is defined by _literal resolution_. +Its _resolved value_ is defined by _literal resolution_. -> **Note** -> This means that a _literal_ value with no _annotation_ +> [!NOTE] +> This means that a _literal_ value with no _function_ > is always treated as a string. 
> To represent values that are not strings as a _literal_, -> an _annotation_ needs to be provided: +> a _function_ needs to be provided: > > ``` > .local $aNumber = {1234 :number} @@ -168,43 +199,58 @@ Its resolved value is defined by _literal resolution_. ### Literal Resolution -The resolved value of a _text_ or a _literal_ is +The _resolved value_ of a _text_ or a _literal_ contains the character sequence of the _text_ or _literal_ after any character escape has been converted to the escaped character. When a _literal_ is used as an _operand_ or on the right-hand side of an _option_, -the formatting function MUST treat its resolved value the same +the formatting function MUST treat its _resolved value_ the same whether its value was originally a _quoted literal_ or an _unquoted literal_. > For example, > the _option_ `foo=42` and the _option_ `foo=|42|` are treated as identical. -The resolution of a _text_ or _literal_ MUST resolve to a string. + +> For example, in a JavaScript formatter +> the _resolved value_ of a _text_ or a _literal_ could have the following implementation: +> +> ```ts +> class MessageLiteral implements MessageValue { +> constructor(value: string) { +> this.formatToString = () => value; +> this.getValue = () => value; +> } +> resolvedOptions = () => ({}); +> selectKeys(_keys: string[]) { +> throw Error("Selection on unannotated literals is not supported"); +> } +> } +> ``` ### Variable Resolution To resolve the value of a _variable_, its _name_ is used to identify either a local variable or an input variable. -If a _declaration_ exists for the _variable_, its resolved value is used. +If a _declaration_ exists for the _variable_, its _resolved value_ is used. Otherwise, the _variable_ is an implicit reference to an input value, and its value is looked up from the _formatting context_ _input mapping_. -The resolution of a _variable_ MAY fail if no value is identified for its _name_. 
-If this happens, an _Unresolved Variable_ error MUST be emitted. +The resolution of a _variable_ fails if no value is identified for its _name_. +If this happens, an _Unresolved Variable_ error is emitted. If a _variable_ would resolve to a _fallback value_, this MUST also be considered a failure. ### Function Resolution -To resolve an _expression_ with a _function_ _annotation_, +To resolve an _expression_ with a _function_, the following steps are taken: 1. If the _expression_ includes an _operand_, resolve its value. If this fails, use a _fallback value_ for the _expression_. 2. Resolve the _identifier_ of the _function_ and, based on the starting sigil, - find the appropriate function implementation to call. - If the implementation cannot find the function, + find the appropriate _function handler_ to call. + If the implementation cannot find the _function handler_, or if the _identifier_ includes a _namespace_ that the implementation does not support, emit an _Unknown Function_ error and use a _fallback value_ for the _expression_. @@ -214,108 +260,76 @@ the following steps are taken: 3. Perform _option resolution_. -4. Call the function implementation with the following arguments: +4. Determine the _function context_ for calling the _function handler_. - - The current _locale_. - - The resolved mapping of _options_. - - If the _expression_ includes an _operand_, its resolved value. + The **_function context_** contains the context necessary for + the _function handler_ to resolve the _expression_. This includes: - The form that resolved _operand_ and _option_ values take is implementation-defined. + - The current _locale_, + potentially including a fallback chain of locales. + - The base directionality of the _message_ and its _text_ tokens. - A _declaration_ binds the resolved value of an _expression_ - to a _variable_. 
- Thus, the result of one _function_ is potentially the _operand_ - of another _function_, - or the value of one of the _options_ for another function. - For example, in - ``` - .input {$n :number minimumIntegerDigits=3} - .local $n1 = {$n :number maximumFractionDigits=3} - ``` - the value bound to `$n` is the - resolved value used as the _operand_ - of the `:number` _function_ - when resolving the value of the _variable_ `$n1`. - - Implementations that provide a means for defining custom functions - SHOULD provide a means for function implementations - to return values that contain enough information - (e.g. a representation of - the resolved _operand_ and _option_ values - that the function was called with) - to be used as arguments to subsequent calls - to the function implementations. - For example, an implementation might define an interface that allows custom function implementation. - Such an interface SHOULD define an implementation-specific - argument type `T` and return type `U` - for implementations of functions - such that `U` can be coerced to `T`. - Implementations of a _function_ SHOULD emit a - _Bad Operand_ error for _operands_ whose resolved value - or type is not supported. + If the resolved mapping of _options_ includes any _`u:` options_ + supported by the implementation, process them as specified. + Such `u:` options MAY be removed from the resolved mapping of _options_. -> [!NOTE] -> The behavior of the previous example is -> currently implementation-dependent. Supposing that -> the external input variable `n` is bound to the string `"1"`, -> and that the implementation formats to a string, -> the formatted result of the following message: -> -> ``` -> .input {$n :number minimumIntegerDigits=3} -> .local $n1 = {$n :number maximumFractionDigits=3} -> {{$n1}} -> ``` -> -> is currently implementation-dependent. 
-> Depending on whether the options are preserved -> between the resolution of the first `:number` _annotation_ -> and the resolution of the second `:number` _annotation_, -> a conformant implementation -> could produce either "001.000" or "1.000" -> -> Each function **specification** MAY have -> its own rules to preserve some options in the returned structure -> and discard others. -> In instances where a function specification does not determine whether an option is preserved or discarded, -> each function **implementation** of that specification MAY have -> its own rules to preserve some options in the returned structure -> and discard others. -> +5. Call the function implementation with the following arguments: -> [!NOTE] -> During the Technical Preview, -> feedback on how the registry describes -> the flow of _resolved values_ and _options_ -> from one _function_ to another, -> and on what requirements this specification should impose, -> is highly desired. - - An implementation MAY pass additional arguments to the function, - as long as reasonable precautions are taken to keep the function interface - simple and minimal, and avoid introducing potential security vulnerabilities. + - The _function context_. + - The resolved mapping of _options_. + - If the _expression_ includes an _operand_, its _resolved value_. - An implementation MAY define its own functions. - An implementation MAY allow custom functions to be defined by users. + The form that resolved _operand_ and _option_ values take is implementation-defined. - Function access to the _formatting context_ MUST be minimal and read-only, - and execution time SHOULD be limited. - - Implementation-defined _functions_ SHOULD use an implementation-defined _namespace_. + An implementation MAY pass additional arguments to the _function handler_, + as long as reasonable precautions are taken to keep the function interface + simple and minimal, and avoid introducing potential security vulnerabilities. -5. 
If the call succeeds, +6. If the call succeeds, resolve the value of the _expression_ as the result of that function call. If the call fails or does not return a valid value, emit the appropriate _Message Function Error_ for the failure. - Implementations MAY provide a mechanism for the _function_ to provide + Implementations MAY provide a mechanism for the _function handler_ to provide additional detail about internal failures. Specifically, if the cause of the failure was that the datatype, value, or format of the _operand_ did not match that expected by the _function_, - the _function_ might cause a _Bad Operand_ error to be emitted. + the _function_ SHOULD cause a _Bad Operand_ error to be emitted. - In all failure cases, use the _fallback value_ for the _expression_ as the resolved value. + In all failure cases, use the _fallback value_ for the _expression_ as its _resolved value_. + +#### Function Handler + +A **_function handler_** is an implementation-defined process +such as a function or method +which accepts a set of arguments and returns a _resolved value_. +A _function handler_ is required to resolve a _function_. + +An implementation MAY define its own functions and their handlers. +An implementation MAY allow custom functions to be defined by users. + +Implementations that provide a means for defining custom functions +MUST provide a means for _function handlers_ +to return _resolved values_ that contain enough information +to be used as _operands_ or _option_ values in subsequent _expressions_. + +The _resolved value_ returned by a _function handler_ +MAY be different from the value of the _operand_ of the _function_. +It MAY be an implementation specified type. +It is not required to be the same type as the _operand_. + +A _function handler_ MAY include resolved options in its _resolved value_. +The resolved options MAY be different from the _options_ of the function. 
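The calling convention for a _function handler_ described above can be sketched as a function type. Everything named here is hypothetical: the `FunctionContext` and `ResolvedValue` shapes, the `FunctionHandler` type, and the `:x:upper` function. The sketch also signals a bad operand by throwing, where a real implementation would use its own error mechanism.

```ts
// Hypothetical shape of the function context: locale plus base directionality.
interface FunctionContext {
  locale: string; // could also be a fallback chain of locales
  dir: "ltr" | "rtl" | "auto";
}

// Hypothetical, reduced form of a resolved value.
interface ResolvedValue {
  formatToString(): string;
  getValue(): unknown;
  resolvedOptions(): { [key: string]: unknown };
}

// A handler receives the context, the resolved options, and an optional operand.
type FunctionHandler = (
  context: FunctionContext,
  options: { [key: string]: unknown },
  operand?: ResolvedValue,
) => ResolvedValue;

// Hypothetical custom handler for a namespaced function such as `:x:upper`.
const upperHandler: FunctionHandler = (context, options, operand) => {
  const input = operand?.getValue();
  if (typeof input !== "string") {
    // Stand-in for emitting a Bad Operand error.
    throw new TypeError("Bad Operand: :x:upper expects a string operand");
  }
  const text = input.toLocaleUpperCase(context.locale);
  return {
    formatToString: () => text,
    getValue: () => text,
    // The handler chooses which options to expose to later expressions.
    resolvedOptions: () => ({ ...options }),
  };
};
```

The returned value differs from the operand and carries its own resolved options, so it can serve as an _operand_ or _option_ value in a later _expression_.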
+ +A _function handler_ SHOULD emit a +_Bad Operand_ error for _operands_ whose _resolved value_ +or type is not supported. + +_Function handler_ access to the _formatting context_ MUST be minimal and read-only, +and execution time SHOULD be limited. + +Implementation-defined _functions_ SHOULD use an implementation-defined _namespace_. #### Option Resolution @@ -325,7 +339,7 @@ For each _option_: - Resolve the _identifier_ of the _option_. - If the _option_'s right-hand side successfully resolves to a value, - bind the _identifier_ of the _option_ to the resolved value in the mapping. + bind the _identifier_ of the _option_ to the _resolved value_ in the mapping. - Otherwise, bind the _identifier_ of the _option_ to an unresolved value in the mapping. Implementations MAY later remove this value before calling the _function_. (Note that an _Unresolved Variable_ error will have been emitted.) @@ -338,26 +352,27 @@ This mapping can be empty. Unlike _functions_, the resolution of _markup_ is not customizable. -The resolved value of _markup_ includes the following fields: +The _resolved value_ of _markup_ includes the following fields: - The type of the markup: open, standalone, or close - The _identifier_ of the _markup_ - The resolved _options_ values after _option resolution_. +If the resolved mapping of _options_ includes any _`u:` options_ +supported by the implementation, process them as specified. +Such `u:` options MAY be removed from the resolved mapping of _options_. + The resolution of _markup_ MUST always succeed. ### Fallback Resolution -A **_fallback value_** is the resolved value for an _expression_ that fails to resolve. +A **_fallback value_** is the _resolved value_ for an _expression_ that fails to resolve. An _expression_ fails to resolve when: -- A _variable_ used as an _operand_ (with or without an _annotation_) fails to resolve. +- A _variable_ used as an _operand_ (with or without a _function_) fails to resolve. 
* Note that this does not include a _variable_ used as an _option_ value. -- A _function_ _annotation_ fails to resolve. -- A _private-use annotation_ is unsupported by the implementation or if - a _private-use annotation_ fails to resolve. -- The _expression_ has a _reserved annotation_. +- A _function_ fails to resolve. The _fallback value_ depends on the contents of the _expression_: @@ -371,9 +386,8 @@ The _fallback value_ depends on the contents of the _expression_: > In a context where `:func` fails to resolve, > `{42 :func}` resolves to the _fallback value_ `|42|` and > `{|C:\\| :func}` resolves to the _fallback value_ `|C:\\|`. - > In any context, `{|| @reserved}` resolves to the _fallback value_ `||`. -- _expression_ with _variable_ _operand_ referring to a local _declaration_ (with or without an _annotation_): +- _expression_ with _variable_ _operand_ referring to a local _declaration_ (with or without a _function_): the _value_ to which it resolves (which may already be a _fallback value_) > Examples: @@ -390,12 +404,12 @@ The _fallback value_ depends on the contents of the _expression_: > (transitively) resolves to the _fallback value_ `:now` and > the message formats to `{:now}`. -- _expression_ with _variable_ _operand_ not referring to a local _declaration_ (with or without an _annotation_): +- _expression_ with _variable_ _operand_ not referring to a local _declaration_ (with or without a _function_): U+0024 DOLLAR SIGN `$` followed by the _name_ of the _variable_ > Examples: - > In a context where `$var` fails to resolve, `{$var}` and `{$var :number}` and `{$var @reserved}` - > all resolve to the _fallback value_ `$var`. + > In a context where `$var` fails to resolve, `{$var}` and `{$var :number}` + > both resolve to the _fallback value_ `$var`. 
> In a context where `:func` fails to resolve, > the _pattern_'s _expression_ in `.input $arg {{{$arg :func}}}` > resolves to the _fallback value_ `$arg` and @@ -408,21 +422,6 @@ The _fallback value_ depends on the contents of the _expression_: > In a context where `:func` fails to resolve, `{:func}` resolves to the _fallback value_ `:func`. > In a context where `:ns:func` fails to resolve, `{:ns:func}` resolves to the _fallback value_ `:ns:func`. -- unsupported _private-use annotation_ or _reserved annotation_ with no _operand_: - the _annotation_ starting sigil - - > Examples: - > In any context, `{@reserved}` and `{@reserved |...|}` both resolve to the _fallback value_ `@`. - -- supported _private-use annotation_ with no _operand_: - the _annotation_ starting sigil, optionally followed by implementation-defined details - conforming with patterns in the other cases (such as quoting literals). - If details are provided, they SHOULD NOT leak potentially private information. - - > Examples: - > In a context where `^` expressions are used for comments, `{^▽^}` might resolve to the _fallback value_ `^`. - > In a context where `&` expressions are _function_-like macro invocations, `{&foo |...|}` might resolve to the _fallback value_ `&foo`. - - Otherwise: the U+FFFD REPLACEMENT CHARACTER `�` This is not currently used by any expression, but may apply in future revisions. @@ -431,8 +430,33 @@ _Option_ _identifiers_ and values are not included in the _fallback value_. _Pattern selection_ is not supported for _fallback values_. 
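The fallback rules listed above can be sketched as a small function that builds the fallback string from an _expression_. The `Operand` and `Expression` types and the name `fallbackSource` are assumptions for illustration. The sketch treats every _variable_ as an input variable, so it does not cover the case of a _variable_ that refers to a local _declaration_.

```ts
// Hypothetical, simplified data model for an expression.
type Operand =
  | { type: "literal"; value: string }
  | { type: "variable"; name: string };

interface Expression {
  operand?: Operand;
  functionName?: string; // e.g. "func" or "ns:func", without the ":" sigil
}

function fallbackSource(expr: Expression): string {
  const { operand, functionName } = expr;
  if (operand !== undefined) {
    if (operand.type === "literal") {
      // Quote the literal, escaping backslash and vertical line.
      const escaped = operand.value.replace(/[\\|]/g, "\\$&");
      return `|${escaped}|`;
    }
    // Variable operand (assumed not to be a local declaration).
    return `$${operand.name}`;
  }
  if (functionName !== undefined) {
    // Function with no operand: sigil followed by the identifier.
    return `:${functionName}`;
  }
  // No other case currently applies.
  return "\uFFFD";
}
```

Options are deliberately absent from the data model, since _option_ identifiers and values are not part of the _fallback value_.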
+> For example, in a JavaScript formatter +> the _fallback value_ could have the following implementation, +> where `source` is one of the above-defined strings: +> +> ```ts +> class MessageFallback implements MessageValue { +> constructor(source: string) { +> this.formatToString = () => `{${source}}`; +> this.getValue = () => undefined; +> } +> resolvedOptions = () => ({}); +> selectKeys(_keys: string[]) { +> throw Error("Selection on fallback values is not supported"); +> } +> } +> ``` + ## Pattern Selection +If the _message_ being formatted is not _well-formed_ and _valid_, +the result of pattern selection is a _pattern_ consisting of a single _fallback value_ +using the _message_'s fallback string defined in the _formatting context_ +or if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`. + +If the _message_ being formatted does not contain a _matcher_, +the result of pattern selection is its _pattern_ value. + When a _message_ contains a _matcher_ with one or more _selectors_, the implementation needs to determine which _variant_ will be used to provide the _pattern_ for the formatting operation. @@ -450,7 +474,8 @@ according to their _key_ values and selecting the first one. > > For example, in the `pl` (Polish) locale, this _message_ cannot reach > > the `*` _variant_: > > ``` -> > .match {$num :integer} +> > .input {$num :integer} +> > .match $num > > 0 {{ }} > > one {{ }} > > few {{ }} @@ -470,13 +495,16 @@ Each _key_ corresponds to a _selector_ by its position in the _variant_. > For example, in this message: > > ``` -> .match {:one} {:two} {:three} +> .input {$one :number} +> .input {$two :number} +> .input {$three :number} +> .match $one $two $three > 1 2 3 {{ ... }} > ``` > -> The first _key_ `1` corresponds to the first _selector_ (`{:one}`), -> the second _key_ `2` to the second _selector_ (`{:two}`), -> and the third _key_ `3` to the third _selector_ (`{:three}`). 
+> The first _key_ `1` corresponds to the first _selector_ (`$one`), +> the second _key_ `2` to the second _selector_ (`$two`), +> and the third _key_ `3` to the third _selector_ (`$three`). To determine which _variant_ best matches a given set of inputs, each _selector_ is used in turn to order and filter the list of _variants_. @@ -489,39 +517,25 @@ Earlier _selectors_ in the _matcher_'s list of _selectors_ have a higher priorit When all of the _selectors_ have been processed, the earliest-sorted _variant_ in the remaining list of _variants_ is selected. -> [!NOTE] -> A _selector_ is not a _declaration_. -> Even when the same _function_ can be used for both formatting and selection -> of a given _operand_ -> the _annotation_ that appears in a _selector_ has no effect on subsequent -> _selectors_ nor on the formatting used in _placeholders_. -> To use the same value for selection and formatting, -> set its value with a `.input` or `.local` _declaration_. - This selection method is defined in more detail below. An implementation MAY use any pattern selection method, as long as its observable behavior matches the results of the method defined here. -If the message being formatted has any _Syntax Errors_ or _Data Model Errors_, -the result of pattern selection MUST be a pattern resolving to a single _fallback value_ -using the message's fallback string defined in the _formatting context_ -or if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`. - ### Resolve Selectors First, resolve the values of each _selector_: -1. Let `res` be a new empty list of resolved values that support selection. +1. Let `res` be a new empty list of _resolved values_ that support selection. 1. For each _selector_ `sel`, in source order, - 1. Let `rv` be the resolved value of `sel`. + 1. Let `rv` be the _resolved value_ of `sel`. 1. If selection is supported for `rv`: 1. Append `rv` as the last element of the list `res`. 1. Else: - 1. 
Let `nomatch` be a resolved value for which selection always fails. + 1. Let `nomatch` be a _resolved value_ for which selection always fails. 1. Append `nomatch` as the last element of the list `res`. 1. Emit a _Bad Selector_ error. -The form of the resolved values is determined by each implementation, +The form of the _resolved values_ is determined by each implementation, along with the manner of determining their support for selection. ### Resolve Preferences @@ -535,9 +549,9 @@ Next, using `res`, resolve the preferential order for all message keys: 1. Let `key` be the `var` key at position `i`. 1. If `key` is not the catch-all key `'*'`: 1. Assert that `key` is a _literal_. - 1. Let `ks` be the resolved value of `key`. + 1. Let `ks` be the _resolved value_ of `key` in Unicode Normalization Form C. 1. Append `ks` as the last element of the list `keys`. - 1. Let `rv` be the resolved value at index `i` of `res`. + 1. Let `rv` be the _resolved value_ at index `i` of `res`. 1. Let `matches` be the result of calling the method MatchSelectorKeys(`rv`, `keys`) 1. Append `matches` as the last element of the list `pref`. @@ -549,6 +563,9 @@ The returned list MAY be empty. The most-preferred key is first, with each successive key appearing in order by decreasing preference. +The resolved value of each _key_ MUST be in Unicode Normalization Form C ("NFC"), +even if the _literal_ for the _key_ is not. + If calling MatchSelectorKeys encounters any error, a _Bad Selector_ error is emitted and an empty list is returned. @@ -565,7 +582,7 @@ filter the list of _variants_ to the ones that match with some preference: 1. If `key` is the catch-all key `'*'`: 1. Continue the inner loop on `pref`. 1. Assert that `key` is a _literal_. - 1. Let `ks` be the resolved value of `key`. + 1. Let `ks` be the _resolved value_ of `key`. 1. Let `matches` be the list of strings at index `i` of `pref`. 1. If `matches` includes `ks`: 1. Continue the inner loop on `pref`. 
@@ -591,7 +608,7 @@ Finally, sort the list of variants `vars` and select the _pattern_: 1. Let `key` be the `tuple` _variant_ key at position `i`. 1. If `key` is not the catch-all key `'*'`: 1. Assert that `key` is a _literal_. - 1. Let `ks` be the resolved value of `key`. + 1. Let `ks` be the _resolved value_ of `key`. 1. Let `matchpref` be the integer position of `ks` in `matches`. 1. Set the `tuple` integer value as `matchpref`. 1. Set `sortable` to be the result of calling the method `SortVariants(sortable)`. @@ -618,7 +635,7 @@ _This section is non-normative._ #### Example 1 -Presuming a minimal implementation which only supports `:string` annotation +Presuming a minimal implementation which only supports `:string` _function_ which matches keys by using string comparison, and a formatting context in which the variable reference `$foo` resolves to the string `'foo'` and @@ -626,7 +643,9 @@ the variable reference `$bar` resolves to the string `'bar'`, pattern selection proceeds as follows for this message: ``` -.match {$foo :string} {$bar :string} +.input {$foo :string} +.input {$bar :string} +.match $foo $bar bar bar {{All bar}} foo foo {{All foo}} * * {{Otherwise}} @@ -657,7 +676,9 @@ Alternatively, with the same implementation and formatting context as in Example pattern selection would proceed as follows for this message: ``` -.match {$foo :string} {$bar :string} +.input {$foo :string} +.input {$bar :string} +.match $foo $bar * bar {{Any and bar}} foo * {{Foo and any}} foo bar {{Foo and bar}} @@ -706,7 +727,7 @@ the pattern selection proceeds as follows for this message: ``` .input {$count :number} -.match {$count} +.match $count one {{Category match for {$count}}} 1 {{Exact match for {$count}}} * {{Other match for {$count}}} @@ -737,19 +758,18 @@ one {{Category match for {$count}}} After _pattern selection_, each _text_ and _placeholder_ part of the selected _pattern_ is resolved and formatted. 
-Resolved values cannot always be formatted by a given implementation. +_Resolved values_ cannot always be formatted by a given implementation. When such an error occurs during _formatting_, -an implementation SHOULD emit an appropriate _Message Function Error_ and produce a -_fallback value_ for the _placeholder_ that produced the error. -A formatting function MAY substitute a value to use instead of a _fallback value_. +an appropriate _Message Function Error_ is emitted and +a _fallback value_ is used for the _placeholder_ with the error. Implementations MAY represent the result of _formatting_ using the most appropriate data type or structure. Some examples of these include: - A single string concatenated from the parts of the resolved _pattern_. - A string with associated attributes for portions of its text. -- A flat sequence of objects corresponding to each resolved value. -- A hierarchical structure of objects that group spans of resolved values, +- A flat sequence of objects corresponding to each _resolved value_. +- A hierarchical structure of objects that group spans of _resolved values_, such as sequences delimited by _markup-open_ and _markup-close_ _placeholders_. Implementations SHOULD provide _formatting_ result types that match user needs, @@ -762,10 +782,6 @@ MUST be an empty string. Implementations MAY offer functionality for customizing this, such as by emitting XML-ish tags for each _markup_. -_Attributes_ are reserved for future standardization. -Other than checking for valid syntax, they SHOULD NOT -affect the processing or output of a _message_. - ### Examples _This section is non-normative._ @@ -790,8 +806,9 @@ the _fallback value_ as a string, and a U+007D RIGHT CURLY BRACKET `}`. > For example, -> a message with a _Syntax Error_ and no fallback string -> defined in the _formatting context_ would format to a string as `{�}`. 
+> a _message_ that is not _well-formed_ would format to a string as `{�}`, +> unless a fallback string is defined in the _formatting context_, +> in which case that string would be used instead. ### Handling Bidirectional Text @@ -801,7 +818,16 @@ That is, the text can consist of a mixture of left-to-right and right-to-lef The display of bidirectional text is defined by the [Unicode Bidirectional Algorithm](http://www.unicode.org/reports/tr9/) [UAX9]. -The directionality of the message as a whole is provided by the _formatting context_. +The directionality of the formatted _message_ as a whole is provided by the _formatting context_. + +> [!NOTE] +> Keep in mind the difference between the formatted output of a _message_, +> which is the topic of this section, +> and the syntax of the _message_ prior to formatting. +> The processing of a _message_ depends on the logical sequence of Unicode code points, +> not on the presentation of the _message_. +> Affordances to allow users appropriate control over the appearance of the +> _message_'s syntax have been provided. When a _message_ is formatted, _placeholders_ are replaced with their formatted representation. @@ -858,7 +884,7 @@ The _Default Bidi Strategy_ is defined as follows: These correspond to the message having left-to-right directionality, right-to-left directionality, and to the message's directionality not being known. 1. For each _expression_ `exp` in _pattern_: - 1. Let `fmt` be the formatted string representation of the resolved value of `exp`. + 1. Let `fmt` be the formatted string representation of the _resolved value_ of `exp`. 1. Let `dir` be the directionality of `fmt`, one of « `'LTR'`, `'RTL'`, `'unknown'` », with the same meanings as for `msgdir`. 1. 
If `dir` is `'LTR'`: diff --git a/spec/message.abnf b/spec/message.abnf index 3377275da..a9293040c 100644 --- a/spec/message.abnf +++ b/spec/message.abnf @@ -1,45 +1,41 @@ message = simple-message / complex-message -simple-message = [s] [simple-start pattern] +simple-message = o [simple-start pattern] simple-start = simple-start-char / escaped-char / placeholder pattern = *(text-char / escaped-char / placeholder) placeholder = expression / markup -complex-message = [s] *(declaration [s]) complex-body [s] -declaration = input-declaration / local-declaration / reserved-statement +complex-message = o *(declaration o) complex-body o +declaration = input-declaration / local-declaration complex-body = quoted-pattern / matcher -input-declaration = input [s] variable-expression -local-declaration = local s variable [s] "=" [s] expression +input-declaration = input o variable-expression +local-declaration = local s variable o "=" o expression -quoted-pattern = "{{" pattern "}}" +quoted-pattern = o "{{" pattern "}}" -matcher = match-statement 1*([s] variant) -match-statement = match 1*([s] selector) -selector = expression -variant = key *(s key) [s] quoted-pattern +matcher = match-statement s variant *(o variant) +match-statement = match 1*(s selector) +selector = variable +variant = key *(s key) quoted-pattern key = literal / "*" ; Expressions -expression = literal-expression - / variable-expression - / annotation-expression -literal-expression = "{" [s] literal [s annotation] *(s attribute) [s] "}" -variable-expression = "{" [s] variable [s annotation] *(s attribute) [s] "}" -annotation-expression = "{" [s] annotation *(s attribute) [s] "}" +expression = literal-expression + / variable-expression + / function-expression +literal-expression = "{" o literal [s function] *(s attribute) o "}" +variable-expression = "{" o variable [s function] *(s attribute) o "}" +function-expression = "{" o function *(s attribute) o "}" -annotation = function - / private-use-annotation - / 
reserved-annotation - -markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone - / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close +markup = "{" o "#" identifier *(s option) *(s attribute) o ["/"] "}" ; open and standalone + / "{" o "/" identifier *(s option) *(s attribute) o "}" ; close ; Expression and literal parts function = ":" identifier *(s option) -option = identifier [s] "=" [s] (literal / variable) -; Attributes are reserved for future standardization -attribute = "@" identifier [[s] "=" [s] (literal / variable)] +option = identifier o "=" o (literal / variable) + +attribute = "@" identifier [o "=" o literal] variable = "$" name @@ -54,32 +50,15 @@ input = %s".input" local = %s".local" match = %s".match" -; Reserve additional .keywords for use by future versions of this specification. -reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression) -; Note that the following production is a simplification, -; as this rule MUST NOT be considered to match existing keywords -; (`.input`, `.local`, and `.match`). -reserved-keyword = "." name - -; Reserve additional sigils for use by future versions of this specification. -reserved-annotation = reserved-annotation-start [[s] reserved-body] -reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" / "~" - -; Reserve sigils for private-use by implementations. 
-private-use-annotation = private-start [[s] reserved-body] -private-start = "^" / "&" -reserved-body = reserved-body-part *([s] reserved-body-part) -reserved-body-part = reserved-char / escaped-char / quoted-literal - ; Names and identifiers ; identifier matches https://www.w3.org/TR/REC-xml-names/#NT-QName -; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD +; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD and U+061C identifier = [namespace ":"] name namespace = name -name = name-start *name-char +name = [bidi] name-start *name-char [bidi] name-start = ALPHA / "_" / %xC0-D6 / %xD8-F6 / %xF8-2FF - / %x370-37D / %x37F-1FFF / %x200C-200D + / %x370-37D / %x37F-61B / %x61D-1FFF / %x200C-200D / %x2070-218F / %x2C00-2FEF / %x3001-D7FF / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF name-char = name-start / DIGIT / "-" / "." @@ -87,9 +66,8 @@ name-char = name-start / DIGIT / "-" / "." ; Restrictions on characters in various contexts simple-start-char = content-char / "@" / "|" -text-char = content-char / s / "." / "@" / "|" -quoted-char = content-char / s / "." / "@" / "{" / "}" -reserved-char = content-char / "." +text-char = content-char / ws / "." / "@" / "|" +quoted-char = content-char / ws / "." 
/ "@" / "{" / "}" content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A) / %x0B-0C ; omit CR (%x0D) / %x0E-1F ; omit SP (%x20) @@ -98,12 +76,21 @@ content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A) / %x41-5B ; omit \ (%x5C) / %x5D-7A ; omit { | } (%x7B-7D) / %x7E-2FFF ; omit IDEOGRAPHIC SPACE (%x3000) - / %x3001-D7FF ; omit surrogates - / %xE000-10FFFF + / %x3001-10FFFF ; allowing surrogates is intentional ; Character escapes escaped-char = backslash ( backslash / "{" / "|" / "}" ) backslash = %x5C ; U+005C REVERSE SOLIDUS "\" -; Whitespace -s = 1*( SP / HTAB / CR / LF / %x3000 ) +; Required whitespace +s = *bidi ws o + +; Optional whitespace +o = *(ws / bidi) + +; Bidirectional marks and isolates +; ALM / LRM / RLM / LRI, RLI, FSI & PDI +bidi = %x061C / %x200E / %x200F / %x2066-2069 + +; Whitespace characters +ws = SP / HTAB / CR / LF / %x3000 diff --git a/spec/registry.md b/spec/registry.md index 918d7baed..eb8fb6297 100644 --- a/spec/registry.md +++ b/spec/registry.md @@ -1,7 +1,7 @@ # MessageFormat 2.0 Default Function Registry -This section describes the functions which each implementation MUST provide -to be conformant with this specification. +This section describes the functions for which each implementation MUST provide +a _function handler_ to be conformant with this specification. Implementations MAY implement additional _functions_ or additional _options_. In particular, implementations are encouraged to provide feedback on proposed @@ -51,27 +51,18 @@ The function `:string` has no options. #### Selection When implementing [`MatchSelectorKeys(resolvedSelector, keys)`](/spec/formatting.md#resolve-preferences) -where `resolvedSelector` is the resolved value of a _selector_ _expression_ +where `resolvedSelector` is the _resolved value_ of a _selector_ and `keys` is a list of strings, -the `:string` selector performs as described below. +the `:string` selector function performs as described below. -1. 
Let `compare` be the string value of `resolvedSelector`. +1. Let `compare` be the string value of `resolvedSelector` + in Unicode Normalization Form C (NFC) [\[UAX#15\]](https://www.unicode.org/reports/tr15) 1. Let `result` be a new empty list of strings. 1. For each string `key` in `keys`: 1. If `key` and `compare` consist of the same sequence of Unicode code points, then 1. Append `key` as the last element of the list `result`. 1. Return `result`. -> [!NOTE] -> Matching of `key` and `compare` values is sensitive to the sequence of code points -> in each string. -> As a result, variations in how text can be encoded can affect the performance of matching. -> The function `:string` does not perform case folding or Unicode Normalization of string values. -> Users SHOULD encode _messages_ and their parts (such as _keys_ and _operands_), -> in Unicode Normalization Form C (NFC) unless there is a very good reason -> not to. -> See also: [String Matching](https://www.w3.org/TR/charmod-norm) - > [!NOTE] > Unquoted string literals in a _variant_ do not include spaces. > If users wish to match strings that include whitespace @@ -80,14 +71,28 @@ the `:string` selector performs as described below. > > For example: > ``` -> .match {$string :string} +> .input {$string :string} +> .match $string > | space key | {{Matches the string " space key "}} > * {{Matches the string "space key"}} > ``` #### Formatting -The `:string` function returns the string value of the resolved value of the _operand_. +The `:string` function returns the string value of the _resolved value_ of the _operand_. + +> [!NOTE] +> The function `:string` does not perform Unicode Normalization of its formatted output. +> Users SHOULD encode _messages_ and their parts in Unicode Normalization Form C (NFC) +> unless there is a very good reason not to. 
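Reviewer sketch (not part of the patch): the revised `:string` selection steps earlier in this hunk could look roughly like the following. The function name `match_string_keys` is hypothetical, and the sketch assumes that keys are already in NFC, as the key-normalization change to `spec/syntax.md` in this patch series requires.

```python
import unicodedata


def match_string_keys(resolved_selector: str, keys: list[str]) -> list[str]:
    """Hypothetical helper mirroring the :string MatchSelectorKeys steps."""
    # Step 1: take the string value of the selector in NFC.
    compare = unicodedata.normalize("NFC", resolved_selector)
    # Step 2: start with an empty result list.
    result = []
    # Step 3: keep each key whose code point sequence equals `compare`.
    # Assumption: keys were already normalized to NFC by the caller.
    for key in keys:
        if key == compare:
            result.append(key)
    return result
```

With this sketch, a selector value spelled with a combining accent matches a precomposed key, while a key that was left unnormalized does not match.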
+ +#### Composition + +When an _operand_ or an _option_ value uses a _variable_ annotated, +directly or indirectly, by a `:string` _function_, +its _resolved value_ contains the string value of the _operand_ of the annotated _expression_, +together with its resolved locale and directionality. +None of the _options_ set on the _expression_ are part of the _resolved value_. ## Numeric Value Selection and Formatting @@ -152,6 +157,20 @@ The following options and their values are required to be available on the funct - `maximumSignificantDigits` - ([digit size option](#digit-size-options)) +If the _operand_ of the _expression_ is an implementation-defined type, +such as the _resolved value_ of an _expression_ with a `:number` or `:integer` _annotation_, +it can include option values. +These are included in the resolved option values of the _expression_, +with _options_ on the _expression_ taking priority over any option values of the _operand_. + +> For example, the _placeholder_ in this _message_: +> ``` +> .input {$n :number notation=scientific minimumFractionDigits=2} +> {{{$n :number minimumFractionDigits=1}}} +> ``` +> would be formatted with the resolved options +> `{ notation: 'scientific', minimumFractionDigits: '1' }`. + > [!NOTE] > The following options and option values are being developed during the Technical Preview > period. @@ -195,7 +214,8 @@ but can cause problems in target locales that the original developer is not cons > For example, a naive developer might use a special message for the value `1` without > considering a locale's need for a `one` plural: > ``` -> .match {$var :number} +> .input {$var :number} +> .match $var > 1 {{You have one last chance}} > one {{You have {$var} chance remaining}} > * {{You have {$var} chances remaining}} @@ -219,6 +239,14 @@ MUST be multiplied by 100 for the purposes of formatting. The _function_ `:number` performs selection as described in [Number Selection](#number-selection) below. 
+#### Composition + +When an _operand_ or an _option_ value uses a _variable_ annotated, +directly or indirectly, by a `:number` _annotation_, +its _resolved value_ contains an implementation-defined numerical value +of the _operand_ of the annotated _expression_, +together with the resolved options' values. + ### The `:integer` function The function `:integer` is a selector and formatter for matching or formatting numeric @@ -228,7 +256,6 @@ values as integers. The function `:integer` requires a [Number Operand](#number-operands) as its _operand_. - #### Options Some options do not have default values defined in this specification. @@ -262,12 +289,25 @@ function `:integer`: - `useGrouping` - `auto` (default) - `always` + - `never` - `min2` - `minimumIntegerDigits` - ([digit size option](#digit-size-options), default: `1`) - `maximumSignificantDigits` - ([digit size option](#digit-size-options)) +If the _operand_ of the _expression_ is an implementation-defined type, +such as the _resolved value_ of an _expression_ with a `:number` or `:integer` _annotation_, +it can include option values. +In general, these are included in the resolved option values of the _expression_, +with _options_ on the _expression_ taking priority over any option values of the _operand_. +Option values with the following names are however discarded if included in the _operand_: +- `compactDisplay` +- `notation` +- `minimumFractionDigits` +- `maximumFractionDigits` +- `minimumSignificantDigits` + > [!NOTE] > The following options and option values are being developed during the Technical Preview > period. 
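Reviewer sketch (not part of the patch): the option inheritance described in the `:number` and `:integer` hunks above could be modelled as a dictionary merge in which options on the expression win over options carried by the operand. The names `resolve_options` and `INTEGER_DISCARDED` are hypothetical; the discard list is copied from the `:integer` hunk.

```python
# Operand option names that :integer discards, per the hunk above.
INTEGER_DISCARDED = frozenset({
    "compactDisplay",
    "notation",
    "minimumFractionDigits",
    "maximumFractionDigits",
    "minimumSignificantDigits",
})


def resolve_options(operand_options, expression_options, discard=frozenset()):
    """Hypothetical merge of operand options with expression options."""
    # Drop operand options that the function does not inherit.
    inherited = {
        name: value
        for name, value in operand_options.items()
        if name not in discard
    }
    # Options set on the expression take priority over inherited ones.
    return {**inherited, **expression_options}
```

For `:number`, `discard` stays empty; for `:integer`, a caller would pass `INTEGER_DISCARDED`.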
@@ -311,7 +351,8 @@ but can cause problems in target locales that the original developer is not cons > For example, a naive developer might use a special message for the value `1` without > considering a locale's need for a `one` plural: > ``` -> .match {$var :integer} +> .input {$var :integer} +> .match $var > 1 {{You have one last chance}} > one {{You have {$var} chance remaining}} > * {{You have {$var} chances remaining}} @@ -335,6 +376,14 @@ MUST be multiplied by 100 for the purposes of formatting. The _function_ `:integer` performs selection as described in [Number Selection](#number-selection) below. +#### Composition + +When an _operand_ or an _option_ value uses a _variable_ annotated, +directly or indirectly, by a `:integer` _annotation_, +its _resolved value_ contains the implementation-defined integer value +of the _operand_ of the annotated _expression_, +together with the resolved options' values. + ### Number Operands The _operand_ of a number function is either an implementation-defined type or @@ -371,34 +420,37 @@ All other values produce a _Bad Operand_ error. ### Digit Size Options Some _options_ of number _functions_ are defined to take a "digit size option". -Implementations of number _functions_ use these _options_ to control aspects of numeric display +The _function handlers_ for number _functions_ use these _options_ to control aspects of numeric display such as the number of fraction, integer, or significant digits. A "digit size option" is an _option_ value that the _function_ interprets as a small integer value greater than or equal to zero. -Implementations MAY define an upper limit on the resolved value +Implementations MAY define an upper limit on the _resolved value_ of a digit size option option consistent with that implementation's practical limits. In most cases, the value of a digit size option will be a string that -encodes the value as a decimal integer. +encodes the value as a non-negative integer. 
Implementations MAY also accept implementation-defined types as the value. When provided as a string, the representation of a digit size option matches the following ABNF: >```abnf > digit-size-option = "0" / (("1"-"9") [DIGIT]) >``` +If the value of a digit size option does not evaluate as a non-negative integer, +or if the value exceeds any implementation-defined upper limit +or any option-specific lower limit, a _Bad Option Error_ is emitted. ### Number Selection Number selection has three modes: - `exact` selection matches the operand to explicit numeric keys exactly - `plural` selection matches the operand to explicit numeric keys exactly - or to plural rule categories if there is no explicit match + followed by a plural rule category if there is no explicit match - `ordinal` selection matches the operand to explicit numeric keys exactly - or to ordinal rule categories if there is no explicit match + followed by an ordinal rule category if there is no explicit match When implementing [`MatchSelectorKeys(resolvedSelector, keys)`](/spec/formatting.md#resolve-preferences) -where `resolvedSelector` is the resolved value of a _selector_ _expression_ +where `resolvedSelector` is the _resolved value_ of a _selector_ and `keys` is a list of strings, numeric selectors perform as described below. @@ -423,32 +475,47 @@ numeric selectors perform as described below. #### Rule Selection -If the option `select` is set to `exact`, rule-based selection is not used. -Return the empty string. +Rule selection is intended to support the grammatical matching needs of different +languages/locales in order to support plural or ordinal numeric values. + +If the _option_ `select` is set to `exact`, rule-based selection is not used. +Otherwise rule selection matches the _operand_, as modified by function _options_, to exactly one of these keywords: +`zero`, `one`, `two`, `few`, `many`, or `other`. +The keyword `other` is the default. 
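Reviewer sketch (not part of the patch): the keyword step of rule selection described above could be outlined as follows. The `rules` callable is a hypothetical stand-in for locale plural rule data such as CLDR's, and `english_rules` is a toy version of the English rules for illustration only. Returning the empty string when `select` is `exact` is an assumption taken from the note retained in this hunk.

```python
KEYWORDS = ("zero", "one", "two", "few", "many", "other")


def rule_keyword(value, select, rules):
    """Hypothetical keyword lookup for the `select` option."""
    if select == "exact":
        # Rule-based selection is not used; no keyword can match.
        return ""
    kind = "cardinal" if select == "plural" else "ordinal"
    keyword = rules(value, kind)
    # `other` is the default when no rule matches.
    return keyword if keyword in KEYWORDS else "other"


def english_rules(value, kind):
    """Toy stand-in for English plural rules on non-negative integers."""
    if kind == "cardinal":
        return "one" if value == 1 else "other"
    if value % 10 == 1 and value % 100 != 11:
        return "one"
    if value % 10 == 2 and value % 100 != 12:
        return "two"
    if value % 10 == 3 and value % 100 != 13:
        return "few"
    return "other"
```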
> [!NOTE] > Since valid keys cannot be the empty string in a numeric expression, returning the > empty string disables keyword selection. -If the option `select` is set to `plural`, selection should be based on CLDR plural rule data -of type `cardinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html) -for examples. +The meaning of the keywords is locale-dependent and implementation-defined. +A _key_ that matches the rule-selected keyword is a stronger match than the fallback key `*` +but a weaker match than any exact match _key_ value. -If the option `select` is set to `ordinal`, selection should be based on CLDR plural rule data -of type `ordinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html) -for examples. +The rules for a given locale might not produce all of the keywords. +A given _operand_ value might produce different keywords depending on the locale. -Apply the rules defined by CLDR to the resolved value of the operand and the function options, +Apply the rules to the _resolved value_ of the _operand_ and the relevant function _options_, and return the resulting keyword. If no rules match, return `other`. +If the option `select` is set to `plural`, the rules applied to selection SHOULD be +the CLDR plural rule data of type `cardinal`. +See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html) +for examples. + +If the option `select` is set to `ordinal`, the rules applied to selection SHOULD be +the CLDR plural rule data of type `ordinal`. +See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html) +for examples. + > **Example.** > In CLDR 44, the Czech (`cs`) plural rule set can be found > [here](https://www.unicode.org/cldr/charts/44/supplemental/language_plural_rules.html#cs). 
> > A message in Czech might be: > ``` -> .match {$numDays :number} +> .input {$numDays :number} +> .match $numDays > one {{{$numDays} den}} > few {{{$numDays} dny}} > many {{{$numDays} dne}} @@ -467,11 +534,11 @@ If no rules match, return `other`. #### Determining Exact Literal Match > [!IMPORTANT] -> The exact behavior of exact literal match is only defined for non-zero-filled +> The exact behavior of exact literal match is currently only well defined for non-zero-filled > integer values. -> Annotations that use fraction digits or significant digits might work in specific +> Functions that use fraction digits or significant digits might work in specific > implementation-defined ways. -> Users should avoid depending on these types of keys in message selection. +> Users should avoid depending on these types of keys in message selection in this release. Number literals in the MessageFormat 2 syntax use the @@ -481,10 +548,19 @@ if, when the numeric value of `resolvedSelector` is serialized using the format the two strings are equal. > [!NOTE] -> Only integer matching is required in the Technical Preview. -> Feedback describing use cases for fractional and significant digits-based -> selection would be helpful. -> Otherwise, users should avoid using matching with fractional numbers or significant digits. +> The above description of numeric matching contains +> [open issues](https://github.com/unicode-org/message-format-wg/issues/675) +> in the Technical Preview, since a given numeric value might be formatted in +> several different ways under RFC8259 +> and since the effect of formatting options, such as the number of fraction +> digits or significant digits, is not described. +> The Working Group intends to address these issues before final release +> with a number of design options +> [being considered](https://github.com/unicode-org/message-format-wg/pull/859). 
+> +> Users should avoid creating messages that depend on exact matching of non-integer +> numeric values. +> Feedback, including use cases encountered in message authoring, is strongly desired. ## Date and Time Value Formatting @@ -524,7 +600,12 @@ or can use a collection of _field options_ (but not both) to control the formatt output. If both are specified, a _Bad Option_ error MUST be emitted -and a _fallback value_ used as the resolved value of the _expression_. +and a _fallback value_ used as the _resolved value_ of the _expression_. + +If the _operand_ of the _expression_ is an implementation-defined date/time type, +it can include _style options_, _field options_, or other option values. +These are included in the resolved option values of the _expression_, +with _options_ on the _expression_ taking priority over any option values of the _operand_. > [!NOTE] > The names of _options_ and their _values_ were derived from the @@ -549,8 +630,6 @@ The function `:datetime` has these _style options_. _Field options_ describe which fields to include in the formatted output and what format to use for that field. -The implementation may use this _annotation_ to configure which fields -appear in the formatted output. > [!NOTE] > _Field options_ do not have default values because they are only to be used @@ -627,7 +706,15 @@ are encouraged to track development of these options during Tech Preview: - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier) - `timeZone` (default is system default time zone or UTC) - valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557) - + +#### Composition + +When an _operand_ or an _option_ value uses a _variable_ annotated, +directly or indirectly, by a `:datetime` _annotation_, +its _resolved value_ contains an implementation-defined date/time value +of the _operand_ of the annotated _expression_, +together with the resolved options values. 
+ ### The `:date` function The function `:date` is used to format the date portion of date/time values. @@ -651,6 +738,19 @@ The function `:date` has these _options_: - `medium` (default) - `short` +If the _operand_ of the _expression_ is an implementation-defined date/time type, +it can include other option values. +Any _operand_ option values matching the `:datetime` _style options_ or _field options_ are ignored, +as is any `style` option. + +#### Composition + +When an _operand_ or an _option_ value uses a _variable_ annotated, +directly or indirectly, by a `:date` _annotation_, +its _resolved value_ is implementation-defined. +An implementation MAY emit a _Bad Operand_ or _Bad Option_ error (as appropriate) +when this happens. + ### The `:time` function The function `:time` is used to format the time portion of date/time values. @@ -674,6 +774,18 @@ The function `:time` has these _options_: - `medium` - `short` (default) +If the _operand_ of the _expression_ is an implementation-defined date/time type, +it can include other option values. +Any _operand_ option values matching the `:datetime` _style options_ or _field options_ are ignored, +as is any `style` option. + +#### Composition + +When an _operand_ or an _option_ value uses a _variable_ annotated, +directly or indirectly, by a `:time` _annotation_, +its _resolved value_ is implementation-defined. +An implementation MAY emit a _Bad Operand_ or _Bad Option_ error (as appropriate) +when this happens. ### Date and Time Operands @@ -723,5 +835,3 @@ For more information, see [Working with Timezones](https://w3c.github.io/timezon > The form of these serializations is known and is a de facto standard. > Support for these extensions is expected to be required in the post-tech preview. 
> See: https://datatracker.ietf.org/doc/draft-ietf-sedate-datetime-extended/ - - diff --git a/spec/syntax.md b/spec/syntax.md index 42d742ef1..6100b562d 100644 --- a/spec/syntax.md +++ b/spec/syntax.md @@ -60,7 +60,8 @@ The syntax specification takes into account the following design restrictions: control characters such as U+0000 NULL and U+0009 TAB, permanently reserved noncharacters (U+FDD0 through U+FDEF and U+nFFFE and U+nFFFF where n is 0x0 through 0x10), private-use code points (U+E000 through U+F8FF, U+F0000 through U+FFFFD, and - U+100000 through U+10FFFD), unassigned code points, and other potentially confusing content. + U+100000 through U+10FFFD), unassigned code points, unpaired surrogates (U+D800 through U+DFFF), + and other potentially confusing content. ## Messages and their Syntax @@ -90,14 +91,14 @@ Attempting to parse a _message_ that is not _well-formed_ will result in a _Synt A _message_ is **_valid_** if it is _well-formed_ and **also** meets the additional content restrictions and semantic requirements about its structure defined below for -_declarations_, _matcher_ and _options_. +_declarations_, _matcher_, and _options_. Attempting to parse a _message_ that is not _valid_ will result in a _Data Model Error_. ## The Message A **_message_** is the complete template for a specific message formatting request. -A **_variable_** is a _name_ associated to a resolved value. +A **_variable_** is a _name_ associated to a _resolved value_. An **_external variable_** is a _variable_ whose _name_ and initial value are supplied by the caller @@ -113,6 +114,22 @@ A **_local variable_** is a _variable_ created as the result of a _lo > In particular, it avoids using quote characters common to many file formats and formal languages > so that these do not need to be escaped in the body of a _message_. +> [!NOTE] +> _Text_ and _quoted literals_ allow unpaired surrogate code points +> (`U+D800` to `U+DFFF`). 
+> This is for compatibility with formats or data structures +> that use the UTF-16 encoding +> and do not check for unpaired surrogates. +> (Strings in Java or JavaScript are examples of this.) +> These code points SHOULD NOT be used in a _message_. +> Unpaired surrogate code points are likely an indication of mistakes +> or errors in the creation, serialization, or processing of the _message_. +> Many processes will convert them to +> � U+FFFD REPLACEMENT CHARACTER +> during processing or display. +> Implementations not based on UTF-16 might not be able to represent +> a _message_ containing such code points. + > [!NOTE] > In general (and except where required by the syntax), whitespace carries no meaning in the structure > of a _message_. While many of the examples in this spec are written on multiple lines, the formatting @@ -134,17 +151,20 @@ A **_local variable_** is a _variable_ created as the result of a _lo > > An exception to this is: whitespace inside a _pattern_ is **always** significant. > [!NOTE] -> The syntax assumes that each _message_ will be displayed with a left-to-right display order +> The MessageFormat 2 syntax assumes that each _message_ will be displayed +> with a left-to-right display order > and be processed in the logical character order. -> The syntax also permits the use of right-to-left characters in _identifiers_, +> The syntax permits the use of right-to-left characters in _identifiers_, > _literals_, and other values. -> This can result in confusion when viewing the _message_. -> -> Additional restrictions or requirements, -> such as permitting the use of certain bidirectional control characters in the syntax, -> might be added during the Tech Preview to better manage bidirectional text. -> Feedback on the creation and management of _messages_ -> containing bidirectional tokens is strongly desired. 
+> This can result in confusion when viewing the message +> or users might incorrectly insert bidi controls or marks that negatively affect the output +> of the message. +> +> To assist with this, the syntax permits the use of various controls and +> strongly-directional markers in both optional and required _whitespace_ +> in a _message_, as well was encouraging the use of isolating controls +> with _expressions_ and _quoted patterns_. +> See: [whitespace](#whitespace) (below) for more information. A _message_ can be a _simple message_ or it can be a _complex message_. @@ -154,13 +174,13 @@ message = simple-message / complex-message A **_simple message_** contains a single _pattern_, with restrictions on its first non-whitespace character. -An empty string is a valid _simple message_. +An empty string is a _valid_ _simple message_. Whitespace at the start or end of a _simple message_ is significant, and a part of the _text_ of the _message_. ```abnf -simple-message = [s] [simple-start pattern] +simple-message = o [simple-start pattern] simple-start = simple-start-char / escaped-char / placeholder ``` @@ -176,7 +196,7 @@ Whitespace at the start or end of a _complex message_ is not significant, and does not affect the processing of the _message_. ```abnf -complex-message = [s] *(declaration [s]) complex-body [s] +complex-message = o *(declaration o) complex-body o ``` ### Declarations @@ -187,17 +207,14 @@ _Declarations_ are optional: many messages will not contain any _declarations_. An **_input-declaration_** binds a _variable_ to an external input value. The _variable-expression_ of an _input-declaration_ -MAY include an _annotation_ that is applied to the external value. - -A **_local-declaration_** binds a _variable_ to the resolved value of an _expression_. +MAY include a _function_ that is applied to the external value. -For compatibility with later MessageFormat 2 specification versions, -_declarations_ MAY also include _reserved statements_. 
+A **_local-declaration_** binds a _variable_ to the _resolved value_ of an _expression_. ```abnf -declaration = input-declaration / local-declaration / reserved-statement -input-declaration = input [s] variable-expression -local-declaration = local s variable [s] "=" [s] expression +declaration = input-declaration / local-declaration +input-declaration = input o variable-expression +local-declaration = local s variable o "=" o expression ``` _Variables_, once declared, MUST NOT be redeclared. @@ -206,7 +223,7 @@ _Duplicate Declaration_ error during processing: - A _declaration_ MUST NOT bind a _variable_ that appears as a _variable_ anywhere within a previous _declaration_. - An _input-declaration_ MUST NOT bind a _variable_ - that appears anywhere within the _annotation_ of its _variable-expression_. + that appears anywhere within the _function_ of its _variable-expression_. - A _local-declaration_ MUST NOT bind a _variable_ that appears in its _expression_. A _local-declaration_ MAY overwrite an external input value as long as the @@ -214,46 +231,18 @@ external input value does not appear in a previous _declaration_. > [!NOTE] > These restrictions only apply to _declarations_. -> A _placeholder_ or _selector_ can apply a different annotation to a _variable_ +> A _placeholder_ can apply a different _function_ to a _variable_ > than one applied to the same _variable_ named in a _declaration_. 
> For example, this message is _valid_: > ``` > .input {$var :number maximumFractionDigits=0} -> .match {$var :number maximumFractionDigits=2} -> 0 {{The selector can apply a different annotation to {$var} for the purposes of selection}} -> * {{A placeholder in a pattern can apply a different annotation to {$var :number maximumFractionDigits=3}}} +> .local $var2 = {$var :number maximumFractionDigits=2} +> .match $var2 +> 0 {{The selector can apply a different function to {$var} for the purposes of selection}} +> * {{A placeholder in a pattern can apply a different function to {$var :number maximumFractionDigits=3}}} > ``` > (See the [Errors](./errors.md) section for examples of invalid messages) -#### Reserved Statements - -A **_reserved statement_** reserves additional `.keywords` -for use by future versions of this specification. -Any such future keyword must start with `.`, -followed by two or more lower-case ASCII characters. - -The rest of the statement supports -a similarly wide range of content as _reserved annotations_, -but it MUST end with one or more _expressions_. - -```abnf -reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression) -reserved-keyword = "." name -``` - -> [!NOTE] -> The `reserved-keyword` ABNF rule is a simplification, -> as it MUST NOT be considered to match any of the existing keywords -> `.input`, `.local`, or `.match`. - -This allows flexibility in future standardization, -as future definitions MAY define additional semantics and constraints -on the contents of these _reserved statements_. - -Implementations MUST NOT assign meaning or semantics to a _reserved statement_: -these are reserved for future standardization. -Implementations MUST NOT remove or alter the contents of a _reserved statement_. - ### Complex Body The **_complex body_** of a _complex message_ is the part that will be formatted. 
@@ -285,7 +274,7 @@ A _quoted pattern_ starts with a sequence of two U+007B LEFT CURLY BRACKET `{{` and ends with a sequence of two U+007D RIGHT CURLY BRACKET `}}`. ```abnf -quoted-pattern = "{{" pattern "}}" +quoted-pattern = o "{{" pattern "}}" ``` A _quoted pattern_ MAY be empty. @@ -299,8 +288,8 @@ A _quoted pattern_ MAY be empty. ### Text **_text_** is the translateable content of a _pattern_. -Any Unicode code point is allowed, except for U+0000 NULL -and the surrogate code points U+D800 through U+DFFF inclusive. +Any Unicode code point is allowed, except for U+0000 NULL. + The characters U+005C REVERSE SOLIDUS `\`, U+007B LEFT CURLY BRACKET `{`, and U+007D RIGHT CURLY BRACKET `}` MUST be escaped as `\\`, `\{`, and `\}` respectively. @@ -316,9 +305,8 @@ be preserved during formatting. ```abnf simple-start-char = content-char / "@" / "|" -text-char = content-char / s / "." / "@" / "|" -quoted-char = content-char / s / "." / "@" / "{" / "}" -reserved-char = content-char / "." +text-char = content-char / ws / "." / "@" / "|" +quoted-char = content-char / ws / "." / "@" / "{" / "}" content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A) / %x0B-0C ; omit CR (%x0D) / %x0E-1F ; omit SP (%x20) @@ -327,10 +315,14 @@ content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A) / %x41-5B ; omit \ (%x5C) / %x5D-7A ; omit { | } (%x7B-7D) / %x7E-2FFF ; omit IDEOGRAPHIC SPACE (%x3000) - / %x3001-D7FF ; omit surrogates - / %xE000-10FFFF + / %x3001-10FFFF ; allowing surrogates is intentional ``` +> [!NOTE] +> Unpaired surrogate code points (`U+D800` through `U+DFFF` inclusive) +> are allowed for compatibility with UTF-16 based implementations +> that do not check for this encoding error. + When a _pattern_ is quoted by embedding the _pattern_ in curly brackets, the resulting _message_ can be embedded into various formats regardless of the container's whitespace trimming rules. @@ -368,27 +360,31 @@ and at least one _variant_. 
When the _matcher_ is processed, the result will be a single _pattern_ that serves as the template for the formatting process. -A _message_ can only be considered _valid_ if the following requirements are -satisfied: - -- The number of _keys_ on each _variant_ MUST be equal to the number of _selectors_. -- At least one _variant_ MUST exist whose _keys_ are all equal to the "catch-all" key `*`. -- Each _selector_ MUST have an _annotation_, - or contain a _variable_ that directly or indirectly references a _declaration_ with an _annotation_. -- Each _variant_ MUST use a list of _keys_ that is unique from that +A _message_ can only be considered _valid_ if the following requirements are satisfied; +otherwise, a corresponding _Data Model Error_ will be produced during processing: + +- _Variant Key Mismatch_: + The number of _keys_ on each _variant_ MUST be equal to the number of _selectors_. +- _Missing Fallback Variant_: + At least one _variant_ MUST exist whose _keys_ are all equal to the "catch-all" key `*`. +- _Missing Selector Annotation_: + Each _selector_ MUST be a _variable_ that + directly or indirectly references a _declaration_ with a _function_. +- _Duplicate Variant_: + Each _variant_ MUST use a list of _keys_ that is unique from that of all other _variants_ in the _message_. _Literal_ _keys_ are compared by their contents, not their syntactical appearance. 
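Reviewer sketch (not part of the patch): three of the named _Data Model Errors_ listed above can be checked from the key lists alone. The function `matcher_errors` and the `CATCH_ALL` sentinel are hypothetical. The sketch compares literal keys after NFC normalization, as the key-equality text added elsewhere in this patch requires, and it leaves out _Missing Selector Annotation_, which needs the declarations.

```python
import unicodedata

# Sentinel for the catch-all key `*`, distinct from the quoted literal |*|.
CATCH_ALL = object()


def matcher_errors(selector_count, variants):
    """Hypothetical validity check; `variants` is a list of key lists."""
    errors = []
    seen = set()
    has_fallback = False
    for keys in variants:
        # Each variant needs exactly one key per selector.
        if len(keys) != selector_count:
            errors.append("Variant Key Mismatch")
        # A fallback variant has only catch-all keys.
        if keys and all(key is CATCH_ALL for key in keys):
            has_fallback = True
        # Literal keys are compared by normalized content.
        normalized = tuple(
            key if key is CATCH_ALL else unicodedata.normalize("NFC", key)
            for key in keys
        )
        if normalized in seen:
            errors.append("Duplicate Variant")
        seen.add(normalized)
    if not has_fallback:
        errors.append("Missing Fallback Variant")
    return errors
```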
```abnf -matcher = match-statement 1*([s] variant) -match-statement = match 1*([s] selector) +matcher = match-statement s variant *(o variant) +match-statement = match 1*(s selector) ``` > A _message_ with a _matcher_: > > ``` > .input {$count :number} -> .match {$count} +> .match $count > one {{You have {$count} notification.}} > * {{You have {$count} notifications.}} > ``` @@ -396,18 +392,18 @@ match-statement = match 1*([s] selector) > A _message_ containing a _matcher_ formatted on a single line: > > ``` -> .match {:platform} windows {{Settings}} * {{Preferences}} +> .local $os = {:platform} .match $os windows {{Settings}} * {{Preferences}} > ``` ### Selector -A **_selector_** is an _expression_ that ranks or excludes the +A **_selector_** is a _variable_ whose _resolved value_ ranks or excludes the _variants_ based on the value of the corresponding _key_ in each _variant_. The combination of _selectors_ in a _matcher_ thus determines which _pattern_ will be used during formatting. ```abnf -selector = expression +selector = variable ``` There MUST be at least one _selector_ in a _matcher_. @@ -418,7 +414,8 @@ There MAY be any number of additional _selectors_. > based on grammatical case: > > ``` -> .match {$userName :hasCase} +> .local $hasCase = {$userName :hasCase} +> .match $hasCase > vocative {{Hello, {$userName :person case=vocative}!}} > accusative {{Please welcome {$userName :person case=accusative}!}} > * {{Hello!}} @@ -429,7 +426,7 @@ There MAY be any number of additional _selectors_. > ``` > .input {$numLikes :integer} > .input {$numShares :integer} -> .match {$numLikes} {$numShares} +> .match $numLikes $numShares > 0 0 {{Your item has no likes and has not been shared.}} > 0 one {{Your item has no likes and has been shared {$numShares} time.}} > 0 * {{Your item has no likes and has been shared {$numShares} times.}} @@ -445,14 +442,14 @@ There MAY be any number of additional _selectors_. 
A **_variant_** is a _quoted pattern_ associated with a list of _keys_ in a _matcher_. Each _variant_ MUST begin with a sequence of _keys_, -and terminate with a valid _quoted pattern_. +and terminate with a _valid_ _quoted pattern_. The number of _keys_ in each _variant_ MUST match the number of _selectors_ in the _matcher_. Each _key_ is separated from each other by whitespace. Whitespace is permitted but not required between the last _key_ and the _quoted pattern_. ```abnf -variant = key *(s key) [s] quoted-pattern +variant = key *(s key) quoted-pattern key = literal / "*" ``` @@ -465,6 +462,12 @@ A _key_ can be either a _literal_ value or the "catch-all" key `*`. The **_catch-all key_** is a special key, represented by `*`, that matches all values for a given _selector_. +The value of each _key_ MUST be treated as if it were in +[Unicode Normalization Form C](https://unicode.org/reports/tr15/) ("NFC"). +Two _keys_ are considered equal if they are canonically equivalent strings, +that is, if they consist of the same sequence of Unicode code points after +Unicode Normalization Form C has been applied to both. + ## Expressions An **_expression_** is a part of a _message_ that will be determined @@ -477,28 +480,27 @@ An _expression_ cannot contain another _expression_. An _expression_ MAY contain one more _attributes_. A **_literal-expression_** contains a _literal_, -optionally followed by an _annotation_. +optionally followed by a _function_. A **_variable-expression_** contains a _variable_, -optionally followed by an _annotation_. +optionally followed by a _function_. -An **_annotation-expression_** contains an _annotation_ without an _operand_. +A **_function-expression_** contains a _function_ without an _operand_. 
```abnf -expression = literal-expression - / variable-expression - / annotation-expression -literal-expression = "{" [s] literal [s annotation] *(s attribute) [s] "}" -variable-expression = "{" [s] variable [s annotation] *(s attribute) [s] "}" -annotation-expression = "{" [s] annotation *(s attribute) [s] "}" +expression = literal-expression + / variable-expression + / function-expression +literal-expression = "{" o literal [s function] *(s attribute) o "}" +variable-expression = "{" o variable [s function] *(s attribute) o "}" +function-expression = "{" o function *(s attribute) o "}" ``` There are several types of _expression_ that can appear in a _message_. All _expressions_ share a common syntax. The types of _expression_ are: 1. The value of a _local-declaration_ -2. A _selector_ -3. A kind of _placeholder_ in a _pattern_ +2. A kind of _placeholder_ in a _pattern_ Additionally, an _input-declaration_ can contain a _variable-expression_. @@ -511,12 +513,6 @@ Additionally, an _input-declaration_ can contain a _variable-expression_. > .local $y = {|This is an expression|} > ``` > -> Selectors: -> -> ``` -> .match {$selector :functionRequired} -> ``` -> > Placeholders: > > ``` @@ -526,36 +522,26 @@ Additionally, an _input-declaration_ can contain a _variable-expression_. > This placeholder contains a function expression with a variable-valued option: {:function option=$variable} > ``` -### Annotation - -An **_annotation_** is part of an _expression_ containing either -a _function_ together with its associated _options_, or -a _private-use annotation_ or a _reserved annotation_. - -```abnf -annotation = function - / private-use-annotation - / reserved-annotation -``` +### Operand An **_operand_** is the _literal_ of a _literal-expression_ or the _variable_ of a _variable-expression_. -An _annotation_ can appear in an _expression_ by itself or following a single _operand_. -When following an _operand_, the _operand_ serves as input to the _annotation_. 
- #### Function -A **_function_** is named functionality in an _annotation_. +A **_function_** is named functionality in an _expression_. _Functions_ are used to evaluate, format, select, or otherwise process data values during formatting. +A _function_ can appear in an _expression_ by itself or following a single _operand_. +When following an _operand_, the _operand_ serves as input to the _function_. + Each _function_ is defined by the runtime's _function registry_. A _function_'s entry in the _function registry_ will define whether the _function_ is a _selector_ or formatter (or both), whether an _operand_ is required, what form the values of an _operand_ can take, -what _options_ and _option_ values are valid, +what _options_ and _option_ values are acceptable, and what outputs might result. See [function registry](./registry.md) for more information. @@ -583,16 +569,17 @@ The _identifier_ is separated from the _value_ by an U+003D EQUALS SIGN `=` alon optional whitespace. The value of an _option_ can be either a _literal_ or a _variable_. -Multiple _options_ are permitted in an _annotation_. +Multiple _options_ are permitted in a _function_. _Options_ are separated from the preceding _function_ _identifier_ and from each other by whitespace. -Each _option_'s _identifier_ MUST be unique within the _annotation_: -an _annotation_ with duplicate _option_ _identifiers_ is not valid. +Each _option_'s _identifier_ MUST be unique within the _function_: +a _function_ with duplicate _option_ _identifiers_ is not _valid_ +and will produce a _Duplicate Option Name_ error during processing. The order of _options_ is not significant. ```abnf -option = identifier [s] "=" [s] (literal / variable) +option = identifier o "=" o (literal / variable) ``` > Examples of _functions_ with _options_ @@ -611,82 +598,6 @@ option = identifier [s] "=" [s] (literal / variable) > Today is {$date :datetime weekday=$dateStyle}! 
> ``` -#### Private-Use Annotations - -A **_private-use annotation_** is an _annotation_ whose syntax is reserved -for use by a specific implementation or by private agreement between multiple implementations. -Implementations MAY define their own meaning and semantics for _private-use annotations_. - -A _private-use annotation_ starts with either U+0026 AMPERSAND `&` or U+005E CIRCUMFLEX ACCENT `^`. - -Characters, including whitespace, are assigned meaning by the implementation. -The definition of escapes in the `reserved-body` production, used for the body of -a _private-use annotation_ is an affordance to implementations that -wish to use a syntax exactly like other functions. Specifically: - -- The characters `\`, `{`, and `}` MUST be escaped as `\\`, `\{`, and `\}` respectively - when they appear in the body of a _private-use annotation_. -- The character `|` is special: it SHOULD be escaped as `\|` in a _private-use annotation_, - but can appear unescaped as long as it is paired with another `|`. - This is an affordance to allow _literals_ to appear in the private use syntax. - -A _private-use annotation_ MAY be empty after its introducing sigil. - -```abnf -private-use-annotation = private-start [[s] reserved-body] -private-start = "^" / "&" -``` - -> [!NOTE] -> Users are cautioned that _private-use annotations_ cannot be reliably exchanged -> and can result in errors during formatting. -> It is generally a better idea to use the function registry -> to define additional formatting or annotation options. 
- -> Here are some examples of what _private-use_ sequences might look like: -> -> ``` -> Here's private use with an operand: {$foo &bar} -> Here's a placeholder that is entirely private-use: {&anything here} -> Here's a private-use function that uses normal function syntax: {$operand ^foo option=|literal|} -> The character \| has to be paired or escaped: {&private || |something between| or isolated: \| } -> Stop {& "translate 'stop' as a verb" might be a translator instruction or comment } -> Protect stuff in {^ph}{^/ph}private use{^ph}{^/ph} -> ``` - -#### Reserved Annotations - -A **_reserved annotation_** is an _annotation_ whose syntax is reserved -for future standardization. - -A _reserved annotation_ starts with a reserved character. -The remaining part of a _reserved annotation_, called a _reserved body_, -MAY be empty or contain arbitrary text that starts and ends with -a non-whitespace character. - -This allows maximum flexibility in future standardization, -as future definitions MAY define additional semantics and constraints -on the contents of these _annotations_. - -Implementations MUST NOT assign meaning or semantics to -an _annotation_ starting with `reserved-annotation-start`: -these are reserved for future standardization. -Whitespace before or after a _reserved body_ is not part of the _reserved body_. -Implementations MUST NOT remove or alter the contents of a _reserved body_, -including any interior whitespace, -but MAY remove or alter whitespace before or after the _reserved body_. - -While a reserved sequence is technically "well-formed", -unrecognized _reserved-annotations_ or _private-use-annotations_ have no meaning. - -```abnf -reserved-annotation = reserved-annotation-start [[s] reserved-body] -reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" 
/ "~" - -reserved-body = reserved-body-part *([s] reserved-body-part) -reserved-body-part = reserved-char / escaped-char / quoted-literal -``` - ## Markup **_Markup_** _placeholders_ are _pattern_ parts @@ -713,8 +624,8 @@ It MAY include _options_. is a _pattern_ part ending a span. ```abnf -markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone - / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close +markup = "{" o "#" identifier *(s option) *(s attribute) o ["/"] "}" ; open and standalone + / "{" o "/" identifier *(s option) *(s attribute) o "}" ; close ``` > A _message_ with one `button` markup span and a standalone `img` markup element: @@ -723,7 +634,8 @@ markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open > {#button}Submit{/button} or {#img alt=|Cancel| /}. > ``` -> A _message_ with attributes in the closing tag: +> A _message_ containing _markup_ that uses _options_ to pair +> two closing markup _placeholders_ to the one open markup _placeholder_: > > ``` > {#ansi attr=|bold,italic|}Bold and italic{/ansi attr=|bold|} italic only {/ansi attr=|italic|} no formatting.} @@ -737,66 +649,25 @@ on the pairing, ordering, or contents of _markup_ during _formatting_. ## Attributes -**_Attributes_ are reserved for standardization by future versions of this specification._** -Examples in this section are meant to be illustrative and -might not match future requirements or usage. - -> [!NOTE] -> The Tech Preview does not provide a built-in mechanism for overriding -> values in the _formatting context_ (most notably the locale) -> Nor does it provide a mechanism for identifying specific expressions -> such as by assigning a name or id. -> The utility of these types of mechanisms has been debated. -> There are at least two proposed mechanisms for implementing support for -> these. 
-> Specifically, one mechanism would be to reserve specifically-named options, -> possibly using a Unicode namespace (i.e. `locale=xxx` or `u:locale=xxx`). -> Such options would be reserved for use in any and all functions or markup. -> The other mechanism would be to use the reserved "expression attribute" syntax -> for this purpose (i.e. `@locale=xxx` or `@id=foo`) -> Neither mechanism was included in this Tech Preview. -> Feedback on the preferred mechanism for managing these features -> is strongly desired. -> -> In the meantime, function authors and other implementers are cautioned to avoid creating -> function-specific or implementation-specific option values for this purpose. -> One workaround would be to use the implementation's namespace for these -> features to insure later interoperability when such a mechanism is finalized -> during the Tech Preview period. -> Specifically: -> - Avoid specifying an option for setting the locale of an expression as different from -> that of the overall _message_ locale, or use a namespace that later maps to the final -> mechanism. -> - Avoid specifying options for the purpose of linking placeholders -> (such as to pair opening markup to closing markup). -> If such an option is created, the implementer should use an -> implementation-specific namespace. -> Users and implementers are cautioned that such options might be -> replaced with a standard mechanism in a future version. -> - Avoid specifying generic options to communicate with translators and -> translation tooling (i.e. implementation-specific options that apply to all -> functions. -> The above are all desirable features. -> We welcome contributions to and proposals for such features during the -> Technical Preview. - An **_attribute_** is an _identifier_ with an optional value that appears in an _expression_ or in _markup_. +During formatting, _attributes_ have no effect, +and they can be treated as code comments. 
_Attributes_ are prefixed by a U+0040 COMMERCIAL AT `@` sign, followed by an _identifier_. -An _attribute_ MAY have a _value_ which is separated from the _identifier_ +An _attribute_ MAY have a _literal_ _value_ which is separated from the _identifier_ by an U+003D EQUALS SIGN `=` along with optional whitespace. -The _value_ of an _attribute_ can be either a _literal_ or a _variable_. Multiple _attributes_ are permitted in an _expression_ or _markup_. Each _attribute_ is separated by whitespace. -The order of _attributes_ is not significant. - +Each _attribute_'s _identifier_ SHOULD be unique within the _expression_ or _markup_: +all but the last _attribute_ with the same _identifier_ are ignored. +The order of _attributes_ is not otherwise significant. ```abnf -attribute = "@" identifier [[s] "=" [s] (literal / variable)] +attribute = "@" identifier [o "=" o literal] ``` > Examples of _expressions_ and _markup_ with _attributes_: @@ -838,15 +709,33 @@ A _literal_ can appear as a _key_ value, as the _operand_ of a _literal-expression_, or in the value of an _option_. -A _literal_ MAY include any Unicode code point -except for U+0000 NULL or the surrogate code points U+D800 through U+DFFF. +A _literal_ MAY include any Unicode code point except for U+0000 NULL. All code points are preserved. +> [!IMPORTANT] +> Most text, including that produced by common keyboards and input methods, +> is already encoded in the canonical form known as +> [Unicode Normalization Form C](https://unicode.org/reports/tr15) ("NFC"). +> A few languages, legacy character encoding conversions, or operating environments +> can result in _literal_ values that are not in this form. +> Some uses of _literals_ in MessageFormat, +> notably as the value of _keys_, +> apply NFC to the _literal_ value during processing or comparison. 
+> While there is no requirement that the _literal_ value actually be entered +> in a normalized form, +> users are cautioned to employ the same character sequences +> for equivalent values and, whenever possible, ensure _literals_ are in NFC. + A **_quoted literal_** begins and ends with U+007C VERTICAL BAR `|`. The characters `\` and `|` within a _quoted literal_ MUST be escaped as `\\` and `\|`. +> [!NOTE] +> Unpaired surrogate code points (`U+D800` through `U+DFFF` inclusive) +> are allowed in _quoted literals_ for compatibility with UTF-16 based +> implementations that do not check for this encoding error. + An **_unquoted literal_** is a _literal_ that does not require the `|` quotes around it to be distinct from the rest of the _message_ syntax. An _unquoted literal_ MAY be used when the content of the _literal_ @@ -867,26 +756,30 @@ number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / " ### Names and Identifiers -An **_identifier_** is a character sequence that -identifies a _function_, _markup_, or _option_. -Each _identifier_ consists of a _name_ optionally preceeded by -a _namespace_. -When present, the _namespace_ is separated from the _name_ by a -U+003A COLON `:`. -Built-in _functions_ and their _options_ do not have a _namespace_ identifier. - -The _namespace_ `u` (U+0075 LATIN SMALL LETTER U) -is reserved for future standardization. - -_Function_ _identifiers_ are prefixed with `:`. -_Markup_ _identifiers_ are prefixed with `#` or `/`. -_Option_ _identifiers_ have no prefix. - A **_name_** is a character sequence used in an _identifier_ or as the name for a _variable_ or the value of an _unquoted literal_. -_Variable_ names are prefixed with `$`. +A _name_ can be preceded or followed by bidirectional marks or isolating controls +to aid in presenting names that contain right-to-left or neutral characters. 
+These characters are **not** part of the value of the _name_ and MUST be treated as if they were not present +when matching _name_ or _identifier_ strings or _unquoted literal_ values. + +_Variable_ _names_ are prefixed with `$`. + +Two _names_ are considered equal if they are canonically equivalent strings, +that is, if they consist of the same sequence of Unicode code points after +[Unicode Normalization Form C](https://unicode.org/reports/tr15/) ("NFC") +has been applied to both. + +> [!NOTE] +> Implementations are not required to normalize all _names_. +> Comparisons of _name_ values only need be done "as-if" normalization +> has occurred. +> Since most text in the wild is already in NFC +> and since checking for NFC is fast and efficient, +> implementations can often substitute checking for actually applying normalization +> to _name_ values. Valid content for _names_ is based on Namespaces in XML 1.0's [NCName](https://www.w3.org/TR/xml-names/#NT-NCName). @@ -899,6 +792,21 @@ Otherwise, the set of characters allowed in a _name_ is large. > Such variables cannot be referenced in a _message_, > but are not otherwise errors. +An **_identifier_** is a character sequence that +identifies a _function_, _markup_, or _option_. +Each _identifier_ consists of a _name_ optionally preceded by +a _namespace_. +When present, the _namespace_ is separated from the _name_ by a +U+003A COLON `:`. +Built-in _functions_ and their _options_ do not have a _namespace_ identifier. + +The _namespace_ `u` (U+0075 LATIN SMALL LETTER U) +is reserved for future standardization. + +_Function_ _identifiers_ are prefixed with `:`. +_Markup_ _identifiers_ are prefixed with `#` or `/`. +_Option_ _identifiers_ have no prefix. + Examples: > A variable: >``` @@ -922,14 +830,14 @@ in this release. 
```abnf variable = "$" name -option = identifier [s] "=" [s] (literal / variable) +option = identifier o "=" o (literal / variable) identifier = [namespace ":"] name namespace = name -name = name-start *name-char +name = [bidi] name-start *name-char [bidi] name-start = ALPHA / "_" / %xC0-D6 / %xD8-F6 / %xF8-2FF - / %x370-37D / %x37F-1FFF / %x200C-200D + / %x370-37D / %x37F-61B / %x61D-1FFF / %x200C-200D / %x2070-218F / %x2C00-2FEF / %x3001-D7FF / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF name-char = name-start / DIGIT / "-" / "." @@ -942,8 +850,7 @@ An **_escape sequence_** is a two-character sequence starting with U+005C REVERSE SOLIDUS `\`. An _escape sequence_ allows the appearance of lexically meaningful characters -in the body of _text_, _quoted literal_, or _reserved_ -(which includes, in this case, _private-use_) sequences. +in the body of _text_ or _quoted literal_ sequences. Each _escape sequence_ represents the literal character immediately following the initial `\`. ```abnf @@ -963,24 +870,112 @@ and inside _patterns_ only escape `{` and `}`. ### Whitespace -**_Whitespace_** is defined as one or more of -U+0009 CHARACTER TABULATION (tab), -U+000A LINE FEED (new line), -U+000D CARRIAGE RETURN, -U+3000 IDEOGRAPHIC SPACE, -or U+0020 SPACE. +The syntax limits whitespace characters outside of a _pattern_ to the following: +`U+0009 CHARACTER TABULATION` (tab), +`U+000A LINE FEED` (new line), +`U+000D CARRIAGE RETURN`, +`U+3000 IDEOGRAPHIC SPACE`, +or `U+0020 SPACE`. Inside _patterns_ and _quoted literals_, whitespace is part of the content and is recorded and stored verbatim. Whitespace is not significant outside translatable text, except where required by the syntax. +There are two whitespace productions in the syntax. +**_Optional whitespace_** is whitespace that is not required by the syntax, +but which users might want to include to increase the readability of a _message_. +**_Required whitespace_** is whitespace that is required by the syntax. 
+ +Both types of whitespace optionally permit the use of the bidirectional isolate controls +and certain strongly directional marks. +These can assist users in presenting _messages_ that contain right-to-left +text, _literals_, or _names_ (including those for _functions_, _options_, +_option values_, and _keys_) + +_Messages_ that contain right-to-left (aka RTL) characters SHOULD use one of the +following mechanisms to make messages display intelligibly in plain-text editors: + +1. Use paired isolating bidi controls `U+2066 LEFT-TO-RIGHT ISOLATE` ("LRI") + and `U+2069 POP DIRECTIONAL ISOLATE` ("PDI") as permitted by the ABNF around + parts of any _message_ containing RTL characters: + - _inside_ of _placeholder_ markers `{` and `}` + - _outside_ _quoted-pattern_ markers `{{` and `}}` + - _outside_ of _variable_, _function_, _markup_, or _attribute_, + including the identifying sigil (e.g. `$var` or `:ns:name`) +2. Use the 'local-effect' bidi marks + `U+061C ARABIC LETTER MARK`, `U+200E LEFT-TO-RIGHT MARK` or + `U+200F RIGHT-TO-LEFT MARK` as permitted by the ABNF before or after _identifiers_, + _names_, unquoted _literals_, or _option_ values, + especially when the values contain a mix of neutral, weakly directional, and + strongly directional characters. 
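The first mechanism above (paired isolates just inside the `{` and `}` of a _placeholder_) might be sketched as follows for a tool that serializes messages. This is a minimal illustration only: the helper names `has_rtl` and `serialize_placeholder` are hypothetical and not part of the specification, and the RTL test (Unicode bidi classes `R` and `AL`) is an assumed heuristic.

```python
import unicodedata

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def has_rtl(text: str) -> bool:
    """Return True if any character has a strong right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def serialize_placeholder(body: str) -> str:
    """Wrap a placeholder body in braces (hypothetical helper).

    When the body contains RTL characters, LRI/PDI are placed *inside*
    the braces, where the `o` production permits bidi controls.
    """
    if has_rtl(body):
        return "{" + LRI + body + PDI + "}"
    return "{" + body + "}"
```

Placing the isolates inside the braces keeps them out of the surrounding _pattern_ text, so they do not become part of the formatted output.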
+ +> [!IMPORTANT] +> Always take care **not** to add bidirectional controls or marks +> where they would be semantically significant +> or where they would unintentionally become part of the _message_'s output: +> - do not put them inside of a _literal_ except when they are part of the value, +> (instead put them outside of _literal_ quotes, such as `|...|`) +> - do not put them inside quoted _patterns_ except when they are part of the text, +> (instead put them outside of quoted _patterns_, such as `{{...}}`) +> - do not put them outside _placeholders_, +> (instead put them inside the _placeholder_, such as `{$foo :number}`) +> +> Controls placed inside _literal_ quotes or quoted _patterns_ are part of the _literal_ +> or _pattern_. +> Controls in a _pattern_ will appear in the output of the message. +> Controls inside _literal_ quotes are part of the _literal_ and +> will be considered in operations such as matching a _key_ to a _selector_. + +> [!NOTE] +> Users cannot be expected to create or manage bidirectional controls or +> marks in _messages_, since the characters are invisible and can be difficult +> to manage. +> Tools (such as resource editors or translation editors) +> and other implementations of MessageFormat 2 serialization are strongly +> encouraged to provide paired isolates around any right-to-left +> syntax as described above so that _messages_ display appropriately as plain text. + +These definitions of _whitespace_ implement +[UAX#31 Requirement R3a-2](https://www.unicode.org/reports/tr31/#R3a-2). +It is a profile of R3a-1 in that specification because: +- The following pattern whitespace characters are not allowed: + `U+000B VERTICAL TABULATION`, + `U+000C FORM FEED`, + `U+0085 NEXT LINE`, + `U+2028 LINE SEPARATOR` and + `U+2029 PARAGRAPH SEPARATOR`. +- The character `U+3000 IDEOGRAPHIC SPACE` + _is_ interpreted as whitespace. 
+ - The following directional marks and isolates + are treated as ignorable format controls: + `U+061C ARABIC LETTER MARK`, + `U+200E LEFT-TO-RIGHT MARK`, + `U+200F RIGHT-TO-LEFT MARK`, + `U+2066 LEFT-TO-RIGHT ISOLATE`, + `U+2067 RIGHT-TO-LEFT ISOLATE`, + `U+2068 FIRST STRONG ISOLATE`, + and `U+2069 POP DIRECTIONAL ISOLATE`. + (The character `U+061C` is an addition according to R3a.) + + > [!NOTE] > The character U+3000 IDEOGRAPHIC SPACE is included in whitespace for > compatibility with certain East Asian keyboards and input methods, > in which users might accidentally create these characters in a _message_. ```abnf -s = 1*( SP / HTAB / CR / LF / %x3000 ) +; Required whitespace +s = *bidi ws o + +; Optional whitespace +o = *(s / bidi) + +; Bidirectional marks and isolates +; ALM / LRM / RLM / LRI, RLI, FSI & PDI +bidi = %x061C / %x200E / %x200F / %x2066-2069 + +; Whitespace characters +ws = SP / HTAB / CR / LF / %x3000 ``` ## Complete ABNF diff --git a/spec/u-namespace.md b/spec/u-namespace.md new file mode 100644 index 000000000..dabbcc70f --- /dev/null +++ b/spec/u-namespace.md @@ -0,0 +1,87 @@ +# MessageFormat 2.0 Unicode Namespace + +The `u:` _namespace_ is reserved for the definition of _options_ +which affect the _function context_ of the specific _expressions_ +in which they appear, +or for the definition of _options_ that are universally applicable +rather than function-specific. +It might also be used to define _functions_ in a future release. + +The CLDR Technical Committee of the Unicode Consortium +manages the specification for this namespace, hence the name `u:`. + +## Options + +This section describes common **_`u:` options_** which each implementation SHOULD support +for all _functions_ and _markup_. + +### `u:id` + +A string value that is included as an `id` or other suitable value +in the formatted parts for the _placeholder_, +or any other structured formatted results. + +Ignored when formatting a message to a string. 
+ +The value of the `u:id` _option_ MUST be a _literal_ or a +_variable_ whose _resolved value_ is either a string +or can be resolved to a string without error. +For other values, a _Bad Option_ error is emitted +and the `u:id` option is ignored. + +### `u:locale` + +Replaces the _locale_ defined in the _function context_ for this _expression_. + +A comma-delimited list consisting of +well-formed [BCP 47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) +language tags, +or an implementation-defined list of such tags. + +If this option is set on _markup_, a _Bad Option_ error is emitted +and the value of the `u:locale` option is ignored. + +During processing, the `u:locale` option +MUST be removed from the resolved mapping of _options_ +before calling the _function handler_. + +Values matching the following ABNF are always accepted: +```abnf +u-locale-option = unicode_bcp47_locale_id *(o "," o unicode_bcp47_locale_id) +``` +using `unicode_bcp47_locale_id` as defined for +[Unicode Locale Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#unicode_bcp47_locale_id). + +Implementations MAY support additional language tags, +such as private-use or grandfathered tags, +or tags using `_` instead of `-` as a separator. +When the value of `u:locale` is set by a _variable_, +implementations MAY support non-string values otherwise representing locales. + +Implementations MAY emit a _Bad Option_ error +and MAY ignore the value of the `u:locale` _option_ as a whole +or any of the entries in the list of language tags. +This might be because the locale specified is not supported +or because the language tag is not well-formed, +not valid, or some other reason. + +### `u:dir` + +Replaces the base directionality defined in +the _function context_ for this _expression_. + +If this option is set on _markup_, a _Bad Option_ error is emitted +and the value of the `u:dir` option is ignored. 
+ +During processing, the `u:dir` option +MUST be removed from the resolved mapping of _options_ +before calling the _function handler_. + +The value of the `u:dir` _option_ MUST be one of the following _literal_ values +or a _variable_ whose _resolved value_ is one of these _literals_: +- `ltr`: left-to-right directionality +- `rtl`: right-to-left directionality +- `auto`: directionality determined from _expression_ contents + +For other values, a _Bad Option_ error is emitted +and the value of the `u:dir` option is ignored. diff --git a/test/README.md b/test/README.md index 95a8ef7f0..d5cbee831 100644 --- a/test/README.md +++ b/test/README.md @@ -10,6 +10,8 @@ These test files are intended to be useful for testing multiple different messag - `data-model-errors.json` - Strings that should produce a Data Model Error when processed. Error names are defined in ["MessageFormat 2.0 Errors"](../spec/errors.md) in the spec. +- `u-options.json` — Test cases for the `u:` options, using built-in functions. + - `functions/` — Test cases that correspond to built-in functions. The behaviour of the built-in formatters is implementation-specific so the `exp` field is often omitted and assertions are made on error cases. @@ -21,6 +23,7 @@ Some examples of test harnesses using these tests, from the source repository: - [Formatting tests](https://github.com/messageformat/messageformat/blob/11c95dab2b25db8454e49ff4daadb817e1d5b770/packages/mf2-messageformat/src/messageformat.test.ts) A [JSON schema](./schemas/) is included for the test files in this repository. 
+ ## Error Codes The following table relates the error names used in the [JSON schema](./schemas/) @@ -34,13 +37,12 @@ to the error names used in ["MessageFormat 2.0 Errors"](../spec/errors.md) in th | Bad Variant Key | bad-variant-key | | Duplicate Declaration | duplicate-declaration | | Duplicate Option Name | duplicate-option-name | +| Duplicate Variant | duplicate-variant | | Missing Fallback Variant | missing-fallback-variant | | Missing Selector Annotation | missing-selector-annotation | | Syntax Error | syntax-error | | Unknown Function | unknown-function | | Unresolved Variable | unresolved-variable | -| Unsupported Expression | unsupported-expression | -| Unsupported Statement | unsupported-statement | | Variant Key Mismatch | variant-key-mismatch | The "Message Function Error" error name used in the spec @@ -65,29 +67,40 @@ The function `:test:function` requires a [Number Operand](/spec/registry.md#numb #### Options -The only _option_ `:test:function` recognizes is `decimalPlaces`, -a _digit size option_ for which only `0` and `1` are valid values. +The following _options_ are available on `:test:function`: +- `decimalPlaces`, a _digit size option_ for which only `0` and `1` are valid values. + - `0` + - `1` +- `fails` + - `never` (default) + - `select` + - `format` + - `always` All other _options_ and their values are ignored. #### Behavior When resolving a `:test:function` expression, -its `Input` and `DecimalPlaces` values are determined as follows: +its `Input`, `DecimalPlaces`, `FailsFormat`, and `FailsSelect` values are determined as follows: 1. Let `DecimalPlaces` be 0. -1. Let `arg` be the resolved value of the _expression_ _operand_. -1. If `arg` is the resolved value of an _expression_ +1. Let `FailsFormat` be `false`. +1. Let `FailsSelect` be `false`. +1. Let `arg` be the _resolved value_ of the _expression_ _operand_. +1. 
If `arg` is the _resolved value_ of an _expression_ with a `:test:function`, `:test:select`, or `:test:format` _annotation_ for which resolution has succeeded, then 1. Let `Input` be the `Input` value of `arg`. 1. Set `DecimalPlaces` to be `DecimalPlaces` value of `arg`. + 1. Set `FailsFormat` to be `FailsFormat` value of `arg`. + 1. Set `FailsSelect` to be `FailsSelect` value of `arg`. 1. Else if `arg` is a numerical value or a string matching the `number-literal` production, then 1. Let `Input` be the numerical value of `arg`. 1. Else, 1. Emit "bad-input" _Resolution Error_. - 1. Use a _fallback value_ as the resolved value of the _expression_. + 1. Use a _fallback value_ as the _resolved value_ of the _expression_. Further steps of this algorithm are not followed. 1. If the `decimalPlaces` _option_ is set, then 1. If its value resolves to a numerical integer value 0 or 1 @@ -95,13 +108,25 @@ its `Input` and `DecimalPlaces` values are determined as follows: 1. Set `DecimalPlaces` to be the numerical value of the _option_. 1. Else if its value is not an unresolved value set by _option resolution_, 1. Emit "bad-option" _Resolution Error_. - 1. Use a _fallback value_ as the resolved value of the _expression_. + 1. Use a _fallback value_ as the _resolved value_ of the _expression_. +1. If the `fails` _option_ is set, then + 1. If its value resolves to the string `'always'`, then + 1. Set `FailsFormat` to be `true`. + 1. Set `FailsSelect` to be `true`. + 1. Else if its value resolves to the string `'format'`, then + 1. Set `FailsFormat` to be `true`. + 1. Else if its value resolves to the string `'select'`, then + 1. Set `FailsSelect` to be `true`. + 1. Else if its value does not resolve to the string `'never'`, then + 1. Emit "bad-option" _Resolution Error_. 
When `:test:function` is used as a _selector_, the behaviour of calling it as the `rv` value of MatchSelectorKeys(`rv`, `keys`) (see [Resolve Preferences](/spec/formatting.md#resolve-preferences) for more information) -depends on its `Input` and `DecimalPlaces` values. +depends on its `Input`, `DecimalPlaces` and `FailsSelect` values. +- If `FailsSelect` is `true`, + calling the method will fail and not return any value. - If the `Input` is 1 and `DecimalPlaces` is 1, the method will return some slice of the list « `'1.0'`, `'1'` », depending on whether those values are included in `keys`. @@ -111,7 +136,7 @@ depends on its `Input` and `DecimalPlaces` values. When an _expression_ with a `:test:function` _annotation_ is assigned to a _variable_ by a _declaration_ and that _variable_ is used as an _option_ value, -its resolved value is the `Input` value. +its _resolved value_ is the `Input` value. When `:test:function` is used as a _formatter_, a _placeholder_ resolving to a value with a `:test:function` _expression_ @@ -128,6 +153,8 @@ If the formatting target is a sequence of parts, each of the above parts will be emitted separately rather than being concatenated into a single string. +If `FailsFormat` is `true`, +attempting to format the _placeholder_ to any formatting target will fail. 
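The resolution steps for `:test:function` can be approximated in a few lines. The following is a rough sketch under stated assumptions, not the reference behaviour: the names `TestValue` and `resolve_test_function` are hypothetical, errors are "emitted" by appending to a list, a _fallback value_ is modelled as `None`, the string forms `"0"` and `"1"` are assumed to be acceptable for `decimalPlaces`, and the "unresolved option value" case is not modelled.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional

# The spec's number-literal ABNF production, expressed as a regular expression.
NUMBER_LITERAL = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][-+]?[0-9]+)?")


@dataclass
class TestValue:
    """Hypothetical container for the four values tracked by the algorithm."""
    input: float
    decimal_places: int = 0
    fails_format: bool = False
    fails_select: bool = False


def resolve_test_function(arg: Any, options: dict, errors: list) -> Optional[TestValue]:
    if isinstance(arg, TestValue):
        # Operand already carries :test:* state: inherit all four values.
        value = TestValue(arg.input, arg.decimal_places,
                          arg.fails_format, arg.fails_select)
    elif isinstance(arg, (int, float)) and not isinstance(arg, bool):
        value = TestValue(float(arg))
    elif isinstance(arg, str) and NUMBER_LITERAL.fullmatch(arg):
        value = TestValue(float(arg))
    else:
        errors.append("bad-input")
        return None  # fallback value; further steps are not followed

    if "decimalPlaces" in options:
        dp = options["decimalPlaces"]
        if str(dp) in ("0", "1"):  # assumption: string forms are accepted
            value.decimal_places = int(str(dp))
        else:
            errors.append("bad-option")
            return None  # fallback value

    if "fails" in options:
        fails = options["fails"]
        if fails == "always":
            value.fails_format = value.fails_select = True
        elif fails == "format":
            value.fails_format = True
        elif fails == "select":
            value.fails_select = True
        elif fails != "never":
            errors.append("bad-option")  # error is emitted; resolution continues
    return value
```

Note that `fails` values accumulate across chained declarations in this sketch: a value that inherited `FailsSelect` from its operand keeps it even if the new expression sets `fails=format`.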
### `:test:select` diff --git a/test/schemas/v0/tests.schema.json b/test/schemas/v0/tests.schema.json index 7b2056292..a37dcfa8d 100644 --- a/test/schemas/v0/tests.schema.json +++ b/test/schemas/v0/tests.schema.json @@ -269,6 +269,9 @@ "name": { "type": "string" }, + "id": { + "type": "string" + }, "options": { "type": "object" } @@ -345,8 +348,6 @@ "duplicate-variant", "unresolved-variable", "unknown-function", - "unsupported-expression", - "unsupported-statement", "bad-selector", "bad-operand", "bad-option", diff --git a/test/tests/bidi.json b/test/tests/bidi.json new file mode 100644 index 000000000..607ba792a --- /dev/null +++ b/test/tests/bidi.json @@ -0,0 +1,145 @@ +{ + "scenario": "Bidi support", + "description": "Tests for correct parsing of messages with bidirectional marks and isolates", + "defaultTestProperties": { + "locale": "en-US" + }, + "tests": [ + { + "description": "simple-message = o [simple-start pattern]", + "src": " \u061C Hello world!", + "exp": " \u061C Hello world!" 
+ }, + { + "description": "complex-message = o *(declaration o) complex-body o", + "src": "\u200E .local $x = {1} {{ {$x}}}", + "exp": " 1" + }, + { + "description": "complex-message = o *(declaration o) complex-body o", + "src": ".local $x = {1} \u200F {{ {$x}}}", + "exp": " 1" + }, + { + "description": "complex-message = o *(declaration o) complex-body o", + "src": ".local $x = {1} {{ {$x}}} \u2066", + "exp": " 1" + }, + { + "description": "input-declaration = input o variable-expression", + "src": ".input \u2067 {$x :number} {{hello}}", + "params": [{"name": "x", "value": "1"}], + "exp": "hello" + }, + { + "description": "local s variable o \"=\" o expression", + "src": ".local $x \u2068 = \u2069 {1} {{hello}}", + "exp": "hello" + }, + { + "description": "local s variable o \"=\" o expression", + "src": ".local \u2067 $x = {1} {{hello}}", + "exp": "hello" + }, + { + "description": "local s variable o \"=\" o expression", + "src": ".local\u2067 $x = {1} {{hello}}", + "exp": "hello" + }, + { + "description": "o \"{{\" pattern \"}}\"", + "src": "\u2067 {{hello}}", + "exp": "hello" + }, + { + "description": "match-statement s variant *(o variant)", + "src": ".local $x = {1 :number}\n.match $x\n1 {{one}}\n\u061C * {{other}}", + "exp": "one" + }, + { + "description": "match-statement s variant *(o variant)", + "src": ".local $x = {1 :number}.match $x \u061c1 {{one}}* {{other}}", + "exp": "one" + }, + { + "description": "match-statement s variant *(o variant)", + "src": ".local $x = {1 :number}.match $x\u061c1 {{one}}* {{other}}", + "expErrors": [{"type": "syntax-error"}] + }, + { + "description": "variant = key *(s key) quoted-pattern", + "src": ".local $x = {1 :number} .local $y = {$x :number}.match $x $y\n1 \u200E 1 {{one}}* * {{other}}", + "exp": "one" + }, + { + "description": "variant = key *(s key) quoted-pattern", + "src": ".local $x = {1 :number} .local $y = {$x :number}.match $x $y\n1\u200E 1 {{one}}* * {{other}}", + "exp": "one" + }, + { + "description": 
"literal-expression = \"{\" o literal [s function] *(s attribute) o \"}\"", + "src": "{\u200E hello \u200F}", + "exp": "hello" + }, + { + "description": "variable-expression = \"{\" o variable [s function] *(s attribute) o \"}\"", + "src": ".local $x = {1} {{ {\u200E $x \u200F} }}", + "exp": " 1 " + }, + { + "description": "function-expression = \"{\" o function *(s attribute) o \"}\"", + "src": "{1 \u200E :number \u200F}", + "exp": "1" + }, + { + "description": "markup = \"{\" o \"#\" identifier *(s option) *(s attribute) o [\"/\"] \"}\"", + "src": "{\u200F #b \u200E }", + "exp": "" + }, + { + "description": "markup = \"{\" o \"/\" identifier *(s option) *(s attribute) o \"}\"", + "src": "{\u200F /b \u200E }", + "exp": "" + }, + { + "description": "option = identifier o \"=\" o (literal / variable)", + "src": "{1 :number minimumFractionDigits\u200F=\u200E1 }", + "exp": "1.0" + }, + { + "description": "attribute = \"@\" identifier [o \"=\" o (literal / variable)]", + "src": "{1 :number @locale\u200F=\u200Een }", + "exp": "1" + }, + { + "description": " name... 
excludes U+FFFD and U+061C -- this passes as name -> [bidi] name-start *name-char", + "src": ".local $\u061Cfoo = {1} {{ {$\u061Cfoo} }}", + "exp": " 1 " + }, + { + "description": " name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD and U+061C", + "src": ".local $foo\u061Cbar = {2} {{ }}", + "expErrors": [{"type": "syntax-error"}] + }, + { + "description": "name = [bidi] name-start *name-char [bidi]", + "src": ".local $\u200Efoo\u200F = {3} {{{$\u200Efoo\u200F}}}", + "exp": "3" + }, + { + "description": "name = [bidi] name-start *name-char [bidi]", + "src": ".local $foo = {4} {{{$\u200Efoo\u200F}}}", + "exp": "4" + }, + { + "description": "name = [bidi] name-start *name-char [bidi]", + "src": ".local $\u200Efoo\u200F = {5} {{{$foo}}}", + "exp": "5" + }, + { + "description": "name = [bidi] name-start *name-char [bidi]", + "src": ".local $foo\u200Ebar = {5} {{{$foo\u200Ebar}}}", + "expErrors": [{"type": "syntax-error"}] + } + ] +} diff --git a/test/tests/data-model-errors.json b/test/tests/data-model-errors.json index 86a674c43..f1f54cabe 100644 --- a/test/tests/data-model-errors.json +++ b/test/tests/data-model-errors.json @@ -6,7 +6,7 @@ }, "tests": [ { - "src": ".match {$foo :x} * * {{foo}}", + "src": ".input {$foo :x} .match $foo * * {{foo}}", "expErrors": [ { "type": "variant-key-mismatch" @@ -14,7 +14,7 @@ ] }, { - "src": ".match {$foo :x} {$bar :x} * {{foo}}", + "src": ".input {$foo :x} .input {$bar :x} .match $foo $bar * {{foo}}", "expErrors": [ { "type": "variant-key-mismatch" @@ -22,7 +22,7 @@ ] }, { - "src": ".match {:foo} 1 {{_}}", + "src": ".input {$foo :x} .match $foo 1 {{_}}", "expErrors": [ { "type": "missing-fallback-variant" @@ -30,7 +30,7 @@ ] }, { - "src": ".match {:foo} other {{_}}", + "src": ".input {$foo :x} .match $foo other {{_}}", "expErrors": [ { "type": "missing-fallback-variant" @@ -38,7 +38,7 @@ ] }, { - "src": ".match {:foo} {:bar} * 1 {{_}} 1 * {{_}}", + "src": ".input {$foo :x} .input {$bar :x} .match $foo
$bar * 1 {{_}} 1 * {{_}}", "expErrors": [ { "type": "missing-fallback-variant" @@ -46,7 +46,7 @@ ] }, { - "src": ".match {$foo} one {{one}} * {{other}}", + "src": ".input {$foo} .match $foo one {{one}} * {{other}}", "expErrors": [ { "type": "missing-selector-annotation" @@ -54,7 +54,7 @@ ] }, { - "src": ".input {$foo} .match {$foo} one {{one}} * {{other}}", + "src": ".local $foo = {$bar} .match $foo one {{one}} * {{other}}", "expErrors": [ { "type": "missing-selector-annotation" @@ -62,7 +62,7 @@ ] }, { - "src": ".local $foo = {$bar} .match {$foo} one {{one}} * {{other}}", + "src": ".input {$bar} .local $foo = {$bar} .match $foo one {{one}} * {{other}}", "expErrors": [ { "type": "missing-selector-annotation" @@ -166,7 +166,7 @@ ] }, { - "src": ".match {$var :string} * {{The first default}} * {{The second default}}", + "src": ".input {$var :string} .match $var * {{The first default}} * {{The second default}}", "expErrors": [ { "type": "duplicate-variant" @@ -174,12 +174,16 @@ ] }, { - "src": ".match {$x :string} {$y :string} * foo {{The first foo variant}} bar * {{The bar variant}} * |foo| {{The second foo variant}} * * {{The default variant}}", + "src": ".input {$x :string} .input {$y :string} .match $x $y * foo {{The first foo variant}} bar * {{The bar variant}} * |foo| {{The second foo variant}} * * {{The default variant}}", "expErrors": [ { "type": "duplicate-variant" } ] + }, + { + "src": ".local $star = {star :string} .match $star |*| {{Literal star}} * {{The default}}", + "exp": "The default" } ] } diff --git a/test/tests/functions/date.json b/test/tests/functions/date.json index 494ca8d23..c426173d6 100644 --- a/test/tests/functions/date.json +++ b/test/tests/functions/date.json @@ -35,10 +35,10 @@ "src": "{|2006-01-02| :date style=long}" }, { - "src": ".local $d = {|2006-01-02| :date style=long} {{{$d :date}}}" + "src": ".local $d = {|2006-01-02| :date style=long} {{{$d}}}" }, { - "src": ".local $t = {|2006-01-02T15:04:06| :time} {{{$t :date}}}" + "src": 
".local $d = {|2006-01-02| :datetime dateStyle=long timeStyle=long} {{{$d :date}}}" } ] } diff --git a/test/tests/functions/integer.json b/test/tests/functions/integer.json index c8e75077a..7ffdc08a5 100644 --- a/test/tests/functions/integer.json +++ b/test/tests/functions/integer.json @@ -19,7 +19,7 @@ "exp": "hello 4" }, { - "src": ".match {$foo :integer} one {{one}} * {{other}}", + "src": ".input {$foo :integer} .match $foo 1 {{one}} * {{other}}", "params": [ { "name": "foo", @@ -27,6 +27,10 @@ } ], "exp": "one" + }, + { + "src": ".local $x = {1.25 :integer} .local $y = {$x :number} {{{$y}}}", + "exp": "1" } ] } diff --git a/test/tests/functions/number.json b/test/tests/functions/number.json index f59e77343..2b00d83e4 100644 --- a/test/tests/functions/number.json +++ b/test/tests/functions/number.json @@ -209,173 +209,6 @@ } ] }, - { - "src": ".match {$foo :number} one {{one}} * {{other}}", - "params": [ - { - "name": "foo", - "value": 1 - } - ], - "exp": "one" - }, - { - "src": ".match {$foo :number} 1 {{=1}} one {{one}} * {{other}}", - "params": [ - { - "name": "foo", - "value": 1 - } - ], - "exp": "=1" - }, - { - "src": ".match {$foo :number} one {{one}} 1 {{=1}} * {{other}}", - "params": [ - { - "name": "foo", - "value": 1 - } - ], - "exp": "=1" - }, - { - "src": ".match {$foo :number} {$bar :number} one one {{one one}} one * {{one other}} * * {{other}}", - "params": [ - { - "name": "foo", - "value": 1 - }, - { - "name": "bar", - "value": 1 - } - ], - "exp": "one one" - }, - { - "src": ".match {$foo :number} {$bar :number} one one {{one one}} one * {{one other}} * * {{other}}", - "params": [ - { - "name": "foo", - "value": 1 - }, - { - "name": "bar", - "value": 2 - } - ], - "exp": "one other" - }, - { - "src": ".match {$foo :number} {$bar :number} one one {{one one}} one * {{one other}} * * {{other}}", - "params": [ - { - "name": "foo", - "value": 2 - }, - { - "name": "bar", - "value": 2 - } - ], - "exp": "other" - }, - { - "src": ".input {$foo :number} 
.match {$foo} one {{one}} * {{other}}", - "params": [ - { - "name": "foo", - "value": 1 - } - ], - "exp": "one" - }, - { - "src": ".local $foo = {$bar :number} .match {$foo} one {{one}} * {{other}}", - "params": [ - { - "name": "bar", - "value": 1 - } - ], - "exp": "one" - }, - { - "src": ".input {$foo :number} .local $bar = {$foo} .match {$bar} one {{one}} * {{other}}", - "params": [ - { - "name": "foo", - "value": 1 - } - ], - "exp": "one" - }, - { - "src": ".input {$bar :number} .match {$bar} one {{one}} * {{other}}", - "params": [ - { - "name": "bar", - "value": 2 - } - ], - "exp": "other" - }, - { - "src": ".input {$bar} .match {$bar :number} one {{one}} * {{other}}", - "params": [ - { - "name": "bar", - "value": 1 - } - ], - "exp": "one" - }, - { - "src": ".input {$bar} .match {$bar :number} one {{one}} * {{other}}", - "params": [ - { - "name": "bar", - "value": 2 - } - ], - "exp": "other" - }, - { - "src": ".input {$none} .match {$foo :number} one {{one}} * {{{$none}}}", - "params": [ - { - "name": "foo", - "value": 1 - } - ], - "exp": "one" - }, - { - "src": ".local $bar = {$none} .match {$foo :number} one {{one}} * {{{$bar}}}", - "params": [ - { - "name": "foo", - "value": 1 - } - ], - "exp": "one" - }, - { - "src": ".local $bar = {$none} .match {$foo :number} one {{one}} * {{{$bar}}}", - "params": [ - { - "name": "foo", - "value": 2 - } - ], - "exp": "{$none}", - "expErrors": [ - { - "type": "unresolved-variable" - } - ] - }, { "src": "{42 :number @foo @bar=13}", "exp": "42", diff --git a/test/tests/functions/string.json b/test/tests/functions/string.json index fab459541..231868180 100644 --- a/test/tests/functions/string.json +++ b/test/tests/functions/string.json @@ -7,7 +7,7 @@ }, "tests": [ { - "src": ".match {$foo :string} |1| {{one}} * {{other}}", + "src": ".input {$foo :string} .match $foo |1| {{one}} * {{other}}", "params": [ { "name": "foo", @@ -17,7 +17,7 @@ "exp": "one" }, { - "src": ".match {$foo :string} 1 {{one}} * {{other}}", + "src": 
".input {$foo :string} .match $foo 1 {{one}} * {{other}}", "params": [ { "name": "foo", @@ -27,7 +27,7 @@ "exp": "one" }, { - "src": ".match {$foo :string} 1 {{one}} * {{other}}", + "src": ".input {$foo :string} .match $foo 1 {{one}} * {{other}}", "params": [ { "name": "foo", @@ -37,13 +37,38 @@ "exp": "other" }, { - "src": ".match {$foo :string} 1 {{one}} * {{other}}", + "src": ".input {$foo :string} .match $foo 1 {{one}} * {{other}}", "exp": "other", "expErrors": [ { "type": "unresolved-variable" } ] + }, + { + "description": "NFC: keys are normalized (unquoted)", + "src": ".local $x = {\u1E0A\u0323 :string} .match $x \u1E0A\u0323 {{Not normalized}} \u1E0C\u0307 {{Normalized}} * {{Wrong}}", + "expErrors": [{"type": "duplicate-variant"}] + }, + { + "description": "NFC: keys are normalized (quoted)", + "src": ".local $x = {\u1E0A\u0323 :string} .match $x |\u1E0A\u0323| {{Not normalized}} |\u1E0C\u0307| {{Normalized}} * {{Wrong}}", + "expErrors": [{"type": "duplicate-variant"}] + }, + { + "description": "NFC: keys are normalized (mixed)", + "src": ".local $x = {\u1E0A\u0323 :string} .match $x \u1E0A\u0323 {{Not normalized}} |\u1E0C\u0307| {{Normalized}} * {{Wrong}}", + "expErrors": [{"type": "duplicate-variant"}] + }, + { + "description": "NFC: :string normalizes the comparison value (un-normalized selector, normalized key)", + "src": ".local $x = {\u1E0A\u0323 :string} .match $x \u1E0C\u0307 {{Right}} * {{Wrong}}", + "exp": "Right" + }, + { + "description": "NFC: keys are normalized (normalized selector, un-normalized key)", + "src": ".local $x = {\u1E0C\u0307 :string} .match $x \u1E0A\u0323 {{Right}} * {{Wrong}}", + "exp": "Right" } ] } diff --git a/test/tests/functions/time.json b/test/tests/functions/time.json index 416d18a3e..f4ec1b2d5 100644 --- a/test/tests/functions/time.json +++ b/test/tests/functions/time.json @@ -32,10 +32,10 @@ "src": "{|2006-01-02T15:04:06| :time style=medium}" }, { - "src": ".local $t = {|2006-01-02T15:04:06| :time style=medium} {{{$t 
:time}}}" + "src": ".local $t = {|2006-01-02T15:04:06| :time style=medium} {{{$t}}}" }, { - "src": ".local $d = {|2006-01-02T15:04:06| :date} {{{$d :time}}}" + "src": ".local $t = {|2006-01-02T15:04:06| :datetime dateStyle=long timeStyle=long} {{{$t :time}}}" } ] } diff --git a/test/tests/pattern-selection.json b/test/tests/pattern-selection.json new file mode 100644 index 000000000..29dc146c1 --- /dev/null +++ b/test/tests/pattern-selection.json @@ -0,0 +1,120 @@ +{ + "$schema": "https://raw.githubusercontent.com/unicode-org/message-format-wg/main/test/schemas/v0/tests.schema.json", + "scenario": "Pattern selection", + "description": "Tests for pattern selection", + "defaultTestProperties": { + "locale": "und" + }, + "tests": [ + { + "src": ".local $x = {1 :test:select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", + "exp": "1" + }, + { + "src": ".local $x = {0 :test:select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", + "exp": "other" + }, + { + "src": ".input {$x :test:select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", + "params": [{ "name": "x", "value": 1 }], + "exp": "1" + }, + { + "src": ".input {$x :test:select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", + "params": [{ "name": "x", "value": 2 }], + "exp": "other" + }, + { + "src": ".input {$x :test:select} .local $y = {$x} .match $y 1.0 {{1.0}} 1 {{1}} * {{other}}", + "params": [{ "name": "x", "value": 1 }], + "exp": "1" + }, + { + "src": ".input {$x :test:select} .local $y = {$x} .match $y 1.0 {{1.0}} 1 {{1}} * {{other}}", + "params": [{ "name": "x", "value": 2 }], + "exp": "other" + }, + { + "src": ".local $x = {1 :test:select decimalPlaces=1} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", + "exp": "1.0" + }, + { + "src": ".local $x = {1 :test:select decimalPlaces=1} .match $x 1 {{1}} 1.0 {{1.0}} * {{other}}", + "exp": "1.0" + }, + { + "src": ".local $x = {1 :test:select decimalPlaces=9} .match $x 1.0 {{1.0}} 1 {{1}} * {{bad-option-value}}", + "exp": "bad-option-value", + "expErrors": [{ "type": "bad-option" 
}, { "type": "bad-selector" }] + }, + { + "src": ".input {$x :test:select} .local $y = {$x :test:select decimalPlaces=1} .match $y 1.0 {{1.0}} 1 {{1}} * {{other}}", + "params": [{ "name": "x", "value": 1 }], + "exp": "1.0" + }, + { + "src": ".input {$x :test:select decimalPlaces=1} .local $y = {$x :test:select} .match $y 1.0 {{1.0}} 1 {{1}} * {{other}}", + "params": [{ "name": "x", "value": 1 }], + "exp": "1.0" + }, + { + "src": ".input {$x :test:select decimalPlaces=9} .local $y = {$x :test:select decimalPlaces=1} .match $y 1.0 {{1.0}} 1 {{1}} * {{bad-option-value}}", + "params": [{ "name": "x", "value": 1 }], + "exp": "bad-option-value", + "expErrors": [ + { "type": "bad-option" }, + { "type": "bad-operand" }, + { "type": "bad-selector" } + ] + }, + { + "src": ".local $x = {1 :test:select fails=select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", + "exp": "other", + "expErrors": [{ "type": "bad-selector" }] + }, + { + "src": ".local $x = {1 :test:select fails=format} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", + "exp": "1" + }, + { + "src": ".local $x = {1 :test:format} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", + "exp": "other", + "expErrors": [{ "type": "bad-selector" }] + }, + { + "src": ".input {$x :test:select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", + "exp": "other", + "expErrors": [ + { "type": "unresolved-variable" }, + { "type": "bad-operand" }, + { "type": "bad-selector" } + ] + }, + { + "src": ".local $x = {1 :test:select} .local $y = {1 :test:select} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 {{*,1}} * * {{*,*}}", + "exp": "1,1" + }, + { + "src": ".local $x = {1 :test:select} .local $y = {0 :test:select} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 {{*,1}} * * {{*,*}}", + "exp": "1,*" + }, + { + "src": ".local $x = {0 :test:select} .local $y = {1 :test:select} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 {{*,1}} * * {{*,*}}", + "exp": "*,1" + }, + { + "src": ".local $x = {0 :test:select} .local $y = {0 :test:select} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 
{{*,1}} * * {{*,*}}", + "exp": "*,*" + }, + { + "src": ".local $x = {1 :test:select fails=select} .local $y = {1 :test:select} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 {{*,1}} * * {{*,*}}", + "exp": "*,1", + "expErrors": [{ "type": "bad-selector" }] + }, + { + "src": ".local $x = {1 :test:select} .local $y = {1 :test:format} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 {{*,1}} * * {{*,*}}", + "exp": "1,*", + "expErrors": [{ "type": "bad-selector" }] + } + ] +} diff --git a/test/tests/syntax-errors.json b/test/tests/syntax-errors.json index 34d9aa484..00d0420f4 100644 --- a/test/tests/syntax-errors.json +++ b/test/tests/syntax-errors.json @@ -122,6 +122,9 @@ { "src": "bad {:placeholder @attribute=@foo}" }, + { + "src": "bad {:placeholder @attribute=$foo}" + }, { "src": "{ @misplaced = attribute }" }, @@ -155,26 +158,90 @@ { "src": ".local $bar = |foo| {{_}}" }, - { - "src": ".match {#foo} * {{foo}}" - }, - { - "src": ".match {} * {{foo}}" - }, - { - "src": ".match {|foo| :x} {|bar| :x} ** {{foo}}" - }, - { - "src": ".match * {{foo}}" - }, - { - "src": ".match {|x| :x} * foo" - }, - { - "src": ".match {|x| :x} * {{foo}} extra" - }, - { - "src": ".match |x| * {{foo}}" - } + { "src": ".match {{foo}}" }, + { "src": ".match * {{foo}}" }, + { "src": ".match x * {{foo}}" }, + { "src": ".match |x| * {{foo}}" }, + { "src": ".match :x * {{foo}}" }, + { "src": ".match {$foo} * {{foo}}" }, + { "src": ".match {#foo} * {{foo}}" }, + { "src": ".input {$x :x} .match {$x} * {{foo}}" }, + { "src": ".input {$x :x} .match$x * {{foo}}" }, + { "src": ".input {$x :x} .match $x* {{foo}}" }, + { "src": ".input {$x :x} .match $x|x| {{foo}} * {{foo}}" }, + { "src": ".input {$x :x} .local $y = {y :y} .match $x$y * * {{foo}}" }, + { "src": ".input {$x :x} .local $y = {y :y} .match $x $y ** {{foo}}" }, + { "src": ".input {$x :x} .match $x" }, + { "src": ".input {$x :x} .match $x *" }, + { "src": ".input {$x :x} .match $x * foo" }, + { "src": ".input {$x :x} .match $x * {{foo}} extra" }, + { "src": 
".n{a}{{}}" }, + { "src": "{^}" }, + { "src": "{!}" }, + { "src": ".n .{a}{{}}" }, + { "src": ".n. {a}{{}}" }, + { "src": ".n.{a}{b}{{}}" }, + { "src": "{!.}" }, + { "src": "{! .}" }, + { "src": "{%}" }, + { "src": "{*}" }, + { "src": "{+}" }, + { "src": "{<}" }, + { "src": "{>}" }, + { "src": "{?}" }, + { "src": "{~}" }, + { "src": "{^.}" }, + { "src": "{^ .}" }, + { "src": "{&}" }, + { "src": "{!.\\{}" }, + { "src": "{!. \\{}" }, + { "src": "{!|a|}" }, + { "src": "foo {+reserved}" }, + { "src": "foo {&private}" }, + { "src": "foo {?reserved @a @b=c}" }, + { "src": ".foo {42} {{bar}}" }, + { "src": ".foo{42}{{bar}}" }, + { "src": ".foo |}lit{| {42}{{bar}}" }, + { "src": ".i {1} {{}}" }, + { "src": ".l $y = {|bar|} {{}}" }, + { "src": ".l $x.y = {|bar|} {{}}" }, + { "src": "hello {|4.2| %number}" }, + { "src": "hello {|4.2| %n|um|ber}" }, + { "src": "{+42}" }, + { "src": "hello {|4.2| &num|be|r}" }, + { "src": "hello {|4.2| ^num|be|r}" }, + { "src": "hello {|4.2| +num|be|r}" }, + { "src": "hello {|4.2| ?num|be||r|s}" }, + { "src": "hello {|foo| !number}" }, + { "src": "hello {|foo| *number}" }, + { "src": "hello {?number}" }, + { "src": "{xyzz }" }, + { "src": "hello {$foo ~xyzz }" }, + { "src": "hello {$x xyzz }" }, + { "src": "{ !xyzz }" }, + { "src": "{~xyzz }" }, + { "src": "{ num x \\\\ abcde |aaa||3.14||42| r }" }, + { "src": "hello {$foo >num x \\\\ abcde |aaa||3.14| |42| r }" }, + { "src" : ".input{ $n ~ }{{{$n}}}" } ] } diff --git a/test/tests/syntax.json b/test/tests/syntax.json index 1a2d601a2..6082d094a 100644 --- a/test/tests/syntax.json +++ b/test/tests/syntax.json @@ -26,6 +26,11 @@ "src": "\\\\", "exp": "\\" }, + { + "description": "message -> simple-message -> simple-start pattern -> 1*escaped-char", + "src": "\\\\\\{\\|\\}", + "exp": "\\{|}" + }, { "description": "message -> simple-message -> simple-start pattern -> simple-start-char pattern -> ... 
-> simple-start-char *text-char placeholder", "src": "hello {world}", @@ -165,16 +170,10 @@ "exp": "" }, { - "description": "message -> complex-message -> *(declaration [s]) complex-body -> declaration complex-body -> reserved-statement complex-body -> reserved-keyword expression -> \".\" name expression complex-body", - "src": ".n{a}{{}}", - "exp": "", - "expErrors": [ { "type": "unsupported-statement" } ] - }, - { - "description": "message -> complex-message -> complex-body -> matcher -> match-statement variant -> match selector key quoted-pattern -> \".match\" expression literal quoted-pattern", - "src": ".match{a :f}a{{}}*{{}}", + "description": "message -> complex-message -> complex-body -> ... -> matcher -> match-statement variant -> match selector key quoted-pattern -> \".match\" variable literal quoted-pattern", + "src": ".local $a={a :f}.match $a a{{}}*{{}}", "exp": "", - "expErrors": [ { "type": "unknown-function" } ] + "expErrors": [{ "type": "unknown-function" }, { "type": "bad-selector" }] }, { "description": "... input-declaration -> input s variable-expression ...", @@ -196,35 +195,57 @@ "src": ".local $x = {a}{{}}", "exp": "" }, + { + "description": "input-declaration-like content in complex-message", + "src": "{{.input {$x}}}", + "params": [{ "name": "x", "value": "X" }], + "exp": ".input X" + }, + { + "description": "local-declaration-like content in complex-message with leading whitespace", + "src": "{{ .local $x = {$y}}}", + "params": [{ "name": "y", "value": "Y" }], + "exp": " .local $x = Y" + }, { "description": "... 
matcher -> match-statement [s] variant -> match 1*([s] selector) variant -> match selector selector variant -> match selector selector variant key s key quoted-pattern", - "src": ".match{a :f}{b :f}a b{{}}* *{{}}", + "src": ".local $a={a :f}.local $b={b :f}.match $a $b a b{{}}* *{{}}", "exp": "", - "expErrors": [ { "type": "unknown-function" } ] + "expErrors": [ + { "type": "unknown-function" }, + { "type": "bad-selector" }, + { "type": "unknown-function" }, + { "type": "bad-selector" } + ] }, { "description": "... matcher -> match-statement [s] variant -> match 1*([s] selector) variant -> match selector variant variant ...", - "src": ".match{a :f}a{{}}b{{}}*{{}}", + "src": ".local $a={a :f}.match $a a{{}}b{{}}*{{}}", "exp": "", - "expErrors": [ { "type": "unknown-function" } ] + "expErrors": [{ "type": "unknown-function" }, { "type": "bad-selector" }] }, { "description": "... variant -> key s quoted-pattern -> ...", - "src": ".match{a :f}a {{}}*{{}}", + "src": ".local $a={a :f}.match $a a {{}}*{{}}", "exp": "", - "expErrors": [ { "type": "unknown-function" } ] + "expErrors": [{ "type": "unknown-function" }, { "type": "bad-selector" }] }, { "description": "... variant -> key s key s quoted-pattern -> ...", - "src": ".match{a :f}{b :f}a b {{}}* *{{}}", + "src": ".local $a={a :f}.local $b={b :f}.match $a $b a b {{}}* *{{}}", "exp": "", - "expErrors": [ { "type": "unknown-function" } ] + "expErrors": [ + { "type": "unknown-function" }, + { "type": "bad-selector" }, + { "type": "unknown-function" }, + { "type": "bad-selector" } + ] }, { "description": "... 
key -> \"*\" ...", - "src": ".match{a :f}*{{}}", + "src": ".local $a={a :f}.match $a *{{}}", "exp": "", - "expErrors": [ { "type": "unknown-function" } ] + "expErrors": [{ "type": "unknown-function" }, { "type": "bad-selector" }] }, { "description": "simple-message -> simple-start pattern -> placeholder -> expression -> literal-expression -> \"{\" s literal \"}\"", @@ -277,18 +298,6 @@ "exp": "{:f}", "expErrors": [{ "type": "unknown-function" }] }, - { - "description": "... annotation -> private-use-annotation -> private-start", - "src": "{^}", - "exp": "{^}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... annotation -> reserved-annotation -> reserved-annotation-start", - "src": "{!}", - "exp": "{!}", - "expErrors": [{ "type": "unsupported-expression" }] - }, { "description": "message -> simple-message -> simple-start pattern -> placeholder -> markup -> \"{\" s \"#\" identifier \"}\"", "src": "{ #a}", @@ -399,8 +408,8 @@ "exp": "a" }, { - "description": "... attribute -> \"@\" identifier s \"=\" s variable ...", - "src": "{42 @foo=$bar}", + "description": "... attribute -> \"@\" identifier s \"=\" s quoted-literal ...", + "src": "{42 @foo=|bar|}", "exp": "42", "expParts": [ { @@ -426,9 +435,9 @@ "exp": "\\" }, { - "description": "... quoted-literal -> \"|\" quoted-char escaped-char \"|\"", - "src": "{|a\\\\|}", - "exp": "a\\" + "description": "... quoted-literal -> \"|\" quoted-char 1*escaped-char \"|\"", + "src": "{|a\\\\\\{\\|\\}|}", + "exp": "a\\{|}" }, { "description": "... unquoted-literal -> number-literal -> %x30", @@ -480,114 +489,6 @@ "src": "{0E-1}", "exp": "0E-1" }, - { - "description": "... 
reserved-statement -> reserved-keyword s reserved-body 1*([s] expression) -> reserved-keyword s reserved-body expression -> \".\" name s reserved-body-part expression -> \".\" name s reserved-char expression ...", - "src": ".n .{a}{{}}", - "exp": "", - "expErrors": [ { "type": "unsupported-statement" } ] - }, - { - "description": "... reserved-statement -> reserved-keyword reserved-body 1*([s] expression) -> reserved-keyword s reserved-body s expression -> \".\" name s reserved-body-part expression -> \".\" name s reserved-char expression ...", - "src": ".n. {a}{{}}", - "exp": "", - "expErrors": [ { "type": "unsupported-statement" } ] - }, - { - "description": "... reserved-statement -> reserved-keyword reserved-body 1*([s] expression) -> reserved-keyword reserved-body expression expression -> \".\" name reserved-body-part expression expression -> \".\" name s reserved-char expression expression ...", - "src": ".n.{a}{b}{{}}", - "exp": "", - "expErrors": [ { "type": "unsupported-statement" } ] - }, - { - "description": "... reserved-annotation -> reserved-annotation-start reserved-body -> \"!\" reserved-body-part -> \"!\" reserved-char ...", - "src": "{!.}", - "exp": "{!}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... reserved-annotation -> reserved-annotation-start s reserved-body -> \"!\" s reserved-body-part -> \"!\" s reserved-char ...", - "src": "{! .}", - "exp": "{!}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... reserved-annotation-start ...", - "src": "{%}", - "exp": "{%}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... reserved-annotation-start ...", - "src": "{*}", - "exp": "{*}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... reserved-annotation-start ...", - "src": "{+}", - "exp": "{+}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... 
reserved-annotation-start ...", - "src": "{<}", - "exp": "{<}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... reserved-annotation-start ...", - "src": "{>}", - "exp": "{>}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... reserved-annotation-start ...", - "src": "{?}", - "exp": "{?}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... reserved-annotation-start ...", - "src": "{~}", - "exp": "{~}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... private-use-annotation -> private-start reserved-body -> \"^\" reserved-body-part -> \"^\" reserved-char ...", - "src": "{^.}", - "exp": "{^}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... private-use-annotation -> private-start s reserved-body -> \"^\" s reserved-body-part -> \"^\" s reserved-char ...", - "src": "{^ .}", - "exp": "{^}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... private-start ...", - "src": "{&}", - "exp": "{&}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... reserved-annotation -> reserved-annotation-start reserved-body -> \"!\" reserved-body-part reserved-body-part -> \"!\" reserved-char escaped-char ...", - "src": "{!.\\{}", - "exp": "{!}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... reserved-annotation -> reserved-annotation-start reserved-body -> \"!\" reserved-body-part s reserved-body-part -> \"!\" reserved-char s escaped-char ...", - "src": "{!. \\{}", - "exp": "{!}", - "expErrors": [{ "type": "unsupported-expression" }] - }, - { - "description": "... 
reserved-annotation -> reserved-annotation-start reserved-body -> \"!\" reserved-body-part -> \"!\" quoted-literal ...", - "src": "{!|a|}", - "exp": "{!}", - "expErrors": [{ "type": "unsupported-expression" }] - }, { "src": "hello { world\t\n}", "exp": "hello world" @@ -794,125 +695,45 @@ ] }, { - "src": "foo {+reserved}", - "exp": "foo {+}", - "expParts": [ - { - "type": "literal", - "value": "foo " - }, - { - "type": "fallback", - "source": "+" - } - ], - "expErrors": [ - { - "type": "unsupported-expression" - } - ] + "src": "{{trailing whitespace}} \n", + "exp": "trailing whitespace" }, { - "src": "foo {&private}", - "exp": "foo {&}", - "expParts": [ - { - "type": "literal", - "value": "foo " - }, - { - "type": "fallback", - "source": "&" - } - ], - "expErrors": [ - { - "type": "unsupported-expression" - } - ] + "description": "NFC: text is not normalized", + "src": "\u1E0A\u0323", + "exp": "\u1E0A\u0323" }, { - "src": "foo {?reserved @a @b=$c}", - "exp": "foo {?}", - "expParts": [ - { - "type": "literal", - "value": "foo " - }, - { - "type": "fallback", - "source": "?" 
- } - ], - "expErrors": [ - { - "type": "unsupported-expression" - } - ] + "description": "NFC: variables are compared to each other as-if normalized; decl is non-normalized, use is", + "src": ".local $\u0044\u0323\u0307 = {foo} {{{$\u1E0c\u0307}}}", + "exp": "foo" }, { - "src": ".foo {42} {{bar}}", - "exp": "bar", - "expParts": [ - { - "type": "literal", - "value": "bar" - } - ], - "expErrors": [ - { - "type": "unsupported-statement" - } - ] + "description": "NFC: variables are compared to each other as-if normalized; decl is normalized, use isn't", + "src": ".local $\u1E0c\u0307 = {foo} {{{$\u0044\u0323\u0307}}}", + "exp": "foo" }, { - "src": ".foo{42}{{bar}}", - "exp": "bar", - "expParts": [ - { - "type": "literal", - "value": "bar" - } - ], - "expErrors": [ - { - "type": "unsupported-statement" - } - ] + "description": "NFC: variables are compared to each other as-if normalized; decl is normalized, use isn't", + "src": ".input {$\u1E0c\u0307} {{{$\u0044\u0323\u0307}}}", + "params": [{"name": "\u1E0c\u0307", "value": "foo"}], + "exp": "foo" }, { - "src": ".foo |}lit{| {42}{{bar}}", - "exp": "bar", - "expParts": [ - { - "type": "literal", - "value": "bar" - } - ], - "expErrors": [ - { - "type": "unsupported-statement" - } - ] + "description": "NFC: variables are compared to each other as-if normalized; decl is non-normalized, use is", + "src": ".input {$\u0044\u0323\u0307} {{{$\u1E0c\u0307}}}", + "params": [{"name": "\u0044\u0323\u0307", "value": "foo"}], + "exp": "foo" }, { - "src": ".l $y = {|bar|} {{}}", - "exp": "", - "expParts": [ - { - "type": "literal", - "value": "bar" - } - ], - "expErrors": [ - { - "type": "unsupported-statement" - } - ] + "description": "NFC: variables are compared to each other as-if normalized; decl is non-normalized, use is; reordering", + "src": ".local $\u0044\u0307\u0323 = {foo} {{{$\u1E0c\u0307}}}", + "exp": "foo" }, { - "src": "{{trailing whitespace}} \n", - "exp": "trailing whitespace" + "description": "NFC: variables are 
compared to each other as-if normalized; decl is non-normalized, use is; special case mapping", + "src": ".local $\u0041\u030A\u0301 = {foo} {{{$\u01FA}}}", + "exp": "foo" } ] } diff --git a/test/tests/u-options.json b/test/tests/u-options.json new file mode 100644 index 000000000..3e13b30a2 --- /dev/null +++ b/test/tests/u-options.json @@ -0,0 +1,126 @@ +{ + "$schema": "https://raw.githubusercontent.com/unicode-org/message-format-wg/main/test/schemas/v0/tests.schema.json", + "scenario": "u: Options", + "description": "Common options affecting the function context", + "defaultTestProperties": { + "locale": "en-US" + }, + "tests": [ + { + "src": "{#tag u:id=x}content{/ns:tag u:id=x}", + "exp": "content", + "expParts": [ + { + "type": "markup", + "kind": "open", + "id": "x", + "name": "tag" + }, + { + "type": "literal", + "value": "content" + }, + { + "type": "markup", + "kind": "close", + "id": "x", + "name": "tag" + } + ] + }, + { + "src": "{#tag u:dir=rtl u:locale=ar}content{/ns:tag}", + "exp": "content", + "expErrors": [{ "type": "bad-option" }, { "type": "bad-option" }], + "expParts": [ + { + "type": "markup", + "kind": "open", + "name": "tag" + }, + { + "type": "literal", + "value": "content" + }, + { + "type": "markup", + "kind": "close", + "name": "tag" + } + ] + }, + { + "src": "hello {4.2 :number u:locale=fr}", + "exp": "hello 4,2" + }, + { + "src": "hello {world :string u:dir=ltr u:id=foo}", + "exp": "hello world", + "expParts": [ + { + "type": "literal", + "value": "hello " + }, + { + "type": "string", + "source": "|world|", + "dir": "ltr", + "id": "foo", + "value": "world" + } + ] + }, + { + "src": "hello {world :string u:dir=rtl}", + "exp": "hello \u2067world\u2069", + "expParts": [ + { + "type": "literal", + "value": "hello " + }, + { + "type": "string", + "source": "|world|", + "dir": "rtl", + "value": "world" + } + ] + }, + { + "src": "hello {world :string u:dir=auto}", + "exp": "hello \u2068world\u2069", + "expParts": [ + { + "type": "literal", + 
"value": "hello " + }, + { + "type": "string", + "source": "|world|", + "dir": "auto", + "value": "world" + } + ] + }, + { + "locale": "ar", + "src": "أهلاً {بالعالم :string u:dir=rtl}", + "exp": "أهلاً \u2067بالعالم\u2069" + }, + { + "locale": "ar", + "src": "أهلاً {بالعالم :string u:dir=auto}", + "exp": "أهلاً \u2068بالعالم\u2069" + }, + { + "locale": "ar", + "src": "أهلاً {world :string u:dir=ltr}", + "exp": "أهلاً \u2066world\u2069" + }, + { + "locale": "ar", + "src": "أهلاً {بالعالم :string}", + "exp": "أهلاً \u2067بالعالم\u2069" + } + ] +} diff --git a/test/tests/unsupported-expressions.json b/test/tests/unsupported-expressions.json deleted file mode 100644 index f7d611509..000000000 --- a/test/tests/unsupported-expressions.json +++ /dev/null @@ -1,53 +0,0 @@ -{ - "scenario": "Reserved and private annotations", - "description": "Tests for unsupported expressions (reserved/private)", - "defaultTestProperties": { - "locale": "en-US", - "expErrors": [ - { - "type": "unsupported-expression" - } - ] - }, - "tests": [ - { "src": "hello {|4.2| %number}" }, - { "src": "hello {|4.2| %n|um|ber}" }, - { "src": "{+42}" }, - { "src": "hello {|4.2| &num|be|r}" }, - { "src": "hello {|4.2| ^num|be|r}" }, - { "src": "hello {|4.2| +num|be|r}" }, - { "src": "hello {|4.2| ?num|be||r|s}" }, - { "src": "hello {|foo| !number}" }, - { "src": "hello {|foo| *number}" }, - { "src": "hello {?number}" }, - { "src": "{xyzz }" }, - { "src": "hello {$foo ~xyzz }" }, - { "src": "hello {$x xyzz }" }, - { "src": "{ !xyzz }" }, - { "src": "{~xyzz }" }, - { "src": "{ num x \\\\ abcde |aaa||3.14||42| r }" }, - { "src": "hello {$foo >num x \\\\ abcde |aaa||3.14| |42| r }" }, - { "src" : ".input{ $n ~ }{{{$n}}}" } - ] -} - diff --git a/test/tests/unsupported-statements.json b/test/tests/unsupported-statements.json deleted file mode 100644 index d944aa0f7..000000000 --- a/test/tests/unsupported-statements.json +++ /dev/null @@ -1,18 +0,0 @@ -{ - "scenario": "Reserved statements", - "description": 
"Tests for unsupported statements", - "defaultTestProperties": { - "locale": "en-US", - "expErrors": [ - { - "type": "unsupported-statement" - } - ] - }, - "tests": [ - { "src" : ".i {1} {{}}" }, - { "src" : ".l $y = {|bar|} {{}}" }, - { "src" : ".l $x.y = {|bar|} {{}}" } - ] -} - From 17af553c78ec288b267ad94c6138af78b2cc6d3d Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Sat, 26 Oct 2024 09:24:25 -0700 Subject: [PATCH 6/9] Add serialization proposal --- exploration/number-selection.md | 89 +++++++++++++++++++++++++++++++-- 1 file changed, 86 insertions(+), 3 deletions(-) diff --git a/exploration/number-selection.md b/exploration/number-selection.md index d60909632..40238a391 100644 --- a/exploration/number-selection.md +++ b/exploration/number-selection.md @@ -548,9 +548,92 @@ and they _might_ converge on some overlap that users could safely use across pla ### Standardize the Serialization Forms -Using the design above, remove the integer-only and no-sig-digits restrictions from LDML45 -and specify numeric matching by specifying the form of matching `key` values. -Comparison is as-if by string comparison of the serialized forms, just as in LDML45. +Modify the above exact match as follows. +Note that this implementation is less restrictive than before, but still leaves some +values that cannot be matched. + +> [!IMPORTANT] +> The exact behavior of exact literal match is only defined for +> a specific range of numeric values and does not support scientific notation. +> Very large or very small numeric values will be difficult to perform +> exact matching on. +> Avoid depending on these types of keys in message selection. + +> [!IMPORTANT] +> For implementations that do not have arbitrary precision numeric types +> or operands that do not use these types, +> it is possible to specify a key value that exceeds the precision +> of the underlying type. +> Such a key value will not work reliably or may not work at all +> in such implementations. 
+> Avoid depending on such keys values in message selection. + +Number literals in the MessageFormat 2 syntax use a subset of the +[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6). +The resolved value of an `operand` exactly matches a numeric literal `key` +if, when the `operand` is serialized using this format +the two strings are equal. + +```abnf +number = [ "-" ] int [ fraction ] +integer = "0" / [ "-" ] (digit19 *DIGIT) +int = "0" / (digit19 *DIGIT) +digit19 = %31-39 ; 1-9 +fraction = "." 1*DIGIT +``` + +If the function `:integer` is used or the `maximumFractionDigits` is 0, +the production `integer` is used and any fractional amount is omitted, +otherwise the `minimumFractionDigits` number of digits is produced, +zero-filled as needed. + +The implementation applies the `maximumSignificantDigits` to the value +being serialized. +This might involve locally-specific rounding. +The `minimumSignificantDigits` has no effect on the value produced for comparison. + +The option `signDisplay` has no effect on the value produced for comparison. + +> [!NOTE] +> Implementations are not expected to implement this exactly as written, +> as there are clearly optimizations that can be applied. 
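As a rough illustration of the serialization rule proposed above, the following sketch turns the `number` and `integer` ABNF productions into regular expressions and compares a serialized operand against a key as strings. It is only a sketch under stated assumptions, not the working group's implementation. The names `serializeForMatch`, `isValidKey`, `exactMatch` and `SerializeOptions` are hypothetical. The default of 3 for `maximumFractionDigits`, truncation toward zero for the integer form, and dropping the sign of a zero result are all assumptions that the text does not settle. `maximumSignificantDigits` and locally-specific rounding are deliberately left out.

```ts
// Hypothetical helper names throughout; none of them come from the MessageFormat 2 spec.

// Regular-expression forms of the ABNF productions `number` and `integer`.
const NUMBER_KEY = /^-?(0|[1-9][0-9]*)(\.[0-9]+)?$/;
const INTEGER_KEY = /^(0|-?[1-9][0-9]*)$/;

interface SerializeOptions {
  integer?: boolean;              // true when the `:integer` function is used
  minimumFractionDigits?: number;
  maximumFractionDigits?: number; // the default of 3 below is an assumption
}

// The `integer` production applies for `:integer` or maximumFractionDigits=0.
function usesIntegerForm(opts: SerializeOptions): boolean {
  return opts.integer === true || opts.maximumFractionDigits === 0;
}

// A key is only usable if it fits the production selected by the options.
function isValidKey(key: string, opts: SerializeOptions = {}): boolean {
  return (usesIntegerForm(opts) ? INTEGER_KEY : NUMBER_KEY).test(key);
}

function serializeForMatch(value: number, opts: SerializeOptions = {}): string {
  if (!isFinite(value) || Math.abs(value) >= 1e21) {
    // toFixed switches to exponent notation at 1e21, which the grammar forbids.
    throw new RangeError("value is outside the range this sketch can serialize");
  }
  if (usesIntegerForm(opts)) {
    // Assumption: "any fractional amount is omitted" means truncation toward zero.
    const whole = value < 0 ? Math.ceil(value) : Math.floor(value);
    return whole.toFixed(0);
  }
  const min = opts.minimumFractionDigits === undefined ? 0 : opts.minimumFractionDigits;
  const max =
    opts.maximumFractionDigits === undefined ? Math.max(3, min) : opts.maximumFractionDigits;
  if (min > max) {
    throw new RangeError("minimumFractionDigits exceeds maximumFractionDigits");
  }
  let text = value.toFixed(max);
  // Assumption: a result that rounds to zero is serialized without a sign.
  if (/^-0\.0+$/.test(text)) {
    text = text.slice(1);
  }
  const dot = text.indexOf(".");
  const intPart = text.slice(0, dot);
  let frac = text.slice(dot + 1);
  // Drop trailing zeros, but keep at least `min` fraction digits (zero-filled).
  while (frac.length > min && frac.charAt(frac.length - 1) === "0") {
    frac = frac.slice(0, -1);
  }
  return frac.length > 0 ? intPart + "." + frac : intPart;
}

// Exact match is a plain string comparison of the serialized operand and the key.
function exactMatch(value: number, key: string, opts: SerializeOptions = {}): boolean {
  return isValidKey(key, opts) && serializeForMatch(value, opts) === key;
}
```

With `minimumFractionDigits=2 maximumFractionDigits=2`, this sketch matches the value 0 against the key `0.00` but not against `0`, `0.0` or `0.000`, which is the behavior the proposal's examples describe. A key such as `1.23e-37` is rejected by the grammar check before any comparison takes place.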
+ +> Here are some examples: +> ``` +> .input {$num :integer} +> .match $num +> 0 {{The number 0}} +> 1 {{The number 1}} +> -1 {{The number -1}} +> 1.0 {{This cannot match}} +> 1.1 {{This cannot match}} +> ``` +> ``` +> .input {$num :number maximumFractionDigits=2 minimumFractionDigits=2} +> .match $num +> 0 {{This does not match}} +> 0.00 {{This matches the value 0}} +> 0.0 {{This does not match}} +> 0.000 {{This does not match}} +> ``` +> ``` +> .input {$num :number minimumFractionDigits=2 maximumFractionDigits=5} +> .match $num +> 0.12 {{Matches the value 0.12} +> 0.123 {{Matches the value 0.123}} +> 0.12345 {{Matches the values 0.12345}} +> 0.123456 {{Does not match}} +> 0.12346 {{May match the value 0.123456 depending on local rounding mode?}} +> ``` +> ``` +> .input {$num :number} +> -0 {{Error: Bad Variant Key}} +> -99 {{The value -99}} +> 1111111111111111111111111111 {{Might exceed the size of local integer type, but is valid}} +> 11111111111111.1111111111111 {{Might exceed local floating point precision, but is valid}} +> 1.23e-37 {{Error: Bad Variant Key}} +> ``` + ### Compare numeric values From a86acea0bfc1c3eaedee6abd0eb52a235fdd3a88 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Sat, 26 Oct 2024 09:33:22 -0700 Subject: [PATCH 7/9] Revert "Add serialization proposal" This reverts commit 17af553c78ec288b267ad94c6138af78b2cc6d3d. --- exploration/number-selection.md | 89 ++------------------------------- 1 file changed, 3 insertions(+), 86 deletions(-) diff --git a/exploration/number-selection.md b/exploration/number-selection.md index 40238a391..d60909632 100644 --- a/exploration/number-selection.md +++ b/exploration/number-selection.md @@ -548,92 +548,9 @@ and they _might_ converge on some overlap that users could safely use across pla ### Standardize the Serialization Forms -Modify the above exact match as follows. -Note that this implementation is less restrictive than before, but still leaves some -values that cannot be matched. 
- -> [!IMPORTANT] -> The exact behavior of exact literal match is only defined for -> a specific range of numeric values and does not support scientific notation. -> Very large or very small numeric values will be difficult to perform -> exact matching on. -> Avoid depending on these types of keys in message selection. - -> [!IMPORTANT] -> For implementations that do not have arbitrary precision numeric types -> or operands that do not use these types, -> it is possible to specify a key value that exceeds the precision -> of the underlying type. -> Such a key value will not work reliably or may not work at all -> in such implementations. -> Avoid depending on such keys values in message selection. - -Number literals in the MessageFormat 2 syntax use a subset of the -[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6). -The resolved value of an `operand` exactly matches a numeric literal `key` -if, when the `operand` is serialized using this format -the two strings are equal. - -```abnf -number = [ "-" ] int [ fraction ] -integer = "0" / [ "-" ] (digit19 *DIGIT) -int = "0" / (digit19 *DIGIT) -digit19 = %31-39 ; 1-9 -fraction = "." 1*DIGIT -``` - -If the function `:integer` is used or the `maximumFractionDigits` is 0, -the production `integer` is used and any fractional amount is omitted, -otherwise the `minimumFractionDigits` number of digits is produced, -zero-filled as needed. - -The implementation applies the `maximumSignificantDigits` to the value -being serialized. -This might involve locally-specific rounding. -The `minimumSignificantDigits` has no effect on the value produced for comparison. - -The option `signDisplay` has no effect on the value produced for comparison. - -> [!NOTE] -> Implementations are not expected to implement this exactly as written, -> as there are clearly optimizations that can be applied. 
- -> Here are some examples: -> ``` -> .input {$num :integer} -> .match $num -> 0 {{The number 0}} -> 1 {{The number 1}} -> -1 {{The number -1}} -> 1.0 {{This cannot match}} -> 1.1 {{This cannot match}} -> ``` -> ``` -> .input {$num :number maximumFractionDigits=2 minimumFractionDigits=2} -> .match $num -> 0 {{This does not match}} -> 0.00 {{This matches the value 0}} -> 0.0 {{This does not match}} -> 0.000 {{This does not match}} -> ``` -> ``` -> .input {$num :number minimumFractionDigits=2 maximumFractionDigits=5} -> .match $num -> 0.12 {{Matches the value 0.12} -> 0.123 {{Matches the value 0.123}} -> 0.12345 {{Matches the values 0.12345}} -> 0.123456 {{Does not match}} -> 0.12346 {{May match the value 0.123456 depending on local rounding mode?}} -> ``` -> ``` -> .input {$num :number} -> -0 {{Error: Bad Variant Key}} -> -99 {{The value -99}} -> 1111111111111111111111111111 {{Might exceed the size of local integer type, but is valid}} -> 11111111111111.1111111111111 {{Might exceed local floating point precision, but is valid}} -> 1.23e-37 {{Error: Bad Variant Key}} -> ``` - +Using the design above, remove the integer-only and no-sig-digits restrictions from LDML45 +and specify numeric matching by specifying the form of matching `key` values. +Comparison is as-if by string comparison of the serialized forms, just as in LDML45. ### Compare numeric values From 8f56bef247457587e463a65071e87e5a0ff644d0 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Sat, 26 Oct 2024 09:33:40 -0700 Subject: [PATCH 8/9] Revert "Update from main (#914)" This reverts commit da9377b5c48128a42749d49dd3869c55a1483682. 
--- CONTRIBUTING.md | 9 +- README.md | 47 +- exploration/bidi-usability.md | 71 +-- exploration/expression-attributes.md | 4 +- exploration/function-composition-part-1.md | 258 ++-------- exploration/maintaining-registry.md | 316 ------------ exploration/registry-xml/README.md | 3 +- exploration/selection-declaration.md | 150 +----- meetings/2024/notes-2024-08-19.md | 272 ---------- meetings/2024/notes-2024-08-26.md | 151 ------ meetings/2024/notes-2024-09-09.md | 167 ------- meetings/2024/notes-2024-09-10.md | 361 -------------- meetings/2024/notes-2024-09-16.md | 398 --------------- meetings/2024/notes-2024-09-30.md | 216 -------- meetings/2024/notes-2024-10-07.md | 396 --------------- meetings/2024/notes-2024-10-14.md | 298 ----------- spec/README.md | 84 ++-- spec/appendices.md | 6 +- spec/data-model/README.md | 92 +++- spec/data-model/message.dtd | 22 +- spec/data-model/message.json | 80 ++- spec/errors.md | 116 +++-- spec/formatting.md | 484 +++++++++--------- spec/message.abnf | 91 ++-- spec/registry.md | 208 ++------ spec/syntax.md | 549 +++++++++++---------- spec/u-namespace.md | 87 ---- test/README.md | 49 +- test/schemas/v0/tests.schema.json | 5 +- test/tests/bidi.json | 145 ------ test/tests/data-model-errors.json | 24 +- test/tests/functions/date.json | 4 +- test/tests/functions/integer.json | 6 +- test/tests/functions/number.json | 167 +++++++ test/tests/functions/string.json | 33 +- test/tests/functions/time.json | 4 +- test/tests/pattern-selection.json | 120 ----- test/tests/syntax-errors.json | 109 +--- test/tests/syntax.json | 319 +++++++++--- test/tests/u-options.json | 126 ----- test/tests/unsupported-expressions.json | 53 ++ test/tests/unsupported-statements.json | 18 + 42 files changed, 1492 insertions(+), 4626 deletions(-) delete mode 100644 exploration/maintaining-registry.md delete mode 100644 meetings/2024/notes-2024-08-19.md delete mode 100644 meetings/2024/notes-2024-08-26.md delete mode 100644 meetings/2024/notes-2024-09-09.md delete 
mode 100644 meetings/2024/notes-2024-09-10.md delete mode 100644 meetings/2024/notes-2024-09-16.md delete mode 100644 meetings/2024/notes-2024-09-30.md delete mode 100644 meetings/2024/notes-2024-10-07.md delete mode 100644 meetings/2024/notes-2024-10-14.md delete mode 100644 spec/u-namespace.md delete mode 100644 test/tests/bidi.json delete mode 100644 test/tests/pattern-selection.json delete mode 100644 test/tests/u-options.json create mode 100644 test/tests/unsupported-expressions.json create mode 100644 test/tests/unsupported-statements.json diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 1b2bb58bf..d28236c05 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,6 +1,13 @@ # Contributing to this project -To join this Working Group, please read the information in the [README.md](./README.md) as well as the Contributor License Agreement information just below: +## Joining the Working Group + +We are looking for participation from software developers, localization engineers and others with experience +in Internationalization (I18N) and Localization (L10N). If you wish to contribute to this work, please review +the information on the Contributor License Agreement below. In addition, you should: + +1. Apply to join our [mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) +2. 
Watch this repository (use the "Watch" button in the upper right corner) diff --git a/README.md b/README.md index a0fe16304..8323f49e6 100644 --- a/README.md +++ b/README.md @@ -76,8 +76,7 @@ Functions can optionally take _options_: Messages can use a _selector_ to choose between different _variants_, which correspond to the grammatical (or other) requirements of the language: - .input {$count :integer} - .match $count + .match {$count :integer} 0 {{You have no notifications.}} one {{You have {$count} notification.}} * {{You have {$count} notifications.}} @@ -106,23 +105,6 @@ The `main` branch of this repository contains changes implemented since the tech Implementers should be aware of the following normative changes during the tech preview period. See the [commit history](https://github.com/unicode-org/message-format-wg/commits) after 2024-04-13 for a list of all commits (including non-normative changes). -- [#885](https://github.com/unicode-org/message-format-wg/issues/885) Address equality of `name` and `literal` values, including requiring keys to use NFC -- [#884](https://github.com/unicode-org/message-format-wg/issues/884) Add support for bidirectional isolates and strong marks in syntax and address UAX31/UTS55 requirements -- [#883](https://github.com/unicode-org/message-format-wg/issues/883) Remove forward-compatibility promise and all reserved/private syntax. -- [#882](https://github.com/unicode-org/message-format-wg/issues/882) Specify `bad-option` error for bad digit size options in `:number` and `:integer` functions -- [#878](https://github.com/unicode-org/message-format-wg/issues/878) Clarify "rule" selection in `:number` and `:integer` functions -- [#877](https://github.com/unicode-org/message-format-wg/issues/877) Match on variables instead of expressions. 
-- [#854](https://github.com/unicode-org/message-format-wg/issues/854) Allow whitespace at complex message start -- [#853](https://github.com/unicode-org/message-format-wg/issues/853) Add a "duplicate-variant" error -- [#845](https://github.com/unicode-org/message-format-wg/issues/845) Define "attributes" feature -- [#834](https://github.com/unicode-org/message-format-wg/issues/834) Modify the stability policy (not currently in effect due to Tech Preview) -- [#816](https://github.com/unicode-org/message-format-wg/issues/816) Refine error handling -- [#815](https://github.com/unicode-org/message-format-wg/issues/815) Removed machine-readable function registry as a deliverable -- [#813](https://github.com/unicode-org/message-format-wg/issues/813) Change default of `:date` and `:datetime` date formatting from `short` to `medium` -- [#812](https://github.com/unicode-org/message-format-wg/issues/812) Allow trailing whitespace for complex messages -- [#793](https://github.com/unicode-org/message-format-wg/issues/793) Recommend the use of escapes only when necessary -- [#775](https://github.com/unicode-org/message-format-wg/issues/775) Add formal definitions for variable, external variable, and local variable -- [#774](https://github.com/unicode-org/message-format-wg/issues/774) Refactor errors, adding Message Function Errors - [#771](https://github.com/unicode-org/message-format-wg/issues/771) Remove inappropriate normative statement from errors.md - [#767](https://github.com/unicode-org/message-format-wg/issues/767) Add a test schema and [#778](https://github.com/unicode-org/message-format-wg/issues/778) validate tests against it @@ -131,9 +113,7 @@ after 2024-04-13 for a list of all commits (including non-normative changes). 
- [#769](https://github.com/unicode-org/message-format-wg/issues/769) Add `:test:function`, `:test:select` and `:test:format` functions for implementation testing - [#743](https://github.com/unicode-org/message-format-wg/issues/743) Collapse all escape sequence rules into one (affects the ABNF) - -In addition to the above, the test suite is significantly modified and updated. - +- _more to be added as they are merged_ ## Implementations @@ -157,27 +137,18 @@ We invite feedback about the current syntax draft, as well as the real-life use- - General questions and thoughts → [post a discussion thread](https://github.com/unicode-org/message-format-wg/discussions). - Actionable feedback (bugs, feature requests) → [file a new issue](https://github.com/unicode-org/message-format-wg/issues). -## Participation / Joining the Working Group - -We are looking for participation from software developers, localization engineers and others with experience -in Internationalization (I18N) and Localization (L10N). -If you wish to contribute to this work, please review the information about the Contributor License Agreement below. +## Participation -To follow this work: -1. Apply to join our [mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -2. Watch this repository (use the "Watch" button in the upper right corner) +To join in: -To contribute to this work, in addition to the above: -1. Each individual MUST have a copy of the CLA on file. See below. -2. Individuals who are employees of Unicode Member organizations SHOULD contact their member representative. - Individuals who are not employees of Unicode Member organizations MUST contact the chair to request Invited Expert status. - Employees of Unicode Member organizations MAY also apply for Invited Expert status, - subject to approval from their member representative. +1. Review [CONTRIBUTING.md](./CONTRIBUTING.md) +2. 
Apply to join our [mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) +3. Watch this repository (use the "Watch" button in the upper right corner) ### Copyright & Licenses Copyright © 2019-2024 Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries. -A CLA is required to contribute to this project - please refer to the [CONTRIBUTING.md](./CONTRIBUTING.md) file (or start a Pull Request) for more information. +The project is released under [LICENSE](./LICENSE). -The contents of this repository are governed by the Unicode [Terms of Use](https://www.unicode.org/copyright.html) and are released under [LICENSE](./LICENSE). +A CLA is required to contribute to this project - please refer to the [CONTRIBUTING.md](./CONTRIBUTING.md) file (or start a Pull Request) for more information. diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 49bfcc1aa..3f70ed700 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -273,39 +273,6 @@ Not allowing these to mix could produce annoying parse errors. _Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._ -I propose adopting a hybrid approach in which we permit "super-loose isolation". -This allows user to include isolates and strongly directional characters into the whitespace -portions of the syntax in order to make messages appear correctly. - -The second part of the hybrid approach would be to recommend ("SHOULD") the "strict isolation" -design for serializers. -(Note that "strict" and "super-loose" use non-identical productions with the name `bidi`. -These serve different purposes and are consistent with strict being narrower with super-loose.) -This syntax is a subset of the super-loose syntax and can be applied selectively to messages that -have RTL sequences or which have problematic display. 
- - -## Alternatives Considered - -_What other solutions are available?_ -_How do they compare against the requirements?_ -_What other properties they have?_ - -### Nothing -We could do nothing. - -A likely outcome of doing nothing is that RTL users would insert bidi controls into -_messages_ in an attempt to make the _pattern_ and/or _placeholders_ display correctly. -These controls would become part of the output of the _message_, -showing up inappropriately at runtime. -Because these characters are invisible, users might be very frustrated trying to manage -the results or debug what is wrong with their messages. - -By contrast, if users insert too many or the wrong controls using the recommended design, -the _message_ would still be functional and would emit no undesired characters. - -### LTR Messages with isolating sequences - The syntax of a _message_ assumes a left-to-right base direction both for the complete text of the _message_ as well as for each line (paragraph) contained therein. @@ -416,7 +383,7 @@ ns-separator = [bidi] ":" bidi = [ %x200E-200F / %x061C ] ``` -**Open Issues** +### Open Issues with Proposed Design The ABNF changes found above put isolates and strongly directional marks into specific locations, such as directly next to `{`/`}`/`{{`/`}}` markers @@ -426,6 +393,24 @@ A more permissive design would add the isolates and strongly directional marks t whitespace in the syntax and depend on users/editors to appropriately pair or position the marks to get optimal display. +## Alternatives Considered + +_What other solutions are available?_ +_How do they compare against the requirements?_ +_What other properties they have?_ + +### Nothing +We could do nothing. + +A likely outcome of doing nothing is that RTL users would insert bidi controls into +_messages_ in an attempt to make the _pattern_ and/or _placeholders_ display correctly. +These controls would become part of the output of the _message_, +showing up inappropriately at runtime. 
+Because these characters are invisible, users might be very frustrated trying to manage +the results or debug what is wrong with their messages. + +By contrast, if users insert too many or the wrong controls using the recommended design, +the _message_ would still be functional and would emit no undesired characters. ### Super-loose isolation @@ -433,17 +418,7 @@ Add isolates and strongly directional marks to required and optional whitespace This would permit users to get the effects described by the above design, as long as they use isolates/marks in a "responsible" way. -The exception to this is the namespace separator, used in `identifier`. -This requires the ability to insert isolates or strongly directional marks -between the namespace and name portions, where whitespace is not permitted. -This is the only location in the syntax where such characters might be needed -but whitespace is not at least optional. -This could be defined as: -```abnf -ns-separator = [bidi] ":" [bidi] -``` - -Here are the other ABNF changes: +(Omitting other changes found in #673) ```abnf ; strongly directional marks and bidi isolates @@ -472,7 +447,7 @@ s = ( SP / HTAB / CR / LF / %x3000 ) ### Strict isolation all the time Apply bidi isolates in a strict way. -In this design: +The main differences to the proposed solution is: 1. The open/close isolate characters are syntactically required to be paired. This introduces parse errors for unpaired invisible characters, which could lead to bad user experiences. @@ -492,7 +467,7 @@ markup = "{" [s] "#" identifier [bidi] *(s option) *(s attribute) [s] [" / "{" [s] "/" identifier [bidi] *(s option) *(s attribute) [s] "}" ; close / "{" LRI [s] "/" identifier [bidi] *(s option) *(s attribute) [s] close-isolate "}" ; close identifier = [(namespace ns-separator)] name -ns-separator = [bidi] ":" [bidi] +ns-separator = [bidi] ":" bidi = [ %x200E-200F / %x061C ] ``` @@ -635,8 +610,6 @@ adherence to the stricter grammar. 
syntax errors - Provides a foundation for tools to claim strict conformance and message normalization as well as guidance to implementers to make them want to adopt it -- Messages are valid while being edited (such as when the open or close isolate has been - inserted but the corresponding opposite isolate hasn't been entered yet) **Cons** - Requires additional effort to maintain the grammar diff --git a/exploration/expression-attributes.md b/exploration/expression-attributes.md index 0253fc49e..2edde5613 100644 --- a/exploration/expression-attributes.md +++ b/exploration/expression-attributes.md @@ -1,6 +1,6 @@ # Expression Attributes -Status: **Accepted** +Status: **Proposed**
Metadata @@ -15,8 +15,6 @@ Status: **Accepted**
#772
#780
#792
-
#845
-
#846
diff --git a/exploration/function-composition-part-1.md b/exploration/function-composition-part-1.md index 3fb267713..ca392386f 100644 --- a/exploration/function-composition-part-1.md +++ b/exploration/function-composition-part-1.md @@ -1,6 +1,6 @@ # Function Composition -Status: **Obsolete** +Status: **Proposed**
Metadata @@ -11,20 +11,22 @@ Status: **Obsolete**
2024-03-26
Pull Requests
#753
-
#806
-## Objectives +## Objective -* Present a complete list of alternative designs for how to -provide the machinery for function composition. -* Create a shared vocabulary for discussing these alternatives. +_What is this proposal trying to achieve?_ -> [!NOTE] -> This design document is preserved as part of a valuable conversation about -> function composition, but it is not the basis for the design eventually -> accepted. +### Non-goal + +The objective of this design document is not to make +a concrete proposal, but rather to explore a problem space. +This space is complicated enough that agreement on vocabulary +is desired before defining a solution. + +Instead of objectives, we present a primary problem +and a set of subsidiary problems. ### Problem statement: defining resolved values @@ -836,10 +838,7 @@ so that functions can be passed the values they need. It also needs to provide a mechanism for declaring when functions can compose with each other. -### Guarantee portability - -A message that has a valid result in one implementation -should not result in an error in a different implementation. +Other requirements: ### Identify a set of use cases that must be supported @@ -976,217 +975,26 @@ Hence, revisiting the extensibility of the runtime model now that the data model is settled may result in a more workable solution. -## Alternatives to be considered - -The goal of this section is to present a _complete_ list of -alternatives that may be considered by the working group. - -Each alternative corresponds to a different concrete -definition of "resolved value". - -## Introducing type names - -It's useful to be able to refer to three types: - -* `InputType`: This type encompasses strings, numbers, date/time values, -all other possible implementation-specific types that input variables can be -assigned to. The details are implementation-specific. -* `MessageValue`: The "resolved value" type; see [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). 
-* `ValueType`: This type is the union of an `InputType` and a `MessageValue`. - -It's tagged with a string tag so functions can do type checks. - -``` -interface ValueType { - type(): string - value(): unknown -} -``` - -## Alternatives to consider - -In lieu of the usual "Proposed design" and "Alternatives considered" sections, -we offer some alternatives already considered in separate discussions. - -Because of our constraints, implementations are **not required** -to use the `MessageValue` interface internally as described in -any of the sections. -The purpose of defining the interface is to guide implementors. -An implementation that uses different types internally -but allows the same observable behavior for composition -is compliant with the spec. - -Five alternatives are presented: -1. Typed functions -2. Formatted value model -3. Preservation model -4. Allow both kinds of composition -5. Don't allow composition - -### Typed functions - -Types are a way for users of a language -to reason about the kinds of data -that functions can operate on. -The most ambitious solution is to specify -a type system for MessageFormat functions. - -In this solution, `ValueType` is not what is defined above, -but instead is the most general type -in a system of user-defined types. -(The internal definitions are omitted.) -Using the function registry, -each custom function could declare its own argument type -and result type. -This does not imply the existence of any static typechecking. 
- -Example B1: -``` - .local $age = {$person :getAge} - .local $y = {$age :duration skeleton=yM} - .local $z = {$y :uppercase} -``` - -In an informal notation, -the three custom functions in this example -have the following type signatures: - -``` -getAge : Person -> Number -duration : Number -> String -uppercase : String -> String -``` - -The [function registry data model](https://github.com/unicode-org/message-format-wg/blob/main/spec/registry.md) -could be extended to define `Number` and `String` -as subtypes of `MessageValue`. -A custom function author could use the custom -registry they define to define `Person` as -a subtype of `MessageValue`. - -An optional static typechecking pass (linting) -would then detect any cases where functions are composed in a way that -doesn't make sense. The advantage of this approach is documentation. - -### Formatted value model (Composition operates on output) - -To implement the "formatted value" model, -the `MessageValue` definition would look as in [PR 728](https://github.com/unicode-org/message-format-wg/pull/728), but without -the `resolvedOptions()` method: - -```ts -interface MessageValue { - formatToString(): string - formatToX(): X // where X is an implementation-defined type - getValue(): ValueType - selectKeys(keys: string[]): string[] -} -``` - -`MessageValue` is effectively a `ValueType` with methods. - -Using this definition would make some of the use cases -impractical. For example, the result of Example A4 -might be surprising. Also, Example 1.3 from -[the dataflow composability design doc](https://github.com/unicode-org/message-format-wg/blob/main/exploration/dataflow-composability.md) -wouldn't work because options aren't preserved. - -### Preservation model (Composition can operate on input and options) - -In the preservation model, -functions "pipeline" the input through multiple calls. 
- -The `ValueType` definition is different: - -```ts -interface ValueType { - type(): string - value(): InputType | MessageValue -} -``` - -The resolved value interface would include both "input" -and "output" methods: - -```ts -interface MessageValue { - formatToString(): string - formatToX(): X // where X is an implementation-defined type - getInput(): ValueType - getOutput(): ValueType - properties(): { [key: string]: ValueType } - selectKeys(keys: string[]): string[] -} -``` - -Compared to PR 728: -The `resolvedOptions()` method is renamed to `properties`. -Individual function implementations -choose which options to pass through into the resulting -`MessageValue`. - -Instead of using `unknown` as the result type of `getValue()`, -we use `ValueType`, mentioned previously. -Instead of using `unknown` as the value type for the -`properties()` object, we use `ValueType`, -since options can also be full `MessageValue`s with their own options. -(The motivation for this is Example 1.3 from -[the "dataflow composability" design doc](https://github.com/unicode-org/message-format-wg/blob/main/exploration/dataflow-composability.md).) - -This solution allows functions to pipeline input, -operate on output, or both; as well as to examine -previously passed options. Any example from this -document can be implemented. - -Without a mechanism for type signatures, -it may be hard for users to tell which combinations -of functions compose without errors, -and for implementors to document that information -for users. - -### Allow both kinds of composition (with different syntax) - -By introducing new syntax, the same function could have -either "preservation" or "formatted value" behavior. 
- -Consider (this suggestion is from Elango Cheran): - -``` - .local $x = {$num :number maxFrac=2} - .pipeline $y = {$x :number maxFrac=5 padStart=3} - {{$x} {$y}} -``` - -`.pipeline` would be a new keyword that acts like `.local`, -except that if its expression has a function annotation, -the formatter would apply the "preservation model" semantics -to the function. - -### Don't allow composition for built-in functions - -Another option is to define the built-in functions this way, -notionally: - -``` -number : Number -> FormattedNumber -date : Date -> FormattedDate -``` - -The `MessageValue` type would be defined the same way -as in the formatted value model. - -The difference is that built-in functions -would not accept a "formatted result" -(would signal a runtime error in these cases). - -As with the formatted value model, this restricts the -behavior of custom functions. - -### Non-alternative: Allow composition in some implementations - -Allow composition only if the implementation requires functions to return a resolved value as defined in [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). - -This violates the portability requirement. +## Proposed design and alternatives considered + +These sections are omitted from this document and will be added in +a future follow-up document, +given the length so far and need to agree on a common vocabulary. + +We expect that any proposed design +would fall into one of the following categories: + +1. Provide a general mechanism for custom function authors +to specify how functions compose with each other. +1. Specify composition rules for built-in functions, +but not in general, allowing custom functions +to cooperate in an _ad hoc_ way. +1. Recommend a rich representation of resolved values +without specifying any constraints on how these values +are used. +(This is the approach in [PR 645](https://github.com/unicode-org/message-format-wg/pull/645).) +1. 
Restrict function composition for built-in functions +(in order to prevent unintuitive behavior). ## Acknowledgments diff --git a/exploration/maintaining-registry.md b/exploration/maintaining-registry.md deleted file mode 100644 index be2d141dc..000000000 --- a/exploration/maintaining-registry.md +++ /dev/null @@ -1,316 +0,0 @@ -# Maintaining and Registering Functions - -Status: **Proposed** - -
- Metadata -
-
Contributors
-
@aphillips
-
First proposed
-
2024-02-12
-
Pull Requests
-
#634
-
-
- -## Objective - -_What is this proposal trying to achieve?_ - -Describe how to manage the registration of functions and options under the -auspices of MessageFormat 2.0. -This includes the Standard Functions which are normatively required by MF2.0, -functions or options in the Unicode `u:` namespace, -and functions/options that are recommended for interoperability. - -## Background - -_What context is helpful to understand this proposal?_ - -MessageFormat v2 originally included the concept of "function registries", -including a "default function registry" required of conformant implementations. - -The terms "registry" and "default registry" suggest machine-readbility -and various relationships between function sets that the working group decided -was not appropriate. - -MessageFormat v2 includes a standard set of functions. -Implementations are required to implement all of the _selectors_ -and _formatters_ in this set, -including _operands_, _options_, and option values. -Our goal is to be as universal as possible, -making MFv2's message syntax available to developers in many different -runtimes in a wholly consistent manner. -Because we want broad adoption in many different programming environments -and because the capabilities -and functionality available in these environments vary widely, -this standard set of functions must be conservative in its requirements -such that every implementation can reasonably implement it. - -Promoting message interoperability can and should go beyond this. -Even when a given feature or function cannot be adopted by all platforms, -diversity in the function names, operands, options, error behavior, -and so forth remains undesirable. -Another way to say this is that, ideally, there should be only one way to -do a given formatting or selection operation in terms of the syntax of a message. - -This suggests that there exist a set of functions and options that -extends the standard set of functions. 
-Such a set contains the "templates" for functions that go beyond those every implementation -must provide or which contain additional, optional features (options, option values) -that implementations can provide if they are motivated and capable of doing so. -These specifications are normative for the functionality that they provide, -but are optional for implementaters. - -There also needs to be a mechanism and process by which functions in the default namespace -can be incubated for future inclusion in either the standard set of functions -or in this extended, optional set. - -### Examples - -_Function Incubation_ - -CLDR and ICU have defined locale data and formatting for personal names. -This functionality is new in CLDR and ICU. -Because it is new, few, if any, non-ICU implementations are currently prepared to implement -a function such as a `:person` formatter or selector. -Implementation and usage experience is limited in ICU. -Where functionality is made available, we don't want it to vary from -platform to platform. - -_Option Incubation_ - -In the Tech Preview (LDML45) release, options for `:number` (and friends) -and `:datetime` (and friends) were omitted, including `currency` for `:number` -and `timeZone` for `:datetime`. -The options and their values were reserved, possibly for the LDML46 release as required, -but they also might be retained at a lower level of maturity. - -## Use-Cases - -_What use-cases do we see? Ideally, quote concrete examples._ - -As an implementer, I want to know what functions, options, and option values are -required to claim support for MF2: -- I want to know what options I am required to implement. -- I want to know what the values of each option are. -- I want to know what the options and their values mean. -- I want to be able to implement all of the required functions using my runtime environment - without difficulty. 
-- I want to be able to use my local I18N APIs, which might use an older release of CLDR - or might not be based on CLDR data at all. - This could mean that my output might not match that of an CLDR-based implementation. - -As an implementer, user, translator, or tools author I expect functions, options -and option values to be stable. -The meaning and use of these, once established, should never change. -Messages that work today must work tomorrow. -This doesn't mean that the output is stabilized or that selectors won't -produce different results for a given input or locale. - -As an implementer, I want to track best practices for newer I18N APIs -(such as implementing personal name formatting/selection) -without being required to implement any such APIs that I'm not ready for. - -As an implementer, I want to be assured that functions or options added in the future -will not conflict with functions or options that I have created for my local users. - -As a developer, I want to be able to implement my own local functions or local options -and be assured that these do not conflict with future additions by the core standard. - -As a tools developer, I want to track both required and optional function development -so that I can produce consistent support for messages that use these features. - -As a translator, I want messages to be consistent in their meaning. -I want functions and options to work consistently. -I want to selection and formatting rules to be consistent so that I only have -to learn them once and so that there are no local quirks. - -As a user, I want to be able to use required functions and their options in my messages. -I want to be able to quickly adopt new additions as my implementation supports them -or be able to choose plug-in or shim implementations. -I never want to have to rewrite a message because a function or its options have changed. 
- -As an implementer or user, I want to be able to suggest useful additions to MF2 functionality -so that users can benefit from consistent, standardized features. -I want to understand the status of my proposal (and those of others) and know that a public, -structured, well-managed process has been applied. - -## Requirements - -_What properties does the solution have to manifest to enable the use-cases above?_ - -The Standard Function Set needs to describe the minimum set of selectors and formatters -needed to create messages effectively. -This must be compatible with ICU MessageFormat 1 messages. - -There must be a clear process for the creation of new selectors that are required -by the Standard Function Set, -which includes a maturation process that permits implementer feedback. - -There must be a clear process for the creation of new formatters that are required -by the Standard Function Set, -which includes a maturation process that permits implementer feedback. - -There must be a clear process for the addition of options or option values that are required -by the Standard Function Set, -which includes a maturation process that permits implementer feedback. - -There must be a clear process for the deprecation of any functions, options, or option values -that are no longer I18N best practices. -The stability guarantees of our standard do not permit removal of any of these. - -## Constraints - -_What prior decisions and existing conditions limit the possible design?_ - -## Proposed Design - -_Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._ - -The MessageFormat WG will release a set of specifications -that standardize the implementation of functions and options in the default namespace of -MessageFormat v2 beginning with the LDML46 release. 
-Implementations and users are strongly discouraged from defining -their own functions or options that use the default namespace -Future updates to these sets of functions and options will coincide with LDML releases. - -Each _function_ is described by a single specification document. -Each such document will use a common template. -A _function_ can be a _formatting function_, -a _selector_, -or both. - -The specification will indicate if the _formatting function_, -the _selector function_, or, where applicable, both are `Standard` or `Optional`. -The specification must describe operands, including literal representations. - -The specification includes all defined _options_ for the function. -Each _option_ must define which values it accepts. -An _option_ is either `Standard` or `Optional`. - -_Functions_ or _options_ that have an `Optional` status -must have a maturity level assigned. -The maturity levels are: -- **Proposed** -- **Accepted** -- **Released** -- **Deprecated** - -_Functions_ and _options_ that have a `Standard` status have only the -`Released` and `Deprecated` statuses. - -* An _option_ can be `Standard` for an `Optional` function. - This means that the function is optional to implement, but that, when implemented, must include the option. -* An _option_ can be `Optional` for a `Standard` function. - This means that the function is required, but implementations are not required to implement the option. -* An _option_ can be `Optional` for an `Optional` function. - This means that the function is optional to implement and the option is optional when implementing the function. - -A function specification describes the functions _operand_ or _operands_, -its formatting options (if any) and their values, -its selection options (if any) and their values, -its formatting behavior (if any), -its selection behavior (if any), -and its resolved value behavior. - -`Standard` functions are stable and subject to stability guarantees. 
-Such entries will be limited in scope to functions that can reasonably be -implemented in nearly any programming environment. -> Examples: `:string`, `:number`, `:datetime`, `:date`, `:time` - - -`Optional` functions are stable and subject to stability guarantees once they -reach the status of **Released**. -Implmentations are not required to implement _functions_ or _options_ with an `Optional` status -when claiming MF2 conformance. -Implementations MUST NOT implement functions or options that conflict with `Optional` functions or options. - -`Optional` values may have their status changed to `Standard`, -but not vice-versa. - -> Option Examples `:datetime` might have a `timezone` option in LDML46. -> Function Examples: We don't currently have any, but potential work here -> might includes personal name formatting, gender-based selectors, etc. - -The CLDR-TC reserves the `u:` namespace for use by the Unicode Consortium. -This namespace can contain _functions_ or _options_. -Implementations are not required to implement these _functions_ or _options_ -and may adopt or ignore them at their discretion, -but are encouraged to implement these items. - -Items in the Unicode Reserved Namespace are stable and subject to stability guarantees. -This namespace might sometimes be used to incubate functionality before -promotion to the default namespace in a future release. -In such cases, the `u:` namespace version is retained, but deprecated. -> Examples: Number and date skeletons are an example of Unicode extension -> possibilities. -> Providing a well-documented shorthand to augment "option bags" is -> popular with some developers, -> but it is not universally available and could represent a barrier to adoption -> if normatively required. - -All `Standard`, `Optional`, and Unicode namespace function or option specifications goes through -a development process that includes these levels of maturity: - -1. 
**Proposed** The _function_ or _option_, along with necessary documentation, - has been proposed for inclusion in a future release. -2. **Accepted** The _function_ or _option_ has been accepted but is not yet released. - During this period, changes can still be made. -3. **Released** The _function_ or _option_ is accepted as of a given LDML release that MUST be specified. -4. **Deprecated** The _function_ or _option_ was previously _released_ but has been deprecated. - Implementations are still required to support `Standard` functions or options that are deprecated. -5. **Rejected** The _function_ or _option_ was considered and rejected by the MF2 WG and/or the CLDR-TC. - Such items are not part of any standard, but might be maintained for historical reference. - -A proposal can seek to modify an existing function. -For example, if a _function_ `:foo` were an `Optional` function in the LDMLxx release, -a proposal to add an _option_ `bar` to this function would take the form -of a proposal to alter the existing specification of `:foo`. -Multiple proposals can exist for a given _function_ or _option_. - -### Process - -Proposals for additions are made via pull requests in a unicode-org github repo -using a specific template TBD. -Proposals for changes are made via pull requests in a unicode-org github repo -using a specific template TBD against the existing specification for the function or option. - -Proposals must be made at least _x months_ prior to the release date to be included -in a specific LDML release. -The CLDR-TC will consider each proposal using _process details here_ and make a determination. -The CLDR-TC may delegate approval to the MF2 WG. -Decisions by the MF2 WG may be appealed to the CLDR-TC. -Decisions by the CLDR-TC may be appealed using _existing process_. - -Technical discussion during the approval process is strongly encouraged. 
-Changes to the proposal, -such as in response to comments or implementation experience, are permitted -until the proposal has been approved. -Once approved, changes require re-approval (how?) - - -The timing of official releases of the Standard Function Set and Optional Set is the same as CLDR/LDML. -Each LDML release will include: -- **Released** specifications in the Standard Function Set -- **Released** specifications in the Unicode reserved namespace -- a section of the MF2 specification specifically incorporating versions of the above -- **Accepted** entries for each of the above available for testing and feedback - -Proposals for additions to any of the above include the following: -- a design document, which MUST contain: - - the exact text to include in the MF2 specification using a template to be named later - -Each proposal is stored in a directory indicating indicating its maturity level. -The maturity levels are: -- **Accepted** Items waiting for the next CLDR release. -- **Released** Complete designs that are released. -- **Proposed** Proposals that have not yet been considered by the MFWG or which are under active development. -- **Rejected** Proposals that have been rejected by the MFWG in the past. - -## Alternatives Considered - -_What other solutions are available?_ -_How do they compare against the requirements?_ -_What other properties they have?_ diff --git a/exploration/registry-xml/README.md b/exploration/registry-xml/README.md index a3a3a6890..75b049041 100644 --- a/exploration/registry-xml/README.md +++ b/exploration/registry-xml/README.md @@ -163,8 +163,7 @@ For the sake of brevity, only `locales="en"` is considered. 
Given the above description, the `:number` function is defined to work both in a selector and a placeholder: ``` -.input {$count :number} -.match $count +.match {$count :number} 1 {{One new message}} * {{{$count :number} new messages}} ``` diff --git a/exploration/selection-declaration.md b/exploration/selection-declaration.md index 39215cc18..0e2c8abc7 100644 --- a/exploration/selection-declaration.md +++ b/exploration/selection-declaration.md @@ -1,6 +1,6 @@ # Effect of Selectors on Subsequent Placeholders -Status: **Accepted** +Status: **Proposed**
Metadata @@ -10,14 +10,7 @@ Status: **Accepted**
First proposed
2024-03-27
Pull Requests
-
#755
-
#824
-
#860
-
#867
-
#877
-
Ballot
-
#872 (discussion)
-
#873 (voting)
+
#000
@@ -146,16 +139,6 @@ _What use-cases do we see? Ideally, quote concrete examples._ * {{...}} ``` - As another example of where the selection function and formatting functions differ, consider a person object provided as a formatting input. - A `:gender` function can return the person's gender, - but a `:personName` person name formatter function formats the name. - ``` - .match {$person :gender} - male {{Bienvenido {$person :personName}}} - female {{Bienvenida {$person :personName}}} - other {{Le damos la bienvenida {$person :personName}}} - ``` - ## Requirements _What properties does the solution have to manifest to enable the use-cases above?_ @@ -166,11 +149,14 @@ _What prior decisions and existing conditions limit the possible design?_ ## Proposed Design -The design alternative [Match on variables instead of expressions](#match-on-variables-instead-of-expressions) -described below is selected. +_Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._ ## Alternatives Considered +_What other solutions are available?_ +_How do they compare against the requirements?_ +_What other properties they have?_ + ### Do nothing In this alternative, selectors are independent of declarations. @@ -189,7 +175,6 @@ Examples: **Pros** - No changes required. - `.local` can be used to solve problems with variations in selection and formatting -- No confusion or overlap of keywords' behavior (ex: `.match`, `.input`) - Supports multiple selectors on the same operand **Cons** @@ -197,52 +182,6 @@ Examples: - Can produce a mismatch between formatting and selection, since the operand's formatting isn't visible to the selector. -### Require annotation of selector variables in placeholders - -In this alternative, the pre-existing validity requirement - -> Each _selector_ MUST have an _annotation_, -> or contain a _variable_ that directly or indirectly references a _declaration_ with an _annotation_. 
- -is expanded to also require later uses of a variable that's used as a selector to be annotated: - -> In a _complex message_, -> each _placeholder_ _expression_ using the same _operand_ as a _selector_ MUST have an _annotation_, -> or contain a _variable_ that directly or indirectly references a _declaration_ with an _annotation_. - -Example invalid message with this alternative: -``` -.match {$n :number minimumFractionDigits=2} -* {{Data model error: {$n}}} -``` - -Valid, recommended form for the above message: -``` -.input {$n :number minimumFractionDigits=2} -.match {$n} -* {{Formats '$n' as a number with fraction digits: {$n}}} -``` - -Technically valid but not recommended: -``` -.input {$n :integer} -.match {$n :number minimumFractionDigits=2} -* {{Formats '$n' as an integer: {$n}}} - -.match {$n :number minimumFractionDigits=2} -* {{Formats '$n' as an integer: {$n :integer}}} -``` - -**Pros** -- No syntax changes required. -- `.local` can be used to solve problems with variations in selection and formatting -- Supports multiple selectors on the same operand -- Avoids mismatches between formatting and selection by requiring their annotation. - -**Cons** -- May require the user to annotate the operand for both formatting and selection, - unless they use a declaration. - ### Allow both local and input declarative selectors with immutability In this alternative, we modify the syntax to allow selectors to @@ -278,8 +217,6 @@ declaration = s variable [s] "=" [s] expression - Produces an error when users inappropriately annotate some items **Cons** -- Complexity: `.match` means more than one thing -- Complexity: `.match` implicitly creates a new lexical scope - Selectors can't provide additional selection-specific options if the variable name is already in scope - Doesn't allow multiple selection on the same operand, e.g. @@ -312,8 +249,6 @@ Instead the selector's annotation replaces what came before. - Shorthand version works intuitively with minimal typing. 
**Cons** -- Complexity: `.match` means more than one thing -- Complexity: `.match` implicitly creates a new lexical scope - Violates immutability that we've established everywhere else ### Allow _immutable_ input declarative selectors @@ -335,7 +270,7 @@ This implies that multiple selecton on the same operand is pointless. .match {$num :number maximumFractionDigits=0} * {{This message produces a Duplicate Declaration error}} -.match {$num :integer} {$num :number} +.input {$num :integer} {$num :number} * * {{This message produces a Duplicate Declaration error}} ``` @@ -345,8 +280,6 @@ This implies that multiple selecton on the same operand is pointless. - Produces an error when users inappropriately annotate some items **Cons** -- Complexity: `.match` means more than one thing -- Complexity: `.match` implicitly creates a new lexical scope - Selectors can't provide additional selection-specific options if the value has already been annotated - Doesn't allow multiple selection on the same operand, e.g. @@ -388,7 +321,6 @@ The ABNF change would look like: - Preserves immutability. **Cons** -- Complicates the situations where selection != formatting due to the strictness's design nudges - A separate declaration is required for each selector. ### Provide a `#`-like Feature @@ -426,69 +358,3 @@ and a data model error otherwise. Removes some self-documentation from the pattern. - Requires the pattern to change if the selectors are modified. - Limits number of referenceable selectors to 10 (in the current form) - -### Hybrid approach: Match may mutate, no duplicates - -In this alternative, in a `.match` statement: - -1. variables are mutated by their annotation -2. no variable can be the operand in two selectors - -This keeps most messages more concise, producing the expected results in Example 1. 
- -#### Example 1 - -``` -.match {$count :integer} -one {{You have {$count} whole apple.}} -* {{You have {$count} whole apples.}} -``` -is precisely equivalent to: - -#### Example 2 -``` -.local $count2 = {$count :integer} -.match {$count2} -one {{You have {$count2} whole apple.}} -* {{You have {$count2} whole apples.}} -``` - -This avoids the serious problems with mismatched selection and formats -as in Example 1 under "Do Nothing", whereby the input of `count = 1.2`, -results the malformed "You have 1.2 whole apple." - -Due to clause 2, this requires users to declare any selector using a `.input` or `.local` declaration -before writing the `.match`. That is, the following is illegal. - -#### Example 3 -``` -.match {$count }{$count } -``` -It would need to be rewritten as something along the lines of: - -#### Example 4 -``` -.local $count3 = {$count} -.match {$count }{$count3 } -``` -Notes: -- The number of times the same variable is used twice in a match (or the older Select) is vanishingly small. Since it is an error — and the advice to fix is easy — that will prevent misbehavior. -- There would be no change to the ABNF; but there would be an additional constraint in the spec, and relaxation of immutability within the .match statement. 
- -**Pros** -- No new syntax is required -- Preserves immutability before and after the .match statement -- Avoids the serious problem of mismatch of selector and format of option "Do Nothing" -- Avoids the extra syntax of option "Allow both local and input declarative selectors with immutability" -- Avoids the problem of multiple variables in "Allow immutable input declarative selectors" -- Is much more consise than "Match on variables instead of expressions", since it doesn't require a .local or .input for every variable with options -- Avoids the readability issues with "Provide a #-like Feature" - -**Cons** -- Complexity: `.match` means more than one thing -- Complexity: `.match` implicitly creates a new lexical scope -- Violates immutability that we've established everywhere else -- Requires additional `.local` declarations in cases where a variable would occur twice - such as `.match {$date :date option=monthOnly} {$date :date option=full}` - - diff --git a/meetings/2024/notes-2024-08-19.md b/meetings/2024/notes-2024-08-19.md deleted file mode 100644 index f7a704243..000000000 --- a/meetings/2024/notes-2024-08-19.md +++ /dev/null @@ -1,272 +0,0 @@ -# 19 August 2024 | MessageFormat Working Group Teleconference - -### Attendees -- Addison Phillips - Unicode (APP) - chair -- Eemeli Aro - Mozilla (EAO) -- Mihai Niță - Google (MIH) -- Elango Cheran - Google (ECH) -- Richard Gibson - OpenJSF (RGN) - -Scribe: MIH, help from ECH - -## Topic: Info Share - -## Topic: Issue review -https://github.com/unicode-org/message-format-wg/issues -Currently we have 58 open (was 64 last time). -15 are Preview-Feedback -0 are resolve-candidate and proposed for close. -0 are Agenda+ and proposed for discussion. -0 are ballots - - - -### #859 [DESIGN] Number selection design refinements #859 - -EAO : do you get my point about why I find problematic to take some of the options instead of all the options - -APP: I think we are all grappling with the same problems. 
-As a message writer, if the UX asks me for certain values, how do I write them. -For integers it is obvious, but decimals are problematic. -Problems with different notations, scientific, etc. -How do I select 1.2345 ? -In code how does the string get into a number? -There is no easy way out of that. - -EAO: I would say that fundamentally if we need a match for 1.2345 the dev would create a value from data outside MF2 that is testable. - -We should offer no support to make it easy in MF2 to do rounding in code, etc. - -MIH: I cannot find any good use case to provide `|1.00|` as a key in a `.match` statement. This existed in MF1 for a long time, and it does matching numbers, and nobody complained. Matching ` |$1.00|` is already handled by plural rules. - -APP: Yes, I semi-agree with that. The only case is when the exact match value also matches a plural category that is specified. Ex: if you have driving directions that say - - -Think “in half a mile” or situations like that. -Maybe we should stick to our guns in regard to decimal matching. - -To EAO’s point about formatting outside, occasionally we need selections on non-numbers. - -MIH: NumberFormat does selection on fractionals. It’s not like it only works on integers. If you do numeric match on 0.5, it’s fine, it results in “in half a mile”. There’s not a case where you match on `0.5` to say “in half a mile”, but you say something different for 0.500. - -EAO: The “in half a mile” example sounds like choice format. You’re not looking for a specific value, but insead, a range of values. - -One can write a custom selection function. - -APP: time-box this. -I will go back to the design doc and make some changes. -To simplify the selection, to integer selection. - -We can discuss some more. But I think we can have a path forward. -Without making this too complicated. - -EAO: Technically, what we have is not matching on integers. It’s matching on all numbers, but we haven’t defined how the matching is to be done. 
- -APP: Understood, I’ll revise the design -At least document better where the “sharp edges” are. - -ECH: Is there a way to solve this in the registry? -If we need to do selection on decimal numbers? - -APP: exact match on decimal numbers, we can look at the serialization and match against serialization. -If I give you a big decimal 1.00000001 + options, how do I know if this matches? - -ECH: ICU had to do this already, both in C++ and Java. - -MIH: I’m still debating whether it is good to match on a formatted number. I still think it is worth considering whether it is beneficial to have the keys as strings and then match as numbers. -If the keys are numbers, and we compare numbers, it is a lot easier. - -EAO: Should we consider the options for number formatting. We have an example in ICU MessageFormat. You can offset for a plurals / ordinal selector. That option is not taken into account for exact matching, but it is taken into account for plural rule category matching. - -APP: I will rehash the design document? -### #845 Accept attributes design & remove spec note #845 - -APP Has merge conflicts. -I think it only implements things that we already agred on. -### #824: Select "Match on variables instead of expressions" for selection-declarations #824 - -APP: EAO, you were not here. -I think we should go to ballot with this. -You might want to go look at the notes - -EAO: can we narrow down the options to a smaller set? - -APP: personally I would like to see an emerging consensus. - -ECH: RGN, since you are here, how do you feel about this PR? -Because I couldn't really quite tell. -Also refers to the discussion in #736. - -ECH: we have a .match and we put the function in the selectors. -Do we or do we not that the options we provide to the selectors are “sticky” to the arguments we are giving to the selection. - -RGN: people can have intuition in both directions. -Existing tools can eliminate ambiguity with an extra declaration. - -APP: I hear that. 
Doing nothing is confusing, at least to me. -I look at the message in the example, and it feels messed up. -When we write date selections, we would want to get to the calendar. -What can I do so that if I write an `.input` I don’t repeat myself again in `.match` -We can disallow any kind of expressions in the selector. - -EAO: as we discovered, when we look at the syntax of `.match` we don’t all agree that the expression there is modifying the argument or not. -Do you agree that this is a problem, and we should not allow this confusion to arise? -With the cost that when we select on an expression that does not show in formatting we still have to declare with an `.input` or `.local`. - -ECH: when you select on a person object, the selection might be on the gender of the person. -But the formatting of the name depends on the whole `Person` object. Ex: - -``` -.match {$person :gender} -male Bienvenido {$person :person} -female Bienvenida {$person :person} -other Bienvendos {$person :person} -``` - -I think we are conflating these operations. -If someone needs to repeat, they have the option to declare, today. -There is no need to force it on them. -Legislating things adds friction. - -APP: when you select on numbers, you want to select on the same things as what you format. -I can imagine selectors where you want a different selector than the formatter. -That means “double annotation”. Some selectors might be better not-annotated. - - -RGN: concision and simplicity are not the same thing. -Extra verbosity is well worthy for extra clarity. -I mean to require an explicit declaration. - -ECH: you don’t need to require it, but if you want to match selection and formatting, you need to declare. -This is compatible with “do nothing”, I think. - -RGN: I am in favor to forbid the possibility of the formatting options used for selection be different from those used in formatting the pattern. 
- -APP: look carefully at the examples in the “leave as is” - -ECH: I understand the plural selection example very well, in which the formatting required for selection should be reused when formatting the input number for the message pattern. However, I gave you a case right above where the formatting for selection is not the same as the formatting for the message. That’s a counter example proving that we can’t say in all cases that selection “formatting” is the same as pattern formatting. - -Because it’s not universal in all cases, I can’t support the alternative based on an assumption that these concepts are connected in all cases. That alternative to “match on variables instead of expressions” is legislating that formatting is done in a `.local` and referenced as a variable in `.match`, which is motivated by the idea that the formatting of the placeholder in the pattern will be the same as the selector, and that’s not always true. - -MIH: I would like to not changing the syntax, in which we support assignments in the `.match`. But we can make it a data model error as EAO suggested if there is a different function used in the selector vs. the function used in a placeholder of a pattern. If this is what EAO described, I would be fine with it. - -EAO: this alternative says that you need to annotate a selector. -The new wording would be that if that when you use a variable in a selector, everywhere where you use that should be annotated. - -APP: let’s add that at the doc, read the doc afterwards. -Let’s see if we can close in and decide what we do here. - -### #834 Update the stability policy #834 - -EAO: what we promise (too much) is that a format will always format the same, forever. -What I think we should promise is that it will format to something valid. - -APP: the output might change (because locale data changes). 
-But the output will not “break” - -EAO: if it formats “fine” with v2.0 then it will format “fine” with v2.1 - -APP: I was thinking “would not produce an error” -I think you suggest something stronger. - -EAO: if an implementation changes nothing, only moves from v2.0 to v2.1 the result will still be formatted without error and without fallback, if v2.0 was without error and without fallback. -“No error” is less than what I want. - -APP: I added some extra requirements. -I am trying to make all of these promises, making them properly enumerated, rather than one portmanteau promise. - -EAO: I think I want that portmanteau promise. -It would not want to go from a non-error to an error when we update. - -APP: a stability policy is a strait jacket we agree to put on ourselves. -As opposed to things that we do outside of policy. -I don’t want to overpromise, because that would prevent honest corrections. -### #634 [DESIGN] registry maintenance #634 - -EAO: … - -APP: this PR is a design doc, on how to manage the registries. -I’m open to suggestions. -But we are starting to play with the registries. -How will we work with these? -This is a swipe at that. -I would like to submit as proposed, so that we can make it easier to read it. -And open to update afterward. - -EAO: I’m entirely on board describing functions and options one is allowed (not not required) to implement. -“If you do this, do it this way” - -I’m on board with `u:` for the “global” options described. -What would a `u:` function do? Why would it exist? Why put it in the `u;` space and not in the standard one? - -APP: I didn’t want to say “we will never put a function in the `u:` namespace.. -We already have functions for testing (for conformance) that might live in the `u:` space. - -EAO: is “registry” the right word to use for these things? -Should we be calling them default functions? -Sounds as if we can choose between registries? - -APP: everyone is required to have the default one. 
-And there can be extras, at all kind of levels (proposed, company, application) -We’ve been using “registry” for years now. - -MIH: You touched on the fact that IANA might want to use that. If we don’t like “default”, then we can call it “standard” to say that you cannot take out things from it or change it. The word “registry” does not bother me that much. - -EAO: in our communication we often talked about “the registry” -Maybe we should not use “registry” when we talks about the standard functions. -And I think that “RGI” is too clunky. - -APP: specifications, what forms “the registry”. -It is not machine readable. It used to be, but we took it away. -It is the same as when you registry various URI prefixes. You write a spec. -And have a status, maybe a namespace. -So it is a collection of specs. -That might be a way to thing about it. -The RGI was intentionally “clunky” - -EAO: built-in functions sounds like a name that is user friendly. -“Registry” is an implementation detail. - -MIH: The registry is growing over time. Also, all functions are on an equal footing: functions that are in the standard registry are no more special than functions that are in private registries. That is why I prefer calling them the same thing. - -APP: maybe we can talk about “registration process” -With “recommended addons” or something like that. -Might not be a registry file that one can download. - -ECH: are we bikeshedding the name? - -APP: yes - -ECH: I am not “wedded” to the word “registry” -I’m cool with “registration process”, but consistency (standard functions, company functions, custom functions) - -EAO: need to talk about how the registries will be versioned. -Is there a way to implement an updated version of the registered functions. - -ECH: a hassle to deal with different versions of various parts. - -MIH: the main reason we separated the spec proper from the registry was to be able to add functions without changing the spec. -Changing the spec version sounds scary. 
-Will all my tooling break (if the spec changed), when all we did was add an option to a function? -We can maybe version “the whole registry”, separate from the MF2 spec proper (syntax, data model, etc) - -APP: I think the function registries can be LDML versioned, and my expectation is that we might change in time. We can try to stabilize, as each implementation is required to implement those functions. -We can use the “twice per year” LDML version. -Gives us some kind of predictability. - -EAO: biannual tied to LDML sounds like a good idea. -But for a user reading the spec, I am interested in options not required, but recommended. -Wouldn't it be more readable if they were in the same document? -For example if I want to implement the `:number` function I have to read 3 documents. -Isn’t that clunky? - -APP: OK - -I was thinking RGI as a bucket for “this is done, but not required” - -MIH: Can we do something like what ICU does for public APIs, where we tag them with statuses “draft”, “technical preview”, “beta”, “final”? - -APP: the things in the `u:` namespace live in a different place than the main registry. -So it needs rules for that too. - - diff --git a/meetings/2024/notes-2024-08-26.md b/meetings/2024/notes-2024-08-26.md deleted file mode 100644 index dbf27974e..000000000 --- a/meetings/2024/notes-2024-08-26.md +++ /dev/null @@ -1,151 +0,0 @@ -# 26 August 2024 | MessageFormat Working Group Teleconference - -### Attendees -- Addison Phillips - Unicode (APP) - chair -- Eemeli Aro - Mozilla (EAO) -- Elango Cheran - Google (ECH) -- Mihai Niță - Google (MIH) -- Mark Davis - Google (MED) -- Richard Gibson - OpenJSF (RGN) - -* Scribe: ECH - -## Agenda -MED: The final date for the spec is Sept 25. Last time it was delayed a bit. This time, we need to not update the spec past Sept 25 in ways that would make ICU4C and ICU4J implementations invalid. - -APP: Yes, and regarding deadlines, we either need to make faster progress, or we will slip into next spring. 
- -MED: One question for MIH is how much have the changes in the spec so far affected ICU? - -MIH: I haven’t tried to implement them as they happened. - -MED: You’re caught up, right? - -MIH: No, not yet. We have a PR for ICU4C and updated ICU4J to make the shared MF2 tests all pass, which helps, but we’re not there yet. - -EAO: Even if we don’t get out of Technical Preview for LDML 46, we should still get out a release. I have updated the JS implementation to the current version of the spec. I’ve added changes to the tests. - -APP: A thing that I’m concerned about is that some of the things that are coming soon might have impact. Some will make changes to whitespace or Bidi controls. That may not have major impact but we need to make sure the details work. Other things may have more impact like how we deal with selectors and related declarations. -I promise to set up meetings instead of having face to face meetings. We have to come to a call of whether we’re exiting Tech Preview for CLDR 46 or not. - -EAO: As a comment from the work I did is that the biggest changes I’ve made are reification of attributes. Also, changing the data model representation of options as a mapping. Duplicate options and attribute usage are called errors, but they can’t show up in the data model, so I call them “syntax errors”. - -MED: For things where we previously disallowed things to be allowed is much easier to do for implementers, because old messages are still valid. - -APP: Dates are coming up. - -## Topic: Info Share - -ECH: Glad we’re talking about testing, feedback. Another thing I mentioned before is conformance testing. Hasn’t been brought up. Did some improvements. Lot of concern because setup wasn’t good before. TEsting the tests we had. Testing against ICU74… surprise! There are going to be errors. Still some things to resolve. 
Link to dashboard to see the green-ness: - -https://unicode-org.github.io/conformance/ - -EAO: On Dec 13 I will be talking with the Finnish National Committee to talk about how Finnish gets represented in CLDR, and whether it works for Finnish. There could be similar entities out there. - -### #863 Add tests for pattern selection - -EAO: There is somewhere in the spec where a lowercase “should” should be a capital SHOULD or MUST. - -MED: I want to say that it should be a capital SHOULD. The reason that it cannot be a capital MUST is that we do have to adjust plural rules. An example is that we realized recently that pluralization for French changes for compact notation when you get up into the millions. Also, there is a change for Wolof that we had to do. - -EAO: I am also in favor of making the recommendation be a capital SHOULD. We should separate the SHOULD recommendations back to MUST recommendations. - -APP: Let’s have a PR for that. Any objection to merging this PR, and then we can make followup - -### #862 Miscellaneous test fixes - -## Topic: Disallow “whitespace or special char” prefixed `.` in reserved-statement’s body (#840) -RGN: I would like to see a more significant change to be made, if we are to make a change. - -EAO: I would like to see a small change, and move from there. - -APP: I don’t understand the change because it moves part of `nameChar` to `name`, and maybe `nameChar` is used somewhere else apart from `name`. What’s the benefit here? - -RGN: There is a description of the benefit in the commentary of the Markdown. The approximate motivation is to recover in a parse error, such that the statement body should not be so broad that something that is truncated should not be a reserved truncated statement. - -MIH: I’m not against it, but it seems wrong to tinker with small things here and there. We are changing one production in the grammar in reserved statements, and redefine what reserved means. How often are we going to see this? 
I think we put too much emphasis on the whole reserved thing, and it feels like busy work. - -APP: I think the purpose is that if we ever added another keyword, we want to recognize the end of the statement. The purpose of reserved statements is to have it be possible to `.foo` and see that even when that statement is broken, we could still proceed to the `.match` and see that it was valid. If this PR helps us with that, I think it’s useful. - -EAO: This happens in practice when you’re editing a message. It creates a bad user experience for an author of a message using tools. A further step from this proposal is if we don’t allow whitespace to show up in a statement, it allows up to drop the requirement for a ___ statement to end in an expression. - -MIH: I think - -APP: I think we need to study it more as a group. - -ECH: Just observing that the discussion makes it sound like we’re over optimization based on future use cases that we haven’t seen. - -RGN: I think this has relevance to more than just syntax highlighting during editing. Reserved statement syntax affects which declarations would be valid in the future. - -https://github.com/unicode-org/message-format-wg/issues/547 - -The change in this PR seems to address at least one bullet point from that issue (`.strict true .local $var = {|val|}`), which gives me more confidence that it is a step in the right direction. - -APP: One thing to note is that our grammar says that it is consistent with `ncName`, although this change would make that not true. - -MIH: The fact that this change fixes an issue that RGN brought up a while ago by accident makes me worried that it’s breaking something by accident because we’re just tinkering. - -EAO: This PR doesn’t relax the requirement of having an expression at the end of a reserved statement. So what it does is intentional, and what RGN wants would be a further change. - -APP: Let’s study this seriously and discuss next week. 
- -## Topic: Selection-declaration (#824) -_Discuss the design options seeking WG consensus. Timeboxed to 15 minutes or will go to ballot._ - -https://github.com/unicode-org/message-format-wg/blob/main/exploration/selection-declaration.md - -MED: I put in a comment that I think could solve the problem. https://github.com/unicode-org/message-format-wg/pull/824#discussion_r1731496159 It would put in a restriction that would make EAO’s `$count` work right. - -APP: We don’t have a selector for `:date` or `:person`. It’s possible that they would want to produce different - -MED: It could happen more often, but it affects a relatively small percentage of cases. - -APP: I’ve been saying that we would ballot this a few weeks in a row without doing it. MED, I should check if your alternative is not already covered. If you want the alternative to include that proposal, please make a PR to the design doc. And then we can ballot this next week, but we need to ballot it because this is an important issue. - -## Topic: Bidi design (#811) -_Bidi and whitespace options need to be discussed in light of the design document._ - -https://github.com/unicode-org/message-format-wg/blob/main/exploration/bidi-usability.md - -APP: We merged this design document so that you all could read it. This has a lot of impact if we go down this path. - -EAO: I still think that we should have `name` be isolated. It would have the same effect that some of the parts of this PR would have. I would like to get input from other implementers. - -APP: I thought a lot about the implementation aspect and also people editing, including translators and message authors. Bidi controls are invisible. Moving curly brackets around that cause the controls to do funky unexpected things. My proposal is to make the Bidi controls and strong markers optional for super loose isolation. Parsers that parse messages would just ignoring things because they’re ignorable syntactically. It allows mirrored symbols to be unpaired. 
And messages could be tightly wrapped. I’m open to getting feedback, but I don’t want to make our syntax so fiddly because they need to work with Bidi things that they don’t understand. - -EAO: I’m concerned that we can get the right behavior without having isolation. For example, in a RTL context, interspersing a placeholder that has RTL content, I don’t see how we can get that to work right without isolation. Within `name` and identifiers, I can see high value for allowing just maybe the RTL mark, then that ensures the LTR doesn’t bleed to the end. - -APP: You would put that at the end because the `$` sign is already strongly LTR. I agree, that is an area where we have vagueness. Another thing is to have key lists because what you see in visual order may not be what is written in logical order, so we have to be careful. - -EAO: I would like the LTR mark to be in the `name` construct and not all the places where `name` shows up. - -APP: NFC doesn’t interact with that. - -MIH: It really feels like we are micro optimizing, but I’m not sure for what. I expect translators to use professional tools to edit messages. It might be useful only once in a while where someone uses a text editor to fix stuff. It feels like we’re designing a programming language and worry about what happens if people edit their source code in Notepad. No one edits code in Notepad. When it comes to inserting the marks, we don’t know how it will negatively affect messages. And lastly, we cannot say “I think it works” without trying it because there are lots of text editors that don’t support Bidi correctly. - -EAO: Look at #847 to see why we are considering things as a corner case. I think we should be conformant with UAX 31 and UTS 55. - -MIH: I already understand what we’re trying to do. If we try to follow specs, then we should make our text reflect that we introduce a feature to solve which statement in which spec, but not because we feel like it or think it would be good. 
- -APP: I think we should be conformant with Unicode specs. There are some things whitespace-wise that come to the front. I pushed the Bidi discussion because that issue is not covered by that. - -## Topic: Standard, Optional, and Unicode Namespace Function Set maintenance (#634) [was “registry maintenance”] -_This is the function registry maintenance procedure design. Let’s review with an eye towards using as a template for other work._ - -https://github.com/unicode-org/message-format-wg/pull/634 - -APP: Based on last week’s discussion where we would move from “registries” to specifications of “standard” and “optional” functions. - -EAO: This is leading to spec language where we need labels on functions and options. I would like feedback from others on that. APP, you proposed `accepted`, `released`, and `deprecated`. Some iteration on names would be helpful. - -ECH: Why not reuse the terms that ICU uses for APIs? - -APP: I pulled that from somewhere that seemed reasonable, but I’m happy to match what ICU does, which sounds reasonable. - -EAO: I would be very happy to reuse something else as a starting point. Can you find a link to the ICU API states and add it to the PR. - - - -## Topic: AOB? - -(discussion of process) diff --git a/meetings/2024/notes-2024-09-09.md b/meetings/2024/notes-2024-09-09.md deleted file mode 100644 index 0af122093..000000000 --- a/meetings/2024/notes-2024-09-09.md +++ /dev/null @@ -1,167 +0,0 @@ -# 9 September 2024 | MessageFormat Working Group Teleconference - -### Attendees -- Addison Phillips - Unicode (APP) - chair -- Mihai Niță - Google (MIH) -- Eemeli Aro - Mozilla (EAO) -- Mark Davis - Google (MED) -- Tim Chevalier - Igalia (TIM) -- Richard Gibson - OpenJSF (RGN) -- Harmit Goswami - Mozilla (HGO) - -Scribe: HGO - -## Topic: LDML46 and the end of Technical Preview -_The v46 release is upcoming. There is also a desire to finish the 2.0 release (exit technical preview). 
Let’s discuss the practical considerations for doing both, including the possibility of a 46.1. This is also the section of the meeting in which we’ll set out the goals for the next 2-3 days._ - -[APP]: Current plan for #46 is to bookmark where we’re at and run the spec out. We still call it a technical preview but release out to-date work - -[MED]: Deadline is 25th for tech preview, we need time for back-and-forth and review, I don’t see time for that so we should target end of November to be done in this community - -[EAO]: Why do we need to complete this in this calendar year? - -[MED]: Funding issues, also without a forcing factor, this group might take ages. A deadline helps us to get done - -[EAO]: My concern with finishing the tech preview is that we will need to await on external inputs (Although I like the deadline) - -[MED]: If this is done properly, we can fix problems later (if it’s done properly). Trying to perfect it now is risky. - -[APP]: I think we can do enough to go ask the larger community prior to finishing the core issues remaining. We can run off a copy of #46 as a ‘stake in the ground’ - -[MED]: Sounds good, we don't want to force things into the tech preview since there’s only a week. - -[EAO]: Wanted to clarify the parts of the spec that are not able to be complete within the week. If people outside this group have different thoughts, I’m concerned the balance between opinions and decisions we can make will get out of hand, and worst-case can lead to a v3 release - -[APP]: Most of the concerns are regarding syntax. I agree, but people who don’t like the syntax will either have to live with it or create their own standard. We’ve reached our goals with what we wanted to accomplish with the syntax, other people can discuss whitespace, etc., but that won’t be in MF2. We can’t keep opening that box. - -[APP]: On monday, we’ll finalize what to add in, and submit on wednesday. - -## Topic: PR Review -Timeboxed review of items ready for merge. 
- - -## Topic: … (#879) -[Merged] - -## Topic: … (#878) -[MED and EAO approve, merged] - - -## Topic: Selection-declaration (#824, #873, #872) -_Discuss the design options seeking WG consensus. Timeboxed to 15 minutes._ - -- https://github.com/unicode-org/message-format-wg/blob/main/exploration/selection-declaration.md -- https://github.com/unicode-org/message-format-wg/issues/873 -- https://github.com/unicode-org/message-format-wg/issues/872 - -[APP]: There wasn’t a consensus on #873, but solution F seems to be getting an emergent consensus. I think that’s the proposal on the table, any challenges? - -[MED]: I think it’s suboptimal, but can be extensively modified in the future (see solution E). I think it’s good for release 46. - -[APP]: I’m also unhappy with it currently - -[EAO]: If there’s a desire to make this backwards extensible, then we need to reserve the space in the syntax, opposed to what we currently do - -[APP]: Or we look at our stability guarantee to see if we can make that change - -[MED]: The key thing people want is backwards compatibility - -[EAO]: In our current stability policy, the 2.0 parser should parse without syntax error a message made in 2.1 … 2.n version. So then I feel we must reserve the space - -[MED]: I think it’s a mistake to promise the syntax is forwards and backwards compatible, since that ties our hands for the future. Changing forward compatibility needs a good reason, but tying our hands now can be bad, as I’ve seen in my career - -[EAO]: I’d be okay with no forwards compatibility. This also lets us drop all the reserved structures from the syntax. - -[MIH]: I have mixed feelings about dropping. L10n tools would work, which is the main benefit. On the other hand, currently having reserved structures is clunky, so I’m okay with removing forwards compatibility - -[APP]: I think it’s a reasonable evil. I doubt we’ll use the structure but I could be wrong. 
- -[EAO]: I’d be okay with losing forwards compatibility, partly due to this, but also because it’ll help us simplify a lot, and can get rid of all the reserved stuff. Effectively, everything that’s an error can be fixed later. It’d also make me less unhappy about rushing this out of the tech preview, since we have more options in the future - -[APP]: We’re suggesting that we can make additions to the syntax that won’t break your compatibility? [All: yes] - -[APP]: Okay so we need to rework our guarantee (MED: to guarantee backwards compat, but not forwards), remove reserved structures, and move forward with solution F? [MED]: Yup [APP]: EAO, we should do a PR for solution F first, then make 2 additional PRs - -## Topic: Disallow “whitespace or special char” prefixed `.` in reserved-statement’s body (#840) -_Discuss making this technical change in the reserved-statement syntax._ - -[APP]: Now out of scope! - -## Topic: Bidi design (#811) -_Bidi and whitespace options need to be discussed in light of the design document._ - -https://github.com/unicode-org/message-format-wg/blob/main/exploration/bidi-usability.md - - -[APP]: A piece of homework for this topic was to review the ALM mark, which has an effect when used BEFORE a sequence of characters, but not when you add it to the end of a token. The way we use strong characters in the syntax, there’s not many ways you can incorporate ALM into it. - -[EAO]: So you propose we drop ALM from the allowed things? - -[APP]: It’s an allowed character but not allowed in the syntax - -… - - -## Topic: Standard, Optional, and Unicode Namespace Function Set maintenance (#634) [was “registry maintenance”] -_This is the function registry maintenance procedure design. Let’s review with an eye towards using as a template for other work._ - -[APP]: Should I add this in as proposed and we iterate later? 
[No objections] - -## Topic: Uniqueness (#869, #847) -_String equality (used in key matching or operand uniqueness) is affected by Unicode Normalization concerns. We need to decide whether to require a specific normalization form (typically NFC) or whether we warn users about the consequences of using denormalized values._ - -[APP]: We should address string equality, given the nature of Unicode. - -[EAO]: Mentioning that we have option and attribute names checking for uniqueness - -[MIH]: My take is that I strongly favor comparing strings as they are without normalization. If you want to normalize, you are free to do it outside, but in terms of preprocessing, what gets to MF2 is processed as is. - -[MED]: Almost every process nowadays has access to NFC normalization, if the dataset is small. You can do a very quick check to see if a text looks suspicious or not. I’m more worried about odd errors hitting people, since one implementation normalizes and another does not. This won’t affect European languages as much, but it’ll hit other languages a lot - -[APP]: I’ve always wanted people to check for normalization. If we want broad adoption, not insisting on normalization will help, but then we have to warn people that naming variables “options” and “operand”, etc. is a bad idea. - -[MED]: I see two issues. One, if all comparisons are within MF2 itself, and the second, if it depends on parameters and whether or not the parameters are normalized. I think it’s a mistake not to have a ‘SHOULD’ that comparisons should be done with normalization if possible. - -[EAO]: Agree with MED, SHOULD is good but MUST is too much of a fight - -[APP]: Should is hard to test though - -[MED]: I don’t think it’s too hard, you can easily provide such test cases. 
You can mark the test cases as they’re SHOULD - -[APP]: If we give authoring guidance that you should use normalized values, but the implementation doesn’t require normalization, then you can get yourself into trouble since it may sometimes work and other times won't. If you write a normalization-sensitive message, then it’s liable to cause problems, and there should be a warning - -[EAO]: I still think we should have a SHOULD. In the spec, you can get noticeable differences in behavior between normalized and non-normalized messages. - -[APP]: Agreed, it should be given to the author. - -[MED]: If we don’t go for MUST, then we should go with ‘MF2 text should be normalized with NFC, and parameters should be compared with normalization’. There can also be a section of the site that talks about implementation features, and this isn’t as formal so can be modified easily over time. - -[APP]: Normalizing the whole message is a bad idea since we have quoted text pieces that we promise as verbatim. That’s why I say it should be inside the comparison. I understand EAOs point, but if some messages behave differently in different environments, then I think it’s okay to just put a warning sticker there. - -[EAO]: It’s either we enforce with a MUST, or recommend with SHOULD, and handle the diverging corner cases - -[MED]: Agreed. The SHOULD should be put on building the message and comparisons. - -[MIH]: We still have to say the comparison should be normalized away. The comparison should be there no matter what. As an implementer, I don't really care since I implement on top of ICU. 
I’m still reluctant to ask for normalization behavior at runtime, but whatever - -[EAO]: Comparisons is the only place we should put the SHOULD, since that’s the only thing we control [All agree] - -[EAO]: We might also want to include a definition for ‘unique’ and ‘duplicate’, so we can point to those definitions in the PR - -[MIH]: I’m reluctant to claim a user should normalize an ArgMap, it’s just not that obvious. There might be use-cases where I want the denormalized form, and I can imagine a use-case - -[EAO]: My implementation plan won’t include normalizing the ArgMap, since it’ll be ASCII only. - - -## Topic: Issue review -https://github.com/unicode-org/message-format-wg/issues -Currently we have 56 open (was 60 last time). -- 14 are Preview-Feedback -- 1 is resolve-candidate and proposed for close. -- 3 are Agenda+ and proposed for discussion. -- 1 is a ballot - - - -## Topic: AOB? - diff --git a/meetings/2024/notes-2024-09-10.md b/meetings/2024/notes-2024-09-10.md deleted file mode 100644 index 96023890e..000000000 --- a/meetings/2024/notes-2024-09-10.md +++ /dev/null @@ -1,361 +0,0 @@ -# Sep 10, 2024 | [MFWG: Virtual F2F](https://www.google.com/calendar/event?eid=MGw2M2M5czZzYWw4ZnRwMTlhZG01N2dyYWZfMjAyNDA5MTBUMTYzMDAwWiBhZGRpc29uQHVuaWNvZGUub3Jn) - -### Attendees - -- Addison Phillips \- Unicode (APP) \- chair -- Eemeli Aro \- Mozilla (EAO) -- Mark Davis \- Google (MED) -- Mihai Niță \- Google (MIH) -- Elango Cheran \- Google (ECH) -- Staś Małolepszy \- Google (STA) - - -**Scribe:** MIH - -To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. - -## [**Agenda**](https://github.com/unicode-org/message-format-wg/wiki#agenda) - -To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. 
- -## Topic: Tech Preview - -Let’s review the Task List: - -[https://github.com/unicode-org/message-format-wg/wiki/Things-That-Need-Doing](https://github.com/unicode-org/message-format-wg/wiki/Things-That-Need-Doing) - -## Topic: PR Review - -*Timeboxed review of items ready for merge.* - -| PR | Description | Recommendation | -| ----- | ----- | ----- | -| #883 | [Remove forward-compatibility promise and all reserved & private syntax](https://github.com/unicode-org/message-format-wg/pull/883) | Merge | -| #882 | Specify bad-option for bad digit size option values | Discuss | -| #877 | [Match on variables instead of expressions](https://github.com/unicode-org/message-format-wg/pull/877) | Merge | -| #869 | Add section on Uniqueness and Equality | Discuss | -| #859 | \[DESIGN\] Number selection design refinements | Discuss | -| #846 | Add Unicode Registry definition | Discuss (\#634) | -| #842 | Match numbers numerically | Discuss | -| #840 | Disallow whitespace and special char prefixed . 
in reserved-statement’s body | Reject (Out-of-scope) | -| #823 | Define function composition for :number and :integer values | Discuss | -| #814 | Define function composition for date/time values | Discuss | -| #806 | DESIGN: Add alternative designs to the design doc on function composition | Discuss | -| #799 | Unify input and local declarations in model | Discuss | -| #798 | Define function composition for :string values | Discuss | -| #728 | Add "resolved values" section to formatting | Blocked by \#806 and \#798 | -| #673 | Fix whitespace conformance to match UAX31 | Discuss | -| #646 | Update spec as if PR \#645 were accepted | Discuss | -| #634 | [\[DESIGN\] Maintaining the Standard, Optional and Unicode Namespace Function Sets](https://github.com/unicode-org/message-format-wg/pull/634) | Discuss (Agenda+) | -| #584 | Add new terms to glossary | Discuss | - -## Topic: Resolved Values (646, 728, 798, 806, 814, 823, 842, 859) - -_This is the most controversial topic in Tech Preview and blocks a large number of our PRs as well as our exit from preview. The resolution to this should be achievable._ - -## Topic: Bidi design (#811) - -_Bidi and whitespace options need to be discussed in light of the design document._ - -[https://github.com/unicode-org/message-format-wg/blob/main/exploration/bidi-usability.md](https://github.com/unicode-org/message-format-wg/blob/main/exploration/bidi-usability.md) - -## Topic: Standard, Optional, and Unicode Namespace Function Set maintenance (#634) \[was “registry maintenance”\] - -_This is the function registry maintenance procedure design. Let’s review with an eye towards using as a template for other work._ - -## Topic: Issue review -[https://github.com/unicode-org/message-format-wg/issues](https://github.com/unicode-org/message-format-wg/issues) - -Currently we have 61 open (was 60 last time). - -* 15 are `Preview-Feedback` -* 0 are `resolve-candidate` and proposed for close. -* 2 are `Agenda+` and proposed for discussion. 
-* 1 is a ballot - -| Issue | Description | Recommendation | -| ----- | ----- | ----- | -| #865 | TC39-TG2 would like to see completion of the TG5 study | Discuss | -| #881 | Should we drop private-use annotations? | Discuss | -| #847 | Conformance with UAX\#31 and UTS\#55 | Discuss | -| #735 | Recovery from data model errors | Resolve | - -## **\#\# Topic: Design Status Review** - -| Doc | Description | Status | -| ----- | ----- | ----- | -| bidi-usability | Manage bidi isolation | Proposed, Discuss | -| dataflow-composability | Data Flow for Composable Functions | Proposed | -| function-composition-part-1 | Function Composition | Proposed | -| maintaining-registry | Maintaining the function registry | Proposed (\#624), Discuss | -| number-selection | Define how selection on numbers happens | Revision Proposed, Discuss | -| selection-declaration | Define what effect (if any) the annotation of a selector has on subsequence placeholders | Proposed, Discuss (Agenda+) | -| beauty-contest | Choose between syntax options | Obsolete | -| selection-matching-options | Selection Matching Options (ballot) | Obsolete | -| syntax-exploration-2 | Balloting of the revised syntax used in the Tech Preview | Obsolete | -| variants | A collection of message examples which require a branching logic to handle grammatical variations | Obsolete | -| formatted-parts | Define how format-to-parts works | Rejected | -| quoted-literals | Document the rationale for including quoted literals in MF and for choosing the | as the quote symbol | Accepted | -| builtin-registry-capabilities | Tech Preview default registry definition | Accepted | -| code-mode-introducer | Choose the pattern for complex messages | Accepted | -| data-driven-tests | Capture the planned approach for the test suite | Accepted | -| default-registry-and-mf1-compatibility | Default Registry and MF1 Compatibility | Accepted | -| delimiting-variant-patterns | Delimiting of Patterns in Complex Messages (Ballot) | Accepted | -| 
error-handling | Decide whether and what implementations do after a runtime error | Accepted | -| exact-match-selector-options | Choose the name for the “exact match” selector function (this is \`:string\`) | Accepted | -| expression-attributes | Define how attributes may be attached to expressions | Accepted | -| open-close-placeholders | Describe the use cases and requirements for placeholders that enclose parts of a pattern | Accepted | -| overriding-extending-namespacing | Defines how externally-authored functions can appear in a message; how externally authored options can appear; and effect of namespacing | Accepted | -| pattern-exterior-whitespace | Specify how whitespace inside of a pattern (at the start/end) works | Accepted | -| string-selection-formatting | Define how selection and formatting of string values takes place. | Accepted | -| variable-mutability | Describe how variables are named and how externally passed variables and internally defined variables interact | Accepted | - -## Topic: AOB? - -=== - -#603 omitting the `*` key when the msg authors think they are exhaustive - -EAO: an example would be French, where the number of options went up over time. So what was exhaustive then is not exhaustive now. -It can be exhaustive for a boolean. - -APP: a fallback option if nothing matches, which would be different from \* as the most likely option. - -MED: there are 2 conflicting things. The reason plurals work is because there is a default. If there is a default value, then that’s identical to \`\*\`. - -EAO: I am happier to leave it open. Now that we don’t have a guarantee for forward compatibility. - -APP: this was also working before we changed the -`*` is technically different from `other`, in the matching algorithm. Technically you can write a plural algorithm that recognizes \`other\` as a keyword. - -EAO: if we leave out the `*`, with the current algorithm when nothing matches the selector is going to end up at `*`. 
-Maybe we should reconsider `other` in `:number`. -I don’t think we need that, with the - -MED: Guides constructing things by hand, because you don’t need to write entries for both \`other\` and \`\*\`. -We can put in a note that it is a tech preview, and might be relaxed in the future. -APP: I think we should stay with what we have and keep it for the future. - -MED: if we really care about conciseness we can invent some kind of fall-through. - -EAO: what I was saying about `:number` is … \[reading from spec\]: -> Apply the rules to the resolved value of the operand and the relevant function options, and return the resulting keyword. If no rules match, return \`other\`. -They return `other` I don’t think we need. - -MED: I think we should leave this alone. -I have some strong opinions about how things are resolved, but I would leave it as is for now. - ---- - -APP: Error handling. I think we are now done with error handling. -EAO, are we now done with all the tests? - -EAO: I did not check that all the tests if all cases where errors are expected are updated. - -APP: Bidi / whitespace handling, we discussed. We have a design. We need to discuss, we also discussed a bit yesterday. -I wait for the EAO change that removes the resolved. - -Interchange data model: informative. - -EAO: PR #799 - -APP: since it is not a deliverable, we can put it aside until we release. -This becomes even more interesting because of what we did with \`.match\`. It might be easier if we unify. -We should do this for 2.0, not necessarily for LDML 46\. - -APP: other things on the data model? - -EAO: should namespaces be part of the variable or keep them as one. - -EAO: XLIFF - -APP & MED: nice to have. - -EAO: For 2.0 as well? - -EAO: I can present what I have. It does not require extensions to XLIFF. - -APP: XLIFF is not a deliverable anymore. 
- -EAO: XLIFF is still listed, see [https://github.com/unicode-org/message-format-wg/blob/main/docs/goals.md\#goals](https://github.com/unicode-org/message-format-wg/blob/main/docs/goals.md#goals) -If we drop XLIFF, we have to make an explicit note about that decision. - -EAO: I would still be happy to present my XLIFF mapping later. - -MED: I don’t think that XLIFF is needed for MF 2.0. Needs a lot of testing. It is binding to another standard, and we need to make sure it works for people who already use XLIFF. - -APP: I agree. -I want to push past this. This is interesting and important, but we need to solve what we must release now. - -EAO: at Mozilla I find the data model the most useful part. - -MIH: the data model is in ICU4C and ICU4J, and it is a public API. - -MED: we need to put a pin in this, and it will not be there in LDML 46\. - -EAO: I think it should be published somewhere. -Given that we have several implementations. - -MED: I am not against having the data model, but tbd if we do something with it. - ---- - -APP: the function registry is not a registry anymore. We have “function sets”. -We need to update everything. - -EAO: namespace \`u:\`. Introduces \`locale\`, \`id\`, and \`dir\`. -Would be good to finalize this. Affects the \`syntax.md\` -There should be a note to discourage rolling your own implementation of such functionality. - -MIH: do we really need to change the name from “function registry”? Because now it is public API in ICU. - -APP: now we don’t have a machine readable format. - -MED: for ICU we will have to have all function capabilities of MF1. 
- -APP: we did that in 45 - -EAO: back to “don’t roll your own locale”, this is what we have: -[https://github.com/unicode-org/message-format-wg/pull/845/files\#diff-dd0b88aaa872a181a51fffcc6c3ba8a005b84075c053b70b6693e92e41ea00c9L738](https://github.com/unicode-org/message-format-wg/pull/845/files#diff-dd0b88aaa872a181a51fffcc6c3ba8a005b84075c053b70b6693e92e41ea00c9L738) - -APP: Would be good to land 846 (the \`u:\` namespace). And make sure that the text “don’t roll your own” is there. - -EAO: if we land 846 we don’t need a note. - -APP: for function sets this can be post LDML 46\. -We probably need an update to \`registry.md\` - -APP: markup, \#650. - -MED: no need to fix now. - -APP: but we need to close this somehow. - -APP: expression attributes. We did (?) - -APP: tests, we need to make sure that we have them. - -APP: we have a PR list, and maybe a couple that we can merge. - -- PR: Remove forward-compatibility promise and all reserved & private syntax #883 - -APP: are we ready to merge? - -MED: very important to have a note saying what “deprecated” means. It means it should not be used, but it will never be removed. Because (for example) ISO just removes stuff when it is deprecated. - -EAO: I would be interested if STA has an opinion on this. - -STA: I don’t know :-) - -APP: summary: yesterday we decided that we remove all reserved parts, because we drop the request for forward stability. -So in the future we can do whatever we want, as long as we don’t break the old stuff. So MF 2.1 can read and understand 2.0, but a MF 2.0 engine might fail to read MF 2.1 syntax. - -STA: in Seville we didn’t yet have namespaces. - -APP: we also envisioned being able to write one parser that is future proof. - -EAO: also for STA: another aspect from yesterday is that there is strong pressure for us to deliver 2.0 by the end of this year. -It is much harder to make sure that what we release is also future-proof. 
- -- PR: Match on variables instead of expressions #877 - ---- - -Housekeeping. - -Issue #735: Recovery from data model errors #735 -APP: I intend to close this. I think that the decisions on error handling cover this. - -Issue #881, Should we drop private-use annotations? #881 -APP: we just discussed - -Issue #673 (WRONG number) -APP: whitespace conformance. I have a PR for this (#673) -Also related to the BiDi design document. -I am waiting for the other big changes before trying to merge this. - -MIH: we have several issue for function compositions, separate for :number / :integer, or :string, or :date / :time -But that is not very useful, or interesting. All it does is combine options bags for formatting / selection. It saves typing, it’s all it does. Instead of typing 3 options and 8 options more, I can only type the updates. - -MIH: The more useful one is transforming functions. -Take a person and gets the date of birth. So formatting a person is completely different than formatting a date. The option bags are different, they don’t merge. - -MED: yes, transforming functions are very handy. Think uppercase transforms, or normalization. - -MED: by “mutating” I don’t mean mutating the input value. - -STA: I am very happy that this topic shows up when I join meetings. -Options: -* save typing -* extract option (get a field, like a date, of gender) -* inspect a value. This is how I imagined the grammatical accord. We should accept that certain functions only work with other functions. As a translator I can make sure things work together. - -STA: I don’t claim to have all the answers, or how to say it in the spec. -EAO: one of the reasons for the series of PRs is that we’ve been covering the same ground over several conversations, and with explicit functions we can work on concrete functions. - -MIH: the inputs now can be all kinds of types. A \`:date\` formatter can take a Java Date, or Temporal, or Calendar, or even a long (as epoch time). 
And we return a formatted-to-parts list of objects, which can be passed to the next function in the chain. - -MED: for the functionality we need right now we don’t need the concept of a “resolved value” -... -We don’t need to decide this for 2.0. - -EAO: is this a valid message, or not? -``` -.local $x \= {2.1 :integer} -{{{12.3 :number minimumFractionDigits=$x}}} -``` -This is internal to MF2, and the behavior should be the same in all implementations. - -APP: I agree that this is a good illustration. -There is a tension between the idea of immutability, and that the annotation does something to the variable. -We should resolve the above, if the above is an assignment, or we just put \`$x\` in a map? - -MED: my inclination is that \`:integer\` and \`:number\` don’t change the value. -They only format and select. -If you want a mutating, returning a number, we need another kind of function. - -EAO: the case of a person that interacts with MF2, when they see something like the above, they will presume that \`$x\` is assigned, and it is an integer. -If not, should it have a string value? Or some kind of number? We processed the input a bit, but not much. -I would argue that it should be an integer type, with an integer value. - -STA: I like the example. And I have 2 obs. -One, nobody should do messages like this. -We should yield control to the function itself. -``` -.local $x \= {2.1 :integer signDisplay=always} -{{{12.3 :number signDisplay=$x}}} -``` - -APP: I find this example weird. I see what you are doing, but I can see myself spending time explaining this to localization engineers. -Every function should say: these are the types it can take, and what can be put out. -And we can dodge the question, somewhat. -Especially now that we don’t have a match repeating the expression. - -MIH: I would argue that right now, for a plural implementation, the :number does return some kind of numeric value. 
Because when one does \`.match {$foo :number}\`, to make the decision one has to do the operations described in CLDR. That is, compute \`$foo\` modulo 100, and if the result is between 10 and 20 then the plural is \`many\` (for example). -But to do this kind of modulo operation, \`$foo\` has to be some kind of numeric value, not a string. - -EAO: I think I agree with Stas, to say that each function can define its own resolved value, with resolved options. -A function is allowed to do anything it wants. - -APP: as a function author I can implement a \`:number\` function that returns a string. But it is not mandated to return a string. - -EAO: if we describe a resolved value in spec we can help an implementer understand how this would work. - -STA: implementations should allow functions that return something other than string. - -APP: a function might return a resolved value that can be something other than string. - -APP: I can imagine a “part of speech” class, a subclass of string, but would have attributes other than strings. It would be implementation specific. -We can describe that, the trouble is how to do it. - -APP: -1. What are we going to do here? -2. Next meeting? - -CONSENSUS: - -* A function MUST define its resolved value. The resolved value MAY be different from the value of the operand of the function. It MAY be an implementation specific type. It is not required to be the same type as the operand. - -* A function MUST define its resolved options. The resolved options MAY be different from the options of the function. 
- -Timebox discussion of :u and whether it’s discoverable or handled at the processor level diff --git a/meetings/2024/notes-2024-09-16.md b/meetings/2024/notes-2024-09-16.md deleted file mode 100644 index 752a28301..000000000 --- a/meetings/2024/notes-2024-09-16.md +++ /dev/null @@ -1,398 +0,0 @@ -# 16 September 2024 | MessageFormat Working Group Teleconference - -### Attendees - -- Addison Phillips \- Unicode (APP) \- chair -- Mark Davis \- Google (MED) (first 30 min) -- Mihai Niță \- Google (MIH) -- Eemeli Aro \- Mozilla (EAO) -- Richard Gibson \- OpenJSF (RGN) -- Harmit Goswami \- Mozilla (HGO) -- Matt Radbourne \- Bloomberg (MRR) -- Elango Cheran \- Google (ECH) -- Tim Chevalier \- Igalia (TIM) - - - -**Scribe:** MRR - - -## [**Agenda**](https://github.com/unicode-org/message-format-wg/wiki#agenda) - -**Next week: cancel call because TPAC and LDML46 spec beta?** - -## Topic: Info Share - -Addison: [https://github.com/tc39/tg5/issues/3\#issuecomment-2350218930](https://github.com/tc39/tg5/issues/3#issuecomment-2350218930) - -You may want to look at the comments I’ve made. I’ve made them without my chair hat on. I’d appreciate others looking at them. - -EAO: The JS implementation has a PR open. Then it will be up to date with the current state of the spec. -Second thing: I’m talking at the Unicode Tech Workshop about message resources. - -APP: I think you and I will tag team. - -MED: Myself and Elango (ECH) will be there so we can meet some of you in person. - -## Topic: LDML46 Final Touches - -*\_Let’s make sure we address open issues for LDML46 and reach consensus of what is included in our milestone Tech Preview release.\_* - -- Syntax freeze? -- Add a note about renaming the function registry or should we change it now? 
See [https://github.com/unicode-org/message-format-wg/blob/main/exploration/maintaining-registry.md](https://github.com/unicode-org/message-format-wg/blob/main/exploration/maintaining-registry.md) -- Composition - -APP: One of the open PRs has changes to whitespace and bi-di. I don’t know how much churn that would introduce for implementers. We’re getting close to behaving as if we have a syntax freeze. We’ll want to discuss what syntax freezes we have in 46\. - -APP: We did agree that we’d get rid of the idea of a function registry. The section is still called “Registry” so we either want to fast-track some renaming of this or provide some explanatory text. - -MED: Leaving a note is perfectly fine. Section headings and things can change before the .1 release. -If a note is easy and there’s a lot of stuff piled up, a note is fine. - -APP: I’ll fast-track a note and will be looking for approvals on that. - -APP: The other thing is function composition. We have a rough consensus but the devil is in the detail and we’re not going to do this for 46\. Do we want to say something in 46 about ‘this is the shape of what we’re doing’. - -MED: We don’t need to prematurely signal where we might be going until it’s really solid. - -APP: There is a note in there. - -EAO: It would be good to know if TIM will be participating in this discussion. - -APP: Yes, but not at this moment, - -APP: If we don’t agree to merge it today, it’s not going into 46\. EAO, I saw you raised a PR with a typo, we can fast-track that, - -MED: Typos can come in afterwards. Clear obvious small changes. 
- -## Topic: PR Review - -*Timeboxed review of items ready for merge.* - -| PR | Description | Recommendation | -| ----- | ----- | ----- | -| \#885 | [Address name and literal equality](https://github.com/unicode-org/message-format-wg/pull/885) | Discuss | -| \#884 | [Add bidi support and address UAX31/UTS55 requirements](https://github.com/unicode-org/message-format-wg/pull/884) | Discuss | -| \#882 | [Specify `bad-option` for bad digit size option values](https://github.com/unicode-org/message-format-wg/pull/882) | Merge | -| \#869 | [Add section on Uniqueness and Equality](https://github.com/unicode-org/message-format-wg/pull/869) | Competes with \#885 | -| \#859 | \[DESIGN\] Number selection design refinements | Merge (Proposed) | -| \#846 | Add Unicode Registry definition | Discuss (\#634) | -| \#842 | Match numbers numerically | Discuss (Reject) | -| \#823 | Define function composition for :number and :integer values | Discuss | -| \#814 | Define function composition for date/time values | Discuss | -| \#806 | DESIGN: Add alternative designs to the design doc on function composition | Discuss | -| \#799 | Unify input and local declarations in model | Discuss | -| \#798 | Define function composition for :string values | Discuss | -| \#728 | Add "resolved values" section to formatting | Blocked by \#806 and \#798 | -| \#673 | Fix whitespace conformance to match UAX31 | Discuss | -| \#646 | Update spec as if PR \#645 were accepted | Discuss | -| \#584 | Add new terms to glossary | Discuss | - -## Topic: String Equality (#885, #869) - -*Addison proposed changes to address string equality. There is one controversial detail: whether literals require NFC for equality or not.* - -APP: \#885 I don’t think theres any disagreement around name equality. -The only place where our spec does literal matching is with key values. -Literals themselves are not constrained \- they’re just strings. 
-The question is \- do we want to require a key comparison to be done under NFC or roco\[?\] points? - -EAO: I don’t think we need to address this but we can: Are the keys normalized? -We _need_ to define equality for key lists. - -APP: We don’t have to require implementations to do any normalization. -If we do equality, we have to do NFC on the values or at least check that they aren’t normalized. - -MED: I think EAO is right that we separate these two items \- you generate a ‘duplicate key’ error if the key list is equal according to canonical \[?\]. Secondly, whether or not this is done before we pass a literal to a function or we leave it up to the functions. - -EAO: My preference is to allow normalization before, but use as if normalized. - -MIH: Do we want to leave the freedom to functions? I don’t find good use cases but it’s a custom function so it can do what it wants. - -APP: I agree with that. If we say we don’t normalize and then allow option values or operands. If denormalization works only some of the time, that’s cognitively tricky, versus saying ‘we’re not going to normalize these for you. - -``` -.local $angstromsAreCool = {Å :string} -.match $angstromsAreCool -Å {{U+212B is the only way to be cool}} -Å {{I'm U+00C5, so almost cool}} -Å {{I'm A + U+030A, so I combine with cool}} -* {{I'm not cool}} -``` -We’re not going to _stop_ you normalizing your data. - -MED: The implementation load is minimal for normalizing literals. -I think it’s far far more likely that people will have errors because of denormalized text rather than them wanting to do something with denormalized text. If we want this feature, we can think of a syntax but I don’t see a need. - -EAO: I would be fine with us handling key values differently from literal values because they feel syntax-y from a user point-of-view. When thinking about implementation and the requirement of match selector keys returning exact inputs. 
If I want to enable normalization to happen with my custom matching function, it becomes weird (e.g. hanging on to original values, comparison on normalized but return unnormalized values.) My \_key\_ point here is that we can define behaviour for keys separately from what we define for literals elsewhere in the spec. - -APP: I agree. Since we require key list uniqueness. Having a note about not requiring keys to be normalized. I can imagine lots of places where I want to do use a denormalized string as an operand. It behaves like text and we know how to handle text. For the operations that we control, saying this makes sense. - -MED: I’d prefer to not jump on this for literals before release. I think this takes a little more thought. - -APP: They already can be. What we’re saying is that the function will get whatever is there. We should clarify key equality and key comparison. - -MED: Requiring the comparison to be canonically equivalent (normalizing NFC) is good. I think normalizing before passing to selector. I certainly would want to make it possible to do. It’s a tricky subject. - -\[MED leaves\] - -MIH: I think if they are passed normalized then they should be returned normalized. - -EAO: We should do the same with option names. They’re compared as NFC but their values are not NFC. At the point of passing to a custom function, we don’t have any language saying what will happen. The normalizations should correspond with each other. - -ECH: \+1 to the idea that they are canonically equivalent. - -EAO: Happy to leave further consideration until later. - -APP: So implementations should normalize key names? I’m cool with that. Do we want to require a name to be NFC? - -EAO: Again, we should not talk about ’name’ in general, but with option names. We have the same non-duplication. We end up passing the option name uncanonicalized. If what we’re doing with option names and key values and attribute values would all match each other. 
If we talk about those specifically and not ‘names’ it would help. - -ECH: It's an interesting discussion but I'm not sure if we need to force people to provide things in NFC composed form. I think it’s good for checking equality. When it comes to function option names, checking for duplicates is useful. I think you just need to know what the contract is with the function. Maybe also we can revisit this and not worry for the time being. - -APP: It’s not so much that people are going to use denormalized Latin script, it’s that they’ll use the domain things in their own language. When we say comparisons are done (see note in PR) and the name is not normalized, we treat them as equal so you can’t have the same names. In practice most people are going to choose rational values (things that don’t change because of encoding etc.). I’ve seen plenty of code written in different languages (e.g. variable names in Russian). - -MIH: I would be inclined to really normalize the names as if they were equal. In ICU, I think I put them in a map. If there’s a requirement to pass the real thing, it’s just weird. I can’t think of a good use case where people really care about this. - -EAO: I agree with MIH. I think we ought to normalize the string values of keys, option names and attribute names. I don’t know that we need to normalize anything else. Anything else can be normalized with a function, but this can be a later discussion. As a side note, I believe MF2 will lead us to localize our variable names. In this sort of use with Finnish, it made more sense for me to use Finnish variable names. - -APP: We would like all of our comparisons to be under normalization. -Permit literals that are not compared internally. -We think we might impose NFC on identifier names in future but not in 46\. Is that a fair summary? Can I make changes to the PR? - -EAO: I support that. I’m not hearing objections to requiring the normalization with keys, option names and attribute names. 
We could do that in 46 or later. - -APP: I will not merge today. - -## Topic: Whitespace Handling (\#884, \#847) - -*This pull request implements the design discussion from \#811 (“bidi-usability.md” design) and addresses UAX31/UTS55 requirements. Discuss merging.* - -APP: This implements the loose part of the bi-di design. It also changes whitespace handling \- as a result, it replaces S-production with an O-production for required whitespace. There’s text in the spec to deal with UAX requirements which are not a material change. The biggest kicker is to allow some of the bi-di markers into the syntax outside of text. - -EAO: I think I approved this PR. - -APP: You did. -Re. syntax stabilization, I’d like to say this is pretty close to the people who are tracking our progress. It makes a lot of very small changes to the optionality of whitespace (removing some square brackets). - -EAO: I propose we merge. - -APP: Any objection? - -TIM: There’s no spec tests, since there are a fair amount of changes being made to the ABNF. - -APP: I agree \- there are spec changes. The tests would need to be updated to have a bunch of the bi-di controls. - -EAO: Could we add the tests as a separate further change. - -TIM: Fine with me. - -APP: Any objection to merging this? I see none. I see some agreement. \[Merged\] - -APP: Anything else on whitespace. - -EAO: Track issue for tests. Separately, adding the recommended text “if you’re emitting message format 2, this is how you should be doing the bi-di output. Like with the data model, we could have a recommended part \- “these are instructions that you should be following but we’re not requiring you to do so. - -## 882 - -EAO: For boolean values that expect “true” and get a different literal string value, we’d expect them to behave the same as digit size options. - -APP: We have a task to specify additional places for this option. I’m going to squash and merge this one. - -MIH: I can’t find any boolean type of thing. 
\[Merged\] - -## Topic: Number Selection (#859, #842, #823) - -_Let’s resolve how number selection is described. We have some PRs loosely coupled to this, notably the design doc in 859 and @eemeli’s proposal to use number value selection in 842._ - -APP: \#859 is a change to the design document based on comments by EAO around matching numerically. It changes the status of the design from ‘approved’ back to ‘proposed’. Does anyone mind if we merge, knowing that we’ve captured this in the design document. - -EAO: We could iterate on the PR with the changes. Reopening is fine with me as well. I’d prefer using a different term than ‘proposed’ like ‘reopened’ to indicate that it might have a more colorful history. - -APP: I’ll do that. Do we want to talk about number selection today? - -EAO: I’d be happy to talk about that. - -APP: Current state is that we currently say something about using a serialization of a number as the thing that gets compared. EAO’s proposal is to change it to actual numeric comparison. - -EAO: The two really viable options: -Do the selection, ignoring all of the options on :number, because different implementations will understand the options differently (e.g. rounding \- we don’t define how that happens). I think the only really reasonable we to get consistent bahaviour is to ignore all options. -Or leave as-is but clarify that exact value selection is implementation-defined. -I’m not aware of other satisfactory options. - -MIH: I’d be happy to introduce something that looks like a numeric type that can be platform-specific. What we have now is just for exact keys, which are relatively rarely used. - -APP: 0 and 1 are used a lot. - -MIH: I’ve never seen an exact key that looks like an arbitrary precision. If somebody needs something like that, it’s a custom function. It’s easy to say that the values in this function are strings. It’s something that we can add to the plural later on. 
If we discover it’s not enough, number can also accept the arbitrary precision value. - -APP: I would urge people to read through the long thread on \#842. E.g. in plurals, having fraction digits selects a different value. I would want our key definition to be as clear as we can make it. And that certain kinds of matching may have idiosyncrasies. I think there are corner cases where people want to do integer matching. Occasionally fractional values get matched \- the most memorable example for me: 0.00 gets turned into ‘free’ when you have a currency value. We shouldn’t make it impossible but maybe we don’t have to specify all of the rounding etc. that EAO mentions. We’ll conflict with different programming languages. - -MIH: With plural $1 vs $1.00, it’s not about exact values. You really make it a number and apply the rules from CLDR. You make it a number anyway. It’s true that we care about the decimals for plural selection but not for exact match.We’re not blocking ourselves and can bring it back when people need it. If we treat everything as strings, people have to parse a string to a number. Libraries for MF2 should implement string \-\> number, which feels very clunky. - -APP: Where are we at? I don’t think we want to merge EAO’s proposal today. Our current wording attempts to solve this problem in a specific way but it doesn’t sound like we’re happy with it. It doesn’t sound like we’re going to fix this in 46\. - -EAO: I can imagine custom functions desiring 0.00 to indicate how formatting should happen. If it’s parsed as a numerical value, this information is lost. Behaving differently if it’s quoted vs not quoted, it’s weird. - -MIH: foo=|=0.00| -I would argue that, if somebody needs to make that distinction they can use quotes, etc. -If people want a string, they should treat it as a string. -I can have the options in JSON and we’re back to where we started \- I cannot convert to JSON. 
- -EAO: An option value of 1.3, everyone agrees on “1.3” but I can think of three different distinct numeric values of this. I think that we’d be imposing a high cost by requiring this within MF2. - -MIH: -``` -options" : { - "maxFractionalDigits": 1.00 -} -``` -\=\> this parses as a 1 (number) - -APP: we just disallowed that earlier in the call -I think the damage is limited to exact match keys. If we can contain it \[to this\] it’s easier. In either case, we don’t have text to merge today. I think a change can be made to the integer text to propose changes. - -EAO: What might be achievable is seeking consensus on whether comparison should be implementation-dependent. - -MIH: I would not merge this as-is. We argued that precision is going to screw you over. Either we care about precision or we don’t care about precision. - -APP: I disagree with actual numeric comparison. I think MED and I are coming from a similar place \- the number you are going to format later is what you are going to compare. EAO, you called out gaps in the current text. I don’t think it’s perfect. For 46, we could put in a note that we’re studying this problem and that comments are welcome. I don’t think we’ve solved it yet. - -EAO: Do we have consensus on it being implementation-defined? - -APP: I _might_. I think we should have clear guidance for authors. It wouldn’t be implementation-dependent and it would enhance portability. I would be open to introduce implementation-defined stuff. We could say, e.g. floating-point is somewhat implementation-defined. I would prefer if we could define it well and define the boundaries. - -MIH: Considering we have a code-freeze in 3 days, we should leave it as implementation-defined. - -APP: It’s not defined as that now, - -EAO: It’s currently implicitly implementation-defined. I don’t remember the exact text but it’s leaving wiggle-room for the implementers. 
Going from a number to its JSON representation, there’s not one JSON number representation that can be used. APP, you might need to write the write proposal text. I an welcome to be shown I am wrong but someone else will need to propose text. - -APP: In 45, we proposed only integer matching is required: -“Only integer matching is required in the Technical Preview. Feedback describing use cases for fractional and significant digits-based selection would be helpful. Otherwise, users should avoid using matching with fractional numbers or significant digits.” - -EAO: Might be good to review if the note satisfies some of the conditions we’ve mentioned here. -E.g. 1234 \-\> 12.34 or 12.00 in JSON - -APP: I’ll propose text for 46 that clarifies our note. I think we could fast-track that. - -EAO: It sounds like none of the function composition stuff is going to be merged for 46\. - -APP: That’s accurate. Although I believe we’re now near consensus of what we’re going to do. Am I hearing that we want to sit on :string for the time being. - -MIH: I think that last time we reached a generic way to do compositions. Without being 100% sure that those are bringing us closer to what we decided last week, I’d rather not do this now. - -EAO: The two tasks for function composition are in-line with what we talked about last week. In addition, we’ll still need to define how this stuff works for the functions that we define. - -APP: I think you guys are in violent agreement but not on timing. - -MIH: Timing and order. Since we didn’t fully agree on the generic rule in writing. We can’t say that we agree. - -EAO: We have pretty exact language in the notes from last week but can be returned to later. - -APP: We’re going to skip next week. I’ll use the email list and GitHub to communicate as we go through the 46 stuff. It’ll be effectively what we’ve merged now plus the fat-tracked items discussed in the call, then we’ll resume in 2 weeks. - -EAO: The skip next week is for W3C TPAC. 
- -APP: Fighting a good fight, but with a different hat\! - - -## Topic: Issue review** -[https://github.com/unicode-org/message-format-wg/issues](https://github.com/unicode-org/message-format-wg/issues) - -Currently we have 50 open (was 56 last time). - -* 14 are `Preview-Feedback` -* 3 are `resolve-candidate` and proposed for close. -* 2 are `Agenda+` and proposed for discussion. -* None are ballots - -| Issue | Description | Recommendation | -| ----- | ----- | ----- | -| \#865 | TC39-TG2 would like to see completion of the TG5 study | Discuss | -| \#847 | [Conformance with UAX \#31 & UTS \#55](https://github.com/unicode-org/message-format-wg/issues/847) | Discuss | -| | | | - - -## Topic: Design Status Review - -| Doc | Description | Status | -| ----- | ----- | ----- | -| bidi-usability | Manage bidi isolation | Accepted | -| dataflow-composability | Data Flow for Composable Functions | Proposed | -| function-composition-part-1 | Function Composition | Proposed | -| maintaining-registry | Maintaining the function registry | Proposed, Discuss | -| number-selection | Define how selection on numbers happens | Revision Proposed, Discuss | -| selection-declaration | Define what effect (if any) the annotation of a selector has on subsequence placeholders | Proposed, Discuss (Agenda+) | -| beauty-contest | Choose between syntax options | Obsolete | -| selection-matching-options | Selection Matching Options (ballot) | Obsolete | -| syntax-exploration-2 | Balloting of the revised syntax used in the Tech Preview | Obsolete | -| variants | A collection of message examples which require a branching logic to handle grammatical variations | Obsolete | -| formatted-parts | Define how format-to-parts works | Rejected | -| quoted-literals | Document the rationale for including quoted literals in MF and for choosing the | as the quote symbol | Accepted | -| builtin-registry-capabilities | Tech Preview default registry definition | Accepted | -| code-mode-introducer | Choose the 
pattern for complex messages | Accepted | -| data-driven-tests | Capture the planned approach for the test suite | Accepted | -| default-registry-and-mf1-compatibility | Default Registry and MF1 Compatibility | Accepted | -| delimiting-variant-patterns | Delimiting of Patterns in Complex Messages (Ballot) | Accepted | -| error-handling | Decide whether and what implementations do after a runtime error | Accepted | -| exact-match-selector-options | Choose the name for the “exact match” selector function (this is \`:string\`) | Accepted | -| expression-attributes | Define how attributes may be attached to expressions | Accepted | -| open-close-placeholders | Describe the use cases and requirements for placeholders that enclose parts of a pattern | Accepted | -| overriding-extending-namespacing | Defines how externally-authored functions can appear in a message; how externally authored options can appear; and effect of namespacing | Accepted | -| pattern-exterior-whitespace | Specify how whitespace inside of a pattern (at the start/end) works | Accepted | -| string-selection-formatting | Define how selection and formatting of string values takes place. | Accepted | -| variable-mutability | Describe how variables are named and how externally passed variables and internally defined variables interact | Accepted | - -## Topic: AOB? - -— - -Chat stuff: - -You -9:34 AM -[https://docs.google.com/document/d/1zofxbu8PdxEpHbRVA1EtHnbPyrmEAPv4\_jqjFL4hx5o/edit](https://docs.google.com/document/d/1zofxbu8PdxEpHbRVA1EtHnbPyrmEAPv4_jqjFL4hx5o/edit?authuser=2) -*keep*Pinned -Mihai ⦅U⦆ Niță -9:36 AM -ICU has code freeze Sept 19\. 
So what's in by then, that's it (implementation wise) -Elango Cheran -9:51 AM -FYI to those new to Unicode string normalization: [https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html](https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html) -You -10:05 AM -\> \[\!NOTE\] \> Implementations are not required to normalize \_names\_. \> Comparisons of \_name\_ values only need be done "as-if" normalization \> has occured. \> Since most text in the wild is already in NFC \> and since checking for NFC is fast and efficient, \> implementations can often substitute checking for actually applying normalization \> to \_name\_ values. -Elango Cheran -10:09 AM -French and German have combining marks (umlaut, cedilla, accent, etc.) -You -10:09 AM -... but nobody types them denormalized -Mihai ⦅U⦆ Niță -10:20 AM -\> ... but nobody types them denormalized Vietnamese might type them denormalized -The Windows Vietnamese code page is denormalized. And legacy keyboards produced that form. I don't know if they are still widely used or not. -Mihai ⦅U⦆ Niță -10:38 AM -foo=|=0.00| -Mihai ⦅U⦆ Niță -10:41 AM -"options" : { "maxFractionalDigits": 1.00 } \=\> this parses as a 1 (number) -You -10:41 AM -we just disallowed that earlier in the call -Mihai ⦅U⦆ Niță -10:43 AM -I am not asking for treating quoted / not-quoted numbers differently \! -Mihai ⦅U⦆ Niță -10:47 AM -\> we just disallowed that earlier in the call What I'm saying is that the example I show is json And it is parsed as a number by the json parser. Which does not care about what we disallowed or not -You -10:49 AM -Only integer matching is required in the Technical Preview. Feedback describing use cases for fractional and significant digits-based selection would be helpful. Otherwise, users should avoid using matching with fractional numbers or significant digits. -^ is a note -Mihai ⦅U⦆ Niță -10:57 AM -\> French and German have combining marks (umlaut, cedilla, accent, etc.) 
Yes. But nobody types them in decomposed form. Vienamese does (some older keyboards) -MessageFormat Working Group teleconference diff --git a/meetings/2024/notes-2024-09-30.md b/meetings/2024/notes-2024-09-30.md deleted file mode 100644 index 38b5d1845..000000000 --- a/meetings/2024/notes-2024-09-30.md +++ /dev/null @@ -1,216 +0,0 @@ -# 30 September 2024 | MessageFormat Working Group Teleconference - -### Attendees - -- Addison Phillips - Unicode (APP) - chair -- Eemeli Aro - Mozilla (EAO) -- Elango Cheran - Google (ECH) -- Mihai Niță - Google (MIH) -- Richard Gibson - OpenJSF (RGN) -- Tim Chevalier - Igalia (TIM) -- - -**Scribe:** EAO - -## Topic: Info Share - -### TPAC Fallout - -APP: Physically present for half the conference; remoted in for the latter due to a cold. - -EAO: I filed [this issue](https://github.com/w3c/webextensions/issues/698) after talking to webextension CG, which has FF, WK, Chrome support for adopting MF2 as soon as we adopt. Kind of discussed a year ago. Had an hour to present to them. Reception was very positive. Solves a real problem. Issue has more details about what’s involved, and what the state of play is… I think notes have been published if more interested. - -… otherwise had good conversations with interesting people. Github, tiktok, others. Tiktok is potentially interesting, more than any other in US/EU, they have development in Chinese. Probably dealing somehow with sourcing in Chinese and then getting translate. Maybe hacking at it? Interesting problem? Dunno, hope to find out more. Will share. - -EAO: Mention JS implementation is up to date with spec. Maybe missing a minor detail. NPM was down. Will update it. - -ECH: program for UTW is now available. At least a couple sessions. Slots available. [https://www.unicode.org/events/utw/2024/](https://www.unicode.org/events/utw/2024/) - -### LDML 46 tag, branch, publication status - -APP: Updated as of last week. 
- -## Topic: LDML46 and Beyond - -- Review by ICU-TC and CLDR-TC -- Final work - -APP: Obviously we’re not finishing tech preview quite yet. Mark has mooted finishing our work this calendar year, and proposed a 46.1 release for MF 2.0 (e.g. 20 Nov). Both ICU & CLDR committees have expressed interest in reviewing the spec. Somewhat worried about receiving comments after finishing the work, rather than before. Approval for a 46.1 release is not certain, though. - -EAO: Reminds me of TG5 work. Ought to connect or addison, you, with the guy organizing the user study. - -ECH: there was a meeting on wednesday. Did they talk survey? - -EAO: I was there, yes, discussed survey and next steps. Gathering questions of content. Mentioned what APP proposed. Left on me to chase up. ECH, shall I include you? - -ECH: Yes, that sounds good. - -## Topic: PR Review - -*Timeboxed review of items ready for merge.* - -| PR | Description | Recommendation | -| ----- | ----- | ----- | -| 859 | \[DESIGN\] Number selection design refinements | Merge (Proposed) | -| 846 | Add Unicode Registry definition | Discuss (634) | -| 842 | Match numbers numerically | Discuss (Reject) | -| 823 | Define function composition for :number and :integer values | Discuss | -| 814 | Define function composition for date/time values | Discuss | -| 806 | DESIGN: Add alternative designs to the design doc on function composition | Discuss | -| 799 | Unify input and local declarations in model | Discuss | -| 798 | Define function composition for :string values | Discuss | -| 728 | Add "resolved values" section to formatting | Blocked by 806 and 798 | -| 646 | Update spec as if PR 645 were accepted | Discuss | -| 584 | Add new terms to glossary | Discuss | - -859 - -APP: Action on me to write some prose describing how this should happen. - -842 - -APP: Leaving open while 859 is in flight. 
- -### Number Selection - - - -### Resolved Value Implementation - -From [2024-09-10 call](https://github.com/unicode-org/message-format-wg/blob/main/meetings/2024/notes-2024-09-10.md): quote: - -> CONSENSUS: -> -> * A function MUST define its resolved value. The resolved value MAY be different from the value of the operand of the > function. It MAY be an implementation specific type. It is not required to be the same type as the operand. -> -> * A function MUST define its resolved options. The resolved options MAY be different from the options of the function. - -APP: Any concerns or objections? Is this still our consensus? - -…: \[tumbleweed\] - -ECH: Do we define “resolved value” in the spec? - -EAO: It would be added by PR 728. - -EAO: We should have a better place in the spec for providing these instructions to function authors. - -APP: Maybe in the syntax’s function definition? - -EAO: Would be more appropriately under “resolved value” in formatting, if we introduce that. - -EAO: With this consensus, could we look again at 728 today, or later? - -MIH: Add this for next week’s agenda? - -APP: A solid read-through makes sense before considering it. - -EAO: I’ll update 728 to include the above consensus for review during this week & approval next week. - -#### 823 - -… - -MIH: We should not include currencies and units in :number formatting. - -APP: Functions should say what they use, what they consume, what they emit. - -MIH: Also add options. Are we being too specific? - -EAO: With the proposed :string, :number, and :integer we’re covering this whole spectrum, as :string eats everything, :number passes everything through, and :integer filters out a few specific named options. - -MIH: We should be lax with the restrictions we impose. - -APP: A function should be specific about its side effects. - -MIH: Worried about nailing this down for :number and :integer. - -EAO: \[reads changes from PR\] - -… - -APP: Will review the PR again. 
- -## Topic: Issue review - -[https://github.com/unicode-org/message-format-wg/issues](https://github.com/unicode-org/message-format-wg/issues) - -Currently we have 49 open (was 50 last time). - -* 3 are (late for) LDML46 -* 15 are for 46.1 -* 14 are `Preview-Feedback` -* 4 are `resolve-candidate` and proposed for close. -* 4 are `Agenda+` and proposed for discussion. -* None are ballots - -| Issue | Description | Recommendation | -| ----- | ----- | ----- | -| 865 | TC39-TG2 would like to see completion of the TG5 study | Discuss, Agenda+ | -| 847 | [Conformance with UAX 31 & UTS 55](https://github.com/unicode-org/message-format-wg/issues/847) | Discuss, Agenda+ | -| 650 | Extra spaces in markup | Discuss, Agenda+ | -| 895 | The standard as is right now is unfriendly / unusual for tech stacks that are "native utf-16" | Discuss, Agenda+ | -| 837, 721, 650, 635 | (resolve candidates) | Close | - -### 847 - -EAO: We should have Someone™ check if we’re now conformant. - -APP: After discussion with Robin Berjon, we may be conformant now. I’ll do a check-through. - -### 650 - -APP: Are you satisfied with the resolution, after our prior discussions? - -MIH: It’s just an eyesore, if you ask me. HTML does not allow spaces before the tag identifier. The / is not a sigil like the others. It logically attaches to the {}, not the identifier. - -EAO: For me, the analogy with HTML/XML breaks because we introduced options on closing markup, \`{/foo opt=bar}\`. - -EAO: At the moment, the syntax uses sigils \`$ : / @\` as prefixes to the subsequent part of code, and allows whitespace (including newlines) quite liberally. Breaking this balance seems unnecessary. - -… - -MIH: Ok, let’s close it. - -APP: We could ballot this. - -… - -MIH: I’m fine to let it be. - -TIM: No issues implementing spec as is, no strong opinions on usability. - -RGN: Does not look like a significant benefit or hindrance for usability. 
- -## Topic: Design Status Review - -| Doc | Description | Status | -| ----- | ----- | ----- | -| bidi-usability | Manage bidi isolation | Accepted | -| dataflow-composability | Data Flow for Composable Functions | Proposed | -| function-composition-part-1 | Function Composition | Proposed | -| maintaining-registry | Maintaining the function registry | Proposed, Discuss | -| number-selection | Define how selection on numbers happens | Revision Proposed, Discuss | -| selection-declaration | Define what effect (if any) the annotation of a selector has on subsequence placeholders | Proposed, Discuss (Agenda+) | -| beauty-contest | Choose between syntax options | Obsolete | -| selection-matching-options | Selection Matching Options (ballot) | Obsolete | -| syntax-exploration-2 | Balloting of the revised syntax used in the Tech Preview | Obsolete | -| variants | A collection of message examples which require a branching logic to handle grammatical variations | Obsolete | -| formatted-parts | Define how format-to-parts works | Rejected | -| quoted-literals | Document the rationale for including quoted literals in MF and for choosing the | as the quote symbol | Accepted | -| builtin-registry-capabilities | Tech Preview default registry definition | Accepted | -| code-mode-introducer | Choose the pattern for complex messages | Accepted | -| data-driven-tests | Capture the planned approach for the test suite | Accepted | -| default-registry-and-mf1-compatibility | Default Registry and MF1 Compatibility | Accepted | -| delimiting-variant-patterns | Delimiting of Patterns in Complex Messages (Ballot) | Accepted | -| error-handling | Decide whether and what implementations do after a runtime error | Accepted | -| exact-match-selector-options | Choose the name for the “exact match” selector function (this is \`:string\`) | Accepted | -| expression-attributes | Define how attributes may be attached to expressions | Accepted | -| open-close-placeholders | Describe the use cases 
and requirements for placeholders that enclose parts of a pattern | Accepted | -| overriding-extending-namespacing | Defines how externally-authored functions can appear in a message; how externally authored options can appear; and effect of namespacing | Accepted | -| pattern-exterior-whitespace | Specify how whitespace inside of a pattern (at the start/end) works | Accepted | -| string-selection-formatting | Define how selection and formatting of string values takes place. | Accepted | -| variable-mutability | Describe how variables are named and how externally passed variables and internally defined variables interact | Accepted | - -## Topic: AOB? - diff --git a/meetings/2024/notes-2024-10-07.md b/meetings/2024/notes-2024-10-07.md deleted file mode 100644 index 869e3c57c..000000000 --- a/meetings/2024/notes-2024-10-07.md +++ /dev/null @@ -1,396 +0,0 @@ -# 7 October 2024 | MessageFormat Working Group Teleconference - - -### Attendees - -- Addison Phillips \- Unicode (APP) \- chair -- Eemeli Aro \- Mozilla (EAO) -- Mihai Niță \- Google (MIH) -- Tim Chevalier \- Igalia (TIM) -- Elango Cheran \- Google (ECH) -- Richard Gibson \- OpenJSF (RGN) -- Matt Radbourne \- Bloomberg (MRR) - -### Previous Attendees - -- Addison Phillips \- Unicode (APP) \- chair -- Eemeli Aro \- Mozilla (EAO) -- Elango Cheran \- Google (ECH) -- Mihai Niță \- Google (MIH) -- Richard Gibson \- OpenJSF (RGN) -- Tim Chevalier \- Igalia (TIM) -- - - - -**Scribe:** TIM - -To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. - -## [**Agenda**](https://github.com/unicode-org/message-format-wg/wiki#agenda) - -To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. 
- -## Topic: Info Share - -(discussion about EAO's upcoming talk about locale identifiers) - -## Topic: Schedule for Release - -*The CLDR-TC, ICU-TC and MFWG discussed a schedule for completing the 2.0 release. We propose to complete a dot-release of CLDR called 46.1 with balloting complete on 25 November. Stable (Draft) API in v47. The terminology here needs to be discussed to be clear.* - -*This means that we have just six weeks following this one to complete our work.* - -APP: EAO and I met with Mark Davis, Annemarie Apple, and a few others, about the possibilities for/schedules for doing an official release of MF2. To summarize, we would like to shoot for doing our release in this calendar year as an LDML 46.1, and then a stable draft release – draft is a specific status in ICU – in version 77 of ICU, which would be March 2025\. This means we need to be done with our work for 46.1, not 47\. A date that was suggested would be balloting complete on the spec by the 25th of November. Not counting this meeting, that leaves six more of these calls before we’d need to be done. I want to throw that out as a proposal and see if we are willing to commit to trying to make these dates. - -EAO: We would aim to be done with the spec by mid-November and we would declare our job done and have the spec be in a state where we can and will and should pass it on to the ICU TC, the CLDR TC, and probably the W3C TAG and TC39 TG2 to review and comment on and validate that this is suitable for the stated purposes, so that we can include it in next spring’s release? - -APP: We would want to be done in our own minds. One of my side goals is to indoctrinate CLDR and ICU TC so they would rubber stamp our work rather than spending a lot of time commenting. The other reviews would be external in the Technical Preview time frame. They would be post-us-saying-we’re-done. We would respond to feedback, but would be in a position of saying this isn’t going to change. 
- -EAO: On behalf of Unicode, there would not be a block for W3C TAG or TC39 TG2 to review and accept MF2 as a spec, but any input we would get could and should be taken into account, either in the 2.0 release or in future work that we do on the spec? - -APP: We would have an opportunity, because the draft version wouldn’t be until 47 / 77\. We would not persist in having weekly meetings working to resolve things. - -TIM: Do we have a list of what really needs to be resolved before mid-November. I’m wondering if we know what absolutely needs to be done. - -APP: I’ve updated `Things that Need Doing`. It’s relatively short. There are 47 issues. There are some housekeeping issues beyond the main important issues. That’s assuming we get through main issues like function composition - -ECH: Are we close to done? I guess so. Maybe it’s not a question of being close to done so much as: is what we have good enough? Is it a good place to put a stake in the sand and say “here’s a release”? - -EAO: I’m relatively confident that we are nearly done in the work needed for 2.0. At least from my point of view, a big change of us relaxing the stability policy to allow for later changes that we were previously not supporting makes it much easier to consider some issues in a post-2.0 world, rather than needing to get absolutely everything nailed down and fully agreed on before 2.0. The biggest things we need to figure out – there’s the u-options stuff, some questions around that, and then there’s the composition of `:date` and `:time` values specifically, and the point that Shane raised about wanting to get semantic skeleton considerations into the date/time stuff. One way to resolve that would be to leave it not required but optional, the `:datetime` field formatting options. If we resolve these things to some resolution, then I think we should have this thing sorted. Assuming we agree to the “easy” parts of resolved values and function composition. 
- -APP: I’d add the concept of standard or required and optional functions and options. I think that’s going to be an interesting thing we need to go through. We’ll have to invest some thought to make that concrete. So do we shoot for finishing balloting in the meeting on the 25th? - -EAO: Or sooner - -APP: If we’re finishing it there, then we have to be done sooner - -## Topic: `resolve-candidate` - -*The following issues are proposed for resolve:* -837 - -APP: Closed two resolve candidates this morning because they related to the reserved syntax we removed from the ABNF. The other one I have marked as resolved-feedback is feedback from Luca Casonato about “dot cannot be escaped”. This is also a problem because of reserved-statement, but we removed reserved-statement and so I think we can also close this one. Any objection? \[no objections\] That one’s closed. - -## Topic: UTF-16 unpaired surrogate handling (895) - -*Timeboxed discussion of how to handle unpaired surrogates.* - -APP: During the run-up to 46, Tim and Mihai ran into a potential infelicity because `content-char` does not allow unpaired surrogates, but string types in ICU4C/ICU4J do allow it, and their code was checking for unpaired surrogates in text. Seems like substantial overhead. They are asking whether we should change at least the `content-char` in text to allow for unpaired surrogate values in there. I counter-suggested that we add a note permitting implementations to not check for these, even though when we talk about the grammar of a message, we don’t permit it. That’s maybe to help some tools; I can’t think of a case where an unpaired surrogate is any kind of valid data that people would want to have in a message. I think it’s an error. Mihai or Tim, do you want to comment? - -MIH: I agree with you that there’s no good use case and it should be an error. The thing is, it does happen. The existing APIs that I know of don’t care, they just pass them through. 
A lot of string functions in those platforms consider strings to be a bunch of code units, not code points. I’ve seen cases with translated messages that had unpaired surrogates by accident and I don’t think you want to bring down a whole application because of something like that. On the other hand, I’ve seen people abusing unpaired surrogates by putting special markers in the strings. I don’t think these are good use cases, but people do that, and if you want to move between versions of MF2, you’d expect stuff like that to not explode in your face. We should have linters, but reality is what it is. - -ECH: Isn’t this a discussion we had a couple years ago? This is where it initially got introduced. I found RCH’s PR, 290, that introduced the change. I know that we talked about this stuff. - -APP: We did. There’s a couple of things here. There’s a practical consideration: do we need to require UTF-16-based implementations to write a bunch of code to check for this. I think my reaction there is that we probably don’t, for text. But disallowing them in names and other things is responsible. I don’t think those things work reliably. I think it probably makes more sense to keep the restriction in some places and allow for implementations to go “this bag of code units, I’m not going to check it”. If you think about a bunch of other places, like encoding, the unpaired surrogate’s going to be a replacement character. I hear you, Mihai, about people abusing code points for bad things, but Unicode has a bazillion private-use and other special things that you can use for that stuff. - -EAO: My preference order on solving this is first, to keep the restrictions we currently have; second, to allow for unpaired surrogates in `content-char` but only there; and beyond that, have this suggested text where implementations are free to vary on this. That sets up a bad situation, where switching between implementations breaks someone’s code. 
This is GIGO and I’m fine with that for content. I’d prefer us to not allow it, but we should do one or the other. - -APP: I will briefly note that `content-char` serves as the basis for `quoted-char` and `text-char`, so – - -EAO: We would need to change the inheritance between the chars to make this apply only to text content and nothing else. Not literal content either, probably. - -APP: I think what you’re suggesting is that `text-char` would allow surrogates - -EAO: That’s probably what I meant to say, yes - -MIH: Would we be okay to say something like “unpaired surrogates are converted by MF2 to the replacement character”? I’m not going to explode in your face, but if we see this, that’s what we’re going to do; it’s in the spec, it’s not optional. - -APP: We would be a USV(?) string, then. You’d have to check for unpaired. - -MIH: It’s in the spec right now; we check for the characters to be in those ranges. It’s not about it being difficult to implement. Accounting for reality, not what we would like necessarily. - -APP: A few proposals. One to permit them in `text-char`. One to allow them to be replaced with the replacement char. A third is not to do anything. Do we want to make a choice here? - -EAO: I’m interested to hear what RCH thinks, given the preceding iteration of this discussion had participation from him - -RCH: Mostly I wanted it nailed down. As long as it’s clear and the ability to output strings that are not expressible in a transformation format remains, then it’s fine. Nailing down names is acceptable to me, I don’t know why someone would want the names to be non-conforming, and they don’t affect the output anyway. - -EAO: If we are to not error on unpaired surrogates in text, my preference is to just pass them through as they are. Needing to treat them as a special escaped or replacement thing would add complexity that ought to be unnecessary. 
- -RCH: I agree - -APP: Would my suggestion work better, which is to say our syntax is rigorous but we allow implementations to ignore it for text? - -EAO: No, that’s worse, because we end up with inconsistent implementations and that’s going to be bad. It’s sounding like the least bad option is to allow for unpaired surrogates in text and pass them through as they are. - -APP: For all implementations? If we have a UTF-8 implementation, it won’t work. - -EAO: Isn’t that handled before the content gets to the MF2 parser? - -MIH: Yes, it’s lost before. - -RCH: There are implementations where it wouldn’t be possible to express the text content including an unpaired surrogate - -APP: We don’t want to require them to support it. - -MIH: The surrogates are lost already before that, so… - -APP: We should have very careful wording about the handling of unpaired surrogates. Who would like to write the PR? - -MIH: I can do that. I raised the issue and asked for it, kind of. - -EAO: `text-char` and only `text-char`. `text-char` currently inherits from `content-char`; it might be easier to define them separately. - -APP: No, you just OR on the unpaired range. - -EAO: Let’s see what MIH comes up with and go from there - -## Resolved Value Implementation (728) - -APP: This has spawned several additional bits of work, which we should not consider here. This is the main thing to make “resolved value” a formal term and define it in the way we’ve been discussing, which is to say the value from a function that also includes options and annotation. I have said okay, Tim has said okay, everyone else is sitting on the sidelines. Is this ready to go in? Anyone object to it going in? All right, we’re resolving resolved value. 
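
For the unpaired-surrogate discussion (895) above, the three handling options that were floated (error, replace with U+FFFD, pass through unchanged) can be sketched as follows. This is only an illustrative sketch, not spec text or any implementation's code: the function names and the `policy` strings are hypothetical, and the input is modelled as a list of UTF-16 code units.

```python
REPLACEMENT_CHARACTER = 0xFFFD  # U+FFFD


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def unpaired_surrogate_indexes(units):
    """Return the indexes of UTF-16 code units that are unpaired surrogates."""
    bad = []
    i = 0
    while i < len(units):
        unit = units[i]
        if is_high_surrogate(unit):
            if i + 1 < len(units) and is_low_surrogate(units[i + 1]):
                i += 2  # well-formed pair, skip both units
                continue
            bad.append(i)  # high surrogate with no low surrogate after it
        elif is_low_surrogate(unit):
            bad.append(i)  # low surrogate not preceded by a high surrogate
        i += 1
    return bad


def handle_text(units, policy):
    """Apply one of the discussed policies (hypothetical names) to text content."""
    bad = set(unpaired_surrogate_indexes(units))
    if not bad or policy == "pass-through":
        return list(units)
    if policy == "replace":
        return [REPLACEMENT_CHARACTER if i in bad else u for i, u in enumerate(units)]
    if policy == "error":
        raise ValueError(f"unpaired surrogate at code unit index {min(bad)}")
    raise ValueError(f"unknown policy: {policy}")
```

With "pass-through" the code units are returned unchanged, which corresponds to the option EAO and RCH preferred for text content.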
- -## Topic: PR Review - -*Timeboxed review of items ready for merge.* - -| PR | Description | Recommendation | -| ----- | ----- | ----- | -| 859 | \[DESIGN\] Number selection design refinements | Discuss | -| 846 | Add Unicode Registry definition | Discuss (634) | -| 842 | Match numbers numerically | Discuss (Reject) | -| 823 | Define function composition for :number and :integer values | Discuss | -| 814 | Define function composition for date/time values | Discuss | -| 806 | DESIGN: Add alternative designs to the design doc on function composition | Discuss | -| 799 | Unify input and local declarations in model | Discuss (for 14 Oct) | -| 798 | Define function composition for :string values | Discuss | -| 728 | Add "resolved values" section to formatting | Discuss (Merge, Revise summary) | -| 646 | Update spec as if PR 645 were accepted | Discuss | -| 584 | Add new terms to glossary | Discuss | - -### #799 (data model) - -APP: Hasn’t received a lot of love lately. - -EAO: I just refreshed this so it doesn’t have any merge conflicts and it’s easier to see the diff. The last comment there is from me replying to a bunch of stuff from Mihai, Elango and Stas about their concerns with respect to this. I think that was in July or something, and it hasn’t advanced from there. I would be very happy to actively ask Mihai and Elango to look at this and discuss it more on that thread during this week. - -MIH: Just one question to clarify. The last comment there is from July 28\. What changed since then? - -EAO: There’s a merge from main to that branch, accounting for changes done in the interim. - -MIH: The argument we all tried to make is: what’s the point of doing this? The debate is that there’s no good reason to do this. - -EAO: My request here is for you to review my last comment there and reply to it in the thread, and for us to discuss this next week. 
- -APP: So if I’m hearing correctly, there may be a disagreement about whether to do this and we’re going to have a technical discussion next week about it. - -APP: I think all of the other PRs have to do with resolved value or function composition, which is resolved value. I think an ask for the various authors is to go through and ensure those are consistent. Tim, I don’t know if 646 is germane anymore. I’ll close it. The other one is Simon Clark had some terms he wanted to add to the glossary. I think there are open comments against it. He’s not here to defend himself, so I will ping him. - -EAO: I was going to note that I have gone through the number and integer function composition and the date/time/datetime composition PRs, and in order to align them with the text that now landed, the only ones are the ones I proposed today, linkifying the “resolved value” term. Otherwise, these correspond with how we currently define resolved values. - -APP: If you’re interested, go through, everyone, and check to see if these are merge-ready. Then I will work on the number selection design piece. - -### Number Selection (#842, #859) - - APP: The outstanding thing we have left is non-integer number selection. And/or any changes to integer number selection. The thing that’s missing there is I have a proposal… - -### bidi changes - -APP: Has anyone worked on tests for those? - -EAO: I do not have automated tests that validate it. - -MIH: I didn’t have time to do anything, due to CLDR/ICU release cycle. - -MRR: I can write some tests for that - -### Function composition for number and integer - -EAO: As we’ve discussed number and integer function composition for a bit, the text there should align with our current understanding of what a resolved value is, would it be possible to consider that for merging today? - -APP: I have some wording things. Maybe that could be considered separately. Do others have a feeling? Any objections? 
We’ll be back here soon if we change the number selection. \[No objections\] Merging - -EAO: So the next thing is the date and time and datetime composition thing. There, I think the biggest question is whether – what do we do when you have a `:time` value and you feed it to a `:datetime`, and what’s supposed to happen there? The argument I’m proposing in the current PR is to consider it an error. From a `:time` you get a “time-like thing” and the input requirements for a `:datetime` must be a “datetime-like thing”. - -APP: I think that’s too stringent, because – there’s classical timekeeping of the milliseconds since epoch/calendar variety, and there’s Temporal-type time types, and a subset of the Temporal-like time types are restricted in that way. But most of the classical ones have ?? for this kind of thing. I think there’s a tripping hazard where if I knowingly pass a `java.util.Date` in my arguments array, and the first time I touch it I annotate it with `:time`, I’m still thinking it’s a `java.util.Date` so I can touch it a second time with a `:datetime`. I can support the idea that a `:time` may throw an error, because it might only be a time. I’m reluctant to break classical timekeeping. - -EAO: As whatever `:time` can do is a strict subset of what you can do with a `:datetime`, in order to get the effect of what you’re looking for, you could and probably should use a `:datetime` on the input. Even if we allow for an error to occur, it means that a reader of a message who doesn’t know how the value from the outside is coming in – it becomes quite dangerous to presume that you could use a `:time` thing in the resolved value of a `:time` annotated expression and then do `:date` operations on it. Where what you ought to be using is `:datetime`. - -MIH: I have two arguments. One is – I agree with EAO that that would feel like the correct behavior. On the other hand, there are PLs that don’t even have any special types for date and time, like C.
There is in libraries and whatnot, but the language doesn’t have anything in the standard libraries. The other argument is that one can imagine something like `time`… imagine something that takes a time and gives you back a datetime by gluing today’s date to it. I don’t know if that’s the current time function. Similarly for the other way around. I can imagine a function that takes a `time` and gives you back a `datetime`. If you say that’s not the current function `time`, then you’re probably right. I would tend to be tolerant the way APP described. - -APP: I would be okay with saying that a `:time` annotated value or a `:date` annotated value may throw a Bad Operand error, or with other function types, because it’s using an implementation-defined type that isn’t supported. For example, a zoned time would throw a Bad Operand error if you tried to `:date` it. That’s an explainable thing and there’s a developer on the end of that stick who would understand why it happens, so the usage pattern is clear. There’s a bunch of operations that we’re kind of ignoring. Coercing time zone on and off values to float and unfloat the value, other things people commonly want to do with time values – MF2 should have a clear story. I built a whole bunch of things for that in past lives that are effective and that I can explain to developers. What I’m afraid of is that there’s a lot of developers in the world and they’re going to be passing in values and are not thinking of annotations as having an effect on the value. We want to make it simple for them to do the right things and possible for them to do the hard things, and that’s why I tend to be reticent about making a hard limit on that when it may just be an expression thing. 
- -EAO: Sounds like there could be a consensus position here where a `:datetime` is always fine with an operand that is coming from a `:datetime`; a `:time` is always fine with an operand coming from `:datetime` or `:time`; and a `:date` is always fine with an operand coming from `:datetime` or `:date`. And if you otherwise combine these resolved values with such annotations, the behavior is implementation-defined and that behavior may be to complain about a bad operand. Does this match what you are proposing? - -MIH: I think that would be a good way to put it in the spec. On the other side, I think I would leave this kind of stuff to a linter. In the early days of MF2, we tried not to be opinionated about things that aren’t really i18n. PLs are catching up; JS has a Temporal proposal, Java added something… it’s a stretch for us to be opinionated. Leave this to a linter, enforce what EAO described, but not in the spec. - -APP: I don’t know that I agree with linting. EAO’s proposal makes sense because it’s an enumerable thing to say that some implementation-defined types may cause Bad Operand. Suppose I have a local time value to use a specific type. Does `:datetime` format it or is that a bad operand? - -EAO: That’s an implementation-defined behavior. - -APP: In your implementation, how would you handle it? - -EAO: That would depend on what `Intl.DateTimeFormat` does with whatever value you end up giving it. Given that `Intl.DateTimeFormat` does not currently support such a value, it might depend on exactly what options are declared there. - -APP: And I know that that’s how Java works. DateTimeFormat works fine on that unless you ask for a year. - -EAO: Just to clarify, we are talking here about the behavior when combining resolved values rather than formatted values. …That’s behavior we can entirely control in the spec. I want to modify the PR to match what I presented earlier and there’s certainly space there for linters around it. 
We should be recommending against messages that feed in a `:time` to a `:date` or a `:date` to a `:datetime`. Fundamentally, because the words we’re using imply to a reader that they’re not quite sure what might happen. Even if we leave it as an implementation-defined behavior, we should recommend against it, given that with `:datetime` we can make it happen in a way that’s clear to the reader. - -MIH: If the proposal is changed in the way EAO described, I won’t oppose, but I think it’s overreaching. We should be opinionated about i18n, but this isn’t i18n, it’s bad programming practice. Not my business to handle that. - -APP: I understand about “we’re not going to actually call the function” but I think there’s still room to say “implementation-defined types”. We do say that the resolved value is an implementation-defined type, and that’s generally narrower than the ones that it accepts. Potentially an implementation could say “here’s the list of types I will emit as a resolved value” and if you mix and match, it could result in a bad operand. - -EAO: I would like to push back at MIH, I think it’s relevant to translation and l10n. If we have a message with an input that has a `:date`, and the resolved value of this input is then used as an operand for a `:datetime`, a translator looking at this can either reasonably presume that the value being formatted is the full original date/time passed in, or it could also be the date with a 00 time on it for the beginning of the day, because it was passed through a `:date` and therefore it’s lost the time. If we allow for this, and particularly if linters don’t complain, we’ll end up with messages that are valid but confusing. This confusion is what I’m seeking most to avoid here.
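
The consensus position EAO summarized above (which date/time annotations are always fine to chain, with everything else implementation-defined) can be written down as a small lookup. This is a sketch for discussion only: the function name and the returned labels are hypothetical, and it says nothing about what an implementation does in the implementation-defined cases beyond noting that a Bad Operand error is permitted.

```python
# For each outer function, the set of inner annotations whose resolved value
# is always an acceptable operand, per the position summarized in the notes.
ALWAYS_OK = {
    "datetime": {"datetime"},
    "time": {"datetime", "time"},
    "date": {"datetime", "date"},
}


def composition_status(outer, inner):
    """Classify applying `:outer` to the resolved value of a `:inner` expression."""
    if outer not in ALWAYS_OK or inner not in ALWAYS_OK:
        raise ValueError("only date, time and datetime are covered by this sketch")
    if inner in ALWAYS_OK[outer]:
        return "always-ok"
    # Behavior is left to the implementation, which may report a Bad Operand error.
    return "implementation-defined"
```

A linter of the kind MIH mentions could use the same table to warn whenever the status is "implementation-defined".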
- -APP: One observation: the option bag conversation will become interesting here, because that’s one of the other things that composes, and as you mentioned earlier, Shane wants us to lean towards the nascent semantic skeleton thing, and maybe make some of these option bags optional. We want to carefully consider what the options are. That might have an influence there. You’re right that it’s possible to write a message that would effectively filter information out of a date and time value. That is potentially antithetical to our idea of immutability. Translators will generally see placeholders that say what they want to do. They’re not thinking about whether the numbers are going to be 0 or not, they’re thinking about what values are going to appear here. - -``` -.input {$date :datetime} -.local $t = {$date :time} -.local $d = {$t :date} -{{What does {$d} at {$t} say?}} -``` - -EAO: I’d be happy for us to move on to that discussion and specifically a proposal I’d like to make on the topic, which is that I think we should make for the initial release of the default functions the field options of `:datetime` optional rather than required. So that implementations can implement those, but they are not required to do so. - -MIH: So you mean the whole option bag that we have now would be optional? - -EAO: Not the whole option bag, the field options. So that excludes some of the options – do we call them locale options? – and the timestyle and datestyle options, which I do think should be required. - -MIH: I’m very reluctant to do that. One of the big requirements from Mark Davis, and I agree with it, is to have a way to migrate existing messages to MF2. Existing messages do have equivalent things to what we have here with option bags. MF1 has option bags and the JS formatter has something like this. Even if semantic skeletons land sooner or later, this is kind of well-established stuff that I think would be good to support. 
People do that today; they use it with existing native APIs. - -APP: Let me present Shane’s argument. The best practice at some near-term future moment would be to use skeletons and in particular, the semantic skeletons that aren’t programmable with the weird pattern language ICU has. If that were the best practice, then you want it to be standard and built-in. Any of the existing implementations should be able to handle that because they are going to feed it through the datetime pattern generator behind the scenes. They would have a way to generate that option bag or generate the pattern through local functionality. This would push people toward good things, so therefore it should be standard. There would be these optional options, where we would say how they’re implemented and what the valid values are, and our definition of optional is that you’re not required to implement them, but if you do, do it like this. I could see implementing this as optional and I can see ICU as having it. People have programmed wacky patterns in the past. We don’t currently have picture strings at all. We should address those requirements in the right way, and it might be through optional options. Or if we require it, then everyone has to write that code. - -EAO: I was just going to mention that Mihai, I think the requirement for migrating from MF1 content into MF2 is already going to require some set of extensions to the default functions. Skeletons come to mind, picture strings is another, which is entirely valid for MF1. Also the spellout and other functions for number, and the plural offset, which we also do not have. All of these things are required for having MF1-to-MF2 transformability. So us making these options as optional rather than required is not going to increase the burden for any such migration. In particular, as none of these options are directly supported by MF1. - -MIH: I’m very split. 
Picture strings are bad i18n, we rejected them from very early on, and that’s part of the area where we’re entitled to be opinionated. We know it’s bad i18n. This is not about bad i18n, it’s something that – soon it’s going to be best practices, but what’s the definition of soon? Soon can be five years or more. Stuff like this – I don’t know. You mentioned skeletons. Yes, but the skeletons can be mapped 1:1 to the existing option bags. It’s just syntactic sugar. So for MF1, skeletons are supported. I can do the same thing you used to do then today. - -EAO: I’m pretty sure for the majority of cases, that is true, but on the edges, there is functionality in semantic skeleta that’s supported in date/time formatting that is not supported in JS at all. I’ve written a parser for those formats so I could build exactly those option bags, and I needed to leave some of the values on the edges unsupported. - -APP: `Intl.DateTimeFormat` is a subset of the functionality present in ICU. So – ICU is more capable of representing a bunch of things, so I’d be unsurprised by that assertion. Two interesting things: one, one of Shane’s things is that the semantic skeleta are limited in what you can represent. They don’t let you do some things that the current skeleton lets you do, like year-month-hour. You can’t say that in a semantic skeleton. That’s maybe an interesting thing. Mihai might be interested to note that when you do the resolved value thing, will ICU skeleton result in resolved options that look like year/month/hour/minute field option bags? Or will it look like ICU skeleton as the option? - -MIH: Everything looks like option bags. They get converted to an ICU skeleton in order to do the formatting, only when you do format-to-string things. So the resolved value would contain option bags. - -EAO: I would also like to note that the thing I’m asking for is specifically and only downgrading these field options from required to optional in the initial release.
Doing so and still defining them and saying which values they’re supposed to take in makes it possible for us to later change our minds and make them required. The intent with this change would be to give a little time for the work on semantic skeleta to proceed and see if it is on a track to becoming a widely adopted standard. It would also allow near-future implementations to not implement the field options if they go the other way. This is a concern for the ICU4X implementation. - -MIH: I don’t know. We’ve been pushing skeletons for many years and people are starting to adopt them. I would be reluctant to push something out and have people say “you can’t even do date and time now.” If I look at the spec and say “I can’t even do this basic stuff I’ve been doing for ten years”, it feels like a bummer. So I think the semantic skeletons are going the right direction, but the thing is, we have existing things in current languages/frameworks that do it a certain way, not just in ICU, in ECMAScript, with `java.time`. So you want as little friction as you can. It’s my problem if I want “December at 5 PM”, it’s not an i18n problem. - -EAO: I don’t think people who are going to make decisions at that sort of level are going to be looking at the spec. They’ll be looking at the implementation that they’re going to be using. For the JS implementation, I’m still going to opt into all of the field options if we make them optional. I’m in a position where I can do that and trust that the situation is going to resolve one way or the other before the `Intl.MessageFormat` part of the language is locked down. I kind of trust and believe that the ICU impl might choose to opt into these options. The ICU impl might include an `icu:skeleton` option directly. These are going to be the interfaces that people need to look at to choose what they’re doing. Rather than us saying in the spec for `:datetime` that these specific options are optional.
- -MIH: I would say that a big selling point of MF2 is being cross-platform. I can write a bunch of messages and use them in GMail Android, web, and iOS. That’s a big selling point. Having extensions is one thing, another one is icu: options, it’s not portable anymore. You say you’re in a position to do that as optional, I don’t think you are. You might be able to put it in Firefox but not in Chrome. We can’t even guarantee we have a JS implementation that is consistent everywhere. If we have some kind of “draft” namespace that’s the same everywhere, that would help, but I don’t think it’s a good idea. - -APP: I think maybe there’s a gap in the phrasing that we’re using. EAO and I have been discussing that in refactoring the function registry, I think we discussed it in previous calls, instead of having a built-in registry and proto-registry, that we have `:number`, which has required things and optional things. Optional options are part of the `:number` spec and if you are an implementor, you are not required to implement them. If you do, then you have to implement them like that. Different than the optional registry. What we’re saying is that every implementation absolutely has to have this set of options, `datestyle` and `timestyle`, and you may have these other ones, and because they’re standardized, toolchains would know what those things meant. They would be built in, but not every implementation would accept those options. The current thing that we have is a brief window in which we could leave out some set of options and therefore not have a whole bunch of options that are ?? deprecated, sort of the way some of the early date stuff in Java is. It’s been deprecated for 30 years and it would be good not to reinvent at-deprecating some of these things if we can. If we think we have to have the option bags, so be it, but then everyone will have to implement it. 
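
The distinction APP draws above between required options and standardized-but-optional options can be illustrated with a small classifier. This is a hedged sketch only: `datestyle` and `timestyle` are the names used in these notes, while the field option names (`year`, `month`, `hour`, `minute`) and the function itself are assumptions made for illustration, not a list taken from the spec.

```python
# Options every implementation would have to accept (names as used in the notes).
REQUIRED_OPTIONS = {"datestyle", "timestyle"}
# Standardized field options an implementation may choose to support
# (illustrative names only).
OPTIONAL_OPTIONS = {"year", "month", "hour", "minute"}


def classify_option(name, implemented_optional):
    """Classify one `:datetime` option name for a given implementation."""
    if name in REQUIRED_OPTIONS:
        return "required"
    if name in OPTIONAL_OPTIONS:
        if name in implemented_optional:
            return "optional-supported"
        # Tooling still knows what the option means; this implementation just
        # does not accept it, which is the portability concern MIH raises.
        return "optional-unsupported"
    return "unknown"
```

A message that only uses "required" options would be portable across implementations, while one that uses optional options depends on what each implementation chose to support.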
- -EAO: Just thought I’d clarify that when I say “JS implementation” I mean the npm-installable library that is an OpenJS Foundation project, that is in part a polyfill for the JS spec for `Intl.MessageFormat`. So the spec for `Intl.MessageFormat` will need some definition of what it supports. That’s currently at stage 1 and it will take some time to advance through standardization. Separately, the package on npm, which is entirely controlled by me, I can make it accept all of the current options of the formatters. The key is that later on, I can do a major version update to that library where I drop features and switch to a different sort of option bag if semantic skeletons advance sufficiently that they become available on `Intl.DateTimeFormat` in JS, and it starts to make sense for the `Intl.MessageFormat` implementation to only support semantic skeletons and not these field options. This is what I mean by me being able to control what I do in my implementation, and the spec later when it finalizes may say something else. - -MIH: Then I want to ask a question. You said these are optional the same way we have certain options on the number and integer formatters. If that’s the case, then this is not in the same bucket with skeletons in ICU, because that’s in a namespace that’s implementation-specific. I’m not sure what we’re proposing yet. Leave them out completely, or say “you can implement this in a namespace”? - -EAO: No; the proposal specifically is that we leave them as they are with the names they currently have, which are namespaced, and say “you may implement these options on `:datetime`”. - -MIH: Then we can never take them away - -APP: That’s right - -EAO: We can never take them away from the spec, but an implementation would not need to support them - -MIH: Meaning they’re not portable - -EAO: At the moment they’re not portable, correct - -APP: We’ve been talking about this a while. 
I think we’ve talked about the abstract aspects of it and I think we should work on a concrete proposal or maybe even a design doc that says “here are the options”. As we’ve got six weeks to agree. We should have a clearer understanding – bringing this up is good because we should have some level of policy here. We should be parsimonious about what we put in, because everything we put in is required forever. At the same time, we should put in everything that we think is necessary for meaningful adoption. - -EAO: For an example of an optional formatter that I think we should define, maybe add on later, is `:list`. List formatting is something that is actively supported in multiple places; we have a decent idea of what it looks like, and we should allow for it to be supported. At the same time, I don’t think we’re in a position where we want to require all implementations to support it. - -MIH: I agree with you and I think I even have a list as a proof of concept in one of the unit tests, just to make sure that my implementation can support stuff like that. Certain things will be under the icu namespace, like durations. But list is not in MF1, so not a strong requirement from ICU to say “you have to support that in MF2”. The whole idea of dropping these option bags, I think I would like to take this up with the ICU TC to ask them how they feel about it. In the end, I have to land that thing in ICU itself. - -APP: Let’s see how much we can resolve within the WG in a week. It may be a no-op. - -EAO: Two things. `:duration` like `:list` is another one I’d be happy for us to define as an optional formatter. And then say, if you’re going to do it, do it this way. But we can return to this later as we can expand and work on the core set of functions. 
Another point is that the intent with what I proposed here is not to drop the field options, but to make them optional, so the question to ICU TC would be whether to support field options or not, as they are spec’d but as optional. - -MIH: I really don’t like the idea of making them optional without a namespace. I see there that I can use it in ICU, I will assume it’s standard and portable and I can use it. People don’t use the spec, they’ll be in their editor and copy/paste examples, they’ll see it works on three platforms but the fourth one doesn’t. I’d feel better with the namespace. `icu:` is a big warning that it’s not portable. When it becomes final, you drop the `icu:`. They don’t read the spec and notice that this stuff they’ve copy/pasted that works everywhere else doesn’t work in one place. - -EAO: Are you also arguing against defining `:list` and `:duration` as formatters that would be optional? - -MIH: At this point, we don’t have time for it, so I’m opposing it based on – - -EAO: What you’re proposing about these options is also an argument that could be made about having optional-but-not-required formatters defined at all in the spec. - -APP: I think we have to define functions that some implementations are not required to implement. PHP will implement this, perl, awk… they don’t have a list formatter, so they’re not going to do that. Would you be happier, Mihai, if we used the `u:` namespace? - -MIH: Kind of; you say it will be deprecated, but it will never really be deprecated - -APP: If we specify them, they will always be there, but as you well know, there will be things we can say “but best practices say…” That’s documentation, not implementation. Implementations have to do what the spec says. With `list` as an example, if we specify list formatters, then we want people to do it like X. If we use the `u:` namespace, we can always remove that to make it required by every implementation. 
Which I assume we would version MF if we did that, because we’d be breaking a bunch of implementations. - -MIH: We version the registry, but not MF - -APP: We don’t have a registry anymore, but we version specs. There’s that, and things like some of these optional options which we might never promote. We would still say, if you write one, then it looks like this. - -MIH: One of the ideas with the machine-readable registry was that you can use it to implement a linter or tooling like IDEs, or integrate it with translation tools. So translators know not to scrub stuff… even `u:`, if I lint, all I can say is “warning: this is not portable.” - -APP: I’m going to timebox this. Somebody should take the action item to put the options together in a design doc. Adding a machine-readable registry description is a fine task for us to do in the preview period after 46.1, as something we consider adding on. Unless we think that suddenly becomes a requirement again, I don’t see us doing it now. Does what I’m suggesting sound like the right outcome? - -EAO: I’m here to say that if we’re going to define the `u:` namespace as stuff that might or might not work, we should consider whether the `u` letter is useful or if some other prefix would be better, if `x` is appropriate or otherwise. I think we should stick to the plan that Addison has been advancing, which would allow for optional things to be in the root namespace. It sounds like a conversation we’ll need to continue later. - -APP: Who wants the action to write a design doc? - -EAO: On what part of this? - -APP: The options – enumerating them to consider in technical arguments. - -EAO: I nominate Shane - -APP: He’s not here - -MIH: I will try to take some temperature readings in the ICU TC - -APP: Are you going to write the design doc? - -EAO: I think we really want Shane to do it; because he’s the one who originally wants this. - -EAO: Next actions on me are to update the date/time function composition as we agreed on here. 
Making the changes sooner will make the later discussion easier. Separately I’ll look at the string composition one. If we could get that to land next week, it would be really good. With an explicitly defined resolved value, we can do much better at defining what a fallback value is. - -EAO: We should send to the mailing list a note about this upcoming deadline - -APP: I will do that when we hang up - -## Topic: Issue review - -[https://github.com/unicode-org/message-format-wg/issues](https://github.com/unicode-org/message-format-wg/issues) - -Currently we have 48 open (was 50 last time). - -* 3 are (late for) LDML46 -* 15 are for 46.1 -* 15 are `Preview-Feedback` -* 1 is `resolve-candidate` and proposed for close. -* 2 are `Agenda+` and proposed for discussion. -* None are ballots - -| Issue | Description | Recommendation | -| ----- | ----- | ----- | -| 865 | TC39-TG2 would like to see completion of the TG5 study | Discuss, Agenda+ | -| 895 | The standard as is right now is unfriendly / unusual for tech stacks that are "native utf-16" | Discuss, Agenda+ | -| 837 | (resolve candidates) | Close | - -## Topic: Design Status Review - -| Doc | Description | Status | -| ----- | ----- | ----- | -| bidi-usability | Manage bidi isolation | Accepted | -| dataflow-composability | Data Flow for Composable Functions | Proposed | -| function-composition-part-1 | Function Composition | Proposed | -| maintaining-registry | Maintaining the function registry | Proposed, Discuss | -| number-selection | Define how selection on numbers happens | Revision Proposed, Discuss | -| selection-declaration | Define what effect (if any) the annotation of a selector has on subsequence placeholders | Proposed, Discuss (Agenda+) | -| beauty-contest | Choose between syntax options | Obsolete | -| selection-matching-options | Selection Matching Options (ballot) | Obsolete | -| syntax-exploration-2 | Balloting of the revised syntax used in the Tech Preview | Obsolete | -| variants | A collection of 
message examples which require a branching logic to handle grammatical variations | Obsolete | -| formatted-parts | Define how format-to-parts works | Rejected | -| quoted-literals | Document the rationale for including quoted literals in MF and for choosing the | as the quote symbol | Accepted | -| builtin-registry-capabilities | Tech Preview default registry definition | Accepted | -| code-mode-introducer | Choose the pattern for complex messages | Accepted | -| data-driven-tests | Capture the planned approach for the test suite | Accepted | -| default-registry-and-mf1-compatibility | Default Registry and MF1 Compatibility | Accepted | -| delimiting-variant-patterns | Delimiting of Patterns in Complex Messages (Ballot) | Accepted | -| error-handling | Decide whether and what implementations do after a runtime error | Accepted | -| exact-match-selector-options | Choose the name for the “exact match” selector function (this is `:string`) | Accepted | -| expression-attributes | Define how attributes may be attached to expressions | Accepted | -| open-close-placeholders | Describe the use cases and requirements for placeholders that enclose parts of a pattern | Accepted | -| overriding-extending-namespacing | Defines how externally-authored functions can appear in a message; how externally authored options can appear; and effect of namespacing | Accepted | -| pattern-exterior-whitespace | Specify how whitespace inside of a pattern (at the start/end) works | Accepted | -| string-selection-formatting | Define how selection and formatting of string values takes place. | Accepted | -| variable-mutability | Describe how variables are named and how externally passed variables and internally defined variables interact | Accepted | - -## Topic: AOB? 
- diff --git a/meetings/2024/notes-2024-10-14.md b/meetings/2024/notes-2024-10-14.md deleted file mode 100644 index 3ad42c97c..000000000 --- a/meetings/2024/notes-2024-10-14.md +++ /dev/null @@ -1,298 +0,0 @@ -# 14 October 2024 | MessageFormat Working Group Teleconference - -### Attendees - -- Addison Phillips - Unicode (APP) -chair -- Eemeli Aro - Mozilla (EAO) -- Mihai Niță - Google (MIH) -- Tim Chevalier - Igalia (TIM) -- Richard Gibson - OpenJSF (RGN) -- Matt Radbourne - Bloomberg (MRR) -- Mark Davis - Google (MED) - - -**Scribe:** MIH - - -To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. - -## [**Agenda**](https://github.com/unicode-org/message-format-wg/wiki#agenda) - -To request that the chair add an *issue* to the agenda, add the label `Agenda+` To request that the chair add an agenda item, send email to the message-format-wg group email. - -## Topic: Info Share - -(none) - -## Topic: Schedule for Release - -(none) - -## Topic: `resolve-candidate` - -*The following issues are proposed for resolve:* -797 -786 -752 -703 - -## ** Topic: Agenda+ Topics** - -### Bag of options vs. semantic skeletons - -### - -### Topic: Allow surrogates in content - -*The previous consensus was to allow unpaired surrogate code points in text but not in literal or other constructs. Mihai points out some issues with this.* - -MIH: My initial understanding was that we should allow this in localizable text, and literals are localizable text - -### Topic: Add alternative designs to the design doc on function composition - -*This topic should take only a minute. 
The discussion here is whether to merge PR 806, marking the design as “obsolete” or just close the PR.* - -### : Topic: 799/786 Possible simplification of the data model/unify input/local definitions - -***This was homework for this week.** The PR proposes to unify local and input declarations in the data model. We should accept or reject this proposal.* - -### Topic: 603 We should not require \* if the variant keys exhaust all possibilities - -*We should review this proposal and categorically accept or reject it for 46.1* - -## ** Topic: PR Review** - -*Timeboxed review of items ready for merge.* - -| PR | Description | Recommendation | -| ----- | ----- | ----- | -| 906 | Allow surrogates in content | Discuss, Agenda+ | -| 905 | Apply NFC normalization during :string key comparison | Merge | -| 904 | Add tests for changes due to 885 (name/literal equality) | Merge | -| 903 | Fix fallback value definition and use | Discuss | -| 902 | Add tests for changes due to bidi/whitespace | Merge | -| 901 | Clarify note about eager vs. lazy evaluation | Discuss | -| 859 | \[DESIGN\] Number selection design refinements | Discuss | -| 846 | Add u: options namespace | Discuss (634) | -| 842 | Match numbers numerically | Discuss (Reject) | -| 814 | Define function composition for date/time values | Discuss | -| 806 | DESIGN: Add alternative designs to the design doc on function composition | Merge as Obsolete, Agenda+ | -| 799 | Unify input and local declarations in model | Discuss (for 14 Oct) | -| 798 | Define function composition for :string values | Discuss | -| 584 | Add new terms to glossary | Discuss | - -## Topic: Issue review - -[https://github.com/unicode-org/message-format-wg/issues](https://github.com/unicode-org/message-format-wg/issues) - -Currently we have 46 open (was 48 last time). - -* 3 are (late for) LDML46 -* 15 are for 46.1 -* 11 are `Preview-Feedback` -* 4 are `resolve-candidate` and proposed for close. -* 3 are `Agenda+` and proposed for discussion. 
-* None are ballots - -| Issue | Description | Recommendation | -| ----- | ----- | ----- | -| | | | -| | | | -| | | | - -## ** Topic: Design Status Review** - -| Doc | Description | Status | -| ----- | ----- | ----- | -| bidi-usability | Manage bidi isolation | Accepted | -| dataflow-composability | Data Flow for Composable Functions | Proposed | -| function-composition-part-1 | Function Composition | Proposed | -| maintaining-registry | Maintaining the function registry | Proposed, Discuss | -| number-selection | Define how selection on numbers happens | Revision Proposed, Discuss | -| selection-declaration | Define what effect (if any) the annotation of a selector has on subsequence placeholders | Proposed, Discuss (Agenda+) | -| beauty-contest | Choose between syntax options | Obsolete | -| selection-matching-options | Selection Matching Options (ballot) | Obsolete | -| syntax-exploration-2 | Balloting of the revised syntax used in the Tech Preview | Obsolete | -| variants | A collection of message examples which require a branching logic to handle grammatical variations | Obsolete | -| formatted-parts | Define how format-to-parts works | Rejected | -| quoted-literals | Document the rationale for including quoted literals in MF and for choosing the | as the quote symbol | Accepted | -| builtin-registry-capabilities | Tech Preview default registry definition | Accepted | -| code-mode-introducer | Choose the pattern for complex messages | Accepted | -| data-driven-tests | Capture the planned approach for the test suite | Accepted | -| default-registry-and-mf1-compatibility | Default Registry and MF1 Compatibility | Accepted | -| delimiting-variant-patterns | Delimiting of Patterns in Complex Messages (Ballot) | Accepted | -| error-handling | Decide whether and what implementations do after a runtime error | Accepted | -| exact-match-selector-options | Choose the name for the “exact match” selector function (this is `:string`) | Accepted | -| expression-attributes 
| Define how attributes may be attached to expressions | Accepted | -| open-close-placeholders | Describe the use cases and requirements for placeholders that enclose parts of a pattern | Accepted | -| overriding-extending-namespacing | Defines how externally-authored functions can appear in a message; how externally authored options can appear; and effect of namespacing | Accepted | -| pattern-exterior-whitespace | Specify how whitespace inside of a pattern (at the start/end) works | Accepted | -| string-selection-formatting | Define how selection and formatting of string values takes place. | Accepted | -| variable-mutability | Describe how variables are named and how externally passed variables and internally defined variables interact | Accepted | - -## ** Topic: AOB?** - -EAO: I will probably not be available in the next two meetings - -### Make bag of options for `` `:date` `` and `` `:time` `` optional in wait for semantic skeletons - -MED: do we go out with nothing, or with an interim - -EAO: can we have some time with these non-required, and make them required later - -APP: we are talking about required options. Non required means you can still implement them. - -APP: we decided early on to go with a bag of options because they can go back and forth to string skeletons. They are equivalent. - -APP: what are we going to do with semantic skeletons they they come? - -APP: we can’t really ship only with date / time style. We can’t say we are complete without something more flexible. - -MED: I feel strongly that semantic skeletons are where we want to go. -The current skeletons / bag of options would be a migration path. -We can make them optional for now, and that gives us freedom to make them required, or keep them optional forever. - -APP: but we do them as a package. If you implement, we implement all. 
- -APP: anything else you are interested on in the agenda - -### 603 We should not require \* if the variant keys exhaust all possibilities - -MED: touching on the star, the issue of not requiring it means that things are not that robust. -Messages build without a star you get into problems. It is kind of ugly to mix `\*` and `other`, but it is more robust. - -EAO: the other case is the booleans. If you define true / false you will have nothing else ever. - -APP: you need to know how to “explode” the cases. - -MED: I think that we can back away from it if we require selectors to identify a default value. -So at least the default value should be there. -But has the downside that implementations need to know about all the selectors. - -MIH: you mentioned we discussed it. Thought we reached a decision. Mentioning booleans. Seems like they have only two values, but some languages, like java, can have a null there. Localization tools have to know the functions. No way for tools to know without machine readable registry for now. - -MED: eventually we need a machine readable registry. - -MIH: for a while we don’t have it. - -EAO: how an implementation communicates about custom functions is the language server work. -When we have a selector like `:boolean` if there is a `{$x :boolean}`, if `$x` is not provided then the selection fails. - -APP: probably best we can do. - -EAO: with `\*` the selection would use that. - -APP: in the end plural will be a pointer to CLDR -Other selectors will likely behave the same. -Machine readability needs to be able to include a “hey, look there” - -MED: a lot of tools will take the messages in a source language, expand, translated, then compact. -So in theory it can compact to `\* \* \*`. -The star makes the tooling much more reliable. - -APP: this is also a thing we can examine in the tech preview. We asked, we had no feedback. -This can be tightened in the future, if we need to. -We have a proposal on the table. 
- -EAO: we can’t loosen it in the future. - -APP: this is a data model. It is checked before we do function resolution. -Which makes it tricky. - -MED: requiring it is backward compatible. If we relax it in the future, the old messages are still valid. - -EAO: I wanted to note that it looks like the proposal is rejected. Maybe for future consideration. - -APP: any other topics you want to touch. - -### 797 Create a PR for function interaction - -Can I close this? Objections. - -### 786 Possible simplification of the data model - -APP: Find to resolve? - -### 752 Improve test coverage for built-in function options - -TIM: fin to close it? - -### 793 Recommend not escaping all the things - -TIM: no objections to close it - -### 905 Apply NFC normalization during :string key comparison 905 - - -APP: Closing, approved by MED, TIM, APP - -### 904 Add tests for changes due to 885 (name/literal equality) - -APP: EAO approved, I have some minor comments - -EAO: I left a comment. - -### 902 Tests for bidi and whitespace - -APP: EAO an me already approved. Comments? - -### 806 DESIGN: Add alternative designs to the design doc on function composition - -APP: we already did a lot of that work -Do we want to merge? -Some good work here. I can merge but mark it as obsolete. - -### 895 Allowing surrogates - -APP: there are areas that are localizable. -One of the examples was with text in a placeholder. -I tend to agree that the first pass through UTF-8 will break shoes characters. - -APP: the proposal as you make it means we can use one in a key. - -EAO: can I jump into this? -Bad tooling can make mistakes in the text. Bot in literals. - -APP: I tend to agree. If MF2 implementation would break in unpaired surrogates it might be a feature. - -MIH: I don’t see a difference between text and localizable literals. -If a tool is bad then it is bad in both. - -TIM: for implementation I didn’t know what the correct behavior is when we find invalid surrogates. 
- -APP: is the proposal to allow unpaired surrogates everywhere? - -MIH: no, only in localizable text - -EAO: is NFC well defined for unpaired surrogates? - -APP: yes - -RGN: I am 90% confident it normalizes to replacement character. - -APP: I checked, NFC normalizes as itself - -EAO: when you update this make sure to change all mentions of code units, to code points. - -EAO: will you include a warning to not use unpaired surrogates? - -MIH: yes - -### 814 Define function composition for date/time values - -EAO: can we merge that? - -APP: that is not permanent? Is it a solution for now? - -EAO: it allows us to change later. - -APP: I think we will be back here when we get to semantic skeletons - -MIH: we are introducing a strong type system, even when the underlying programming language does not do that. We basically say that ``:date`` returns a date kind of type, and it is an error to feed that into ``:time``, because it is a bad type. - -### 799, 786 Unify input and local declarations in data model / \[FEEDBACK\] Possible simplification of the data model - -MIH: Long discussion, unfortunately I was involved in it an didn’t manage to take notes. -But the final decision was to drop it - -APP: drop diff --git a/spec/README.md b/spec/README.md index c603282ca..a631901c6 100644 --- a/spec/README.md +++ b/spec/README.md @@ -17,7 +17,6 @@ 1. [Resolution Errors](errors.md#resolution-errors) 1. [Message Function Errors](errors.md#message-function-errors) 1. [Default Function Registry](registry.md) -1. [`u:` Namespace](u-namespace.md) 1. [Formatting](formatting.md) 1. [Interchange data model](data-model/README.md) @@ -80,33 +79,41 @@ A reference to a _term_ looks like this. > The provisions of the stability policy are not in effect until > the conclusion of the technical preview and adoption of this specification. -Updates to this specification will not make any valid _message_ invalid. 
- +Updates to this specification will not change +the syntactical meaning, the runtime output, or other behaviour +of valid messages written for earlier versions of this specification +that only use functions defined in this specification. Updates to this specification will not remove any syntax provided in this version. +Future versions MAY add additional structure or meaning to existing syntax. -Updates to this specification MUST NOT specify an error for any message -that previously did not specify an error. - -Updates to this specification MUST NOT specify the use of a fallback value for any message -that previously did not specify a fallback value. +Updates to this specification will not remove any reserved keywords or sigils. -Updates to this specification will not change the syntactical meaning -of any syntax defined in this specification. +> [!NOTE] +> Future versions may define new keywords. -Updates to this specification will not remove any functions defined in the default registry. +Updates to this specification will not reserve or assign meaning to +any character "sigils" except for those in the `reserved` production. -Updates to this specification will not remove any options or option values -defined in the default registry. +Updates to this specification +will not remove any functions defined in the default registry nor +will they remove any options or option values. +Additional options or option values MAY be defined. > [!NOTE] -> The foregoing policies are _not_ a guarantee that the results of formatting will never change. -> Even when this specification or its implementation do not change, +> This does not guarantee that the results of formatting will never change. 
+> Even when the specification doesn't change, > the functions for date formatting, number formatting and so on -> can change their results over time or behave differently due to local runtime -> differences in implementation or changes to locale data -> (such as due to the release of new CLDR versions). +> will change their results over time. + +Later specification versions MAY make previously invalid messages valid. + +Updates to this specification will not introduce message syntax that, +when parsed according to earlier versions of this specification, +would produce syntax or data model errors. +Such messages MAY produce errors when formatted +according to an earlier version of this specification. -Updates to this specification will only reserve, define, or require +From version 2.0, MessageFormat will only reserve, define, or require function names or function option names consisting of characters in the ranges a-z, A-Z, and 0-9. All other names in these categories are reserved for the use of implementations or users. @@ -114,31 +121,28 @@ All other names in these categories are reserved for the use of implementations > [!NOTE] > Users defining custom names SHOULD include at least one character outside these ranges > to ensure that they will be compatible with future versions of this specification. -> They SHOULD also use the namespace feature to avoid collisions with other implementations. -Future versions of this specification will not introduce changes +Later versions of this specification will not introduce changes to the data model that would result in a data model representation based on this version being invalid. > For example, existing interfaces or fields will not be removed. -> [!IMPORTANT] -> This stability policy allows any of the following, non-exhaustive list, of changes -> in future versions of this specification: -> - Future versions may define new syntax and structures -> that would not be supported by this version of the specification. 
-> - Future versions may add additional structure or meaning to existing syntax. -> - Future versions may define new keywords. -> - Future versions may make previously invalid messages valid. -> - Future versions may define additional functions in the default registry -> or may reserve the names of functions for the purposes of interoperability. -> - Future versions may define additional options to existing functions. -> - Future versions may define additional option values for existing options. -> - Future versions may deprecate (but not remove) keywords, functions, options, or option values. -> - Future versions of this specification may introduce changes -> to the data model that would result in future data model representations -> not being valid for implementations of this version of the data model. -> - For example, a future version could introduce a new keyword, -> whose data model representation would be a new interface -> that is not recognized by this version's data model. +Later versions of this specification MAY introduce changes +to the data model that would result in future data model representations +not being valid for implementations of this version of the data model. + +> For example, a future version could introduce a new keyword, +> whose data model representation would be a new interface +> that is not recognized by this version's data model. + +Later specification versions will not introduce syntax that cannot be +represented by this version of the data model. + +> For example, a future version could introduce a new keyword. +> The future version's data model would provide an interface for that keyword +> while this version of the data model would parse the value into +> the interface `UnsupportedStatement`. +> Both data models would be "valid" in their context, +> but this version's would be missing any functionality for the new statement type. 
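The `UnsupportedStatement` example in the stability policy above can be sketched in TypeScript. This is a minimal illustration only: the interfaces are reduced copies of the data model declarations from this patch, `Expression` is collapsed to a placeholder (an assumption made to keep the sketch short), and `describeDeclaration` and the `.foo` keyword are hypothetical names that do not appear in the specification.

```ts
// Placeholder for the data model's Expression union (assumption: the real
// interfaces carry `arg`, `annotation`, and `attributes` fields as well).
interface Expression {
  type: "expression";
}

interface InputDeclaration {
  type: "input";
  name: string;
  value: Expression;
}

interface LocalDeclaration {
  type: "local";
  name: string;
  value: Expression;
}

// Shape taken from the data model changes in this patch.
interface UnsupportedStatement {
  type: "unsupported-statement";
  keyword: string; // without the leading "."
  body?: string; // raw text between the keyword and the first expression
  expressions: Expression[];
}

type Declaration = InputDeclaration | LocalDeclaration | UnsupportedStatement;

// Hypothetical helper: a tool written against this version of the data model
// can still walk declarations it does not understand instead of failing.
function describeDeclaration(decl: Declaration): string {
  switch (decl.type) {
    case "input":
      return `.input $${decl.name}`;
    case "local":
      return `.local $${decl.name}`;
    case "unsupported-statement":
      return `.${decl.keyword} (unsupported, ${decl.expressions.length} expression(s))`;
  }
}

// A statement using a keyword that only a later version defines (hypothetical).
const future: UnsupportedStatement = {
  type: "unsupported-statement",
  keyword: "foo",
  body: "bar",
  expressions: [{ type: "expression" }],
};

console.log(describeDeclaration(future)); // prints ".foo (unsupported, 1 expression(s))"
```

Switching on the `type` discriminant keeps the helper exhaustive: if a later data model adds another declaration interface to the union, the compiler flags the missing case rather than letting it pass silently.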
diff --git a/spec/appendices.md b/spec/appendices.md index b65036c6c..e94544596 100644 --- a/spec/appendices.md +++ b/spec/appendices.md @@ -14,10 +14,12 @@ host environments, their serializations and resource formats, that might be sufficient to prevent most problems. However, MessageFormat itself does not supply such a restriction. -MessageFormat _messages_ permit nearly all Unicode code points +MessageFormat _messages_ permit nearly all Unicode code points, +with the exception of surrogates, to appear in _literals_, including the text portions of a _pattern_. This means that it can be possible for a _message_ to contain invisible characters -(such as bidirectional controls, ASCII control characters in the range U+0000 to U+001F, +(such as bidirectional controls, +ASCII control characters in the range U+0000 to U+001F, or characters that might be interpreted as escapes or syntax in the host format) that abnormally affect the display of the _message_ when viewed as source code, or in resource formats or translation tools, diff --git a/spec/data-model/README.md b/spec/data-model/README.md index bd7028df0..517596f1c 100644 --- a/spec/data-model/README.md +++ b/spec/data-model/README.md @@ -17,10 +17,11 @@ Implementations that expose APIs supporting the production, consumption, or tran _message_ as a data structure are encouraged to use this data model. 
This data model provides these capabilities: -- any MessageFormat 2.0 message can be parsed into this representation +- any MessageFormat 2 message (including future versions) + can be parsed into this representation - this data model representation can be serialized as a well-formed -MessageFormat 2.0 message -- parsing a MessageFormat 2.0 message into a data model representation +MessageFormat 2 message +- parsing a MessageFormat 2 message into a data model representation and then serializing it results in an equivalently functional message This data model might also be used to: @@ -58,6 +59,10 @@ declarations, options, and attributes to be optional rather than required proper > In the MessageFormat 2 [syntax](/spec/syntax.md), the source for these `name` fields > sometimes uses the production `identifier`. > This happens when the named item, such as a _function_, supports namespacing. +> +> In the Tech Preview, feedback on whether to separate the `namespace` from the `name` +> and represent both separately, or just, as here, use an opaque single field `name` +> is desired. ## Messages @@ -80,7 +85,7 @@ interface PatternMessage { interface SelectMessage { type: "select"; declarations: Declaration[]; - selectors: VariableRef[]; + selectors: Expression[]; variants: Variant[]; } ``` @@ -93,8 +98,21 @@ The `name` does not include the initial `$` of the _variable_. The `name` of an `InputDeclaration` MUST be the same as the `name` in the `VariableRef` of its `VariableExpression` `value`. +An `UnsupportedStatement` represents a statement not supported by the implementation. +Its `keyword` is a non-empty string name (i.e. not including the initial `.`). +If not empty, the `body` is the "raw" value (i.e. escape sequences are not processed) +starting after the keyword and up to the first _expression_, +not including leading or trailing whitespace. +The non-empty `expressions` correspond to the trailing _expressions_ of the _reserved statement_. 
+ +> [!NOTE] +> Be aware that future versions of this specification +> might assign meaning to _reserved statement_ values. +> This would result in new interfaces being added to +> this data model. + ```ts -type Declaration = InputDeclaration | LocalDeclaration; +type Declaration = InputDeclaration | LocalDeclaration | UnsupportedStatement; interface InputDeclaration { type: "input"; @@ -107,6 +125,13 @@ interface LocalDeclaration { name: string; value: Expression; } + +interface UnsupportedStatement { + type: "unsupported-statement"; + keyword: string; + body?: string; + expressions: Expression[]; +} ``` In a `SelectMessage`, @@ -148,35 +173,45 @@ type Pattern = Array; type Expression = | LiteralExpression | VariableExpression - | FunctionExpression; + | FunctionExpression + | UnsupportedExpression; interface LiteralExpression { type: "expression"; arg: Literal; - function?: FunctionRef; + annotation?: FunctionAnnotation | UnsupportedAnnotation; attributes: Attributes; } interface VariableExpression { type: "expression"; arg: VariableRef; - function?: FunctionRef; + annotation?: FunctionAnnotation | UnsupportedAnnotation; attributes: Attributes; } interface FunctionExpression { type: "expression"; arg?: never; - function: FunctionRef; + annotation: FunctionAnnotation; + attributes: Attributes; +} + +interface UnsupportedExpression { + type: "expression"; + arg?: never; + annotation: UnsupportedAnnotation; attributes: Attributes; } + +type Attributes = Map; ``` ## Expressions The `Literal` and `VariableRef` correspond to the the _literal_ and _variable_ syntax rules. When they are used as the `body` of an `Expression`, -they represent _expression_ values with no _function_. +they represent _expression_ values with no _annotation_. `Literal` represents all literal values, both _quoted literal_ and _unquoted literal_. The presence or absence of quotes is not preserved by the data model. 
@@ -196,14 +231,14 @@ interface VariableRef { } ``` -A `FunctionRef` represents a _function_. +A `FunctionAnnotation` represents a _function_ _annotation_. The `name` does not include the `:` starting sigil. `Options` is a key-value mapping containing options, -and is used to represent the _function_ and _markup_ _options_. +and is used to represent the _annotation_ and _markup_ _options_. ```ts -interface FunctionRef { +interface FunctionAnnotation { type: "function"; name: string; options: Options; @@ -212,13 +247,31 @@ interface FunctionRef { type Options = Map; ``` +An `UnsupportedAnnotation` represents a +_private-use annotation_ not supported by the implementation or a _reserved annotation_. +The `source` is the "raw" value (i.e. escape sequences are not processed), +including the starting sigil. + +When parsing the syntax of a _message_ that includes a _private-use annotation_ +supported by the implementation, +the implementation SHOULD represent it in the data model +using an interface appropriate for the semantics and meaning +that the implementation attaches to that _annotation_. + +```ts +interface UnsupportedAnnotation { + type: "unsupported-annotation"; + source: string; +} +``` + ## Markup A `Markup` object has a `kind` of either `"open"`, `"standalone"`, or `"close"`, each corresponding to _open_, _standalone_, and _close_ _markup_. The `name` in these does not include the starting sigils `#` and `/` or the ending sigil `/`. -The `options` for markup use the same key-value mapping as `FunctionRef`. +The `options` for markup use the same key-value mapping as `FunctionAnnotation`. ```ts interface Markup { @@ -230,17 +283,6 @@ interface Markup { } ``` -## Attributes - -`Attributes` is a key-value mapping -used to represent the _expression_ and _markup_ _attributes_. - -_Attributes_ with no value are represented by `true` here. 
- -```ts -type Attributes = Map; -``` - ## Extensions Implementations MAY extend this data model with additional interfaces, diff --git a/spec/data-model/message.dtd b/spec/data-model/message.dtd index bc51dd159..33be40df2 100644 --- a/spec/data-model/message.dtd +++ b/spec/data-model/message.dtd @@ -1,5 +1,5 @@ @@ -10,7 +10,13 @@ name NMTOKEN #REQUIRED > - + + + + @@ -18,8 +24,8 @@ @@ -27,13 +33,15 @@ - - + + - + + + diff --git a/spec/data-model/message.json b/spec/data-model/message.json index b669af462..77fc3a4f4 100644 --- a/spec/data-model/message.json +++ b/spec/data-model/message.json @@ -32,11 +32,11 @@ "attributes": { "type": "object", "additionalProperties": { - "oneOf": [{ "$ref": "#/$defs/literal" }, { "const": true }] + "oneOf": [{ "$ref": "#/$defs/literal-or-variable" }, { "const": true }] } }, - "function": { + "function-annotation": { "type": "object", "properties": { "type": { "const": "function" }, @@ -45,17 +45,65 @@ }, "required": ["type", "name"] }, - "expression": { + "unsupported-annotation": { + "type": "object", + "properties": { + "type": { "const": "unsupported-annotation" }, + "source": { "type": "string" } + }, + "required": ["type", "source"] + }, + "annotation": { + "oneOf": [ + { "$ref": "#/$defs/function-annotation" }, + { "$ref": "#/$defs/unsupported-annotation" } + ] + }, + + "literal-expression": { "type": "object", "properties": { "type": { "const": "expression" }, - "arg": { "$ref": "#/$defs/literal-or-variable" }, - "function": { "$ref": "#/$defs/function" }, + "arg": { "$ref": "#/$defs/literal" }, + "annotation": { "$ref": "#/$defs/annotation" }, "attributes": { "$ref": "#/$defs/attributes" } }, + "required": ["type", "arg"] + }, + "variable-expression": { + "type": "object", + "properties": { + "type": { "const": "expression" }, + "arg": { "$ref": "#/$defs/variable" }, + "annotation": { "$ref": "#/$defs/annotation" }, + "attributes": { "$ref": "#/$defs/attributes" } + }, + "required": ["type", "arg"] + }, + 
"function-expression": { + "type": "object", + "properties": { + "type": { "const": "expression" }, + "annotation": { "$ref": "#/$defs/function-annotation" }, + "attributes": { "$ref": "#/$defs/attributes" } + }, + "required": ["type", "annotation"] + }, + "unsupported-expression": { + "type": "object", + "properties": { + "type": { "const": "expression" }, + "annotation": { "$ref": "#/$defs/unsupported-annotation" }, + "attributes": { "$ref": "#/$defs/attributes" } + }, + "required": ["type", "annotation"] + }, + "expression": { "oneOf": [ - { "required": ["type", "arg"] }, - { "required": ["type", "function"] } + { "$ref": "#/$defs/literal-expression" }, + { "$ref": "#/$defs/variable-expression" }, + { "$ref": "#/$defs/function-expression" }, + { "$ref": "#/$defs/unsupported-expression" } ] }, @@ -100,12 +148,26 @@ }, "required": ["type", "name", "value"] }, + "unsupported-statement": { + "type": "object", + "properties": { + "type": { "const": "unsupported-statement" }, + "keyword": { "type": "string" }, + "body": { "type": "string" }, + "expressions": { + "type": "array", + "items": { "$ref": "#/$defs/expression" } + } + }, + "required": ["type", "keyword", "expressions"] + }, "declarations": { "type": "array", "items": { "oneOf": [ { "$ref": "#/$defs/input-declaration" }, - { "$ref": "#/$defs/local-declaration" } + { "$ref": "#/$defs/local-declaration" }, + { "$ref": "#/$defs/unsupported-statement" } ] } }, @@ -139,7 +201,7 @@ "declarations": { "$ref": "#/$defs/declarations" }, "selectors": { "type": "array", - "items": { "$ref": "#/$defs/variable" } + "items": { "$ref": "#/$defs/expression" } }, "variants": { "type": "array", diff --git a/spec/errors.md b/spec/errors.md index 5782622b2..7a6375ee9 100644 --- a/spec/errors.md +++ b/spec/errors.md @@ -24,36 +24,22 @@ or _Message Function Errors_ in _expressions_ that are not otherwise used by the such as _placeholders_ in unselected _patterns_ or _declarations_ that are never referenced during _formatting_. 
-When formatting a _message_ with one or more errors, -an implementation MUST provide a mechanism to discover and identify -at least one of the errors. -The exact form of error signaling is implementation defined. -Some examples include throwing an exception, -returning an error code, -or providing a function or method for enumerating any errors. - -For all _valid_ _messages_, -an implementation MUST enable a user to get a formatted result. -The formatted result might include _fallback values_ -such as when a _placeholder_'s _expression_ produced an error -during formatting. - -The two above requirements MAY be fulfilled by a single formatting method, -or separately by more than one such method. +In all cases, when encountering a runtime error, +a message formatter MUST provide some representation of the message. +An informative error or errors MUST also be separately provided. When a message contains more than one error, or contains some error which leads to further errors, an implementation which does not emit all of the errors SHOULD prioritise _Syntax Errors_ and _Data Model Errors_ over others. -When an error occurs while resolving a _selector_ -or calling MatchSelectorKeys with its resolved value, +When an error occurs within a _selector_, the _selector_ MUST NOT match any _variant_ _key_ other than the catch-all `*` -and a _Bad Selector_ error MUST be emitted. +and a _Resolution Error_ or a _Message Function Error_ MUST be emitted. ## Syntax Errors -**_Syntax Errors_** occur when the syntax representation of a message is not _well-formed_. +**_Syntax Errors_** occur when the syntax representation of a message is not well-formed. > Example invalid messages resulting in a _Syntax Error_: > @@ -75,7 +61,7 @@ and a _Bad Selector_ error MUST be emitted. ## Data Model Errors -**_Data Model Errors_** occur when a message is not _valid_ due to +**_Data Model Errors_** occur when a message is invalid due to violating one of the semantic requirements on its structure. 
### Variant Key Mismatch @@ -86,16 +72,13 @@ does not equal the number of _selectors_. > Example invalid messages resulting in a _Variant Key Mismatch_ error: > > ``` -> .input {$one :func} -> .match $one +> .match {$one :func} > 1 2 {{Too many}} > * {{Otherwise}} > ``` > > ``` -> .input {$one :func} -> .input {$two :func} -> .match $one $two +> .match {$one :func} {$two :func} > 1 2 {{Two keys}} > * {{Missing a key}} > * * {{Otherwise}} @@ -109,16 +92,13 @@ does not include a _variant_ with only catch-all keys. > Example invalid messages resulting in a _Missing Fallback Variant_ error: > > ``` -> .input {$one :func} -> .match $one +> .match {$one :func} > 1 {{Value is one}} > 2 {{Value is two}} > ``` > > ``` -> .input {$one :func} -> .input {$two :func} -> .match $one $two +> .match {$one :func} {$two :func} > 1 * {{First is one}} > * 1 {{Second is one}} > ``` @@ -126,27 +106,27 @@ does not include a _variant_ with only catch-all keys. ### Missing Selector Annotation A **_Missing Selector Annotation_** error occurs when the _message_ -contains a _selector_ that does not -directly or indirectly reference a _declaration_ with a _function_. +contains a _selector_ that does not have an _annotation_, +or contains a _variable_ that does not directly or indirectly reference a _declaration_ with an _annotation_. > Examples of invalid messages resulting in a _Missing Selector Annotation_ error: > > ``` -> .match $one +> .match {$one} > 1 {{Value is one}} > * {{Value is not one}} > ``` > > ``` > .local $one = {|The one|} -> .match $one +> .match {$one} > 1 {{Value is one}} > * {{Value is not one}} > ``` > > ``` > .input {$one} -> .match $one +> .match {$one} > 1 {{Value is one}} > * {{Value is not one}} > ``` @@ -206,16 +186,13 @@ same list of _keys_ is used for more than one _variant_. 
> Examples of invalid messages resulting in a _Duplicate Variant_ error: > > ``` -> .input {$var :string} -> .match $var +> .match {$var :string} > * {{The first default}} > * {{The second default}} > ``` > > ``` -> .input {$x :string} -> .input {$y :string} -> .match $x $y +> .match {$x :string} {$y :string} > * foo {{The first "foo" variant}} > bar * {{The "bar" variant}} > * |foo| {{The second "foo" variant}} @@ -240,8 +217,7 @@ An **_Unresolved Variable_** error occurs when a variable reference c > ``` > > ``` -> .input {$var :func} -> .match $var +> .match {$var :func} > 1 {{The value is one.}} > * {{The value is not one.}} > ``` @@ -260,33 +236,67 @@ a reference to a function which cannot be resolved. > ``` > > ``` -> .local $horse = {|horse| :func} -> .match $horse +> .match {|horse| :func} > 1 {{The value is one.}} > * {{The value is not one.}} > ``` +### Unsupported Expression + +An **_Unsupported Expression_** error occurs when an expression uses +syntax reserved for future standardization, +or for private implementation use that is not supported by the current implementation. + +> For example, attempting to format this message +> would result in an _Unsupported Expression_ error +> because it includes a _reserved annotation_. +> +> ``` +> The value is {!horse}. +> ``` +> +> Attempting to format this message would result in an _Unsupported Expression_ error +> if done within a context that does not support the `^` private use sigil: +> +> ``` +> .match {|horse| ^private} +> 1 {{The value is one.}} +> * {{The value is not one.}} +> ``` + +### Unsupported Statement + +An **_Unsupported Statement_** error occurs when a message includes a _reserved statement_. 
+ +> For example, attempting to format this message +> would result in an _Unsupported Statement_ error: +> +> ``` +> .some {|horse|} +> {{The message body}} +> ``` + ### Bad Selector A **_Bad Selector_** error occurs when a message includes a _selector_ -with a _resolved value_ which does not support selection. +with a resolved value which does not support selection. > For example, attempting to format this message > would result in a _Bad Selector_ error: > > ``` > .local $day = {|2024-05-01| :date} -> .match $day +> .match {$day} > * {{The due date is {$day}}} > ``` ## Message Function Errors A **_Message Function Error_** is any error that occurs -when calling a _function handler_ +when calling a message function implementation or which depends on validation associated with a specific function. -Implementations SHOULD provide a way for _function handlers_ to emit +Implementations SHOULD provide a way for _functions_ to emit (or cause to be emitted) any of the types of error defined in this section. Implementations MAY also provide implementation-defined _Message Function Error_ types. @@ -300,7 +310,7 @@ Implementations MAY also provide implementation-defined _Message Function Error_ > 3. Uses a `:get` message function which requires its argument to be an object and > an option `field` to be provided with a string value. > -> The exact type of _Message Function Error_ is determined by the _function handler_. +> The exact type of _Message Function Error_ is determined by the message function implementation. > > ``` > Hello, {horse :get field=name}! @@ -338,8 +348,7 @@ for that specific _function_. > ``` > > ``` -> .local $horse = {|horse| :number} -> .match $horse +> .match {|horse| :number} > 1 {{The value is one.}} > * {{The value is not one.}} > ``` @@ -376,8 +385,7 @@ does not match the expected implementation-defined format. 
> which is a requirement of the `:number` function: > > ``` -> .local $answer = {42 :number} -> .match $answer +> .match {42 :number} > 1 {{The value is one.}} > horse {{The value is a horse.}} > * {{The value is not one.}} diff --git a/spec/formatting.md b/spec/formatting.md index f1a12cae0..dc3719b10 100644 --- a/spec/formatting.md +++ b/spec/formatting.md @@ -7,26 +7,16 @@ when formatting a message for display in a user interface, or for some later pro To start, we presume that a _message_ has either been parsed from its syntax or created from a data model description. -If the resulting _message_ is not _well-formed_, a _Syntax Error_ is emitted. -If the resulting _message_ is _well-formed_ but is not _valid_, a _Data Model Error_ is emitted. +If this construction has encountered any _Syntax Errors_ or _Data Model Errors_, +an appropriate error MUST be emitted and a _fallback value_ MAY be used as the formatting result. -The formatting of a _message_ is defined by the following operations: - -- **_Pattern Selection_** determines which of a message's _patterns_ is formatted. - For a message with no _selectors_, this is simple as there is only one _pattern_. - With _selectors_, this will depend on their resolution. - -- **_Formatting_** takes the _resolved values_ of - the _text_ and _placeholder_ parts of the selected _pattern_, - and produces the formatted result for the _message_. - Depending on the implementation, this result could be a single concatenated string, - an array of objects, an attributed string, or some other locally appropriate data type. +Formatting of a _message_ is defined by the following operations: - **_Expression and Markup Resolution_** determines the value of an _expression_ or _markup_, with reference to the current _formatting context_. This can include multiple steps, such as looking up the value of a variable and calling formatting functions. 
- The form of the _resolved value_ is implementation defined and the + The form of the resolved value is implementation defined and the value might not be evaluated or formatted yet. However, it needs to be "formattable", i.e. it contains everything required by the eventual formatting. @@ -34,15 +24,6 @@ The formatting of a _message_ is defined by the following operations: The resolution of _text_ is rather straightforward, and is detailed under _literal resolution_. -Implementations are not required to expose -the _expression resolution_ and _pattern selection_ operations to their users, -or even use them in their internal processing, -as long as the final _formatting_ result is made available to users -and the observable behavior of the _formatting_ matches that described here. - -_Attributes_ MUST NOT have any effect on the formatted output of a _message_, -nor be made available to _function handlers_. - > [!IMPORTANT] > > **This specification does not require either eager or lazy _expression resolution_ of _message_ @@ -54,9 +35,28 @@ nor be made available to _function handlers_. > value of a given _expression_ until it is actually used by a > selection or formatting process. > However, when an _expression_ is resolved, it MUST behave as if all preceding -> _declarations_ affecting _variables_ referenced by that _expression_ +> _declarations_ and _selectors_ affecting _variables_ referenced by that _expression_ > have already been evaluated in the order in which the relevant _declarations_ -> appear in the _message_. +> and _selectors_ appear in the _message_. + +- **_Pattern Selection_** determines which of a message's _patterns_ is formatted. + For a message with no _selectors_, this is simple as there is only one _pattern_. + With _selectors_, this will depend on their resolution. + + At the start of _pattern selection_, + if the _message_ contains any _reserved statements_, + emit an _Unsupported Statement_ error. 
+ +- **_Formatting_** takes the resolved values of the selected _pattern_, + and produces the formatted result for the _message_. + Depending on the implementation, this result could be a single concatenated string, + an array of objects, an attributed string, or some other locally appropriate data type. + +Formatter implementations are not required to expose +the _expression resolution_ and _pattern selection_ operations to their users, +or even use them in their internal processing, +as long as the final _formatting_ result is made available to users +and the observable behavior of the formatter matches that described here. ## Formatting Context @@ -78,92 +78,61 @@ At a minimum, it includes: This is often determined by a user-provided argument of a formatting function call. - The _function registry_, - providing the _function handlers_ of the functions referred to by message _functions_. + providing the implementations of the functions referred to by message _functions_. -- Optionally, a fallback string to use for the message if it is not _valid_. +- Optionally, a fallback string to use for the message + if it contains any _Syntax Errors_ or _Data Model Errors_. Implementations MAY include additional fields in their _formatting context_. -## Resolved Values - -A **_resolved value_** is the result of resolving a _text_, _literal_, _variable_, _expression_, or _markup_. -The _resolved value_ is determined using the _formatting context_. -The form of the _resolved value_ is implementation-defined. +## Expression and Markup Resolution -In a _declaration_, the _resolved value_ of an _expression_ is bound to a _variable_, -which makes it available for use in later _expressions_ and _markup_ _options_. +_Expressions_ are used in _declarations_, _selectors_, and _patterns_. +_Markup_ is only used in _patterns_. 
-> For example, in -> ``` -> .input {$a :number minimumFractionDigits=3} -> .local $b = {$a :integer notation=compact} -> .match $a -> 0 {{The value is zero.}} -> * {{In compact form, the value {$a} is rendered as {$b}.}} -> ``` -> the _resolved value_ bound to `$a` is used as the _operand_ -> of the `:integer` _function_ when resolving the value of the _variable_ `$b`, -> as a _selector_ in the `.match` statement, -> as well as for formatting the _placeholder_ `{$a}`. +In a _declaration_, the resolved value of the _expression_ is bound to a _variable_, +which is available for use by later _expressions_. +Since a _variable_ can be referenced in different ways later, +implementations SHOULD NOT immediately fully format the value for output. In an _input-declaration_, the _variable_ operand of the _variable-expression_ identifies not only the name of the external input value, -but also the _variable_ to which the _resolved value_ of the _variable-expression_ is bound. +but also the _variable_ to which the resolved value of the _variable-expression_ is bound. -In a _pattern_, the _resolved value_ of an _expression_ or _markup_ is used in its _formatting_. +In _selectors_, the resolved value of an _expression_ is used for _pattern selection_. -The form that _resolved values_ take is implementation-dependent, +In a _pattern_, the resolved value of an _expression_ or _markup_ is used in its _formatting_. + +The form that resolved values take is implementation-dependent, and different implementations MAY choose to perform different levels of resolution. 
-> While this specification does not require it, -> a _resolved value_ could be implemented by requiring each _function handler_ to -> return a value matching the following interface: +> For example, the resolved value of the _expression_ `{|0.40| :number style=percent}` +> could be an object such as > -> ```ts -> interface MessageValue { -> formatToString(): string -> formatToX(): X // where X is an implementation-defined type -> getValue(): unknown -> resolvedOptions(): { [key: string]: MessageValue } -> selectKeys(keys: string[]): string[] -> } +> ``` +> { value: Number('0.40'), +> formatter: NumberFormat(locale, { style: 'percent' }) } > ``` > -> With this approach: -> - An _expression_ could be used as a _placeholder_ if -> calling the `formatToString()` or `formatToX()` method of its _resolved value_ -> did not emit an error. -> - A _variable_ could be used as a _selector_ if -> calling the `selectKeys(keys)` method of its _resolved value_ -> did not emit an error. -> - Using a _variable_, the _resolved value_ of an _expression_ -> could be used as an _operand_ or _option_ value if -> calling the `getValue()` method of its _resolved value_ did not emit an error. -> In this use case, the `resolvedOptions()` method could also -> provide a set of option values that could be taken into account by the called function. -> -> Extensions of the base `MessageValue` interface could be provided for different data types, -> such as numbers or strings, -> for which the `unknown` return type of `getValue()` and -> the generic `MessageValue` type used in `resolvedOptions()` -> could be narrowed appropriately. -> An implementation could also allow `MessageValue` values to be passed in as input variables, -> or automatically wrap each variable as a `MessageValue` to provide a uniform interface -> for custom functions. +> Alternatively, it could be an instance of an ICU4J `FormattedNumber`, +> or some other locally appropriate value. 
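The object form of a resolved value described in the example above can be sketched in TypeScript. This is only an illustration under stated assumptions: the `ResolvedNumber` interface and the `resolveNumber` and `formatResolved` helpers are hypothetical names, not part of the specification, and `Intl.NumberFormat` stands in for whatever formatter an implementation actually uses. The point it shows is that the value and its formatter are kept together, and formatting is deferred until the value is used.

```ts
// Hypothetical shape of a resolved value. The specification leaves the
// actual form implementation-defined; this is one possible choice.
interface ResolvedNumber {
  value: number;
  formatter: Intl.NumberFormat;
}

// Resolve an expression such as `{|0.40| :number style=percent}`
// without formatting it yet (assumed helper, not a spec-defined API).
function resolveNumber(
  source: string,
  locale: string,
  options: Intl.NumberFormatOptions
): ResolvedNumber {
  return {
    value: Number(source),
    formatter: new Intl.NumberFormat(locale, options),
  };
}

// Formatting happens only when the resolved value is used in a pattern.
function formatResolved(rv: ResolvedNumber): string {
  return rv.formatter.format(rv.value);
}

const resolved = resolveNumber("0.40", "en", { style: "percent" });
```

Because the resolved value still carries the numeric `value`, it could also serve as an operand of a later function or as a selector input, which is why the text advises against formatting it fully at resolution time.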
-## Expression and Markup Resolution +Depending on the presence or absence of a _variable_ or _literal_ operand +and a _function_, _private-use annotation_, or _reserved annotation_, +the resolved value of the _expression_ is determined as follows: -_Expressions_ are used in _declarations_ and _patterns_. -_Markup_ is only used in _patterns_. +If the _expression_ contains a _reserved annotation_, +an _Unsupported Expression_ error is emitted and +a _fallback value_ is used as the resolved value of the _expression_. -Depending on the presence or absence of a _variable_ or _literal_ operand and a _function_, -the _resolved value_ of the _expression_ is determined as follows: +Else, if the _expression_ contains a _private-use annotation_, +its resolved value is defined according to the implementation's specification. -If the _expression_ contains a _function_, -its _resolved value_ is defined by _function resolution_. +Else, if the _expression_ contains an _annotation_, +its resolved value is defined by _function resolution_. Else, if the _expression_ consists of a _variable_, -its _resolved value_ is defined by _variable resolution_. +its resolved value is defined by _variable resolution_. An implementation MAY perform additional processing when resolving the value of an _expression_ that consists only of a _variable_. @@ -182,13 +151,13 @@ that consists only of a _variable_. > the pattern included the function `:datetime` with some set of default options. Else, the _expression_ consists of a _literal_. -Its _resolved value_ is defined by _literal resolution_. +Its resolved value is defined by _literal resolution_. -> [!NOTE] -> This means that a _literal_ value with no _function_ +> **Note** +> This means that a _literal_ value with no _annotation_ > is always treated as a string. 
> To represent values that are not strings as a _literal_, -> a _function_ needs to be provided: +> an _annotation_ needs to be provided: > > ``` > .local $aNumber = {1234 :number} @@ -199,58 +168,43 @@ Its _resolved value_ is defined by _literal resolution_. ### Literal Resolution -The _resolved value_ of a _text_ or a _literal_ contains +The resolved value of a _text_ or a _literal_ is the character sequence of the _text_ or _literal_ after any character escape has been converted to the escaped character. When a _literal_ is used as an _operand_ or on the right-hand side of an _option_, -the formatting function MUST treat its _resolved value_ the same +the formatting function MUST treat its resolved value the same whether its value was originally a _quoted literal_ or an _unquoted literal_. > For example, > the _option_ `foo=42` and the _option_ `foo=|42|` are treated as identical. - -> For example, in a JavaScript formatter -> the _resolved value_ of a _text_ or a _literal_ could have the following implementation: -> -> ```ts -> class MessageLiteral implements MessageValue { -> constructor(value: string) { -> this.formatToString = () => value; -> this.getValue = () => value; -> } -> resolvedOptions: () => ({}); -> selectKeys(_keys: string[]) { -> throw Error("Selection on unannotated literals is not supported"); -> } -> } -> ``` +The resolution of a _text_ or _literal_ MUST resolve to a string. ### Variable Resolution To resolve the value of a _variable_, its _name_ is used to identify either a local variable or an input variable. -If a _declaration_ exists for the _variable_, its _resolved value_ is used. +If a _declaration_ exists for the _variable_, its resolved value is used. Otherwise, the _variable_ is an implicit reference to an input value, and its value is looked up from the _formatting context_ _input mapping_. -The resolution of a _variable_ fails if no value is identified for its _name_. -If this happens, an _Unresolved Variable_ error is emitted. 
+The resolution of a _variable_ MAY fail if no value is identified for its _name_. +If this happens, an _Unresolved Variable_ error MUST be emitted. If a _variable_ would resolve to a _fallback value_, this MUST also be considered a failure. ### Function Resolution -To resolve an _expression_ with a _function_, +To resolve an _expression_ with a _function_ _annotation_, the following steps are taken: 1. If the _expression_ includes an _operand_, resolve its value. If this fails, use a _fallback value_ for the _expression_. 2. Resolve the _identifier_ of the _function_ and, based on the starting sigil, - find the appropriate _function handler_ to call. - If the implementation cannot find the _function handler_, + find the appropriate function implementation to call. + If the implementation cannot find the function, or if the _identifier_ includes a _namespace_ that the implementation does not support, emit an _Unknown Function_ error and use a _fallback value_ for the _expression_. @@ -260,76 +214,108 @@ the following steps are taken: 3. Perform _option resolution_. -4. Determine the _function context_ for calling the _function handler_. - - The **_function context_** contains the context necessary for - the _function handler_ to resolve the _expression_. This includes: - - - The current _locale_, - potentially including a fallback chain of locales. - - The base directionality of the _message_ and its _text_ tokens. - - If the resolved mapping of _options_ includes any _`u:` options_ - supported by the implementation, process them as specified. - Such `u:` options MAY be removed from the resolved mapping of _options_. +4. Call the function implementation with the following arguments: -5. Call the function implementation with the following arguments: - - - The _function context_. + - The current _locale_. - The resolved mapping of _options_. - - If the _expression_ includes an _operand_, its _resolved value_. 
+ - If the _expression_ includes an _operand_, its resolved value. The form that resolved _operand_ and _option_ values take is implementation-defined. - An implementation MAY pass additional arguments to the _function handler_, + A _declaration_ binds the resolved value of an _expression_ + to a _variable_. + Thus, the result of one _function_ is potentially the _operand_ + of another _function_, + or the value of one of the _options_ for another function. + For example, in + ``` + .input {$n :number minimumIntegerDigits=3} + .local $n1 = {$n :number maximumFractionDigits=3} + ``` + the value bound to `$n` is the + resolved value used as the _operand_ + of the `:number` _function_ + when resolving the value of the _variable_ `$n1`. + + Implementations that provide a means for defining custom functions + SHOULD provide a means for function implementations + to return values that contain enough information + (e.g. a representation of + the resolved _operand_ and _option_ values + that the function was called with) + to be used as arguments to subsequent calls + to the function implementations. + For example, an implementation might define an interface that allows custom function implementation. + Such an interface SHOULD define an implementation-specific + argument type `T` and return type `U` + for implementations of functions + such that `U` can be coerced to `T`. + Implementations of a _function_ SHOULD emit a + _Bad Operand_ error for _operands_ whose resolved value + or type is not supported. + +> [!NOTE] +> The behavior of the previous example is +> currently implementation-dependent. Supposing that +> the external input variable `n` is bound to the string `"1"`, +> and that the implementation formats to a string, +> the formatted result of the following message: +> +> ``` +> .input {$n :number minimumIntegerDigits=3} +> .local $n1 = {$n :number maximumFractionDigits=3} +> {{$n1}} +> ``` +> +> is currently implementation-dependent. 
+> Depending on whether the options are preserved +> between the resolution of the first `:number` _annotation_ +> and the resolution of the second `:number` _annotation_, +> a conformant implementation +> could produce either "001.000" or "1.000" +> +> Each function **specification** MAY have +> its own rules to preserve some options in the returned structure +> and discard others. +> In instances where a function specification does not determine whether an option is preserved or discarded, +> each function **implementation** of that specification MAY have +> its own rules to preserve some options in the returned structure +> and discard others. +> + +> [!NOTE] +> During the Technical Preview, +> feedback on how the registry describes +> the flow of _resolved values_ and _options_ +> from one _function_ to another, +> and on what requirements this specification should impose, +> is highly desired. + + An implementation MAY pass additional arguments to the function, as long as reasonable precautions are taken to keep the function interface simple and minimal, and avoid introducing potential security vulnerabilities. -6. If the call succeeds, + An implementation MAY define its own functions. + An implementation MAY allow custom functions to be defined by users. + + Function access to the _formatting context_ MUST be minimal and read-only, + and execution time SHOULD be limited. + + Implementation-defined _functions_ SHOULD use an implementation-defined _namespace_. + +5. If the call succeeds, resolve the value of the _expression_ as the result of that function call. If the call fails or does not return a valid value, emit the appropriate _Message Function Error_ for the failure. - Implementations MAY provide a mechanism for the _function handler_ to provide + Implementations MAY provide a mechanism for the _function_ to provide additional detail about internal failures. 
Specifically, if the cause of the failure was that the datatype, value, or format of the _operand_ did not match that expected by the _function_, - the _function_ SHOULD cause a _Bad Operand_ error to be emitted. + the _function_ might cause a _Bad Operand_ error to be emitted. - In all failure cases, use the _fallback value_ for the _expression_ as its _resolved value_. - -#### Function Handler - -A **_function handler_** is an implementation-defined process -such as a function or method -which accepts a set of arguments and returns a _resolved value_. -A _function handler_ is required to resolve a _function_. - -An implementation MAY define its own functions and their handlers. -An implementation MAY allow custom functions to be defined by users. - -Implementations that provide a means for defining custom functions -MUST provide a means for _function handlers_ -to return _resolved values_ that contain enough information -to be used as _operands_ or _option_ values in subsequent _expressions_. - -The _resolved value_ returned by a _function handler_ -MAY be different from the value of the _operand_ of the _function_. -It MAY be an implementation specified type. -It is not required to be the same type as the _operand_. - -A _function handler_ MAY include resolved options in its _resolved value_. -The resolved options MAY be different from the _options_ of the function. - -A _function handler_ SHOULD emit a -_Bad Operand_ error for _operands_ whose _resolved value_ -or type is not supported. - -_Function handler_ access to the _formatting context_ MUST be minimal and read-only, -and execution time SHOULD be limited. - -Implementation-defined _functions_ SHOULD use an implementation-defined _namespace_. + In all failure cases, use the _fallback value_ for the _expression_ as the resolved value. #### Option Resolution @@ -339,7 +325,7 @@ For each _option_: - Resolve the _identifier_ of the _option_. 
- If the _option_'s right-hand side successfully resolves to a value, - bind the _identifier_ of the _option_ to the _resolved value_ in the mapping. + bind the _identifier_ of the _option_ to the resolved value in the mapping. - Otherwise, bind the _identifier_ of the _option_ to an unresolved value in the mapping. Implementations MAY later remove this value before calling the _function_. (Note that an _Unresolved Variable_ error will have been emitted.) @@ -352,27 +338,26 @@ This mapping can be empty. Unlike _functions_, the resolution of _markup_ is not customizable. -The _resolved value_ of _markup_ includes the following fields: +The resolved value of _markup_ includes the following fields: - The type of the markup: open, standalone, or close - The _identifier_ of the _markup_ - The resolved _options_ values after _option resolution_. -If the resolved mapping of _options_ includes any _`u:` options_ -supported by the implementation, process them as specified. -Such `u:` options MAY be removed from the resolved mapping of _options_. - The resolution of _markup_ MUST always succeed. ### Fallback Resolution -A **_fallback value_** is the _resolved value_ for an _expression_ that fails to resolve. +A **_fallback value_** is the resolved value for an _expression_ that fails to resolve. An _expression_ fails to resolve when: -- A _variable_ used as an _operand_ (with or without a _function_) fails to resolve. +- A _variable_ used as an _operand_ (with or without an _annotation_) fails to resolve. * Note that this does not include a _variable_ used as an _option_ value. -- A _function_ fails to resolve. +- A _function_ _annotation_ fails to resolve. +- A _private-use annotation_ is unsupported by the implementation or if + a _private-use annotation_ fails to resolve. +- The _expression_ has a _reserved annotation_. 
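The failure conditions listed above can be summarised as a small predicate. This is a minimal sketch, not spec text: the `ResolutionOutcome` record and the `needsFallback` function are hypothetical, and the sketch assumes an implementation has already recorded whether the operand and the annotation resolved. A failed _variable_ used only as an _option_ value is deliberately not modelled, since it does not trigger a _fallback value_.

```ts
// Hypothetical summary of how one expression's resolution went.
interface ResolutionOutcome {
  operandIsVariable: boolean; // the operand, if any, is a variable
  operandResolved: boolean; // the operand resolved without failure
  annotation?: "function" | "private-use" | "reserved";
  annotationResolved: boolean; // the function or private-use annotation resolved
}

// Returns true when the expression must resolve to a fallback value.
function needsFallback(o: ResolutionOutcome): boolean {
  // A variable used as an operand failed to resolve.
  if (o.operandIsVariable && !o.operandResolved) return true;
  // A reserved annotation always produces a fallback value.
  if (o.annotation === "reserved") return true;
  // A function or private-use annotation is unsupported or failed.
  if (o.annotation === "function" || o.annotation === "private-use") {
    return !o.annotationResolved;
  }
  // Unannotated expression whose operand resolved.
  return false;
}
```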
The _fallback value_ depends on the contents of the _expression_: @@ -386,8 +371,9 @@ The _fallback value_ depends on the contents of the _expression_: > In a context where `:func` fails to resolve, > `{42 :func}` resolves to the _fallback value_ `|42|` and > `{|C:\\| :func}` resolves to the _fallback value_ `|C:\\|`. + > In any context, `{|| @reserved}` resolves to the _fallback value_ `||`. -- _expression_ with _variable_ _operand_ referring to a local _declaration_ (with or without a _function_): +- _expression_ with _variable_ _operand_ referring to a local _declaration_ (with or without an _annotation_): the _value_ to which it resolves (which may already be a _fallback value_) > Examples: @@ -404,12 +390,12 @@ The _fallback value_ depends on the contents of the _expression_: > (transitively) resolves to the _fallback value_ `:now` and > the message formats to `{:now}`. -- _expression_ with _variable_ _operand_ not referring to a local _declaration_ (with or without a _function_): +- _expression_ with _variable_ _operand_ not referring to a local _declaration_ (with or without an _annotation_): U+0024 DOLLAR SIGN `$` followed by the _name_ of the _variable_ > Examples: - > In a context where `$var` fails to resolve, `{$var}` and `{$var :number}` - > both resolve to the _fallback value_ `$var`. + > In a context where `$var` fails to resolve, `{$var}` and `{$var :number}` and `{$var @reserved}` + > all resolve to the _fallback value_ `$var`. > In a context where `:func` fails to resolve, > the _pattern_'s _expression_ in `.input $arg {{{$arg :func}}}` > resolves to the _fallback value_ `$arg` and @@ -422,6 +408,21 @@ The _fallback value_ depends on the contents of the _expression_: > In a context where `:func` fails to resolve, `{:func}` resolves to the _fallback value_ `:func`. > In a context where `:ns:func` fails to resolve, `{:ns:func}` resolves to the _fallback value_ `:ns:func`. 
+- unsupported _private-use annotation_ or _reserved annotation_ with no _operand_: + the _annotation_ starting sigil + + > Examples: + > In any context, `{@reserved}` and `{@reserved |...|}` both resolve to the _fallback value_ `@`. + +- supported _private-use annotation_ with no _operand_: + the _annotation_ starting sigil, optionally followed by implementation-defined details + conforming with patterns in the other cases (such as quoting literals). + If details are provided, they SHOULD NOT leak potentially private information. + + > Examples: + > In a context where `^` expressions are used for comments, `{^▽^}` might resolve to the _fallback value_ `^`. + > In a context where `&` expressions are _function_-like macro invocations, `{&foo |...|}` might resolve to the _fallback value_ `&foo`. + - Otherwise: the U+FFFD REPLACEMENT CHARACTER `�` This is not currently used by any expression, but may apply in future revisions. @@ -430,33 +431,8 @@ _Option_ _identifiers_ and values are not included in the _fallback value_. _Pattern selection_ is not supported for _fallback values_. -> For example, in a JavaScript formatter -> the _fallback value_ could have the following implementation, -> where `source` is one of the above-defined strings: -> -> ```ts -> class MessageFallback implements MessageValue { -> constructor(source: string) { -> this.formatToString = () => `{${source}}`; -> this.getValue = () => undefined; -> } -> resolvedOptions: () => ({}); -> selectKeys(_keys: string[]) { -> throw Error("Selection on fallback values is not supported"); -> } -> } -> ``` - ## Pattern Selection -If the _message_ being formatted is not _well-formed_ and _valid_, -the result of pattern selection is a _pattern_ consisting of a single _fallback value_ -using the _message_'s fallback string defined in the _formatting context_ -or if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`. 
- -If the _message_ being formatted does not contain a _matcher_, -the result of pattern selection is its _pattern_ value. - When a _message_ contains a _matcher_ with one or more _selectors_, the implementation needs to determine which _variant_ will be used to provide the _pattern_ for the formatting operation. @@ -474,8 +450,7 @@ according to their _key_ values and selecting the first one. > > For example, in the `pl` (Polish) locale, this _message_ cannot reach > > the `*` _variant_: > > ``` -> > .input {$num :integer} -> > .match $num +> > .match {$num :integer} > > 0 {{ }} > > one {{ }} > > few {{ }} @@ -495,16 +470,13 @@ Each _key_ corresponds to a _selector_ by its position in the _variant_. > For example, in this message: > > ``` -> .input {$one :number} -> .input {$two :number} -> .input {$three :number} -> .match $one $two $three +> .match {:one} {:two} {:three} > 1 2 3 {{ ... }} > ``` > -> The first _key_ `1` corresponds to the first _selector_ (`$one`), -> the second _key_ `2` to the second _selector_ (`$two`), -> and the third _key_ `3` to the third _selector_ (`$three`). +> The first _key_ `1` corresponds to the first _selector_ (`{:one}`), +> the second _key_ `2` to the second _selector_ (`{:two}`), +> and the third _key_ `3` to the third _selector_ (`{:three}`). To determine which _variant_ best matches a given set of inputs, each _selector_ is used in turn to order and filter the list of _variants_. @@ -517,25 +489,39 @@ Earlier _selectors_ in the _matcher_'s list of _selectors_ have a higher priorit When all of the _selectors_ have been processed, the earliest-sorted _variant_ in the remaining list of _variants_ is selected. +> [!NOTE] +> A _selector_ is not a _declaration_. +> Even when the same _function_ can be used for both formatting and selection +> of a given _operand_ +> the _annotation_ that appears in a _selector_ has no effect on subsequent +> _selectors_ nor on the formatting used in _placeholders_. 
+> To use the same value for selection and formatting, +> set its value with a `.input` or `.local` _declaration_. + This selection method is defined in more detail below. An implementation MAY use any pattern selection method, as long as its observable behavior matches the results of the method defined here. +If the message being formatted has any _Syntax Errors_ or _Data Model Errors_, +the result of pattern selection MUST be a pattern resolving to a single _fallback value_ +using the message's fallback string defined in the _formatting context_ +or if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`. + ### Resolve Selectors First, resolve the values of each _selector_: -1. Let `res` be a new empty list of _resolved values_ that support selection. +1. Let `res` be a new empty list of resolved values that support selection. 1. For each _selector_ `sel`, in source order, - 1. Let `rv` be the _resolved value_ of `sel`. + 1. Let `rv` be the resolved value of `sel`. 1. If selection is supported for `rv`: 1. Append `rv` as the last element of the list `res`. 1. Else: - 1. Let `nomatch` be a _resolved value_ for which selection always fails. + 1. Let `nomatch` be a resolved value for which selection always fails. 1. Append `nomatch` as the last element of the list `res`. 1. Emit a _Bad Selector_ error. -The form of the _resolved values_ is determined by each implementation, +The form of the resolved values is determined by each implementation, along with the manner of determining their support for selection. ### Resolve Preferences @@ -549,9 +535,9 @@ Next, using `res`, resolve the preferential order for all message keys: 1. Let `key` be the `var` key at position `i`. 1. If `key` is not the catch-all key `'*'`: 1. Assert that `key` is a _literal_. - 1. Let `ks` be the _resolved value_ of `key` in Unicode Normalization Form C. + 1. Let `ks` be the resolved value of `key`. 1. Append `ks` as the last element of the list `keys`. - 1. 
Let `rv` be the _resolved value_ at index `i` of `res`. + 1. Let `rv` be the resolved value at index `i` of `res`. 1. Let `matches` be the result of calling the method MatchSelectorKeys(`rv`, `keys`) 1. Append `matches` as the last element of the list `pref`. @@ -563,9 +549,6 @@ The returned list MAY be empty. The most-preferred key is first, with each successive key appearing in order by decreasing preference. -The resolved value of each _key_ MUST be in Unicode Normalization Form C ("NFC"), -even if the _literal_ for the _key_ is not. - If calling MatchSelectorKeys encounters any error, a _Bad Selector_ error is emitted and an empty list is returned. @@ -582,7 +565,7 @@ filter the list of _variants_ to the ones that match with some preference: 1. If `key` is the catch-all key `'*'`: 1. Continue the inner loop on `pref`. 1. Assert that `key` is a _literal_. - 1. Let `ks` be the _resolved value_ of `key`. + 1. Let `ks` be the resolved value of `key`. 1. Let `matches` be the list of strings at index `i` of `pref`. 1. If `matches` includes `ks`: 1. Continue the inner loop on `pref`. @@ -608,7 +591,7 @@ Finally, sort the list of variants `vars` and select the _pattern_: 1. Let `key` be the `tuple` _variant_ key at position `i`. 1. If `key` is not the catch-all key `'*'`: 1. Assert that `key` is a _literal_. - 1. Let `ks` be the _resolved value_ of `key`. + 1. Let `ks` be the resolved value of `key`. 1. Let `matchpref` be the integer position of `ks` in `matches`. 1. Set the `tuple` integer value as `matchpref`. 1. Set `sortable` to be the result of calling the method `SortVariants(sortable)`. 
@@ -635,7 +618,7 @@ _This section is non-normative._ #### Example 1 -Presuming a minimal implementation which only supports `:string` _function_ +Presuming a minimal implementation which only supports `:string` annotation which matches keys by using string comparison, and a formatting context in which the variable reference `$foo` resolves to the string `'foo'` and @@ -643,9 +626,7 @@ the variable reference `$bar` resolves to the string `'bar'`, pattern selection proceeds as follows for this message: ``` -.input {$foo :string} -.input {$bar :string} -.match $foo $bar +.match {$foo :string} {$bar :string} bar bar {{All bar}} foo foo {{All foo}} * * {{Otherwise}} @@ -676,9 +657,7 @@ Alternatively, with the same implementation and formatting context as in Example pattern selection would proceed as follows for this message: ``` -.input {$foo :string} -.input {$bar :string} -.match $foo $bar +.match {$foo :string} {$bar :string} * bar {{Any and bar}} foo * {{Foo and any}} foo bar {{Foo and bar}} @@ -727,7 +706,7 @@ the pattern selection proceeds as follows for this message: ``` .input {$count :number} -.match $count +.match {$count} one {{Category match for {$count}}} 1 {{Exact match for {$count}}} * {{Other match for {$count}}} @@ -758,18 +737,19 @@ one {{Category match for {$count}}} After _pattern selection_, each _text_ and _placeholder_ part of the selected _pattern_ is resolved and formatted. -_Resolved values_ cannot always be formatted by a given implementation. +Resolved values cannot always be formatted by a given implementation. When such an error occurs during _formatting_, -an appropriate _Message Function Error_ is emitted and -a _fallback value_ is used for the _placeholder_ with the error. +an implementation SHOULD emit an appropriate _Message Function Error_ and produce a +_fallback value_ for the _placeholder_ that produced the error. +A formatting function MAY substitute a value to use instead of a _fallback value_. 
Implementations MAY represent the result of _formatting_ using the most appropriate data type or structure. Some examples of these include: - A single string concatenated from the parts of the resolved _pattern_. - A string with associated attributes for portions of its text. -- A flat sequence of objects corresponding to each _resolved value_. -- A hierarchical structure of objects that group spans of _resolved values_, +- A flat sequence of objects corresponding to each resolved value. +- A hierarchical structure of objects that group spans of resolved values, such as sequences delimited by _markup-open_ and _markup-close_ _placeholders_. Implementations SHOULD provide _formatting_ result types that match user needs, @@ -782,6 +762,10 @@ MUST be an empty string. Implementations MAY offer functionality for customizing this, such as by emitting XML-ish tags for each _markup_. +_Attributes_ are reserved for future standardization. +Other than checking for valid syntax, they SHOULD NOT +affect the processing or output of a _message_. + ### Examples _This section is non-normative._ @@ -806,9 +790,8 @@ the _fallback value_ as a string, and a U+007D RIGHT CURLY BRACKET `}`. > For example, -> a _message_ that is not _well-formed_ would format to a string as `{�}`, -> unless a fallback string is defined in the _formatting context_, -> in which case that string would be used instead. +> a message with a _Syntax Error_ and no fallback string +> defined in the _formatting context_ would format to a string as `{�}`. ### Handling Bidirectional Text @@ -818,16 +801,7 @@ That is, the text can can consist of a mixture of left-to-right and right-to-lef The display of bidirectional text is defined by the [Unicode Bidirectional Algorithm](http://www.unicode.org/reports/tr9/) [UAX9]. -The directionality of the formatted _message_ as a whole is provided by the _formatting context_. 
- -> [!NOTE] -> Keep in mind the difference between the formatted output of a _message_, -> which is the topic of this section, -> and the syntax of _message_ prior to formatting. -> The processing of a _message_ depends on the logical sequence of Unicode code points, -> not on the presentation of the _message_. -> Affordances to allow users appropriate control over the appearance of the -> _message_'s syntax have been provided. +The directionality of the message as a whole is provided by the _formatting context_. When a _message_ is formatted, _placeholders_ are replaced with their formatted representation. @@ -884,7 +858,7 @@ The _Default Bidi Strategy_ is defined as follows: These correspond to the message having left-to-right directionality, right-to-left directionality, and to the message's directionality not being known. 1. For each _expression_ `exp` in _pattern_: - 1. Let `fmt` be the formatted string representation of the _resolved value_ of `exp`. + 1. Let `fmt` be the formatted string representation of the resolved value of `exp`. 1. Let `dir` be the directionality of `fmt`, one of « `'LTR'`, `'RTL'`, `'unknown'` », with the same meanings as for `msgdir`. 1. 
If `dir` is `'LTR'`: diff --git a/spec/message.abnf b/spec/message.abnf index a9293040c..3377275da 100644 --- a/spec/message.abnf +++ b/spec/message.abnf @@ -1,41 +1,45 @@ message = simple-message / complex-message -simple-message = o [simple-start pattern] +simple-message = [s] [simple-start pattern] simple-start = simple-start-char / escaped-char / placeholder pattern = *(text-char / escaped-char / placeholder) placeholder = expression / markup -complex-message = o *(declaration o) complex-body o -declaration = input-declaration / local-declaration +complex-message = [s] *(declaration [s]) complex-body [s] +declaration = input-declaration / local-declaration / reserved-statement complex-body = quoted-pattern / matcher -input-declaration = input o variable-expression -local-declaration = local s variable o "=" o expression +input-declaration = input [s] variable-expression +local-declaration = local s variable [s] "=" [s] expression -quoted-pattern = o "{{" pattern "}}" +quoted-pattern = "{{" pattern "}}" -matcher = match-statement s variant *(o variant) -match-statement = match 1*(s selector) -selector = variable -variant = key *(s key) quoted-pattern +matcher = match-statement 1*([s] variant) +match-statement = match 1*([s] selector) +selector = expression +variant = key *(s key) [s] quoted-pattern key = literal / "*" ; Expressions -expression = literal-expression - / variable-expression - / function-expression -literal-expression = "{" o literal [s function] *(s attribute) o "}" -variable-expression = "{" o variable [s function] *(s attribute) o "}" -function-expression = "{" o function *(s attribute) o "}" +expression = literal-expression + / variable-expression + / annotation-expression +literal-expression = "{" [s] literal [s annotation] *(s attribute) [s] "}" +variable-expression = "{" [s] variable [s annotation] *(s attribute) [s] "}" +annotation-expression = "{" [s] annotation *(s attribute) [s] "}" -markup = "{" o "#" identifier *(s option) *(s 
attribute) o ["/"] "}" ; open and standalone - / "{" o "/" identifier *(s option) *(s attribute) o "}" ; close +annotation = function + / private-use-annotation + / reserved-annotation + +markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone + / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close ; Expression and literal parts function = ":" identifier *(s option) -option = identifier o "=" o (literal / variable) - -attribute = "@" identifier [o "=" o literal] +option = identifier [s] "=" [s] (literal / variable) +; Attributes are reserved for future standardization +attribute = "@" identifier [[s] "=" [s] (literal / variable)] variable = "$" name @@ -50,15 +54,32 @@ input = %s".input" local = %s".local" match = %s".match" +; Reserve additional .keywords for use by future versions of this specification. +reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression) +; Note that the following production is a simplification, +; as this rule MUST NOT be considered to match existing keywords +; (`.input`, `.local`, and `.match`). +reserved-keyword = "." name + +; Reserve additional sigils for use by future versions of this specification. +reserved-annotation = reserved-annotation-start [[s] reserved-body] +reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" / "~" + +; Reserve sigils for private-use by implementations. 
+private-use-annotation = private-start [[s] reserved-body] +private-start = "^" / "&" +reserved-body = reserved-body-part *([s] reserved-body-part) +reserved-body-part = reserved-char / escaped-char / quoted-literal + ; Names and identifiers ; identifier matches https://www.w3.org/TR/REC-xml-names/#NT-QName -; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD and U+061C +; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD identifier = [namespace ":"] name namespace = name -name = [bidi] name-start *name-char [bidi] +name = name-start *name-char name-start = ALPHA / "_" / %xC0-D6 / %xD8-F6 / %xF8-2FF - / %x370-37D / %x37F-61B / %x61D-1FFF / %x200C-200D + / %x370-37D / %x37F-1FFF / %x200C-200D / %x2070-218F / %x2C00-2FEF / %x3001-D7FF / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF name-char = name-start / DIGIT / "-" / "." @@ -66,8 +87,9 @@ name-char = name-start / DIGIT / "-" / "." ; Restrictions on characters in various contexts simple-start-char = content-char / "@" / "|" -text-char = content-char / ws / "." / "@" / "|" -quoted-char = content-char / ws / "." / "@" / "{" / "}" +text-char = content-char / s / "." / "@" / "|" +quoted-char = content-char / s / "." / "@" / "{" / "}" +reserved-char = content-char / "." 
content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A) / %x0B-0C ; omit CR (%x0D) / %x0E-1F ; omit SP (%x20) @@ -76,21 +98,12 @@ content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A) / %x41-5B ; omit \ (%x5C) / %x5D-7A ; omit { | } (%x7B-7D) / %x7E-2FFF ; omit IDEOGRAPHIC SPACE (%x3000) - / %x3001-10FFFF ; allowing surrogates is intentional + / %x3001-D7FF ; omit surrogates + / %xE000-10FFFF ; Character escapes escaped-char = backslash ( backslash / "{" / "|" / "}" ) backslash = %x5C ; U+005C REVERSE SOLIDUS "\" -; Required whitespace -s = *bidi ws o - -; Optional whitespace -o = *(ws / bidi) - -; Bidirectional marks and isolates -; ALM / LRM / RLM / LRI, RLI, FSI & PDI -bidi = %x061C / %x200E / %x200F / %x2066-2069 - -; Whitespace characters -ws = SP / HTAB / CR / LF / %x3000 +; Whitespace +s = 1*( SP / HTAB / CR / LF / %x3000 ) diff --git a/spec/registry.md b/spec/registry.md index eb8fb6297..918d7baed 100644 --- a/spec/registry.md +++ b/spec/registry.md @@ -1,7 +1,7 @@ # MessageFormat 2.0 Default Function Registry -This section describes the functions for which each implementation MUST provide -a _function handler_ to be conformant with this specification. +This section describes the functions which each implementation MUST provide +to be conformant with this specification. Implementations MAY implement additional _functions_ or additional _options_. In particular, implementations are encouraged to provide feedback on proposed @@ -51,18 +51,27 @@ The function `:string` has no options. #### Selection When implementing [`MatchSelectorKeys(resolvedSelector, keys)`](/spec/formatting.md#resolve-preferences) -where `resolvedSelector` is the _resolved value_ of a _selector_ +where `resolvedSelector` is the resolved value of a _selector_ _expression_ and `keys` is a list of strings, -the `:string` selector function performs as described below. +the `:string` selector performs as described below. -1. 
Let `compare` be the string value of `resolvedSelector` - in Unicode Normalization Form C (NFC) [\[UAX#15\]](https://www.unicode.org/reports/tr15) +1. Let `compare` be the string value of `resolvedSelector`. 1. Let `result` be a new empty list of strings. 1. For each string `key` in `keys`: 1. If `key` and `compare` consist of the same sequence of Unicode code points, then 1. Append `key` as the last element of the list `result`. 1. Return `result`. +> [!NOTE] +> Matching of `key` and `compare` values is sensitive to the sequence of code points +> in each string. +> As a result, variations in how text can be encoded can affect the performance of matching. +> The function `:string` does not perform case folding or Unicode Normalization of string values. +> Users SHOULD encode _messages_ and their parts (such as _keys_ and _operands_), +> in Unicode Normalization Form C (NFC) unless there is a very good reason +> not to. +> See also: [String Matching](https://www.w3.org/TR/charmod-norm) + > [!NOTE] > Unquoted string literals in a _variant_ do not include spaces. > If users wish to match strings that include whitespace @@ -71,28 +80,14 @@ the `:string` selector function performs as described below. > > For example: > ``` -> .input {$string :string} -> .match $string +> .match {$string :string} > | space key | {{Matches the string " space key "}} > * {{Matches the string "space key"}} > ``` #### Formatting -The `:string` function returns the string value of the _resolved value_ of the _operand_. - -> [!NOTE] -> The function `:string` does not perform Unicode Normalization of its formatted output. -> Users SHOULD encode _messages_ and their parts in Unicode Normalization Form C (NFC) -> unless there is a very good reason not to. 
- -#### Composition - -When an _operand_ or an _option_ value uses a _variable_ annotated, -directly or indirectly, by a `:string` _function_, -its _resolved value_ contains the string value of the _operand_ of the annotated _expression_, -together with its resolved locale and directionality. -None of the _options_ set on the _expression_ are part of the _resolved value_. +The `:string` function returns the string value of the resolved value of the _operand_. ## Numeric Value Selection and Formatting @@ -157,20 +152,6 @@ The following options and their values are required to be available on the funct - `maximumSignificantDigits` - ([digit size option](#digit-size-options)) -If the _operand_ of the _expression_ is an implementation-defined type, -such as the _resolved value_ of an _expression_ with a `:number` or `:integer` _annotation_, -it can include option values. -These are included in the resolved option values of the _expression_, -with _options_ on the _expression_ taking priority over any option values of the _operand_. - -> For example, the _placeholder_ in this _message_: -> ``` -> .input {$n :number notation=scientific minimumFractionDigits=2} -> {{{$n :number minimumFractionDigits=1}}} -> ``` -> would be formatted with the resolved options -> `{ notation: 'scientific', minimumFractionDigits: '1' }`. - > [!NOTE] > The following options and option values are being developed during the Technical Preview > period. @@ -214,8 +195,7 @@ but can cause problems in target locales that the original developer is not cons > For example, a naive developer might use a special message for the value `1` without > considering a locale's need for a `one` plural: > ``` -> .input {$var :number} -> .match $var +> .match {$var :number} > 1 {{You have one last chance}} > one {{You have {$var} chance remaining}} > * {{You have {$var} chances remaining}} @@ -239,14 +219,6 @@ MUST be multiplied by 100 for the purposes of formatting. 
The _function_ `:number` performs selection as described in [Number Selection](#number-selection) below. -#### Composition - -When an _operand_ or an _option_ value uses a _variable_ annotated, -directly or indirectly, by a `:number` _annotation_, -its _resolved value_ contains an implementation-defined numerical value -of the _operand_ of the annotated _expression_, -together with the resolved options' values. - ### The `:integer` function The function `:integer` is a selector and formatter for matching or formatting numeric @@ -256,6 +228,7 @@ values as integers. The function `:integer` requires a [Number Operand](#number-operands) as its _operand_. + #### Options Some options do not have default values defined in this specification. @@ -289,25 +262,12 @@ function `:integer`: - `useGrouping` - `auto` (default) - `always` - - `never` - `min2` - `minimumIntegerDigits` - ([digit size option](#digit-size-options), default: `1`) - `maximumSignificantDigits` - ([digit size option](#digit-size-options)) -If the _operand_ of the _expression_ is an implementation-defined type, -such as the _resolved value_ of an _expression_ with a `:number` or `:integer` _annotation_, -it can include option values. -In general, these are included in the resolved option values of the _expression_, -with _options_ on the _expression_ taking priority over any option values of the _operand_. -Option values with the following names are however discarded if included in the _operand_: -- `compactDisplay` -- `notation` -- `minimumFractionDigits` -- `maximumFractionDigits` -- `minimumSignificantDigits` - > [!NOTE] > The following options and option values are being developed during the Technical Preview > period. 
@@ -351,8 +311,7 @@ but can cause problems in target locales that the original developer is not cons > For example, a naive developer might use a special message for the value `1` without > considering a locale's need for a `one` plural: > ``` -> .input {$var :integer} -> .match $var +> .match {$var :integer} > 1 {{You have one last chance}} > one {{You have {$var} chance remaining}} > * {{You have {$var} chances remaining}} @@ -376,14 +335,6 @@ MUST be multiplied by 100 for the purposes of formatting. The _function_ `:integer` performs selection as described in [Number Selection](#number-selection) below. -#### Composition - -When an _operand_ or an _option_ value uses a _variable_ annotated, -directly or indirectly, by a `:integer` _annotation_, -its _resolved value_ contains the implementation-defined integer value -of the _operand_ of the annotated _expression_, -together with the resolved options' values. - ### Number Operands The _operand_ of a number function is either an implementation-defined type or @@ -420,37 +371,34 @@ All other values produce a _Bad Operand_ error. ### Digit Size Options Some _options_ of number _functions_ are defined to take a "digit size option". -The _function handlers_ for number _functions_ use these _options_ to control aspects of numeric display +Implementations of number _functions_ use these _options_ to control aspects of numeric display such as the number of fraction, integer, or significant digits. A "digit size option" is an _option_ value that the _function_ interprets as a small integer value greater than or equal to zero. -Implementations MAY define an upper limit on the _resolved value_ +Implementations MAY define an upper limit on the resolved value of a digit size option option consistent with that implementation's practical limits. In most cases, the value of a digit size option will be a string that -encodes the value as a non-negative integer. +encodes the value as a decimal integer. 
Implementations MAY also accept implementation-defined types as the value. When provided as a string, the representation of a digit size option matches the following ABNF: >```abnf > digit-size-option = "0" / (("1"-"9") [DIGIT]) >``` -If the value of a digit size option does not evaluate as a non-negative integer, -or if the value exceeds any implementation-defined upper limit -or any option-specific lower limit, a _Bad Option Error_ is emitted. ### Number Selection Number selection has three modes: - `exact` selection matches the operand to explicit numeric keys exactly - `plural` selection matches the operand to explicit numeric keys exactly - followed by a plural rule category if there is no explicit match + or to plural rule categories if there is no explicit match - `ordinal` selection matches the operand to explicit numeric keys exactly - followed by an ordinal rule category if there is no explicit match + or to ordinal rule categories if there is no explicit match When implementing [`MatchSelectorKeys(resolvedSelector, keys)`](/spec/formatting.md#resolve-preferences) -where `resolvedSelector` is the _resolved value_ of a _selector_ +where `resolvedSelector` is the resolved value of a _selector_ _expression_ and `keys` is a list of strings, numeric selectors perform as described below. @@ -475,47 +423,32 @@ numeric selectors perform as described below. #### Rule Selection -Rule selection is intended to support the grammatical matching needs of different -languages/locales in order to support plural or ordinal numeric values. - -If the _option_ `select` is set to `exact`, rule-based selection is not used. -Otherwise rule selection matches the _operand_, as modified by function _options_, to exactly one of these keywords: -`zero`, `one`, `two`, `few`, `many`, or `other`. -The keyword `other` is the default. +If the option `select` is set to `exact`, rule-based selection is not used. +Return the empty string. 
> [!NOTE] > Since valid keys cannot be the empty string in a numeric expression, returning the > empty string disables keyword selection. -The meaning of the keywords is locale-dependent and implementation-defined. -A _key_ that matches the rule-selected keyword is a stronger match than the fallback key `*` -but a weaker match than any exact match _key_ value. +If the option `select` is set to `plural`, selection should be based on CLDR plural rule data +of type `cardinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html) +for examples. -The rules for a given locale might not produce all of the keywords. -A given _operand_ value might produce different keywords depending on the locale. +If the option `select` is set to `ordinal`, selection should be based on CLDR plural rule data +of type `ordinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html) +for examples. -Apply the rules to the _resolved value_ of the _operand_ and the relevant function _options_, +Apply the rules defined by CLDR to the resolved value of the operand and the function options, and return the resulting keyword. If no rules match, return `other`. -If the option `select` is set to `plural`, the rules applied to selection SHOULD be -the CLDR plural rule data of type `cardinal`. -See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html) -for examples. - -If the option `select` is set to `ordinal`, the rules applied to selection SHOULD be -the CLDR plural rule data of type `ordinal`. -See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html) -for examples. - > **Example.** > In CLDR 44, the Czech (`cs`) plural rule set can be found > [here](https://www.unicode.org/cldr/charts/44/supplemental/language_plural_rules.html#cs). 
> > A message in Czech might be: > ``` -> .input {$numDays :number} -> .match $numDays +> .match {$numDays :number} > one {{{$numDays} den}} > few {{{$numDays} dny}} > many {{{$numDays} dne}} @@ -534,11 +467,11 @@ for examples. #### Determining Exact Literal Match > [!IMPORTANT] -> The exact behavior of exact literal match is currently only well defined for non-zero-filled +> The exact behavior of exact literal match is only defined for non-zero-filled > integer values. -> Functions that use fraction digits or significant digits might work in specific +> Annotations that use fraction digits or significant digits might work in specific > implementation-defined ways. -> Users should avoid depending on these types of keys in message selection in this release. +> Users should avoid depending on these types of keys in message selection. Number literals in the MessageFormat 2 syntax use the @@ -548,19 +481,10 @@ if, when the numeric value of `resolvedSelector` is serialized using the format the two strings are equal. > [!NOTE] -> The above description of numeric matching contains -> [open issues](https://github.com/unicode-org/message-format-wg/issues/675) -> in the Technical Preview, since a given numeric value might be formatted in -> several different ways under RFC8259 -> and since the effect of formatting options, such as the number of fraction -> digits or significant digits, is not described. -> The Working Group intends to address these issues before final release -> with a number of design options -> [being considered](https://github.com/unicode-org/message-format-wg/pull/859). -> -> Users should avoid creating messages that depend on exact matching of non-integer -> numeric values. -> Feedback, including use cases encountered in message authoring, is strongly desired. +> Only integer matching is required in the Technical Preview. +> Feedback describing use cases for fractional and significant digits-based +> selection would be helpful. 
+> Otherwise, users should avoid using matching with fractional numbers or significant digits. ## Date and Time Value Formatting @@ -600,12 +524,7 @@ or can use a collection of _field options_ (but not both) to control the formatt output. If both are specified, a _Bad Option_ error MUST be emitted -and a _fallback value_ used as the _resolved value_ of the _expression_. - -If the _operand_ of the _expression_ is an implementation-defined date/time type, -it can include _style options_, _field options_, or other option values. -These are included in the resolved option values of the _expression_, -with _options_ on the _expression_ taking priority over any option values of the _operand_. +and a _fallback value_ used as the resolved value of the _expression_. > [!NOTE] > The names of _options_ and their _values_ were derived from the @@ -630,6 +549,8 @@ The function `:datetime` has these _style options_. _Field options_ describe which fields to include in the formatted output and what format to use for that field. +The implementation may use this _annotation_ to configure which fields +appear in the formatted output. > [!NOTE] > _Field options_ do not have default values because they are only to be used @@ -706,15 +627,7 @@ are encouraged to track development of these options during Tech Preview: - valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier) - `timeZone` (default is system default time zone or UTC) - valid identifier per [BCP175](https://www.rfc-editor.org/rfc/rfc6557) - -#### Composition - -When an _operand_ or an _option_ value uses a _variable_ annotated, -directly or indirectly, by a `:datetime` _annotation_, -its _resolved value_ contains an implementation-defined date/time value -of the _operand_ of the annotated _expression_, -together with the resolved options values. - + ### The `:date` function The function `:date` is used to format the date portion of date/time values. 
@@ -738,19 +651,6 @@ The function `:date` has these _options_: - `medium` (default) - `short` -If the _operand_ of the _expression_ is an implementation-defined date/time type, -it can include other option values. -Any _operand_ option values matching the `:datetime` _style options_ or _field options_ are ignored, -as is any `style` option. - -#### Composition - -When an _operand_ or an _option_ value uses a _variable_ annotated, -directly or indirectly, by a `:date` _annotation_, -its _resolved value_ is implementation-defined. -An implementation MAY emit a _Bad Operand_ or _Bad Option_ error (as appropriate) -when this happens. - ### The `:time` function The function `:time` is used to format the time portion of date/time values. @@ -774,18 +674,6 @@ The function `:time` has these _options_: - `medium` - `short` (default) -If the _operand_ of the _expression_ is an implementation-defined date/time type, -it can include other option values. -Any _operand_ option values matching the `:datetime` _style options_ or _field options_ are ignored, -as is any `style` option. - -#### Composition - -When an _operand_ or an _option_ value uses a _variable_ annotated, -directly or indirectly, by a `:time` _annotation_, -its _resolved value_ is implementation-defined. -An implementation MAY emit a _Bad Operand_ or _Bad Option_ error (as appropriate) -when this happens. ### Date and Time Operands @@ -835,3 +723,5 @@ For more information, see [Working with Timezones](https://w3c.github.io/timezon > The form of these serializations is known and is a de facto standard. > Support for these extensions is expected to be required in the post-tech preview. 
> See: https://datatracker.ietf.org/doc/draft-ietf-sedate-datetime-extended/ + + diff --git a/spec/syntax.md b/spec/syntax.md index 6100b562d..42d742ef1 100644 --- a/spec/syntax.md +++ b/spec/syntax.md @@ -60,8 +60,7 @@ The syntax specification takes into account the following design restrictions: control characters such as U+0000 NULL and U+0009 TAB, permanently reserved noncharacters (U+FDD0 through U+FDEF and U+nFFFE and U+nFFFF where n is 0x0 through 0x10), private-use code points (U+E000 through U+F8FF, U+F0000 through U+FFFFD, and - U+100000 through U+10FFFD), unassigned code points, unpaired surrogates (U+D800 through U+DFFF), - and other potentially confusing content. + U+100000 through U+10FFFD), unassigned code points, and other potentially confusing content. ## Messages and their Syntax @@ -91,14 +90,14 @@ Attempting to parse a _message_ that is not _well-formed_ will result in a _Synt A _message_ is **_valid_** if it is _well-formed_ and **also** meets the additional content restrictions and semantic requirements about its structure defined below for -_declarations_, _matcher_, and _options_. +_declarations_, _matcher_ and _options_. Attempting to parse a _message_ that is not _valid_ will result in a _Data Model Error_. ## The Message A **_message_** is the complete template for a specific message formatting request. -A **_variable_** is a _name_ associated to a _resolved value_. +A **_variable_** is a _name_ associated to a resolved value. An **_external variable_** is a _variable_ whose _name_ and initial value are supplied by the caller @@ -114,22 +113,6 @@ A **_local variable_** is a _variable_ created as the result of a _lo > In particular, it avoids using quote characters common to many file formats and formal languages > so that these do not need to be escaped in the body of a _message_. -> [!NOTE] -> _Text_ and _quoted literals_ allow unpaired surrogate code points -> (`U+D800` to `U+DFFF`). 
-> This is for compatibility with formats or data structures -> that use the UTF-16 encoding -> and do not check for unpaired surrogates. -> (Strings in Java or JavaScript are examples of this.) -> These code points SHOULD NOT be used in a _message_. -> Unpaired surrogate code points are likely an indication of mistakes -> or errors in the creation, serialization, or processing of the _message_. -> Many processes will convert them to -> � U+FFFD REPLACEMENT CHARACTER -> during processing or display. -> Implementations not based on UTF-16 might not be able to represent -> a _message_ containing such code points. - > [!NOTE] > In general (and except where required by the syntax), whitespace carries no meaning in the structure > of a _message_. While many of the examples in this spec are written on multiple lines, the formatting @@ -151,20 +134,17 @@ A **_local variable_** is a _variable_ created as the result of a _lo > > An exception to this is: whitespace inside a _pattern_ is **always** significant. > [!NOTE] -> The MessageFormat 2 syntax assumes that each _message_ will be displayed -> with a left-to-right display order +> The syntax assumes that each _message_ will be displayed with a left-to-right display order > and be processed in the logical character order. -> The syntax permits the use of right-to-left characters in _identifiers_, +> The syntax also permits the use of right-to-left characters in _identifiers_, > _literals_, and other values. -> This can result in confusion when viewing the message -> or users might incorrectly insert bidi controls or marks that negatively affect the output -> of the message. -> -> To assist with this, the syntax permits the use of various controls and -> strongly-directional markers in both optional and required _whitespace_ -> in a _message_, as well was encouraging the use of isolating controls -> with _expressions_ and _quoted patterns_. -> See: [whitespace](#whitespace) (below) for more information. 
+> This can result in confusion when viewing the _message_. +> +> Additional restrictions or requirements, +> such as permitting the use of certain bidirectional control characters in the syntax, +> might be added during the Tech Preview to better manage bidirectional text. +> Feedback on the creation and management of _messages_ +> containing bidirectional tokens is strongly desired. A _message_ can be a _simple message_ or it can be a _complex message_. @@ -174,13 +154,13 @@ message = simple-message / complex-message A **_simple message_** contains a single _pattern_, with restrictions on its first non-whitespace character. -An empty string is a _valid_ _simple message_. +An empty string is a valid _simple message_. Whitespace at the start or end of a _simple message_ is significant, and a part of the _text_ of the _message_. ```abnf -simple-message = o [simple-start pattern] +simple-message = [s] [simple-start pattern] simple-start = simple-start-char / escaped-char / placeholder ``` @@ -196,7 +176,7 @@ Whitespace at the start or end of a _complex message_ is not significant, and does not affect the processing of the _message_. ```abnf -complex-message = o *(declaration o) complex-body o +complex-message = [s] *(declaration [s]) complex-body [s] ``` ### Declarations @@ -207,14 +187,17 @@ _Declarations_ are optional: many messages will not contain any _declarations_. An **_input-declaration_** binds a _variable_ to an external input value. The _variable-expression_ of an _input-declaration_ -MAY include a _function_ that is applied to the external value. +MAY include an _annotation_ that is applied to the external value. + +A **_local-declaration_** binds a _variable_ to the resolved value of an _expression_. -A **_local-declaration_** binds a _variable_ to the _resolved value_ of an _expression_. +For compatibility with later MessageFormat 2 specification versions, +_declarations_ MAY also include _reserved statements_. 
```abnf -declaration = input-declaration / local-declaration -input-declaration = input o variable-expression -local-declaration = local s variable o "=" o expression +declaration = input-declaration / local-declaration / reserved-statement +input-declaration = input [s] variable-expression +local-declaration = local s variable [s] "=" [s] expression ``` _Variables_, once declared, MUST NOT be redeclared. @@ -223,7 +206,7 @@ _Duplicate Declaration_ error during processing: - A _declaration_ MUST NOT bind a _variable_ that appears as a _variable_ anywhere within a previous _declaration_. - An _input-declaration_ MUST NOT bind a _variable_ - that appears anywhere within the _function_ of its _variable-expression_. + that appears anywhere within the _annotation_ of its _variable-expression_. - A _local-declaration_ MUST NOT bind a _variable_ that appears in its _expression_. A _local-declaration_ MAY overwrite an external input value as long as the @@ -231,18 +214,46 @@ external input value does not appear in a previous _declaration_. > [!NOTE] > These restrictions only apply to _declarations_. -> A _placeholder_ can apply a different _function_ to a _variable_ +> A _placeholder_ or _selector_ can apply a different annotation to a _variable_ > than one applied to the same _variable_ named in a _declaration_. 
> For example, this message is _valid_: > ``` > .input {$var :number maximumFractionDigits=0} -> .local $var2 = {$var :number maximumFractionDigits=2} -> .match $var2 -> 0 {{The selector can apply a different function to {$var} for the purposes of selection}} -> * {{A placeholder in a pattern can apply a different function to {$var :number maximumFractionDigits=3}}} +> .match {$var :number maximumFractionDigits=2} +> 0 {{The selector can apply a different annotation to {$var} for the purposes of selection}} +> * {{A placeholder in a pattern can apply a different annotation to {$var :number maximumFractionDigits=3}}} > ``` > (See the [Errors](./errors.md) section for examples of invalid messages) +#### Reserved Statements + +A **_reserved statement_** reserves additional `.keywords` +for use by future versions of this specification. +Any such future keyword must start with `.`, +followed by two or more lower-case ASCII characters. + +The rest of the statement supports +a similarly wide range of content as _reserved annotations_, +but it MUST end with one or more _expressions_. + +```abnf +reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression) +reserved-keyword = "." name +``` + +> [!NOTE] +> The `reserved-keyword` ABNF rule is a simplification, +> as it MUST NOT be considered to match any of the existing keywords +> `.input`, `.local`, or `.match`. + +This allows flexibility in future standardization, +as future definitions MAY define additional semantics and constraints +on the contents of these _reserved statements_. + +Implementations MUST NOT assign meaning or semantics to a _reserved statement_: +these are reserved for future standardization. +Implementations MUST NOT remove or alter the contents of a _reserved statement_. + ### Complex Body The **_complex body_** of a _complex message_ is the part that will be formatted. 
@@ -274,7 +285,7 @@ A _quoted pattern_ starts with a sequence of two U+007B LEFT CURLY BRACKET `{{` and ends with a sequence of two U+007D RIGHT CURLY BRACKET `}}`. ```abnf -quoted-pattern = o "{{" pattern "}}" +quoted-pattern = "{{" pattern "}}" ``` A _quoted pattern_ MAY be empty. @@ -288,8 +299,8 @@ A _quoted pattern_ MAY be empty. ### Text **_text_** is the translateable content of a _pattern_. -Any Unicode code point is allowed, except for U+0000 NULL. - +Any Unicode code point is allowed, except for U+0000 NULL +and the surrogate code points U+D800 through U+DFFF inclusive. The characters U+005C REVERSE SOLIDUS `\`, U+007B LEFT CURLY BRACKET `{`, and U+007D RIGHT CURLY BRACKET `}` MUST be escaped as `\\`, `\{`, and `\}` respectively. @@ -305,8 +316,9 @@ be preserved during formatting. ```abnf simple-start-char = content-char / "@" / "|" -text-char = content-char / ws / "." / "@" / "|" -quoted-char = content-char / ws / "." / "@" / "{" / "}" +text-char = content-char / s / "." / "@" / "|" +quoted-char = content-char / s / "." / "@" / "{" / "}" +reserved-char = content-char / "." content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A) / %x0B-0C ; omit CR (%x0D) / %x0E-1F ; omit SP (%x20) @@ -315,14 +327,10 @@ content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A) / %x41-5B ; omit \ (%x5C) / %x5D-7A ; omit { | } (%x7B-7D) / %x7E-2FFF ; omit IDEOGRAPHIC SPACE (%x3000) - / %x3001-10FFFF ; allowing surrogates is intentional + / %x3001-D7FF ; omit surrogates + / %xE000-10FFFF ``` -> [!NOTE] -> Unpaired surrogate code points (`U+D800` through `U+DFFF` inclusive) -> are allowed for compatibility with UTF-16 based implementations -> that do not check for this encoding error. - When a _pattern_ is quoted by embedding the _pattern_ in curly brackets, the resulting _message_ can be embedded into various formats regardless of the container's whitespace trimming rules. @@ -360,31 +368,27 @@ and at least one _variant_. 
When the _matcher_ is processed, the result will be a single _pattern_ that serves as the template for the formatting process. -A _message_ can only be considered _valid_ if the following requirements are satisfied; -otherwise, a corresponding _Data Model Error_ will be produced during processing: - -- _Variant Key Mismatch_: - The number of _keys_ on each _variant_ MUST be equal to the number of _selectors_. -- _Missing Fallback Variant_: - At least one _variant_ MUST exist whose _keys_ are all equal to the "catch-all" key `*`. -- _Missing Selector Annotation_: - Each _selector_ MUST be a _variable_ that - directly or indirectly references a _declaration_ with a _function_. -- _Duplicate Variant_: - Each _variant_ MUST use a list of _keys_ that is unique from that +A _message_ can only be considered _valid_ if the following requirements are +satisfied: + +- The number of _keys_ on each _variant_ MUST be equal to the number of _selectors_. +- At least one _variant_ MUST exist whose _keys_ are all equal to the "catch-all" key `*`. +- Each _selector_ MUST have an _annotation_, + or contain a _variable_ that directly or indirectly references a _declaration_ with an _annotation_. +- Each _variant_ MUST use a list of _keys_ that is unique from that of all other _variants_ in the _message_. _Literal_ _keys_ are compared by their contents, not their syntactical appearance. 
```abnf -matcher = match-statement s variant *(o variant) -match-statement = match 1*(s selector) +matcher = match-statement 1*([s] variant) +match-statement = match 1*([s] selector) ``` > A _message_ with a _matcher_: > > ``` > .input {$count :number} -> .match $count +> .match {$count} > one {{You have {$count} notification.}} > * {{You have {$count} notifications.}} > ``` @@ -392,18 +396,18 @@ match-statement = match 1*(s selector) > A _message_ containing a _matcher_ formatted on a single line: > > ``` -> .local $os = {:platform} .match $os windows {{Settings}} * {{Preferences}} +> .match {:platform} windows {{Settings}} * {{Preferences}} > ``` ### Selector -A **_selector_** is a _variable_ whose _resolved value_ ranks or excludes the +A **_selector_** is an _expression_ that ranks or excludes the _variants_ based on the value of the corresponding _key_ in each _variant_. The combination of _selectors_ in a _matcher_ thus determines which _pattern_ will be used during formatting. ```abnf -selector = variable +selector = expression ``` There MUST be at least one _selector_ in a _matcher_. @@ -414,8 +418,7 @@ There MAY be any number of additional _selectors_. > based on grammatical case: > > ``` -> .local $hasCase = {$userName :hasCase} -> .match $hasCase +> .match {$userName :hasCase} > vocative {{Hello, {$userName :person case=vocative}!}} > accusative {{Please welcome {$userName :person case=accusative}!}} > * {{Hello!}} @@ -426,7 +429,7 @@ There MAY be any number of additional _selectors_. > ``` > .input {$numLikes :integer} > .input {$numShares :integer} -> .match $numLikes $numShares +> .match {$numLikes} {$numShares} > 0 0 {{Your item has no likes and has not been shared.}} > 0 one {{Your item has no likes and has been shared {$numShares} time.}} > 0 * {{Your item has no likes and has been shared {$numShares} times.}} @@ -442,14 +445,14 @@ There MAY be any number of additional _selectors_. 
A **_variant_** is a _quoted pattern_ associated with a list of _keys_ in a _matcher_. Each _variant_ MUST begin with a sequence of _keys_, -and terminate with a _valid_ _quoted pattern_. +and terminate with a valid _quoted pattern_. The number of _keys_ in each _variant_ MUST match the number of _selectors_ in the _matcher_. Each _key_ is separated from each other by whitespace. Whitespace is permitted but not required between the last _key_ and the _quoted pattern_. ```abnf -variant = key *(s key) quoted-pattern +variant = key *(s key) [s] quoted-pattern key = literal / "*" ``` @@ -462,12 +465,6 @@ A _key_ can be either a _literal_ value or the "catch-all" key `*`. The **_catch-all key_** is a special key, represented by `*`, that matches all values for a given _selector_. -The value of each _key_ MUST be treated as if it were in -[Unicode Normalization Form C](https://unicode.org/reports/tr15/) ("NFC"). -Two _keys_ are considered equal if they are canonically equivalent strings, -that is, if they consist of the same sequence of Unicode code points after -Unicode Normalization Form C has been applied to both. - ## Expressions An **_expression_** is a part of a _message_ that will be determined @@ -480,27 +477,28 @@ An _expression_ cannot contain another _expression_. An _expression_ MAY contain one more _attributes_. A **_literal-expression_** contains a _literal_, -optionally followed by a _function_. +optionally followed by an _annotation_. A **_variable-expression_** contains a _variable_, -optionally followed by a _function_. +optionally followed by an _annotation_. -A **_function-expression_** contains a _function_ without an _operand_. +An **_annotation-expression_** contains an _annotation_ without an _operand_. 
```abnf -expression = literal-expression - / variable-expression - / function-expression -literal-expression = "{" o literal [s function] *(s attribute) o "}" -variable-expression = "{" o variable [s function] *(s attribute) o "}" -function-expression = "{" o function *(s attribute) o "}" +expression = literal-expression + / variable-expression + / annotation-expression +literal-expression = "{" [s] literal [s annotation] *(s attribute) [s] "}" +variable-expression = "{" [s] variable [s annotation] *(s attribute) [s] "}" +annotation-expression = "{" [s] annotation *(s attribute) [s] "}" ``` There are several types of _expression_ that can appear in a _message_. All _expressions_ share a common syntax. The types of _expression_ are: 1. The value of a _local-declaration_ -2. A kind of _placeholder_ in a _pattern_ +2. A _selector_ +3. A kind of _placeholder_ in a _pattern_ Additionally, an _input-declaration_ can contain a _variable-expression_. @@ -513,6 +511,12 @@ Additionally, an _input-declaration_ can contain a _variable-expression_. > .local $y = {|This is an expression|} > ``` > +> Selectors: +> +> ``` +> .match {$selector :functionRequired} +> ``` +> > Placeholders: > > ``` @@ -522,26 +526,36 @@ Additionally, an _input-declaration_ can contain a _variable-expression_. > This placeholder contains a function expression with a variable-valued option: {:function option=$variable} > ``` -### Operand +### Annotation + +An **_annotation_** is part of an _expression_ containing either +a _function_ together with its associated _options_, or +a _private-use annotation_ or a _reserved annotation_. + +```abnf +annotation = function + / private-use-annotation + / reserved-annotation +``` An **_operand_** is the _literal_ of a _literal-expression_ or the _variable_ of a _variable-expression_. +An _annotation_ can appear in an _expression_ by itself or following a single _operand_. +When following an _operand_, the _operand_ serves as input to the _annotation_. 
+ #### Function -A **_function_** is named functionality in an _expression_. +A **_function_** is named functionality in an _annotation_. _Functions_ are used to evaluate, format, select, or otherwise process data values during formatting. -A _function_ can appear in an _expression_ by itself or following a single _operand_. -When following an _operand_, the _operand_ serves as input to the _function_. - Each _function_ is defined by the runtime's _function registry_. A _function_'s entry in the _function registry_ will define whether the _function_ is a _selector_ or formatter (or both), whether an _operand_ is required, what form the values of an _operand_ can take, -what _options_ and _option_ values are acceptable, +what _options_ and _option_ values are valid, and what outputs might result. See [function registry](./registry.md) for more information. @@ -569,17 +583,16 @@ The _identifier_ is separated from the _value_ by an U+003D EQUALS SIGN `=` alon optional whitespace. The value of an _option_ can be either a _literal_ or a _variable_. -Multiple _options_ are permitted in a _function_. +Multiple _options_ are permitted in an _annotation_. _Options_ are separated from the preceding _function_ _identifier_ and from each other by whitespace. -Each _option_'s _identifier_ MUST be unique within the _function_: -a _function_ with duplicate _option_ _identifiers_ is not _valid_ -and will produce a _Duplicate Option Name_ error during processing. +Each _option_'s _identifier_ MUST be unique within the _annotation_: +an _annotation_ with duplicate _option_ _identifiers_ is not valid. The order of _options_ is not significant. ```abnf -option = identifier o "=" o (literal / variable) +option = identifier [s] "=" [s] (literal / variable) ``` > Examples of _functions_ with _options_ @@ -598,6 +611,82 @@ option = identifier o "=" o (literal / variable) > Today is {$date :datetime weekday=$dateStyle}! 
> ``` +#### Private-Use Annotations + +A **_private-use annotation_** is an _annotation_ whose syntax is reserved +for use by a specific implementation or by private agreement between multiple implementations. +Implementations MAY define their own meaning and semantics for _private-use annotations_. + +A _private-use annotation_ starts with either U+0026 AMPERSAND `&` or U+005E CIRCUMFLEX ACCENT `^`. + +Characters, including whitespace, are assigned meaning by the implementation. +The definition of escapes in the `reserved-body` production, used for the body of +a _private-use annotation_ is an affordance to implementations that +wish to use a syntax exactly like other functions. Specifically: + +- The characters `\`, `{`, and `}` MUST be escaped as `\\`, `\{`, and `\}` respectively + when they appear in the body of a _private-use annotation_. +- The character `|` is special: it SHOULD be escaped as `\|` in a _private-use annotation_, + but can appear unescaped as long as it is paired with another `|`. + This is an affordance to allow _literals_ to appear in the private use syntax. + +A _private-use annotation_ MAY be empty after its introducing sigil. + +```abnf +private-use-annotation = private-start [[s] reserved-body] +private-start = "^" / "&" +``` + +> [!NOTE] +> Users are cautioned that _private-use annotations_ cannot be reliably exchanged +> and can result in errors during formatting. +> It is generally a better idea to use the function registry +> to define additional formatting or annotation options. 
+ +> Here are some examples of what _private-use_ sequences might look like: +> +> ``` +> Here's private use with an operand: {$foo &bar} +> Here's a placeholder that is entirely private-use: {&anything here} +> Here's a private-use function that uses normal function syntax: {$operand ^foo option=|literal|} +> The character \| has to be paired or escaped: {&private || |something between| or isolated: \| } +> Stop {& "translate 'stop' as a verb" might be a translator instruction or comment } +> Protect stuff in {^ph}{^/ph}private use{^ph}{^/ph} +> ``` + +#### Reserved Annotations + +A **_reserved annotation_** is an _annotation_ whose syntax is reserved +for future standardization. + +A _reserved annotation_ starts with a reserved character. +The remaining part of a _reserved annotation_, called a _reserved body_, +MAY be empty or contain arbitrary text that starts and ends with +a non-whitespace character. + +This allows maximum flexibility in future standardization, +as future definitions MAY define additional semantics and constraints +on the contents of these _annotations_. + +Implementations MUST NOT assign meaning or semantics to +an _annotation_ starting with `reserved-annotation-start`: +these are reserved for future standardization. +Whitespace before or after a _reserved body_ is not part of the _reserved body_. +Implementations MUST NOT remove or alter the contents of a _reserved body_, +including any interior whitespace, +but MAY remove or alter whitespace before or after the _reserved body_. + +While a reserved sequence is technically "well-formed", +unrecognized _reserved-annotations_ or _private-use-annotations_ have no meaning. + +```abnf +reserved-annotation = reserved-annotation-start [[s] reserved-body] +reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" 
/ "~" + +reserved-body = reserved-body-part *([s] reserved-body-part) +reserved-body-part = reserved-char / escaped-char / quoted-literal +``` + ## Markup **_Markup_** _placeholders_ are _pattern_ parts @@ -624,8 +713,8 @@ It MAY include _options_. is a _pattern_ part ending a span. ```abnf -markup = "{" o "#" identifier *(s option) *(s attribute) o ["/"] "}" ; open and standalone - / "{" o "/" identifier *(s option) *(s attribute) o "}" ; close +markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone + / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close ``` > A _message_ with one `button` markup span and a standalone `img` markup element: @@ -634,8 +723,7 @@ markup = "{" o "#" identifier *(s option) *(s attribute) o ["/"] "}" ; open and > {#button}Submit{/button} or {#img alt=|Cancel| /}. > ``` -> A _message_ containing _markup_ that uses _options_ to pair -> two closing markup _placeholders_ to the one open markup _placeholder_: +> A _message_ with attributes in the closing tag: > > ``` > {#ansi attr=|bold,italic|}Bold and italic{/ansi attr=|bold|} italic only {/ansi attr=|italic|} no formatting.} @@ -649,25 +737,66 @@ on the pairing, ordering, or contents of _markup_ during _formatting_. ## Attributes +**_Attributes_ are reserved for standardization by future versions of this specification._** +Examples in this section are meant to be illustrative and +might not match future requirements or usage. + +> [!NOTE] +> The Tech Preview does not provide a built-in mechanism for overriding +> values in the _formatting context_ (most notably the locale) +> Nor does it provide a mechanism for identifying specific expressions +> such as by assigning a name or id. +> The utility of these types of mechanisms has been debated. +> There are at least two proposed mechanisms for implementing support for +> these. 
+> Specifically, one mechanism would be to reserve specifically-named options, +> possibly using a Unicode namespace (i.e. `locale=xxx` or `u:locale=xxx`). +> Such options would be reserved for use in any and all functions or markup. +> The other mechanism would be to use the reserved "expression attribute" syntax +> for this purpose (i.e. `@locale=xxx` or `@id=foo`) +> Neither mechanism was included in this Tech Preview. +> Feedback on the preferred mechanism for managing these features +> is strongly desired. +> +> In the meantime, function authors and other implementers are cautioned to avoid creating +> function-specific or implementation-specific option values for this purpose. +> One workaround would be to use the implementation's namespace for these +> features to insure later interoperability when such a mechanism is finalized +> during the Tech Preview period. +> Specifically: +> - Avoid specifying an option for setting the locale of an expression as different from +> that of the overall _message_ locale, or use a namespace that later maps to the final +> mechanism. +> - Avoid specifying options for the purpose of linking placeholders +> (such as to pair opening markup to closing markup). +> If such an option is created, the implementer should use an +> implementation-specific namespace. +> Users and implementers are cautioned that such options might be +> replaced with a standard mechanism in a future version. +> - Avoid specifying generic options to communicate with translators and +> translation tooling (i.e. implementation-specific options that apply to all +> functions. +> The above are all desirable features. +> We welcome contributions to and proposals for such features during the +> Technical Preview. + An **_attribute_** is an _identifier_ with an optional value that appears in an _expression_ or in _markup_. -During formatting, _attributes_ have no effect, -and they can be treated as code comments. 
_Attributes_ are prefixed by a U+0040 COMMERCIAL AT `@` sign, followed by an _identifier_. -An _attribute_ MAY have a _literal_ _value_ which is separated from the _identifier_ +An _attribute_ MAY have a _value_ which is separated from the _identifier_ by an U+003D EQUALS SIGN `=` along with optional whitespace. +The _value_ of an _attribute_ can be either a _literal_ or a _variable_. Multiple _attributes_ are permitted in an _expression_ or _markup_. Each _attribute_ is separated by whitespace. -Each _attribute_'s _identifier_ SHOULD be unique within the _expression_ or _markup_: -all but the last _attribute_ with the same _identifier_ are ignored. -The order of _attributes_ is not otherwise significant. +The order of _attributes_ is not significant. + ```abnf -attribute = "@" identifier [o "=" o literal] +attribute = "@" identifier [[s] "=" [s] (literal / variable)] ``` > Examples of _expressions_ and _markup_ with _attributes_: @@ -709,33 +838,15 @@ A _literal_ can appear as a _key_ value, as the _operand_ of a _literal-expression_, or in the value of an _option_. -A _literal_ MAY include any Unicode code point except for U+0000 NULL. +A _literal_ MAY include any Unicode code point +except for U+0000 NULL or the surrogate code points U+D800 through U+DFFF. All code points are preserved. -> [!IMPORTANT] -> Most text, including that produced by common keyboards and input methods, -> is already encoded in the canonical form known as -> [Unicode Normalization Form C](https://unicode.org/reports/tr15) ("NFC"). -> A few languages, legacy character encoding conversions, or operating environments -> can result in _literal_ values that are not in this form. -> Some uses of _literals_ in MessageFormat, -> notably as the value of _keys_, -> apply NFC to the _literal_ value during processing or comparison. 
-> While there is no requirement that the _literal_ value actually be entered -> in a normalized form, -> users are cautioned to employ the same character sequences -> for equivalent values and, whenever possible, ensure _literals_ are in NFC. - A **_quoted literal_** begins and ends with U+005E VERTICAL BAR `|`. The characters `\` and `|` within a _quoted literal_ MUST be escaped as `\\` and `\|`. -> [!NOTE] -> Unpaired surrogate code points (`U+D800` through `U+DFFF` inclusive) -> are allowed in _quoted literals_ for compatibility with UTF-16 based -> implementations that do not check for this encoding error. - An **_unquoted literal_** is a _literal_ that does not require the `|` quotes around it to be distinct from the rest of the _message_ syntax. An _unquoted literal_ MAY be used when the content of the _literal_ @@ -756,30 +867,26 @@ number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / " ### Names and Identifiers -A **_name_** is a character sequence used in an _identifier_ -or as the name for a _variable_ -or the value of an _unquoted literal_. +An **_identifier_** is a character sequence that +identifies a _function_, _markup_, or _option_. +Each _identifier_ consists of a _name_ optionally preceeded by +a _namespace_. +When present, the _namespace_ is separated from the _name_ by a +U+003A COLON `:`. +Built-in _functions_ and their _options_ do not have a _namespace_ identifier. -A _name_ can be preceded or followed by bidirectional marks or isolating controls -to aid in presenting names that contain right-to-left or neutral characters. -These characters are **not** part of the value of the _name_ and MUST be treated as if they were not present -when matching _name_ or _identifier_ strings or _unquoted literal_ values. +The _namespace_ `u` (U+0075 LATIN SMALL LETTER U) +is reserved for future standardization. -_Variable_ _names_ are prefixed with `$`. +_Function_ _identifiers_ are prefixed with `:`. 
+_Markup_ _identifiers_ are prefixed with `#` or `/`. +_Option_ _identifiers_ have no prefix. -Two _names_ are considered equal if they are canonically equivalent strings, -that is, if they consist of the same sequence of Unicode code points after -[Unicode Normalization Form C](https://unicode.org/reports/tr15/) ("NFC") -has been applied to both. +A **_name_** is a character sequence used in an _identifier_ +or as the name for a _variable_ +or the value of an _unquoted literal_. -> [!NOTE] -> Implementations are not required to normalize all _names_. -> Comparisons of _name_ values only need be done "as-if" normalization -> has occured. -> Since most text in the wild is already in NFC -> and since checking for NFC is fast and efficient, -> implementations can often substitute checking for actually applying normalization -> to _name_ values. +_Variable_ names are prefixed with `$`. Valid content for _names_ is based on Namespaces in XML 1.0's [NCName](https://www.w3.org/TR/xml-names/#NT-NCName). @@ -792,21 +899,6 @@ Otherwise, the set of characters allowed in a _name_ is large. > Such variables cannot be referenced in a _message_, > but are not otherwise errors. -An **_identifier_** is a character sequence that -identifies a _function_, _markup_, or _option_. -Each _identifier_ consists of a _name_ optionally preceeded by -a _namespace_. -When present, the _namespace_ is separated from the _name_ by a -U+003A COLON `:`. -Built-in _functions_ and their _options_ do not have a _namespace_ identifier. - -The _namespace_ `u` (U+0075 LATIN SMALL LETTER U) -is reserved for future standardization. - -_Function_ _identifiers_ are prefixed with `:`. -_Markup_ _identifiers_ are prefixed with `#` or `/`. -_Option_ _identifiers_ have no prefix. - Examples: > A variable: >``` @@ -830,14 +922,14 @@ in this release. 
```abnf variable = "$" name -option = identifier o "=" o (literal / variable) +option = identifier [s] "=" [s] (literal / variable) identifier = [namespace ":"] name namespace = name -name = [bidi] name-start *name-char [bidi] +name = name-start *name-char name-start = ALPHA / "_" / %xC0-D6 / %xD8-F6 / %xF8-2FF - / %x370-37D / %x37F-61B / %x61D-1FFF / %x200C-200D + / %x370-37D / %x37F-1FFF / %x200C-200D / %x2070-218F / %x2C00-2FEF / %x3001-D7FF / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF name-char = name-start / DIGIT / "-" / "." @@ -850,7 +942,8 @@ An **_escape sequence_** is a two-character sequence starting with U+005C REVERSE SOLIDUS `\`. An _escape sequence_ allows the appearance of lexically meaningful characters -in the body of _text_ or _quoted literal_ sequences. +in the body of _text_, _quoted literal_, or _reserved_ +(which includes, in this case, _private-use_) sequences. Each _escape sequence_ represents the literal character immediately following the initial `\`. ```abnf @@ -870,112 +963,24 @@ and inside _patterns_ only escape `{` and `}`. ### Whitespace -The syntax limits whitespace characters outside of a _pattern_ to the following: -`U+0009 CHARACTER TABULATION` (tab), -`U+000A LINE FEED` (new line), -`U+000D CARRIAGE RETURN`, -`U+3000 IDEOGRAPHIC SPACE`, -or `U+0020 SPACE`. +**_Whitespace_** is defined as one or more of +U+0009 CHARACTER TABULATION (tab), +U+000A LINE FEED (new line), +U+000D CARRIAGE RETURN, +U+3000 IDEOGRAPHIC SPACE, +or U+0020 SPACE. Inside _patterns_ and _quoted literals_, whitespace is part of the content and is recorded and stored verbatim. Whitespace is not significant outside translatable text, except where required by the syntax. -There are two whitespace productions in the syntax. -**_Optional whitespace_** is whitespace that is not required by the syntax, -but which users might want to include to increase the readability of a _message_. -**_Required whitespace_** is whitespace that is required by the syntax. 
- -Both types of whitespace optionally permit the use of the bidirectional isolate controls -and certain strongly directional marks. -These can assist users in presenting _messages_ that contain right-to-left -text, _literals_, or _names_ (including those for _functions_, _options_, -_option values_, and _keys_) - -_Messages_ that contain right-to-left (aka RTL) characters SHOULD use one of the -following mechanisms to make messages display intelligibly in plain-text editors: - -1. Use paired isolating bidi controls `U+2066 LEFT-TO-RIGHT ISOLATE` ("LRI") - and `U+2069 POP DIRECTIONAL ISOLATE` ("PDI") as permitted by the ABNF around - parts of any _message_ containing RTL characters: - - _inside_ of _placeholder_ markers `{` and `}` - - _outside_ _quoted-pattern_ markers `{{` and `}}` - - _outside_ of _variable_, _function_, _markup_, or _attribute_, - including the identifying sigil (e.g. `$var
` or `:ns:name
`) -2. Use the 'local-effect' bidi marks - `U+061C ARABIC LETTER MARK`, `U+200E LEFT-TO-RIGHT MARK` or - `U+200F RIGHT-TO-LEFT MARK` as permitted by the ABNF before or after _identifiers_, - _names_, unquoted _literals_, or _option_ values, - especially when the values contain a mix of neutral, weakly directional, and - strongly directional characters. - -> [!IMPORTANT] -> Always take care **not** to add bidirectional controls or marks -> where they would be semantically significant -> or where they would unintentionally become part of the _message_'s output: -> - do not put them inside of a _literal_ except when they are part of the value, -> (instead put them outside of _literal_ quotes, such as `|...|`) -> - do not put them inside quoted _patterns_ except when they are part of the text, -> (instead put them outside of quoted _patterns_, such as `{{...}}`) -> - do not put them outside _placeholders_, -> (instead put them inside the _placeholder_, such as `{$foo :number}`) -> -> Controls placed inside _literal_ quotes or quoted _patterns_ are part of the _literal_ -> or _pattern_. -> Controls in a _pattern_ will appear in the output of the message. -> Controls inside _literal_ quotes are part of the _literal_ and -> will be considered in operations such as matching a _key_ to a _selector_. - -> [!NOTE] -> Users cannot be expected to create or manage bidirectional controls or -> marks in _messages_, since the characters are invisible and can be difficult -> to manage. -> Tools (such as resource editors or translation editors) -> and other implementations of MessageFormat 2 serialization are strongly -> encouraged to provide paired isolates around any right-to-left -> syntax as described above so that _messages_ display appropriately as plain text. - -These definitions of _whitespace_ implement -[UAX#31 Requirement R3a-2](https://www.unicode.org/reports/tr31/#R3a-2). 
-It is a profile of R3a-1 in that specification because: -- The following pattern whitespace characters are not allowed: - `U+000B FORM FEED`, - `U+000C VERTICAL TABULATION`, - `U+0085 NEXT LINE`, - `U+2028 LINE SEPARATOR` and - `U+2029 PARAGRAPH SEPARATOR`. -- The character `U+3000 IDEOGRAPHIC SPACE` - _is_ interpreted as whitespace. - - The following directional marks and isolates - are treated as ignorable format controls: - `U+061C ARABIC LETTER MARK`, - `U+200E LEFT-TO-RIGHT MARK`, - `U+200F RIGHT-TO-LEFT MARK`, - `U+2066 LEFT-TO-RIGHT ISOLATE`, - `U+2067 RIGHT-TO-LEFT ISOLATE`, - `U+2068 FIRST STRONG ISOLATE`, - and `U+2069 POP DIRECTIONAL ISOLATE`. - (The character `U+061C` is an addition according to R3a.) - - > [!NOTE] > The character U+3000 IDEOGRAPHIC SPACE is included in whitespace for > compatibility with certain East Asian keyboards and input methods, > in which users might accidentally create these characters in a _message_. ```abnf -; Required whitespace -s = *bidi ws o - -; Optional whitespace -o = *(s / bidi) - -; Bidirectional marks and isolates -; ALM / LRM / RLM / LRI, RLI, FSI & PDI -bidi = %x061C / %x200E / %x200F / %x2066-2069 - -; Whitespace characters -ws = SP / HTAB / CR / LF / %x3000 +s = 1*( SP / HTAB / CR / LF / %x3000 ) ``` ## Complete ABNF diff --git a/spec/u-namespace.md b/spec/u-namespace.md deleted file mode 100644 index dabbcc70f..000000000 --- a/spec/u-namespace.md +++ /dev/null @@ -1,87 +0,0 @@ -# MessageFormat 2.0 Unicode Namespace - -The `u:` _namespace_ is reserved for the definition of _options_ -which affect the _function context_ of the specific _expressions_ -in which they appear, -or for the definition of _options_ that are universally applicable -rather than function-specific. -It might also be used to define _functions_ in a future release. - -The CLDR Technical Committee of the Unicode Consortium -manages the specification for this namespace, hence the name `u:`. 
- -## Options - -This section describes common **_`u:` options_** which each implementation SHOULD support -for all _functions_ and _markup_. - -### `u:id` - -A string value that is included as an `id` or other suitable value -in the formatted parts for the _placeholder_, -or any other structured formatted results. - -Ignored when formatting a message to a string. - -The value of the `u:id` _option_ MUST be a _literal_ or a -_variable_ whose _resolved value_ is either a string -or can be resolved to a string without error. -For other values, a _Bad Option_ error is emitted -and the `u:id` option is ignored. - -### `u:locale` - -Replaces the _locale_ defined in the _function context_ for this _expression_. - -A comma-delimited list consisting of -well-formed [BCP 47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) -language tags, -or an implementation-defined list of such tags. - -If this option is set on _markup_, a _Bad Option_ error is emitted -and the value of the `u:locale` option is ignored. - -During processing, the `u:locale` option -MUST be removed from the resolved mapping of _options_ -before calling the _function handler_. - -Values matching the following ABNF are always accepted: -```abnf -u-locale-option = unicode_bcp47_locale_id *(o "," o unicode_bcp47_locale_id) -``` -using `unicode_bcp47_locale_id` as defined for -[Unicode Locale Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#unicode_bcp47_locale_id). - -Implementations MAY support additional language tags, -such as private-use or grandfathered tags, -or tags using `_` instead of `-` as a separator. -When the value of `u:locale` is set by a _variable_, -implementations MAY support non-string values otherwise representing locales. - -Implementations MAY emit a _Bad Option_ error -and MAY ignore the value of the `u:locale` _option_ as a whole -or any of the entries in the list of language tags. 
-This might be because the locale specified is not supported -or because the language tag is not well-formed, -not valid, or some other reason. - -### `u:dir` - -Replaces the base directionality defined in -the _function context_ for this _expression_. - -If this option is set on _markup_, a _Bad Option_ error is emitted -and the value of the `u:dir` option is ignored. - -During processing, the `u:dir` option -MUST be removed from the resolved mapping of _options_ -before calling the _function handler_. - -The value of the `u:dir` _option_ MUST be one of the following _literal_ values -or a _variable_ whose _resolved value_ is one of these _literals_: -- `ltr`: left-to-right directionality -- `rtl`: right-to-left directionality -- `auto`: directionality determined from _expression_ contents - -For other values, a _Bad Option_ error is emitted -and the value of the `u:dir` option is ignored. diff --git a/test/README.md b/test/README.md index d5cbee831..95a8ef7f0 100644 --- a/test/README.md +++ b/test/README.md @@ -10,8 +10,6 @@ These test files are intended to be useful for testing multiple different messag - `data-model-errors.json` - Strings that should produce a Data Model Error when processed. Error names are defined in ["MessageFormat 2.0 Errors"](../spec/errors.md) in the spec. -- `u-options.json` — Test cases for the `u:` options, using built-in functions. - - `functions/` — Test cases that correspond to built-in functions. The behaviour of the built-in formatters is implementation-specific so the `exp` field is often omitted and assertions are made on error cases. @@ -23,7 +21,6 @@ Some examples of test harnesses using these tests, from the source repository: - [Formatting tests](https://github.com/messageformat/messageformat/blob/11c95dab2b25db8454e49ff4daadb817e1d5b770/packages/mf2-messageformat/src/messageformat.test.ts) A [JSON schema](./schemas/) is included for the test files in this repository. 
- ## Error Codes The following table relates the error names used in the [JSON schema](./schemas/) @@ -37,12 +34,13 @@ to the error names used in ["MessageFormat 2.0 Errors"](../spec/errors.md) in th | Bad Variant Key | bad-variant-key | | Duplicate Declaration | duplicate-declaration | | Duplicate Option Name | duplicate-option-name | -| Duplicate Variant | duplicate-variant | | Missing Fallback Variant | missing-fallback-variant | | Missing Selector Annotation | missing-selector-annotation | | Syntax Error | syntax-error | | Unknown Function | unknown-function | | Unresolved Variable | unresolved-variable | +| Unsupported Expression | unsupported-expression | +| Unsupported Statement | unsupported-statement | | Variant Key Mismatch | variant-key-mismatch | The "Message Function Error" error name used in the spec @@ -67,40 +65,29 @@ The function `:test:function` requires a [Number Operand](/spec/registry.md#numb #### Options -The following _options_ are available on `:test:function`: -- `decimalPlaces`, a _digit size option_ for which only `0` and `1` are valid values. - - `0` - - `1` -- `fails` - - `never` (default) - - `select` - - `format` - - `always` +The only _option_ `:test:function` recognizes is `decimalPlaces`, +a _digit size option_ for which only `0` and `1` are valid values. All other _options_ and their values are ignored. #### Behavior When resolving a `:test:function` expression, -its `Input`, `DecimalPlaces`, `FailsFormat`, and `FailsSelect` values are determined as follows: +its `Input` and `DecimalPlaces` values are determined as follows: 1. Let `DecimalPlaces` be 0. -1. Let `FailsFormat` be `false`. -1. Let `FailsSelect` be `false`. -1. Let `arg` be the _resolved value_ of the _expression_ _operand_. -1. If `arg` is the _resolved value_ of an _expression_ +1. Let `arg` be the resolved value of the _expression_ _operand_. +1. 
If `arg` is the resolved value of an _expression_ with a `:test:function`, `:test:select`, or `:test:format` _annotation_ for which resolution has succeeded, then 1. Let `Input` be the `Input` value of `arg`. 1. Set `DecimalPlaces` to be `DecimalPlaces` value of `arg`. - 1. Set `FailsFormat` to be `FailsFormat` value of `arg`. - 1. Set `FailsSelect` to be `FailsSelect` value of `arg`. 1. Else if `arg` is a numerical value or a string matching the `number-literal` production, then 1. Let `Input` be the numerical value of `arg`. 1. Else, 1. Emit "bad-input" _Resolution Error_. - 1. Use a _fallback value_ as the _resolved value_ of the _expression_. + 1. Use a _fallback value_ as the resolved value of the _expression_. Further steps of this algorithm are not followed. 1. If the `decimalPlaces` _option_ is set, then 1. If its value resolves to a numerical integer value 0 or 1 @@ -108,25 +95,13 @@ its `Input`, `DecimalPlaces`, `FailsFormat`, and `FailsSelect` values are determ 1. Set `DecimalPlaces` to be the numerical value of the _option_. 1. Else if its value is not an unresolved value set by _option resolution_, 1. Emit "bad-option" _Resolution Error_. - 1. Use a _fallback value_ as the _resolved value_ of the _expression_. -1. If the `fails` _option_ is set, then - 1. If its value resolves to the string `'always'`, then - 1. Set `FailsFormat` to be `true`. - 1. Set `FailsSelect` to be `true`. - 1. Else if its value resolves to the string `'format'`, then - 1. Set `FailsFormat` to be `true`. - 1. Else if its value resolves to the string `'select'`, then - 1. Set `FailsSelect` to be `true`. - 1. Else if its value does not resolve to the string `'never'`, then - 1. Emit "bad-option" _Resolution Error_. + 1. Use a _fallback value_ as the resolved value of the _expression_. 
When `:test:function` is used as a _selector_, the behaviour of calling it as the `rv` value of MatchSelectorKeys(`rv`, `keys`) (see [Resolve Preferences](/spec/formatting.md#resolve-preferences) for more information) -depends on its `Input`, `DecimalPlaces` and `FailsSelect` values. +depends on its `Input` and `DecimalPlaces` values. -- If `FailsSelect` is `true`, - calling the method will fail and not return any value. - If the `Input` is 1 and `DecimalPlaces` is 1, the method will return some slice of the list « `'1.0'`, `'1'` », depending on whether those values are included in `keys`. @@ -136,7 +111,7 @@ depends on its `Input`, `DecimalPlaces` and `FailsSelect` values. When an _expression_ with a `:test:function` _annotation_ is assigned to a _variable_ by a _declaration_ and that _variable_ is used as an _option_ value, -its _resolved value_ is the `Input` value. +its resolved value is the `Input` value. When `:test:function` is used as a _formatter_, a _placeholder_ resolving to a value with a `:test:function` _expression_ @@ -153,8 +128,6 @@ If the formatting target is a sequence of parts, each of the above parts will be emitted separately rather than being concatenated into a single string. -If `FailsFormat` is `true`, -attempting to format the _placeholder_ to any formatting target will fail. 
### `:test:select` diff --git a/test/schemas/v0/tests.schema.json b/test/schemas/v0/tests.schema.json index a37dcfa8d..7b2056292 100644 --- a/test/schemas/v0/tests.schema.json +++ b/test/schemas/v0/tests.schema.json @@ -269,9 +269,6 @@ "name": { "type": "string" }, - "id": { - "type": "string" - }, "options": { "type": "object" } @@ -348,6 +345,8 @@ "duplicate-variant", "unresolved-variable", "unknown-function", + "unsupported-expression", + "unsupported-statement", "bad-selector", "bad-operand", "bad-option", diff --git a/test/tests/bidi.json b/test/tests/bidi.json deleted file mode 100644 index 607ba792a..000000000 --- a/test/tests/bidi.json +++ /dev/null @@ -1,145 +0,0 @@ -{ - "scenario": "Bidi support", - "description": "Tests for correct parsing of messages with bidirectional marks and isolates", - "defaultTestProperties": { - "locale": "en-US" - }, - "tests": [ - { - "description": "simple-message = o [simple-start pattern]", - "src": " \u061C Hello world!", - "exp": " \u061C Hello world!" 
- }, - { - "description": "complex-message = o *(declaration o) complex-body o", - "src": "\u200E .local $x = {1} {{ {$x}}}", - "exp": " 1" - }, - { - "description": "complex-message = o *(declaration o) complex-body o", - "src": ".local $x = {1} \u200F {{ {$x}}}", - "exp": " 1" - }, - { - "description": "complex-message = o *(declaration o) complex-body o", - "src": ".local $x = {1} {{ {$x}}} \u2066", - "exp": " 1" - }, - { - "description": "input-declaration = input o variable-expression", - "src": ".input \u2067 {$x :number} {{hello}}", - "params": [{"name": "x", "value": "1"}], - "exp": "hello" - }, - { - "description": "local s variable o \"=\" o expression", - "src": ".local $x \u2068 = \u2069 {1} {{hello}}", - "exp": "hello" - }, - { - "description": "local s variable o \"=\" o expression", - "src": ".local \u2067 $x = {1} {{hello}}", - "exp": "hello" - }, - { - "description": "local s variable o \"=\" o expression", - "src": ".local\u2067 $x = {1} {{hello}}", - "exp": "hello" - }, - { - "description": "o \"{{\" pattern \"}}\"", - "src": "\u2067 {{hello}}", - "exp": "hello" - }, - { - "description": "match-statement s variant *(o variant)", - "src": ".local $x = {1 :number}\n.match $x\n1 {{one}}\n\u061C * {{other}}", - "exp": "one" - }, - { - "description": "match-statement s variant *(o variant)", - "src": ".local $x = {1 :number}.match $x \u061c1 {{one}}* {{other}}", - "exp": "one" - }, - { - "description": "match-statement s variant *(o variant)", - "src": ".local $x = {1 :number}.match $x\u061c1 {{one}}* {{other}}", - "expErrors": [{"type": "syntax-error"}] - }, - { - "description": "variant = key *(s key) quoted-pattern", - "src": ".local $x = {1 :number} .local $y = {$x :number}.match $x $y\n1 \u200E 1 {{one}}* * {{other}}", - "exp": "one" - }, - { - "description": "variant = key *(s key) quoted-pattern", - "src": ".local $x = {1 :number} .local $y = {$x :number}.match $x $y\n1\u200E 1 {{one}}* * {{other}}", - "exp": "one" - }, - { - "description": 
"literal-expression = \"{\" o literal [s function] *(s attribute) o \"}\"", - "src": "{\u200E hello \u200F}", - "exp": "hello" - }, - { - "description": "variable-expression = \"{\" o variable [s function] *(s attribute) o \"}\"", - "src": ".local $x = {1} {{ {\u200E $x \u200F} }}", - "exp": " 1 " - }, - { - "description": "function-expression = \"{\" o function *(s attribute) o \"}\"", - "src": "{1 \u200E :number \u200F}", - "exp": "1" - }, - { - "description": "markup = \"{\" o \"#\" identifier *(s option) *(s attribute) o [\"/\"] \"}\"", - "src": "{\u200F #b \u200E }", - "exp": "" - }, - { - "description": "markup = \"{\" o \"/\" identifier *(s option) *(s attribute) o \"}\"", - "src": "{\u200F /b \u200E }", - "exp": "" - }, - { - "description": "option = identifier o \"=\" o (literal / variable)", - "src": "{1 :number minimumFractionDigits\u200F=\u200E1 }", - "exp": "1.0" - }, - { - "description": "attribute = \"@\" identifier [o \"=\" o (literal / variable)]", - "src": "{1 :number @locale\u200F=\u200Een }", - "exp": "1" - }, - { - "description": " name... 
excludes U+FFFD and U+061C -- this pases as name -> [bidi] name-start *name-char", - "src": ".local $\u061Cfoo = {1} {{ {$\u061Cfoo} }}", - "exp": " 1 " - }, - { - "description": " name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD and U+061C", - "src": ".local $foo\u061Cbar = {2} {{ }}", - "expErrors": [{"type": "syntax-error"}] - }, - { - "description": "name = [bidi] name-start *name-char [bidi]", - "src": ".local $\u200Efoo\u200F = {3} {{{$\u200Efoo\u200F}}}", - "exp": "3" - }, - { - "description": "name = [bidi] name-start *name-char [bidi]", - "src": ".local $foo = {4} {{{$\u200Efoo\u200F}}}", - "exp": "4" - }, - { - "description": "name = [bidi] name-start *name-char [bidi]", - "src": ".local $\u200Efoo\u200F = {5} {{{$foo}}}", - "exp": "5" - }, - { - "description": "name = [bidi] name-start *name-char [bidi]", - "src": ".local $foo\u200Ebar = {5} {{{$foo\u200Ebar}}}", - "expErrors": [{"type": "syntax-error"}] - } - ] -} diff --git a/test/tests/data-model-errors.json b/test/tests/data-model-errors.json index f1f54cabe..86a674c43 100644 --- a/test/tests/data-model-errors.json +++ b/test/tests/data-model-errors.json @@ -6,7 +6,7 @@ }, "tests": [ { - "src": ".input {$foo :x} .match $foo * * {{foo}}", + "src": ".match {$foo :x} * * {{foo}}", "expErrors": [ { "type": "variant-key-mismatch" @@ -14,7 +14,7 @@ ] }, { - "src": ".input {$foo :x} .input {$bar :x} .match $foo $bar * {{foo}}", + "src": ".match {$foo :x} {$bar :x} * {{foo}}", "expErrors": [ { "type": "variant-key-mismatch" @@ -22,7 +22,7 @@ ] }, { - "src": ".input {$foo :x} .match $foo 1 {{_}}", + "src": ".match {:foo} 1 {{_}}", "expErrors": [ { "type": "missing-fallback-variant" @@ -30,7 +30,7 @@ ] }, { - "src": ".input {$foo :x} .match $foo other {{_}}", + "src": ".match {:foo} other {{_}}", "expErrors": [ { "type": "missing-fallback-variant" @@ -38,7 +38,7 @@ ] }, { - "src": ".input {$foo :x} .input {$bar :x} .match $foo $bar * 1 {{_}} 1 * {{_}}", + "src": ".match {:foo} 
{:bar} * 1 {{_}} 1 * {{_}}", "expErrors": [ { "type": "missing-fallback-variant" @@ -46,7 +46,7 @@ ] }, { - "src": ".input {$foo} .match $foo one {{one}} * {{other}}", + "src": ".match {$foo} one {{one}} * {{other}}", "expErrors": [ { "type": "missing-selector-annotation" @@ -54,7 +54,7 @@ ] }, { - "src": ".local $foo = {$bar} .match $foo one {{one}} * {{other}}", + "src": ".input {$foo} .match {$foo} one {{one}} * {{other}}", "expErrors": [ { "type": "missing-selector-annotation" @@ -62,7 +62,7 @@ ] }, { - "src": ".input {$bar} .local $foo = {$bar} .match $foo one {{one}} * {{other}}", + "src": ".local $foo = {$bar} .match {$foo} one {{one}} * {{other}}", "expErrors": [ { "type": "missing-selector-annotation" @@ -166,7 +166,7 @@ ] }, { - "src": ".input {$var :string} .match $var * {{The first default}} * {{The second default}}", + "src": ".match {$var :string} * {{The first default}} * {{The second default}}", "expErrors": [ { "type": "duplicate-variant" @@ -174,16 +174,12 @@ ] }, { - "src": ".input {$x :string} .input {$y :string} .match $x $y * foo {{The first foo variant}} bar * {{The bar variant}} * |foo| {{The second foo variant}} * * {{The default variant}}", + "src": ".match {$x :string} {$y :string} * foo {{The first foo variant}} bar * {{The bar variant}} * |foo| {{The second foo variant}} * * {{The default variant}}", "expErrors": [ { "type": "duplicate-variant" } ] - }, - { - "src": ".local $star = {star :string} .match $star |*| {{Literal star}} * {{The default}}", - "exp": "The default" } ] } diff --git a/test/tests/functions/date.json b/test/tests/functions/date.json index c426173d6..494ca8d23 100644 --- a/test/tests/functions/date.json +++ b/test/tests/functions/date.json @@ -35,10 +35,10 @@ "src": "{|2006-01-02| :date style=long}" }, { - "src": ".local $d = {|2006-01-02| :date style=long} {{{$d}}}" + "src": ".local $d = {|2006-01-02| :date style=long} {{{$d :date}}}" }, { - "src": ".local $d = {|2006-01-02| :datetime dateStyle=long timeStyle=long} 
{{{$d :date}}}" + "src": ".local $t = {|2006-01-02T15:04:06| :time} {{{$t :date}}}" } ] } diff --git a/test/tests/functions/integer.json b/test/tests/functions/integer.json index 7ffdc08a5..c8e75077a 100644 --- a/test/tests/functions/integer.json +++ b/test/tests/functions/integer.json @@ -19,7 +19,7 @@ "exp": "hello 4" }, { - "src": ".input {$foo :integer} .match $foo 1 {{one}} * {{other}}", + "src": ".match {$foo :integer} one {{one}} * {{other}}", "params": [ { "name": "foo", @@ -27,10 +27,6 @@ } ], "exp": "one" - }, - { - "src": ".local $x = {1.25 :integer} .local $y = {$x :number} {{{$y}}}", - "exp": "1" } ] } diff --git a/test/tests/functions/number.json b/test/tests/functions/number.json index 2b00d83e4..f59e77343 100644 --- a/test/tests/functions/number.json +++ b/test/tests/functions/number.json @@ -209,6 +209,173 @@ } ] }, + { + "src": ".match {$foo :number} one {{one}} * {{other}}", + "params": [ + { + "name": "foo", + "value": 1 + } + ], + "exp": "one" + }, + { + "src": ".match {$foo :number} 1 {{=1}} one {{one}} * {{other}}", + "params": [ + { + "name": "foo", + "value": 1 + } + ], + "exp": "=1" + }, + { + "src": ".match {$foo :number} one {{one}} 1 {{=1}} * {{other}}", + "params": [ + { + "name": "foo", + "value": 1 + } + ], + "exp": "=1" + }, + { + "src": ".match {$foo :number} {$bar :number} one one {{one one}} one * {{one other}} * * {{other}}", + "params": [ + { + "name": "foo", + "value": 1 + }, + { + "name": "bar", + "value": 1 + } + ], + "exp": "one one" + }, + { + "src": ".match {$foo :number} {$bar :number} one one {{one one}} one * {{one other}} * * {{other}}", + "params": [ + { + "name": "foo", + "value": 1 + }, + { + "name": "bar", + "value": 2 + } + ], + "exp": "one other" + }, + { + "src": ".match {$foo :number} {$bar :number} one one {{one one}} one * {{one other}} * * {{other}}", + "params": [ + { + "name": "foo", + "value": 2 + }, + { + "name": "bar", + "value": 2 + } + ], + "exp": "other" + }, + { + "src": ".input {$foo :number} 
.match {$foo} one {{one}} * {{other}}", + "params": [ + { + "name": "foo", + "value": 1 + } + ], + "exp": "one" + }, + { + "src": ".local $foo = {$bar :number} .match {$foo} one {{one}} * {{other}}", + "params": [ + { + "name": "bar", + "value": 1 + } + ], + "exp": "one" + }, + { + "src": ".input {$foo :number} .local $bar = {$foo} .match {$bar} one {{one}} * {{other}}", + "params": [ + { + "name": "foo", + "value": 1 + } + ], + "exp": "one" + }, + { + "src": ".input {$bar :number} .match {$bar} one {{one}} * {{other}}", + "params": [ + { + "name": "bar", + "value": 2 + } + ], + "exp": "other" + }, + { + "src": ".input {$bar} .match {$bar :number} one {{one}} * {{other}}", + "params": [ + { + "name": "bar", + "value": 1 + } + ], + "exp": "one" + }, + { + "src": ".input {$bar} .match {$bar :number} one {{one}} * {{other}}", + "params": [ + { + "name": "bar", + "value": 2 + } + ], + "exp": "other" + }, + { + "src": ".input {$none} .match {$foo :number} one {{one}} * {{{$none}}}", + "params": [ + { + "name": "foo", + "value": 1 + } + ], + "exp": "one" + }, + { + "src": ".local $bar = {$none} .match {$foo :number} one {{one}} * {{{$bar}}}", + "params": [ + { + "name": "foo", + "value": 1 + } + ], + "exp": "one" + }, + { + "src": ".local $bar = {$none} .match {$foo :number} one {{one}} * {{{$bar}}}", + "params": [ + { + "name": "foo", + "value": 2 + } + ], + "exp": "{$none}", + "expErrors": [ + { + "type": "unresolved-variable" + } + ] + }, { "src": "{42 :number @foo @bar=13}", "exp": "42", diff --git a/test/tests/functions/string.json b/test/tests/functions/string.json index 231868180..fab459541 100644 --- a/test/tests/functions/string.json +++ b/test/tests/functions/string.json @@ -7,7 +7,7 @@ }, "tests": [ { - "src": ".input {$foo :string} .match $foo |1| {{one}} * {{other}}", + "src": ".match {$foo :string} |1| {{one}} * {{other}}", "params": [ { "name": "foo", @@ -17,7 +17,7 @@ "exp": "one" }, { - "src": ".input {$foo :string} .match $foo 1 {{one}} * {{other}}", + 
"src": ".match {$foo :string} 1 {{one}} * {{other}}", "params": [ { "name": "foo", @@ -27,7 +27,7 @@ "exp": "one" }, { - "src": ".input {$foo :string} .match $foo 1 {{one}} * {{other}}", + "src": ".match {$foo :string} 1 {{one}} * {{other}}", "params": [ { "name": "foo", @@ -37,38 +37,13 @@ "exp": "other" }, { - "src": ".input {$foo :string} .match $foo 1 {{one}} * {{other}}", + "src": ".match {$foo :string} 1 {{one}} * {{other}}", "exp": "other", "expErrors": [ { "type": "unresolved-variable" } ] - }, - { - "description": "NFC: keys are normalized (unquoted)", - "src": ".local $x = {\u1E0A\u0323 :string} .match $x \u1E0A\u0323 {{Not normalized}} \u1E0C\u0307 {{Normalized}} * {{Wrong}}", - "expErrors": [{"type": "duplicate-variant"}] - }, - { - "description": "NFC: keys are normalized (quoted)", - "src": ".local $x = {\u1E0A\u0323 :string} .match $x |\u1E0A\u0323| {{Not normalized}} |\u1E0C\u0307| {{Normalized}} * {{Wrong}}", - "expErrors": [{"type": "duplicate-variant"}] - }, - { - "description": "NFC: keys are normalized (mixed)", - "src": ".local $x = {\u1E0A\u0323 :string} .match $x \u1E0A\u0323 {{Not normalized}} |\u1E0C\u0307| {{Normalized}} * {{Wrong}}", - "expErrors": [{"type": "duplicate-variant"}] - }, - { - "description": "NFC: :string normalizes the comparison value (un-normalized selector, normalized key)", - "src": ".local $x = {\u1E0A\u0323 :string} .match $x \u1E0C\u0307 {{Right}} * {{Wrong}}", - "exp": "Right" - }, - { - "description": "NFC: keys are normalized (normalized selector, un-normalized key)", - "src": ".local $x = {\u1E0C\u0307 :string} .match $x \u1E0A\u0323 {{Right}} * {{Wrong}}", - "exp": "Right" } ] } diff --git a/test/tests/functions/time.json b/test/tests/functions/time.json index f4ec1b2d5..416d18a3e 100644 --- a/test/tests/functions/time.json +++ b/test/tests/functions/time.json @@ -32,10 +32,10 @@ "src": "{|2006-01-02T15:04:06| :time style=medium}" }, { - "src": ".local $t = {|2006-01-02T15:04:06| :time style=medium} {{{$t}}}" + 
"src": ".local $t = {|2006-01-02T15:04:06| :time style=medium} {{{$t :time}}}" }, { - "src": ".local $t = {|2006-01-02T15:04:06| :datetime dateStyle=long timeStyle=long} {{{$t :time}}}" + "src": ".local $d = {|2006-01-02T15:04:06| :date} {{{$d :time}}}" } ] } diff --git a/test/tests/pattern-selection.json b/test/tests/pattern-selection.json deleted file mode 100644 index 29dc146c1..000000000 --- a/test/tests/pattern-selection.json +++ /dev/null @@ -1,120 +0,0 @@ -{ - "$schema": "https://raw.githubusercontent.com/unicode-org/message-format-wg/main/test/schemas/v0/tests.schema.json", - "scenario": "Pattern selection", - "description": "Tests for pattern selection", - "defaultTestProperties": { - "locale": "und" - }, - "tests": [ - { - "src": ".local $x = {1 :test:select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", - "exp": "1" - }, - { - "src": ".local $x = {0 :test:select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", - "exp": "other" - }, - { - "src": ".input {$x :test:select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", - "params": [{ "name": "x", "value": 1 }], - "exp": "1" - }, - { - "src": ".input {$x :test:select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", - "params": [{ "name": "x", "value": 2 }], - "exp": "other" - }, - { - "src": ".input {$x :test:select} .local $y = {$x} .match $y 1.0 {{1.0}} 1 {{1}} * {{other}}", - "params": [{ "name": "x", "value": 1 }], - "exp": "1" - }, - { - "src": ".input {$x :test:select} .local $y = {$x} .match $y 1.0 {{1.0}} 1 {{1}} * {{other}}", - "params": [{ "name": "x", "value": 2 }], - "exp": "other" - }, - { - "src": ".local $x = {1 :test:select decimalPlaces=1} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", - "exp": "1.0" - }, - { - "src": ".local $x = {1 :test:select decimalPlaces=1} .match $x 1 {{1}} 1.0 {{1.0}} * {{other}}", - "exp": "1.0" - }, - { - "src": ".local $x = {1 :test:select decimalPlaces=9} .match $x 1.0 {{1.0}} 1 {{1}} * {{bad-option-value}}", - "exp": "bad-option-value", - "expErrors": [{ "type": "bad-option" }, 
{ "type": "bad-selector" }] - }, - { - "src": ".input {$x :test:select} .local $y = {$x :test:select decimalPlaces=1} .match $y 1.0 {{1.0}} 1 {{1}} * {{other}}", - "params": [{ "name": "x", "value": 1 }], - "exp": "1.0" - }, - { - "src": ".input {$x :test:select decimalPlaces=1} .local $y = {$x :test:select} .match $y 1.0 {{1.0}} 1 {{1}} * {{other}}", - "params": [{ "name": "x", "value": 1 }], - "exp": "1.0" - }, - { - "src": ".input {$x :test:select decimalPlaces=9} .local $y = {$x :test:select decimalPlaces=1} .match $y 1.0 {{1.0}} 1 {{1}} * {{bad-option-value}}", - "params": [{ "name": "x", "value": 1 }], - "exp": "bad-option-value", - "expErrors": [ - { "type": "bad-option" }, - { "type": "bad-operand" }, - { "type": "bad-selector" } - ] - }, - { - "src": ".local $x = {1 :test:select fails=select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", - "exp": "other", - "expErrors": [{ "type": "bad-selector" }] - }, - { - "src": ".local $x = {1 :test:select fails=format} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", - "exp": "1" - }, - { - "src": ".local $x = {1 :test:format} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", - "exp": "other", - "expErrors": [{ "type": "bad-selector" }] - }, - { - "src": ".input {$x :test:select} .match $x 1.0 {{1.0}} 1 {{1}} * {{other}}", - "exp": "other", - "expErrors": [ - { "type": "unresolved-variable" }, - { "type": "bad-operand" }, - { "type": "bad-selector" } - ] - }, - { - "src": ".local $x = {1 :test:select} .local $y = {1 :test:select} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 {{*,1}} * * {{*,*}}", - "exp": "1,1" - }, - { - "src": ".local $x = {1 :test:select} .local $y = {0 :test:select} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 {{*,1}} * * {{*,*}}", - "exp": "1,*" - }, - { - "src": ".local $x = {0 :test:select} .local $y = {1 :test:select} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 {{*,1}} * * {{*,*}}", - "exp": "*,1" - }, - { - "src": ".local $x = {0 :test:select} .local $y = {0 :test:select} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 
{{*,1}} * * {{*,*}}", - "exp": "*,*" - }, - { - "src": ".local $x = {1 :test:select fails=select} .local $y = {1 :test:select} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 {{*,1}} * * {{*,*}}", - "exp": "*,1", - "expErrors": [{ "type": "bad-selector" }] - }, - { - "src": ".local $x = {1 :test:select} .local $y = {1 :test:format} .match $x $y 1 1 {{1,1}} 1 * {{1,*}} * 1 {{*,1}} * * {{*,*}}", - "exp": "1,*", - "expErrors": [{ "type": "bad-selector" }] - } - ] -} diff --git a/test/tests/syntax-errors.json b/test/tests/syntax-errors.json index 00d0420f4..34d9aa484 100644 --- a/test/tests/syntax-errors.json +++ b/test/tests/syntax-errors.json @@ -122,9 +122,6 @@ { "src": "bad {:placeholder @attribute=@foo}" }, - { - "src": "bad {:placeholder @attribute=$foo}" - }, { "src": "{ @misplaced = attribute }" }, @@ -158,90 +155,26 @@ { "src": ".local $bar = |foo| {{_}}" }, - { "src": ".match {{foo}}" }, - { "src": ".match * {{foo}}" }, - { "src": ".match x * {{foo}}" }, - { "src": ".match |x| * {{foo}}" }, - { "src": ".match :x * {{foo}}" }, - { "src": ".match {$foo} * {{foo}}" }, - { "src": ".match {#foo} * {{foo}}" }, - { "src": ".input {$x :x} .match {$x} * {{foo}}" }, - { "src": ".input {$x :x} .match$x * {{foo}}" }, - { "src": ".input {$x :x} .match $x* {{foo}}" }, - { "src": ".input {$x :x} .match $x|x| {{foo}} * {{foo}}" }, - { "src": ".input {$x :x} .local $y = {y :y} .match $x$y * * {{foo}}" }, - { "src": ".input {$x :x} .local $y = {y :y} .match $x $y ** {{foo}}" }, - { "src": ".input {$x :x} .match $x" }, - { "src": ".input {$x :x} .match $x *" }, - { "src": ".input {$x :x} .match $x * foo" }, - { "src": ".input {$x :x} .match $x * {{foo}} extra" }, - { "src": ".n{a}{{}}" }, - { "src": "{^}" }, - { "src": "{!}" }, - { "src": ".n .{a}{{}}" }, - { "src": ".n. {a}{{}}" }, - { "src": ".n.{a}{b}{{}}" }, - { "src": "{!.}" }, - { "src": "{! 
.}" }, - { "src": "{%}" }, - { "src": "{*}" }, - { "src": "{+}" }, - { "src": "{<}" }, - { "src": "{>}" }, - { "src": "{?}" }, - { "src": "{~}" }, - { "src": "{^.}" }, - { "src": "{^ .}" }, - { "src": "{&}" }, - { "src": "{!.\\{}" }, - { "src": "{!. \\{}" }, - { "src": "{!|a|}" }, - { "src": "foo {+reserved}" }, - { "src": "foo {&private}" }, - { "src": "foo {?reserved @a @b=c}" }, - { "src": ".foo {42} {{bar}}" }, - { "src": ".foo{42}{{bar}}" }, - { "src": ".foo |}lit{| {42}{{bar}}" }, - { "src": ".i {1} {{}}" }, - { "src": ".l $y = {|bar|} {{}}" }, - { "src": ".l $x.y = {|bar|} {{}}" }, - { "src": "hello {|4.2| %number}" }, - { "src": "hello {|4.2| %n|um|ber}" }, - { "src": "{+42}" }, - { "src": "hello {|4.2| &num|be|r}" }, - { "src": "hello {|4.2| ^num|be|r}" }, - { "src": "hello {|4.2| +num|be|r}" }, - { "src": "hello {|4.2| ?num|be||r|s}" }, - { "src": "hello {|foo| !number}" }, - { "src": "hello {|foo| *number}" }, - { "src": "hello {?number}" }, - { "src": "{xyzz }" }, - { "src": "hello {$foo ~xyzz }" }, - { "src": "hello {$x xyzz }" }, - { "src": "{ !xyzz }" }, - { "src": "{~xyzz }" }, - { "src": "{ num x \\\\ abcde |aaa||3.14||42| r }" }, - { "src": "hello {$foo >num x \\\\ abcde |aaa||3.14| |42| r }" }, - { "src" : ".input{ $n ~ }{{{$n}}}" } + { + "src": ".match {#foo} * {{foo}}" + }, + { + "src": ".match {} * {{foo}}" + }, + { + "src": ".match {|foo| :x} {|bar| :x} ** {{foo}}" + }, + { + "src": ".match * {{foo}}" + }, + { + "src": ".match {|x| :x} * foo" + }, + { + "src": ".match {|x| :x} * {{foo}} extra" + }, + { + "src": ".match |x| * {{foo}}" + } ] } diff --git a/test/tests/syntax.json b/test/tests/syntax.json index 6082d094a..1a2d601a2 100644 --- a/test/tests/syntax.json +++ b/test/tests/syntax.json @@ -26,11 +26,6 @@ "src": "\\\\", "exp": "\\" }, - { - "description": "message -> simple-message -> simple-start pattern -> 1*escaped-char", - "src": "\\\\\\{\\|\\}", - "exp": "\\{|}" - }, { "description": "message -> simple-message -> simple-start 
pattern -> simple-start-char pattern -> ... -> simple-start-char *text-char placeholder", "src": "hello {world}", @@ -170,10 +165,16 @@ "exp": "" }, { - "description": "message -> complex-message -> complex-body -> ... -> matcher -> match-statement variant -> match selector key quoted-pattern -> \".match\" variable literal quoted-pattern", - "src": ".local $a={a :f}.match $a a{{}}*{{}}", + "description": "message -> complex-message -> *(declaration [s]) complex-body -> declaration complex-body -> reserved-statement complex-body -> reserved-keyword expression -> \".\" name expression complex-body", + "src": ".n{a}{{}}", + "exp": "", + "expErrors": [ { "type": "unsupported-statement" } ] + }, + { + "description": "message -> complex-message -> complex-body -> matcher -> match-statement variant -> match selector key quoted-pattern -> \".match\" expression literal quoted-pattern", + "src": ".match{a :f}a{{}}*{{}}", "exp": "", - "expErrors": [{ "type": "unknown-function" }, { "type": "bad-selector" }] + "expErrors": [ { "type": "unknown-function" } ] }, { "description": "... input-declaration -> input s variable-expression ...", @@ -195,57 +196,35 @@ "src": ".local $x = {a}{{}}", "exp": "" }, - { - "description": "input-declaration-like content in complex-message", - "src": "{{.input {$x}}}", - "params": [{ "name": "x", "value": "X" }], - "exp": ".input X" - }, - { - "description": "local-declaration-like content in complex-message with leading whitespace", - "src": "{{ .local $x = {$y}}}", - "params": [{ "name": "y", "value": "Y" }], - "exp": " .local $x = Y" - }, { "description": "... 
matcher -> match-statement [s] variant -> match 1*([s] selector) variant -> match selector selector variant -> match selector selector variant key s key quoted-pattern", - "src": ".local $a={a :f}.local $b={b :f}.match $a $b a b{{}}* *{{}}", + "src": ".match{a :f}{b :f}a b{{}}* *{{}}", "exp": "", - "expErrors": [ - { "type": "unknown-function" }, - { "type": "bad-selector" }, - { "type": "unknown-function" }, - { "type": "bad-selector" } - ] + "expErrors": [ { "type": "unknown-function" } ] }, { "description": "... matcher -> match-statement [s] variant -> match 1*([s] selector) variant -> match selector variant variant ...", - "src": ".local $a={a :f}.match $a a{{}}b{{}}*{{}}", + "src": ".match{a :f}a{{}}b{{}}*{{}}", "exp": "", - "expErrors": [{ "type": "unknown-function" }, { "type": "bad-selector" }] + "expErrors": [ { "type": "unknown-function" } ] }, { "description": "... variant -> key s quoted-pattern -> ...", - "src": ".local $a={a :f}.match $a a {{}}*{{}}", + "src": ".match{a :f}a {{}}*{{}}", "exp": "", - "expErrors": [{ "type": "unknown-function" }, { "type": "bad-selector" }] + "expErrors": [ { "type": "unknown-function" } ] }, { "description": "... variant -> key s key s quoted-pattern -> ...", - "src": ".local $a={a :f}.local $b={b :f}.match $a $b a b {{}}* *{{}}", + "src": ".match{a :f}{b :f}a b {{}}* *{{}}", "exp": "", - "expErrors": [ - { "type": "unknown-function" }, - { "type": "bad-selector" }, - { "type": "unknown-function" }, - { "type": "bad-selector" } - ] + "expErrors": [ { "type": "unknown-function" } ] }, { "description": "... 
key -> \"*\" ...", - "src": ".local $a={a :f}.match $a *{{}}", + "src": ".match{a :f}*{{}}", "exp": "", - "expErrors": [{ "type": "unknown-function" }, { "type": "bad-selector" }] + "expErrors": [ { "type": "unknown-function" } ] }, { "description": "simple-message -> simple-start pattern -> placeholder -> expression -> literal-expression -> \"{\" s literal \"}\"", @@ -298,6 +277,18 @@ "exp": "{:f}", "expErrors": [{ "type": "unknown-function" }] }, + { + "description": "... annotation -> private-use-annotation -> private-start", + "src": "{^}", + "exp": "{^}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... annotation -> reserved-annotation -> reserved-annotation-start", + "src": "{!}", + "exp": "{!}", + "expErrors": [{ "type": "unsupported-expression" }] + }, { "description": "message -> simple-message -> simple-start pattern -> placeholder -> markup -> \"{\" s \"#\" identifier \"}\"", "src": "{ #a}", @@ -408,8 +399,8 @@ "exp": "a" }, { - "description": "... attribute -> \"@\" identifier s \"=\" s quoted-literal ...", - "src": "{42 @foo=|bar|}", + "description": "... attribute -> \"@\" identifier s \"=\" s variable ...", + "src": "{42 @foo=$bar}", "exp": "42", "expParts": [ { @@ -435,9 +426,9 @@ "exp": "\\" }, { - "description": "... quoted-literal -> \"|\" quoted-char 1*escaped-char \"|\"", - "src": "{|a\\\\\\{\\|\\}|}", - "exp": "a\\{|}" + "description": "... quoted-literal -> \"|\" quoted-char escaped-char \"|\"", + "src": "{|a\\\\|}", + "exp": "a\\" }, { "description": "... unquoted-literal -> number-literal -> %x30", @@ -489,6 +480,114 @@ "src": "{0E-1}", "exp": "0E-1" }, + { + "description": "... 
reserved-statement -> reserved-keyword s reserved-body 1*([s] expression) -> reserved-keyword s reserved-body expression -> \".\" name s reserved-body-part expression -> \".\" name s reserved-char expression ...", + "src": ".n .{a}{{}}", + "exp": "", + "expErrors": [ { "type": "unsupported-statement" } ] + }, + { + "description": "... reserved-statement -> reserved-keyword reserved-body 1*([s] expression) -> reserved-keyword s reserved-body s expression -> \".\" name s reserved-body-part expression -> \".\" name s reserved-char expression ...", + "src": ".n. {a}{{}}", + "exp": "", + "expErrors": [ { "type": "unsupported-statement" } ] + }, + { + "description": "... reserved-statement -> reserved-keyword reserved-body 1*([s] expression) -> reserved-keyword reserved-body expression expression -> \".\" name reserved-body-part expression expression -> \".\" name s reserved-char expression expression ...", + "src": ".n.{a}{b}{{}}", + "exp": "", + "expErrors": [ { "type": "unsupported-statement" } ] + }, + { + "description": "... reserved-annotation -> reserved-annotation-start reserved-body -> \"!\" reserved-body-part -> \"!\" reserved-char ...", + "src": "{!.}", + "exp": "{!}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... reserved-annotation -> reserved-annotation-start s reserved-body -> \"!\" s reserved-body-part -> \"!\" s reserved-char ...", + "src": "{! .}", + "exp": "{!}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... reserved-annotation-start ...", + "src": "{%}", + "exp": "{%}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... reserved-annotation-start ...", + "src": "{*}", + "exp": "{*}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... reserved-annotation-start ...", + "src": "{+}", + "exp": "{+}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... 
reserved-annotation-start ...", + "src": "{<}", + "exp": "{<}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... reserved-annotation-start ...", + "src": "{>}", + "exp": "{>}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... reserved-annotation-start ...", + "src": "{?}", + "exp": "{?}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... reserved-annotation-start ...", + "src": "{~}", + "exp": "{~}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... private-use-annotation -> private-start reserved-body -> \"^\" reserved-body-part -> \"^\" reserved-char ...", + "src": "{^.}", + "exp": "{^}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... private-use-annotation -> private-start s reserved-body -> \"^\" s reserved-body-part -> \"^\" s reserved-char ...", + "src": "{^ .}", + "exp": "{^}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... private-start ...", + "src": "{&}", + "exp": "{&}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... reserved-annotation -> reserved-annotation-start reserved-body -> \"!\" reserved-body-part reserved-body-part -> \"!\" reserved-char escaped-char ...", + "src": "{!.\\{}", + "exp": "{!}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... reserved-annotation -> reserved-annotation-start reserved-body -> \"!\" reserved-body-part s reserved-body-part -> \"!\" reserved-char s escaped-char ...", + "src": "{!. \\{}", + "exp": "{!}", + "expErrors": [{ "type": "unsupported-expression" }] + }, + { + "description": "... 
reserved-annotation -> reserved-annotation-start reserved-body -> \"!\" reserved-body-part -> \"!\" quoted-literal ...", + "src": "{!|a|}", + "exp": "{!}", + "expErrors": [{ "type": "unsupported-expression" }] + }, { "src": "hello { world\t\n}", "exp": "hello world" @@ -695,45 +794,125 @@ ] }, { - "src": "{{trailing whitespace}} \n", - "exp": "trailing whitespace" + "src": "foo {+reserved}", + "exp": "foo {+}", + "expParts": [ + { + "type": "literal", + "value": "foo " + }, + { + "type": "fallback", + "source": "+" + } + ], + "expErrors": [ + { + "type": "unsupported-expression" + } + ] }, { - "description": "NFC: text is not normalized", - "src": "\u1E0A\u0323", - "exp": "\u1E0A\u0323" + "src": "foo {&private}", + "exp": "foo {&}", + "expParts": [ + { + "type": "literal", + "value": "foo " + }, + { + "type": "fallback", + "source": "&" + } + ], + "expErrors": [ + { + "type": "unsupported-expression" + } + ] }, { - "description": "NFC: variables are compared to each other as-if normalized; decl is non-normalized, use is", - "src": ".local $\u0044\u0323\u0307 = {foo} {{{$\u1E0c\u0307}}}", - "exp": "foo" + "src": "foo {?reserved @a @b=$c}", + "exp": "foo {?}", + "expParts": [ + { + "type": "literal", + "value": "foo " + }, + { + "type": "fallback", + "source": "?" 
+ } + ], + "expErrors": [ + { + "type": "unsupported-expression" + } + ] }, { - "description": "NFC: variables are compared to each other as-if normalized; decl is normalized, use isn't", - "src": ".local $\u1E0c\u0307 = {foo} {{{$\u0044\u0323\u0307}}}", - "exp": "foo" + "src": ".foo {42} {{bar}}", + "exp": "bar", + "expParts": [ + { + "type": "literal", + "value": "bar" + } + ], + "expErrors": [ + { + "type": "unsupported-statement" + } + ] }, { - "description": "NFC: variables are compared to each other as-if normalized; decl is normalized, use isn't", - "src": ".input {$\u1E0c\u0307} {{{$\u0044\u0323\u0307}}}", - "params": [{"name": "\u1E0c\u0307", "value": "foo"}], - "exp": "foo" + "src": ".foo{42}{{bar}}", + "exp": "bar", + "expParts": [ + { + "type": "literal", + "value": "bar" + } + ], + "expErrors": [ + { + "type": "unsupported-statement" + } + ] }, { - "description": "NFC: variables are compared to each other as-if normalized; decl is non-normalized, use is", - "src": ".input {$\u0044\u0323\u0307} {{{$\u1E0c\u0307}}}", - "params": [{"name": "\u0044\u0323\u0307", "value": "foo"}], - "exp": "foo" + "src": ".foo |}lit{| {42}{{bar}}", + "exp": "bar", + "expParts": [ + { + "type": "literal", + "value": "bar" + } + ], + "expErrors": [ + { + "type": "unsupported-statement" + } + ] }, { - "description": "NFC: variables are compared to each other as-if normalized; decl is non-normalized, use is; reordering", - "src": ".local $\u0044\u0307\u0323 = {foo} {{{$\u1E0c\u0307}}}", - "exp": "foo" + "src": ".l $y = {|bar|} {{}}", + "exp": "", + "expParts": [ + { + "type": "literal", + "value": "bar" + } + ], + "expErrors": [ + { + "type": "unsupported-statement" + } + ] }, { - "description": "NFC: variables are compared to each other as-if normalized; decl is non-normalized, use is; special case mapping", - "src": ".local $\u0041\u030A\u0301 = {foo} {{{$\u01FA}}}", - "exp": "foo" + "src": "{{trailing whitespace}} \n", + "exp": "trailing whitespace" } ] } diff --git 
a/test/tests/u-options.json b/test/tests/u-options.json deleted file mode 100644 index 3e13b30a2..000000000 --- a/test/tests/u-options.json +++ /dev/null @@ -1,126 +0,0 @@ -{ - "$schema": "https://raw.githubusercontent.com/unicode-org/message-format-wg/main/test/schemas/v0/tests.schema.json", - "scenario": "u: Options", - "description": "Common options affecting the function context", - "defaultTestProperties": { - "locale": "en-US" - }, - "tests": [ - { - "src": "{#tag u:id=x}content{/ns:tag u:id=x}", - "exp": "content", - "expParts": [ - { - "type": "markup", - "kind": "open", - "id": "x", - "name": "tag" - }, - { - "type": "literal", - "value": "content" - }, - { - "type": "markup", - "kind": "close", - "id": "x", - "name": "tag" - } - ] - }, - { - "src": "{#tag u:dir=rtl u:locale=ar}content{/ns:tag}", - "exp": "content", - "expErrors": [{ "type": "bad-option" }, { "type": "bad-option" }], - "expParts": [ - { - "type": "markup", - "kind": "open", - "name": "tag" - }, - { - "type": "literal", - "value": "content" - }, - { - "type": "markup", - "kind": "close", - "name": "tag" - } - ] - }, - { - "src": "hello {4.2 :number u:locale=fr}", - "exp": "hello 4,2" - }, - { - "src": "hello {world :string u:dir=ltr u:id=foo}", - "exp": "hello world", - "expParts": [ - { - "type": "literal", - "value": "hello " - }, - { - "type": "string", - "source": "|world|", - "dir": "ltr", - "id": "foo", - "value": "world" - } - ] - }, - { - "src": "hello {world :string u:dir=rtl}", - "exp": "hello \u2067world\u2069", - "expParts": [ - { - "type": "literal", - "value": "hello " - }, - { - "type": "string", - "source": "|world|", - "dir": "rtl", - "value": "world" - } - ] - }, - { - "src": "hello {world :string u:dir=auto}", - "exp": "hello \u2068world\u2069", - "expParts": [ - { - "type": "literal", - "value": "hello " - }, - { - "type": "string", - "source": "|world|", - "dir": "auto", - "value": "world" - } - ] - }, - { - "locale": "ar", - "src": "أهلاً {بالعالم :string u:dir=rtl}", 
- "exp": "أهلاً \u2067بالعالم\u2069" - }, - { - "locale": "ar", - "src": "أهلاً {بالعالم :string u:dir=auto}", - "exp": "أهلاً \u2068بالعالم\u2069" - }, - { - "locale": "ar", - "src": "أهلاً {world :string u:dir=ltr}", - "exp": "أهلاً \u2066world\u2069" - }, - { - "locale": "ar", - "src": "أهلاً {بالعالم :string}", - "exp": "أهلاً \u2067بالعالم\u2069" - } - ] -} diff --git a/test/tests/unsupported-expressions.json b/test/tests/unsupported-expressions.json new file mode 100644 index 000000000..f7d611509 --- /dev/null +++ b/test/tests/unsupported-expressions.json @@ -0,0 +1,53 @@ +{ + "scenario": "Reserved and private annotations", + "description": "Tests for unsupported expressions (reserved/private)", + "defaultTestProperties": { + "locale": "en-US", + "expErrors": [ + { + "type": "unsupported-expression" + } + ] + }, + "tests": [ + { "src": "hello {|4.2| %number}" }, + { "src": "hello {|4.2| %n|um|ber}" }, + { "src": "{+42}" }, + { "src": "hello {|4.2| &num|be|r}" }, + { "src": "hello {|4.2| ^num|be|r}" }, + { "src": "hello {|4.2| +num|be|r}" }, + { "src": "hello {|4.2| ?num|be||r|s}" }, + { "src": "hello {|foo| !number}" }, + { "src": "hello {|foo| *number}" }, + { "src": "hello {?number}" }, + { "src": "{xyzz }" }, + { "src": "hello {$foo ~xyzz }" }, + { "src": "hello {$x xyzz }" }, + { "src": "{ !xyzz }" }, + { "src": "{~xyzz }" }, + { "src": "{ num x \\\\ abcde |aaa||3.14||42| r }" }, + { "src": "hello {$foo >num x \\\\ abcde |aaa||3.14| |42| r }" }, + { "src" : ".input{ $n ~ }{{{$n}}}" } + ] +} + diff --git a/test/tests/unsupported-statements.json b/test/tests/unsupported-statements.json new file mode 100644 index 000000000..d944aa0f7 --- /dev/null +++ b/test/tests/unsupported-statements.json @@ -0,0 +1,18 @@ +{ + "scenario": "Reserved statements", + "description": "Tests for unsupported statements", + "defaultTestProperties": { + "locale": "en-US", + "expErrors": [ + { + "type": "unsupported-statement" + } + ] + }, + "tests": [ + { "src" : ".i {1} {{}}" }, + 
{ "src" : ".l $y = {|bar|} {{}}" },
+ { "src" : ".l $x.y = {|bar|} {{}}" }
+ ]
+}
+
From 68641af7a434ae6611d6ab9c42579e27604a7e40 Mon Sep 17 00:00:00 2001
From: Addison Phillips
Date: Sat, 26 Oct 2024 09:35:09 -0700
Subject: [PATCH 9/9] Add serialization proposal
---
 exploration/number-selection.md | 81 +++++++++++++++++++++++++++++++--
 1 file changed, 78 insertions(+), 3 deletions(-)
diff --git a/exploration/number-selection.md b/exploration/number-selection.md
index d60909632..a950789d0 100644
--- a/exploration/number-selection.md
+++ b/exploration/number-selection.md
@@ -548,9 +548,84 @@ and they _might_ converge on some overlap that users could safely use across pla
 ### Standardize the Serialization Forms
-Using the design above, remove the integer-only and no-sig-digits restrictions from LDML45
-and specify numeric matching by specifying the form of matching `key` values.
-Comparison is as-if by string comparison of the serialized forms, just as in LDML45.
+Modify the above exact match as follows.
+Note that this implementation is less restrictive than before, but still leaves some
+values that cannot be matched.
+> [!IMPORTANT]
+> The exact behavior of exact literal match is only defined for
+> a specific range of numeric values and does not support scientific notation.
+> Very large or very small numeric values will be difficult to perform
+> exact matching on.
+> Avoid depending on these types of keys in message selection.
+> [!IMPORTANT]
+> For implementations that do not have arbitrary precision numeric types
+> or operands that do not use these types,
+> it is possible to specify a key value that exceeds the precision
+> of the underlying type.
+> Such a key value will not work reliably or may not work at all
+> in such implementations.
+> Avoid depending on such key values in message selection.
+Number literals in the MessageFormat 2 syntax use a subset of the
+[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6).
+The resolved value of an `operand` exactly matches a numeric literal `key`
+if, when the `operand` is serialized using this format,
+the two strings are equal.
+```abnf
+number = [ "-" ] int [ fraction ]
+integer = "0" / [ "-" ] (digit19 *DIGIT)
+int = "0" / (digit19 *DIGIT)
+digit19 = %x31-39 ; 1-9
+fraction = "." 1*DIGIT
+```
+If the function `:integer` is used or the `maximumFractionDigits` is 0,
+the production `integer` is used and any fractional amount is omitted,
+otherwise the `minimumFractionDigits` number of digits is produced,
+zero-filled as needed.
+The implementation applies the `maximumSignificantDigits` to the value
+being serialized.
+This might involve locally-specific rounding.
+The `minimumSignificantDigits` has no effect on the value produced for comparison.
+The option `signDisplay` has no effect on the value produced for comparison.
+> [!NOTE]
+> Implementations are not expected to implement this exactly as written,
+> as there are clearly optimizations that can be applied.
+> Here are some examples:
+> ```
+> .input {$num :integer}
+> .match $num
+> 0 {{The number 0}}
+> 1 {{The number 1}}
+> -1 {{The number -1}}
+> 1.0 {{This cannot match}}
+> 1.1 {{This cannot match}}
+> ```
+> ```
+> .input {$num :number maximumFractionDigits=2 minimumFractionDigits=2}
+> .match $num
+> 0 {{This does not match}}
+> 0.00 {{This matches the value 0}}
+> 0.0 {{This does not match}}
+> 0.000 {{This does not match}}
+> ```
+> ```
+> .input {$num :number minimumFractionDigits=2 maximumFractionDigits=5}
+> .match $num
+> 0.12 {{Matches the value 0.12}}
+> 0.123 {{Matches the value 0.123}}
+> 0.12345 {{Matches the value 0.12345}}
+> 0.123456 {{Does not match}}
+> 0.12346 {{May match the value 0.123456 depending on local rounding mode?}}
+> ```
+> ```
+> .input {$num :number}
+> -0 {{Error: Bad Variant Key}}
+> -99 {{The value -99}}
+> 1111111111111111111111111111 {{Might exceed the size of local integer type, but is valid}}
+> 11111111111111.1111111111111 {{Might exceed local floating point precision, but is valid}}
+> 1.23e-37 {{Error: Bad Variant Key}}
+> ```
+
+
 ### Compare numeric values
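For reference when weighing this alternative, the string-based matching described under "Standardize the Serialization Forms" can be sketched roughly as follows. This is a minimal Python sketch, not the specified algorithm: the names `serialize_for_match` and `key_matches`, the parameter names, the default fraction-digit limits, and the half-even rounding default are all assumptions made for illustration. Only the overall shape (apply significant digits, then produce plain decimal digits with a minimum number of fraction digits, then compare strings) is taken from the text.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN, localcontext


def serialize_for_match(value, *, integer=False, min_fraction=0,
                        max_fraction=3, max_significant=None,
                        rounding=ROUND_HALF_EVEN):
    """Serialize a number to the plain-decimal form compared against keys.

    Hypothetical helper: the option names and defaults are assumptions.
    """
    with localcontext() as ctx:
        ctx.prec = 100  # headroom so quantize() accepts long digit strings
        d = Decimal(str(value))
        if max_significant is not None:
            # maximumSignificantDigits is applied to the value first.
            d = Context(prec=max_significant, rounding=rounding).create_decimal(d)
        # :integer or maximumFractionDigits=0 selects the `integer` production.
        fraction_digits = 0 if (integer or max_fraction == 0) else max_fraction
        d = d.quantize(Decimal(1).scaleb(-fraction_digits), rounding=rounding)
        if d.is_zero():
            d = d.copy_abs()  # a negative zero is serialized without the sign
        text = format(d, "f")  # plain digits, never scientific notation
    if "." in text:
        whole, frac = text.split(".")
        # Keep at least min_fraction digits, zero-filled as needed.
        frac = frac.rstrip("0").ljust(min_fraction, "0")
        text = f"{whole}.{frac}" if frac else whole
    return text


def key_matches(key, value, **options):
    """Exact match is plain string equality on the serialized form."""
    return key == serialize_for_match(value, **options)


print(serialize_for_match(0, min_fraction=2, max_fraction=2))   # -> 0.00
print(key_matches("1.0", 1, integer=True))                      # -> False
```

Because the comparison is on strings, `1` and `1.0` are distinct keys in this sketch, and whether `0.12346` matches the value 0.123456 depends entirely on the `rounding` argument.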