Requirements - MF wishlist #3

romulocintra · 2019-11-27T18:56:05Z

List of requirements to consider for MF

romulocintra · 2020-01-06T19:15:37Z

I'm listing the requirements from the first meeting's slides:

List of possible requirements

Easier to use ICU “Select”
Fluent could be considered as a starting point for the future of message format
Have pluggable “formatters”(Date/Time/Number ...)
HTML Markup
Cross-platform / Universal Format
Messages should have more context “description” or ”metadata”
MessageFormat - More Readable
Escaping(“ or ‘ ) and Interpolations (html tags)
Rule Modifiers - Send Message or Send SMS -> similar to select ICU feature
Improve Translators / Developers UX/DX
I need to somehow be able to cache my translations
Use Yaml or JSON as file format
Message reference - from another Message

zbraniecki · 2020-01-06T19:29:58Z

Proposal for an additional requirement:

Provides a translation of an XML/HTML element.

jamuhl · 2020-01-06T19:30:17Z

Sorry, I wasn't at the first meetings, so I'm not sure what is meant by "HTML Markup".

But:

fully agree on custom pluggable "formatters"

And add:

extended plurals, like:

{ count , plural ,
   =0 {No candy left}
  one {Got # candy left}
  <10 {Got a few candies left}
  10-20 {Got a handful candies left}
other {Got # candies left} }

edit:
in i18next we use a postProcessing plugin to achieve that: https://github.com/i18next/i18next-intervalPlural-postProcessor#usage-sample
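
To make the proposed semantics concrete, here is a minimal JavaScript sketch of how the variant keys in the example above could be resolved. The `<10` and `10-20` keys are not part of ICU MessageFormat, `selectVariant` and `formatCandy` are hypothetical names, and the evaluation order (exact match, then plural category, then ranges) is an assumption rather than anything a library defines.

```javascript
// Variants copied from the example message above.
const variants = {
  "=0": "No candy left",
  "one": "Got # candy left",
  "<10": "Got a few candies left",
  "10-20": "Got a handful candies left",
  "other": "Got # candies left",
};

// Assumed resolution order: exact match, plural category, ranges, fallback.
function selectVariant(count, locale = "en") {
  if (count === 0) return "=0";
  if (new Intl.PluralRules(locale).select(count) === "one") return "one";
  if (count < 10) return "<10";
  if (count <= 20) return "10-20";
  return "other";
}

function formatCandy(count, locale = "en") {
  return variants[selectVariant(count, locale)].replace("#", String(count));
}
```

Note that the range keys only make sense for languages whose wording happens to line up with those numeric boundaries, which is part of why the idea is contested.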

zbraniecki · 2020-01-06T19:33:20Z

HTML Markup

Ability to interpolate localization with HTML. Example:

<span>You have <b>6</b> unread messages from <img/> Mary.</span>

Fluent provides DOM Overlays which are heavily used in Firefox l10n - https://github.com/projectfluent/fluent.js/wiki/DOM-Overlays

jamuhl · 2020-01-06T19:38:44Z

@zbraniecki thank you for explaining...so basically take the innerhtml element(s) and extend it with the attributes and content contained in the translation...looks similar to the Trans component we have in react-i18next -> https://react.i18next.com/latest/trans-component (just we have no html elements but react components)

edit:
guess we could mimic DOM-Overlays by extending our Trans component...just not sure if this is part of the syntax or an extension that is provided by the i18n library?

romulocintra · 2020-01-06T19:41:19Z

@mihnita should I reference your entire document here, or can we break it into features to add here?

zbraniecki · 2020-01-06T19:42:56Z

In our experience innerHTML in particular is a no-go for security reasons (l10n resources are treated as a third-party). I expect the requirements from the W3C to be similar here.

Instead, we whitelist allowed textual elements (<sup/>, <sub/>, <span/> etc.) and for everything else we require the developer to provide the elements in the source with a name, and then the localizer can position them using the same name:

<p data-l10n-id="key1">
  <a href="https://www.mozilla.org" data-l10n-name="link"/>
  <img src="./pics/img1.png" data-l10n-name="logo"/>
</p>

key1 =
    Welcome to <a data-l10n-name="link">Mozilla</a>!
    Please, click on <img data-l10n-name="logo"/> to proceed.

That's significantly more involved than innerHTML, but the end result is quite similar with a lot of linting, security, and sanity checks.
We're also discussing further extensions - https://github.com/zbraniecki/fluent-domoverlays-js/wiki/New-Features-(rev-3)
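
The allowlist-plus-named-elements idea can be sketched in a few lines of JavaScript. This is a toy, string-based illustration only: the real DOM Overlays work on DOM nodes, not regular expressions, and `applyOverlay`, `ALLOWED_TEXT_ELEMENTS`, and the shape of `sourceElements` are hypothetical. It assumes double-quoted attributes, no nesting of the elements it keeps, and text that is already HTML-escaped.

```javascript
const ALLOWED_TEXT_ELEMENTS = new Set(["sup", "sub", "span", "b", "i", "em", "strong"]);
const ELEMENT = /<(\w+)((?:\s+[\w-]+="[^"]*")*)\s*(?:\/>|>(.*?)<\/\1>)/g;
const stripTags = (s) => s.replace(/<[^>]*>/g, "");

// sourceElements: Map of data-l10n-name -> { tag, attrs } declared by the developer.
function applyOverlay(translation, sourceElements) {
  const kept = [];
  // Park accepted markup behind a placeholder so the final strip cannot touch it.
  const keep = (html) => `\u0000${kept.push(html) - 1}\u0000`;

  const pass = translation.replace(/\u0000/g, "").replace(ELEMENT, (match, tag, attrText, inner = "") => {
    const text = stripTags(inner);
    const name = /data-l10n-name="([^"]*)"/.exec(attrText);
    if (name) {
      const source = sourceElements.get(name[1]);
      if (!source || source.tag !== tag) return text; // unknown name: keep text only
      // Attributes come from the developer's source, never from the translation.
      const attrs = Object.entries(source.attrs).map(([k, v]) => ` ${k}="${v}"`).join("");
      return keep(text === "" ? `<${tag}${attrs}/>` : `<${tag}${attrs}>${text}</${tag}>`);
    }
    if (ALLOWED_TEXT_ELEMENTS.has(tag) && attrText === "") return keep(`<${tag}>${text}</${tag}>`);
    return text; // anything else is reduced to its text content
  });

  // Drop any leftover markup, then restore the accepted elements.
  return stripTags(pass).replace(/\u0000(\d+)\u0000/g, (_, i) => kept[Number(i)]);
}
```

The design point is that the localizer can position and label elements but cannot introduce attributes or element types of their own.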

jamuhl · 2020-01-06T19:46:14Z

innerHTML was more referring to the content than to the implementation detail...same reason we do not just append translations into a react element by using dangerouslySetInnerHTML ;)

mihnita · 2020-01-07T15:00:36Z

I will break it into features. But maybe also link it, so that others can read the complete doc.

I think that the current list of features will also need to "grow" with some more details. As it is, some of them are so short that only the one who proposed it really understands what it means :-)

Mihai

romulocintra · 2020-01-07T17:01:00Z

@mihnita

My proposal:

If you can break it into features, that would be perfect (I agree that the link is important too).
Some of the features won't fit in a one-line description and will need more detail. Those, IMHO, deserve their own issue or thread. I suggest we create a new issue tagged "requirements" for each one, holding all the detail and discussion, while keeping a reference with a short description here so the list stays in one place.

I feel that the ones with a short description will also grow to have their own issue/task, but I think we can figure that out later, after we groom and filter the lists of requirements.

longlho · 2020-01-07T17:13:41Z

My proposal for the process @romulocintra is to set a deadline, then de-dupe the list, then prioritize into mvp, v1, v2... so we can move this along.

romulocintra · 2020-01-07T18:27:20Z

My proposal for the process @romulocintra is to set a deadline, then de-dupe the list, then prioritize into mvp, v1, v2... so we can move this along.

@longlho I believe this (process, MVP, roadmap, goals) must be addressed in #4, where we can define everything related to organization and process as a team.

Regarding this task and how we organize the list, I think the previous proposal fits our current needs. I did not propose any deadline for this task, but I see the next meeting as a good candidate to prioritize/filter/de-dupe the items that originated in this thread. Finally, we can review #4 to close all the organizational issues, deadlines and goals.

Meanwhile, I'm referencing your comments in #4

PS: I just added these topics to the next meeting agenda

MickMonaghan · 2020-01-12T21:26:52Z

Right now, in ICU4J, if you do:
"You owe {someNumber, number, currency}." - then the actual currency is inferred from the current locale - which is just nasty.

You can do this:
"You owe {someNumber, number, :: currency/JPY}." - but this means that you know in advance that you're dealing with a specific currency - JPY - in this case.
One should be able to declare the actual currency at run time.
Perhaps Fluent already supports this?
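
For comparison, this is roughly what a run-time currency looks like when the formatting is done in code with ECMA-402's `Intl.NumberFormat`. It is a minimal sketch: `formatOwed` is a hypothetical helper, and the message text is hard-coded rather than loaded from a MessageFormat pattern.

```javascript
// The currency code is an argument, not part of the message pattern or the locale.
function formatOwed(amount, currency, locale) {
  const money = new Intl.NumberFormat(locale, {
    style: "currency",
    currency, // ISO 4217 code supplied by the caller at run time
  }).format(amount);
  return `You owe ${money}.`;
}

// Same locale, two different currencies decided by the caller.
const inYen = formatOwed(1200, "JPY", "en-US");
const inDollars = formatOwed(1200, "USD", "en-US");
```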

nbouvrette · 2020-01-12T21:42:08Z

Sorry for joining the conversation late and having to leave the last session early but here is my take:

Make the syntax cross-language/cross-platform. Maybe having an RFC and/or improved (non-technical) documentation of the syntax would help?
See if we can make the syntax easier to read (not just for developers, but presuming "raw" syntax could also be translatable by linguists)
Provide free tools with the syntax for authoring and translation (our own online CAT tool?)
Extend selectors (I like @jamuhl's example and will have others to present in the next session)
File format-agnostic - not every TMS does a good job supporting file formats. If the syntax is independent, it is more flexible to adopt
Leave the syntax markup (e.g. HTML) agnostic - the syntax should be able to accept HTML or any other markup, but the TMS and/or library can implement manipulation however it finds best for its use case
Find better ways to escape the syntax (' is way too common and the current escape patterns could be possibly standardized/simplified)
Add more features:
- Predefined Linguistic selectors (will be presenting this idea in the next meeting)
- Improved list support
- Better currency support
- More flexible formats (extendable inline?)
- Numbers to "written numbers" converter?
- Inflections (genders, articles, declensions, etc.)

MickMonaghan · 2020-01-13T16:51:14Z

Can we keep the language used to retrieve the UI strings separate from the language/locale used to format variables/placeholders within a string?
This would be consistent with how some OSs and some string formatting libs already separate UI language from locale formats.

zbraniecki · 2020-01-13T18:01:58Z

Perhaps Fluent already supports this?

Fluent does support it, it's called "partially formatted variables" and currency was the particular example that drove that feature.

The way it works in Fluent is this:

ctx.format('product-cost', {
  amount: FluentNumber(342, {
    currency: "JPY",
  })
});

// Translation can just use "default" formatting options
product-cost = This product costs { $amount }

// Or a translation can specify its own list of options (based on ECMA402 NumberFormat)

product-cost = This product costs { NUMBER($amount, minimumFractionDigits: 3) }

An important bit is that the selector (NUMBER) limits which options can be provided by the translator - in case of number, currency is not available for the localizer to specify.
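
The split of responsibilities described above can be sketched outside Fluent with plain `Intl.NumberFormat`. The names `partialNumber`, `formatNumberArg`, and `TRANSLATOR_OPTIONS` are hypothetical and this is not Fluent's implementation; the assumption is simply that developer options act as defaults and the translator may override only an allowlisted subset.

```javascript
// Options a translator may set from inside the message (hypothetical allowlist).
const TRANSLATOR_OPTIONS = new Set([
  "minimumFractionDigits",
  "maximumFractionDigits",
  "useGrouping",
]);

// Developer side: wrap the value together with the options only code may decide.
function partialNumber(value, options) {
  return { value, options };
}

// Formatter side: merge in the translator's options, ignoring disallowed ones.
function formatNumberArg(locale, arg, translatorOptions = {}) {
  const options = { ...arg.options };
  for (const [key, value] of Object.entries(translatorOptions)) {
    if (TRANSLATOR_OPTIONS.has(key)) options[key] = value;
  }
  return new Intl.NumberFormat(locale, options).format(arg.value);
}

const amount = partialNumber(342, { style: "currency", currency: "JPY" });
// The translator asks for three fraction digits and (invalidly) another currency.
const cost = formatNumberArg("en-US", amount, { minimumFractionDigits: 3, currency: "USD" });
```

In this sketch the attempt to change the currency is silently dropped; a real implementation might prefer to report it as an error to the localizer.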

zbraniecki · 2020-01-13T18:09:12Z

Provide free tools with the syntax for authoring and translation (our own online CAT tool?)

Fluent comes with a CAT tool - https://github.com/mozilla/pontoon / https://pontoon.mozilla.org/
A lot of effort in Pontoon at the moment goes into better WYSIWYG for Fluent selectors.

Leave the syntax markup (e.g. HTML) agnostic - the syntax should be able to accept HTML or any other markup, but the TMS and/or library can implement manipulation however it finds best for its use case

I'm not sure if I agree. Features like compound messages are important only when you look at the problem in the context of UI widgets. The drive to be agnostic may lead to a syntax that is not really optimized for anything.
While I agree that we should ensure the syntax and data model are useful for a wide range of software use cases (and not, say, just for Web/React), having some "P1" targets would help us deliver something actually useful, imho.
In particular, from my angle, understanding that Software UI is not created by a bunch of imperative calls from JS/C/Java, but is usually defined in some declarative markup is fundamental to how you design features.
If we reject this hypothesis, it will have deep implications on what we end up with.

grhoten · 2020-01-13T19:54:05Z

I previously gave a presentation called Let's Come To An Agreement About Our Words. The presentation covers an older format that we used in Siri, and we're migrating to a newer simplified format. Here are some highlights on what it can do or found was desirable.

It's generally an XML format. The original would use something like ECMAScript/Java beans/UEL for referencing variables and its properties. The UEL syntax was too complicated and was changed to favor more XML with a nicer editor, much like your favorite word processor stores its data in XML without the end user really knowing that low level detail. It's also easier to interchange it with XLIFF when it's XML.
Support for SSML is very desirable for screen readers or virtual assistants.
The messages are by default both printable and speakable, but you can exclusively print or speak a phrase. If you ever need to explicitly speak a number within a given context, this is critical.
Word inflection and grammeme detection (values of grammatical categories) are fundamental parts of the syntax. It's critical functionality with user provided vocabulary. Generally, you need to know the grammatical number, grammatical case, the grammatical gender of the words and the pronunciation of the word (generally just if the word starts or ends with a vowel).
Word inflection can include adding prepositions, articles, pronouns or grammatical states of a given word. For complicated examples, check out Russian, Korean or Arabic.
Number pronunciation is provided by CLDR's RBNF.
Getting a number and noun into grammatical agreement is critical. The grammatical gender of the number comes from the noun. The grammatical number of the noun is generally affected by the value of the number (e.g. 1 or 2). The grammatical case is defined by the translator given the context of the sentence. The translator does not provide the exact inflections by default.
List handling involves inflecting each word. This might mean making each item the definite form.
The "and" (AKA conjunction) list, and the "or" (AKA disjunction) list are able to handle the context correctly for Italian, Spanish and Korean.
There is also the adjective list, which is probably the hardest to get correct for English. For Chinese and Korean, it's a lot easier.
There is a calendar concept based mostly on CLDR's translations. Some functionality is provided to add prepositions or postpositions as needed. The grammatical case can be modified as needed. CLDR doesn't handle grammatical case modification that well by default.
There is a measurement concept that is separate from CLDR's implementation to provide precise translations of units of measure, like kilometers and miles. CLDR is more focused on the printable form instead of the speakable form, which is why CLDR is generally ignored when the speakable form is also needed.
It has a highly customized currency concept. CLDR only partially covers support for this functionality. Pronunciation of a currency for its units and subunits in native and foreign contexts is important.

This functionality works or is shipped on Linux, macOS, iOS, tvOS and watchOS. The watchOS support is probably the important thing to highlight because it is the most resource restrictive environment to support. I'm just stating that this functionality can live in resource constrained environments where grammatical correctness of a message is important.

zbraniecki · 2020-01-13T20:08:38Z

Can we keep the language used to retrieve the UI strings separate from the language/locale used to format variables/placeholders within a string?
This would be consistent with how some OSs and some string formatting libs already separate UI language from locale formats.

While we have definitely heard from a very vocal community of Firefox users who want a translation language that differs from their locale formats, this has also been a trap for regular users because date/time formats often contain translations.

For example, the Chinese 2020年1月13日星期一下午12:03:10 or 星期一下午12時 (for { weekday: "long", hour: "numeric" }) would be very confusing if placed in a sentence in a different locale.

There are even extreme cases. If the user had a German translation with a date formatted in en-US, there's a chance of flipping the MM/DD and DD/MM order. If the sentence is in German, the user has the right to interpret "05/08" using the German "DD/MM" pattern, and be very surprised if they later learn that it was actually the en-US "MM/DD" pattern taken from their OS locale formatting preferences.

My initial position is that we generally should, by default, format placeables (numbers, dates etc.) using the same locale as the translation is in, and allow the developer to provide an alternative language negotiation for formatters in order to handle exceptions like the ones you mentioned.
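
A minimal sketch of that default, using `Intl.DateTimeFormat`: the formatter locale follows the message locale unless the developer explicitly overrides it. `formatDay` is a hypothetical helper, and the date is pinned to UTC so the example does not depend on the machine's time zone.

```javascript
// By default the formatting locale is the message locale; the third
// argument is the explicit, developer-provided exception.
function formatDay(date, messageLocale, formatLocale = messageLocale) {
  return new Intl.DateTimeFormat(formatLocale, {
    month: "2-digit",
    day: "2-digit",
    timeZone: "UTC",
  }).format(date);
}

const may8 = new Date(Date.UTC(2020, 4, 8)); // 8 May 2020

const consistent = formatDay(may8, "de");      // day first, as a German reader expects
const mixed = formatDay(may8, "de", "en-US");  // month first inside a German sentence
```

The second call is the ambiguous case described above: the same two digit groups appear in the opposite order, and nothing in the German sentence tells the reader which convention applies.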

This is also important once we start talking about the error handling UX. Fluent has been designed to fall back using a locale chain, so if there's an error or a missing string in the primary language, we'll fall back to the second-best choice rather than display an error and break the app.
It's an important resilience measure for us.
What's interesting is that this means the locale chain used for formatters is per-bundle: in the locale context ["fr-CA", "fr", "en"] we first try to localize a message in fr-CA using fr-CA formatters, but if there are errors and we end up localizing the message using en resources, we'll format the dates/times using the en locale.
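
The per-bundle behaviour can be illustrated with a small sketch. This is not Fluent's API: the `bundles` array, the function-valued messages, and `formatMessage` are hypothetical stand-ins, chosen only to show that the formatter locale is taken from whichever bundle ends up supplying the message.

```javascript
// Each bundle pairs a locale with its messages (hypothetical shape).
const bundles = [
  { locale: "fr-CA", messages: {} },
  { locale: "fr", messages: { "last-login": (d) => `Dernière connexion : ${d}` } },
  { locale: "en", messages: { "last-login": (d) => `Last login: ${d}`, "expires": (d) => `Expires on ${d}` } },
];

function formatMessage(chain, id, date) {
  for (const bundle of chain) {
    const message = bundle.messages[id];
    if (!message) continue; // missing here: fall back to the next bundle
    // The formatter uses the locale of the bundle that supplied the message.
    const formatter = new Intl.DateTimeFormat(bundle.locale, {
      year: "numeric",
      month: "long",
      day: "numeric",
      timeZone: "UTC",
    });
    return message(formatter.format(date));
  }
  return id; // nothing matched: show the id rather than break the app
}

const when = new Date(Date.UTC(2020, 0, 13));
const fromFrench = formatMessage(bundles, "last-login", when);
const fromEnglish = formatMessage(bundles, "expires", when);
```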

zbraniecki · 2020-01-13T20:16:52Z

@grhoten - this is awesome! Thank you for sharing!

We have some experience with TTS in the form of the Common Voice project, which uses Fluent.

While I don't see it in the translation resources they use now, I remember that in some variant of the project they used fluent's compound messages to represent the spoken/written difference:

time-is =
    .written = { $time }
    .spoken = The time is { $time }

It was an unexpected use of the compound messages, but brought up the idea that having message variants that are recognized as a single unit (with comments, invalidation rules, fallbacking together etc.) is important.

mihnita · 2020-01-13T22:21:10Z

Most OSes allow for a separation between the formatting locale and the resource locale, but it is not always explicit.

It is a really useful thing for regional variants. Most applications are localized into Spanish, French, Arabic, etc. Rarely is there a "flavor" like Spanish-Latin America. But there are tens of countries using each of these languages, and they use different date / time / number formats.

So for the user it is best if one can use the French-Swiss locale (for example), and that will format things for fr-CH, but load the fr resources, with fallback. If the fallback is granular enough (for instance, on Android and Java it is string level), then one can have (for example) everything translated into French, and a document (or string) for fr-CH to cover country-specific stuff (think legal, or special functionality).

Not all systems have a way to tell that the strings really come from "fr". The "application locale" is fr-CH, and that is used for everything. So you never get weird mixtures like French strings + German dates.

But I think that we should do better than to format using the same locale as the translation. Not the same locale, but not 100% independent either. I can explain how that works in Android, for example.

Cheers,
Mihai
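
A rough sketch of the string-level fallback described above, in JavaScript rather than Android or Java. `fallbackChain`, `lookupString`, and the `resources` table are hypothetical, and truncating the tag at hyphens is a simplification of real locale matching.

```javascript
// "fr-CH" -> ["fr-CH", "fr"]: progressively drop the last subtag.
function fallbackChain(locale) {
  const parts = locale.split("-");
  const chain = [];
  for (let n = parts.length; n > 0; n--) chain.push(parts.slice(0, n).join("-"));
  return chain;
}

// String-level fallback: each key is resolved independently.
function lookupString(resources, locale, key) {
  for (const candidate of fallbackChain(locale)) {
    const table = resources[candidate];
    if (table && Object.prototype.hasOwnProperty.call(table, key)) {
      return { text: table[key], from: candidate };
    }
  }
  return undefined;
}

const resources = {
  "fr": { greeting: "Bonjour", legal: "Mentions légales" },
  "fr-CH": { legal: "Mentions légales (Suisse)" }, // only the country-specific string
};

const appLocale = "fr-CH";
const greeting = lookupString(resources, appLocale, "greeting"); // comes from "fr"
const legal = lookupString(resources, appLocale, "legal");       // comes from "fr-CH"
// Formatting still uses the application locale, not the locale the string came from.
const price = new Intl.NumberFormat(appLocale).format(1234.5);
```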

mihnita · 2020-01-13T22:31:31Z

About extended plurals, like:

{ count , plural ,
   =0 {No candy left}
  one {Got # candy left}
  <10 {Got a few candies left}
  10-20 {Got a handful candies left}
other {Got # candies left} }

And it was a huge problem for proper localization.
It was banned in most places I've been.

grhoten · 2020-01-13T22:58:20Z

"You owe {someNumber, number, currency}." - then the actual currency is inferred from the current locale - which is just nasty.

@MickMonaghan I agree. Actually, currency formatting that I've been involved with disallows this scenario. Currency formatting is a measured unit and not a number. The unit has to be explicitly defined outside of the current message.

zbraniecki · 2020-01-13T23:07:02Z

I am quite reluctant about it.

I agree with @mihnita. Such translations are rejected by the Mozilla L10n Drivers and the logic we use is that this is not a plural-based variant of the same string, but a set of separate strings, and which one to use should depend on some other selector than a localizer trying to build a selection like in the example.
We documented that recommendation in https://github.com/projectfluent/fluent/wiki/Good-Practices-for-Developers#prefer-separate-messages-over-variants-for-ui-logic

nbouvrette · 2020-01-25T20:26:26Z

@mihnita

if the XLIFF 1.2 (final spec in Feb 2008) is not properly supported, how long will it take for CAT tools to support it?

It's been 12 years already... I think it's safe to say it will never be fully supported? :) And it's not just CAT tools, it's also TMSes. There are dozens of both these products on the market and some of the top players are not known to move very quickly.

And this is also an argument for developing the format while considering at all times how that will interact with existing CAT tools. That includes not only how things are presented to the translators, but how leveraging works (or not).

Getting broad support for a new format can be quite challenging (XLIFF is a good example). I still believe it would be a lot simpler if we can find a way to remain format-agnostic.

Another advantage of staying format-agnostic is that most TMSes support multi-level filters when parsing strings, which means you could have an HTML document with MessageFormat strings inside, and both could be parsed and presented correctly to linguists. This could also work the other way around.

Most CAT / TMs assume a 1:1 model "you give me a source message, I give you back a translated message". When the input is 2 messages (singular / plural) and the output is 4 messages (for example because Russian has 4 plural forms), then we run into problems.

Exactly, this is the biggest challenge - most linguistic tools expect symmetric keys in both the input and output and one input can have multiple outputs in multiple languages that have different rules. This is also why MessageFormat works well, regardless of the file format.
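
The asymmetry is easy to see from CLDR's plural categories, which ECMA-402 exposes through `Intl.PluralRules`. A minimal sketch (`pluralCategories` is a hypothetical helper name):

```javascript
// Cardinal plural categories a locale distinguishes, per the runtime's CLDR data.
function pluralCategories(locale) {
  return new Intl.PluralRules(locale).resolvedOptions().pluralCategories;
}

const english = pluralCategories("en"); // two categories: one, other
const russian = pluralCategories("ru"); // four categories: one, few, many, other
```

A tool that assumes one translated string per source string has nowhere to put the extra Russian variants, which is the 1:1 problem described above.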

Do you think this would be easier in a document? (like Google docs?) It would keep the different "threads" together.

I tried Google docs to have conversations in the past and so far Git seems better - I would still love to propose having our own Slack at some point if we start having more active conversations but Git is also good at keeping everything documented. I just tagged you in this new thread when you have a chance!

The current MessageFormat syntax is a bit of a weird one. It ignores whitespaces in the syntax outside the messages, but it preserves the whitespaces in the messages.

Is there a reason for this? I wrote a parser that preserves both whitespaces. I used this both for syntax highlighting and also auto-completion/validation & error detection. It's a lot easier to be able to refer to a character position without changing the input for example.
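
As an illustration of what "preserving both" buys, here is a minimal lossless tokenizer sketch. It is not the parser mentioned above and it ignores MessageFormat's apostrophe escaping; `tokenize` is a hypothetical name. Every character, including whitespace, ends up in exactly one token, so token offsets map directly onto the original input.

```javascript
// Split into braces, commas, whitespace runs, and everything else,
// recording where each token starts in the untouched input.
function tokenize(input) {
  const pattern = /[{}]|,|\s+|[^{},\s]+/g;
  const tokens = [];
  let match;
  while ((match = pattern.exec(input)) !== null) {
    tokens.push({ text: match[0], start: match.index });
  }
  return tokens;
}

const source = "{ count , plural , one {# item} other {# items} }";
const tokens = tokenize(source);
// Lossless: joining the tokens reproduces the input exactly.
const roundTrip = tokens.map((t) => t.text).join("");
```

Because nothing is normalized away, an editor can highlight or report an error at `token.start` without re-deriving positions.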

Inline comments / extendable inline

I have had mixed results with this. Some translators translate the comments too, especially for first-timers, and they don't realize that the final message recipient won't see them, which wastes translation time. There are other times when there is information that is best conveyed inline. Sometimes the comments get in the way of readability. I can see the pros and cons of such functionality.

@grhoten

+1 on your comment - there are other ways to provide comments (typically called context) to linguists, which are handled correctly today by most TMSes. If we need inline context, there might be something too complex with the syntax.

aphillips · 2023-07-29T15:54:52Z

Closing resolve-candidates per discussion in 2023-07-24 call

romulocintra changed the title ~~Requirements List~~ Requirements Nov 27, 2019

romulocintra mentioned this issue Jan 2, 2020

Meeting Agenda : 2020-01-06 #9

Closed

romulocintra changed the title ~~Requirements~~ Requirements - MF wishlist Jan 6, 2020

romulocintra added the requirements Issues related with MF requirements list label Jan 7, 2020

romulocintra mentioned this issue Jan 7, 2020

MVP, Roadmap, Organization #4

Closed

romulocintra mentioned this issue Jan 7, 2020

Meeting Agenda : 2020-01-27 #12

Closed

echeran mentioned this issue Jan 24, 2020

Create and Collect Use Cases #2

Closed

nbouvrette mentioned this issue Jan 25, 2020

Support for inflections (cases) #16

Closed

romulocintra mentioned this issue Jan 27, 2020

Resource Format vs Message Format #1

Closed

This was referenced Jan 28, 2020

Support extended plural forms ("range based") #21

Closed

Support custom / pluggable "formatters" (beside Date/Time/Number ...) #22

Closed

This was referenced Jan 30, 2020

Well-defined timezone handling #25

Closed

Extendable inline markup #26

Closed

echeran mentioned this issue Feb 4, 2020

Define technical terms #30

Closed

nbouvrette mentioned this issue Feb 8, 2020

Separation of language and formatting locale #29

Closed

romulocintra removed the requirements Issues related with MF requirements list label Feb 18, 2020

dchiba mentioned this issue Mar 18, 2020

Partially resolved arguments #43

Closed

romulocintra added the Stale label Jul 20, 2020

mihnita added requirements Issues related with MF requirements list and removed Stale labels Sep 24, 2020

zbraniecki mentioned this issue Sep 27, 2020

Localization Units Formatting #118

Closed

mihnita mentioned this issue Oct 19, 2020

Collect all possible MessageFormat use cases #119

Closed

echeran mentioned this issue Apr 30, 2021

Candidate Features to be implemented/tested in the both Data Model versions #165

Closed

eemeli mentioned this issue May 2, 2021

"Local" text transformations (contextual changes) #160

Closed

mihnita added a commit that referenced this issue May 13, 2021

List formatting with grammatical inflection on each list item #3

dc01a8f

stasm mentioned this issue Aug 18, 2021

(third) Implement formatToParts #189

Merged

macchiati mentioned this issue Sep 23, 2022

When do we evaluate the local variables? #299

Closed

macchiati mentioned this issue Feb 27, 2023

proposal: replace first-match with best-match #351

Closed

aphillips added Stale resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. labels Jun 28, 2023

aphillips closed this as completed Jul 29, 2023

This was referenced Feb 22, 2024

Implement the default registry in the spec #659

Merged

Add note to "Function Resolution" section about function argument and result types #686

Merged

macchiati mentioned this issue Nov 3, 2024

Implement :unit as Proposed RECOMMENDED in the registry #922

Merged
