Partially resolved arguments #43

Closed
zbraniecki opened this issue Feb 13, 2020 · 24 comments
Labels
requirements Issues related with MF requirements list

Comments

@zbraniecki
Member

Fluent supports the ability to pass a partially formatted argument. For example:

bundle.formatValue("key", {
  date: FluentDateTime(new Date(), {month: "long", year: "numeric", day: "2-digit"})
});

is equivalent to:

key = Today is { DATETIME($date, month: "long", year: "numeric", day: "2-digit") }

except that some options are available only to the developer and not to the localizer (for example, currency in NumberFormat).

The features stack: the developer can provide defaults, and the localizer can either place the argument as { $date } or customize it via { DATETIME($date, year: "2-digit") } if needed.
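
A minimal sketch of how this stacking could work, assuming both the developer's defaults and the localizer's overrides are plain `Intl.DateTimeFormat` option bags. `stackDateOptions` is a hypothetical helper, not part of Fluent's API; it only illustrates that the localizer's fields win while untouched developer defaults carry over.

```javascript
// Hypothetical helper: localizer overrides are layered over developer defaults.
function stackDateOptions(developerDefaults, localizerOverrides = {}) {
  return { ...developerDefaults, ...localizerOverrides };
}

const developerDefaults = { month: "long", year: "numeric", day: "2-digit", timeZone: "UTC" };
const date = new Date(Date.UTC(2020, 1, 13));

// { $date }: the localizer adds nothing, so the developer's defaults apply.
const plain = stackDateOptions(developerDefaults);

// { DATETIME($date, year: "2-digit") }: only the year field is overridden.
const customized = stackDateOptions(developerDefaults, { year: "2-digit" });

console.log(new Intl.DateTimeFormat("en-US", plain).format(date));
console.log(new Intl.DateTimeFormat("en-US", customized).format(date));
```

The merged bag is handed to the formatter unchanged, so the locale still decides field order and separators.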

@mihnita
Collaborator

mihnita commented Feb 15, 2020

And 100% equivalent to this ICU syntax: Today is {date, DATE, ::ddMMMMy}

@zbraniecki
Member Author

How would a developer specify the arguments to date or some other variable in ICU?
Basically:

bundle.formatValue("key", {
  date: FluentDateTime(new Date(), {month: "long", year: "numeric", day: "2-digit"})
});

?

@mihnita
Collaborator

mihnita commented Feb 15, 2020

In ICU you have 3 different ways to control formatting.
The result is the same ("Today is February 15, 2020") for all of them.

  1. The formatting info is in the message itself:
MessageFormat mf = new MessageFormat("Today is {expDate, DATE, ::dMMMMy}", locale);
String result = mf.format(args);

Which is very easy to use, little code, and does the right thing.

  2. You can associate a formatter with the parameter:
DateFormat df = DateFormat.getInstanceForSkeleton("dMMMMy", locale);
MessageFormat mf = new MessageFormat("Today is {expDate}", locale);
mf.setFormatByArgumentName("expDate", df);
String result = mf.format(args);

You can create the formatter and configure it in any way you want, not necessarily with skeletons.
You can use patterns, the width enums (full/long/short/narrow), setters, change the calendar, whatever: full control.

  3. Pass a string; formatting is done outside the MessageFormat:
DateFormat df = DateFormat.getInstanceForSkeleton("dMMMMy", locale);
args.put("expDate", df.format(now));
MessageFormat mf = new MessageFormat("Today is {expDate}", locale);
String result = mf.format(args);

Which is the non-interesting case for our project.
Something like printf would be good enough :-)


The plumbing for all the examples above:

ULocale locale = ULocale.forLanguageTag("en-US");
Date now = new Date();

//  you pass the arguments in a map
Map<String, Object> args = new HashMap<>();
args.put("expDate", now);

One can always have syntactic sugar to make the map of arguments friendlier.

In Java 9 you can do:

Map<String, Object> args = Map.of("expDate", now, "user", userName, "count", 42);

In older Java you can do:

// The diamond operator cannot be used with an anonymous class before Java 9.
Map<String, Object> args = new HashMap<String, Object>() {{ put("expDate", now); }};

@zbraniecki
Member Author

zbraniecki commented Feb 15, 2020

Ah, thank you! (2) is what I was looking for. That's exactly what I'd like to preserve for the new API.

@mihnita
Collaborator

mihnita commented Feb 15, 2020

++1 to preserve (and enhance) #2 :-)

But I am not sure it is 100% what you want.
It does not allow to partially resolve things.

You can't for example create the formatter, pass it to MessageFormat, then have the third "argument" in the placeholder "override" something.

So if you do:

DateFormat df = DateFormat.getInstanceForSkeleton("dMMMMy", locale);
MessageFormat mf = new MessageFormat("Today is {expDate, date,  ::MMMjm}", locale);
mf.setFormatByArgumentName("expDate", df);
String result = mf.format(args);

This will not add hour and minutes to the format, and will not change the month to the abbreviated form.

Not saying it would not be useful, or it can't be changed, I'm only explaining how it works today.

@dchiba

dchiba commented Feb 21, 2020

I think the developer or message author should be able to specify one of these:

  • a predefined formatting style such as short, medium, long and full
  • a skeleton pattern (e.g. "yMMMd")
  • a complete pattern string per LDML (e.g. "d MMM y")

I think it should not be possible for them to specify partially resolved arguments like {month: style_a, day: style_b, year: style_c} because it is highly likely that an arbitrary combination of the individual field styles specified by the developer or message author is unconventional in some target locales. This practice goes against the general i18n principle to keep the code locale neutral.

@zbraniecki
Member Author

zbraniecki commented Feb 21, 2020

But I am not sure it is 100% what you want.
It does not allow to partially resolve things.

Ah, thank you for the explanation. In that case, I'd like to see the Fluent approach introduced as a proposal to extend what MF is doing.

I think it should not be possible for them to specify partially resolved arguments like

There are four (not three) ways to specify how you want your date to look, plus one set of "fiddles" to adjust:

  1. An option bag - {month: "long", year: "numeric", day: "2-digit"}
  2. Style - {dateStyle: "long", timeStyle: "short"}
  3. Skeleton - yMMMd
  4. Pattern - d MMM y

And fiddles, like hourCycle, which are user preferences that can be applied on top of any of the four to adjust it.

Now, my most important claim is that we should not allow for (4). Pattern is not an internationalizable format. If you want to use a pattern, you are actually actively preventing internationalization from happening by specifying a format that will be replicated in each locale rather than allowing each locale to use its format.

Therefore, if you want a fixed pattern, you should just format the date to that pattern yourself and then pass the result to localization as a String. (For localizable fields like weekday or month name, we'll have Intl.DisplayNames.)
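
A sketch of that approach, assuming an ISO 8601 calendar date is the fixed pattern wanted: the date is formatted outside the localization layer and handed over as an opaque string. `toIsoCalendarDate` and `fillPlaceholder` are hypothetical names; the latter is only a stand-in for whatever a message formatter does with a string argument.

```javascript
// Format once, locale-independently, outside the message formatter.
function toIsoCalendarDate(date) {
  // toISOString() always returns UTC in the form YYYY-MM-DDTHH:mm:ss.sssZ.
  return date.toISOString().slice(0, 10);
}

// Hypothetical stand-in for a formatter that inserts string arguments verbatim.
function fillPlaceholder(message, name, value) {
  return message.replace(`{ $${name} }`, value);
}

const isoDate = toIsoCalendarDate(new Date(Date.UTC(2020, 2, 18)));
console.log(fillPlaceholder("Released on { $date }", "date", isoDate));
```

Because the argument arrives as a string, the localizer can move it within the sentence but cannot reformat it.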

What I think you're saying is that we should avoid allowing partial arguments for (1) where the developer sends a bag, and the localizer overrides some fields. I'm torn on this. I can see how this could be confusing, but I can also see how it could be useful. For example:

bundle.formatPattern(pattern, {
    date: DateArgument(new Date(), {year: "numeric", month: "short", day: "2-digit"})
});

// en-US using default
let pattern = `Today is { $date }`;

// de overriding month length
let pattern = `Today is { DATETIME($date, month: "long") }`;

This seems rather useful to me: a German translator was able to adjust the month field to improve the output.

I'm not convinced (3) should be allowed at all. I haven't seen a good use case where skeleton is consciously used and gives better value than (1) or (2), but I'm open to being corrected.

For (1) and (2), I think we may want to be careful about mixing them (just like we prevent mixing them in ECMA402), but I could imagine a scenario where the developer defines dateStyle and the localizer says: yes, use dateStyle "medium" for my locale, but override the month field to be long, please.
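
For reference, this is how the ECMA-402 restriction shows up in practice: `Intl.DateTimeFormat` throws a `TypeError` when `dateStyle` or `timeStyle` is combined with an individual field option, while a fiddle such as `hourCycle` is accepted alongside a style. A small sketch:

```javascript
const date = new Date(Date.UTC(2020, 1, 21, 15, 30));

// (2) style plus a user-preference fiddle: accepted.
const styled = new Intl.DateTimeFormat("en-US", {
  dateStyle: "medium",
  timeStyle: "short",
  hourCycle: "h23",
  timeZone: "UTC",
});
console.log(styled.format(date));

// (2) style mixed with a field from (1): rejected by the constructor.
let mixingError = null;
try {
  new Intl.DateTimeFormat("en-US", { dateStyle: "medium", month: "long" });
} catch (error) {
  mixingError = error;
}
console.log(mixingError instanceof TypeError);
```

A "style plus month override" scenario would therefore need its own merging rules on top of what ECMA-402 offers today.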

@mihnita
Collaborator

mihnita commented Feb 21, 2020

Yes, I think there are benefits from being able to mix "directives" from the formatter itself with overrides from the translator. Needs some thinking on how to get it right (probably you did that for Fluent, I would need to catch up with it :-)

100% agree that 4 is a bad practice and we should not allow it.
Especially for languages that are used in more than one country.
People translate in 1, max 2 language variants (pt-BR / pt-PT, or es + es-419, en + en-GB)
So you end up with an "Arabic" or "French" pattern forced on tens of countries.

I consider 1 (option bag) and 3 (skeleton) to be equivalent.
They represent the same information in 2 different ways. But I don't see why we would support both, one is enough.
I like 3 because there is a lot less typing, and I find it more readable :-)
But 1 is more readable and less error-prone.

So I'm fine with 1 :-) Or 3. We vote / flip coins / whatever. But not both.


There might be some debate on what is OK for translators to change and what is not...

Changing MM to MMM? Maybe. To MMMM? I have doubts... there were probably space restrictions that forced the developer to specify short formats to begin with.
Same for "short" => "long"

And adding extra fields to the option bag would be really problematic.
Imagine we send this out:
{ month: "long", year: "numeric", day: "numeric" }
and get this back:
{ month: "long", year: "numeric", day: "numeric", hour:"numeric", minute:"numeric" }
Is that OK?

Or worse, we get back { year: "numeric", day: "numeric" }
Does "21, 2020" make any sense?

Or is it OK for a lone Spanish translator to "inflict" a 2 digit year on the whole Spanish speaking world?


I'm not 100% against the idea, I'm just thinking out loud here, pros and cons.
But trying to show that it might not be as clear cut as we think.

Maybe we can put restrictions (change the items in the bag, but not add / remove)
And maybe change "one step only" (long vs medium month, but not long vs numeric)
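
The two restrictions floated here could be checked mechanically. A rough sketch, assuming both sides are ECMA-402-style option bags and the localizer returns a full bag; `checkOverride` and the `WIDTH_SCALE` ordering are hypothetical, and what counts as "one step" is exactly the kind of policy that would need agreement.

```javascript
// Hypothetical ordering of widths; adjacent entries are "one step" apart.
const WIDTH_SCALE = ["numeric", "2-digit", "short", "long"];

function checkOverride(developerOptions, localizerOptions) {
  const problems = [];
  const devKeys = Object.keys(developerOptions);
  const l10nKeys = Object.keys(localizerOptions);

  // Rule 1: fields may be changed, but not added or removed.
  for (const key of l10nKeys) {
    if (!devKeys.includes(key)) problems.push(`added field: ${key}`);
  }
  for (const key of devKeys) {
    if (!l10nKeys.includes(key)) problems.push(`removed field: ${key}`);
  }

  // Rule 2: a changed width may move at most one step along the scale.
  for (const key of devKeys.filter((k) => l10nKeys.includes(k))) {
    const from = WIDTH_SCALE.indexOf(developerOptions[key]);
    const to = WIDTH_SCALE.indexOf(localizerOptions[key]);
    if (from === -1 || to === -1) {
      // Values outside the scale (e.g. "narrow") can only be flagged, not ranked.
      if (developerOptions[key] !== localizerOptions[key]) {
        problems.push(`unranked change: ${key}`);
      }
    } else if (Math.abs(from - to) > 1) {
      problems.push(`more than one step: ${key}`);
    }
  }
  return problems;
}

const sent = { month: "long", year: "numeric", day: "numeric" };
console.log(checkOverride(sent, { month: "short", year: "numeric", day: "numeric" }));
console.log(checkOverride(sent, { year: "numeric", day: "numeric" }));
console.log(checkOverride(sent, { ...sent, hour: "numeric", minute: "numeric" }));
```

Returning a list of problems rather than throwing leaves it to the caller to treat them as errors or as warnings.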

@zbraniecki
Member Author

Or worse, we get back { year: "numeric", day: "numeric" }
Does "21, 2020" make any sense?

Completely accidentally, but Fluent prevents it because a localizer cannot "remove" an option. :)

But yes to everything else you said: additions may be interesting, same as increasing length. I'd argue that in most cases switching from short to long won't break the UI, and if someone does override it, it means that they see value in it and (I hope) have a way to test the result.

But those kinds of restrictions could be made easy to validate and warn about in tooling, without forbidding them outright in the syntax.
Or we could provide a merging strategy between developer-provided and l10n-provided info, and allow customers to plug in their own merging algorithm if they want to, so that Mozilla can use a different one from Apple, etc., with good "safe" defaults for the majority of users that prevent footguns.

@zbraniecki
Member Author

Or is it OK for a lone Spanish translator to "inflict" a 2 digit year on the whole Spanish speaking world?

I'd argue this is similar to the same lone translator inflicting a translation of the word "Tab" onto the whole Spanish userbase. That's the power they get :)

And maybe change "one step only" (long vs medium month, but not long vs numeric)

That could be a good default!

@asmusf

asmusf commented Feb 22, 2020 via email

@stasm
Collaborator

stasm commented Feb 24, 2020

A few thoughts on the topic of (1) vs. (3) when they appear in messages:

  1. An option bag - {month: "long", year: "numeric", day: "2-digit"}

Pros:

  • Closer to English prose; easier to understand what it does just by looking at it.
  • Allows unusual combinations of options (also a con).
  • Works for data types which don't yet have skeleton parsers.
  • Works for custom data types and formatters. See Support custom / pluggable "formatters" (beside Date/Time/Number ...) #22.
  • Most translators will likely be content with the default formatting set by the developer.*

Cons:

  • Much more verbose.
  • Closer to English prose, which makes it look like maybe it's localizable. Should month be translated? Or should "long" be translated since it's enclosed in quotes? Related to Syntax design should aid reader in what is translatable #51.
  • Likely requires a lookup in the docs when editing to see what options are available.
  • Allows unusual combinations of options (also a pro). Harder to test and find bugs.

  3. Skeleton - yMMMd

Pros:

  • Concise syntax.
  • "Weird" enough to not look like it's supposed to be translated.
  • Skeletons are well-defined. Easier to test.

Cons:

  • Likely requires a lookup in the docs to see what it means (unless the localizer is familiar with the skeleton syntax).
  • Doesn't work with custom data types nor with custom formatters.

* This assumption has important consequences. If true, it might be that we won't see much of this syntax in messages because the default formatting as set by the developer will be enough. If that's the case, verbosity vs. conciseness should not be a deciding factor in choosing the syntax for translators.

@zbraniecki
Member Author

I'd add to pros of the option bag:

  • easier to create a WYSIWYG CAT UX that allows user to fiddle with than for skeleton.
  • easier to reason about overrides in the l10n message (override "month" length only)

@mihnita
Collaborator

mihnita commented Feb 24, 2020

easier to create a WYSIWYG CAT UX that allows user to fiddle with than for skeleton.

I think that is really straightforward, and not an issue.
There is an almost 1:1 mapping between the skeletons and the options bag.
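
To illustrate the "almost 1:1" part, here is a partial sketch that converts a few skeleton fields into an ECMA-402-style options bag. `FIELD_MAP` and `skeletonToOptions` are hypothetical names, only year, month, day and weekday are covered, and widths with no direct counterpart (such as `yyyy`) are rejected rather than guessed.

```javascript
// Partial, hypothetical mapping from skeleton letters to option names.
// The index into `widths` is the number of repeated letters minus one.
const FIELD_MAP = {
  y: { name: "year", widths: ["numeric", "2-digit"] },
  M: { name: "month", widths: ["numeric", "2-digit", "short", "long", "narrow"] },
  d: { name: "day", widths: ["numeric", "2-digit"] },
  E: { name: "weekday", widths: ["short", "short", "short", "long", "narrow"] },
};

function skeletonToOptions(skeleton) {
  const options = {};
  // Split into runs of the same letter, e.g. "yMMMd" -> ["y", "MMM", "d"].
  for (const run of skeleton.match(/(.)\1*/g) || []) {
    const field = FIELD_MAP[run[0]];
    const width = field && field.widths[run.length - 1];
    if (!width) throw new RangeError(`unsupported skeleton field: ${run}`);
    options[field.name] = width;
  }
  return options;
}

console.log(skeletonToOptions("yMMMd"));
console.log(skeletonToOptions("dMMMMy"));
```

A CAT tool could run such a conversion in both directions and present the same controls regardless of which form is stored in the message.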

@mihnita
Collaborator

mihnita commented Feb 24, 2020

The assumption is that the choice of using long vs short date formats is
a matter of application design and never one of cultural
preference/convention. Is that true? Was that true for the example where
the German translator "fixed" something? Or was that a case where short
format was not forced by application design, but simply acceptable to
English users in that context, but looked odd in another language?

I think that this is usually a fuzzy thing :-)
To me (as an engineer) a lot of the UX decisions seem (at times) somewhat arbitrary :-)
And at times I try to fight back (for instance on the "all caps for buttons / labels" trend)
So yes, "Mar 3" vs "March 3" is probably (?) arbitrary, if there is enough space.

Maybe this is another topic for the "Design Principles" (#50): do we want translators to be able to fix bad i18n in the sources? Is it better to fix something in 80 languages instead of fixing the source?

@mihnita
Collaborator

mihnita commented Feb 24, 2020

I'd argue this is similar to the same lone translator inflicting a translation of the word "Tab" onto the whole Spanish userbase. That's the power they get :)

It is not really similar.
For the translation you really have no choice. You can't translate into 20+ flavors of Spanish.
And you can't have 20 translators from several companies arguing and voting and bringing proof for their choices of terminology, over 20 years.

But we have that for CLDR.
So in general I would trust the CLDR data over a translator telling me "this date format is good for all of Latin America"

@asmusf

asmusf commented Feb 24, 2020 via email

@mihnita
Collaborator

mihnita commented Feb 24, 2020

The question is whether there are reasonable scenarios where you need a way
for a downstream override.

Yes. Rarely, but I've seen it.


Anyway, this is not something that one can forbid in the syntax of MessageFormat itself.

If a company decides to allow translators to edit those parameters, and there are no technical barriers (for example limited TMS tools), there is nothing to prevent it.

@asmusf

asmusf commented Feb 24, 2020 via email

@zbraniecki
Member Author

The CLDR data answer the questions like: what is the correct long date
format for language/locale X?
AFAIK, they don't answer questions like: is short date common or unusual
for this language/locale?

It feels to me that the latter is the consequence of the former.

If the "correct" (a.k.a. most canonical) long date format for locale X uses the long month style, then we know that short, narrow or full is not the canonical one.

What I think @mihnita is saying is that if the canonical month for style=long is long, then short and narrow are unlikely to be natural (hence his suggestion of one step). I'd not forbid such an override, but maybe tooling could show a warning (Notice: your date style month override is substantially different from what is considered canonical. Please make sure this is intentional and the output looks as expected.)

@dchiba

dchiba commented Mar 18, 2020

(4) pattern is useful for cases in which the same format is expected regardless of the locale. For example, this may always print the ISO 8601 standard calendar date (e.g. 2020-03-18):

{date, DATE, y-MM-dd}
(Syntax is for illustration purposes only)

It can also be used for the uncommon cases in which it is desirable to override the output format. To enable easy override, there can be a notation to use a custom formatter, or reference another message that defines the customized pattern. (Someone nominated "Message reference - from another Message" in the giant thread #3 )

To override for a specific locale with a custom formatter while a common pattern derived from the locale is used for the other locales, the special case could be described in the message itself. For example, this message in an English (en) resource bundle may print 18-03-2020 for the British (en-GB) locale, while the standard short style is used for the other locales, like 3/18/20 for the US locale:

{date, DATE, short{en-GB:dd-MM-y}}
(Syntax is for illustration purposes only)

A caveat in using a custom pattern is that date formats are sensitive to locales, while translation bundles are organized by languages. So it would be advisable to override in a language bundle that encompasses the target locale(s), like en-* in the _en bundle, es-* in the _es bundle, and so on, or supply the locale specific overrides in separate resource bundles from the regular bundles that contain message strings.

CLDR defines all common conventions through (2) styles or (3) skeletons, and (4) pattern string complements them, to cover all possible scenarios.

I am against supporting (1) option bag because it results in unconventional output. It is not backed by CLDR locale data and it is against i18n principles. An application should not hardcode a locale sensitive convention like date format. (1) option bag introduces unnecessary complexity that conflicts with #48 Syntax Simplicity. Rare cases can be handled with (4).

@zbraniecki
Member Author

(4) pattern is useful for cases in which the same format is expected regardless of the locale.

This is precisely not a localization issue then. I'm of the opinion that, in particular in this case, because people do tend to confuse the two and not see the difference, the API should make it especially clear that pattern formatting is not related to internationalization.

{date, DATE, short{en-GB:dd-MM-y}}

I also disagree with this concept. It seems very unscalable (I cannot imagine trying to override 5 or 10 locales) and it locks a pattern in place forever as part of the message.

We are working on ways to override data and I agree that we need them, but rather than encoding a pattern in a message, we'll want to plug an additional data provider to the instance of the formatter that can be used for overrides.

I am against supporting (1) option bag because it results in unconventional output.

I don't understand this position. Can you explain how it is different from (3), in your opinion, in terms of the produced output?

@dchiba

dchiba commented Mar 19, 2020

I also disagree with this concept. It seems very unscalable (I cannot imagine trying to override 5 or 10 locales) and it locks a pattern in place forever as part of the message.

I don't like it myself, but no matter how we engineer the solution, user expectations may be different from the application behavior, so I tried to come up with a way to override the locale default behavior and meet unusual user expectations. Local conventions are defined in CLDR, but not every election is unanimous. It's great to hear that you are working on ways to override locale data. With that in place, it would probably be unnecessary to have another (odd) override mechanism like this. I would gladly withdraw it.

(1) and (3) are significantly different because (1) gives too much freedom to the developer/message author. It allows them to specify any possible combination of the options, and that combination can be unconventional depending on the locale. For example, the combination used in a prior example:

{month: "long", year: "numeric", day: "2-digit"}

would be mapped to the following field values in Japanese locale:

{month: "3月", year: "2020", day: "18"}

Even if the API re-ordered the fields as appropriate for the locale to year, month, day, the resulting output:

2020 3月 18

would still look quite odd. This is an excellent example of how an arbitrary combination of individual field options results in an unexpected output. Some of the possible combinations could be mapped to an existing skeleton pattern, in which case it would be possible to guarantee correct formatted output for all locales. However, as in this case, certain combinations just don't make sense for some locales.

In contrast, (3) produces correct output for all locales because all skeleton patterns are endorsed by CLDR. They are defined through the voting process, so there is no risk of producing an unexpected output (except when locale data needs to be overridden).
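
One way to check how a concrete implementation treats the option bag from the example above: ECMA-402's `Intl.DateTimeFormat` accepts the same bag, and `formatToParts` shows both the field values and the literals placed between them. This is only a sketch for experimentation; the result depends on the engine's locale data, so no output is asserted here.

```javascript
const bag = { month: "long", year: "numeric", day: "2-digit", timeZone: "UTC" };
const formatter = new Intl.DateTimeFormat("ja", bag);
const date = new Date(Date.UTC(2020, 2, 18));

// The full formatted string for the Japanese locale.
console.log(formatter.format(date));

// Each part is tagged with its type: year, month, day, or literal.
const parts = formatter.formatToParts(date);
console.log(parts.map((part) => `${part.type}:${part.value}`));
```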

@mihnita added the requirements (Issues related with MF requirements list) label Sep 24, 2020
@aphillips
Member

This discussion (about expression options) has been addressed in the design for quite a while. Please open new, specific issues if needed.

6 participants