Support extended plural forms ("range based") #21

Closed
jamuhl opened this issue Jan 28, 2020 · 27 comments
Labels
requirements Issues related with MF requirements list

Comments

@jamuhl

jamuhl commented Jan 28, 2020

It would be great to be able to define, besides the exact number match cases (such as =0), some range cases, so that a phrase like "Only a few items are left!" can be defined alongside the other options.

Example (not definitive syntax!):

{ count , plural ,
   =0 {No candy left}
  one {Got # candy left}
  <10 {Got a few candies left}
  10-20 {Got a handful candies left}
other {Got # candies left} }

Note:

I never had the need myself, but it was requested often enough by developers using i18next (to the point that we finally added support for it).

See previous comments:

@zbraniecki
Member

Asking localizers to pick the ranges may not work great for quality.
Generally speaking, notions of count categories may be worth exploring on the CLDR level, similar to PLURAL rules, and then if Russian ends up with five categories "[one, two, threefive, fiveten, many]" and French with three "[one, several, many]", they'd use it just like plural:

{ count, COUNT_CATEGORY,
    =one {One candy left}
    =two {Two candies left}
    =threefive {Got a few candies left}
    =fiveten {Got several candies left}
    =many {Got # candies left}
}

@romulocintra romulocintra added the requirements Issues related with MF requirements list label Jan 28, 2020
@nbouvrette
Collaborator

Asking localizers to pick the ranges may not work great for quality.

I think that for extended categories, it could be up to the author to define them. They would also probably have to take precedence over the CLDR ones when used, regardless of the target language. This is also why having some sort of "plural conversion" API or tool could make this type of functionality easier to use while keeping good quality.

@zbraniecki
Member

I think that for extended categories, it could be up to the author to define them

Then you ask a developer to pick numerical categories that work for every locale in the World.

@nbouvrette
Collaborator

nbouvrette commented Feb 2, 2020

I think that for extended categories, it could be up to the author to define them

Then you ask a developer to pick numerical categories that work for every locale in the World.

I feel we might be talking about different things - let me try to explain with examples to see if I am missing anything:

An author writes the following string in en-US:

{ count , plural ,
	=0 {No candy left}
	one {Got # candy left}
	<10 {Got a few candies left}
	10-20 {Got a handful candies left}
	other {Got # candies left} 
}

Whatever toolset we end up with could know that this string is valid because it contains the two English plural categories: one and other.

Say you want to translate it into Simplified Chinese, which requires only the other category (no plurals). You will end up with:

{ count , plural ,
	=0 {没有糖果了}
	<10 {还剩一些糖果}
	10-20 {剩下一些糖果}
	other {还剩#个糖果} 
}

Since it actually does not use plural you could even use a select syntax if you prefer:

{ count , select ,
	0 {没有糖果了}
	<10 {还剩一些糖果}
	10-20 {剩下一些糖果}
	other {还剩#个糖果} 
}

To have this flexibility, you would need the right tools, and possibly linguists who are familiar with the syntax and able to translate raw syntax.

This would work the same way for languages that have more plural forms like Arabic.

Let me know if you see some examples where this would not work.

@longlho

longlho commented Feb 3, 2020

How would your example be translated to languages with few & many? Some languages have those rules dependent on fractional digits as well instead of range. Or take en for example, 1 is one, but 1.00 is other. How would that work in your example?
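The "1 is one, but 1.00 is other" point can be illustrated with a minimal sketch. It assumes the number arrives as a string (so trailing zeros survive) and hand-codes only the English cardinal rule from CLDR (one: i = 1 and v = 0); the function names are made up for illustration.

```python
def plural_operands(source: str) -> tuple[int, int]:
    """Return (i, v): the integer digits and the number of visible fraction digits."""
    integer_part, _, fraction_part = source.lstrip("-").partition(".")
    return int(integer_part or "0"), len(fraction_part)


def english_cardinal_category(source: str) -> str:
    # CLDR English cardinal rule: "one" if i = 1 and v = 0, otherwise "other".
    i, v = plural_operands(source)
    return "one" if i == 1 and v == 0 else "other"


for text in ("1", "1.00", "2"):
    print(text, "->", english_cardinal_category(text))
```

The category depends on how the number is formatted, not only on its value, which is why a purely numeric range such as <10 cannot stand in for it.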

@nbouvrette
Collaborator

How would your example be translated to languages with few & many? Some languages have those rules dependent on fractional digits as well instead of a range. Or take en for example, 1 is one, but 1.00 is other. How would that work in your example?

The way I see this (unless I misunderstand your question), the plural categories per language would not change - they would still have the same values for each language. The range extension would only be applied on top of the plural categories. This means, if you set ranges, you try to apply them first (which is why the message should be different than plurals messages). If a range does not apply to a given value, it would then fall back on the plural rules for that language.
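One possible reading of this lookup order, as a rough sketch only: exact matches first, then ranges in declaration order, then the plural category. The key formats ("<N", "A-B"), the function names and the integer-only English rule are all assumptions for illustration, not a proposed syntax or API.

```python
import re


def english_category(count: int) -> str:
    # Simplified English cardinal rule, integers only.
    return "one" if count == 1 else "other"


def matches_range(key: str, count: int) -> bool:
    """Check hypothetical range keys of the form '<N' or 'A-B' (inclusive)."""
    if m := re.fullmatch(r"<(\d+)", key):
        return count < int(m.group(1))
    if m := re.fullmatch(r"(\d+)-(\d+)", key):
        return int(m.group(1)) <= count <= int(m.group(2))
    return False


def select_variant(variants: dict[str, str], count: int) -> str:
    # 1. Exact matches such as '=0' win.
    exact = f"={count}"
    if exact in variants:
        return variants[exact]
    # 2. Ranges are tried next, in declaration order.
    for key, pattern in variants.items():
        if matches_range(key, count):
            return pattern
    # 3. Otherwise fall back to the plural category, then to 'other'.
    return variants.get(english_category(count), variants["other"])


CANDY = {
    "=0": "No candy left",
    "one": "Got # candy left",
    "<10": "Got a few candies left",
    "10-20": "Got a handful candies left",
    "other": "Got # candies left",
}


def format_candy(count: int) -> str:
    return select_variant(CANDY, count).replace("#", str(count))


for n in (0, 5, 15, 42):
    print(n, "->", format_candy(n))
```

With ranges tried before categories, a count of 1 lands in the <10 branch here rather than in one, so the precedence between ranges and categories would still need to be pinned down.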

Translations would work the same way. If you have one and other in English, we would expect the target language to use the right categories. It would be the same for ranges.

Do we have examples where ranges would not work in certain languages?

Also, I would expect ranges to be optional, just like explicit arguments.

I see ranges as the equivalent of "advanced explicit arguments", if that makes sense.

@longlho

longlho commented Feb 9, 2020

I think you're assuming here that the word candy itself has just 2 plural forms, separate from the handful vs. few part, and that's not correct.
So take your example and use locale sr (Serbian), where one applies to numbers ending in 1 (other than those ending in 11), so 2.1 or 21. Let's use kilometer, which has 3 plural forms in Serbian as well:

1 kilometer = 1 километар
1.1 kilometer = "a few" километар or "a few" километра or "a few" километара

You'd have to explode that range out so translators can provide the correct form for literally every rule in that range, right? And that has other structural assumptions as well.

@nbouvrette
Collaborator

You'd have to explode that range out so translators can provide the correct form for literally every rule in that range, right? And that has other structural assumptions as well.

I'm quite familiar with the different plural forms from different languages and I'm still not 100% sure I understand what you are trying to explain.

Maybe you are picturing scenarios where someone would concatenate a string with plurals? If I may try to illustrate:

Got { count , plural ,
	=0 {no candy}
	one {# candy}
	<10 {a few candies}
	10-20 {a handful of candies}
	other {# candies} 
} left

In Serbian, this would become:

{ count , plural ,
	=0 {Нема више слаткиша}
	one {Остао је # бомбон}
	two {Остало је # бомбона}
	<10 {Остало је неколико бомбона}
	10-20 {Остало је прегршт бомбона}
	other {Остало је # бомбона} 
}

In most cases, you will have to remove the concatenation from the source. Even better, start using a full sentence in the source as well. This can be defined as one of our best practices or, as @mihnita suggested, we could even enforce it.

But I'm not sure I understand why extending plurals (or select) like this would require anyone to explode ranges? I think the proposal here is not to literally use the range as values (as defined in the plural rules) but just as an additional explicit selector to allow for extra fluency.

@longlho

longlho commented Feb 10, 2020

A range can span multiple plural rules though, so 1 message to capture that wouldn't be correct. It'd have to be:

<10 one
<10 few
<10 many
<10 other
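To make the overlap concrete, here is a small sketch that hand-codes the CLDR cardinal rule for Serbian (sr) and collects the categories that occur among the integers below 10. The helper names are made up; only the operands i, v and f are handled.

```python
def operands(source: str) -> tuple[int, int, int]:
    """Return (i, v, f): integer digits, count of visible fraction digits,
    and the visible fraction digits read as an integer."""
    integer_part, _, fraction_part = source.partition(".")
    return int(integer_part or "0"), len(fraction_part), int(fraction_part or "0")


def serbian_cardinal_category(source: str) -> str:
    i, v, f = operands(source)
    # one: v = 0 and i % 10 = 1 and i % 100 != 11, or f % 10 = 1 and f % 100 != 11
    if (v == 0 and i % 10 == 1 and i % 100 != 11) or (f % 10 == 1 and f % 100 != 11):
        return "one"
    # few: the same shape of rule, for last digits 2..4 excluding 12..14
    if (v == 0 and 2 <= i % 10 <= 4 and not 12 <= i % 100 <= 14) or (
        2 <= f % 10 <= 4 and not 12 <= f % 100 <= 14
    ):
        return "few"
    return "other"


# Every category Serbian has shows up inside the single range "<10".
categories_below_ten = {serbian_cardinal_category(str(n)) for n in range(10)}
print(sorted(categories_below_ten))
```

So a single <10 message would have to read correctly next to nouns in the one, few and other forms at once, unless the message avoids a counted noun altogether.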

At that point idk if this is super useful

@mihnita's example is exploding the combination out.

@jamuhl
Author

jamuhl commented Feb 10, 2020

@longlho ranges are optional - if a range makes no sense in another language, just don't define it there:

// english
{ count , plural ,
   =0 {No candy left}
  one {Got # candy left}
  <10 {Got a few candies left}
  10-20 {Got a handful candies left}
other {Got # candies left} }

// german
{ count , plural ,
   =0 {Keine Süssigkeiten übrig}
  one {Eine Süssigkeit übrig}
  <10 {Wenige Süssigkeiten übrig}
other {# Süssigkeiten übrig} }

Might not be simple for translators - but this was asked a lot from developers using i18next

@nbouvrette
Collaborator

I think I might have finally understood @longlho's concern after giving this more thought.

Let's presume you use this syntax as the source:

{ count , plural ,
	=0 {No candy left}
	one {Got # candy left}
	<10 {Got a few (#) candies left}
	10-20 {Got a handful (#) candies left}
	other {Got # candies left} 
}

Now, this could be a problem in some languages, since you are reusing the variable # in the range message. I guess this could be solved in 2 ways:

  1. Make variables inside range syntax illegal (not sure I like this)
  2. Provide good tools or linters to make sure people are aware that if they use this syntax, they might need to remove either the variable or the whole range message in some languages

I think this discussion is starting to sound a bit like this other one with @zbraniecki.

The more flexibility you provide, the trickier it is to ensure that the output will keep good linguistic quality both at the authoring and translation stages.

@longlho

longlho commented Feb 11, 2020

hmm I don't think that's it. The word candies in a few candies can have multiple plural forms so which form would you pick? That depends on the number you pass into the PluralRules regardless of whether you're displaying that value in the output or not.

@jamuhl how do you enforce the "a range makes no sense in another language" case? Per LDML, a range makes no sense at all, because the rule depends on a specific number and its digits (https://unicode.org/reports/tr35/tr35-numbers.html#Plural_Operand_Meanings).

IMO this feature brings too many footguns for devs to make assumptions on how to declare their source string.

@jamuhl
Author

jamuhl commented Feb 11, 2020

@longlho my opinion - you can't enforce everything. Reviewing content in context should be part of QA (not always possible), but removing a feature just because you can't enforce something is like giving up on writing JavaScript because you can't enforce that your code is 100% correct.

@longlho

longlho commented Feb 11, 2020

I'd at least not encourage that behavior by adding it to a standard. Its value is still rather questionable IMO since I haven't seen a need for it at Yahoo, Dropbox & react-intl

@jamuhl
Author

jamuhl commented Feb 11, 2020

I'm ok with that (it's for sure not the most needed feature) -> just adding it afterward outside of the standard will be a lot harder.

Just a fast search on i18next:

@mihnita
Collaborator

mihnita commented Feb 14, 2020

I think the rule of thumb should be that 'knobs' in the localizable message should be there ONLY if they are required for linguistic reasons.

We should not use them in order to move stuff out of the code.

Unless one can show that "in language X the range 1-10 should use 'handful', but language Y would do that for 1-8" then this is not a linguistic requirement.

If the developer wants something like the example, they should create 3 different messages and select them in code.
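A minimal sketch of that approach, assuming hypothetical message ids (candy.few, candy.handful, candy.count) and thresholds taken from the example at the top of this issue. The plural logic stays inside the one message that actually needs it.

```python
# Hypothetical catalog: two fixed messages plus one regular plural message.
MESSAGES = {
    "candy.few": "Got a few candies left",
    "candy.handful": "Got a handful candies left",
    "candy.count": (
        "{count, plural, =0 {No candy left} "
        "one {Got # candy left} other {Got # candies left}}"
    ),
}


def candy_message_id(count: int) -> str:
    """Pick the message in application code.

    The thresholds are product decisions, so they live in code and are
    identical for every locale; translators only see three plain messages.
    """
    if 1 < count < 10:
        return "candy.few"
    if 10 <= count <= 20:
        return "candy.handful"
    return "candy.count"


for n in (0, 1, 5, 15, 42):
    print(n, "->", candy_message_id(n))
```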

I've seen (for real!) stuff like this:

{errorCode, select,
   notfound {Page not found}
   redirect {The page was moved...}
   timeout {Time out retrieving the page}
    ... 200 other messages
   default {Unknown error}
}

This is terrible, and has nothing to do with localization.
It is an abuse.

The only thing that might be OK in this case, from the localization side, would be a way to group messages.

@zbraniecki
Member

I agree with @mihnita . See #21 (comment)

@nbouvrette
Collaborator

You also convinced me :) In this same spirit, should we even allow select syntax? I know it serves as a sort of catch-all syntax for now, but if we had a better way to "group" messages it might no longer be useful. This way we could focus on new syntaxes that solve linguistic issues.

@mihnita
Collaborator

mihnita commented Feb 15, 2020

If we allow for groups and custom (developer provided) formatters then yes, select does not have a good use anymore.

I think it was there in ICU just because there was no clean way to do custom formatters.

The only reasons to keep it (that I can think of):

  • would help migrating old style messages to the new style
  • saves many developers from writing their own custom formatters for simple data types

@nbouvrette
Collaborator

and custom (developer provided) formatters

I don't know exactly what you have in mind for this, but it sounds a bit like how Fluent handles formatting.

If we go that route then it will be important to bridge the link between the formatters and the translation. For example, if you change code that impacts a formatter, how do you make sure the linguist is aware and can update the translation if needed?

Also, the risk there, just like with the select statement, is that formatters can be used to solve non-linguistic problems. This is why I think that if we make this catch-all solution more powerful, it might take longer until we build plural-like solutions that would avoid using these syntaxes.

@stasm stasm mentioned this issue Feb 17, 2020
@mihnita
Collaborator

mihnita commented Feb 17, 2020

That is definitely a risk.

But if we don't have a standard way to do it (with the proper warnings and restrictions) then developers will find hacky ways.

I've seen MessageFormat with date placeholders and setting the formatter to something doing non-date formatting. And I've seen select used to do plural because some of our tooling did not allow nested plurals (artificial restriction, not syntax).

@romulocintra romulocintra removed the requirements Issues related with MF requirements list label Feb 18, 2020
@mihnita mihnita added the requirements Issues related with MF requirements list label Sep 24, 2020
@aphillips
Member

I have two comments on this:

  1. Adding ranges to plural cannot work for the grammatical reasons enumerated above. We have multiple selection for a reason: it allows a separation of concerns.
  2. There is a selector that can do range-like matching on numbers---it's called ChoiceFormat. Such a selector could be used to accomplish the original goal of this issue and I'll illustrate it below.

Before I do, though, I have to call out that choice format was used (abused) to do plurals before plural formats existed and using choice format correctly is ultra rare in the wild. Most folks frown on choice format as anything other than backward-compatibility. The documentation we wrote at Amazon said basically "call the I18N team if you think you need one of these" 😀

However, there are a very few valid use cases. To use CF correctly, you need a case where the form of the message varies based on an absolute value. @nbouvrette's example was "<10" triggering a message like "you are down to your last # chances". One of the examples I found in the wild was for driving directions, in which the UX designer wanted to change the formatting for distances under a specific amount (e.g. "In 11 miles..." versus, below 10 miles, "In 9.6 miles..." with a decimal part).

What this might look like in MF2:

match {$distance :choice} {$distance :plural}
when <10   =0   {You have arrived}
when <10  one   {You have {$distance :unit unit=mile skeleton=0.0} to go}
when <10  *     {You have {$distance :unit unit=mile skeleton=0.0} to go}
when *    one   {You have {$distance :unit unit=mile skeleton=0} to go}
when *    *     {You have {$distance :unit unit=mile skeleton=0} to go}
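How two selectors might combine can be sketched roughly as follows. Everything here is an assumption for illustration: the :choice selector is hypothetical, the plural selector is an English-only simplification, the unit formatter is replaced by literal "mile"/"miles" with Python format specs, and variants are matched first-to-last rather than by any ranking a real implementation might use.

```python
def choice_key(distance: float) -> str:
    # Hypothetical ':choice' selector with a single '<10' bucket.
    return "<10" if distance < 10 else "*"


def plural_keys(distance: float) -> list[str]:
    # Simplified English-only plural selector: an exact key when it applies,
    # followed by the category.
    keys = ["=0"] if distance == 0 else []
    keys.append("one" if distance == 1 else "other")
    return keys


# (choice key, plural key) -> pattern; '*' is the catch-all key.
VARIANTS = [
    (("<10", "=0"), "You have arrived"),
    (("<10", "one"), "You have {:.1f} mile to go"),
    (("<10", "*"), "You have {:.1f} miles to go"),
    (("*", "one"), "You have {:.0f} mile to go"),
    (("*", "*"), "You have {:.0f} miles to go"),
]


def format_distance(distance: float) -> str:
    choice = choice_key(distance)
    plurals = plural_keys(distance)
    for (variant_choice, variant_plural), pattern in VARIANTS:
        choice_ok = variant_choice in (choice, "*")
        plural_ok = variant_plural == "*" or variant_plural in plurals
        if choice_ok and plural_ok:
            return pattern.format(distance)
    raise LookupError("no variant matched; a ('*', '*') variant is required")


for d in (0, 1, 9.6, 11):
    print(d, "->", format_distance(d))
```

Each selector answers one question (presentation threshold versus grammatical number), which is the separation of concerns that multiple selection is meant to provide.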

chair hat on:

I think we should retitle this to something more appropriate. There is an open question about what goes into the default registry (and even whether we have a default registry). Otherwise, this looks like it might be addressed.

@macchiati
Member

I worry about having a 'choice' or range format because that was so subject to abuse. So I'd hesitate to make it one of the stock formatters.

I haven't read all of this thread, but a couple of notes:

  • CLDR has the ability to get the best plural form for a range (like 3-7) for different languages. That is not quite the same as what is being discussed here, but just a side note.
  • In the particular case you cite, probably a better choice would be to have 2 significant digits with 0.#, and then just {$distance :plural}

@aphillips
Member

@macchiati I tend to agree. In fact, a better choice than pseudo-generic choice would be (familiar) comparison type operators, e.g.

match {$distance :lt value=10}
when true {_less than 10_}
when *    {_equal or more than 10_}

There is a separate issue (which I just now retitled) for plural selection for a formatting range such as 3-7.

The example uses {$distance :plural} to allow grammatical selection of the pattern string. I got too cute by using measure unit formatting. The example is clearer if one writes:

match {$distance :choice} {$distance :plural}
when <10   =0   {You have arrived}
when <10  one   {You have {$distance :number skeleton=0.0} mile to go}
when <10  *     {You have {$distance :number skeleton=0.0} miles to go}
when *    one   {You have {$distance :number skeleton=0} mile to go}
when *    *     {You have {$distance :number skeleton=0} miles to go}

... and imagine that e.g. Polish needs values like when <10 few, when * many, etc.

@macchiati
Member

I think you need a different example to make the case for :choice. Units are more complicated. In particular, unit support involves mapping the requested locale + unit + number + usage to formatted unit(s) and number(s). So you can have:

(en-US, 1.88, meter, person-height) => 6 ft 2 in
(xx, 1.88, meter, person-height) => 1.88 m
(yy, 1.88, meter, person-height) => 188 cm

(off-hand I don't recall which locales behave like xx and which like yy)

And the final unit also depends on the number, eg here are the current rules for roads:

https://github.com/unicode-org/cldr/blob/main/common/supplemental/units.xml#L437

And because the final unit depends on the number, the gender of the result also then depends on the number.

So the message should look more like:

match {$distance :plural}
when 0    {You have arrived}
when one   {You have {$distance :number skeleton=0} to go}
when *     {You have {$distance :number skeleton=0} to go}

You actually only need 2 message variants for English for this case. Of course, it would expand to up to 7 message variants for some languages (assuming the zero option was kept.)

ICU also has grammatical case information, and gender for units, so a fuller example might be:

match {$distance :plural} {$distance :gender}
when 0     * {You have arrived}
when one   * {You have {$distance :number skeleton=0} to go}
when *     * {You have {$distance :number skeleton=0} to go}

This would expand in languages with gender for units to be plural categories x unit genders, plus one for the 0 message variant.

@aphillips
Member

@macchiati I think we're talking past each other? The point of the example was that a UX designer (and thus developer) might want to control/change presentation based on a specific value. That is, going from 10 miles (not 10.1 miles) to 9.6 miles (with one decimal precision) at an arbitrary cutoff of 10.0. There might be a different cutoff in the same message to switch to smaller units (miles=>feet, km=>m) at a smaller distance. These (erm) choices are not about grammatical or plural matching or related to a locale's preferences. It's helpful if we don't require developers to code this into business logic:

if (distance.units < 0.5) {
   pattern = rb.getString("shortDistanceLeftPattern");
} else if (distance.units < 10.0) {
   pattern = rb.getString("lessThan10UnitPattern");
} else {
   pattern = rb.getString("moreThan10UnitPattern");
}

So my point is: there may be a need for selectors based on value comparison (not just explicit match/equality) to choose between presentational variations in a message. And this isn't "choice format".

@aphillips
Member

As mentioned in today's telecon (2023-09-18), closing old requirements issues.

8 participants