Support extended plural forms ("range based") #21
Comments
Asking localizers to pick the ranges may not work great for quality.
|
I think that for extended categories, it could be up to the author to define them. They would also probably have to take precedence over the CLDR ones when used, regardless of the target language. This is also why having some sort of "plural conversion" API or tool could make this type of functionality easier to use while keeping good quality. |
Then you ask a developer to pick numerical categories that work for every locale in the world. |
I feel we might be talking about different things - let me try to explain with examples to see if I am missing anything: An author writes the following string in
Whatever toolset we would have could know that this string is valid because it contains the two english plural categories: You want to translate in Simplified Chinese, which requires only the
Since it actually does not use plural you could even use a
To have this flexibility, you would need the right tools, and possibly linguists who are familiar with the syntax and can translate raw syntax. This would work the same way for languages that have more plural forms, like Arabic. Let me know if you see some examples where this would not work. |
How would your example be translated to languages with |
The way I see this (unless I misunderstand your question), the plural categories per language would not change: they would still have the same values for each language. The range extension would only be applied on top of the plural categories. This means that if you set ranges, you try to apply them first (which is why the message should be different from plural messages). If a range does not apply to a given value, it would then fall back on the plural rules for that language. Translations would work the same way. If you have Do we have examples where ranges would not work in certain languages? Also, I would expect ranges to be optional, just like explicit arguments. I see ranges as the equivalent of "advanced explicit arguments", if that makes sense. |
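The lookup order described in this comment (ranges first, plural rules as the fallback) can be sketched as follows. This is a minimal illustration under stated assumptions, not an existing API: the class `RangeFirstSelector`, its methods, and the `<10` / `10-20` key syntax are hypothetical, the exact-match step (`=0`) coming before ranges is an assumption borrowed from how ICU treats explicit values, and the one/other rule is hard-coded for integers instead of being read from CLDR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical sketch: none of these names come from ICU, i18next, or MessageFormat 2.
final class RangeFirstSelector {

    // Author-defined ranges, checked in declaration order (key syntax is an assumption).
    static final Map<String, IntPredicate> RANGES = new LinkedHashMap<>();
    // English variants taken from the candy example later in the thread.
    static final Map<String, String> ENGLISH = new LinkedHashMap<>();

    static {
        RANGES.put("<10", n -> n < 10);
        RANGES.put("10-20", n -> n >= 10 && n <= 20);

        ENGLISH.put("=0", "No candy left");
        ENGLISH.put("one", "Got # candy left");
        ENGLISH.put("<10", "Got a few candies left");
        ENGLISH.put("10-20", "Got a handful candies left");
        ENGLISH.put("other", "Got # candies left");
    }

    // Integer plural rule shared by English and German: 1 is "one", everything else "other".
    static String pluralCategory(int n) {
        return n == 1 ? "one" : "other";
    }

    // Exact match, then ranges, then the plural category, then "other".
    static String selectKey(int n, Set<String> keys) {
        String exact = "=" + n;
        if (keys.contains(exact)) {
            return exact;
        }
        for (Map.Entry<String, IntPredicate> range : RANGES.entrySet()) {
            // A range only applies if this translation actually defines it.
            if (keys.contains(range.getKey()) && range.getValue().test(n)) {
                return range.getKey();
            }
        }
        String category = pluralCategory(n);
        return keys.contains(category) ? category : "other";
    }

    // Resolves the variant and substitutes "#" with the number.
    // Assumes the variants always include an "other" entry.
    static String format(int n, Map<String, String> variants) {
        String pattern = variants.get(selectKey(n, variants.keySet()));
        return pattern.replace("#", Integer.toString(n));
    }
}
```

One consequence of applying ranges first: with these variants, a count of 1 matches `<10` before the `one` category is ever consulted, so `one` becomes unreachable. A translation that omits `10-20` simply falls through to its plural category for those values.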
I think here u're assuming the word
you'd have to explode that range out so translators can provide the correct form for literally every rule in that range, right? And that also has other assumptions structurally as well |
I'm quite familiar with the different plural forms from different languages and I'm still not 100% sure I understand what you are trying to explain. Maybe you are picturing scenarios where someone would concatenate a string with plurals? If I may try to illustrate:
In Serbian, this would become:
In most cases, you will have to remove the concatenation from the source. Even better, start using a full sentence in the source as well. This can be defined as one of our best practices, or, as @mihnita even suggested, we could enforce it. But I'm not sure I understand why extending plurals (or select) like this would require anyone to explode ranges? I think the proposal here is not to literally use the range as values (as defined in the plural rules) but just as an additional explicit selector to allow for extra fluency. |
A range can span multiple plural rules though, so 1 message to capture that wouldn't be correct. It'd have to be.
At that point idk if this is super useful. @mihnita's example is exploding the combination out. |
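To make the "a range can span multiple plural rules" concern concrete, here is a small sketch that lists which plural categories the integers in a range fall into. It uses Polish as the example language, with the CLDR cardinal rule for integers hard-coded (fractions, which map to `other`, are left out). The class and method names are hypothetical; a real implementation would read the rules from CLDR, for example through ICU.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; not part of any library.
final class RangeCategorySpan {

    enum Category { ONE, FEW, MANY }

    // Polish cardinal plural rule for non-negative integers, per CLDR.
    static Category polish(int n) {
        if (n == 1) {
            return Category.ONE;
        }
        int mod10 = n % 10;
        int mod100 = n % 100;
        boolean teens = mod100 >= 12 && mod100 <= 14;
        if (mod10 >= 2 && mod10 <= 4 && !teens) {
            return Category.FEW;
        }
        return Category.MANY;
    }

    // Collects every plural category hit by the integers in [from, to].
    static Set<Category> categoriesIn(int from, int to) {
        Set<Category> seen = EnumSet.noneOf(Category.class);
        for (int n = from; n <= to; n++) {
            seen.add(polish(n));
        }
        return seen;
    }
}
```

Under this rule, `categoriesIn(1, 9)` covers all three integer categories, while `categoriesIn(10, 20)` covers only `MANY`. A range variant that never displays the number can get away with one form, but one that reuses the number next to a noun may need a form per category it spans.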
@longlho ranges are optional: if a range makes no sense in another language, just don't define it there:
// english
{ count , plural ,
=0 {No candy left}
one {Got # candy left}
<10 {Got a few candies left}
10-20 {Got a handful candies left}
other {Got # candies left} }
// german
{ count , plural ,
=0 {Keine Süssigkeiten übrig}
one {Eine Süssigkeit übrig}
<10 {Wenige Süssigkeiten übrig}
other {# Süssigkeiten übrig} }
Might not be simple for translators, but this was asked a lot by developers using i18next. |
I think I might have finally understood @longlho's concern after giving this more thought. Let's presume you use this syntax as the source:
Now, this could be a problem in some language since you are using back the variable
I think this discussion is starting to sound a bit like this other one with @zbranieck. The more flexibility you provide, the trickier it is to ensure that the output will keep good linguistic quality both at the authoring and translation stages. |
hmm I don't think that's it. The word @jamuhl how do you make sure IMO this feature brings too many footguns for devs to make assumptions on how to declare their source string. |
@longlho my opinion - you can't enforce everything...should be part of the QA to review content in context (not always possible) but just removing a feature because you can't enforce something is like stopping writing javascript as you can't enforce your code is 100% correct. |
I'd at least not encourage that behavior by adding it to a standard. Its value is still rather questionable IMO since I haven't seen a need for it at Yahoo, Dropbox & react-intl |
I'm ok with that (it's for sure not the most needed feature) -> just adding it afterward outside of the standard will be a lot harder. Just a fast search on i18next: |
I think the rule of thumb should be that 'knobs' in the localizable message should be there ONLY if they are required for linguistic reasons. We should not use them in order to move stuff out of the code. Unless one can show that "in language X the range 1-10 should use 'handful', but language Y would do that for 1-8" then this is not a linguistic requirement. If the developer wants something like the example, they should create 3 different messages and select them in code. I've seen (for real!) stuff like this:
This is terrible, and has nothing to do with localization. The only thing that might be OK in this case, from the localization side, would be a way to group messages. |
I agree with @mihnita . See #21 (comment) |
You also convinced me :) in this same spirit, should we even allow |
If we allow for groups and custom (developer provided) formatters then yes, I think it was there in ICU just because there was no clean way to do custom formatters. The only reasons to keep it (that I can think of):
|
I don't know exactly what you have in mind for this, but it sounds a bit like how Fluent handles formatting. If we go that route then it will be important to bridge the link between the formatters and the translation. For example, if you change code that impacts a formatter, how do you make sure the linguist is aware and can update the translation if needed? Also the risk there, just like the |
That is definitely a risk. But if we don't have a standard way to do it (with the proper warnings and restrictions) then developers will find hacky ways. I've seen |
I have two comments on this:
Before I do, though, I have to call out that choice format was used (abused) to do plurals before plural formats existed and using choice format correctly is ultra rare in the wild. Most folks frown on choice format as anything other than backward-compatibility. The documentation we wrote at Amazon said basically "call the I18N team if you think you need one of these" 😀 However, there are a very few valid use cases. To use CF correctly, you need a case where the form of the message varies based on an absolute value. @nbouvrette's example was "<10" triggering a message like "you are down to your last # chances". One of the examples I found in the wild was for driving directions, in which the UX designer wanted to change the formatting for distances under a specific amount (e.g. "In 11 miles..." vs. What this might look like in MF2:
chair hat on: I think we should retitle this to something more appropriate. There is an open question about what goes into the default registry (and even whether we have a default registry). Otherwise this looks like it might be addressed |
I worry about having a 'choice' or range format because that was so subject to abuse. So I'd hesitate to make it one of the stock formatters. I haven't read all of this thread, but a couple of notes:
|
@macchiati I tend to agree. In fact, a better choice than pseudo-generic
There is a separate issue (which I just now retitled) for plural selection for a formatting range such as 3-7. The example uses
... and imagine e.g. Polish needs values like |
I think you need a different example to make the case for :choice. Units are more complicated. In particular, unit support involves mapping the requested locale + unit + number + usage to formatted unit(s) and number(s). So you can have: (en-US, 1.88, meter, person-height) => 6 ft 2 in (off-hand I don't recall which locales behave like xx and which like yy) And the final unit also depends on the number, eg here are the current rules for roads: https://github.com/unicode-org/cldr/blob/main/common/supplemental/units.xml#L437 And because the final unit depends on the number, the gender of the result also then depends on the number. So the message should look more like:
You actually only need 2 message variants for English for this case. Of course, it would expand to up to 7 message variants for some languages (assuming the zero option was kept). ICU also has grammatical case information, and gender for units, so a fuller example might be:
This would expand in languages with gender for units to be plural categories x unit genders, plus one for the 0 message variant. |
@macchiati I think we're talking past each other? The point of the example was that a UX designer (and thus developer) might want to control/change presentation based on a specific value. That is, going from
if (distance.units < 0.5) {
    pattern = rb.getString("shortDistanceLeftPattern");
} else if (distance.units < 10.0) {
    pattern = rb.getString("lessThan10UnitPattern");
} else {
    pattern = rb.getString("moreThan10UnitPattern");
}
So my point is: there may be a need for selectors based on value comparison (not just explicit match/equality) to choose between presentational variations in a message. And this isn't "choice format". |
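For readers who want to try the code-side selection from the snippet above, here is a self-contained sketch. The three key names come from the snippet; everything else is an assumption made for illustration: the class `DistanceMessages`, the in-memory `ListResourceBundle` standing in for translated resources, and the placeholder message texts (which use the abbreviation "mi" to sidestep pluralization).

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical sketch: message texts are invented placeholders, not from the thread.
final class DistanceMessages {

    // Stand-in for a translated bundle; each pattern is a complete, separately translatable message.
    static final class Bundle extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"shortDistanceLeftPattern", "Almost there"},
                {"lessThan10UnitPattern", "In {0,number,#.#} mi"},
                {"moreThan10UnitPattern", "In {0,number,integer} mi"},
            };
        }
    }

    // The value comparison lives in code, mirroring the if/else chain in the snippet.
    static String patternKey(double units) {
        if (units < 0.5) {
            return "shortDistanceLeftPattern";
        }
        if (units < 10.0) {
            return "lessThan10UnitPattern";
        }
        return "moreThan10UnitPattern";
    }

    // Looks up the pattern for the distance and formats it for en-US.
    static String describe(double units) {
        ResourceBundle rb = new Bundle();
        MessageFormat mf = new MessageFormat(rb.getString(patternKey(units)), Locale.US);
        return mf.format(new Object[] {units});
    }
}
```

The thresholds here are invisible to translators, which is the trade-off under discussion: a comparison-based selector inside the message would expose them, at the cost of more syntax to review.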
As mentioned in today's telecon (2023-09-18), closing old requirements issues. |
It would be great to define, beside the exact number match cases (=0), some range cases, so that a phrase like "Only a few items are left!" can be defined beside the other options.
Example (not definitive syntax!):
Note:
I never had the need myself, but it was requested often enough by developers using i18next (to the point that we finally added support for them).
See previous comments: