Support custom / pluggable "formatters" (beside Date/Time/Number ...) #22
Comments
Is this supposed to refer to a comment that the translator would implement, or some sort of extensible markup that a custom renderer would handle (and a default renderer would ignore)? |
@Fleker not sure. In the case of i18next, the developer specifies the custom format code, so I guess the second of your assumptions: markup that a custom renderer would handle (and a default renderer would ignore). I would just replace the word "renderer" with "formatter", as nothing gets rendered. |
Wouldn't HTML markup support encompass this already? Having a function that might modify specific words arbitrarily isn't necessarily safe IMO, since it might change the context of the sentence completely. |
We have a number of "custom" builtins in Fluent and a number of requests for more. Custom functions are fairly often environment specific. (we're talking about specifying their behavior better)
Firefox/Gecko has a lot of per-platform strings. They're either messages that have a different value depending on the platform, or accesskeys that differ per platform, etc.
let PLATFORM = () => {
switch (AppConstants.platform) {
case "linux":
case "android":
return AppConstants.platform;
case "win":
return "windows";
case "macosx":
return "macos";
default:
return "other";
}
};
new FluentBundle(locale, { functions: { PLATFORM } });
and in Rust it looks like this:
#[derive(Debug)]
#[repr(C)]
pub enum FluentPlatform {
Linux,
Windows,
Macos,
Android,
Other,
}
bundle.add_function("PLATFORM", |_args, _named_args| {
match crate::ffi::FluentBuiltInGetPlatform() {
FluentPlatform::Linux => "linux".into(),
FluentPlatform::Windows => "windows".into(),
FluentPlatform::Macos => "macos".into(),
FluentPlatform::Android => "android".into(),
FluentPlatform::Other => "other".into(),
}
}).expect("Failed to add a function to the bundle.");
and then localizers can do:
enable-password-sync-notification-message =
{ PLATFORM() ->
[windows] Want your logins everywhere you use { -brand-product-name }? Go to your { -sync-brand-short-name } Options and select the Logins checkbox.
*[other] Want your logins everywhere you use { -brand-product-name }? Go to your { -sync-brand-short-name } Preferences and select the Logins checkbox.
}
navbar-tooltip-instruction =
.value = { PLATFORM() ->
[macos] Pull down to show history
*[other] Right-click or pull down to show history
}
profiles-opendir =
{ PLATFORM() ->
[macos] Show in Finder
[windows] Open Folder
*[other] Open Directory
}
findbar-highlight-all2 =
.label = Highlight All
.accesskey = { PLATFORM() ->
[macos] l
*[other] a
}
.tooltiptext = Highlight all occurrences of the phrase
And in some locales, the localizers may not have a distinct term for Preferences in the macOS HIG, so they'd just do:
enable-password-sync-notification-message = Want your logins everywhere you use { -brand-product-name }? Go to your { -sync-brand-short-name } Preferences and select the Logins checkbox.
Example: source in en-US with custom selector, and Italian translation and Czech without it.
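To make the point above concrete, here is a minimal sketch of how a runtime could resolve such messages. It is not the Fluent implementation: the message shape (`selector`, `variants`, `default`), the `resolve` helper, and the hard-coded `PLATFORM` stand-in are all assumptions made for illustration. It only shows that a locale with a selector and a locale with a plain string can be resolved by the same code path, without the developer knowing which is which.

```javascript
// Hypothetical, simplified model of a select expression (not the Fluent API).
// A message is either a plain string or { selector, variants, default }.
const customFunctions = {
  // Stand-in for the PLATFORM() custom function; hard-coded for this sketch.
  PLATFORM: () => "macos",
};

function resolve(message) {
  if (typeof message === "string") {
    return message; // this locale did not need any variants
  }
  const key = customFunctions[message.selector]();
  const hasVariant = Object.prototype.hasOwnProperty.call(message.variants, key);
  // Fall back to the default variant when no variant matches the key.
  return hasVariant ? message.variants[key] : message.variants[message.default];
}

// A locale that distinguishes platforms.
const withSelector = {
  selector: "PLATFORM",
  variants: {
    macos: "Show in Finder",
    windows: "Open Folder",
    other: "Open Directory",
  },
  default: "other",
};

// A locale that uses one wording everywhere (placeholder text).
const withoutSelector = "Open Directory";

console.log(resolve(withSelector));    // "Show in Finder"
console.log(resolve(withoutSelector)); // "Open Directory"
```

The calling code is identical for both locales; only the localized resource differs.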
Fluent has the concept of terms that in some locales have genders and then can be used as selectors. Here's an example of a string in English:
search-results-help-link = Need help? Visit <a data-l10n-name="url">{ -brand-short-name } Support</a>
and equivalent in Czech:
search-results-help-link =
Potřebujete pomoc? Navštivte <a data-l10n-name="url">Podporu { -brand-short-name.gender ->
[masculine] { -brand-short-name(case: "gen") }
[feminine] { -brand-short-name(case: "gen") }
[neuter] { -brand-short-name(case: "gen") }
*[other] aplikace { -brand-short-name }
}</a>
(I'd prefer each variant to contain the whole sentence, but ignore that for the sake of this conversation).
key = { TONE() ->
[formal] ...
*[informal] ...
}
This one should be handled by
greetings = { TIME_OF_THE_DAY() ->
[morning] Good morning
*[other] Hello
}
floor-msg = { $level ->
[-2] on basement floor B{ $level }
[-1] on basement floor B
[0] on ground floor
[one] on floor 2
*[other] on floor { ADD($level, 1) }
}
photo-msg = { LIST($names) } liked your photo.
reload-desc = { SCREEN_WIDTH() ->
[narrow] Warn me before redirect or reload.
*[wide] Warn me when websites try to redirect or reload the page.
}
Here we were playing with the idea of enabling responsive localization, much like responsive CSS today, which would allow localizers in locales where it matters (say, German, while Chinese doesn't need it) to specify different variants depending on the available space and let Fluent adapt: Demo video
These are just examples from our production and issues filed in |
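As a rough sketch of what the `ADD` and `LIST` functions used in the examples above might do, the following uses plain JavaScript and the standard `Intl.ListFormat` API. The function signatures here are simplified assumptions for illustration and are not the calling convention of any Fluent implementation.

```javascript
// Hypothetical implementations of the ADD and LIST custom functions.
const sketchFunctions = {
  // ADD($level, 1): plain numeric addition.
  ADD: (a, b) => a + b,
  // LIST($names): locale-aware list formatting via the standard Intl API.
  LIST: (items, locale = "en") =>
    new Intl.ListFormat(locale, { style: "long", type: "conjunction" }).format(items),
};

const floor = sketchFunctions.ADD(5, 1);
console.log(`on floor ${floor}`); // "on floor 6"

const names = sketchFunctions.LIST(["Anna", "Bob", "Chris"]);
console.log(`${names} liked your photo.`); // "Anna, Bob, and Chris liked your photo."
```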
I think it depends on which problems we would like to solve. If our focus is on linguistic issues, then I do see the value for "formatters" (or cases), but I think it would be important to predefine them and even simplify them wherever possible. But this seems more of an inflection discussion. For example, you could use this syntax to automatically format "A or An" based on the value of the variables:
Now if we are talking about non-linguistic examples:
I think that if we don't use this type of feature to focus on linguistic problems, we might end up losing translation memory leverage and incurring higher translation costs. |
How would you resolve that via CSS? If the localizer needs capitalization of a word, how would they communicate it to the CSS?
Again, how would you resolve it in code logic if 90 locales don't need a per-platform selector and one locale does?
Similar to the previous one. How would you let locales that need variants provide them, without requiring all locales to provide them? |
You are right that it would be impossible for the linguist to do this on their end, but I see only 4 practical scenarios where this could apply:
<div class="what">You are <em>{what}</em>.</div>
html[lang=en-gb] .what em {
text-transform: uppercase;
}
html[lang=en-us] h1 {
text-transform: capitalize;
}
But, I do think there is something to explore with capitalization. I would just recommend documenting good practical use cases upfront (and maybe even guidelines that we could re-use later in a document) before implementing such a feature, otherwise, it could also end up being used for the wrong reasons. |
But if I understand correctly, taking the platform example, you are saying that a platform (operating system) would have completely different behavior in 1 locale only? Do you have an example for this? Unless I missed something, I could not find it in your original example and I'm having a hard time picturing it. The way I see this (and I don't have a lot of personal experience with this scenario), you would possibly have 1 string per platform when it comes to a platform-related topic. For example:
showString(`opendir-{PLATFORM}`);
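Spelled out a little further, the one-key-per-platform approach could look like the sketch below. The string table, the `showString` signature, and the platform names are assumptions for illustration only. Note that the lookup only succeeds if the locale has defined a string for that platform.

```javascript
// Hypothetical per-locale string table with one key per platform.
const strings = {
  "en-US": {
    "opendir-macos": "Show in Finder",
    "opendir-windows": "Open Folder",
    "opendir-linux": "Open Directory",
  },
};

// Illustrative helper: looks up a key and fails loudly when it is missing.
function showString(locale, key) {
  const value = strings[locale][key];
  if (value === undefined) {
    throw new Error(`Missing string: ${key}`);
  }
  return value;
}

const PLATFORM = "macos"; // would come from the runtime in a real application
console.log(showString("en-US", `opendir-${PLATFORM}`)); // "Show in Finder"
```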
You are right, TMSes expect symmetrical input/output keys when translating strings which is why Fluent's Multi-variant Message can be quite powerful. The main challenges I see around it, as it works today (having little experience using the syntax):
<p class="narrow">{reload-desc-narrow}</p>
<p class="wide">{reload-desc-wide}</p>
.wide {
display: none;
}
@media (min-width: 30rem) {
.wide {
display: initial;
}
.narrow {
display: none;
}
}
You could have potential repetition or unused strings in some languages, but the solution is simple and requires no markup while fitting nicely within existing TMSes with good translation memory leverage.
|
Sure.
one-off in locale X:
The result is that a localizer can select a different shortcut for a given platform if needed, without requiring developers to alter the code and/or imposing the burden of managing a Linux-specific string on all locales.
Your example requires that all locales provide all four strings so that some of them can use some of the variants.
The example solution you're providing requires all locales to provide narrow/wide variants, while only several may need it.
I think our conversation boils down to an observation that drove Fluent's design - the factors that impact the ability to produce a high-quality translation of a translation unit differ per locale. Fluent is heavy on the latter side because we wanted to offer the flexibility while minimizing the "leaking" of complexity from one locale to another or from a locale to a developer. Historically at Mozilla we used an approach similar to the one you're giving, and gettext also used that (if any locale needs a plural, all locales provide a plural). For example, it's very easy to see why pluralization is the only well-addressed variant selection mechanism if you observe that English has pluralization, but not declension, and gender cases are limited, etc.
Fluent uses a concept of separation of concerns - a developer should never have to make decisions about the localization, and if there's any scenario which a localizer wants to solve for their locale, it should not impose any complexity on another. I recognize that this is a particular position and one can take another. I'd only argue that the examples you provide are not really scalable and should not be considered a "solution". But I don't believe only linguistic issues should be solvable via a localization system. Custom selectors and formatters provide functionality that is developer-independent and doesn't leak across locales. |
Thanks @zbraniecki for the extra context. I think it helps (at least for me) to understand better the strategy behind Fluent. There are still some areas for me that are not clear and I think we could be able to break down each approach into pros and cons to have a better picture. We are getting away from the original topic of this thread, I don't know if we should start a new one? Some observations so far, let's imagine we try to break this into 2 schools of thought: Focused on linguistic problems (just made up a name for the approach I was proposing)
Full flexibility (this is the best name I could come up with for Fluent)
But I'm still having a hard time understanding how Fluent can solve some of the issues in the method I was proposing. For example, the linguist needs to know what the Isn't it the same, or even maybe more complex than coming up ahead of time, knowing which platform you support and having 1 string for each when you author the string? If a developer adds a new platform, they should also remember to update the related strings. In a continuous localization setup, this would automatically trigger new localization requests.
I'm also curious about how this can fit in big commercial TMSes. Do you have details on this, or maybe you had something else in mind? To me, this is also a very important point to consider when talking about scalability. |
So, let's say I can't determine the platform, or I add a new one (maybe, opendir-web). What is the best translation out of the four to use as a default if a specific translation isn't available? For many reasons, the different platforms might have the optimal default translations. For example, font in Spanish is translated as either the historically-correct tipo (de letra) used by Macs, or Microsoft's Spanglish fuente. In a general purpose program, I'd probably go with fuente as the default even though I personally want to barf when I see it. But in a publishing program... the traditional is a more acceptable default. The programmer is not qualified to know which one is the best default, and besides the fact that there's no great way to put that kind of logic into code, the programmer really shouldn't have to worry about which strings have platform differences. That's the whole idea of Fluent: let the programmer focus on programmer stuff, and let the localizer focus on localizer stuff. Sure, it can make localization more complex, but as tools start developing around it, things should get much easier on the localizer for the complex stuff. I can already imagine there being a fairly extensive set of terms that handle some of the most common intra-language issues, and a tool automatically signalling to the translator that, for instance, they probably shouldn't hardcode ordenador, but suggest instead the term |
You are right, Fluent is quite powerful (probably the most powerful syntax around). But as you mentioned, the integration with existing tools is still in progress. One of the big challenges is that traditionally, source assets cannot be modified by translation management systems. This means that regardless of the syntax, the linguist will need to be involved during authoring, or there are major changes that will need to happen in existing tools. But even if you could modify the source to add new "variants" of a string, how will the code use it? Unless it's fully self-contained and used within other strings, there is some collaboration that needs to happen, regardless of the solution, to get optimal localization. The big challenge I see ahead is when you start mixing non-linguistic problems (like white labeling, OSes or even A/B testing) with a linguistic solution - where do we draw the line? |
The whole idea of a registry is basically addressing this issue. Closing this as "addressed". Open a new specific issue if you think something is missing. |
Is there an issue for which the functions should be in the standard registry? That should go beyond what ICU has as part of MF1.0, eg CurrencyAmount (currency+number), Measure (unit+number+usage), etc. |
@macchiati The short answer is "no, I don't think so", although I think there is an issue that tracks MFv1 compatibility (#361 ) including this comment from me. We have an agenda item for 2023-07-03 to discuss the registry, starting with "will we have a standard one?" I believe that we should have a standard registry and that it should go beyond what is in MF1 to embrace other formatters. An important question is "what criteria should be applied to inclusion in the standard registry?", since implementations would be required to provide the items in the standard registry with, presumably, the options specified there. There is some debate about options. @eemeli (and others) have expressed a desire to use JS's array of options. Others such as @mihnita and myself prefer skeletons for certain operations that have them. There exist mappings (and implementation support) to move between these, and compromises (including allowing for implementation-specific extension, e.g. ICU4J might support skeletons as an extension to standardized option bags). Separately there are two add-on opportunities: (1) implementation-specific registry additions (e.g. ECMA-402 might add JS specific options to e.g. In any case, I'd like us to break out specific items rather than having a giant "registry" issue with everything in it 🙈 |
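For readers unfamiliar with the "option bag" style mentioned above, this is roughly what it looks like in ECMA-402 today, using the standard `Intl.NumberFormat` API for a currency amount. This is only an illustration of the style, not a proposal for what the registry should contain; the skeleton form mentioned in the comment is given from memory of ICU's number skeleton syntax and should be treated as an assumption.

```javascript
// Option-bag style: formatting behavior is described by named options.
const optionBag = { style: "currency", currency: "USD" };
const formatted = new Intl.NumberFormat("en-US", optionBag).format(12.5);

console.log(formatted); // "$12.50"

// Assumption: an ICU number skeleton would express a similar request as a
// single string (something like "currency/USD") instead of named options.
```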
While having the built-in formatters (numbers, dates, lists, relative dates) is awesome, sometimes there is a need to define a custom format (as simple as lowercasing, uppercasing, ...)
Example:
"You are {what, uppercase}."
i18next uses a function of the form:
(value, format, lng, options) => value
to allow those (https://github.com/i18next/i18next/blob/master/src/defaults.js#L55)
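A minimal sketch of a formatter with that shape, handling the `uppercase` example above, might look like this. The `interpolate` helper and its `{name, format}` placeholder parsing are hypothetical and written only for this illustration; they are not i18next's interpolation code.

```javascript
// Custom formatter with the (value, format, lng, options) => value shape.
function format(value, formatName, lng, options) {
  if (typeof value !== "string") return value;
  switch (formatName) {
    case "uppercase":
      return value.toLocaleUpperCase(lng);
    case "lowercase":
      return value.toLocaleLowerCase(lng);
    default:
      return value; // unknown or missing format: leave the value unchanged
  }
}

// Hypothetical interpolator for "{name}" and "{name, format}" placeholders.
function interpolate(template, values, lng) {
  return template.replace(/\{(\w+)(?:,\s*(\w+))?\}/g, (match, name, formatName) => {
    if (!(name in values)) return match; // keep unknown placeholders as-is
    return String(format(values[name], formatName, lng, {}));
  });
}

const message = interpolate("You are {what, uppercase}.", { what: "awesome" }, "en");
console.log(message); // "You are AWESOME."
```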