Fine-grained data filtering #953
Comments
Adding this to the backlog; when we have a client with a clear need for this, we should schedule work on this issue with that use case in mind.
I'm pulling this up into the 1.3 milestone, since we are getting close to wanting to release components that need this type of data slicing.

The easiest and most robust solution is to do what we've now done with Collator, Japanese Eras, Segmenter, and Locale Expander, which is to create multiple keys: one for core data and one for extended data. This has the advantage that it works automatically with data slicing, with no additional infrastructure needed. One downside of this approach is that we need to define rigid boundaries between the core and extended data. Another is that if we need many levels of granularity, we risk hurting the performance of the resulting formatter, because each key needs to be checked separately for the required data. But if we can establish a very good separation between core and extended, then this approach seems feasible.

The two upcoming components that need this are Currency Display Names and Locale Display Names. One way to make coarse slices for currency names would be to give display names to all currencies that are used in a particular locale, with all others falling back to the ISO code. It's a bit less clear how to make the coarse slices for locale display names (language, script, region, variants, and extensions).

Adding this to the discussion agenda.
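To make the core/extended split from the comment above concrete, here is a minimal sketch of a lookup that checks a core key, then an optional extended key, then falls back to the ISO code. It is an illustration only: `CurrencyNames`, `display_name`, `sample_names`, and the sample data are hypothetical and are not ICU4X APIs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for data loaded from two separately sliceable keys.
struct CurrencyNames {
    /// "Core" key: names for currencies used in the locale (assumed always present).
    core: HashMap<&'static str, &'static str>,
    /// "Extended" key: names for all other currencies; `None` if sliced away.
    extended: Option<HashMap<&'static str, &'static str>>,
}

impl CurrencyNames {
    /// Checks the core key, then the extended key, then falls back to the ISO code.
    /// Each key is consulted separately, which is the per-key cost noted above.
    fn display_name<'a>(&'a self, iso_code: &'a str) -> &'a str {
        if let Some(name) = self.core.get(iso_code).copied() {
            return name;
        }
        if let Some(name) = self
            .extended
            .as_ref()
            .and_then(|ext| ext.get(iso_code).copied())
        {
            return name;
        }
        iso_code
    }
}

/// Builds illustrative sample data; `with_extended = false` models a sliced build.
fn sample_names(with_extended: bool) -> CurrencyNames {
    let core = HashMap::from([("USD", "US Dollar")]);
    let extended = if with_extended {
        Some(HashMap::from([("JPY", "Japanese Yen")]))
    } else {
        None
    };
    CurrencyNames { core, extended }
}
```

With the extended key present, `sample_names(true).display_name("JPY")` resolves to the display name; with it sliced away, the same call returns the ISO code, and core lookups are unaffected.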
Discuss with: Optional:
Auxiliary keys are implemented, and there is a follow-up in #3907 to add filtering for them.
We'll track filtering here instead of in #3907
data_phases.md (#498) discusses the three phases of information: compile time, construction time, and format time. Currently, static data slicing (#948) is only capable of filtering based on the ResourceKey (compile time information). However, @iainireland has noted that it may be useful to filter ResourceOptions or data structs as well.
Some examples of potentially legitimate use cases:
Such fine-grained filtering is very tricky, because you risk removing data that has legitimate i18n value. For example, one might attempt to remove right-to-left support from an app launching in Spain, only to discover that there are communities in Spain that communicate using the Hebrew alphabet. Or, you might attempt to remove the Buddhist calendar from an app launching in Oklahoma, only to discover that Oklahoma City is home to 9 Buddhist temples.
I believe the best path forward for fine-grained filtering in ICU4X is to sandbox decisions into specific flags. We should start by identifying the use cases, and then add flags corresponding to those use cases that retain high-quality i18n behavior.
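One way to picture "sandboxing decisions into specific flags" is a closed set of named flags, each tied to a reviewed use case, instead of an arbitrary user-supplied predicate. The sketch below is only an assumption about what that could look like: `FilterFlag`, `Entry`, `keep`, `apply_flags`, and the key strings are hypothetical, and the single flag reuses the currency-name fallback idea from the discussion.

```rust
use std::collections::BTreeSet;

/// Hypothetical description of one exported data entry.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    key: &'static str,    // stand-in for a ResourceKey
    locale: &'static str, // stand-in for ResourceOptions
}

/// Hypothetical closed set of filtering flags. Each variant corresponds to one
/// identified use case, so callers cannot strip arbitrary data.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum FilterFlag {
    /// Drop extended currency names; formatters would fall back to ISO codes.
    DropExtendedCurrencyNames,
}

/// Returns true if the entry survives every enabled flag.
fn keep(entry: &Entry, flags: &BTreeSet<FilterFlag>) -> bool {
    flags.iter().all(|flag| match flag {
        FilterFlag::DropExtendedCurrencyNames => {
            entry.key != "currency/names/extended"
        }
    })
}

/// Applies the enabled flags to a list of entries, preserving order.
fn apply_flags(entries: &[Entry], flags: &BTreeSet<FilterFlag>) -> Vec<Entry> {
    entries.iter().filter(|e| keep(e, flags)).cloned().collect()
}
```

Because the enum is closed, adding a new kind of filtering means adding a variant, which gives a natural point to review whether the resulting behavior still has acceptable i18n quality.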
This issue is to track the design and implementation of fine-grained data filtering in ICU4X.