Require opt-in for search collation datagen #2708
Comments
That is, without an opt-in, collations whose |
The solution could be to make the |
Actually I think On the Rust API we should replace the |
Is our CLI part of the stability guarantees? If no, I'm comfortable with |
Adding an extra flag wouldn't be a breaking change anyway. Where we really backed ourselves into a corner is on the Rust API side, where we cannot easily provide functions to generate lists of locales (as we do for keys), because they would depend on data and keys. |
When it comes to whether or not to include extension keywords, I think it makes the most sense to just add the configurations as new options in datagen. The fixed-size locale list allows us to perform smarter locale filtering (#834) and it should remain the primary entry point, although it may have been more appropriate to make it a list of LanguageIdentifier instead of Locale.
My question is how easy it will be to remove a flag when we generalize the solution and won't need |
The intent behind We already have |
I think this one isn't only about having some flag to omit the search collations but also having ergonomics such that the option that seems the most natural to pick doesn't generate data that we don't have a (proper) API for (search/searchjl). Managing exclusion of other collations that are often unwanted but don't have a clear line like "we don't have an API for these" is a harder datagen UI problem. |
(Regarding "legacy": Arguably all "traditional" collations are legacy, but in terms of data size and in terms of someone perhaps wishing to use them, they aren't on the same level of data size issue as |
I'm okay if we exclude |
To me, this seems like a good way forward. |
Currently, if you follow the data management tutorial, you end up with data for the search collations.
(Steps to repro: run
icu4x-datagen --cldr-tag latest --icuexport-tag latest --out my-data --format dir --all-keys --all-locales
and then
find my-data/ | grep search
)
This is bad, because we don't have a search API, so the search collations are in practice useless extra data.
For Web browsers that want to keep exposing the search collations via a non-search API, for compatibility with the present state of the Web, there should be an opt-in to force the generation of search collation data. However, for just about everyone else, it's a data size bug to have the search collation data generated.
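To make the proposed default concrete, here is a rough sketch of the filtering behavior I have in mind. It is not the datagen implementation: `filter_locales`, `collation_keyword`, and the `include_search` flag are hypothetical names invented for illustration, and it assumes locales are given as plain strings with the collation carried in the `-u-co-` keyword (e.g. `fi-u-co-search`).

```rust
// Hypothetical sketch only: none of these names exist in icu4x-datagen.

/// Collation types we have no dedicated search API for.
const SEARCH_COLLATIONS: [&str; 2] = ["search", "searchjl"];

/// Returns the value of the `co` keyword in the `-u-` extension, if present.
/// This is a simplified parser for illustration, not a full BCP 47 parser.
fn collation_keyword(locale: &str) -> Option<&str> {
    let mut subtags = locale
        .split('-')
        .skip_while(|s| !s.eq_ignore_ascii_case("u"))
        .skip(1);
    while let Some(subtag) = subtags.next() {
        if subtag.len() == 1 {
            // Reached the next extension singleton; no `co` keyword found.
            return None;
        }
        if subtag.eq_ignore_ascii_case("co") {
            // Keyword values are longer than two characters.
            return subtags.next().filter(|v| v.len() > 2);
        }
    }
    None
}

/// Drops search collations unless the caller explicitly opts in.
fn filter_locales<'a>(locales: &[&'a str], include_search: bool) -> Vec<&'a str> {
    locales
        .iter()
        .copied()
        .filter(|loc| {
            let is_search = collation_keyword(loc).map_or(false, |co| {
                SEARCH_COLLATIONS.iter().any(|s| s.eq_ignore_ascii_case(co))
            });
            include_search || !is_search
        })
        .collect()
}

fn main() {
    let locales = ["fi", "fi-u-co-search", "und-u-co-searchjl", "de-u-co-phonebk"];
    // Default: search collations are omitted.
    println!("{:?}", filter_locales(&locales, false));
    // Opt-in (e.g. a Web browser): everything is kept.
    println!("{:?}", filter_locales(&locales, true));
}
```

The point of the sketch is only the default: the natural invocation omits search and searchjl, other collations pass through untouched, and a caller that needs the search data has to ask for it.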