Data generation part of intro needs more clarity about icuexport #2677

Open
hsivonen opened this issue Sep 28, 2022 · 3 comments
Labels
C-data-infra Component: provider, datagen, fallback, adapters S-small Size: One afternoon (small bug fix or enhancement) T-docs-tests Type: Code change outside core library

@hsivonen
Member

Currently the text at https://github.com/unicode-org/icu4x/blob/main/docs/tutorials/intro.md#generating-data talks about CLDR and doesn't explain what ICU-exported data is. It should explain that ICU-exported data covers Unicode Database data and CLDR data for collation. That is, the reader shouldn't assume that all CLDR-originating data is bundled for ICU4X use via --cldr-tag.

@robertbastian
Member

Do you think the new tutorial does this better?

How much do you think the user needs to know? I'd say anything beyond --icuexport-tag=latest is pretty advanced.

@hsivonen
Member Author

hsivonen commented Oct 3, 2022

The new tutorial is much better. Thank you! Three points:

  1. It's not linked from https://github.com/unicode-org/icu4x/blob/main/docs/README.md
  2. The intro sentence suggests that icu_testdata might be appropriate for an app. Instead, it should make clear that icu_testdata is for demonstration only, since its set of locales is picked for the purpose of exercising interesting things.
  3. This is more of a datagen issue than a doc issue, but following what the doc says results in generating data for the search collations. This makes the export needlessly large, because we don't have a search API. In the absence of a search API, it's generally a bad idea to generate the search collation data. (Only a Web browser might want that data, and only for theoretical compatibility with the oddity that search collations are at present exposed via a non-search API.)

@hsivonen
Member Author

hsivonen commented Oct 3, 2022

Filed #2708 about the search collation data.

@sffc sffc self-assigned this Oct 17, 2022
@sffc sffc added T-docs-tests Type: Code change outside core library C-data-infra Component: provider, datagen, fallback, adapters S-small Size: One afternoon (small bug fix or enhancement) labels Oct 17, 2022
@sffc sffc added this to the ICU4X 1.x Untriaged milestone Oct 17, 2022