Problem with icu_segmenter and ICU4X_DATA_DIR #4489

Closed
sffc opened this issue Dec 22, 2023 · 2 comments · Fixed by #4510
Labels
C-data-infra (Component: provider, datagen, fallback, adapters)
needs-approval (One or more stakeholders need to approve proposal)
T-bug (Type: Bad behavior, security, privacy)

Comments

@sffc
Member

sffc commented Dec 22, 2023

bakeddata-scripts/main.rs contains the following special case:

        if component == "segmenter" {
            // segmenter uses hardcoded locales internally, so fallback is not necessary.
            driver.clone().with_fallback_mode(FallbackMode::Hybrid)
        } else {
            driver.clone()
        }

However, when customers generate custom compiled data by invoking icu4x-datagen manually and then use it via ICU4X_DATA_DIR, this special case is not applied, so they end up with baked segmenter data that uses fallback. The problem shows up as the following compile error:

    failed to resolve: could not find `locid_transform` in `icu`
    could not find `locid_transform` in `icu`
    segmenter_dictionary_w_auto_v1.rs.data(2, 5246158): Actual error occurred here
    mod.rs(19, 1): consider importing this module: `use icu_locid_transform::fallback;
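A minimal sketch of the divergence, modeling only the fallback-mode decision. `FallbackMode` here is a hypothetical two-variant stand-in for the datagen enum, and `mode_via_bakeddata_scripts` / `mode_via_manual_datagen` are hypothetical names for the two entry points. It assumes the manual run is configured with a mode that emits fallback code, modeled as `Runtime`.

```rust
// Hypothetical two-variant stand-in for icu_datagen's FallbackMode; only the
// distinction that matters for this issue is modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FallbackMode {
    /// Assumed mode for a manual run: baked data that performs fallback.
    Runtime,
    /// The mode bakeddata-scripts/main.rs picks for the segmenter component.
    Hybrid,
}

/// Hypothetical model of the check in bakeddata-scripts/main.rs.
fn mode_via_bakeddata_scripts(component: &str, configured: FallbackMode) -> FallbackMode {
    if component == "segmenter" {
        // segmenter uses hardcoded locales internally, so fallback is not necessary
        FallbackMode::Hybrid
    } else {
        configured
    }
}

/// Hypothetical model of a manual icu4x-datagen run: nothing inspects the
/// component, so the configured mode is used unchanged.
fn mode_via_manual_datagen(_component: &str, configured: FallbackMode) -> FallbackMode {
    configured
}

fn main() {
    // Assumption: the manual run is configured with a fallback-emitting mode.
    let configured = FallbackMode::Runtime;
    let scripts = mode_via_bakeddata_scripts("segmenter", configured);
    let manual = mode_via_manual_datagen("segmenter", configured);
    println!("bakeddata-scripts: {:?}", scripts);
    println!("manual datagen:    {:?}", manual);
    // The two paths disagree for the segmenter component.
    assert_ne!(scripts, manual);
}
```

Because the check lives in the scripts rather than in datagen, every other entry point would have to repeat it to get the same segmenter output.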


sffc added the C-data-infra, needs-approval, and T-bug labels on Dec 22, 2023
@robertbastian
Member

Yeah, I noticed this in the Google3 import as well. We should special-case the segmenter keys in datagen to disable fallback. We could do this in a patch release; otherwise, in 2.0 we'll use aux keys anyway.
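One way to read this suggestion, sketched under assumptions: move the check into datagen and key it on the data key instead of the component name, so every entry point gets the same mode. `effective_fallback_mode` is a hypothetical helper, the `"segmenter/"` prefix test and the key strings are illustrative assumptions, and the override simply mirrors the `Hybrid` choice from bakeddata-scripts. The actual fix in #4510 may take a different form.

```rust
// Hypothetical two-variant stand-in for icu_datagen's FallbackMode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FallbackMode {
    Runtime,
    Hybrid,
}

/// Hypothetical helper living inside datagen: the segmenter override is
/// applied per data key, so bakeddata-scripts and a manual icu4x-datagen run
/// (e.g. one whose output is used via ICU4X_DATA_DIR) both get it.
fn effective_fallback_mode(key_path: &str, requested: FallbackMode) -> FallbackMode {
    // Assumption: segmenter keys can be recognized by a "segmenter/" path prefix.
    if key_path.starts_with("segmenter/") {
        // Same choice bakeddata-scripts makes for the segmenter component.
        FallbackMode::Hybrid
    } else {
        requested
    }
}

fn main() {
    let requested = FallbackMode::Runtime;
    // Illustrative key paths (assumed 1.x naming).
    for key in ["segmenter/dictionary/w_auto@1", "decimal/symbols@1"] {
        println!("{key}: {:?}", effective_fallback_mode(key, requested));
    }
}
```

Since the thread expects 2.0 to handle this with aux keys anyway, a rule like this would only matter for 1.x.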

@sffc
Member Author

sffc commented Jan 4, 2024

I'm in favor of making a special case for this in datagen in 1.5 as long as it isn't too much work.
