Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add textInfo / script metadata API to determine directionality of locale #3172

Closed
hydroper opened this issue Mar 5, 2023 · 11 comments · Fixed by #3474
Closed

Add textInfo / script metadata API to determine directionality of locale #3172

hydroper opened this issue Mar 5, 2023 · 11 comments · Fixed by #3474
Assignees
Labels
C-unicode Component: Props, sets, tries good first issue Good for newcomers help wanted Issue needs an assignee S-medium Size: Less than a week (larger bug fix or enhancement) T-core Type: Required functionality U-ecma402 User: ECMA-402 compatibility

Comments

@hydroper
Copy link

hydroper commented Mar 5, 2023

I searched for "dir", "direction", "ltr" and "rtl" in the icu crate docs and found nothing. I'm looking for something like EcmaScript's Intl.Locale.prototype.textInfo.direction. Is it available anywhere?

@hydroper
Copy link
Author

hydroper commented Mar 7, 2023

I did this for the meanwhile:

/// Represents a language's reading direction.
#[derive(Copy, Clone, PartialEq)]
pub enum TextDirection {
    Ltr,
    Rtl,
}

pub trait HasTextDirection {
    fn text_direction(&self) -> TextDirection;
}

impl HasTextDirection for locale::Locale {
    fn text_direction(&self) -> TextDirection {
        self.id.text_direction()
    }
}

impl HasTextDirection for locale::LanguageIdentifier {
    fn text_direction(&self) -> TextDirection {
        self.language.text_direction()
    }
}

impl HasTextDirection for locale::subtags::Language {
    fn text_direction(&self) -> TextDirection {
        match self.as_str() {
            | "ar" | "ara"
            | "arc"
            | "az" | "aze"
            | "dv" | "div"
            | "he" | "heb"
            | "ku" | "kur"
            | "fa" | "per" | "fas"
            | "ur" | "urd"
                => TextDirection::Rtl,
            _ => TextDirection::Ltr,
        }
    }
}

@robertbastian
Copy link
Member

robertbastian commented Mar 9, 2023

This is not currently available in ICU4X.

Note that the text direction is a property of the script, not the language. Hence your implementation should look more like:

impl HasTextDirection for locale::Locale {
    fn text_direction(&self) -> TextDirection {
        let maximized = self.clone();
        LocaleExpander::try_new_unstable(...).unwrap().maximize(&mut maximized);
        self.id.script.unwrap().text_direction()
    }
}

impl HasTextDirection for locale::subtags::Script {
    fn text_direction(&self) -> TextDirection {
        match self.as_str() {
            "Adlm" | "Arab" | "Armi" | "Avst" | "Chrs" | "Cprt" | "Elym" | "Hatr" | "Hebr" | "Hung" | "Khar" | "Lydi" | "Mand" | "Mani" | "Mend" | "Merc" | "Mero" | "Narb" | "Nbat" | "Nkoo" | "Orkh" | "Ougr" | "Palm" | "Phli" | "Phlp" | "Phnx" | "Prti" | "Rohg" | "Samr" | "Sarb" | "Sogd" | "Sogo" | "Syrc" | "Thaa" | "Yezi"
                => TextDirection::Rtl,
            _ => TextDirection::Ltr,
        }
    }
}

The list of scripts is looked up in https://github.com/unicode-org/cldr-json/blob/main/cldr-json/cldr-core/scriptMetadata.json

@zbraniecki
Copy link
Member

In unic-locale I generated the following table: https://github.com/zbraniecki/unic-locale/blob/master/unic-langid-impl/src/layout_table.rs using the following script: https://github.com/zbraniecki/unic-locale/blob/master/unic-langid-impl/src/bin/generate_layout.rs

It looks through all locales and finds languages that consistently have the same script and marks them for RTL, and then Scripts that are always RTL.

@markusicu
Copy link
Member

The way to do this properly:

  1. Get the script subtag for the locale ID
    1. If the locale ID has a script subtag, return that
    2. Else, use the likely subtags to determine the script subtag and return it
    3. If there is no likely-subtags data for the locale ID, then return Zzzz (unknown script)
  2. Return whether the script is LTR or RTL

@sffc sffc changed the title Locale direction? Add script metadata API to determine directionality of locale Mar 10, 2023
@sffc
Copy link
Member

sffc commented Mar 10, 2023

Thanks @hydroper for raising this issue and everyone in this thread for the pointers. I changed the title of this issue to focus on the feature we don't currently have (map from script to text direction).

@sffc sffc added good first issue Good for newcomers T-core Type: Required functionality C-unicode Component: Props, sets, tries S-medium Size: Less than a week (larger bug fix or enhancement) U-ecma402 User: ECMA-402 compatibility labels Mar 10, 2023
@sffc sffc changed the title Add script metadata API to determine directionality of locale Add textInfo / script metadata API to determine directionality of locale Mar 10, 2023
@sffc sffc added the help wanted Issue needs an assignee label Mar 30, 2023
@robertbastian robertbastian added the discuss Discuss at a future ICU4X-SC meeting label May 16, 2023
@sffc sffc added discuss-priority Discuss at the next ICU4X meeting and removed discuss Discuss at a future ICU4X-SC meeting labels May 25, 2023
@sffc
Copy link
Member

sffc commented May 25, 2023

Discuss with:

@robertbastian
Copy link
Member

Need to discuss location:

  • It's somewhat an operation on a locale, but the name of the locale-with-data crate is icu_locid_transform, where it doesn't really fit in
    • We will need to call maximize, which is in this crate
  • Exemplar character are in icu_properties, so it could be there

@robertbastian
Copy link
Member

^ I think because it's tightly coupled with likely subtags, locid_transform is the right place.

@sffc
Copy link
Member

sffc commented Jun 2, 2023

Previous discussion on the crate naming: #2214 (comment)

"transform" doesn't describe this functionality, but it fits in with the "locale with data" nature of the crate. Names like icu_locid_manipulation and icu_locid_adapters were floated, but they didn't get enough support, so we went with icu_locid_transform.

In addition to directionality, here are some other locale-centric data operations that will need a home:

  • Locale display names (currently in its own experimental crate)
  • Locale preferences for things like measurement system, currency, time zone

I lean toward putting this in its own new experimental crate. Even if we put it in icu_locid_transform, we should let it incubate behind an experimental Cargo feature for a while.

@skius
Copy link
Member

skius commented Jun 6, 2023

Given that we currently need private methods of LocaleExpander (to get the likely script), I think it should stay in the same crate.

@sffc
Copy link
Member

sffc commented Jun 8, 2023

  • @robertbastian - I think we should use crates to manage dependencies. None of these data-driven locale algorithms add depdencies; just data. icu_datetime is an example of a crate with a lot of loosely related algorithms and data in one big crate.
  • @sffc - I can see an argument to add at least the directionality stuff to locid_transform: it is required for the Intl Locale Info proposal. But I'm hesitant to add display names because it is very heavy in data, and I think icu_locid_transform should be a fairly lightweight crate because I expect it to be widely used.
  • @younies - It's very confusing to have the crate name not match the APIs that are in the crate.
  • @robertbastian - We could rename the crate in 2.0
  • @echeran - I think the crate name can live until 2.0
  • @skius - It seems like the related functionality should stay together
  • @younies - Naming is very important; we don't want to repeat mistakes
  • @sffc - I agree on the importance of naming. I also don't think we should use icu_datetime as a prior art; everything in that crate shares code paths. These are very separate pieces of functionality. Could we make a new crate called icu_locid_metadata?
  • @robertbastian - I can be convinced that display names should be a separate crate. But I don't want to have a bunch of tiny crates for tiny pieces of functionality
  • @echeran - We should think about formalizing the guidelines about how to draw lines between what goes into different crates.
  • @skius - icu_locid_metadata makes sense as the crate for expander and directionality.
  • @sffc - This crate discussion will come up again with number formatting.

Conclusion:

  • Land this right now as experimental in icu_locid_transform
  • Plan to make directionality non-experimental before 2.0 in the icu_locid_transform crate
  • In 2.0, make a new crate called icu_locid_metadata and stabilize at least the directionality code and likely the expander/canonicalizer into that new crate

LGTM: @sffc @robertbastian @echeran @skius @younies

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-unicode Component: Props, sets, tries good first issue Good for newcomers help wanted Issue needs an assignee S-medium Size: Less than a week (larger bug fix or enhancement) T-core Type: Required functionality U-ecma402 User: ECMA-402 compatibility
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants