known_locale_names/0 doesn't seem to consider locales configured via Gettext #250

Open
andreyuhai opened this issue Jan 16, 2025 · 11 comments

Comments

@andreyuhai

andreyuhai commented Jan 16, 2025

Context

We recently noticed that we were getting "No configured locale could be matched to [...]" errors when using best_match/2, even though we have Gettext locales configured for the locales we want best_match/2 to match against:

iex(7)> ProfessionalProfiles.Cldr.known_gettext_locale_names()
["ar-SA", "bg-BG", "cs-CZ", "da-DK", "de-DE", "el-GR", "en", "en-CA", "en-GB",
 "es-ES", "fi-FI", "fr-CA", "fr-FR", "hr-HR", "hu-HU", "it-IT", "ja-JP",
 "lt-LT", "nb-NO", "nl-NL", "pl-PL", "pt-BR", "pt-PT", "ro-RO", "ru-RU",
 "sl-SI", "sv-SE", "uk-UA", "vi-VN"]

However, we are not seeing those picked up by known_locale_names/0:

iex(8)> ProfessionalProfiles.Cldr.known_locale_names()
[:ar, :"ar-SA", :en, :"en-CA", :"en-GB", :fr, :"fr-CA", :pt, :"pt-PT"]

hence this fails

iex(11)> ProfessionalProfiles.Cldr.AcceptLanguage.best_match("pl-PL")
{:error,
 {Cldr.NoMatchingLocale, "No configured locale could be matched to \"pl-PL\""}}

Here is our backend configuration

defmodule ProfessionalProfiles.Cldr do
  use Cldr,
    default_locale: "en",
    gettext: ProfessionalProfiles.Gettext,
    locales: [],
    force_locale_download: false,
    providers: [],
    otp_app: :professional_profiles
end

We are using ex_cldr version 2.40.1.

@kipcole9
Collaborator

kipcole9 commented Jan 17, 2025

@andreyuhai, thanks for the clear issue description. I know the source of the bug and will work on a fix over the weekend.

Basically, the configuration code only matches the full locale name (albeit case-insensitively, and treating "_" and "-" as equivalent separators).

For example, you have a Gettext locale ro-RO. There is no match for that in CLDR, where it would be just ro, which is by definition the same as ro-RO (Romania having the largest population of native Romanian speakers).

Therefore I need to add matching code as follows:

If the Gettext language code (e.g. "ro") matches a locale in CLDR, and the default territory of that CLDR locale is the same as the Gettext territory ("RO"), then configure the locale "ro" in CLDR.
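The rule above could be sketched roughly as follows. This is only an illustration, not the ex_cldr implementation: the module name, the function name and the hand-written `@default_territory` map are all hypothetical, and in practice the default territory would come from CLDR data rather than a literal map.

```elixir
defmodule GettextToCldrSketch do
  # Hypothetical excerpt of "default territory per language" data.
  # Assumption: real code would derive this from CLDR, not hard-code it.
  @default_territory %{"ro" => "RO", "pl" => "PL", "pt" => "BR", "en" => "US"}

  # Returns the CLDR locale name to configure for a Gettext locale name,
  # or nil when the rule does not apply.
  def cldr_name_for(gettext_name, cldr_names) do
    case String.split(gettext_name, ["-", "_"], parts: 2) do
      [language, territory] ->
        language = String.downcase(language)

        # Only collapse "ll-CC" to "ll" when CC is the default territory for ll.
        if language in cldr_names and
             Map.get(@default_territory, language) == String.upcase(territory) do
          language
        end

      _other ->
        # No territory component, so there is nothing to collapse.
        nil
    end
  end
end
```

Under these assumptions, `GettextToCldrSketch.cldr_name_for("ro_RO", ["ro", "pl", "pt"])` returns `"ro"`, while `"pt-PT"` returns `nil` because PT is not the default territory for `pt`.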

I'm a bit surprised you aren't seeing compile time warnings though. Are you not seeing (for example):

note: The locale "ro-RO" is configured in the MyApp.Gettext gettext backend but is unknown to CLDR. It will not be used to configure CLDR but it will still be used to match CLDR locales to Gettext locales at runtime.

when you compile? Perhaps try `mix compile --force` and check?

@andreyuhai
Author

Thank you for the help @kipcole9 ! 🙇

> I'm a bit surprised you aren't seeing compile time warnings though. Are you not seeing (for example):

I actually had it, I just missed it with all the debugging text printed.

note: The locales ["bg_BG", "cs_CZ", "da_DK", "de_DE", "el_GR", "es_ES", "fi_FI", "fr_FR", "hr_HR", "hu_HU", "it_IT", "ja_JP", "lt_LT", "nb_NO", "nl_NL", "pl_PL", "pt_BR", "ro_RO", "ru_RU", "sl_SI", "sv_SE", "uk_UA", "vi_VN"] are configured in the ProfessionalProfiles.Gettext gettext backend but are unknown to CLDR. They will not be used to configure CLDR but they will still be used to match CLDR locales to Gettext locales at runtime.

@kipcole9
Collaborator

Excellent, that confirms my suspicion of the issue. For all of those locales, the region code is the default for the given language code, which means that in CLDR the region code is omitted.

I will fix this over the weekend and aim to have a new release by Monday morning your time.

@kipcole9
Collaborator

Well, I missed the "Monday morning your time" target, I'm afraid. Travel got in the way, but I'll get it done ASAP.

@andreyuhai
Author

That's fine no worries @kipcole9, I appreciate it! 🙇

@andreyuhai
Author

andreyuhai commented Jan 20, 2025

One thing I am curious about though, would the solution to this also handle the case shown below?

iex(43)> Cldr.known_gettext_locale_names()
["pl-PL"]

iex(45)> Cldr.AcceptLanguage.best_match("pl-US") |> elem(1) |> IO.inspect(structs: false)
%{
  script: :Latn,
  extensions: %{},
  __struct__: Cldr.LanguageTag,
  locale: %{},
  backend: ProfessionalProfiles.Cldr,
  language: "pl",
  transform: %{},
  cldr_locale_name: :pl,
  requested_locale_name: "pl-US",
  canonical_locale_name: "pl-US",
  territory: :US,
  language_variants: [],
  gettext_locale_name: nil,  # Notice that it didn't map to a gettext locale because of gettext locale being "pl-PL"
  rbnf_locale_name: :pl,
  language_subtags: [],
  private_use: []
}

Basically, that would mean mapping "pl-US" to the default locale for the language (in this case "pl-PL", because "pl" = "pl-PL"). I know that ex_cldr tries the inheritance chain, so unless we had a gettext locale of "pl", it wouldn't map "pl-US" to anything except "pl" or the default locale. Should it also try mapping "pl" to "pl-PL"? (It actually does map, but not in all cases; please see the examples below.)

Also, I am not sure whether the "pl" locale (without the country code where the language is most widely spoken) is just a CLDR thing or a broader rule that applies to gettext as well. I checked the manual here, but didn't see anything like that mentioned.


This actually matches to "pl-PL"

iex(10)> Cldr.AcceptLanguage.best_match("pl") |> elem(1) |> IO.inspect(structs: false)
%{
  script: :Latn,
  extensions: %{},
  __struct__: Cldr.LanguageTag,
  locale: %{},
  backend: ProfessionalProfiles.Cldr,
  language: "pl",
  transform: %{},
  cldr_locale_name: :pl,
  territory: :PL,
  canonical_locale_name: "pl",
  gettext_locale_name: "pl_PL",
  language_subtags: [],
  language_variants: [],
  private_use: [],
  rbnf_locale_name: :pl,
  requested_locale_name: "pl"
}

but this doesn't

iex(11)> Cldr.AcceptLanguage.best_match("pl-US") |> elem(1) |> IO.inspect(structs: false)
%{
  script: :Latn,
  extensions: %{},
  __struct__: Cldr.LanguageTag,
  locale: %{},
  backend: ProfessionalProfiles.Cldr,
  language: "pl",
  transform: %{},
  cldr_locale_name: :pl,
  territory: :US,
  canonical_locale_name: "pl-US",
  gettext_locale_name: nil,
  language_subtags: [],
  language_variants: [],
  private_use: [],
  rbnf_locale_name: :pl,
  requested_locale_name: "pl-US"
}

@kipcole9
Collaborator

kipcole9 commented Jan 20, 2025

Not as currently planned. My thought is to check that the gettext locale is defined using the default region for the language and only then match on the CLDR locale name.

It's a reasonable discussion to decide if matching should also consider just the language component and therefore match pl alone. That would work because the Gettext pl-PL will become the CLDR pl.

The key idea is to limit the number of surprises. Given that Gettext allows arbitrary locale names there is a limit to what matching can do.

@kipcole9
Collaborator

> just a CLDR thing or a broader rule that applies to gettext as well

Gettext doesn't really have rules. It's mostly based upon POSIX locales, but only loosely; there is no formalisation.

CLDR on the other hand is rigorous in its definition and has a formal specification.

@andreyuhai
Author

> It's a reasonable discussion to decide if matching should also consider just the language component and therefore match pl alone. That would work because the Gettext pl-PL will become the CLDR pl.

This is actually what I'll do in our codebase to handle the matching when the region is different than the default region (e.g. "pl-US") and there's no locale defined for that. I don't know enough to say whether there'd be a problem if ex_cldr did that by default though.
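An application-level fallback along those lines might look like the sketch below. It is an assumption about how such a wrapper could be written, not code from this thread: the module and function names are hypothetical, and `match_fun` stands in for a function such as the backend's `AcceptLanguage.best_match/1`, which is expected to return `{:ok, tag}` or `{:error, reason}`.

```elixir
defmodule LocaleFallbackSketch do
  # Tries the requested locale first. If it resolves but has no Gettext
  # locale attached, retries with the bare language subtag (e.g. "pl").
  def best_match_with_language_fallback(requested, match_fun) do
    case match_fun.(requested) do
      {:ok, %{gettext_locale_name: nil}} ->
        match_fun.(language_only(requested))

      other ->
        # Either a match with a Gettext locale, or an error: pass it through.
        other
    end
  end

  # "pl-US" -> "pl"; "pl_US" -> "pl"; "pl" -> "pl"
  defp language_only(locale_name) do
    locale_name |> String.split(["-", "_"], parts: 2) |> hd()
  end
end
```

One trade-off of this sketch is that the retry drops everything after the language subtag, including the requested territory and any "-u-" extensions.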

@andreyuhai
Author

Sorry for spamming here. Regarding what I mentioned in #250 (comment): this actually works if you have the proper gettext configuration and the locales have been downloaded for those.

iex(28)> ProfessionalProfiles.Cldr.AcceptLanguage.best_match("pl-US-u-ca-gregory-hc-h23-nu-latn") |> elem(1) |> IO.inspect(structs: false)
%{
  script: :Latn,
  extensions: %{},
  __struct__: Cldr.LanguageTag,
  locale: %{
    calendar: :gregorian,
    cf: nil,
    currency: nil,
    __struct__: Cldr.LanguageTag.U,
  },
  backend: ProfessionalProfiles.Cldr,
  language: "pl",
  transform: %{},
  cldr_locale_name: :pl,
  gettext_locale_name: "pl",
  territory: :US,
  requested_locale_name: "pl-US",
  canonical_locale_name: "pl-US-u-ca-gregory-hc-h23-nu-latn",
  language_variants: [],
  rbnf_locale_name: :pl,
  language_subtags: [],
  private_use: []
}

iex(29)> ProfessionalProfiles.Cldr.known_gettext_locale_names()
["ar-SA", "bg", "cs", "da", "de", "el", "en", "en-GB", "es", "fi", "fr", "hr",
 "hu", "id", "it", "ja", "lt", "nb", "nl", "pl", "pt", "pt-PT", "ro", "ru",
 "sl", "sv", "th", "uk", "vi", "zh-Hans"]

iex(30)> ProfessionalProfiles.Cldr.known_locale_names()
[:ar, :"ar-SA", :bg, :cs, :da, :de, :el, :en, :"en-GB", :es, :fi, :fr, :hr, :hu,
 :id, :it, :ja, :lt, :nb, :nl, :pl, :pt, :"pt-PT", :ro, :ru, :sl, :sv, :th, :uk,
 :vi, :zh, :"zh-Hans"]

Maybe I missed something or I just didn't know about it. Just wanted to mention here too.
So I won't need to manually map language tags like "pl-US" to "pl".

I just changed the structure of gettext dir to be something like

./apps/professional_profiles/priv/gettext
├── ar-SA
├── bg
├── cs
├── da
├── de
├── default.pot
├── el
├── en_GB
├── errors.pot
├── es
├── fi
├── fr
├── hr
├── hu
├── id
├── it
├── ja
├── lt
├── nb
├── nl
├── pl
├── pt
├── pt-PT
├── ro
├── ru
├── sl
├── sv
├── th
├── uk
├── vi
└── zh-Hans

Previously, the directories were all named in the form "ll_CC", where ll is the language code and CC is the country code.

@kipcole9
Collaborator

Not spamming at all. And yes, that's the expected behaviour - if the gettext locale name matches a CLDR locale name then all should be good.
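To spot Gettext locale names that have no CLDR counterpart, a small check like the following could help. It is a sketch only: the module and function names are hypothetical, and the comparison rule (case-insensitive, "_" and "-" treated alike) is taken from the description of full-name matching earlier in this thread.

```elixir
defmodule LocaleNameCheckSketch do
  # Returns the Gettext locale names that have no counterpart in `cldr_names`.
  # `cldr_names` may contain atoms (as known_locale_names/0 returns) or strings.
  def unmatched(gettext_names, cldr_names) do
    known = MapSet.new(cldr_names, &normalise/1)
    Enum.reject(gettext_names, fn name -> MapSet.member?(known, normalise(name)) end)
  end

  # Normalises a locale name for comparison: lower case, "-" as separator.
  defp normalise(name) do
    name |> to_string() |> String.downcase() |> String.replace("_", "-")
  end
end
```

It could be fed the output of the backend's `known_gettext_locale_names/0` and `known_locale_names/0`; for example, `LocaleNameCheckSketch.unmatched(["pl_PL", "en-GB", "pl"], [:pl, :"en-GB"])` returns `["pl_PL"]`.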
