Repo might be unnecessarily large #242

adamwight · 2024-10-26T21:51:12Z

There are excellent reasons to keep a full git history, but as a newcomer to the project I'm surprised to encounter a 1.1GB repository. It seems that much of this consists of binary, compiled changes to the CLDR data itself, under priv/.

Old versions can still be supported by preserving a release tarball of elixir-cldr for each historical CLDR release, for example. But is there a reason to keep this extreme history in the repo itself, still?

kipcole9 · 2024-10-26T22:07:56Z

Yes, that's a fair observation. @Schultzer has done a ton of great work to build an improved test harness which I will integrate for the next release cycle in November (CLDR 46 will be out this coming week and while I've done basic integration testing I need to fix some regressions and redesign issues in ex_cldr_units first).

I'm very open to ideas on how to restructure the repo, as long as any running version of the code can download any individual locale appropriate for that release.

I don't think a single release tarball accomplishes that goal. But I will be the first to admit that my GitHub Actions-fu is very poor. It's an area where contributions will always be welcome.

kipcole9 · 2024-10-26T22:15:07Z

I'm also a bit curious about your use case for the full repo. It's not something that comes up often simply because it's only an issue for maintainers, and there are only three of us working on any development, and more than 90% of that is just me. You are of course welcome to have at it, I'm just curious.

adamwight · 2024-10-26T22:27:11Z

I see this is explained in the readme:

> If installed from github then all 571 locales are installed when the repo is cloned into your application deps.

Please feel free to close the issue! I think I understand better now: the GitHub repo includes locale output targets which can be downloaded by binary-only installations of elixir-cldr. The project source and outputs could be kept in separate repos, but there are trade-offs either way so the status quo is a perfectly good arrangement.

DEVELOPMENT.md mentions that cldr itself uses git-lfs. Since the LFS tooling is needed anyway, I wonder if it would be helpful for this repo as well, e.g. using it to clean up the upstream priv/ data so that pulling its history becomes optional?

adamwight · 2024-10-26T22:47:44Z

> I'm also a bit curious about your use case for the full repo. It's not something that comes up often simply because it's only an issue for maintainers, and there are only three of us working on any development, and more than 90% of that is just me. You are of course welcome to have at it, I'm just curious.

+1 I'm impressed by the huge effort that goes into maintenance of this very complete library, and I should be clear that I have no pressing use case at the moment other than curiosity. My comments here are more in the spirit of sharing newbie observations ("explain like I'm five" :-) ), not that I have some external project blocked on ex-cldr at all.

But I would be happy to share how I arrived here, making a bit of a nuisance of myself! I only need the full repo because I'd like to get access to all languages. For my day job I did a small investigation into how CLDR data might be used to improve an application which supports many hundreds of languages. The immediate use case would be to have an Elixir library which contains the correctly parsed core alphabets from known locales, now that I've discovered that the LDML format is nontrivial (sorry if this is an understatement ;-) ).

Other use cases which I don't have to support today, but which come to mind for my domain, are "let the user freely pick their interface locale from the full database", and producing new transformations of the data which cut across locales, e.g. "dump the currency code names for all languages".

kipcole9 · 2024-10-26T22:58:04Z

All good use cases! If you want to leverage the elixir-formatted complete CLDR data set then you can do something like this:

```elixir
iex> config = %Cldr.Config{locales: :all}
%Cldr.Config{
  default_locale: "en-001",
  locales: :all,
  add_fallback_locales: false,
  backend: nil,
  gettext: nil,
  data_dir: "cldr",
  providers: nil,
  precompile_number_formats: [],
  precompile_transliterations: [],
  precompile_date_time_formats: [],
  precompile_interval_formats: [],
  default_currency_format: nil,
  otp_app: nil,
  generate_docs: true,
  suppress_warnings: false,
  message_formats: %{},
  force_locale_download: false,
  https_proxy: nil
}
iex> locales = Cldr.Locale.Loader.known_locale_names(config)
[:aa, :"aa-DJ", :"aa-ER", :ab, :af, :"af-NA", :agq, :ak, :am, :an, :ann, :apc,
 :ar, :"ar-AE", :"ar-BH", :"ar-DJ", :"ar-DZ", :"ar-EG", :"ar-EH", :"ar-ER",
 :"ar-IL", :"ar-IQ", :"ar-JO", :"ar-KM", :"ar-KW", :"ar-LB", :"ar-LY", :"ar-MA",
 :"ar-MR", :"ar-OM", :"ar-PS", :"ar-QA", :"ar-SA", :"ar-SD", :"ar-SO", :"ar-SS",
 :"ar-SY", :"ar-TD", :"ar-TN", :"ar-YE", :arn, :as, :asa, :ast, :az, :"az-Arab",
 :"az-Arab-IQ", :"az-Arab-TR", :"az-Cyrl", :"az-Latn", ...]
iex> for locale <- locales do
...>   local_data_map = Cldr.Locale.Loader.get_locale(locale, config)
...>   # do something with the data ....
...> end
```

What I would probably do in your case is depend only upon ex_cldr and then use Cldr.validate_locale/2 together with Cldr.Locale.Loader.get_locale/2 to get the data. That way you get the full CLDR locale name resolution to the most appropriate CLDR locale data.
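
A minimal sketch of how those two calls might be combined. The `LocaleData` module and its `fetch/3` function are hypothetical names used only for illustration, the backend is assumed to be a module defined with `use Cldr` (for example a hypothetical `MyApp.Cldr`), and the config is assumed to be a `%Cldr.Config{}` like the one built in the iex session above:

```elixir
defmodule LocaleData do
  @moduledoc """
  Illustrative sketch only: resolve a requested locale name to a CLDR
  locale, then load the data for that locale.
  """

  # `backend` is assumed to be a module defined with `use Cldr`
  # (hypothetical example: MyApp.Cldr).
  # `config` is assumed to be a %Cldr.Config{} struct.
  def fetch(locale_name, backend, config) do
    with {:ok, tag} <- Cldr.validate_locale(locale_name, backend) do
      # `cldr_locale_name` is the CLDR locale the requested name resolved to.
      {:ok, Cldr.Locale.Loader.get_locale(tag.cldr_locale_name, config)}
    end
  end
end
```

With this shape, `LocaleData.fetch("en-AU", MyApp.Cldr, config)` would return `{:ok, locale_data_map}` on success, and the error tuple from `Cldr.validate_locale/2` would pass through unchanged otherwise. As far as I understand, validation only succeeds for locales the backend is configured with, so the backend's locale list determines which names can be resolved.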

kipcole9 · 2024-10-26T22:58:53Z

Always great to have fresh eyes on the project - it's a huge topic before one even gets to implementation of code. And there is always room for much improvement. So please do keep the comments/suggestions coming.
