Repo might be unnecessarily large #242

Open
adamwight opened this issue Oct 26, 2024 · 6 comments

Comments

@adamwight

There are excellent reasons to keep a full git history, but as a newcomer to the project I'm surprised to encounter a 1.1GB repository. It seems that much of this consists of binary, compiled changes to the CLDR data itself, under priv/.

Old versions could still be supported by, for example, preserving a release tarball of elixir-cldr for each historical CLDR release. Is there still a reason to keep this extensive history in the repo itself?

@kipcole9
Collaborator

Yes, that's a fair observation. @Schultzer has done a ton of great work to build an improved test harness, which I will integrate in the next release cycle in November (CLDR 46 will be out this coming week, and while I've done basic integration testing, I need to fix some regressions and redesign issues in ex_cldr_units first).

I'm very open to ideas on how to restructure the repo, as long as any running version of the code can download any individual locale appropriate for that release.

I don't think a single release tarball accomplishes that goal. But I will be the first to admit that my GitHub Actions-fu is very poor. It's an area where contributions are always welcome.

@kipcole9
Collaborator

I'm also a bit curious about your use case for the full repo. It's not something that comes up often, simply because it's only an issue for maintainers. There are only three of us working on development, and more than 90% of that is just me. You are of course welcome to have at it; I'm just curious.

@adamwight
Author

I see this is explained in the README:

If installed from github then all 571 locales are installed when the repo is cloned into your application deps.

Please feel free to close the issue! I think I understand better now: the github repo includes locale output targets which can be downloaded by binary-only installations of elixir-cldr. The project source and outputs could be kept in separate repos, but there are trade-offs either way so the status quo is a perfectly good arrangement.

DEVELOPMENT.md mentions that CLDR itself uses git-lfs. Since the LFS tooling is needed anyway, I wonder whether it would help this repo as well, e.g. by using it to clean up the upstream priv/ data so that pulling its history becomes optional?

@adamwight
Author

adamwight commented Oct 26, 2024

I'm also a bit curious about your use case for the full repo. It's not something that comes up often, simply because it's only an issue for maintainers. There are only three of us working on development, and more than 90% of that is just me. You are of course welcome to have at it; I'm just curious.

+1 I'm impressed by the huge effort that goes into maintaining this very complete library, and I should be clear that I have no pressing use case at the moment other than curiosity. My comments here are more in the spirit of sharing newbie observations ("explain like I'm five" :-) ), not because I have some external project blocked on ex_cldr.

But I'm happy to share how I arrived here and started making a bit of a nuisance of myself! I only need the full repo because I'd like access to all languages. For my day job I did a small investigation into how CLDR data might be used to improve an application that supports many hundreds of languages. The immediate use case would be an Elixir library containing the correctly parsed core alphabets for known locales, now that I've discovered that the LDML format is nontrivial (sorry if this is an understatement ;-) ).

Other use cases that I don't have to support today, but which come to mind for my domain, are "let the user freely pick their interface locale from the full database" and producing new transformations of the data that cut across locales, e.g. "dump the currency code names for all languages".

@kipcole9
Collaborator

All good use cases! If you want to leverage the elixir-formatted complete CLDR data set then you can do something like this:

iex> config = %Cldr.Config{locales: :all}
%Cldr.Config{
  default_locale: "en-001",
  locales: :all,
  add_fallback_locales: false,
  backend: nil,
  gettext: nil,
  data_dir: "cldr",
  providers: nil,
  precompile_number_formats: [],
  precompile_transliterations: [],
  precompile_date_time_formats: [],
  precompile_interval_formats: [],
  default_currency_format: nil,
  otp_app: nil,
  generate_docs: true,
  suppress_warnings: false,
  message_formats: %{},
  force_locale_download: false,
  https_proxy: nil
}
iex> locales = Cldr.Locale.Loader.known_locale_names(config)
[:aa, :"aa-DJ", :"aa-ER", :ab, :af, :"af-NA", :agq, :ak, :am, :an, :ann, :apc,
 :ar, :"ar-AE", :"ar-BH", :"ar-DJ", :"ar-DZ", :"ar-EG", :"ar-EH", :"ar-ER",
 :"ar-IL", :"ar-IQ", :"ar-JO", :"ar-KM", :"ar-KW", :"ar-LB", :"ar-LY", :"ar-MA",
 :"ar-MR", :"ar-OM", :"ar-PS", :"ar-QA", :"ar-SA", :"ar-SD", :"ar-SO", :"ar-SS",
 :"ar-SY", :"ar-TD", :"ar-TN", :"ar-YE", :arn, :as, :asa, :ast, :az, :"az-Arab",
 :"az-Arab-IQ", :"az-Arab-TR", :"az-Cyrl", :"az-Latn", ...]
iex> for locale <- locales do
...>   locale_data_map = Cldr.Locale.Loader.get_locale(locale, config)
...>   # do something with the data ....
...> end

What I would probably do in your case is depend only on ex_cldr and then use Cldr.validate_locale/2 together with Cldr.Locale.Loader.get_locale/2 to get the data. That way you get full CLDR locale name resolution to the most appropriate CLDR locale data.

@kipcole9
Collaborator

Always great to have fresh eyes on the project - it's a huge topic before one even gets to implementation of code. And there is always room for much improvement. So please do keep the comments/suggestions coming.
