Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supply plurals forms header to Gettext.Plural callbacks #343

Merged
merged 1 commit into from
Jan 20, 2023

Conversation

maennchen
Copy link
Member

@maennchen maennchen commented Jan 15, 2023

Closes #318

This makes it possible to react precisely to the header instead of using predefined formats.

@coveralls
Copy link

coveralls commented Jan 15, 2023

Pull Request Test Coverage Report for Build 7377b2df5447c0d4d47b3ea46c8da3ff2a7f9220-PR-343

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 44 of 47 (93.62%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.06%) to 89.538%

Changes Missing Coverage Covered Lines Changed/Added Lines %
lib/gettext/plural.ex 23 26 88.46%
Totals Coverage Status
Change from base Build 87299828fe3702d33ac06fb504124b74efb80259: 0.06%
Covered Lines: 582
Relevant Lines: 650

💛 - Coveralls

lib/gettext/merger.ex Outdated Show resolved Hide resolved
lib/gettext/plural.ex Outdated Show resolved Hide resolved
@whatyouhide
Copy link
Contributor

shift work into compile time to possibly improve the performance

If we don't know if this improves performance, I don't think we should do it for now. Let's solve it if it ever becomes a problem?

@whatyouhide
Copy link
Contributor

@maennchen also, can you elaborate on the purpose of this PR in more details? I'm not really following 😄

@maennchen
Copy link
Member Author

@whatyouhide

If we don't know if this improves performance, I don't think we should do it for now. Let's solve it if it ever becomes a problem?

That highly depends on the implementation of that behaviour. I know that I can make it faster in ExpoPluralForms.

That description should probably be worded differently.

@maennchen also, can you elaborate on the purpose of this PR in more details? I'm not really following smile

Sure :)

The purpose of this PR is that the plural form header is passed into Gettext.Plural implementations.

Therefore implementations can make decisions based on that header.

This will allow it to implement an ExpoPluralForms implementation of the behaviour.

Current behaviour

Gettext has a list of known locales for which plural is implemented. Since that list identifies rule by the locale name, the index is looked up by locale & count.

ExpoPluralForms behaviour

For various languages, there's not a single way to represent plural rules.

For example for chinese there's two different rules depending on the context. (nplurals=1; plural=0; & nplurals=2; plural=(n > 1);)

Also in my own language (swiss german) there's some debate of how to handle 0 counts and depending on the dialect, either new singular case (0) or many (1) applies.

Those rules & differences can be represented using the plural forms header.

By supplying the plural forms header to the callbacks, the implementation can make decisions as they were intended in gettext.

@maennchen
Copy link
Member Author

@whatyouhide Performance comparison (extended from https://github.com/elixir-gettext/expo_plural_forms/pull/33#issuecomment-1383212655) to illustrate what would happen if the header was parsed on every index call:

Mix.install([{:benchee, "~> 1.1"}, {:expo_plural_forms, path: "."}])

{:ok, arabic} = ExpoPluralForms.Known.plural_form("ar")

defmodule Arabic do
  def index(n), do: unquote(ExpoPluralForms.compile_index(arabic))
end

Benchee.run(
  %{
    "precompiled" => fn ->
      Arabic.index(17)
    end,
    "without" => fn ->
      ExpoPluralForms.index(arabic, 17)
    end,
    "including parse" => fn ->
      {:ok, arabic} = ExpoPluralForms.parse(" nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);")
      ExpoPluralForms.index(arabic, 17)
    end
  },
  time: 10,
  memory_time: 2
)

Result:

Operating System: Linux
CPU Information: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Number of Available Cores: 8
Available memory: 46.77 GB
Elixir 1.14.2
Erlang 25.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 42 s

Benchmarking including parse ...
Benchmarking precompiled ...
Benchmarking without ...

Name                      ips        average  deviation         median         99th %
precompiled           12.75 M       78.44 ns ±11373.62%          72 ns         100 ns
without                4.89 M      204.65 ns ±17541.84%         173 ns         348 ns
including parse        0.35 M     2833.62 ns   ±846.85%        2546 ns        4091 ns

Comparison: 
precompiled           12.75 M
without                4.89 M - 2.61x slower +126.21 ns
including parse        0.35 M - 36.12x slower +2755.17 ns

Memory usage statistics:

Name               Memory usage
precompiled                 0 B
without                     0 B - 1.00x memory usage +0 B
including parse          9928 B - ∞ x memory usage +9928 B

**All measurements for memory usage were the same**

@josevalim
Copy link
Contributor

By supplying the plural forms header to the callbacks, the implementation can make decisions as they were intended in gettext.

I think this is approaching the problem based on the current solution but, given the limitations you added, I would say the problem needs to be reworked altogether. Likely we want to compile the rules encoded in the header as part of the backend compilation?

When you say the PluralForms header, is that a header for the .po file or specific to a given translation/msgid?

@maennchen
Copy link
Member Author

@josevalim The plural forms header is in the .po header and not per message.

The idea with compile_plural was to compile the rule once per backend and call it from there.

Based on the discussion over in ExpoPluralForms (performance), I think it would make more sense to go a plug like approach. There could be some kind of init callback to do preparation works like header parsing and then a normal function call with the init result to calculate the index.

I'll put the PR back into draft state and implement a cleaner solution.

@maennchen maennchen marked this pull request as draft January 15, 2023 20:40
@josevalim
Copy link
Contributor

For example for chinese there's two different rules depending on the context. (nplurals=1; plural=0; & nplurals=2; plural=(n > 1);)

and how do you determine which one to use? :)

@maennchen
Copy link
Member Author

@josevalim The user of the library can decide which rule to use when he sets the header in the .po file.

Many tools like Poedit help a user setting up the headers as well.

We have an example of that here in the source:

"Plural-Forms: nplurals=2; plural=(n != 1);\n"

@whatyouhide
Copy link
Contributor

Yeah, I think the Plural-Form header in each PO file is pretty neat. I think compiling the Plural-Forms header once is a no-brainer, because we can parse it into an Elixir data structure really easily. I think "precompiled" vs "without" in your benchmarks show that "without" is probably a good compromise between simplicity and speed.

@maennchen maennchen force-pushed the jm/extend_plural_interface branch 4 times, most recently from c22ce54 to 91bb61d Compare January 15, 2023 21:39
@maennchen maennchen marked this pull request as ready for review January 15, 2023 21:40
@maennchen
Copy link
Member Author

@josevalim / @whatyouhide I pushed a new version that is a lot simpler and does not cause any changes in the behaviour. Only a new callback is added.

@maennchen maennchen force-pushed the jm/extend_plural_interface branch 2 times, most recently from 14a44fb to 26427fe Compare January 15, 2023 22:19
@maennchen
Copy link
Member Author

maennchen commented Jan 15, 2023

@maennchen maennchen force-pushed the jm/extend_plural_interface branch from 26427fe to 41e16f2 Compare January 16, 2023 10:38
@@ -434,10 +467,22 @@ defmodule Gettext.Plural do
"sk"
]

@doc false
def init(%{locale: locale, plural_forms_header: plural_forms_header}) do
case read_plural_forms_from_headers(plural_forms_header) do
Copy link
Member Author

@maennchen maennchen Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is here to keep gettext.merge --plural-forms 7 working.

It is however not consistent internally. I could for example merge with --plural-forms 1 in a language that has 2+ plural forms.

When using ngettext, the correct plural forms would not be defined and therefore always result in a missing translation.


As an alternate solution, we could remove the option entirely and make the Gettext.Plural implementation only work on known formats. If a user wants to add / change a language, he would need to add his own implementation as decribed here:

defmodule MyApp.Plural do

Even though I personally like option 2 a lot more, I decided to go with the solution that does not cause breaking changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can deprecate the option and guide users towards the new solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josevalim That is what I intended to do here: #343 (comment)

Based on the discussion with @whatyouhide I will probably ignore the whole topic for the PR.

Copy link
Contributor

@whatyouhide whatyouhide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a bunch of comments!

lib/gettext/plural.ex Outdated Show resolved Hide resolved
lib/gettext/plural.ex Outdated Show resolved Hide resolved

Fallback if not implemented: `"nplurals={nplurals};"`
"""
@callback plural_forms_header(locale()) :: String.t() | nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does this return nil, and what does that mean semantically?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that it does not know a rule for that locale. The caller of the function will then just set a partial header with only nplurals defined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore my previous comment, that is not factually correct. That's what's supposed to happen though. I'll have a look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it is as I described.

lib/gettext/plural.ex Outdated Show resolved Hide resolved
lib/gettext/plural.ex Outdated Show resolved Hide resolved
lib/gettext/plural.ex Outdated Show resolved Hide resolved
lib/mix/tasks/gettext.merge.ex Show resolved Hide resolved
lib/mix/tasks/gettext.merge.ex Outdated Show resolved Hide resolved
@maennchen maennchen force-pushed the jm/extend_plural_interface branch from 12df317 to 8d701d8 Compare January 19, 2023 14:57
@maennchen maennchen force-pushed the jm/extend_plural_interface branch from 8d701d8 to e7bbf36 Compare January 19, 2023 15:06
lib/gettext/plural.ex Outdated Show resolved Hide resolved
lib/gettext/plural.ex Outdated Show resolved Hide resolved
lib/gettext/plural.ex Outdated Show resolved Hide resolved
lib/mix/tasks/gettext.merge.ex Show resolved Hide resolved
@maennchen maennchen force-pushed the jm/extend_plural_interface branch from e7bbf36 to 7377b2d Compare January 20, 2023 14:44
Closes #318

This makes it possible to react precisely to the header instead of
using predefined formats.
@maennchen maennchen force-pushed the jm/extend_plural_interface branch from 7377b2d to 352117b Compare January 20, 2023 14:45
@maennchen maennchen requested a review from whatyouhide January 20, 2023 14:45
@maennchen
Copy link
Member Author

@whatyouhide I've adressed all the comments except #343 (comment)

Are we sure that we handle the deprecation question correctly? I have removed it but find it a bit strange to keep code (and not even discourage using it) when the behaviour it causes is potentially flawed.

Copy link
Contributor

@whatyouhide whatyouhide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great for now, we can deprecate the --plural-forms option later. 💯

@whatyouhide whatyouhide merged commit d55aeb0 into main Jan 20, 2023
@whatyouhide whatyouhide deleted the jm/extend_plural_interface branch January 20, 2023 15:17
@maennchen
Copy link
Member Author

@whatyouhide / @josevalim
Quick note: I will have some surgery on my wrist next week. I will probably not have time to finish the Gettext.Plural implementation for expo until then.
I however plan to pick it back up 2/3 weeks later when I can hopefully get rid of the cast and properly type again.

@whatyouhide
Copy link
Contributor

@maennchen no worries! I have some free time, so I can give it a shot if you want. I'm working on polishing docs as well. Thanks for all the work!! 💟

@maennchen
Copy link
Member Author

@whatyouhide If you’d like to, I don’t mind if you pick it up from here. But otherwise I’m also fine doing myself in a few weeks. However you prefer 😊 I might be able to do some code reviews, so just let me know if I can help.

ravensiris pushed a commit to ravensiris/gettext that referenced this pull request Aug 3, 2023
…xt#343)

Closes elixir-gettext#318

This makes it possible to react precisely to the header instead of
using predefined formats.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Plural formas parser for pluralization
4 participants