@johnzhou721

@johnzhou721 johnzhou721 commented Jun 28, 2025

Exactly what it says on the tin.

Fixes all bugs I can find with this plugin; future updates after this version should be rare.

Changes

Revision July 22

  • Ensure that POT content is now sorted by path when merging POTs from multiple sources (i.e., templates and content).
  • xgettext is now used instead of msgcat to merge POT files, which gives a cleaner header and merges identical strings from different sources. This ensures that all context is kept; simply removing --use-first from msgcat causes header trouble.
  • The initially generated PO files will now have a header compatible with GNOME's Translation Editor, since they will have a non-placeholder Project-Id-Version. This is mostly for my local work (I've filled the Project-Id-Version values manually into the existing POs), but it has the good side effect of adding a license header (same license as BeeWare) on top of newly generated POs. I have not gone in and added these would've-been-generated headers to the existing PO files, though.
  • Translations in templates now provide pgettext and npgettext methods. The pgettext is for the sprint helping string (pluralized with speaker count) and the gold members on the front page (pluralization, desperate bug I fixed); pgettext is used to give more context and to work around a bug where trailing spaces get trimmed if regular gettext is used.
  • The limitation that required manually deleting strings from the English PO file when the content is non-English is resolved. See the deletion in the README.
  • When updating translated PO files, the content-language PO file strings are automatically filled with the message IDs. Plural forms aren't filled unless the language is English. [EDIT] This is actually a functional issue: when a plural form is added to a previously singular-only string, msgmerge fills the plural form with the singular form and marks it fuzzy. The additional handling ensures that plurals are filled correctly for source-language POs in this case.
  • The first bug in Two Seemingly Untranslated Strings beeware.github.io#689 has been fixed (button on frontpage).

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

@johnzhou721 johnzhou721 changed the title Ensure everything is sorted whenever a command that generates po / pot files is used Sort by file when using msgcat to merge multiple pot files, and cumulate translation information. Jun 29, 2025
Member

@freakboy3742 freakboy3742 left a comment

One code style tweak, and one request for clarification - it’s entirely possible you’re correct in what you’ve done, but I’m not sufficiently familiar with msgcat in practice to be confident in that.

@johnzhou721
Author

Um... what's the tweak? Forgot to finish review?

@johnzhou721 johnzhou721 requested a review from freakboy3742 June 29, 2025 21:10
Member

@freakboy3742 freakboy3742 left a comment

Hrm… not sure what happened there - I must have neglected to confirm the comment I wrote.

The two comments were:

  1. Using long form flags - using --sort-by-file instead of -F
  2. Can you confirm why this is the right approach? I'm not an expert on msgcat (and I can't confirm anything manually right now), but --use-first suggests merging keys, which seems preferable to duplication

@johnzhou721
Author

@freakboy3742 Quite the opposite. Merge everything except keys -- but that also means that the "header" is being merged into something like

# #-#-#-#-#  templates-72gdmk0l.pot (PROJECT VERSION)  #-#-#-#-#
# Translations template for PROJECT.
# Copyright (C) 2025 ORGANIZATION
# This file is distributed under the same license as the PROJECT project.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"#-#-#-#-#  contents.pot (PACKAGE VERSION)  #-#-#-#-#\n"
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-06-02 08:23+AWST\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: en <LL@li.org>\n"
"Language: en\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"#-#-#-#-#  templates-72gdmk0l.pot (PROJECT VERSION)  #-#-#-#-#\n"
"Project-Id-Version: PROJECT VERSION\n"
"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
"POT-Creation-Date: 2025-06-29 12:06-0500\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

Which is problematic. It's keeping both versions.
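
A quick way to catch this kind of mangled header is to look for the #-#-#-#-# markers that msgcat leaves behind when it keeps both versions. This is only a minimal sketch, assuming the merged file is read as text and that the header is the first entry; the function name header_has_merge_markers is made up for illustration and is not part of the plugin:

```python
MERGE_MARKER = "#-#-#-#-#"


def header_has_merge_markers(po_text: str) -> bool:
    """Return True if the header msgstr contains msgcat's merge markers."""
    # The header is the first entry; PO entries are separated by blank lines.
    header_block = po_text.split("\n\n", 1)[0]
    # Only quoted string lines are inspected; comment lines are skipped.
    return any(
        MERGE_MARKER in line
        for line in header_block.splitlines()
        if line.startswith('"')
    )
```

A check like this could run right after the merge step and fail loudly instead of writing the combined header out.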

CONCLUSION hold this PR. just found this issue

@johnzhou721
Author

That said, xgettext is the canonical way to merge POTs, so I'm trying that.

@johnzhou721
Author

These concerns should now be resolved.

@johnzhou721 johnzhou721 requested a review from freakboy3742 June 29, 2025 21:27
@johnzhou721
Author

DO NOT MERGE. It changes the creation date of the POT unnecessarily.
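
One common way to avoid this kind of churn is to compare the old and new POT while ignoring the POT-Creation-Date line, and only rewrite the file when something else differs. The following is a sketch of that idea under those assumptions, not necessarily what the plugin ends up doing; same_ignoring_creation_date is a hypothetical helper name:

```python
import re

# Matches the header line as it appears in the file, including the literal \n.
_CREATION_DATE = re.compile(r'^"POT-Creation-Date: .*\\n"$', re.MULTILINE)


def same_ignoring_creation_date(old_pot: str, new_pot: str) -> bool:
    """Compare two POT texts while ignoring the POT-Creation-Date header."""
    old_stripped = _CREATION_DATE.sub("", old_pot)
    new_stripped = _CREATION_DATE.sub("", new_pot)
    return old_stripped == new_stripped
```

If the function returns True, the existing POT can be left untouched, so the creation date only moves when the strings actually change.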

@johnzhou721
Author

There. This works now.

This is what's called yak shaving, I guess: just to have all the source places merged together, I removed an option from msgcat and then realized xgettext is the canonical way (https://www.gnu.org/software/gettext/manual/html_node/msgcat-Invocation.html, "To concatenate POT files, better use xgettext, not msgcat, because msgcat would choke on the undefined charsets in the specified POT files.") and handles this better...
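
For reference, an xgettext-based merge could be driven from Python roughly like this. It is a sketch under assumptions: the flags are standard xgettext options, but the helper names, the file names, and the exact set of options the plugin passes are hypothetical here.

```python
import subprocess


def build_xgettext_merge_command(pot_files, output, package_name, package_version):
    """Build an xgettext command line that concatenates POT files into one."""
    return [
        "xgettext",
        "--language=PO",   # read the inputs as PO/POT catalogs
        "--sort-by-file",  # long form of -F
        f"--package-name={package_name}",
        f"--package-version={package_version}",
        f"--output={output}",
        *[str(path) for path in pot_files],
    ]


def merge_pots(pot_files, output, package_name, package_version):
    """Run the merge; raises CalledProcessError if xgettext fails."""
    command = build_xgettext_merge_command(
        pot_files, output, package_name, package_version
    )
    subprocess.run(command, check=True)
```

Keeping the command construction separate from the subprocess call makes it possible to inspect the arguments without having gettext installed.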

@johnzhou721 johnzhou721 changed the title Sort by file when using msgcat to merge multiple pot files, and cumulate translation information. v0.5.5 Jun 30, 2025
@johnzhou721
Author

Again, as mentioned on the other thread, I really apologize for shaving all those yaks.

@johnzhou721
Author

Hmm... in the bugfix I used the msgstr index to determine whether the singular form or the plural form needs to get filled in. That's only a good heuristic for some languages, so I documented that, and if the language is not English I mark the auto-filled plurals as fuzzy. See the changelog/README for more details.

(FYI: I patched the plugin to automatically fill msgstr with the msgids in the source-language PO file, and to clear all the other POs after the initial msginit, so the bug listed in the README is finally resolved; but that brings in plural handling, etc.)
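
The heuristic described above can be sketched as follows. This is an illustration only: the Entry class is a hypothetical stand-in for a PO entry, not the plugin's real data structure, and the rule "index 0 gets the singular msgid, every other index gets msgid_plural" is the assumption being demonstrated.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical stand-in for a PO entry."""
    msgid: str
    msgid_plural: str = ""
    msgstr: str = ""
    msgstr_plural: dict = field(default_factory=dict)  # index -> text
    flags: list = field(default_factory=list)


def fill_from_source(entry: Entry, language: str) -> Entry:
    """Fill a source-language entry with its own message IDs."""
    if not entry.msgid_plural:
        # Singular-only string: the translation is the msgid itself.
        entry.msgstr = entry.msgid
        return entry
    # Heuristic: index 0 is the singular form, all others the plural form.
    for index in entry.msgstr_plural:
        entry.msgstr_plural[index] = (
            entry.msgid if index == 0 else entry.msgid_plural
        )
    if language == "en":
        # The heuristic is reliable for English, so drop any fuzzy flag.
        entry.flags = [flag for flag in entry.flags if flag != "fuzzy"]
    elif "fuzzy" not in entry.flags:
        # For other languages, flag the guess for human review.
        entry.flags.append("fuzzy")
    return entry
```

The fuzzy flag is what keeps the guess from being treated as a finished translation when the index-based rule does not match the language's plural rules.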

Author

@johnzhou721 johnzhou721 left a comment

Not sure... the diff is still very large and rewraps a bunch of Simplified Chinese strings on the beeware.github.io PR, even when using Ubuntu 24.04... I'm trying to see if this is the trouble.

@johnzhou721
Author

The latest commit has a logic error. I will refactor by extracting the entry-clearing logic into separate functions and clearing the entry when the entry to be filled is fuzzy. This is a note to self.

@freakboy3742 freakboy3742 requested a review from kattni October 10, 2025 06:24
@kattni

kattni commented Oct 11, 2025

Hey, John. I'm taking over the final review on this PR. I may be asking for clarification on your changes in this process.

My first question is, do you consider this ready for a final review? You repeatedly requested throughout the process that we hold off on merging it, so I want to ensure it's actually ready at this point before moving forward.

@johnzhou721
Author

@kattni Yes, please! It's ready for final review. I've made a lot of assorted changes here and there to completely fix up this plugin, so we hopefully don't need another update. If you need me to re-explain anything, let me know, as I was quite vague in my communication when I started this PR. I apologize for the noise.

@kattni

kattni commented Oct 11, 2025

@johnzhou721 I would appreciate it if you can explain to me how you're testing these changes. I'd like to test it before we get into explanations.

@johnzhou721
Author

A lot of the testing happens at the beeware.github.io PR where lots of those changes are relevant and applied.

  • Ensure that POT content is now sorted by path when merging POTs from multiple sources (i.e., templates and content).

You can see that the new POT at https://github.com/beeware/beeware.github.io/pull/684/files#diff-c8f80bf8f257ddef4811618539fadecda3407d93671f14c95b81e3a161dc2c1c is sorted properly. There are a lot of diffs in that file because of the next change quoted below; however, that change does make the POT context formatting consistent with the PO files.

  • xgettext is now used instead of msgcat to merge POT files, which gives a cleaner header and merges identical strings from different sources. This ensures that all context is kept; simply removing --use-first from msgcat causes header trouble.

Merging of different context is demonstrated at https://github.com/beeware/beeware.github.io/pull/684/files#diff-c8f80bf8f257ddef4811618539fadecda3407d93671f14c95b81e3a161dc2c1cR5102-R5103 -- --use-first of msgcat seems to just use the first context of those strings; however if we remove that flag, the headers are different and the header msgstr will be a complete mess.

  • The initially generated PO files will now have a header compatible with GNOME's Translation Editor, since they will have a non-placeholder Project-Id-Version. This is mostly for my local work (I've filled the Project-Id-Version values manually into the existing POs), but it has the good side effect of adding a license header (same license as BeeWare) on top of newly generated POs. I have not gone in and added these would've-been-generated headers to the existing PO files, though.

Use lektor quickstart to generate a project, and configure lektor-i18n-plugin: add babel.cfg and config/i18n.ini, and add the dependency and alternatives to the lektorproject file. Then go to models, pick a field at random, and set translatable = True on it.

Now run lektor build, and the POT will be:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the transtest package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: transtest 1.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-10-11 09:15+CST\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: (content/blog/contents+en.lr:blog.title) https://website_url/blog/
msgid "Blog"
msgstr ""

showing that Project-Id-Version is properly filled in.
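
This check could also be scripted. A minimal sketch, assuming the POT is read as text and that the placeholder values are the two that appear in the mangled header earlier in this thread; the function names are made up for illustration:

```python
import re

# Placeholder values seen in the unmerged headers quoted earlier.
PLACEHOLDERS = {"PACKAGE VERSION", "PROJECT VERSION"}


def project_id_version(po_text: str):
    """Return the Project-Id-Version header value, or None if absent."""
    match = re.search(
        r'^"Project-Id-Version: (.*?)\\n"$', po_text, re.MULTILINE
    )
    return match.group(1) if match else None


def has_real_project_id(po_text: str) -> bool:
    """True when the header carries a non-placeholder Project-Id-Version."""
    value = project_id_version(po_text)
    return value is not None and value not in PLACEHOLDERS
```

Run against the POT above, project_id_version would return the "transtest 1.0" value from the header.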

  • Translations in templates now provide pgettext and npgettext methods. The pgettext is for the sprint helping string (pluralized with speaker count) and the gold member_s_ on the front page (pluralization, desperate bug I fixed); pgettext is used to give more context and to work around a bug where trailing spaces get trimmed if regular gettext is used.

OK, I've made a mistake in the listing here. Ignore the sentence I've struck through.

pgettext is used for member badges. If the preview of the jinja i18n translation PR renders Katie's "badge" at /zh_CN/about/team/ 超能力:Batavia、网站、高级养蜂师 with no extra spaces, it should be working since pgettext is used here to translate the item separators into these Chinese variants.
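
To illustrate what the context argument buys here, the sketch below uses Python's standard gettext module with a tiny in-memory catalog. It only demonstrates lookup by (context, msgid); the ContextCatalog class and the "badge separator" context name are hypothetical, and it does not reproduce the plugin's Jinja integration or the trailing-space bug.

```python
import gettext


class ContextCatalog(gettext.NullTranslations):
    """Hypothetical in-memory catalog keyed by (context, msgid)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # Context-free lookups use None as the context key.
        return self._messages.get((None, message), message)

    def pgettext(self, context, message):
        # The same msgid can have different translations per context.
        return self._messages.get((context, message), message)


# The msgid ", " keeps its trailing space because it is looked up verbatim.
zh = ContextCatalog({("badge separator", ", "): "、"})
powers = ["Batavia", "网站", "高级养蜂师"]
joined = zh.pgettext("badge separator", ", ").join(powers)
```

Here joined comes out as the separator-joined badge text quoted above, while a plain gettext lookup of ", " is left untranslated because no context-free entry exists.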

  • The limitation that required manually deleting strings from the English PO file when the content is non-English is resolved. See the deletion in the README.
  • When updating translated PO files, the content-language PO file strings are automatically filled with the message IDs. Plural forms aren't filled unless the language is English. [EDIT] This is actually a functional issue: when a plural form is added to a previously singular-only string, msgmerge fills the plural form with the singular form and marks it fuzzy. The additional handling ensures that plurals are filled correctly for source-language POs in this case.

I'll come up with tests for these 2 a bit later

See preview for this. The missing strings are described at beeware.github.io#689 and they should be there.

@johnzhou721
Author

  • The limitation that required manually deleting strings from the English PO file when the content is non-English is resolved. See the deletion in the README.

Use lektor quickstart to generate a project, and configure lektor-i18n-plugin: add babel.cfg and config/i18n.ini [AS DESCRIBED IN THE README FILE], and add the dependency and alternatives to the lektorproject file. Then go to models, pick a field at random, and set translatable = True on it, but this time make sure French is the primary language in the lektorproject file and config/i18n.ini. Build the project and find that the en PO file is empty, which shows that the limitation where strings had to be manually deleted from the English PO file with non-English content is resolved.

  • When updating translated PO files, the content-language PO file strings are automatically filled with the message IDs. Plural forms aren't filled unless the language is English. [EDIT] This is actually a functional issue: when a plural form is added to a previously singular-only string, msgmerge fills the plural form with the singular form and marks it fuzzy. The additional handling ensures that plurals are filled correctly for source-language POs in this case.

Use lektor quickstart to generate a project, and configure lektor-i18n-plugin: add babel.cfg and config/i18n.ini [AS DESCRIBED IN THE README FILE], and add the dependency and alternatives to the lektorproject file. Then go to models, pick a field at random, set translatable = True on it, and run lektor build. Now change the code for the navigation bar in layout.html in templates to say

        {% for href, title in [
          ['/blog', 'Blog'],
          ['/projects', _("Projects")],
          ['/about', 'About']
        ] %}

Now rerun lektor build, and we find that the English PO file automatically populates the new Projects string. This doesn't happen in the old version of the plugin, where the new Projects string just stays blank:

# English translations for transtest package.
# Copyright (C) 2025 THE transtest'S COPYRIGHT HOLDER
# This file is distributed under the same license as the transtest package.
# Automatically generated, 2025.
#
msgid ""
msgstr ""
"Project-Id-Version: transtest 1.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-10-11 09:52+CST\n"
"PO-Revision-Date: 2025-10-11 09:52+CST\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: en\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

#: (content/blog/contents+en.lr:blog.title) https://website_url/blog/
msgid "Blog"
msgstr "Blog"

#: templates/layout.html:14
msgid "Projects"
msgstr "Projects"

Now we translate the string in the French PO file (this is not a real translation, though) and save the file:

#: templates/layout.html:14
msgid "Projects"
msgstr "THIS IS A FRENCH PLURAL TRANSLATION"

Now, we introduce a pluralized version of the string -- we do this in the navigation bar in layout.html:

        {% for href, title in [
          ['/blog', 'Blog'],
          ['/projects', ngettext("Project", "Projects", 1)],
          ['/about', 'About']
        ] %}

Notice now the English translation is automatically filled correctly, showing that the code for autofilling new source-language translations handles plurals properly:

# English translations for transtest package.
# Copyright (C) 2025 THE transtest'S COPYRIGHT HOLDER
# This file is distributed under the same license as the transtest package.
# Automatically generated, 2025.
#
msgid ""
msgstr ""
"Project-Id-Version: transtest 1.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-10-11 09:52+CST\n"
"PO-Revision-Date: 2025-10-11 09:52+CST\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: en\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

#: (content/blog/contents+en.lr:blog.title) https://website_url/blog/
msgid "Blog"
msgstr "Blog"

#: templates/layout.html:14
msgid "Project"
msgid_plural "Projects"
msgstr[0] "Project"
msgstr[1] "Projects"

#~ msgid "Projects"
#~ msgstr "Projects"

If the same test is performed with French as the source language, the new plural string will not be filled in automatically, because the plugin cannot parse Plural-Forms to work out which msgstr index should receive the singular msgid and which should receive the plural msgid.
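
To see why the msgstr index alone is not enough, it helps to look at which index a given count selects under a few well-known Plural-Forms rules. The sketch below hard-codes those rules as Python functions instead of parsing the header, which is exactly the part the plugin does not do; the table and function name are for illustration only.

```python
# (nplurals, rule) pairs mirroring common Plural-Forms headers:
#   en:    nplurals=2; plural=(n != 1);
#   fr:    nplurals=2; plural=(n > 1);
#   zh_CN: nplurals=1; plural=0;
#   ru:    nplurals=3; the usual three-form Slavic rule
PLURAL_RULES = {
    "en": (2, lambda n: int(n != 1)),
    "fr": (2, lambda n: int(n > 1)),
    "zh_CN": (1, lambda n: 0),
    "ru": (
        3,
        lambda n: 0
        if n % 10 == 1 and n % 100 != 11
        else 1
        if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20)
        else 2,
    ),
}


def msgstr_index(language: str, n: int) -> int:
    """Return which msgstr[i] a count of n selects for the language."""
    nplurals, rule = PLURAL_RULES[language]
    index = rule(n)
    if not 0 <= index < nplurals:
        raise ValueError(f"rule for {language} produced index {index}")
    return index
```

For a count of zero, English selects msgstr[1] while French selects msgstr[0]; Simplified Chinese always selects msgstr[0], and Russian has three slots. So a fixed mapping of index 0 to the singular msgid and everything else to the plural msgid only happens to be right for some languages.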

@kattni

kattni commented Nov 3, 2025

As noted here, we're putting a pin in this for now, and will pick it up again if needed.


3 participants