v0.5.5 #6

johnzhou721 · 2025-06-28T20:35:57Z

Exactly what it says on the tin.

Fixes all bugs I can find with this plugin; future updates after this version should be rare.

Changes

Revision July 22

POT content is now sorted by path when merging POTs from multiple sources (i.e., templates and content).
xgettext is now used instead of msgcat to merge POT files, which gives a better header and merges identical strings from different sources. This ensures that all context is kept; simply removing --use-first from msgcat causes header trouble.
Initially generated PO files now have a header compatible with GNOME's Translation Editor, since they have a non-placeholder Project-Id-Version. This is mostly for my local work (I’ve filled the Project-Id-Version values manually into the existing POs), but it has the good side effect of adding a license header (same license as BeeWare) on top of newly generated POs; I have not gone in and added these would’ve-been-generated headers to the existing PO files, though.
Translations in templates now provide pgettext and npgettext methods. The pgettext is for the sprint helping string (pluralized with speaker count) and the gold members on the front page (pluralization, desperate bug I fixed); pgettext is used to give more context and to work around a bug where trailing spaces get trimmed if regular gettext is used.
The limitation where deletion of strings from the English PO file with non-English content is required is resolved. See the deletion in the README.
When updating translated PO files, the content-language PO file strings are automatically filled with the message IDs. Plural forms aren’t filled unless it’s English. [EDIT] This is actually a functional issue; when adding a plural form to a previously singular-only string, msgmerging will actually fill the plural form with the singular form, but fuzzied!! The additional handling ensures that plurals are filled correctly for source-language POs in this case.
The first bug in Two Seemingly Untranslated Strings beeware.github.io#689 has been fixed (button on frontpage).

PR Checklist:

All new features have been tested
All new features have been documented
I have read the CONTRIBUTING.md file
I will abide by the code of conduct

lektor_i18n.py

freakboy3742

One code style tweak, and one request for clarification - it’s entirely possible you’re correct in what you’ve done, but I’m not sufficiently familiar with msgcat in practice to be confident in that.

johnzhou721 · 2025-06-29T21:10:26Z

Um... what's the tweak? Forgot to finish review?

freakboy3742

Hrm… not sure what happened there - I must have neglected to confirm the comment I wrote.

The two comments were:

Using long form flags: --sort-by-file instead of -F
Can you confirm why this is the right approach? I’m not an expert on msgcat (and I can’t confirm anything manually right now), but --use-first suggests merging keys, which seems preferable to duplication

johnzhou721 · 2025-06-29T21:19:54Z

@freakboy3742 Quite the opposite. Merge everything except keys -- but that also means that the "header" is being merged into something like

# #-#-#-#-#  templates-72gdmk0l.pot (PROJECT VERSION)  #-#-#-#-#
# Translations template for PROJECT.
# Copyright (C) 2025 ORGANIZATION
# This file is distributed under the same license as the PROJECT project.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"#-#-#-#-#  contents.pot (PACKAGE VERSION)  #-#-#-#-#\n"
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-06-02 08:23+AWST\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: en <LL@li.org>\n"
"Language: en\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"#-#-#-#-#  templates-72gdmk0l.pot (PROJECT VERSION)  #-#-#-#-#\n"
"Project-Id-Version: PROJECT VERSION\n"
"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
"POT-Creation-Date: 2025-06-29 12:06-0500\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.17.0\n"

Which is problematic. It's keeping both versions.
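A minimal sketch of how a merged header like the one above could be detected automatically. It assumes only the `#-#-#-#-#  file (VERSION)  #-#-#-#-#` marker shape visible in the output; the function name is hypothetical and not part of the plugin.

```python
import re

# Marker lines of the shape shown in the merged header above:
#   #-#-#-#-#  <file name> (<project/version placeholder>)  #-#-#-#-#
MARKER = re.compile(r"#-#-#-#-#\s+(.+?)\s+\((.+?)\)\s+#-#-#-#-#")


def merged_header_sources(pot_text):
    """Return the file names named in merge markers, in first-seen order.

    An empty list means the header was not concatenated from several inputs.
    """
    seen = []
    for match in MARKER.finditer(pot_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen
```

Running this over a freshly generated POT and failing the build when the list is non-empty would catch the "keeping both versions" problem early.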

CONCLUSION: hold this PR. I just found this issue.

johnzhou721 · 2025-06-29T21:24:08Z

That said, xgettext is the canonical way to merge POTs, so I'm trying that.

johnzhou721 · 2025-06-29T21:27:24Z

These concerns should now be resolved.

CHANGELOG.md

johnzhou721 · 2025-06-30T01:08:24Z

DO NOT MERGE. It changes the creation date of the POT unnecessarily.
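One way to avoid touching the creation date is to compare the old and new POT while ignoring the POT-Creation-Date line, and to rewrite the file only when something else differs. This is a sketch of that idea, not necessarily what the plugin does; both function names are hypothetical.

```python
def strip_creation_date(pot_text):
    """Drop the POT-Creation-Date header line so it does not affect comparisons."""
    kept = [
        line
        for line in pot_text.splitlines()
        if not line.startswith('"POT-Creation-Date:')
    ]
    return "\n".join(kept)


def pot_changed(old_text, new_text):
    """True only if the two POTs differ in something other than the creation date."""
    return strip_creation_date(old_text) != strip_creation_date(new_text)
```

A caller would keep the existing file on disk whenever `pot_changed` returns False, so rebuilds without string changes produce no diff.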

lektor_i18n.py

johnzhou721 · 2025-06-30T02:04:44Z

There. This works now.

This is what's called yak shaving, I guess... Just to have all the source locations merged together, I removed an option from msgcat and then realized that xgettext is the canonical way to do this (https://www.gnu.org/software/gettext/manual/html_node/msgcat-Invocation.html, "To concatenate POT files, better use xgettext, not msgcat, because msgcat would choke on the undefined charsets in the specified POT files.") and handles it better...
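For readers unfamiliar with the xgettext route, here is a rough sketch of what such a merge invocation could look like from Python, using long-form flags as requested in the review. The helper names and the exact flag set are assumptions, not the plugin's actual code; `--language=PO` asks xgettext to read the inputs as PO/POT catalogs.

```python
import subprocess


def build_merge_command(pot_files, output, package_name, package_version):
    """Build an xgettext command line that merges several POT files into one."""
    return [
        "xgettext",
        "--language=PO",      # treat the inputs as PO/POT catalogs
        "--sort-by-file",     # long form of -F
        f"--package-name={package_name}",
        f"--package-version={package_version}",
        f"--output={output}",
        *pot_files,
    ]


def merge_pots(pot_files, output, package_name, package_version):
    """Run the merge; raises CalledProcessError if xgettext fails."""
    command = build_merge_command(pot_files, output, package_name, package_version)
    subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes the flag list easy to inspect and test without gettext installed.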

johnzhou721 · 2025-06-30T20:04:57Z

Again, as mentioned on the other thread, I really apologize for shaving all those yaks.

johnzhou721 · 2025-07-03T03:21:52Z

Hmm... in the bugfix I used the msgstr number to determine whether the singular form or the plural form needs to get filled in... it's only a good heuristic for some languages, so I documented that and if it's not English I mark the auto-filled plurals as fuzzy. See the changelog/readme for more details.

(FYI, I patched the plugin to automatically fill msgstr with the msgids in the source-language PO file and to clear all the other POs after the initial msginit, so the bug listed in the README is finally resolved, but that is where plural handling etc. comes in.)
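A small sketch of the heuristic described in this comment: index 0 is filled from the singular msgid, every other index from the plural msgid, and auto-filled plurals are flagged fuzzy when the source language is not English. The `Entry` class and `fill_from_msgid` are hypothetical stand-ins, not the plugin's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entry:
    """Simplified PO entry (hypothetical; real PO libraries differ)."""
    msgid: str
    msgid_plural: Optional[str] = None
    msgstr: Dict[int, str] = field(default_factory=dict)
    fuzzy: bool = False


def fill_from_msgid(entry, language, nplurals=2):
    """Fill a source-language entry with its own message IDs."""
    if entry.msgid_plural is None:
        entry.msgstr = {0: entry.msgid}
        return entry
    # Heuristic: msgstr[0] is the singular, every other index is the plural.
    entry.msgstr = {
        index: entry.msgid if index == 0 else entry.msgid_plural
        for index in range(nplurals)
    }
    # The index-to-form mapping is only known to be right for English.
    if language != "en":
        entry.fuzzy = True
    return entry
```

The fuzzy flag leaves the final decision to a human translator for languages where index 0 does not simply mean "exactly one".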

lektor_i18n.py

README.md

johnzhou721

Not sure... the diff is still very large and rewraps a bunch of Simplified Chinese strings on the beeware.github.io PR even when using Ubuntu 24.04... I'm trying to see if this is the cause.

lektor_i18n.py

johnzhou721 · 2025-07-23T04:56:22Z

Note to self: the latest commit has a logic error. I will refactor by extracting entry clearing into a separate function, and by clearing an entry when the translation entry being filled is fuzzy.

kattni · 2025-10-11T03:10:52Z

Hey, John. I'm taking over the final review on this PR. I may be asking for clarification on your changes in this process.

My first question is, do you consider this ready for a final review? You repeatedly requested throughout the process that we hold off on merging it, so I want to ensure it's actually ready at this point before moving forward.

johnzhou721 · 2025-10-11T04:28:33Z

@kattni Yes, please! It's ready for final review -- I've made a lot of assorted changes here and there to completely fix up this plugin, so that we hopefully don't need another update. If you need me to re-explain anything, let me know, as I was quite vague in my communication when I started this PR. I apologize for the noise.

kattni · 2025-10-11T05:55:32Z

@johnzhou721 I would appreciate it if you can explain to me how you're testing these changes. I'd like to test it before we get into explanations.

johnzhou721 · 2025-10-11T14:24:06Z

A lot of the testing happens at the beeware.github.io PR where lots of those changes are relevant and applied.

Ensure that POT content is now sorted by path when merging POTs from multiple sources (i.e., templates and content).

You can see that the new POT at https://github.com/beeware/beeware.github.io/pull/684/files#diff-c8f80bf8f257ddef4811618539fadecda3407d93671f14c95b81e3a161dc2c1c is sorted properly. There are a lot of diffs in that file because of the next change quoted below; however, that change does make the POT context formatting consistent with the PO files.

xgettext is used to merge POT files instead of msgcat, providing a better header and merging of same strings from different sources. This is used to ensure that all context will be kept; removing --use-first from msgcat simply causes header trouble.

Merging of different context is demonstrated at https://github.com/beeware/beeware.github.io/pull/684/files#diff-c8f80bf8f257ddef4811618539fadecda3407d93671f14c95b81e3a161dc2c1cR5102-R5103 -- --use-first of msgcat seems to just use the first context of those strings; however if we remove that flag, the headers are different and the header msgstr will be a complete mess.

The initially generated PO files will now have a header compatible with GNOME's Translation Editor, since they will have a non-placeholder Project-Id-Version. This is mostly for my local work (I’ve filled the Project-Id-Version values manually into the existing POs), but it has the good side effect of adding a license header (same license as BeeWare) on top of newly generated POs; I have not gone in and added these would’ve-been-generated headers to the existing PO files, though.

Use lektor quickstart to generate a project, and configure lektor-i18n-plugin: add babel.cfg and config/i18n.ini, and add the dependency and alternatives to the lektorproject file. Then go to models, pick any field, and set translatable = True on it.

Now lektor build and the POT will be

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the transtest package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: transtest 1.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-10-11 09:15+CST\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: (content/blog/contents+en.lr:blog.title) https://website_url/blog/
msgid "Blog"
msgstr ""

showing that Project-Id-Version is properly filled in.
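The same check can be scripted instead of eyeballed. The sketch below pulls Project-Id-Version out of a PO/POT header and rejects the two placeholder values that appear earlier in this thread; the function names are hypothetical.

```python
import re

# Placeholder values seen in the unmerged headers quoted earlier in the thread.
PLACEHOLDERS = {"PACKAGE VERSION", "PROJECT VERSION"}

# Header lines are quoted strings ending in a literal backslash-n.
HEADER_LINE = re.compile(r'^"Project-Id-Version: (.*?)\\n"$', re.MULTILINE)


def project_id_version(po_text):
    """Return the Project-Id-Version value, or None if the header line is missing."""
    match = HEADER_LINE.search(po_text)
    return match.group(1) if match else None


def has_real_project_id(po_text):
    """True if Project-Id-Version is present and is not a known placeholder."""
    value = project_id_version(po_text)
    return value is not None and value not in PLACEHOLDERS
```

For the POT above, `project_id_version` would return `transtest 1.0`.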

Translations in templates now provide pgettext and npgettext methods. ~~The pgettext is for the sprint helping string (pluralized with speaker count) and the gold member_s_ on the front page (pluralization, desperate bug I fixed);~~ pgettext is used to give more context and to work around a bug where trailing spaces get trimmed if regular gettext is used.

OK, I made a mistake in the listing here. Ignore the sentence I've struck through.

pgettext is used for member badges. If the preview of the jinja i18n translation PR renders Katie's "badge" at /zh_CN/about/team/ as 超能力：Batavia、网站、高级养蜂师 with no extra spaces, it should be working, since pgettext is used here to translate the item separators into these Chinese variants.
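To illustrate why a context helps with separators, here is a sketch using Python's standard `gettext` API rather than the Jinja templates themselves. The context names ("label separator", "list separator"), the in-memory catalog, and `render_badges` are all hypothetical; the real site keeps its strings in PO files.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Toy translations object backed by a {(context, msgid): msgstr} dict."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def pgettext(self, context, message):
        # Fall back to the untouched msgid, trailing space included.
        return self._messages.get((context, message), message)


def render_badges(translations, label, items):
    """Join a label and its items using context-specific separators."""
    colon = translations.pgettext("label separator", ": ")
    comma = translations.pgettext("list separator", ", ")
    return label + colon + comma.join(items)
```

With a Chinese catalog mapping `": "` to `：` and `", "` to `、`, the output has no stray spaces, while the untranslated fallback keeps the English separators intact.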

The limitation where deletion of strings from the English PO file with non-English content is required is resolved. See the deletion in the README.

When updating translated PO files, the content-language PO file strings are automatically filled with the message IDs. Plural forms aren’t filled unless it’s English. [EDIT] This is actually a functional issue; when adding a plural form to a previously singular-only string, msgmerging will actually fill the plural form with the singular form, but fuzzied!! The additional handling ensures that plurals are filled correctly for source-language POs in this case.

I'll come up with tests for these 2 a bit later

The first bug in Two Seemingly Untranslated Strings beeware.github.io#689 has been fixed (button on frontpage).

See preview for this. The missing strings are described at beeware.github.io#689 and they should be there.

johnzhou721 · 2025-10-11T15:05:29Z

The limitation where deletion of strings from the English PO file with non-English content is required is resolved. See the deletion in the README.

Use lektor quickstart to generate a project, and configure lektor-i18n-plugin: add babel.cfg and config/i18n.ini [AS DESCRIBED IN THE README FILE], and add the dependency and alternatives to the lektorproject file. Then go to models, pick any field, and set translatable = True on it, but this time make sure French is the primary language in the lektorproject file and in config/i18n.ini. Build the project and you will find that the en PO file is empty, which means the limitation where strings must be manually deleted from the English PO file with non-English content is resolved.

When updating translated PO files, the content-language PO file strings are automatically filled with the message IDs. Plural forms aren’t filled unless it’s English. [EDIT] This is actually a functional issue; when adding a plural form to a previously singular-only string, msgmerging will actually fill the plural form with the singular form, but fuzzied!! The additional handling ensures that plurals are filled correctly for source-language POs in this case.

Use lektor quickstart to generate a project, and configure lektor-i18n-plugin: add babel.cfg and config/i18n.ini [AS DESCRIBED IN THE README FILE], and add the dependency and alternatives to the lektorproject file. Then go to models, pick any field, set translatable = True on it, and run lektor build. Now change the code for the navigation bar in templates/layout.html to say

        {% for href, title in [
          ['/blog', 'Blog'],
          ['/projects', _("Projects")],
          ['/about', 'About']
        ] %}

Now rerun lektor build, and the English PO file automatically populates the new string for Projects. This doesn't happen in the old version of the plugin, where the new Projects string just stays blank:

# English translations for transtest package.
# Copyright (C) 2025 THE transtest'S COPYRIGHT HOLDER
# This file is distributed under the same license as the transtest package.
# Automatically generated, 2025.
#
msgid ""
msgstr ""
"Project-Id-Version: transtest 1.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-10-11 09:52+CST\n"
"PO-Revision-Date: 2025-10-11 09:52+CST\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: en\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

#: (content/blog/contents+en.lr:blog.title) https://website_url/blog/
msgid "Blog"
msgstr "Blog"

#: templates/layout.html:14
msgid "Projects"
msgstr "Projects"

Now we translate the string in the French version (ce n'est pas une traduction, though) and save the file:

#: templates/layout.html:14
msgid "Projects"
msgstr "THIS IS A FRENCH PLURAL TRANSLATION"

Now, we introduce a pluralized version of the string -- we do this in the navigation bar in layout.html:

        {% for href, title in [
          ['/blog', 'Blog'],
          ['/projects', ngettext("Project", "Projects", 1)],
          ['/about', 'About']
        ] %}

Notice now the English translation is automatically filled correctly, showing that the code for autofilling new source-language translations handles plurals properly:

# English translations for transtest package.
# Copyright (C) 2025 THE transtest'S COPYRIGHT HOLDER
# This file is distributed under the same license as the transtest package.
# Automatically generated, 2025.
#
msgid ""
msgstr ""
"Project-Id-Version: transtest 1.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-10-11 09:52+CST\n"
"PO-Revision-Date: 2025-10-11 09:52+CST\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: en\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

#: (content/blog/contents+en.lr:blog.title) https://website_url/blog/
msgid "Blog"
msgstr "Blog"

#: templates/layout.html:14
msgid "Project"
msgid_plural "Projects"
msgstr[0] "Project"
msgstr[1] "Projects"

#~ msgid "Projects"
#~ msgstr "Projects"

If the same test is performed with French as the source language, the new Projects string will not be filled automatically, because we cannot parse Plural-Forms to work out which msgstr index should be filled from the singular msgid and which from the plural msgid.
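The limitation can be made concrete with a short sketch. Extracting `nplurals` from Plural-Forms is easy, but English and French both report two forms while mapping counts to indices differently (French treats 0 as singular), so the count alone cannot tell us which index takes which msgid. The function names are hypothetical, and treating only the exact English rule as safe is an assumption about how such a guard might be written.

```python
import re

# The Plural-Forms value from the English PO header shown above.
ENGLISH_RULE = "nplurals=2; plural=(n != 1);"


def parse_nplurals(plural_forms):
    """Return the declared number of plural forms, or None if absent."""
    match = re.search(r"nplurals\s*=\s*(\d+)", plural_forms)
    return int(match.group(1)) if match else None


def can_autofill_plurals(plural_forms):
    """Only the exact English rule is treated as safe to fill automatically."""
    def normalize(text):
        return re.sub(r"\s+", "", text)

    return normalize(plural_forms) == normalize(ENGLISH_RULE)
```

For the common French rule `nplurals=2; plural=(n > 1);`, `parse_nplurals` still returns 2 but `can_autofill_plurals` returns False, which matches the behaviour described above.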

kattni · 2025-11-03T22:59:24Z

As noted here, we're putting a pin in this for now, and will pick it up again if needed.

Ensure everything is sorted whenever a command that generates po / po…

fbb64f7

…t files is used Exactly what it says on the tin.

This was referenced Jun 28, 2025

Use Jinja i18n to gettext instead of databags beeware/beeware.github.io#683

Open

Use gettext for recurring phrases, fix a bunch of i18n issues!!!!! beeware/beeware.github.io#684

Open

johnzhou721 changed the title ~~Ensure everything is sorted whenever a command that generates po / pot files is used~~ Sort by file when using msgcat to merge multiple pot files, and cumulate translation information. Jun 29, 2025

johnzhou721 commented Jun 29, 2025

johnzhou721 added 2 commits June 29, 2025 12:01

Apply suggestions from code review

51807ff

Update CHANGELOG.md

8769755

freakboy3742 requested changes Jun 29, 2025

johnzhou721 requested a review from freakboy3742 June 29, 2025 21:10

freakboy3742 requested changes Jun 29, 2025

use xgettext

2f74b35

johnzhou721 requested a review from freakboy3742 June 29, 2025 21:27

johnzhou721 commented Jun 29, 2025

Update CHANGELOG.md

2089826

don't change the date

8d88d70

johnzhou721 commented Jun 30, 2025

Update lektor_i18n.py

92431fd

johnzhou721 and others added 5 commits June 30, 2025 14:23

add version + package name

96a49c0

fix

34c9032

changelog

3e846be

Better docs

3c6f297

Update CHANGELOG.md

d71e9dc

johnzhou721 changed the title ~~Sort by file when using msgcat to merge multiple pot files, and cumulate translation information.~~ v0.5.5 Jun 30, 2025

johnzhou721 added 6 commits July 2, 2025 21:58

Update lektor_i18n.py

a6be921

documenation

a9e841b

Update CHANGELOG.md

bd9bc39

Update README.md

aba93aa

Update lektor_i18n.py

c0806b9

Update README.md

65717bb

Improve docs

b63def8

johnzhou721 commented Jul 4, 2025

johnzhou721 and others added 2 commits July 3, 2025 19:59

Apply suggestions from code review

9fb78d0

fixup

845cf14

johnzhou721 commented Jul 22, 2025

johnzhou721 added 6 commits July 22, 2025 11:54

Apply suggestions from code review

53e7737

Fix issue with extraction

34a5f26

changelog

d44f885

Update lektor_i18n.py

5339dc2

Update lektor_i18n.py

fdfa3d7

fuzzy handle

0638bed

johnzhou721 commented Jul 22, 2025

Update lektor_i18n.py

f27756f

johnzhou721 added 2 commits July 23, 2025 11:17

simp logic

7f56076

Update README.md

066b6e5

freakboy3742 requested a review from kattni October 10, 2025 06:24
