Creating an empty .po file causes non-ASCII characters to be silently discarded from msgids #442
Comments
Hi Steve,
I was going to refer to msgmerge's docs about how they want to preserve whichever charset the translator chooses, and that your suggestion would force UTF-8 instead. However, I've just found an interesting behavior of msgmerge 0.21 (as shipped in Debian stable), which looks like it'll need some additional bugs filed; I'll get to that, but not tonight. When running
I have yet another situation with msgmerge. I'm trying to update a UTF-8 PO file against an ISO-8859 POT file, and it mangles the non-ASCII characters:
The iconv commands do not output any error, proving that the file encoding matches the declared charset in the header (and the
All non-ASCII characters in the msgids get mangled for some reason (the 'invalid multibyte sequence' lines are part of msgmerge's stderr, not of the actual file content). I'm puzzled. I'm using msgmerge 0.21 from Debian testing. Any help would be really welcome here.
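For anyone wanting to repeat the "does the file match its declared charset" check without iconv, here is a minimal Python sketch. It is only an approximation of that check: the helper names `declared_charset` and `matches_declared_charset` are made up for this example, and it simply takes the first `charset=` it finds in the file rather than parsing the PO header properly.

```python
import re

# Matches the charset named in a PO header line such as
# "Content-Type: text/plain; charset=UTF-8\n"
CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)")


def declared_charset(raw: bytes):
    """Return the first charset named in the file, or None if there is none."""
    match = CHARSET_RE.search(raw)
    return match.group(1).decode("ascii") if match else None


def matches_declared_charset(raw: bytes) -> bool:
    """True if the bytes decode cleanly under the declared charset."""
    charset = declared_charset(raw)
    if charset is None:
        return False
    try:
        raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers unknown names such as the "CHARSET" placeholder.
        return False
    return True
```

Keep in mind that ISO-8859-1 accepts any byte sequence, so a passing result is only really informative for UTF-8 files.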
The files I used in this test:
Submitted as https://savannah.gnu.org/bugs/index.php?65104
I guess that we should force UTF-8 on PO and POT files to stay safe. Do you have a better idea?
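To make the "force UTF-8" idea concrete, here is a rough Python sketch of what converting a single file could look like. This is not how po4a does or would do it; `to_utf8` and its `fallback` parameter are hypothetical, and the regex-based header rewrite is a simplification. gettext's own `msgconv --to-code=UTF-8` is probably the more robust route in practice.

```python
import re

# Captures "charset=" and the charset name from the PO header line.
CHARSET_RE = re.compile(r"(charset=)([A-Za-z0-9_.:-]+)")


def to_utf8(raw: bytes, fallback: str = "ISO-8859-1") -> bytes:
    """Re-encode PO/POT content as UTF-8 and update the declared charset.

    Raises LookupError or UnicodeDecodeError if the declared charset is
    unknown (e.g. the "CHARSET" placeholder) or does not match the bytes.
    """
    # The header itself is ASCII, so a lossy ASCII decode is enough to find it.
    header = CHARSET_RE.search(raw.decode("ascii", errors="replace"))
    source = header.group(2) if header else fallback
    text = raw.decode(source)
    # Rewrite only the first declaration, which is assumed to be the header.
    text = CHARSET_RE.sub(r"\g<1>UTF-8", text, count=1)
    return text.encode("utf-8")
```

Running it on a file that is already UTF-8 leaves the content unchanged, so it would be safe to apply unconditionally.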
The quickstart guide in the po4a(1) manpage says "Simply create an empty file with the .pot extension in the specified po_directory (e.g. man/po4a/foo.pot), and po4a will fill it with the expected content."
I assumed that .po files could be created in the same way, by creating an empty one and running `po4a po4a.cfg`. Doing that fills the .po file, but silently strips non-ASCII characters out of the msgids as it does so. This seems to be a deliberate feature of gettext's msgmerge: if it's given an empty .po file and a UTF-8 .pot file, it assumes that the .po file should be ASCII, and strips letters with umlauts out of the msgids. Running it directly gives warnings about that, but they aren't shown when running it via po4a.
I have a German source file, and enabled UTF-8 in the .cfg file:
I'm not submitting a patch, as I'm not sure which way you'd prefer to handle it, but suggest either checking for empty files or adding "Don't create empty .po files, as these may cause the wrong charset to be used. Instead use the translators' tools to create a .po from the .pot." to the quickstart.
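As an illustration of the "check for empty files" option, a pre-flight check along these lines could be run before po4a. This is a standalone Python sketch, not po4a code; `find_empty_po_files` is a name invented for the example, and treating whitespace-only files as empty is my assumption.

```python
from pathlib import Path


def find_empty_po_files(po_dir):
    """Return the .po files under po_dir that contain nothing but whitespace."""
    empty = []
    for path in sorted(Path(po_dir).rglob("*.po")):
        if not path.read_bytes().strip():
            empty.append(path)
    return empty
```

Empty .pot files are deliberately ignored, since the quickstart says those are expected and po4a fills them in.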
Debian bug #1022216 seems related, but is using `po4a-updatepo`.
I'm using Debian Bookworm with gettext version 0.21-12, and have checked that the bug is still reproducible with po4a c9f5cf9.