Confused po4a-gettextize by byte order markers and asciidoc #333

Closed
petterreinholdtsen opened this issue Nov 28, 2021 · 3 comments

Comments

@petterreinholdtsen

The parsing of AsciiDoc files seems to get very confused when it finds a byte order mark in the text file. The following demonstrates the problem. The two text files contain one text block each, but po4a-gettextize claims there is no text block in one of them. The two files are attached as a tarball, asciidoc-with-bom.tar.gz.

% file a_e*
a_en.adoc: ASCII text
a_es.adoc: UTF-8 Unicode (with BOM) text
% cat a_e*
:lang: en
:lang: es
% LANG=C po4a-gettextize -f AsciiDoc -M UTF-8 -m a_en.adoc -l a_es.adoc
Use of uninitialized value $newchar in substitution iterator at /usr/share/perl5/Locale/Po4a/Po.pm line 1619.
po4a gettextize: Original has less strings than the translation (0<1). Please fix it 
               by removing the extra entry from the translated file. You may need an 
               addendum (cf po4a(7)) to reput the chunk in place after 
               gettextization. A possible cause is that a text duplicated in the 
               original is not translated the same way each time. Remove one of the 
               translations, and you're fine.

The gettextization failed (once again). Don't give up, gettextizing is a subtle art, but this is only needed once to convert a project to the gorgeous luxus offered by po4a to translators.
Please refer to the po4a(7) documentation, the section "HOWTO convert a pre-existing translation to po4a?" contains several hints to help you in your task
%

I expected po4a-gettextize to handle byte order marks in text files, as there are several text editors on Windows that insert them when saving files.
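
The transcript above is consistent with the byte order mark staying attached to the first line of a_es.adoc. The following is a minimal Python sketch of that effect. It assumes (this is not taken from the po4a source) that attribute entries such as `:lang: es` are recognised by a pattern anchored at the start of the line. `ATTRIBUTE_ENTRY` and `is_attribute_entry` are hypothetical names used only for illustration.

```python
import codecs
import re

# Hypothetical pattern for an AsciiDoc attribute entry such as ":lang: es".
# It is an assumption for illustration, not the expression po4a uses.
ATTRIBUTE_ENTRY = re.compile(r"^:[^:\s]+:(\s|$)")


def is_attribute_entry(line):
    """Return True if the line looks like an attribute entry."""
    return ATTRIBUTE_ENTRY.match(line) is not None


# Bytes as an editor would save them: UTF-8 BOM followed by the text.
raw = codecs.BOM_UTF8 + b":lang: es\n"

# Plain "utf-8" decoding keeps the BOM as the character U+FEFF.
with_bom = raw.decode("utf-8")
# "utf-8-sig" drops a leading BOM if there is one.
without_bom = raw.decode("utf-8-sig")

print(is_attribute_entry(with_bom))     # False
print(is_attribute_entry(without_bom))  # True
```

If the parser behaves like this sketch, the same line would count as an attribute entry in one file and as ordinary text in the other, which would explain the differing string counts (0<1) in the error message.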

@mquinson
Owner

mquinson commented Dec 1, 2021

Hello, thanks for this report.

What would you advise? To simply ignore these markers, or to try to restore them afterward? I suspect that ignoring is the right approach here, but I'm not sure.

Thanks,

@petterreinholdtsen
Author

petterreinholdtsen commented Dec 1, 2021 via email

@mquinson
Owner

I had another look at the source code, and I get the feeling that we are handling file encoding incorrectly in po4a. I am considering using File::BOM all over the place to fix it, but that's quite an intrusive change that requires time.

It would probably also be possible to hack something by adding something along these lines in Transtractor::read, but I'm not 100% confident that this will be enough. It somehow feels like hiding the issue instead of fixing it...
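
The snippet referred to in the comment above is not preserved in this copy of the thread. As a rough illustration of the "ignore the marker" idea only, here is a Python sketch that drops a leading UTF-8 BOM while reading a file. po4a itself is written in Perl, and `read_lines_without_bom` and `demo` are hypothetical helpers, not po4a functions.

```python
import codecs
import os
import tempfile


def read_lines_without_bom(path):
    """Read a UTF-8 text file, dropping a leading BOM if one is present."""
    # The "utf-8-sig" codec skips a leading BOM on read and otherwise
    # behaves like "utf-8", so files without a BOM are unaffected.
    with open(path, encoding="utf-8-sig") as handle:
        return handle.readlines()


def demo():
    """Write one file with a BOM and one without, then read both back."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "a_en.adoc")
        marked = os.path.join(tmp, "a_es.adoc")
        with open(plain, "wb") as handle:
            handle.write(b":lang: en\n")
        with open(marked, "wb") as handle:
            handle.write(codecs.BOM_UTF8 + b":lang: es\n")
        return read_lines_without_bom(plain), read_lines_without_bom(marked)


print(demo())
```

This discards the marker instead of restoring it on output, which corresponds to the "ignore" option raised earlier in the thread. Restoring it afterward would require remembering, per file, whether a BOM was seen.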
