Confused po4a-gettextize by byte order markers and asciidoc #333
Comments
Hello, thanks for this report. What would you advise? To simply ignore these markers, or to try to restore them afterward? I suspect that ignoring is the right approach here, but I'm not sure. Thanks,
[Martin Quinson]
Hello, thanks for this report.
What would you advise? To simply ignore these markers, or to try to
restore them afterward? I suspect that ignoring is the right approach
here, but I'm not sure.
I understand byte order markers to be there to identify the input byte
order, and would ignore them on output as long as the output is UTF-8 or
another byte based charset. It could be useful to include them for
UTF-16 output, but in the UTF-8 case they seem to be simply fluff. I
saw someone argue that BOM can be used to identify UTF-8, but I am not
sure I buy that argument.
--
Happy hacking
Petter Reinholdtsen
I had another look at the source code, and I get the feeling that we are handling file encoding wrongly in po4a. I am considering using File::BOM all over the place to fix it, but that's quite an intrusive change that requires time. It would probably also be possible to hack something by adding something along these lines in Transtractor::read, but I'm not 100% confident that this will be enough. It somehow feels like hiding the issue instead of fixing it...
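For readers following the thread, here is a minimal sketch of the "ignore the marker on input" approach discussed above. It is written in Python for illustration only: it is not po4a's Perl code and not the snippet referred to in the comment, and the helper names `strip_utf8_bom` and `read_text_ignoring_bom` are hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def read_text_ignoring_bom(path: str) -> str:
    """Read a UTF-8 text file, dropping a leading BOM if the editor wrote one."""
    # The 'utf-8-sig' codec decodes UTF-8 and skips a BOM at the start of the
    # stream; files without a BOM are decoded as plain UTF-8.
    with open(path, encoding="utf-8-sig") as handle:
        return handle.read()
```

Only a marker at the very start of the input is dropped here, so the first line of a file (for example an asciidoc title) reaches the parser without the invisible prefix. Whether this is sufficient for po4a, or merely hides a deeper encoding problem as suggested above, is left open.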
The parsing of asciidoc files seems to be very confused when it finds a byte order mark in the text file. The following demonstrates the problem. The two text files contain one text block each, but po4a-gettextize claims there is no text block in one of them. The two files are attached as a tarball, asciidoc-with-bom.tar.gz.
I expected po4a-gettextize to handle byte order marks in text files, as there are several text editors on Windows that insert them when saving files.