Use spaces (not '_') to dedup msgids, and update the doc
This improves the fix for #334
mquinson committed Jul 3, 2022
1 parent 34ca254 commit e472c45
Showing 3 changed files with 24 additions and 27 deletions.
4 changes: 2 additions & 2 deletions lib/Locale/Po4a/Po.pm
@@ -95,7 +95,7 @@ Set the package version for the POT header. The default is "VERSION".
=item B<dedup>
Boolean indicating whether we should deduplicate msgids.
-If true, when the same string is added again, a '_' is appended to deduplicate it.
+If true, when the same string is added again, a space is appended to deduplicate it.
This is probably only useful in the gettextization context, where duplicate msgids break the string pairing algorithm.
See https://github.com/mquinson/po4a/issues/334 for more info.
@@ -1394,7 +1394,7 @@ sub push_raw {

# If asked to dedup the msgid, append a space as long as the string is still a duplicate
while ( defined( $self->{po}{$msgid} ) && $self->{options}{'dedup'} ) {
-$msgid .= '_';
+$msgid .= ' ';
}

if ( defined( $self->{po}{$msgid} ) ) {
43 changes: 20 additions & 23 deletions po4a-gettextize
@@ -256,29 +256,26 @@ inspecting F<gettextization.failed.po>, and fix the problem where it really is.

=item

-In some unfortunate settings, you will get the feeling that po4a ate some parts
-of the text, either the original or the translation. F<gettextization.failed.po>
-indicates that both files matched as expected up to the paragraph N. But then,
-an (unsuccessful) attempt is made to match the N+1 paragraph in the original
-file not with the N+1 paragraph in the translation as it should, but with the
-N+2 paragraph. Just as if the N+1 paragraph that you see in the document simply
-disappeared from the file during the process.
-
-This unfortunate situation happens when the same paragraph is repeated over
-the document. In that case, no new entry is created in the PO file, but a
-new reference is added to the existing one instead.
-
-So, the previous situation occurs when two similar but different paragraphs are
-translated in the exact same way. This will apparently remove a paragraph of the
-translation. To fix the problem, it is sufficient to slightly alter one of the
-translations in the document. You can also prefer to kill the second paragraph
-in the original document.
-
-To the opposite, if the same paragraph appearing twice in the original document
-is not translated in the exact same way at both locations, you will get the
-feeling that one paragraph of the original document just vanished. Just copy the
-best translation over the other one in the translated document to fix the
-problem.
+In some cases, po4a adds a space at the end of either the original or the
+translated strings. This is because every string must be deduplicated during the
+gettextize process. Imagine that a string appears several times unmodified in
+the original but is translated in differing ways, or that different paragraphs
+are translated in the exact same way.
+
+Without deduplication, such cases would break the gettextization algorithm, as it
+is a simple one-to-one pairing between the msgids of both the master and the
+localized files. Since one of the PO files would miss an entry (that would be
+reported as duplicate, with two references), the pairing would fail.
+
+Since po4a uses the entry type ("title" or "plain paragraph", etc.) to detect
+whether the parsing streams got desynchronized, similar issues could occur if
+two identical entries (same content but differing type) of the master file are
+translated in the exact same way in the localized file. po4a would detect a fake
+desynchronization in such cases.
+
+In most cases, the extra space added by po4a to deduplicate the strings has no
+impact on the formatting. Strings are fuzzied anyway, and msgmerge will probably
+match the strings accordingly afterward.

=item

4 changes: 2 additions & 2 deletions t/gettextize/test_dups.po
@@ -25,11 +25,11 @@ msgstr "HELLO"
#. type: Title ##
#: ../gettextize/test_dups.md:3
#, fuzzy, markdown-text, no-wrap
-msgid "hello_"
+msgid "hello "
msgstr "SUBTITLE"

#. type: Plain text
#: ../gettextize/test_dups.md:5
#, fuzzy, markdown-text
-msgid "hello__"
+msgid "hello  "
msgstr "SAMPLE PARAGRAPH."
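
The effect of this change on repeated msgids can be reproduced outside po4a with a minimal sketch. The helper name `dedup_msgid` and the `%seen` hash are hypothetical stand-ins for `push_raw` and `$self->{po}`; only the `while` loop mirrors the hunk in lib/Locale/Po4a/Po.pm, and everything else is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the dedup step of push_raw: %$seen plays the
# role of $self->{po}, and $dedup the role of $self->{options}{'dedup'}.
sub dedup_msgid {
    my ( $seen, $msgid, $dedup ) = @_;

    # Append one space per collision until the msgid is unique.
    while ( $dedup && defined $seen->{$msgid} ) {
        $msgid .= ' ';
    }
    $seen->{$msgid} = 1;
    return $msgid;
}

my %seen;
my @msgids;
for my $string ( 'hello', 'hello', 'hello' ) {
    push @msgids, dedup_msgid( \%seen, $string, 1 );
}

# Print each msgid between brackets so the trailing spaces are visible.
print "[$_]\n" for @msgids;
```

With deduplication enabled, the second occurrence gains one trailing space and the third gains two, so each occurrence keeps its own entry and the one-to-one pairing between master and localized msgids stays aligned.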
