Fallback to the /ToUnicode map for TrueType fonts with (3, 1) and (1, 0) cmap-tables (issue 13316)

In the PDF document some of the glyphs have bogus `differences`-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail.
My initial idea was to use an approach similar to the one in the `PartialEvaluator._simpleFontToUnicode`-method, to extract the charCodes from those entries; however, it turned out that this didn't actually help in this case (the mapping was still wrong).

To fix this I'm thus proposing that we fall back to the /ToUnicode map when no other usable data exists (e.g. no post-table), since it *hopefully* shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens).
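
The fallback order described above can be sketched in isolation. This is only an illustration under stated assumptions, not the pdf.js implementation: `resolveCmapCode`, `glyphNameToCode`, and the `hasPostTable` flag are hypothetical names, and a plain `Map` stands in for the /ToUnicode map.

```javascript
// Hypothetical stand-in for a glyph-name lookup (e.g. a standard encoding).
const glyphNameToCode = new Map([
  ["space", 0x20],
  ["K", 0x4b],
  ["P", 0x50],
  ["two", 0x32],
]);

// Returns the code used to search the font's cmap, or undefined when
// neither the glyph name nor the /ToUnicode fallback yields one.
function resolveCmapCode(charCode, glyphName, { hasPostTable, toUnicode }) {
  // First choice: the glyph name resolves to a known code.
  let code = glyphNameToCode.get(glyphName);

  // Fallback: only when the name is unusable, no post-table glyph names
  // exist, and a (non-identity) /ToUnicode map is available.
  if (code === undefined && !hasPostTable && toUnicode) {
    const unicode = toUnicode.get(charCode);
    if (unicode) {
      // Use the first UTF-16 code unit of the mapped string.
      code = unicode.charCodeAt(0);
    }
  }
  return code; // undefined means "no valid glyph mapping found"
}

// Example: "g5167" is not a known glyph name, so /ToUnicode decides.
const exampleToUnicode = new Map([[1, "\u4e2d"]]);
const exampleCode = resolveCmapCode(1, "g5167", {
  hasPostTable: false,
  toUnicode: exampleToUnicode,
});
```

Note that `charCodeAt(0)` only looks at the first UTF-16 code unit, so a /ToUnicode entry that maps to several characters, or to a character outside the Basic Multilingual Plane, is only partially represented by this lookup.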

---
[1] As can be seen below, some of the entries are completely normal while others are non-standard:
```
Differences (array)
    0 = 65
    1 = /g5167
    2 = /space
    3 = /g11927
    4 = /g17737
    5 = /g11540
    6 = /g2180
    7 = /K
    8 = /P
    9 = /two
    10 = /zero
    11 = /one
    12 = /five
    13 = /four
    14 = /g6932
    15 = /g7246
    16 = /g1691
    17 = /g2343
    18 = /g14792
    19 = /g3325
    20 = /g4280
    21 = /g20383
    22 = /g18166
    23 = /g16988
    24 = /g17943
    25 = /g19223
    26 = /g10830
    27 = 97
    28 = /g982
    29 = /g1226
    30 = /g5059
    31 = /g2677
    32 = /g1042
    33 = /g11568
    34 = /L
    35 = /three
    36 = /seven
    37 = /g2364
    38 = /g12063
    39 = /g5356
    40 = /g2173
    41 = /g17877
    42 = /g7273
    43 = /g7647
    44 = /g7224
    45 = /g19327
    46 = /g5054
    47 = /g2342
    48 = /g10136
    49 = /g6856
    50 = /g13381
    51 = /g7257
    52 = /g12093
    53 = /g2359
```
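
For readers unfamiliar with the layout of a `Differences` array: a number sets the next character code, and each name that follows is assigned to consecutive codes. Assuming the dump above follows that layout, the bare `65` and `97` entries are start codes rather than glyph names. A minimal sketch of the expansion, where `expandDifferences` is a hypothetical helper (not the pdf.js parser) and names are written as plain strings without the leading slash:

```javascript
// Expand a /Differences-style array into a charCode -> glyphName map.
// A number resets the current code; each name takes the current code
// and then advances it by one.
function expandDifferences(differences) {
  const map = new Map();
  let code = 0;
  for (const entry of differences) {
    if (typeof entry === "number") {
      code = entry;
    } else {
      map.set(code++, entry);
    }
  }
  return map;
}

// The first few entries from the dump above.
const sampleDifferences = [65, "g5167", "space", "g11927"];
const expandedSample = expandDifferences(sampleDifferences);
```

With this input, codes 65, 66 and 67 map to `g5167`, `space` and `g11927` respectively; only `space` is a standard glyph name.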
Snuffleupagus committed Sep 3, 2021
1 parent 804abb3 commit de889ec
Showing 4 changed files with 26 additions and 0 deletions.
19 changes: 19 additions & 0 deletions src/core/fonts.js
@@ -2651,6 +2651,25 @@ class Font {
       unicodeOrCharCode = MacRomanEncoding.indexOf(standardGlyphName);
     }

+    if (unicodeOrCharCode === undefined) {
+      // Not a valid glyph name, fallback to using the /ToUnicode map
+      // when no post-table exists (fixes issue13316_reduced.pdf).
+      if (
+        !properties.glyphNames &&
+        properties.hasIncludedToUnicodeMap &&
+        !(this.toUnicode instanceof IdentityToUnicodeMap)
+      ) {
+        const unicode = this.toUnicode.get(charCode);
+        if (unicode) {
+          unicodeOrCharCode = unicode.charCodeAt(0);
+        }
+      }
+
+      if (unicodeOrCharCode === undefined) {
+        continue; // No valid glyph mapping found.
+      }
+    }
+
     for (let i = 0; i < cmapMappingsLength; ++i) {
       if (cmapMappings[i].charCode !== unicodeOrCharCode) {
         continue;
1 change: 1 addition & 0 deletions test/pdfs/.gitignore
@@ -240,6 +240,7 @@
 !issue4304.pdf
 !issue4379.pdf
 !issue4550.pdf
+!issue13316_reduced.pdf
 !issue4575.pdf
 !bug1011159.pdf
 !issue5734.pdf
Binary file added test/pdfs/issue13316_reduced.pdf
Binary file not shown.
6 changes: 6 additions & 0 deletions test/test_manifest.json
@@ -4312,6 +4312,12 @@
       "lastPage": 4,
       "type": "load"
     },
+    { "id": "issue13316",
+      "file": "pdfs/issue13316_reduced.pdf",
+      "md5": "f5821891cee29d8de8b65e1efd6f4ceb",
+      "rounds": 1,
+      "type": "eq"
+    },
     { "id": "issue10519",
       "file": "pdfs/issue10519_reduced.pdf",
       "md5": "8a2dae43c0ef47b0734bedaaa24f8c09",