Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback to the /ToUnicode map for TrueType fonts with (3, 1) and (1, 0) cmap-tables (issue 13316) #13970

Merged
merged 1 commit into from
Sep 4, 2021

Conversation

Snuffleupagus
Copy link
Collaborator

@Snuffleupagus Snuffleupagus commented Sep 3, 2021

In the PDF document some of the glyphs have bogus differences-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail.
My initial idea was to use a similar approach as in the PartialEvaluator._simpleFontToUnicode-method, to extract the charCodes from those entries, however it turned out that that didn't actually help in this case (the mapping was still wrong).

To fix this I'm thus proposing that we fallback to the /ToUnicode map when no other useable data exists (e.g. no post-table), since it hopefully shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens).

Fixes #13316


[1] As can be seem below, some of the entries are completely normal while others are non-standard:

Differences (array)
    0 = 65
    1 = /g5167
    2 = /space
    3 = /g11927
    4 = /g17737
    5 = /g11540
    6 = /g2180
    7 = /K
    8 = /P
    9 = /two
    10 = /zero
    11 = /one
    12 = /five
    13 = /four
    14 = /g6932
    15 = /g7246
    16 = /g1691
    17 = /g2343
    18 = /g14792
    19 = /g3325
    20 = /g4280
    21 = /g20383
    22 = /g18166
    23 = /g16988
    24 = /g17943
    25 = /g19223
    26 = /g10830
    27 = 97
    28 = /g982
    29 = /g1226
    30 = /g5059
    31 = /g2677
    32 = /g1042
    33 = /g11568
    34 = /L
    35 = /three
    36 = /seven
    37 = /g2364
    38 = /g12063
    39 = /g5356
    40 = /g2173
    41 = /g17877
    42 = /g7273
    43 = /g7647
    44 = /g7224
    45 = /g19327
    46 = /g5054
    47 = /g2342
    48 = /g10136
    49 = /g6856
    50 = /g13381
    51 = /g7257
    52 = /g12093
    53 = /g2359

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Sep 3, 2021

@brendandahl This patch does feel really hacky and somewhat arbitrary, so please just tell me if it looks stupid :-)

Copy link
Contributor

@brendandahl brendandahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable, pdfium has a similar path.

src/core/fonts.js Outdated Show resolved Hide resolved
… 0) cmap-tables (issue 13316)

In the PDF document some of the glyphs have bogus `differences`-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail.
My initial idea was to use a similar approach as in the `PartialEvaluator._simpleFontToUnicode`-method, to extract the charCodes from those entries, however it turned out that that didn't actually help in this case (the mapping was still wrong).

To fix this I'm thus proposing that we fallback to the /ToUnicode map when no other useable data exists (e.g. no post-table), since it *hopefully* shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens).

---
[1] As can be seem below, some of the entries are completely normal while others are non-standard:
```
Differences (array)
    0 = 65
    1 = /g5167
    2 = /space
    3 = /g11927
    4 = /g17737
    5 = /g11540
    6 = /g2180
    7 = /K
    8 = /P
    9 = /two
    10 = /zero
    11 = /one
    12 = /five
    13 = /four
    14 = /g6932
    15 = /g7246
    16 = /g1691
    17 = /g2343
    18 = /g14792
    19 = /g3325
    20 = /g4280
    21 = /g20383
    22 = /g18166
    23 = /g16988
    24 = /g17943
    25 = /g19223
    26 = /g10830
    27 = 97
    28 = /g982
    29 = /g1226
    30 = /g5059
    31 = /g2677
    32 = /g1042
    33 = /g11568
    34 = /L
    35 = /three
    36 = /seven
    37 = /g2364
    38 = /g12063
    39 = /g5356
    40 = /g2173
    41 = /g17877
    42 = /g7273
    43 = /g7647
    44 = /g7224
    45 = /g19327
    46 = /g5054
    47 = /g2342
    48 = /g10136
    49 = /g6856
    50 = /g13381
    51 = /g7257
    52 = /g12093
    53 = /g2359
```
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/f952512dfc7be0e/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/b3bd048b0f4cc5a/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/f952512dfc7be0e/output.txt

Total script time: 21.90 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 7

Image differences available at: http://54.241.84.105:8877/f952512dfc7be0e/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/b3bd048b0f4cc5a/output.txt

Total script time: 39.59 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 6
  different first/second rendering: 1

Image differences available at: http://54.193.163.58:8877/b3bd048b0f4cc5a/reftest-analyzer.html#web=eq.log

@Snuffleupagus Snuffleupagus merged commit 258cf1d into mozilla:master Sep 4, 2021
@Snuffleupagus
Copy link
Collaborator Author

/botio makeref

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/698ab9d43eeae20/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Windows)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 1

Live output at: http://54.193.163.58:8877/004ca615ee623f0/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/698ab9d43eeae20/output.txt

Total script time: 17.79 mins

  • Lint: Passed
  • Make references: FAILED

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux makeref

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/93fd4e71d9408b2/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/93fd4e71d9408b2/output.txt

Total script time: 17.69 mins

  • Lint: Passed
  • Make references: FAILED

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Windows)


Success

Full output at http://54.193.163.58:8877/004ca615ee623f0/output.txt

Total script time: 36.83 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux makeref

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/668e0211582a7c2/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/668e0211582a7c2/output.txt

Total script time: 19.59 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the issue-13316 branch September 4, 2021 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

All of the Chinese character cannot be rendered
3 participants