-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fallback to the /ToUnicode map for TrueType fonts with (3, 1) and (1, 0) cmap-tables (issue 13316) #13970
Conversation
@brendandahl This patch does feel really hacky and somewhat arbitrary, so please just tell me if it looks stupid :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable, pdfium has a similar path.
… 0) cmap-tables (issue 13316) In the PDF document some of the glyphs have bogus `differences`-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail. My initial idea was to use a similar approach as in the `PartialEvaluator._simpleFontToUnicode`-method, to extract the charCodes from those entries, however it turned out that that didn't actually help in this case (the mapping was still wrong). To fix this I'm thus proposing that we fallback to the /ToUnicode map when no other useable data exists (e.g. no post-table), since it *hopefully* shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens). --- [1] As can be seem below, some of the entries are completely normal while others are non-standard: ``` Differences (array) 0 = 65 1 = /g5167 2 = /space 3 = /g11927 4 = /g17737 5 = /g11540 6 = /g2180 7 = /K 8 = /P 9 = /two 10 = /zero 11 = /one 12 = /five 13 = /four 14 = /g6932 15 = /g7246 16 = /g1691 17 = /g2343 18 = /g14792 19 = /g3325 20 = /g4280 21 = /g20383 22 = /g18166 23 = /g16988 24 = /g17943 25 = /g19223 26 = /g10830 27 = 97 28 = /g982 29 = /g1226 30 = /g5059 31 = /g2677 32 = /g1042 33 = /g11568 34 = /L 35 = /three 36 = /seven 37 = /g2364 38 = /g12063 39 = /g5356 40 = /g2173 41 = /g17877 42 = /g7273 43 = /g7647 44 = /g7224 45 = /g19327 46 = /g5054 47 = /g2342 48 = /g10136 49 = /g6856 50 = /g13381 51 = /g7257 52 = /g12093 53 = /g2359 ```
de889ec
to
3ccf277
Compare
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/f952512dfc7be0e/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.193.163.58:8877/b3bd048b0f4cc5a/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/f952512dfc7be0e/output.txt Total script time: 21.90 mins
Image differences available at: http://54.241.84.105:8877/f952512dfc7be0e/reftest-analyzer.html#web=eq.log |
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/b3bd048b0f4cc5a/output.txt Total script time: 39.59 mins
Image differences available at: http://54.193.163.58:8877/b3bd048b0f4cc5a/reftest-analyzer.html#web=eq.log |
/botio makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/698ab9d43eeae20/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 1 Live output at: http://54.193.163.58:8877/004ca615ee623f0/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/698ab9d43eeae20/output.txt Total script time: 17.79 mins
|
/botio-linux makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/93fd4e71d9408b2/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/93fd4e71d9408b2/output.txt Total script time: 17.69 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.193.163.58:8877/004ca615ee623f0/output.txt Total script time: 36.83 mins
|
/botio-linux makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/668e0211582a7c2/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.241.84.105:8877/668e0211582a7c2/output.txt Total script time: 19.59 mins
|
In the PDF document some of the glyphs have bogus
differences
-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail.My initial idea was to use a similar approach as in the
PartialEvaluator._simpleFontToUnicode
-method, to extract the charCodes from those entries, however it turned out that that didn't actually help in this case (the mapping was still wrong).To fix this I'm thus proposing that we fallback to the /ToUnicode map when no other useable data exists (e.g. no post-table), since it hopefully shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens).
Fixes #13316
[1] As can be seem below, some of the entries are completely normal while others are non-standard: