-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add unicode mapping in the font cmap to have correct chars when printing in pdf (bug 1778484) #15157
Conversation
src/core/fonts.js
Outdated
unicode = unicode.codePointAt(0); | ||
} | ||
if (unicode) { | ||
newMap[unicode] = glyphId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will affect glyph-mapping in a way that is likely not intended, and could very well cause issues; does this actually pass all tests?
These changes thus go a lot further than only the cmap
, according to the commit message, and note how we're using newMapping.charCodeToGlyphId
as part of the following code that builds a seac
-map:
Lines 3009 to 3078 in 0205597
function getCharCodes(charCodeToGlyphId, glyphId) { | |
let charCodes = null; | |
for (const charCode in charCodeToGlyphId) { | |
if (glyphId === charCodeToGlyphId[charCode]) { | |
if (!charCodes) { | |
charCodes = []; | |
} | |
charCodes.push(charCode | 0); | |
} | |
} | |
return charCodes; | |
} | |
function createCharCode(charCodeToGlyphId, glyphId) { | |
for (const charCode in charCodeToGlyphId) { | |
if (glyphId === charCodeToGlyphId[charCode]) { | |
return charCode | 0; | |
} | |
} | |
newMapping.charCodeToGlyphId[newMapping.nextAvailableFontCharCode] = | |
glyphId; | |
return newMapping.nextAvailableFontCharCode++; | |
} | |
const seacs = font.seacs; | |
if (newMapping && SEAC_ANALYSIS_ENABLED && seacs && seacs.length) { | |
const matrix = properties.fontMatrix || FONT_IDENTITY_MATRIX; | |
const charset = font.getCharset(); | |
const seacMap = Object.create(null); | |
for (let glyphId in seacs) { | |
glyphId |= 0; | |
const seac = seacs[glyphId]; | |
const baseGlyphName = StandardEncoding[seac[2]]; | |
const accentGlyphName = StandardEncoding[seac[3]]; | |
const baseGlyphId = charset.indexOf(baseGlyphName); | |
const accentGlyphId = charset.indexOf(accentGlyphName); | |
if (baseGlyphId < 0 || accentGlyphId < 0) { | |
continue; | |
} | |
const accentOffset = { | |
x: seac[0] * matrix[0] + seac[1] * matrix[2] + matrix[4], | |
y: seac[0] * matrix[1] + seac[1] * matrix[3] + matrix[5], | |
}; | |
const charCodes = getCharCodes(mapping, glyphId); | |
if (!charCodes) { | |
// There's no point in mapping it if the char code was never mapped | |
// to begin with. | |
continue; | |
} | |
for (let i = 0, ii = charCodes.length; i < ii; i++) { | |
const charCode = charCodes[i]; | |
// Find a fontCharCode that maps to the base and accent glyphs. | |
// If one doesn't exists, create it. | |
const charCodeToGlyphId = newMapping.charCodeToGlyphId; | |
const baseFontCharCode = createCharCode( | |
charCodeToGlyphId, | |
baseGlyphId | |
); | |
const accentFontCharCode = createCharCode( | |
charCodeToGlyphId, | |
accentGlyphId | |
); | |
seacMap[charCode] = { | |
baseFontCharCode, | |
accentFontCharCode, | |
accentOffset, | |
}; | |
} | |
} |
What I'm wondering is thus if you don't want to leave the existing newMap
alone here and rather introduce a new "map" (e.g. named completeMap
or something better) that contains the "full"-data and return that from this function?
Then you can pass that one in when building e.g. the cmap
-table, without taking the risk of affecting any other pre-existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't run the tests: I wanted to have your opinion before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question I was asking to myself in writing the patch: could we use this font in text layer ? or could you imagine some painful drawbacks ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use this font in text layer ?
I'd advise against that, since it'll cause (lots of) problems in the general case.
or could you imagine some painful drawbacks ?
There's no guarantee that a font ever contains all of the glyphs that correspond to the data in /ToUnicode. One trivial example is ligatures, since a font doesn't necessarily contain the individual glyphs that make up one.
Another example is OCRed documents, where the glyphs may be completely empty and thus have no usable size.
/botio test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @calixteman received. Current queue size: 0 Live output at: http://54.193.163.58:8877/f3c52b21c520c1b/output.txt |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @calixteman received. Current queue size: 0 Live output at: http://54.241.84.105:8877/7f6b759b6589674/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/7f6b759b6589674/output.txt Total script time: 26.49 mins
Image differences available at: http://54.241.84.105:8877/7f6b759b6589674/reftest-analyzer.html#web=eq.log |
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/f3c52b21c520c1b/output.txt Total script time: 28.20 mins
Image differences available at: http://54.193.163.58:8877/f3c52b21c520c1b/reftest-analyzer.html#web=eq.log |
/botio browsertest |
From: Bot.io (Linux m4)ReceivedCommand cmd_browsertest from @calixteman received. Current queue size: 0 Live output at: http://54.241.84.105:8877/be8f3d8718e9947/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_browsertest from @calixteman received. Current queue size: 0 Live output at: http://54.193.163.58:8877/f22b02142c0c61f/output.txt |
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/f22b02142c0c61f/output.txt Total script time: 21.43 mins
Image differences available at: http://54.193.163.58:8877/f22b02142c0c61f/reftest-analyzer.html#web=eq.log |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/be8f3d8718e9947/output.txt Total script time: 21.69 mins
Image differences available at: http://54.241.84.105:8877/be8f3d8718e9947/reftest-analyzer.html#web=eq.log |
/botio browsertest |
From: Bot.io (Windows)ReceivedCommand cmd_browsertest from @calixteman received. Current queue size: 0 Live output at: http://54.193.163.58:8877/3cc5cf63d9cc42f/output.txt |
From: Bot.io (Linux m4)ReceivedCommand cmd_browsertest from @calixteman received. Current queue size: 0 Live output at: http://54.241.84.105:8877/5a980ea0a16d405/output.txt |
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/3cc5cf63d9cc42f/output.txt Total script time: 21.83 mins
Image differences available at: http://54.193.163.58:8877/3cc5cf63d9cc42f/reftest-analyzer.html#web=eq.log |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/5a980ea0a16d405/output.txt Total script time: 21.86 mins
Image differences available at: http://54.241.84.105:8877/5a980ea0a16d405/reftest-analyzer.html#web=eq.log |
src/core/fonts.js
Outdated
// The charcodes are moved into the private area in to fix some rendering | ||
// issues (https://github.com/mozilla/pdf.js/pull/9340) but when printing | ||
// into pdf the generated will contains wrong chars. We can avoid that in | ||
// adding the unicode to the cmap and the print backend will then map the | ||
// glyph ids to the correct unicode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some grammar-nits:
// The charcodes are moved into the private area in to fix some rendering | |
// issues (https://github.com/mozilla/pdf.js/pull/9340) but when printing | |
// into pdf the generated will contains wrong chars. We can avoid that in | |
// adding the unicode to the cmap and the print backend will then map the | |
// glyph ids to the correct unicode. | |
// The charcodes are moved into a private use area in to fix some rendering | |
// issues (https://github.com/mozilla/pdf.js/pull/9340) but when printing | |
// to PDF the generated font will contain wrong chars. We can avoid that by | |
// adding the unicode to the cmap and the print backend will then map the | |
// glyph ids to the correct unicode. |
Interesting, it's almost what I just pushed, modulo the grammar nits I'll fix. |
/botio test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @calixteman received. Current queue size: 0 Live output at: http://54.193.163.58:8877/8a3e7a42cc45788/output.txt |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @calixteman received. Current queue size: 0 Live output at: http://54.241.84.105:8877/7409e99961e9f13/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/7409e99961e9f13/output.txt Total script time: 26.16 mins
Image differences available at: http://54.241.84.105:8877/7409e99961e9f13/reftest-analyzer.html#web=eq.log |
/botio-windows lint |
From: Bot.io (Windows)ReceivedCommand cmd_lint from @calixteman received. Current queue size: 0 Live output at: http://54.193.163.58:8877/7c468caf756449c/output.txt |
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/7c468caf756449c/output.txt Total script time: 1.33 mins
|
/botio-windows lint |
From: Bot.io (Windows)ReceivedCommand cmd_lint from @calixteman received. Current queue size: 0 Live output at: http://54.193.163.58:8877/e90d7319a77005a/output.txt |
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/e90d7319a77005a/output.txt Total script time: 1.45 mins
|
@Snuffleupagus do you have an idea on how to fix that: |
My only idea would be to try what's suggested in the log, i.e. using |
I think it's because of
With node |
In GitHub Actions we use the LTS version of Node.js so perhaps the bots should do the same thing to avoid problems? |
/botio-windows lint |
From: Bot.io (Windows)ReceivedCommand cmd_lint from @calixteman received. Current queue size: 0 Live output at: http://54.193.163.58:8877/de693e1f0d0f395/output.txt |
From: Bot.io (Windows)SuccessFull output at http://54.193.163.58:8877/de693e1f0d0f395/output.txt Total script time: 8.18 mins
|
Strange, I tried the LTS wednesday and it was failing but it's now working. |
/botio-linux lint |
From: Bot.io (Linux m4)ReceivedCommand cmd_lint from @calixteman received. Current queue size: 0 Live output at: http://54.241.84.105:8877/9ff394c76fa6bb5/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/9ff394c76fa6bb5/output.txt Total script time: 1.51 mins
|
/botio-linux lint |
From: Bot.io (Linux m4)ReceivedCommand cmd_lint from @calixteman received. Current queue size: 0 Live output at: http://54.241.84.105:8877/0d58372b0219b3e/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.241.84.105:8877/0d58372b0219b3e/output.txt Total script time: 1.98 mins
|
/botio-linux makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @calixteman received. Current queue size: 0 Live output at: http://54.241.84.105:8877/d3f68202b607d2b/output.txt |
Both bots are now using node 16.6 (LTS) and to bypass the versions mismatch around
|
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/d3f68202b607d2b/output.txt Total script time: 20.28 mins
|
/botio-linux makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @calixteman received. Current queue size: 0 Live output at: http://54.241.84.105:8877/51a5f041e8ec398/output.txt |
/botio-linux makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @calixteman received. Current queue size: 0 Live output at: http://54.241.84.105:8877/27c2fa27631cfaf/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.241.84.105:8877/27c2fa27631cfaf/output.txt Total script time: 21.70 mins
|
It aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1778484.