Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unicode mapping in the font cmap to have correct chars when printing in pdf (bug 1778484) #15157

Merged
merged 1 commit into from
Jul 13, 2022

Conversation

calixteman
Copy link
Contributor

unicode = unicode.codePointAt(0);
}
if (unicode) {
newMap[unicode] = glyphId;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will affect glyph-mapping in a way that is likely not intended, and could very well cause issues; does this actually pass all tests?
These changes thus go a lot further than only the cmap, according to the commit message, and note how we're using newMapping.charCodeToGlyphId as part of the following code that builds a seac-map:

pdf.js/src/core/fonts.js

Lines 3009 to 3078 in 0205597

function getCharCodes(charCodeToGlyphId, glyphId) {
let charCodes = null;
for (const charCode in charCodeToGlyphId) {
if (glyphId === charCodeToGlyphId[charCode]) {
if (!charCodes) {
charCodes = [];
}
charCodes.push(charCode | 0);
}
}
return charCodes;
}
function createCharCode(charCodeToGlyphId, glyphId) {
for (const charCode in charCodeToGlyphId) {
if (glyphId === charCodeToGlyphId[charCode]) {
return charCode | 0;
}
}
newMapping.charCodeToGlyphId[newMapping.nextAvailableFontCharCode] =
glyphId;
return newMapping.nextAvailableFontCharCode++;
}
const seacs = font.seacs;
if (newMapping && SEAC_ANALYSIS_ENABLED && seacs && seacs.length) {
const matrix = properties.fontMatrix || FONT_IDENTITY_MATRIX;
const charset = font.getCharset();
const seacMap = Object.create(null);
for (let glyphId in seacs) {
glyphId |= 0;
const seac = seacs[glyphId];
const baseGlyphName = StandardEncoding[seac[2]];
const accentGlyphName = StandardEncoding[seac[3]];
const baseGlyphId = charset.indexOf(baseGlyphName);
const accentGlyphId = charset.indexOf(accentGlyphName);
if (baseGlyphId < 0 || accentGlyphId < 0) {
continue;
}
const accentOffset = {
x: seac[0] * matrix[0] + seac[1] * matrix[2] + matrix[4],
y: seac[0] * matrix[1] + seac[1] * matrix[3] + matrix[5],
};
const charCodes = getCharCodes(mapping, glyphId);
if (!charCodes) {
// There's no point in mapping it if the char code was never mapped
// to begin with.
continue;
}
for (let i = 0, ii = charCodes.length; i < ii; i++) {
const charCode = charCodes[i];
// Find a fontCharCode that maps to the base and accent glyphs.
// If one doesn't exists, create it.
const charCodeToGlyphId = newMapping.charCodeToGlyphId;
const baseFontCharCode = createCharCode(
charCodeToGlyphId,
baseGlyphId
);
const accentFontCharCode = createCharCode(
charCodeToGlyphId,
accentGlyphId
);
seacMap[charCode] = {
baseFontCharCode,
accentFontCharCode,
accentOffset,
};
}
}

What I'm wondering is thus if you don't want to leave the existing newMap alone here and rather introduce a new "map" (e.g. named completeMap or something better) that contains the "full"-data and return that from this function?
Then you can pass that one in when building e.g. the cmap-table, without taking the risk of affecting any other pre-existing code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't run the tests: I wanted to have your opinion before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question I was asking to myself in writing the patch: could we use this font in text layer ? or could you imagine some painful drawbacks ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use this font in text layer ?

I'd advise against that, since it'll cause (lots of) problems in the general case.

or could you imagine some painful drawbacks ?

There's no guarantee that a font ever contains all of the glyphs that correspond to the data in /ToUnicode. One trivial example is ligatures, since a font doesn't necessarily contain the individual glyphs that make up one.
Another example is OCRed documents, where the glyphs may be completely empty and thus have no usable size.

@calixteman
Copy link
Contributor Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/f3c52b21c520c1b/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/7f6b759b6589674/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/7f6b759b6589674/output.txt

Total script time: 26.49 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 2337
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/7f6b759b6589674/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/f3c52b21c520c1b/output.txt

Total script time: 28.20 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 143

Image differences available at: http://54.193.163.58:8877/f3c52b21c520c1b/reftest-analyzer.html#web=eq.log

@calixteman
Copy link
Contributor Author

/botio browsertest

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_browsertest from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/be8f3d8718e9947/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_browsertest from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/f22b02142c0c61f/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/f22b02142c0c61f/output.txt

Total script time: 21.43 mins

  • Regression tests: FAILED
  different ref/snapshot: 13

Image differences available at: http://54.193.163.58:8877/f22b02142c0c61f/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/be8f3d8718e9947/output.txt

Total script time: 21.69 mins

  • Regression tests: FAILED
  different ref/snapshot: 2286

Image differences available at: http://54.241.84.105:8877/be8f3d8718e9947/reftest-analyzer.html#web=eq.log

@calixteman
Copy link
Contributor Author

/botio browsertest

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_browsertest from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/3cc5cf63d9cc42f/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_browsertest from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/5a980ea0a16d405/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/3cc5cf63d9cc42f/output.txt

Total script time: 21.83 mins

  • Regression tests: FAILED
  different ref/snapshot: 2

Image differences available at: http://54.193.163.58:8877/3cc5cf63d9cc42f/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/5a980ea0a16d405/output.txt

Total script time: 21.86 mins

  • Regression tests: FAILED
  different ref/snapshot: 2284
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/5a980ea0a16d405/reftest-analyzer.html#web=eq.log

src/core/fonts.js Outdated Show resolved Hide resolved
src/core/fonts.js Outdated Show resolved Hide resolved
src/core/fonts.js Outdated Show resolved Hide resolved
src/core/fonts.js Outdated Show resolved Hide resolved
src/core/fonts.js Outdated Show resolved Hide resolved
src/core/fonts.js Outdated Show resolved Hide resolved
src/core/fonts.js Outdated Show resolved Hide resolved
src/core/fonts.js Outdated Show resolved Hide resolved
Comment on lines 467 to 472
// The charcodes are moved into the private area in to fix some rendering
// issues (https://github.com/mozilla/pdf.js/pull/9340) but when printing
// into pdf the generated will contains wrong chars. We can avoid that in
// adding the unicode to the cmap and the print backend will then map the
// glyph ids to the correct unicode.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some grammar-nits:

Suggested change
// The charcodes are moved into the private area in to fix some rendering
// issues (https://github.com/mozilla/pdf.js/pull/9340) but when printing
// into pdf the generated will contains wrong chars. We can avoid that in
// adding the unicode to the cmap and the print backend will then map the
// glyph ids to the correct unicode.
// The charcodes are moved into a private use area in to fix some rendering
// issues (https://github.com/mozilla/pdf.js/pull/9340) but when printing
// to PDF the generated font will contain wrong chars. We can avoid that by
// adding the unicode to the cmap and the print backend will then map the
// glyph ids to the correct unicode.

@calixteman
Copy link
Contributor Author

Interesting, it's almost what I just pushed, modulo the grammar nits I'll fix.
I replaced the Object.create(null) in favor of a Map because I considered that it was useless to convert keys into numbers (thanks to the x | 0 trick).
I was wondering if we should move from Object to Map in this file in order to avoid useless conversion (I suppose that the Object are here because they've been written before Map exists.

@calixteman
Copy link
Contributor Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/8a3e7a42cc45788/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/7409e99961e9f13/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/7409e99961e9f13/output.txt

Total script time: 26.16 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 2283
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/7409e99961e9f13/reftest-analyzer.html#web=eq.log

@calixteman
Copy link
Contributor Author

/botio-windows lint

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_lint from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/7c468caf756449c/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/7c468caf756449c/output.txt

Total script time: 1.33 mins

  • Lint: FAILED

@calixteman
Copy link
Contributor Author

/botio-windows lint

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_lint from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/e90d7319a77005a/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/e90d7319a77005a/output.txt

Total script time: 1.45 mins

  • Lint: FAILED

@calixteman
Copy link
Contributor Author

@Snuffleupagus do you have an idea on how to fix that:
http://54.193.163.58:8877/e90d7319a77005a/output.txt ?

@Snuffleupagus
Copy link
Collaborator

My only idea would be to try what's suggested in the log, i.e. using npm install --force instead.

@calixteman
Copy link
Contributor Author

I think it's because of eslint-plugin-prettier:

With node 17.4.0 the dep mismatch is ok when it's a problem in 18.6.0

@Snuffleupagus
Copy link
Collaborator

With node 17.4.0 the dep mismatch is ok when it's a problem in 18.6.0

In GitHub Actions we use the LTS version of Node.js so perhaps the bots should do the same thing to avoid problems?

@calixteman
Copy link
Contributor Author

/botio-windows lint

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_lint from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/de693e1f0d0f395/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.193.163.58:8877/de693e1f0d0f395/output.txt

Total script time: 8.18 mins

  • Lint: Passed

@calixteman
Copy link
Contributor Author

calixteman commented Jul 15, 2022

Strange, I tried the LTS wednesday and it was failing but it's now working.
I'll update the linux bot too.

@calixteman
Copy link
Contributor Author

/botio-linux lint

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_lint from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/9ff394c76fa6bb5/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/9ff394c76fa6bb5/output.txt

Total script time: 1.51 mins

  • Lint: FAILED

@calixteman
Copy link
Contributor Author

/botio-linux lint

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_lint from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/0d58372b0219b3e/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/0d58372b0219b3e/output.txt

Total script time: 1.98 mins

  • Lint: Passed

@calixteman
Copy link
Contributor Author

/botio-linux makeref

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/d3f68202b607d2b/output.txt

@calixteman
Copy link
Contributor Author

Both bots are now using node 16.6 (LTS) and to bypass the versions mismatch around eslint-plugin-prettier I set this:

npm set legacy-peer-deps true -g

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/d3f68202b607d2b/output.txt

Total script time: 20.28 mins

  • Lint: Passed
  • Make references: FAILED

@calixteman
Copy link
Contributor Author

/botio-linux makeref

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/51a5f041e8ec398/output.txt

@calixteman
Copy link
Contributor Author

/botio-linux makeref

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/27c2fa27631cfaf/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/27c2fa27631cfaf/output.txt

Total script time: 21.70 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants