Fix various corrupt PDF files (issue 9252, issue 9418) #9827

Snuffleupagus · 2018-06-19T11:50:06Z

While waiting for a reviewer to have time to look at PR #9729, I happened to glance at the 4-corrupted-pdf category; hence this PR, which fixes a few simple issues.

Please refer to the individual commit messages for additional details.

Fixes #9252.
Fixes #9418.

pdfjsbot · 2018-06-19T11:54:44Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/84df6faf277b5dc/output.txt

pdfjsbot · 2018-06-19T11:54:44Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/3bb7e8daedb2ced/output.txt

pdfjsbot · 2018-06-19T12:18:11Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/84df6faf277b5dc/output.txt

Total script time: 23.44 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2018-06-19T12:32:56Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/3bb7e8daedb2ced/output.txt

Total script time: 38.19 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

Rob--W

How about adding a unit test for the new parser logic to test/unit/parser_spec.js, preferably with a reference to the motivation behind the parser logic?

Rob--W · 2018-06-20T09:12:53Z

src/core/parser.js

      if (ch < 0x30 || ch > 0x39) { // '0' - '9'
+        if (divideBy === 10 && Number.isNaN(sign) && isSpace(ch)) {


The sign is always -1, +1 or NaN. So use isNaN instead of Number.isNaN.

(or maybe even sign = 0 instead of sign = NaN and/or !sign ?)
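A small JavaScript illustration of the review point (not code from the patch). It shows that for values which are always numbers, such as the lexer's `sign`, the global `isNaN` and `Number.isNaN` give the same answer and only differ when `isNaN` coerces a non-number. It also shows why starting `sign` at 0 allows a plain `!sign` check. The variable names are illustrative.

```javascript
// The lexer's `sign` is assumed to only ever hold -1, +1, NaN (or 0 after the
// patch was updated). For values that are already numbers, the global isNaN()
// and Number.isNaN() agree; they only differ when isNaN() coerces a
// non-number argument first.
const possibleSigns = [-1, 1, NaN, 0];
const agreeOnNumbers = possibleSigns.every((s) => isNaN(s) === Number.isNaN(s));

// Coercion is where the two functions differ: isNaN("abc") is true,
// Number.isNaN("abc") is false.
const coercedDiffers = isNaN("abc") !== Number.isNaN("abc");

// With `sign` initialized to 0 instead of NaN, "no sign seen yet" becomes a
// plain falsy check, since -1 and +1 are both truthy.
const noSignSeen = (sign) => !sign;

console.log(agreeOnNumbers, coercedDiffers, noSignSeen(0), noSignSeen(-1));
// prints: true true true false
```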

Snuffleupagus · Jun 20, 2018

> So use isNaN instead of Number.isNaN.

Note that there's actually (a brief) discussion in PR #8643 regarding replacing isNaN with Number.isNaN throughout the codebase.

> (or maybe even sign = 0 instead of sign = NaN and/or !sign ?)

However, I've updated the patch to initialize sign with zero instead, so the isNaN comment should now be moot :-)

> How about adding a unit test for the new parser logic to test/unit/parser_spec.js, preferably with a reference to the motivation behind the parser logic?

OK, a new unit test has been added (and an existing one extended as well). Furthermore, a comment was added in the code/commit message.

…getNumber` (PR 8359 follow-up)

With the current code, line-breaks are accepted not just after an operator, but after a decimal point as well. When looking at this again, the latter case seems prone to cause false positives and might also interfere with subsequent patches. Hence this code is adjusted to actually do what the original commit message says, and nothing more.

This is consistent with the behaviour in Adobe Reader.
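A minimal sketch of the two number-lexing rules described in the `Lexer.getNumber` commits above (the PR 8359 follow-up, and "Let Lexer.getNumber treat a single decimal point as zero (issue 9252)"). It assumes a plain JavaScript string as input. `getNumber`, `isPdfSpace` and the other helpers are hypothetical and are not the pdf.js `Lexer` API, which reads character codes from a stream. Accepting end-of-input after a lone `.` is an additional assumption of this sketch.

```javascript
// Hypothetical, simplified number lexer over a JavaScript string; this is NOT
// the pdf.js Lexer (which reads character codes from a stream). It mirrors:
//  1. line-breaks are skipped only directly after a leading '+'/'-' operator,
//     not after a decimal point (PR 8359 follow-up);
//  2. a lone '.' (no sign, no digits) followed by whitespace (or, in this
//     sketch, end-of-input) is read as 0 (issue 9252).
const isDigit = (c) => c >= "0" && c <= "9";
const isLineBreak = (c) => c === "\n" || c === "\r";
// PDF white-space characters: NUL, HT, LF, FF, CR and SP.
const isPdfSpace = (c) =>
  c === "\0" || c === "\t" || c === "\f" || c === " " || isLineBreak(c);

function getNumber(input) {
  let pos = 0;
  let ch = input[pos];
  let sign = 0; // 0 means "no explicit sign seen yet".

  if (ch === "-") {
    sign = -1;
    ch = input[++pos];
  } else if (ch === "+") {
    sign = 1;
    ch = input[++pos];
  }
  if (sign !== 0) {
    // Tolerate line-breaks only right after an operator.
    while (isLineBreak(ch)) {
      ch = input[++pos];
    }
  }
  let sawDecimalPoint = false;
  if (ch === ".") {
    sawDecimalPoint = true;
    ch = input[++pos];
  }
  if (!isDigit(ch)) {
    if (sawDecimalPoint && sign === 0 && (ch === undefined || isPdfSpace(ch))) {
      return 0; // A single decimal point is treated as zero.
    }
    throw new Error(`Invalid number at offset ${pos}`);
  }
  // Collect the remaining digits, allowing at most one decimal point.
  let text = sawDecimalPoint ? "0." : "";
  while (isDigit(ch) || (ch === "." && !sawDecimalPoint)) {
    if (ch === ".") {
      sawDecimalPoint = true;
    }
    text += ch;
    ch = input[++pos];
  }
  return (sign || 1) * Number(text);
}

console.log(getNumber("."), getNumber(".\n3"), getNumber("-\n42"), getNumber("-.5"));
// prints: 0 0 -42 -0.5
```

In this sketch `".\n3"` yields 0 rather than 0.3, because a line-break following the decimal point is no longer skipped, while `"-\n42"` still parses as -42 since the line-break follows an operator.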

…rators in `XRef.indexObjects` (PR 9288 follow-up)

…ef.parse`

… to recover when possible

Note that the `Catalog` constructor, and some of its methods, already enforce that the 'Root' dictionary is valid/well-formed. However, by doing additional validation already in `XRef.parse`, there's a slightly larger chance that corrupt PDF files could be successfully parsed/rendered.
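The idea of validating 'Root' early in `XRef.parse`, so that a bad trailer triggers recovery instead of a later failure, can be sketched as follows. This assumes a simplified model where PDF dictionaries are JavaScript `Map`s and a hypothetical `reindex` callback stands in for a full-file scan such as `XRef.indexObjects`. `RecoverableParseError`, `parseRoot` and `loadRoot` are illustrative names, not the pdf.js API, and the "has a /Pages dictionary" check is an assumed minimal validation.

```javascript
// Hypothetical model: PDF dictionaries are plain JS Maps. This is NOT the
// pdf.js Dict/XRef API, only an illustration of the control flow.
class RecoverableParseError extends Error {}

// Assumed minimal well-formedness check: /Root must be a dictionary that
// itself contains a /Pages dictionary.
function isWellFormedRoot(root) {
  return root instanceof Map && root.get("Pages") instanceof Map;
}

function parseRoot(trailer, recoveryMode = false) {
  const root = trailer.get("Root");
  if (isWellFormedRoot(root)) {
    return root;
  }
  if (!recoveryMode) {
    // Ask the caller to retry with a trailer rebuilt from a full-file scan.
    throw new RecoverableParseError("Invalid /Root dictionary in trailer.");
  }
  throw new Error("Invalid /Root dictionary, even after recovery.");
}

// `reindex` stands in for a full scan (cf. XRef.indexObjects) and is expected
// to return a replacement trailer dictionary.
function loadRoot(trailer, reindex) {
  try {
    return parseRoot(trailer);
  } catch (ex) {
    if (!(ex instanceof RecoverableParseError)) {
      throw ex;
    }
    return parseRoot(reindex(), /* recoveryMode = */ true);
  }
}
```

Rejecting the trailer in `parseRoot` is what gives the recovery path a chance to run; deferring the check to the catalog would only surface the problem after recovery is no longer attempted.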

…indexObjects` (issue 9418)

This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway. Since `XRef.indexObjects` already parses the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.
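The trailer selection described in the issue 9418 commit can be pictured as a filter over candidate dictionaries found during the full-file scan. This sketch assumes candidates are JavaScript `Map`s collected in file order. `hasUsableRoot` and `pickTrailer`, including the preference for a trailer that has an /ID entry and the /Pages /Count check, are illustrative assumptions rather than the exact pdf.js logic.

```javascript
// Hypothetical: candidate trailer dictionaries found by a full-file scan,
// modelled as Maps. Not pdf.js's actual data structures.
function hasUsableRoot(dict) {
  const root = dict.get("Root");
  if (!(root instanceof Map)) {
    return false;
  }
  const pages = root.get("Pages");
  return pages instanceof Map && Number.isInteger(pages.get("Count"));
}

function pickTrailer(candidates) {
  let fallback = null;
  for (const dict of candidates) {
    // Skip candidates that later Root/Catalog validation would reject anyway.
    if (!hasUsableRoot(dict)) {
      continue;
    }
    // Prefer the first valid trailer that also carries an /ID entry.
    if (dict.has("ID")) {
      return dict;
    }
    // Otherwise remember the most recent valid candidate as a fallback.
    fallback = dict;
  }
  return fallback;
}
```

Since the scan already visits every object, the extra look-ups in `hasUsableRoot` are cheap compared to picking a trailer that fails immediately afterwards.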

Rob--W

r=me with passing tests

pdfjsbot · 2018-06-20T15:10:12Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/974fb3b6b78f67d/output.txt

pdfjsbot · 2018-06-20T15:10:13Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/881b91a41b783d1/output.txt

pdfjsbot · 2018-06-20T15:33:43Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/974fb3b6b78f67d/output.txt

Total script time: 23.49 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2018-06-20T15:48:04Z

From: Bot.io (Linux m4)

Failed

Full output at http://54.67.70.0:8877/881b91a41b783d1/output.txt

Total script time: 37.83 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/881b91a41b783d1/reftest-analyzer.html#web=eq.log

Snuffleupagus · 2018-06-20T15:58:59Z

> r=me with passing tests

Thanks for the review!

The Linux-only test "failures" look like the usual fallout from a browser upgrade on the bots (Firefox 61 -> 62), so I don't think it's a problem here.

Snuffleupagus · 2018-06-21T10:49:48Z

/botio makeref

pdfjsbot · 2018-06-21T10:49:49Z

From: Bot.io (Linux m4)

Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/a176087c3efc545/output.txt

pdfjsbot · 2018-06-21T10:49:49Z

From: Bot.io (Windows)

Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/7781883ad52dc6e/output.txt

pdfjsbot · 2018-06-21T11:11:14Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/7781883ad52dc6e/output.txt

Total script time: 21.41 mins

Lint: Passed
Make references: Passed
Check references: Passed

pdfjsbot · 2018-06-21T11:25:20Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/a176087c3efc545/output.txt

Total script time: 35.51 mins

Lint: Passed
Make references: Passed
Check references: Passed

timvandermeij · 2018-06-21T20:35:32Z

Thank you for fixing this, and @Rob--W for the review!

kdleijer · 2018-06-21T21:15:13Z

many thanks for this fix/review/merge!

…ixes Fix various corrupt PDF files (issue 9252, issue 9418)

timvandermeij added the corrupted-pdf label Jun 19, 2018

Rob--W reviewed Jun 20, 2018

Snuffleupagus added 6 commits June 20, 2018 13:41

Let Lexer.getNumber treat a single decimal point as zero (issue 9252)

6bbcafc

This is consistent with the behaviour in Adobe Reader.

Use the correct startPos when repeating the search for 'endobj' operators in `XRef.indexObjects` (PR 9288 follow-up)

30ad62a

Prevent hard errors if fetching the Encrypt dictionary fails in `XRef.parse`

e84813e

Snuffleupagus force-pushed the misc-corrupt-pdf-fixes branch from 49db104 to 56e3648 June 20, 2018 12:16

Rob--W approved these changes Jun 20, 2018

timvandermeij merged commit 98ea39f into mozilla:master Jun 21, 2018

Snuffleupagus deleted the misc-corrupt-pdf-fixes branch June 21, 2018 20:38

movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018

Merge pull request mozilla#9827 from Snuffleupagus/misc-corrupt-pdf-fixes

4dbaa1a

Fix various corrupt PDF files (issue 9252, issue 9418)

Snuffleupagus mentioned this pull request Dec 3, 2021

[api-minor] Clear all caches in XRef.indexObjects, and improve /Root dictionary validation in XRef.parse (issue 14303) #14338

Merged

Snuffleupagus mentioned this pull request Feb 7, 2022

Let Lexer.getNumber treat a single minus sign as zero (bug 1753983) #14543

Merged
