Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix various corrupt PDF files (issue 9252, issue 9418) #9827

Merged
merged 6 commits into from
Jun 21, 2018

Conversation

Snuffleupagus
Copy link
Collaborator

While waiting for a reviewer to have time to look at PR #9729, I happened to glance at the 4-corrupted-pdf category; hence this PR which fixes a few simple issues.

Please refer to the individual commit messages for additional details.

Fixes #9252.
Fixes #9418.

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/84df6faf277b5dc/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/3bb7e8daedb2ced/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/84df6faf277b5dc/output.txt

Total script time: 23.44 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/3bb7e8daedb2ced/output.txt

Total script time: 38.19 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

Copy link
Member

@Rob--W Rob--W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a unit test for the new parser logic to test/unit/parser_spec.js, preferably with a reference to the motivation behind the parser logic?

if (ch < 0x30 || ch > 0x39) { // '0' - '9'
if (divideBy === 10 && Number.isNaN(sign) && isSpace(ch)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sign is always -1, +1 or NaN. So use isNaN instead of Number.isNaN.

(or maybe even sign = 0 instead of sign = NaN and/or !sign ?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So use isNaN instead of Number.isNaN.

Note that there's actually (a brief) discussion in PR #8643 regarding replacing isNaN with Number.isNaN throughout the code-base.

(or maybe even sign = 0 instead of sign = NaN and/or !sign ?)

However, I've updated the patch to initialize sign with zero instead so the isNaN comment should now be moot :-)


How about adding a unit test for the new parser logic to test/unit/parser_spec.js, preferably with a reference to the motivation behind the parser logic?

OK, a new unit-test has been added (and an existing one extended as well). Furthermore a comment was added in the code/commit message.

…getNumber` (PR 8359 follow-up)

With the current code line-breaks are accepted not just after an operator, but after a decimal point as well. When looking at this again, the latter case seems prone to cause false positives and might also interfere with subsequent patches.

Hence this is code is adjusted to actually do what the original commit message says, and nothing more.
This is consistent with the behaviour in Adobe Reader.
…rators in `XRef.indexObjects` (PR 9288 follow-up)
… to recover when possible

Note that the `Catalog` constructor, and some of its methods, are already enforcing that the 'Root' dictionary is valid/well-formed. However, by doing additional validation already in `XRef.parse` there's a slightly larger chance that corrupt PDF files could be successfully parsed/rendered.
…indexObjects` (issue 9418)

This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway.
Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.
Copy link
Member

@Rob--W Rob--W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me with passing tests

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/974fb3b6b78f67d/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/881b91a41b783d1/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/974fb3b6b78f67d/output.txt

Total script time: 23.49 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/881b91a41b783d1/output.txt

Total script time: 37.83 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/881b91a41b783d1/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Jun 20, 2018

r=me with passing tests

Thanks for the review!

The Linux-only test "failures" look like the usual fallout from a browser upgrade on the bots (Firefox 61 -> 62), so I don't think it's a problem here.

@Snuffleupagus
Copy link
Collaborator Author

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/a176087c3efc545/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/7781883ad52dc6e/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/7781883ad52dc6e/output.txt

Total script time: 21.41 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/a176087c3efc545/output.txt

Total script time: 35.51 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@timvandermeij timvandermeij merged commit 98ea39f into mozilla:master Jun 21, 2018
@timvandermeij
Copy link
Contributor

Thank you for fixing this, and @Rob--W for the review!

@Snuffleupagus Snuffleupagus deleted the misc-corrupt-pdf-fixes branch June 21, 2018 20:38
@kdleijer
Copy link

many thanks for this fix/review/merge!

movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
…ixes

Fix various corrupt PDF files (issue 9252, issue 9418)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bad XRef entry PDF not rendering at all: getOperatorList - ignoring errors during task: RenderPageRequest
5 participants