-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix various corrupt PDF files (issue 9252, issue 9418) #9827
Fix various corrupt PDF files (issue 9252, issue 9418) #9827
Conversation
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.215.176.217:8877/84df6faf277b5dc/output.txt |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/3bb7e8daedb2ced/output.txt |
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/84df6faf277b5dc/output.txt Total script time: 23.44 mins
|
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/3bb7e8daedb2ced/output.txt Total script time: 38.19 mins
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a unit test for the new parser logic to test/unit/parser_spec.js
, preferably with a reference to the motivation behind the parser logic?
src/core/parser.js
Outdated
if (ch < 0x30 || ch > 0x39) { // '0' - '9' | ||
if (divideBy === 10 && Number.isNaN(sign) && isSpace(ch)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sign
is always -1
, +1
or NaN
. So use isNaN
instead of Number.isNaN
.
(or maybe even sign = 0
instead of sign = NaN
and/or !sign
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So use isNaN instead of Number.isNaN.
Note that there's actually (a brief) discussion in PR #8643 regarding replacing isNaN
with Number.isNaN
throughout the code-base.
(or maybe even sign = 0 instead of sign = NaN and/or !sign ?)
However, I've updated the patch to initialize sign
with zero instead so the isNaN
comment should now be moot :-)
How about adding a unit test for the new parser logic to test/unit/parser_spec.js, preferably with a reference to the motivation behind the parser logic?
OK, a new unit-test has been added (and an existing one extended as well). Furthermore a comment was added in the code/commit message.
…getNumber` (PR 8359 follow-up) With the current code line-breaks are accepted not just after an operator, but after a decimal point as well. When looking at this again, the latter case seems prone to cause false positives and might also interfere with subsequent patches. Hence this is code is adjusted to actually do what the original commit message says, and nothing more.
This is consistent with the behaviour in Adobe Reader.
…rators in `XRef.indexObjects` (PR 9288 follow-up)
… to recover when possible Note that the `Catalog` constructor, and some of its methods, are already enforcing that the 'Root' dictionary is valid/well-formed. However, by doing additional validation already in `XRef.parse` there's a slightly larger chance that corrupt PDF files could be successfully parsed/rendered.
…indexObjects` (issue 9418) This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway. Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.
49db104
to
56e3648
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=me with passing tests
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.215.176.217:8877/974fb3b6b78f67d/output.txt |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/881b91a41b783d1/output.txt |
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/974fb3b6b78f67d/output.txt Total script time: 23.49 mins
|
From: Bot.io (Linux m4)FailedFull output at http://54.67.70.0:8877/881b91a41b783d1/output.txt Total script time: 37.83 mins
Image differences available at: http://54.67.70.0:8877/881b91a41b783d1/reftest-analyzer.html#web=eq.log |
Thanks for the review! The Linux-only test "failures" look like the usual fallout from a browser upgrade on the bots (Firefox 61 -> 62), so I don't think it's a problem here. |
/botio makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/a176087c3efc545/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.215.176.217:8877/7781883ad52dc6e/output.txt |
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/7781883ad52dc6e/output.txt Total script time: 21.41 mins
|
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/a176087c3efc545/output.txt Total script time: 35.51 mins
|
Thank you for fixing this, and @Rob--W for the review! |
many thanks for this fix/review/merge! |
…ixes Fix various corrupt PDF files (issue 9252, issue 9418)
While waiting for a reviewer to have time to look at PR #9729, I happened to glance at the
4-corrupted-pdf
category; hence this PR which fixes a few simple issues.Please refer to the individual commit messages for additional details.
Fixes #9252.
Fixes #9418.