Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303) #11231

Snuffleupagus · 2019-10-11T18:55:37Z

TODO:

- Manually check open/closed issues tagged with 4-corrupted-pdf to confirm that there are no obvious regressions. I've done a quick pass already, but additional help would be very much appreciated here.
- Convert (relevant) pre-existing load tests to eq tests instead, to help avoid unnecessary regressions. Note that some files will render, but not as intended when compared with e.g. Adobe Reader, depending on which objects are chosen (see e.g. issues with incorrect total number of pages).
- Add as many new eq tests as possible, based on the list above.

This patch is making me somewhat worried about future regressions, since it's certainly easy to imagine this completely breaking certain kinds of corrupt/edited PDF documents while fixing others.[1]

Obviously it passes all existing reference tests (and even improves one), however compared to many other patches there's no telling how much it could break. Edit: Obviously any breakage would be limited to corrupt PDF documents, since valid ones do not enter this method at all. Edit2: Also, this patch should mostly affect the "multiple documents in one PDF file" case, but not necessarily PDF documents with just a broken XRef table.
The only reason that I'm even submitting this patch is the number of open issues that it would address.

Generally speaking though, the best course of action would probably be to rewrite `XRef.indexObjects` to be much more robust (since it currently feels somewhat hand-wavy in parts), e.g. by actually checking/validating more of the objects before committing to them.
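
For readers who want the rule from the PR title in concrete form, here is a minimal sketch of it. It is not the actual `XRef.indexObjects` code: the function name `indexObjectsSketch`, the regular expression, and the `Map`-based entry table are assumptions made purely for illustration. It scans raw text for `<num> <gen> obj` headers and lets a later occurrence replace an existing entry only when the generation number matches.

```javascript
// Hypothetical sketch, not the pdf.js source: rebuild an object index by
// scanning raw text for "<num> <gen> obj" headers.
function indexObjectsSketch(text) {
  const entries = new Map(); // object number -> { offset, gen }
  const objHeader = /\b(\d+)\s+(\d+)\s+obj\b/g;
  let match;
  while ((match = objHeader.exec(text)) !== null) {
    const num = Number(match[1]);
    const gen = Number(match[2]);
    const existing = entries.get(num);
    // Add a new entry, or over-write an existing one only when the
    // generation number matches; otherwise keep the earlier entry.
    if (!existing || existing.gen === gen) {
      entries.set(num, { offset: match.index, gen });
    }
  }
  return entries;
}

// Object 1 appears twice with different generations (the later one is
// ignored); object 2 appears twice with the same generation (the later
// one wins).
const sample =
  "1 0 obj\n<<>>\nendobj\n" +
  "1 1 obj\n<<>>\nendobj\n" +
  "2 0 obj\n<<>>\nendobj\n" +
  "2 0 obj\n<<>>\nendobj\n";
const index = indexObjectsSketch(sample);
console.log(index.get(1), index.get(2));
```

In this sketch the first occurrence of an object number wins unless a later occurrence carries the same generation number, which is the behaviour the title describes. A real implementation would also have to deal with streams, trailers and embedded binary data, none of which is handled here.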

Fixes #11230
Fixes #11139
Fixes #9552
Fixes #9129
Fixes #7303

Also partially improves #6243, however parts of it regress as well. (A couple of pages now render "better", but one page no longer renders and an image looks worse.)
Generally though, it seems that none of the PDF viewers I've tried agree completely on how the document should look, and this might be a case where the document is just too damaged to "fix".

[1] Especially given that it's reverting part of PR #5910, however in the case of issue #5909 it seems that other (more recent) changes have actually made that PR redundant.

pdfjsbot · 2019-10-12T12:34:19Z

From: Bot.io (Linux m4)

Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/e76013aabbe15e1/output.txt

pdfjsbot · 2019-10-12T12:36:03Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/e76013aabbe15e1/output.txt

Total script time: 1.71 mins

Published

Viewer: http://54.67.70.0:8877/e76013aabbe15e1/web/viewer.html

Snuffleupagus · 2019-10-12T16:12:22Z

Temporarily dumping some PDF files here, to have a (slightly) better place to point linked test-cases to.
issue5909_original.pdf
issue9552.pdf
issue9129.pdf

pdfjsbot · 2019-10-13T11:07:16Z

From: Bot.io (Linux m4)

Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/e61ebbad97b67e1/output.txt

pdfjsbot · 2019-10-13T11:08:58Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/e61ebbad97b67e1/output.txt

Total script time: 1.69 mins

Published

Viewer: http://54.67.70.0:8877/e61ebbad97b67e1/web/viewer.html

Snuffleupagus · 2019-10-13T11:22:00Z

/botio test

pdfjsbot · 2019-10-13T11:22:01Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/66a1741ad676488/output.txt

pdfjsbot · 2019-10-13T11:22:02Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/5012ec3eb0199db/output.txt

pdfjsbot · 2019-10-13T11:39:57Z

From: Bot.io (Linux m4)

Failed

Full output at http://54.67.70.0:8877/5012ec3eb0199db/output.txt

Total script time: 17.91 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/5012ec3eb0199db/reftest-analyzer.html#web=eq.log

pdfjsbot · 2019-10-13T11:48:21Z

From: Bot.io (Windows)

Failed

Full output at http://54.215.176.217:8877/66a1741ad676488/output.txt

Total script time: 26.31 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/66a1741ad676488/reftest-analyzer.html#web=eq.log

timvandermeij · 2019-10-14T21:24:20Z

/botio-linux preview

pdfjsbot · 2019-10-14T21:24:22Z

From: Bot.io (Linux m4)

Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/cb83521b5c0723f/output.txt

pdfjsbot · 2019-10-14T21:26:05Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/cb83521b5c0723f/output.txt

Total script time: 1.70 mins

Published

Viewer: http://54.67.70.0:8877/cb83521b5c0723f/web/viewer.html

timvandermeij · 2019-10-17T21:57:43Z

I think this is a very nice improvement. Even though you're absolutely right about other breakage that may still occur, given that those PDF files are usually quite broken, we have a lot more test coverage now than we had before, so future changes will be less risky. Thanks!

timvandermeij · 2019-10-17T21:57:50Z

/botio makeref

pdfjsbot · 2019-10-17T21:57:51Z

From: Bot.io (Linux m4)

Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/875b84455911221/output.txt

pdfjsbot · 2019-10-17T21:57:51Z

From: Bot.io (Windows)

Received

Command cmd_makeref from @timvandermeij received. Current queue size: 2

Live output at: http://54.215.176.217:8877/9707c85a90f21da/output.txt

pdfjsbot · 2019-10-17T22:14:37Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/875b84455911221/output.txt

Total script time: 16.74 mins

Lint: Passed
Make references: Passed
Check references: Passed

pdfjsbot · 2019-10-17T22:24:42Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/9707c85a90f21da/output.txt

Total script time: 23.93 mins

Lint: Passed
Make references: Passed
Check references: Passed

timvandermeij added the corrupted-pdf label Oct 11, 2019

Snuffleupagus mentioned this pull request Oct 12, 2019

Convert a number of reference tests, for documents with corrupt XRef tables, from load to eq #11234

Merged

Snuffleupagus force-pushed the indexObjects-entries-gen branch 2 times, most recently from 041da85 to 998b6b3 Compare October 13, 2019 10:50

Snuffleupagus force-pushed the indexObjects-entries-gen branch from 998b6b3 to d939df7 Compare October 13, 2019 11:18

Snuffleupagus force-pushed the indexObjects-entries-gen branch 2 times, most recently from 794c7e9 to ff4593e Compare October 13, 2019 14:33

Snuffleupagus marked this pull request as ready for review October 13, 2019 14:40

Snuffleupagus force-pushed the indexObjects-entries-gen branch from ff4593e to d7c6a7f Compare October 13, 2019 14:53

Snuffleupagus force-pushed the indexObjects-entries-gen branch from d7c6a7f to ffc847e Compare October 14, 2019 20:10

timvandermeij approved these changes Oct 17, 2019

timvandermeij merged commit c54bb22 into mozilla:master Oct 17, 2019

Snuffleupagus deleted the indexObjects-entries-gen branch October 17, 2019 22:32

Snuffleupagus mentioned this pull request Nov 1, 2019

PDF renders incorrectly with error #6243

Closed
