Attempt to find truncated endstream commands, in the fallback code-path, in Parser.makeStream (issue 10004) #10010

Merged: 2 commits, Sep 1, 2018


Snuffleupagus
Collaborator

Apparently there are some PDF generators (in this case the culprit is "Nooog Pdf Library / Nooog PStoPDF v1.5") that manage to mess up PDF creation enough that endstream[1] commands actually become truncated.

Please note: The solution implemented here isn't perfect, since it won't be able to cope with PDF files that contain a mixture of correct and truncated endstream commands.
However, considering that this particular mode of corruption fortunately doesn't seem very common[2], a slightly less complex solution ought to suffice for now.

Fixes #10004.


[1] Scanning through the PDF data to find endstream commands becomes necessary, in order to determine the stream length in cases where the Length entry of the (stream) dictionary is missing/incorrect.

[2] I cannot recall having seen any (previous) issues/bugs with "Missing endstream" errors.
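
To illustrate footnote [1], here is a minimal sketch of recovering a stream length by scanning for the endstream byte signature when the Length entry cannot be trusted. The helper name `findSignatureOffset` and the sample data are assumptions made for illustration; this is not the `_findStreamLength` implementation from the patch.

```javascript
// The bytes of the ASCII string "endstream".
const ENDSTREAM = new Uint8Array([
  0x65, 0x6E, 0x64, 0x73, 0x74, 0x72, 0x65, 0x61, 0x6D]);

// Hypothetical helper: returns the offset of the first occurrence of
// `signature` in `bytes` at or after `start`, or -1 if it is absent.
function findSignatureOffset(bytes, signature, start = 0) {
  const last = bytes.length - signature.length;
  for (let i = start; i <= last; i++) {
    let j = 0;
    while (j < signature.length && bytes[i + j] === signature[j]) {
      j++;
    }
    if (j === signature.length) {
      return i; // Early return on the first full match.
    }
  }
  return -1;
}

// Made-up sample: a stream whose dictionary Length is assumed to be unusable.
const sample = "stream\nABCDEF\nendstream\nendobj";
const data = Uint8Array.from(sample, (ch) => ch.charCodeAt(0));

const dataStart = "stream\n".length; // First byte after the stream keyword.
const endPos = findSignatureOffset(data, ENDSTREAM, dataStart);
// A negative result would mean "Missing endstream"; otherwise the distance
// from the start of the data to the signature gives the stream length
// (here it still includes the end-of-line byte before endstream).
const streamLength = endPos < 0 ? -1 : endPos - dataStart;
console.log(endPos, streamLength);
```

A linear scan like this is the simplest option; a real parser would typically read the data in chunks rather than hold the whole file in one array.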

…am` into a helper method

With this code now living in a separate method, it can be simplified slightly (e.g. by using early returns).
…th, in `Parser.makeStream` (issue 10004)

const ENDSTREAM_SIGNATURE = new Uint8Array([
  0x65, 0x6E, 0x64, 0x73, 0x74, 0x72, 0x65, 0x61, 0x6D]);
let actualLength = this._findStreamLength(startPos,
                                          ENDSTREAM_SIGNATURE);
Contributor

It looks like one space should be removed to make this align.

Collaborator Author

Strange, it looks correctly aligned to my eye; both here in the diff and in the file

pdf.js/src/core/parser.js

Lines 532 to 533 in 95e5bad

let actualLength = this._findStreamLength(startPos,
                                          ENDSTREAM_SIGNATURE);

Contributor

That's really weird. In the diff above, ENDSTREAM_SIGNATURE appears one space ahead of startPos, but that is not the case when I open the raw file. It seems this is just GitHub messing up the diff rendering; sorry for the noise!

const TRUNCATED_SIGNATURE = ENDSTREAM_SIGNATURE.slice(0, end);

let maybeLength = this._findStreamLength(startPos,
                                         TRUNCATED_SIGNATURE);
Contributor

It looks like one space should be removed to make this align.
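
For context, the fallback that the `TRUNCATED_SIGNATURE` snippet belongs to could be sketched roughly as follows: search for the full endstream signature first and, only if that fails, retry with the signature shortened from the end. Everything except `ENDSTREAM_SIGNATURE` and the `slice(0, end)` call is an assumption for illustration, including the names `indexOfBytes`, `findStreamEnd`, and `isPdfWhitespace`, the `maxTruncation` limit, and the check that whitespace follows a truncated match. The merged code may differ.

```javascript
const ENDSTREAM_SIGNATURE = new Uint8Array([
  0x65, 0x6E, 0x64, 0x73, 0x74, 0x72, 0x65, 0x61, 0x6D]);

// Hypothetical helper: offset of `signature` in `bytes` from `start`, or -1.
function indexOfBytes(bytes, signature, start) {
  const last = bytes.length - signature.length;
  for (let i = start; i <= last; i++) {
    let j = 0;
    while (j < signature.length && bytes[i + j] === signature[j]) {
      j++;
    }
    if (j === signature.length) {
      return i;
    }
  }
  return -1;
}

// PDF whitespace bytes: NUL, TAB, LF, FF, CR and SPACE.
function isPdfWhitespace(byte) {
  return byte === 0x00 || byte === 0x09 || byte === 0x0A ||
         byte === 0x0C || byte === 0x0D || byte === 0x20;
}

// Assumed shape of the fallback: returns { offset, truncated } or null.
function findStreamEnd(bytes, start, maxTruncation = 1) {
  const fullOffset = indexOfBytes(bytes, ENDSTREAM_SIGNATURE, start);
  if (fullOffset >= 0) {
    return { offset: fullOffset, truncated: false };
  }
  for (let i = 1; i <= maxTruncation; i++) {
    const end = ENDSTREAM_SIGNATURE.length - i;
    const truncatedSignature = ENDSTREAM_SIGNATURE.slice(0, end);
    const offset = indexOfBytes(bytes, truncatedSignature, start);
    if (offset < 0) {
      continue;
    }
    // Assumption: only accept a truncated match when it is followed by
    // whitespace (or the end of the data), to limit false positives.
    const next = bytes[offset + end];
    if (next === undefined || isPdfWhitespace(next)) {
      return { offset, truncated: true };
    }
    return null;
  }
  return null;
}

const toBytes = (s) => Uint8Array.from(s, (ch) => ch.charCodeAt(0));

// Made-up data in which the command was written as "endstrea".
const truncatedOnly = findStreamEnd(
  toBytes("stream\nABC\nendstrea\nendobj"), 7);
// Made-up data mixing a truncated command with a later, correct one.
const mixed = findStreamEnd(
  toBytes("stream\nAB\nendstrea\nendobj\nstream\nCD\nendstream"), 7);
console.log(truncatedOnly, mixed);
```

The `mixed` case shows the limitation acknowledged in the PR description: because the full signature is tried first, it matches the later, correct endstream, so the first stream would be treated as far longer than it really is.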

@timvandermeij
Contributor

/botio-windows test

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/5aeff69ead3084c/output.txt

@pdfjsbot

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/5aeff69ead3084c/output.txt

Total script time: 24.74 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@timvandermeij
Contributor

/botio-linux test

@pdfjsbot

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/b7935e82fdb96d5/output.txt

@pdfjsbot

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/b7935e82fdb96d5/output.txt

Total script time: 19.97 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/b7935e82fdb96d5/reftest-analyzer.html#web=eq.log

@timvandermeij
Contributor

/botio-linux preview

@pdfjsbot

pdfjsbot commented Sep 1, 2018

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/7c5cbae4846a7ec/output.txt

@pdfjsbot

pdfjsbot commented Sep 1, 2018

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/7c5cbae4846a7ec/output.txt

Total script time: 2.83 mins

Published

@timvandermeij
Contributor

/botio makeref

@pdfjsbot

pdfjsbot commented Sep 1, 2018

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/7ac1bd5c90e0591/output.txt

@pdfjsbot

pdfjsbot commented Sep 1, 2018

From: Bot.io (Windows)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/3db68d0c15af31c/output.txt

@pdfjsbot

pdfjsbot commented Sep 1, 2018

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/7ac1bd5c90e0591/output.txt

Total script time: 19.40 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot

pdfjsbot commented Sep 1, 2018

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/3db68d0c15af31c/output.txt

Total script time: 21.95 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@timvandermeij timvandermeij merged commit 66bd088 into mozilla:master Sep 1, 2018
@timvandermeij
Contributor

Thank you for fixing this!

@Snuffleupagus Snuffleupagus deleted the issue-10004 branch September 1, 2018 16:53
@ajaythakor ajaythakor mentioned this pull request Sep 4, 2018