Attempt to find truncated `endstream` commands, in the fallback code-path, in `Parser.makeStream` (issue 10004) #10010
…am` into a helper method

With this code now living in a separate method, it can be simplified slightly (e.g. by using early returns).
…th, in `Parser.makeStream` (issue 10004)

Apparently there are some PDF generators, in this case the culprit is "Nooog Pdf Library / Nooog PStoPDF v1.5", that manage to mess up PDF creation enough that endstream[1] commands actually become truncated.

*Please note:* The solution implemented here isn't perfect, since it won't be able to cope with PDF files that contain a *mixture* of correct and truncated endstream commands. However, considering that this particular mode of corruption *fortunately* doesn't seem very common[2], a slightly less complex solution ought to suffice for now.

Fixes 10004.

---
[1] Scanning through the PDF data to find endstream commands becomes necessary, in order to determine the stream length in cases where the `Length` entry of the (stream) dictionary is missing/incorrect.
[2] I cannot recall having seen any (previous) issues/bugs with "Missing endstream" errors.
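To illustrate footnote [1], here is a minimal sketch of how a stream length could be recovered by scanning the raw bytes for the `endstream` signature. The names `findSignatureOffset` and `toBytes` are hypothetical and chosen for illustration; this is not the actual `_findStreamLength` code from the patch, which may work differently (e.g. by reading the data in chunks).

```javascript
// Hypothetical helper (an assumption, not the pdf.js implementation):
// returns the number of bytes between `startPos` and the first occurrence
// of `signature`, or -1 if the signature is not found.
function findSignatureOffset(bytes, startPos, signature) {
  const sigLen = signature.length;
  for (let i = startPos, ii = bytes.length - sigLen; i <= ii; i++) {
    let j = 0;
    while (j < sigLen && bytes[i + j] === signature[j]) {
      j++;
    }
    if (j === sigLen) {
      return i - startPos; // Bytes preceding the signature.
    }
  }
  return -1;
}

// Hypothetical helper: converts an ASCII string to a byte array.
const toBytes = (str) => Uint8Array.from(str, (ch) => ch.charCodeAt(0));

const ENDSTREAM = toBytes("endstream");
const data = toBytes("stream\nHello\nendstream\nendobj");
const startPos = "stream\n".length; // First byte of the stream data.

// "Hello\n" is 6 bytes long, so this logs 6.
console.log(findSignatureOffset(data, startPos, ENDSTREAM));
```

Note that the trailing end-of-line before `endstream` is counted here; a real parser would have to decide how to treat it.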
const ENDSTREAM_SIGNATURE = new Uint8Array([
  0x65, 0x6E, 0x64, 0x73, 0x74, 0x72, 0x65, 0x61, 0x6D]);
let actualLength = this._findStreamLength(startPos,
                                          ENDSTREAM_SIGNATURE);
It looks like one space should be removed to make this align.
Strange, it looks correctly aligned to my eye; both here in the diff and in the file (lines 532 to 533 in 95e5bad):
let actualLength = this._findStreamLength(startPos,
                                          ENDSTREAM_SIGNATURE);
That's really weird. Here above, `ENDSTREAM_SIGNATURE` is one space ahead of `startPos`, but that is not the case when I open the raw file. It seems this is just GitHub messing up the diff rendering; sorry for the noise!
const TRUNCATED_SIGNATURE = ENDSTREAM_SIGNATURE.slice(0, end);

let maybeLength = this._findStreamLength(startPos,
                                         TRUNCATED_SIGNATURE);
It looks like one space should be removed to make this align.
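For readers following the diff fragments above, the truncated-signature fallback can be sketched roughly as follows. This is a self-contained approximation under stated assumptions: `findOffset`, `findStreamLengthWithFallback`, `toBytes`, and the `maxTruncation` parameter are hypothetical names, and the real patch may apply additional checks that are not visible in the excerpts quoted here.

```javascript
// Hypothetical helper: offset of `signature` relative to `startPos`, or -1.
function findOffset(bytes, startPos, signature) {
  outer: for (let i = startPos; i + signature.length <= bytes.length; i++) {
    for (let j = 0; j < signature.length; j++) {
      if (bytes[i + j] !== signature[j]) {
        continue outer;
      }
    }
    return i - startPos;
  }
  return -1;
}

// Assumption: `maxTruncation` caps how many trailing bytes of "endstream"
// may be missing; it is not taken from the patch.
function findStreamLengthWithFallback(bytes, startPos, maxTruncation = 1) {
  const ENDSTREAM_SIGNATURE = new Uint8Array([
    0x65, 0x6E, 0x64, 0x73, 0x74, 0x72, 0x65, 0x61, 0x6D]); // "endstream"

  // First, look for the complete command.
  const actualLength = findOffset(bytes, startPos, ENDSTREAM_SIGNATURE);
  if (actualLength >= 0) {
    return actualLength;
  }
  // Otherwise, retry with progressively shorter prefixes of the signature.
  for (let i = 1; i <= maxTruncation; i++) {
    const end = ENDSTREAM_SIGNATURE.length - i;
    const TRUNCATED_SIGNATURE = ENDSTREAM_SIGNATURE.slice(0, end);
    const maybeLength = findOffset(bytes, startPos, TRUNCATED_SIGNATURE);
    if (maybeLength >= 0) {
      return maybeLength;
    }
  }
  return -1; // Neither a complete nor a truncated command was found.
}

// Hypothetical helper: converts an ASCII string to a byte array.
const toBytes = (str) => Uint8Array.from(str, (ch) => ch.charCodeAt(0));

// The command is truncated to "endstrea"; "ABC\n" is 4 bytes, so this logs 4.
const truncated = toBytes("stream\nABC\nendstrea\nendobj");
console.log(findStreamLengthWithFallback(truncated, "stream\n".length));
```

Because the complete signature is tried first, a file that mixes truncated and correct commands could match a later, correct `endstream` and overshoot the length, which is the limitation acknowledged in the commit message.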
/botio-windows test

From: Bot.io (Windows)
Received
Command cmd_test from @timvandermeij received. Current queue size: 0
Live output at: http://54.215.176.217:8877/5aeff69ead3084c/output.txt

From: Bot.io (Windows)
Success
Full output at http://54.215.176.217:8877/5aeff69ead3084c/output.txt
Total script time: 24.74 mins

/botio-linux test

From: Bot.io (Linux m4)
Received
Command cmd_test from @timvandermeij received. Current queue size: 0
Live output at: http://54.67.70.0:8877/b7935e82fdb96d5/output.txt

From: Bot.io (Linux m4)
Failed
Full output at http://54.67.70.0:8877/b7935e82fdb96d5/output.txt
Total script time: 19.97 mins
Image differences available at: http://54.67.70.0:8877/b7935e82fdb96d5/reftest-analyzer.html#web=eq.log

/botio-linux preview

From: Bot.io (Linux m4)
Received
Command cmd_preview from @timvandermeij received. Current queue size: 0
Live output at: http://54.67.70.0:8877/7c5cbae4846a7ec/output.txt

From: Bot.io (Linux m4)
Success
Full output at http://54.67.70.0:8877/7c5cbae4846a7ec/output.txt
Total script time: 2.83 mins
Published

/botio makeref

From: Bot.io (Linux m4)
Received
Command cmd_makeref from @timvandermeij received. Current queue size: 0
Live output at: http://54.67.70.0:8877/7ac1bd5c90e0591/output.txt

From: Bot.io (Windows)
Received
Command cmd_makeref from @timvandermeij received. Current queue size: 0
Live output at: http://54.215.176.217:8877/3db68d0c15af31c/output.txt

From: Bot.io (Linux m4)
Success
Full output at http://54.67.70.0:8877/7ac1bd5c90e0591/output.txt
Total script time: 19.40 mins

From: Bot.io (Windows)
Success
Full output at http://54.215.176.217:8877/3db68d0c15af31c/output.txt
Total script time: 21.95 mins

Thank you for fixing this!