CCITTFaxStream problem when EndOfBlock is false #8901
Since you created a nice reduced test-case, can you please attach it to the PR as well? :-)
src/core/stream.js
@@ -1858,6 +1859,10 @@ var CCITTFaxStream = (function CCITTFaxStreamClosure() {
    var refPos, blackPixels, bits, i;

    if (this.outputBits === 0) {
      if (this.rowsDone) {
        this.eof = true;
        return null;
Nit: Given the condition on the very next line, do we really need return null; here as well?
1b4d4fd to cd02f09
/botio-linux preview
From: Bot.io (Linux m4): Received. Command cmd_preview from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.67.70.0:8877/9397b9ba22dd6cc/output.txt
From: Bot.io (Linux m4): Success. Total script time: 2.27 mins. Published. Full output at: http://54.67.70.0:8877/9397b9ba22dd6cc/output.txt
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.67.70.0:8877/e033eed1c4c098e/output.txt
From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.215.176.217:8877/d60ae3640f93f0c/output.txt
From: Bot.io (Linux m4): Success. Total script time: 16.69 mins. Full output at: http://54.67.70.0:8877/e033eed1c4c098e/output.txt
From: Bot.io (Windows): Success. Total script time: 29.34 mins. Full output at: http://54.215.176.217:8877/d60ae3640f93f0c/output.txt
Thanks for the very clear description of the problem/solution, which made it much easier to understand the changes!
Since you created the test file yourself, could you please add it directly to the PR instead (so that we can avoid an unnecessary linked test-case)?
With that change, and the commits squashed, r=me.
Thank you for the patch!
cd02f09 to 3d99b8d
/botio-linux preview
From: Bot.io (Linux m4): Received. Command cmd_preview from @timvandermeij received. Current queue size: 0. Live output at: http://54.67.70.0:8877/39f44da16339032/output.txt
From: Bot.io (Linux m4): Success. Total script time: 2.40 mins. Published. Full output at: http://54.67.70.0:8877/39f44da16339032/output.txt
/botio test
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.67.70.0:8877/06af67aeafa3d19/output.txt
From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.215.176.217:8877/f5ce200541e8b4b/output.txt
From: Bot.io (Linux m4): Success. Total script time: 16.70 mins. Full output at: http://54.67.70.0:8877/06af67aeafa3d19/output.txt
From: Bot.io (Windows): Success. Total script time: 30.47 mins. Full output at: http://54.215.176.217:8877/f5ce200541e8b4b/output.txt
/botio makeref
From: Bot.io (Linux m4): Received. Command cmd_makeref from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.67.70.0:8877/0e746d3111cf377/output.txt
From: Bot.io (Windows): Received. Command cmd_makeref from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.215.176.217:8877/bd239da57bc6cc8/output.txt
From: Bot.io (Linux m4): Success. Total script time: 15.78 mins. Full output at: http://54.67.70.0:8877/0e746d3111cf377/output.txt
From: Bot.io (Windows): Success. Total script time: 28.02 mins. Full output at: http://54.215.176.217:8877/bd239da57bc6cc8/output.txt
I have been working on the JBIG2 decoder to add support for Huffman coding. When a JBIG2 region uses the Huffman option, its bitmaps (1-bit images) are compressed with MMR coding. MMR is exactly the same algorithm as the PDF filter CCITTFaxDecode with /K -1, so I would like to reuse the existing decoder, CCITTFaxStream, that PDF.js already has.

MMR bitmaps in JBIG2 typically have no end-of-facsimile-block code (EOFB), so I must set the parameter EndOfBlock to false when using CCITTFaxStream. This causes a problem: the last pixel row of the decoded image ends prematurely.

In the JBIG2 decoder, bitmaps have no alpha channel, so the incomplete row appears as a nasty black line. Here is a screenshot of a Huffman-coded JBIG2 image whose text symbols were decoded using CCITTFaxStream:

[screenshot]

I created a PDF test file and found that the same problem also affects CCITTFaxDecode images in PDFs that have /EndOfBlock false.

ccitt_EndOfBlock_false.pdf
In each pair of images, the top one has /EndOfBlock false, so its last row is incomplete.

The parameter /EndOfBlock false is very rare in real-world PDFs (the default value is true). The files that I have found show no visible problems in PDF.js: the missing pixels are impossible to see because the images have a high resolution and their last row is white against a white page background.

In CCITTFaxStream, the property eoblock corresponds to the PDF parameter EndOfBlock. When its value is false, decoding must end once all rows have been decoded (no EOFB is expected), as specified in the PDF standard. The function lookChar() in CCITTFaxStream does exactly that and sets this.eof to true when the last row has been decoded. However, by then it has returned only a few bits from the start of the last pixel row: lookChar() is called only once for the last row, because setting this.eof = true stops the whole decoding process in readBlock(). The remaining decoded pixels of the last row stay in the array codingLine, but they are never returned because lookChar() is not called anymore.

In this PR, I add a new property, rowsDone. When EndOfBlock is false and the CCITT data of the last row has been decoded, rowsDone is set to true and eof remains false. eof is not set to true until all the buffered contents of codingLine have also been consumed, which happens when this.outputBits becomes zero.
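To make the failure mode concrete, here is a toy JavaScript model of the interaction described above. It is not PDF.js code: the class ToyFaxStream, its pending array (standing in for the not-yet-returned part of codingLine), the fixed flag, and the readAll() loop (standing in for readBlock()) are hypothetical simplifications, and every stream is assumed to have EndOfBlock false. Each "row" is just an array of already-decoded bytes that lookChar() hands out one at a time.

```javascript
// Toy model only, NOT the PDF.js implementation. Every stream here is assumed
// to have EndOfBlock false, so decoding stops after the last row.
class ToyFaxStream {
  constructor(rows, { fixed }) {
    this.rows = rows;      // each row: array of already-decoded output bytes
    this.fixed = fixed;    // true = rowsDone approach, false = original behavior
    this.rowIndex = 0;
    this.pending = [];     // bytes of the current row not yet returned
    this.rowsDone = false;
    this.eof = false;
  }

  // Returns the next output byte; "decodes" a new row only when the pending
  // buffer is empty (loosely analogous to the outputBits === 0 check).
  lookChar() {
    if (this.pending.length === 0) {
      if (this.rowsDone || this.rowIndex >= this.rows.length) {
        this.eof = true;
        return null;
      }
      this.pending = this.rows[this.rowIndex++].slice();
      if (this.rowIndex === this.rows.length) {
        if (this.fixed) {
          // Remember that all rows are decoded, but keep eof false until the
          // pending bytes have been handed out.
          this.rowsDone = true;
        } else {
          // Original behavior: eof stops the caller while bytes remain pending.
          this.eof = true;
        }
      }
    }
    return this.pending.shift();
  }
}

// Caller loop in the spirit of readBlock(): pull bytes until eof is set.
function readAll(stream) {
  const out = [];
  while (!stream.eof) {
    const c = stream.lookChar();
    if (c === null) {
      break;
    }
    out.push(c);
  }
  return out;
}

const rows = [[0xff, 0x00], [0x0f, 0xf0], [0xaa, 0x55]];
console.log(JSON.stringify(readAll(new ToyFaxStream(rows, { fixed: false }))));
// prints [255,0,15,240,170]  (last row truncated)
console.log(JSON.stringify(readAll(new ToyFaxStream(rows, { fixed: true }))));
// prints [255,0,15,240,170,85]
```

In the original variant the last row keeps only its first byte, because eof is raised in the same call that decodes that row. In the rowsDone variant, eof is deferred to the first call that finds the pending buffer empty, so every decoded byte reaches the caller.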