Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CCITTFaxStream problem when EndOfBlock is false #8901

Merged
merged 1 commit into from
Sep 19, 2017

Conversation

janpe2
Copy link
Contributor

@janpe2 janpe2 commented Sep 12, 2017

I have been working on the JBIG2 decoder to add support for the Huffman coding. When a JBIG2 region uses the Huffman option, its bitmaps (1-bit images) are compressed using the MMR coding. MMR is exactly the same algorithm as the PDF filter CCITTFaxDecode with /K -1. Thus I would like to use the existing decoder CCITTFaxStream that PDF.js already has.

MMR bitmaps in JBIG2 typically have no end-of-data codes (EOFB), so I must set the parameter EndOfBlock to false when using CCITTFaxStream. This seems to cause a problem: the last pixel row of decoded images ends prematurely.

In the JBIG2 decoder, bitmaps have no alpha channel, so the incomplete row appears as a nasty black line. Here is a screenshot of a Huffman coded JBIG2 image whose text symbols were decoded using CCITTFaxStream:
jbig2_mmr_black_stripes

I created a PDF test file and I found out that the same problem affects also CCITTFaxDecode images in PDFs when they have /EndOfBlock false.
ccitt_EndOfBlock_false.pdf
In each pair of images the top one has /EndOfBlock false, so the last row is incomplete.

The parameter /EndOfBlock false is very rare in real-world PDFs (default value is true). The files that I have found have no problems in PDF.js. It is impossible to see any missing pixels because the images have a high resolution and their last row is white against a white page background.

In CCITTFaxStream the property eoblock corresponds to the PDF parameter EndOfBlock. When its value is false, the decoding must end when all rows have been decoded (and EOFB is not expected), as specified in the PDF standard. The function lookChar() in CCITTFaxStream does exactly that and sets this.eof to true when the last row has been decoded. However, the function returns only a few bits from the start of the last pixel row. lookChar() is called only once in the last row because setting this.eof = true stops the whole decoding process in readBlock(). The rest of the decoded pixels of the last row remain in the array codingLine but they are never returned because lookChar() is not called anymore.

In this PR, I add a new property rowsDone. When EndOfBlock is false and the CCITT data of the last row has been decoded, rowsDone is set to true and eof remains false. The value eof is not set to true until all the buffered contents of codingLine have been consumed too. That happens when this.outputBits becomes zero.

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you created a nice reduced test-case, can you please attach it to the PR as well? :-)

@@ -1858,6 +1859,10 @@ var CCITTFaxStream = (function CCITTFaxStreamClosure() {
var refPos, blackPixels, bits, i;

if (this.outputBits === 0) {
if (this.rowsDone) {
this.eof = true;
return null;
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Sep 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Given the condition on the very next line, do we really need return null; here as well?

@Snuffleupagus
Copy link
Collaborator

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/9397b9ba22dd6cc/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/9397b9ba22dd6cc/output.txt

Total script time: 2.27 mins

Published

@mozilla mozilla deleted a comment from pdfjsbot Sep 12, 2017
@mozilla mozilla deleted a comment from pdfjsbot Sep 12, 2017
@mozilla mozilla deleted a comment from pdfjsbot Sep 12, 2017
@mozilla mozilla deleted a comment from pdfjsbot Sep 12, 2017
@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/e033eed1c4c098e/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/d60ae3640f93f0c/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/e033eed1c4c098e/output.txt

Total script time: 16.69 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/d60ae3640f93f0c/output.txt

Total script time: 29.34 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the very clear description of the problem/solution, which made it much easier to understand the changes!

Since you created the test file yourself, could you please add it directly to the PR instead (so that we can avoid an unnecessary linked test-case)?

With that change, and the commits squashed, r=me.
Thank you for the patch!

@Snuffleupagus Snuffleupagus removed the request for review from yurydelendik September 19, 2017 15:15
@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/39f44da16339032/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/39f44da16339032/output.txt

Total script time: 2.40 mins

Published

@Snuffleupagus
Copy link
Collaborator

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/06af67aeafa3d19/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/f5ce200541e8b4b/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/06af67aeafa3d19/output.txt

Total script time: 16.70 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/f5ce200541e8b4b/output.txt

Total script time: 30.47 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@Snuffleupagus
Copy link
Collaborator

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/0e746d3111cf377/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/bd239da57bc6cc8/output.txt

@Snuffleupagus Snuffleupagus merged commit fbd6e47 into mozilla:master Sep 19, 2017
@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/0e746d3111cf377/output.txt

Total script time: 15.78 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/bd239da57bc6cc8/output.txt

Total script time: 28.02 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@janpe2 janpe2 deleted the ccitt-eofb-false branch December 19, 2017 19:36
movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
CCITTFaxStream problem when EndOfBlock is false
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants