Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650) #9897

Snuffleupagus · 2018-07-15T21:28:30Z

Add more validation of the /Filter entry, in image dictionaries, to the PDFImage constructor

Given that the code is currently assuming that the /Filter entry is a Name, it probably cannot hurt to actually ensure that's the case.
Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650)

According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45
When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries):
- Width and Height shall match the corresponding width and height values in the JPEG2000 data.
- . . .
Hence it seems reasonable to use the Width/Height of the image data itself, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the PDFImage constructor.

Fixes #9650.

pdfjsbot · 2018-07-15T21:42:43Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/e287c0a69f87f6c/output.txt

pdfjsbot · 2018-07-15T22:08:49Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/e287c0a69f87f6c/output.txt

Total script time: 26.09 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

Snuffleupagus · 2018-07-16T06:50:43Z

/botio-linux test

pdfjsbot · 2018-07-16T06:50:44Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/965c8105206df47/output.txt

pdfjsbot · 2018-07-16T07:50:59Z

From: Bot.io (Linux m4)

Failed

Full output at http://54.67.70.0:8877/965c8105206df47/output.txt

Total script time: 60.00 mins

pdfjsbot · 2018-07-17T08:11:02Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/b842f5bd94768eb/output.txt

pdfjsbot · 2018-07-17T08:37:11Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/b842f5bd94768eb/output.txt

Total script time: 26.14 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

Snuffleupagus · 2018-07-21T09:01:14Z

/botio-windows preview

pdfjsbot · 2018-07-21T09:01:15Z

From: Bot.io (Windows)

Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 1

Live output at: http://54.215.176.217:8877/592f5ceee5dc810/output.txt

pdfjsbot · 2018-07-21T09:09:51Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/592f5ceee5dc810/output.txt

Total script time: 4.66 mins

Published

Viewer: http://54.215.176.217:8877/592f5ceee5dc810/web/viewer.html

Snuffleupagus · 2018-07-28T08:27:52Z

/botio test

pdfjsbot · 2018-07-28T08:27:52Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/4203dae186386a8/output.txt

pdfjsbot · 2018-07-28T08:27:52Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/588ccbee67a5ee1/output.txt

pdfjsbot · 2018-07-28T08:47:35Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/4203dae186386a8/output.txt

Total script time: 19.71 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2018-07-28T08:54:22Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/588ccbee67a5ee1/output.txt

Total script time: 26.48 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

Snuffleupagus · 2018-07-30T09:15:45Z

@Rob--W Is this something that you'd be, time permitting of course, willing/able to review?

Rob--W

Functionality-wise, looks good. Can you remove the refactor-commit, for the reason below?

PS. Reviewing the separate commits was a bit odd since it you had rebased the commits and ended up with the same CommitDate for every commit, and Github shows the commits in the wrong order because of that: https://help.github.com/articles/why-are-my-commits-in-the-wrong-order/
In the future, if you are able to, try to not to have distinct times, e.g. by adding exec sleep 2 in between the commits (if you use an interactive rebase).

Rob--W · 2018-08-01T14:23:01Z

src/core/image.js

-        image.stream.reset();
-        image.bitsPerComponent = jpxImage.bitsPerComponent;
-        image.numComps = jpxImage.componentsCount;
+        const jpxProperties = JpxImage.parseImageProperties(image.stream);


It seems that JpxImage is always used as the temporary object to store output (your jpxProperties here).

pdf.js/src/core/jpx_stream.js

Lines 53 to 54 in ae07adf

let jpxImage = new JpxImage();

jpxImage.parse(this.bytes);

pdf.js/src/core/image.js

Lines 89 to 90 in 620da6f

var jpxImage = new JpxImage();

jpxImage.parseImageProperties(image.stream);

Since parseImageProperties does not depend on pre-existing JpxImage instance methods or members, it seems to make sense to turn it into a static method. (in the past this was different: 894d9fe).

However, I'd still like to see this commit removed, for consistency with our other image decoding implementations (jpg.js, jbig2.js) also use the pattern of a instantiating a temporary class for parsing images.

Fair point regarding the consistency, the re-factoring commit has been removed.

…he `PDFImage` constructor Given that the code is currently assuming that the /Filter entry is a `Name`, it cannot hurt to actually ensure that's the case. Also fixes an error message, for JPEG 2000 images with unsupported ColorSpaces, since `this.numComps` hasn't been initialized when it's accessed during the `throw new Error()` invocation.

…ionary, for JPEG 2000 images (issue 9650) According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45 > When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries): > > - Width and Height shall match the corresponding width and height values in the JPEG2000 data. > > - . . . Hence it seems reasonable to use the Width/Height of the image data *itself*, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the `PDFImage` constructor.

Snuffleupagus · 2018-08-01T16:49:51Z

/botio test

pdfjsbot · 2018-08-01T16:49:52Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/361a9727c5a7c04/output.txt

pdfjsbot · 2018-08-01T16:49:52Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/01c10a9d7265f91/output.txt

pdfjsbot · 2018-08-01T17:09:28Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/361a9727c5a7c04/output.txt

Total script time: 19.58 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2018-08-01T17:17:28Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/01c10a9d7265f91/output.txt

Total script time: 27.58 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

Rob--W · 2018-08-01T22:03:19Z

Thanks for this PR!

timvandermeij · 2018-08-01T22:12:19Z

I'm making reference images in another PR already, so there is no need to do that here anymore.

Snuffleupagus force-pushed the issue-9650 branch 3 times, most recently from e06d63e to d4ca103 Compare July 16, 2018 10:36

timvandermeij added core image-jpeg labels Jul 16, 2018

Snuffleupagus force-pushed the issue-9650 branch 2 times, most recently from 379f1c8 to 0def938 Compare July 27, 2018 10:46

Rob--W reviewed Aug 1, 2018

View reviewed changes

Snuffleupagus added 2 commits August 1, 2018 16:41

Snuffleupagus force-pushed the issue-9650 branch from 0def938 to 3ce4201 Compare August 1, 2018 16:44

Rob--W approved these changes Aug 1, 2018

View reviewed changes

Rob--W merged commit 20fddef into mozilla:master Aug 1, 2018

Snuffleupagus deleted the issue-9650 branch August 2, 2018 00:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650) #9897

Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650) #9897

Snuffleupagus commented Jul 15, 2018

pdfjsbot commented Jul 15, 2018

pdfjsbot commented Jul 15, 2018

Snuffleupagus commented Jul 16, 2018

pdfjsbot commented Jul 16, 2018

pdfjsbot commented Jul 16, 2018

pdfjsbot commented Jul 17, 2018

pdfjsbot commented Jul 17, 2018

Snuffleupagus commented Jul 21, 2018

pdfjsbot commented Jul 21, 2018

pdfjsbot commented Jul 21, 2018

Snuffleupagus commented Jul 28, 2018

pdfjsbot commented Jul 28, 2018

pdfjsbot commented Jul 28, 2018

pdfjsbot commented Jul 28, 2018

pdfjsbot commented Jul 28, 2018

Snuffleupagus commented Jul 30, 2018

Rob--W left a comment

Rob--W Aug 1, 2018

Snuffleupagus Aug 1, 2018

Snuffleupagus commented Aug 1, 2018

pdfjsbot commented Aug 1, 2018

pdfjsbot commented Aug 1, 2018

pdfjsbot commented Aug 1, 2018

pdfjsbot commented Aug 1, 2018

Rob--W commented Aug 1, 2018

timvandermeij commented Aug 1, 2018

	var jpxImage = new JpxImage();
	jpxImage.parseImageProperties(image.stream);

Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650) #9897

Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650) #9897

Conversation

Snuffleupagus commented Jul 15, 2018

pdfjsbot commented Jul 15, 2018

From: Bot.io (Windows)

Received

pdfjsbot commented Jul 15, 2018

From: Bot.io (Windows)

Success

Snuffleupagus commented Jul 16, 2018

pdfjsbot commented Jul 16, 2018

From: Bot.io (Linux m4)

Received

pdfjsbot commented Jul 16, 2018

From: Bot.io (Linux m4)

Failed

pdfjsbot commented Jul 17, 2018

From: Bot.io (Windows)

Received

pdfjsbot commented Jul 17, 2018

From: Bot.io (Windows)

Success

Snuffleupagus commented Jul 21, 2018

pdfjsbot commented Jul 21, 2018

From: Bot.io (Windows)

Received

pdfjsbot commented Jul 21, 2018

From: Bot.io (Windows)

Success

Published

Snuffleupagus commented Jul 28, 2018

pdfjsbot commented Jul 28, 2018

From: Bot.io (Linux m4)

Received

pdfjsbot commented Jul 28, 2018

From: Bot.io (Windows)

Received

pdfjsbot commented Jul 28, 2018

From: Bot.io (Linux m4)

Success

pdfjsbot commented Jul 28, 2018

From: Bot.io (Windows)

Success

Snuffleupagus commented Jul 30, 2018

Rob--W left a comment

Choose a reason for hiding this comment

Rob--W Aug 1, 2018

Choose a reason for hiding this comment

Snuffleupagus Aug 1, 2018

Choose a reason for hiding this comment

Snuffleupagus commented Aug 1, 2018

pdfjsbot commented Aug 1, 2018

From: Bot.io (Linux m4)

Received

pdfjsbot commented Aug 1, 2018

From: Bot.io (Windows)

Received

pdfjsbot commented Aug 1, 2018

From: Bot.io (Linux m4)

Success

pdfjsbot commented Aug 1, 2018

From: Bot.io (Windows)

Success

Rob--W commented Aug 1, 2018

timvandermeij commented Aug 1, 2018