Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650) #9897

Merged
merged 2 commits into from
Aug 1, 2018

Conversation

Snuffleupagus
Copy link
Collaborator

  • Add more validation of the /Filter entry, in image dictionaries, to the PDFImage constructor

    Given that the code is currently assuming that the /Filter entry is a Name, it probably cannot hurt to actually ensure that's the case.

  • Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650)

    According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45

    When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries):

    • Width and Height shall match the corresponding width and height values in the JPEG2000 data.

    • . . .

    Hence it seems reasonable to use the Width/Height of the image data itself, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the PDFImage constructor.

Fixes #9650.

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/e287c0a69f87f6c/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/e287c0a69f87f6c/output.txt

Total script time: 26.09 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@Snuffleupagus
Copy link
Collaborator Author

/botio-linux test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/965c8105206df47/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/965c8105206df47/output.txt

Total script time: 60.00 mins

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/b842f5bd94768eb/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/b842f5bd94768eb/output.txt

Total script time: 26.14 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@Snuffleupagus
Copy link
Collaborator Author

/botio-windows preview

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 1

Live output at: http://54.215.176.217:8877/592f5ceee5dc810/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/592f5ceee5dc810/output.txt

Total script time: 4.66 mins

Published

@Snuffleupagus Snuffleupagus force-pushed the issue-9650 branch 2 times, most recently from 379f1c8 to 0def938 Compare July 27, 2018 10:46
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/4203dae186386a8/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/588ccbee67a5ee1/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/4203dae186386a8/output.txt

Total script time: 19.71 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/588ccbee67a5ee1/output.txt

Total script time: 26.48 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@Snuffleupagus
Copy link
Collaborator Author

@Rob--W Is this something that you'd be, time permitting of course, willing/able to review?

Copy link
Member

@Rob--W Rob--W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionality-wise, looks good. Can you remove the refactor-commit, for the reason below?

PS. Reviewing the separate commits was a bit odd since it you had rebased the commits and ended up with the same CommitDate for every commit, and Github shows the commits in the wrong order because of that: https://help.github.com/articles/why-are-my-commits-in-the-wrong-order/
In the future, if you are able to, try to not to have distinct times, e.g. by adding exec sleep 2 in between the commits (if you use an interactive rebase).

image.stream.reset();
image.bitsPerComponent = jpxImage.bitsPerComponent;
image.numComps = jpxImage.componentsCount;
const jpxProperties = JpxImage.parseImageProperties(image.stream);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that JpxImage is always used as the temporary object to store output (your jpxProperties here).

let jpxImage = new JpxImage();
jpxImage.parse(this.bytes);

pdf.js/src/core/image.js

Lines 89 to 90 in 620da6f

var jpxImage = new JpxImage();
jpxImage.parseImageProperties(image.stream);

Since parseImageProperties does not depend on pre-existing JpxImage instance methods or members, it seems to make sense to turn it into a static method. (in the past this was different: 894d9fe).

However, I'd still like to see this commit removed, for consistency with our other image decoding implementations (jpg.js, jbig2.js) also use the pattern of a instantiating a temporary class for parsing images.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point regarding the consistency, the re-factoring commit has been removed.

…he `PDFImage` constructor

Given that the code is currently assuming that the /Filter entry is a `Name`, it cannot hurt to actually ensure that's the case.

Also fixes an error message, for JPEG 2000 images with unsupported ColorSpaces, since `this.numComps` hasn't been initialized when it's accessed during the `throw new Error()` invocation.
…ionary, for JPEG 2000 images (issue 9650)

According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45
> When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries):
>
>  - Width and Height shall match the corresponding width and height values in the JPEG2000 data.
>
>  - . . .

Hence it seems reasonable to use the Width/Height of the image data *itself*, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the `PDFImage` constructor.
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Aug 1, 2018

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/361a9727c5a7c04/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Aug 1, 2018

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/01c10a9d7265f91/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Aug 1, 2018

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/361a9727c5a7c04/output.txt

Total script time: 19.58 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Aug 1, 2018

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/01c10a9d7265f91/output.txt

Total script time: 27.58 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@Rob--W
Copy link
Member

Rob--W commented Aug 1, 2018

Thanks for this PR!

@Rob--W Rob--W merged commit 20fddef into mozilla:master Aug 1, 2018
@timvandermeij
Copy link
Contributor

I'm making reference images in another PR already, so there is no need to do that here anymore.

@Snuffleupagus Snuffleupagus deleted the issue-9650 branch August 2, 2018 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants