[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js` #11601

Snuffleupagus · 2020-02-15T16:39:09Z

Please refer to the individual commit messages.

Fixes #6984 (comment)
Fixes #7041

pdfjsbot · 2020-03-08T12:44:29Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/c34edfcd473c2f6/output.txt

pdfjsbot · 2020-03-08T12:44:29Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/4fcfc8db635eafa/output.txt

pdfjsbot · 2020-03-08T13:04:10Z

From: Bot.io (Linux m4)

Failed

Full output at http://54.67.70.0:8877/c34edfcd473c2f6/output.txt

Total script time: 19.67 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/c34edfcd473c2f6/reftest-analyzer.html#web=eq.log

pdfjsbot · 2020-03-08T13:09:34Z

From: Bot.io (Windows)

Failed

Full output at http://54.215.176.217:8877/4fcfc8db635eafa/output.txt

Total script time: 25.07 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/4fcfc8db635eafa/reftest-analyzer.html#web=eq.log

timvandermeij · 2020-05-20T11:11:25Z

The bug is fixed upstream, but I'm not sure if it made it to today's Nightly yet. Let's see...

/botio test

pdfjsbot · 2020-05-20T11:11:26Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/b98a3f10a69802a/output.txt

pdfjsbot · 2020-05-20T11:11:27Z

From: Bot.io (Windows)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/c443edc5f08f38c/output.txt

pdfjsbot · 2020-05-20T11:37:32Z

From: Bot.io (Linux m4)

Failed

Full output at http://54.67.70.0:8877/b98a3f10a69802a/output.txt

Total script time: 26.08 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/b98a3f10a69802a/reftest-analyzer.html#web=eq.log

pdfjsbot · 2020-05-20T11:41:01Z

From: Bot.io (Windows)

Failed

Full output at http://54.215.176.217:8877/c443edc5f08f38c/output.txt

Total script time: 29.55 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/c443edc5f08f38c/reftest-analyzer.html#web=eq.log

…n `src/core/jpg.js`

Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seems unfortunate for a number of reasons:

 - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library.

 - The PDF specification supports JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser requires a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image.

 - While some JPEG images may, for all intents and purposes, appear to be natively supported, there are still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there are any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first. In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should *always* be the case.

 - The native decoding, for anything except the *simplest* of JPEG images, results in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707). Furthermore this also leads to data being *parsed* on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-responsiveness reasons.

 - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, led to some issues and support requests.

 - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used.

Originally the implementation in `src/core/jpg.js` was unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues. At this point in time, there are two kinds of failure with this patch:

 - Changes which are basically imperceptible to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder. This type of "failure" accounts for the *vast* majority of the total number of changes in the reference tests.

 - Changes where the JPEG images now look *ever so slightly* blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough). Basically if you disable [this downscaling in canvas.js](https://github.com/mozilla/pdf.js/blob/8fb82e939cf0c8618a4e775ff17fc96f726872b5/src/display/canvas.js#L2356-L2395), which is what happens when zooming in, the differences simply vanish! Hence I'm pretty satisfied that there are no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that *all* images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.
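The first kind of failure described above (pixels that are off-by-one in their components) can be told apart from larger changes mechanically. The following is only a minimal sketch of that idea, not how the reference test analyzer works; the helper name `classifyPixelDiffs` and the sample buffers are hypothetical, and both inputs are assumed to be RGBA data of identical dimensions.

```javascript
// Hypothetical helper (not part of PDF.js): compare two RGBA buffers of the
// same size and count pixels that are identical, off by at most one in every
// component, or different by a larger amount.
function classifyPixelDiffs(a, b) {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("Buffers must be RGBA data of equal length.");
  }
  const counts = { identical: 0, offByOne: 0, larger: 0 };
  for (let i = 0; i < a.length; i += 4) {
    // Largest absolute difference across the R, G, B and A components.
    let maxDelta = 0;
    for (let c = 0; c < 4; c++) {
      maxDelta = Math.max(maxDelta, Math.abs(a[i + c] - b[i + c]));
    }
    if (maxDelta === 0) {
      counts.identical++;
    } else if (maxDelta === 1) {
      counts.offByOne++;
    } else {
      counts.larger++;
    }
  }
  return counts;
}

// Made-up sample data: three pixels as decoded by two different decoders.
const nativePixels = new Uint8ClampedArray([
  10, 20, 30, 255, 200, 100, 50, 255, 0, 0, 0, 255,
]);
const builtInPixels = new Uint8ClampedArray([
  10, 20, 30, 255, 201, 99, 51, 255, 0, 8, 0, 255,
]);
const diffCounts = classifyPixelDiffs(nativePixels, builtInPixels);
console.log(diffCounts); // { identical: 1, offByOne: 1, larger: 1 }
```

A high `offByOne` count with a `larger` count near zero would be consistent with rounding differences between decoders, while a high `larger` count would suggest a change worth inspecting visually.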

With the changes in the previous patch, this is now dead code which should thus be removed.

…ocument` parameters, since it's now unused in the API

With the changes in previous patches, the `disableCreateObjectURL` option/functionality is no longer used for anything in the API and/or in the Worker code.

Note however that there's some functionality, mainly related to file loading/downloading, in the GENERIC version of the default viewer which still depends on this option. Hence the `disableCreateObjectURL` option (and related compatibility code) is moved into the viewer, see e.g. `web/app_options.js`, such that it's still available in the default viewer.
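For code that embeds the library and used to pass `disableCreateObjectURL` alongside its other API parameters, one way to adapt is to separate viewer-only options from the parameters handed to the API. This is a rough sketch under that assumption; `splitOptions`, `VIEWER_ONLY_OPTIONS`, and the sample `url` value are hypothetical names used for illustration and are not part of PDF.js.

```javascript
// Hypothetical helper (not part of PDF.js): separate options that now only
// matter to the viewer from the parameters passed on to the API.
const VIEWER_ONLY_OPTIONS = ["disableCreateObjectURL"];

function splitOptions(options) {
  const apiParams = {};
  const viewerOptions = {};
  for (const [key, value] of Object.entries(options)) {
    if (VIEWER_ONLY_OPTIONS.includes(key)) {
      viewerOptions[key] = value;
    } else {
      apiParams[key] = value;
    }
  }
  return { apiParams, viewerOptions };
}

// Example: an options object as an embedder might have assembled it before.
const { apiParams, viewerOptions } = splitOptions({
  url: "example.pdf",
  disableCreateObjectURL: true,
});
console.log(apiParams); // only the parameters intended for the API
console.log(viewerOptions); // only the viewer-level options
```

Keeping the list of viewer-only keys in one place means a further option moving out of the API later needs a single change.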

…task

With the changes made in the previous patch, the `web/app_options.js` file no longer depends on anything *except* files residing in the `web/` folder. Hence the `gulp default_preferences` task can now be further simplified and thus becomes even faster than before; see also PR 11724.

timvandermeij · 2020-05-23T10:26:02Z

/botio test

pdfjsbot · 2020-05-23T10:26:03Z

From: Bot.io (Windows)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/1e697729f942157/output.txt

pdfjsbot · 2020-05-23T10:26:03Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/28ac54b94eeb884/output.txt

pdfjsbot · 2020-05-23T10:52:00Z

From: Bot.io (Linux m4)

Failed

Full output at http://54.67.70.0:8877/28ac54b94eeb884/output.txt

Total script time: 25.93 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/28ac54b94eeb884/reftest-analyzer.html#web=eq.log

pdfjsbot · 2020-05-23T10:54:58Z

From: Bot.io (Windows)

Failed

Full output at http://54.215.176.217:8877/1e697729f942157/output.txt

Total script time: 28.89 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/1e697729f942157/reftest-analyzer.html#web=eq.log

timvandermeij · 2020-05-23T13:34:16Z

Thank you! This greatly simplifies the code and should improve performance.

/botio makeref

pdfjsbot · 2020-05-23T13:34:17Z

From: Bot.io (Linux m4)

Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/2d106f21a4793c1/output.txt

pdfjsbot · 2020-05-23T13:34:17Z

From: Bot.io (Windows)

Received

Command cmd_makeref from @timvandermeij received. Current queue size: 1

Live output at: http://54.215.176.217:8877/5fe5e8787af7161/output.txt

pdfjsbot · 2020-05-23T13:58:27Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/2d106f21a4793c1/output.txt

Total script time: 24.14 mins

Lint: Passed
Make references: Passed
Check references: Passed

pdfjsbot · 2020-05-23T14:03:44Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/5fe5e8787af7161/output.txt

Total script time: 26.69 mins

Lint: Passed
Make references: Passed
Check references: Passed

Snuffleupagus force-pushed the rm-nativeImageDecoderSupport branch from e5413ed to b3952a7 Compare February 15, 2020 16:43

timvandermeij added the core label Feb 16, 2020

Snuffleupagus force-pushed the rm-nativeImageDecoderSupport branch 7 times, most recently from 7fbc0e5 to 636991c Compare February 22, 2020 21:35

Snuffleupagus force-pushed the rm-nativeImageDecoderSupport branch 7 times, most recently from 79cff41 to efca660 Compare March 3, 2020 22:59

timvandermeij mentioned this pull request Mar 4, 2020

Add a heuristic, in src/core/jpg.js, to handle JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10880) #11523

Merged

Snuffleupagus force-pushed the rm-nativeImageDecoderSupport branch 4 times, most recently from 837ad7d to 2e2dca3 Compare March 8, 2020 12:04

Snuffleupagus force-pushed the rm-nativeImageDecoderSupport branch 3 times, most recently from a7de12c to b11bf8a Compare March 12, 2020 22:52

Snuffleupagus closed this Mar 16, 2020

Snuffleupagus force-pushed the rm-nativeImageDecoderSupport branch from b11bf8a to f9ce22c Compare March 16, 2020 21:04

Snuffleupagus mentioned this pull request May 20, 2020

Attempt to cache repeated images at the document, rather than the page, level (issue 11878) #11912

Merged

Snuffleupagus added 4 commits May 22, 2020 00:22

Remove the, now unused, releaseImageResources helper function

cc4cc8b

With the changes in the previous patch, this is now dead code which should thus be removed.

Snuffleupagus force-pushed the rm-nativeImageDecoderSupport branch from 5a0dd45 to ebef67b Compare May 22, 2020 08:48

timvandermeij approved these changes May 23, 2020

timvandermeij merged commit 3b615e4 into mozilla:master May 23, 2020

Snuffleupagus deleted the rm-nativeImageDecoderSupport branch May 23, 2020 14:19

Snuffleupagus mentioned this pull request Aug 10, 2020

Remove the disableCreateObjectURL option from web/app_options.js #12191

Merged

This was referenced Aug 24, 2020

Warning: Unable to decode image: UnknownErrorException: "document" is not defined. #9533

Closed

Reference Error: document is not defined when extracting images (ref: #9603) #9667

Closed

Snuffleupagus mentioned this pull request Apr 17, 2021

Blurry graph in pdf #9648

Closed

Snuffleupagus mentioned this pull request Sep 24, 2021

Mark the paintJpegXObject operator as deprecated (PR 11601 follow-up) #14069

Merged

Snuffleupagus mentioned this pull request Oct 12, 2021

Use even more optional chaining in the src/display/api.js file #14116

Merged
