Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrupted images in PDF #17061

Closed
TyDraniu opened this issue Oct 3, 2023 · 5 comments
Closed

Corrupted images in PDF #17061

TyDraniu opened this issue Oct 3, 2023 · 5 comments

Comments

@TyDraniu
Copy link

TyDraniu commented Oct 3, 2023

Attach (recommended) or Link to PDF file here:

File has 26 MB, too big for upload.
https://btohq.sgp1.cdn.digitaloceanspaces.com/bto/nov-2019-bto/plantation-village.pdf

Images are somehow corrupted, can't reproduce it in Adobe Reader.

Configuration:

  • Web browser and its version: Firefox Nightly 120
  • Operating system and its version: Win 10
@calixteman
Copy link
Contributor

@Snuffleupagus, every time I see a jpx issue I do wonder if we should switch to openjpeg and wasm, wdyt about that ?

@Snuffleupagus
Copy link
Collaborator

@Snuffleupagus, every time I see a jpx issue I do wonder if we should switch to openjpeg and wasm, wdyt about that ?

I've been wondering the exact same thing for a while, since our JPEG 2000 decoder have two separate problems:

  • We lack support for a fair number of image options, which lead to many rendering bugs; see

    pdf.js/src/core/jpx.js

    Lines 336 to 352 in 0cc8c66

    const unsupported = [];
    if (cod.selectiveArithmeticCodingBypass) {
    unsupported.push("selectiveArithmeticCodingBypass");
    }
    if (cod.terminationOnEachCodingPass) {
    unsupported.push("terminationOnEachCodingPass");
    }
    if (cod.verticallyStripe) {
    unsupported.push("verticallyStripe");
    }
    if (cod.predictableTermination) {
    unsupported.push("predictableTermination");
    }
    if (unsupported.length > 0) {
    doNotRecover = true;
    warn(`JPX: Unsupported COD options (${unsupported.join(", ")}).`);
    }

  • Our decoder is fairly slow, and most JPEG 2000 images tend to be quite large (which likely exacerbate the problem). However, I obviously don't know how much (if at all) using OpenJPEG would help performance wise.

Assuming that our existing test-cases still work if we switch to OpenJPEG, and that it'll fix the currently open bugs, my only question is how it'd affect file-size of the built pdf.worker.js file?

@calixteman
Copy link
Contributor

About the size:

  • it should increase especially if it helps to support more stuff;
  • we should take care to only have the code for the decoder and nothing for encoding.
    About the perf:
  • it isn't that easy, wasm has its own memory space so we have to pay for copying the data to/from the wasm memory so I'd be inclined to think that it should be faster... (that said it's an opinion and it's worth benchmarking).

TBH, I don't care that much about perfs: we should try to have at least the same as now.
For your information, we plan to add more perf tests in talos (https://bugzilla.mozilla.org/show_bug.cgi?id=1855674).
The size could be a problem but for me the major win would be to not have to maintain that stuff anymore and I'm pretty sure that in the balance, maintainability is way more weighty than size.
We can tune the compilation to wasm to try to have the smallest size.
That said, I wonder if we could have some security concerns: few years ago, I talked about the quickjs and wasm, and some people from the fuzzing team weren't super excited.

@Snuffleupagus, do you want to try to make this replacement ? (when you've time of course, there is no rush here).

@forensmatt
Copy link

Can anyone verify if there would be significant rendering speed improvements with openjpeg? If so, it might be greatly beneficial, as, in my experience with PDF.js, JPEG2000 rendering is many times slower compared to JPEG rendering. I often convert JPEG2000 pdf's to the far bigger JPEG format just to make documents user friendly.

@TyDraniu
Copy link
Author

Thank you. It's fixed in Firefox 127 Nightly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants