Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DocException problem with webpack in production mode #13290

Closed
trey-miller opened this issue Apr 24, 2021 · 7 comments · Fixed by #13297
Closed

DocException problem with webpack in production mode #13290

trey-miller opened this issue Apr 24, 2021 · 7 comments · Fixed by #13297
Labels

Comments

@trey-miller
Copy link

trey-miller commented Apr 24, 2021

Attach (recommended) or Link to PDF file here:

This is not related to a specific pdf file.

Configuration:

Web browser and its version: Any (latest firefox and chrome tested)
Operating system and its version: macOS 11.2
PDF.js version: latest
Is a browser extension: no (using the examples/webpack in this repo, and pdfjs-dist in a separate project using default optimize options for webpack)

Issue Description:

pdf.js uses this.constructor.name in error objects, and tries to use that name in a switch case. Webpack's default minimizer, Terser, removes these names in default production mode, so the switch case never matches, causing the error to be lost in production builds.

I think one solution could be to refactor the error classes to use explicitly declared name strings instead of relying on their constructor names. I'd take a stab at implementing this if you would like me to.

Steps to reproduce the problem:

  1. In the examples/webpack/main.js, change the pdfPath to a nonexistent pdf, e.g.

    const pdfPath = "../learning/404.pdf";
  2. Follow the instructions in examples/webpack/README.md to run the webpack example, except use production webpack on the 4th step:

    ./node_modules/webpack/bin/webpack.js --mode=production

What is the expected behavior? (add screenshot)

image

What went wrong? (add screenshot)

image

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):

N/A

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Apr 24, 2021

When opening an issue, please make sure that you actually provide all of the information requested in https://github.com/mozilla/pdf.js/blob/master/.github/ISSUE_TEMPLATE.md in order to ensure that the issue is easily actionable/valid.


Webpack minimizer removes these names in production mode, so the switch case never matches, causing the error to be lost in production builds.

We don't know what "Webpack minimizer" is, and it's consequently not really something that we can provide assistance with here.
The only minimizer that we support is Terser, as mentioned in the Wiki, note how it's being used in https://github.com/mozilla/pdf.js/blob/master/gulpfile.js

This looks, for all intents and purposes, essentially like a duplicate of issues such as e.g. #11941, #12209, and #12797.

Edit: According to https://webpack.js.org/configuration/optimization/ Webpack uses Terser for minification, hence updating your Webpack configuration accordingly (see below) thus ought to work.

I think one solution could be to refactor the error classes to use explicitly declared name strings instead of relying on their constructor names. I'd take a stab at implementing this if you would like me to.

We do not want to re-write part of the code-base just to satisfy a particular minimizer configuration, hence I'd suggest that you look at how the issues referenced above were solved in PR #12210 (by configuring Terser to not mangle names).

@trey-miller
Copy link
Author

I apologize for not adding more information, and sorry to offend. I only meant the built-in webpack minimizer, which yes is Terser. I thought it would be clear enough from the second step of the repro steps I wrote, which simply add the --mode=production to the webpack build in the /examples/webpack/README.md. There is no custom minimizer happening here, only default webpack behavior in production mode, using the webpack example in this repo.

I thought it would be good to support default webpack production builds, particularly because disabling Terser's mangling can have a real impact on production build output size, and configuring for only pdfjs-dist files to be not mangled is actually kind of a pain depending on your splitting strategy.

Also further apologies for potential duplicate issues - I searched several terms like "DocException" and "constructor.name" but didn't find these issues for some reason.

Anyway, the change I suggested doesn't really require rewriting, only changing the part where the BaseException has this.name = this.constructor.name to instead something like each subclass having e.g. this.name = "MissingPDFException";

@trey-miller
Copy link
Author

I have updated the original description to include all the prompts in the template. I'm not sure they add much, but better to let you the maintainer decide that.

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Apr 24, 2021

I apologize for not adding more information, and sorry to offend.

No worries; no one is the slightest "offended" here :-)
(Since it's usually a lot easier to understand what issues are about, not to mention that you get much better answers, with the full ISSUE_TEMPLATE filled out this is thus something that we mention.)

I thought it would be clear enough from the second step of the repro steps I wrote, which simply add the --mode=production to the webpack build in the /examples/webpack/README.md.

Often times, you'll try to understand an issue without running a whole bunch of code first (to save time) and going mostly on the information provided (since a lot of the people here are contributing in their spare time).

I thought it would be good to support default webpack production builds, particularly because disabling Terser's mangling can have a real impact on production build output size,

It does seem quite unfortunate to let a particular tool dictate how we can/cannot write code, since similar patterns are likely to occur elsewhere and we probably don't want to "forbid" a coding-pattern like that.

Also further apologies for potential duplicate issues - I searched several terms like "DocException" and "constructor.name" but didn't find these issues for some reason.

I simply listed those issues to provide additional context. (Searching on GitHub isn't great, I only found them since I remembered seeing similar issues.)

Anyway, the change I suggested doesn't really require rewriting, only changing the part where the BaseException has this.name = this.constructor.name to instead something like each subclass having e.g. this.name = "MissingPDFException";

You'd have to make those changes across a whole bunch of classes/files, leading to unnecessary code-churn, and it'd do nothing to prevent similar issues elsewhere in the future. Furthermore, manually listing names like that also makes it easier for errors to accidentally sneak in if there's any re-factoring/re-naming happening later.
A much better solution, in my opinion, would be to update the Webpack example and/or its README to mention that minification requires a suitable configuration.

@trey-miller
Copy link
Author

trey-miller commented Apr 27, 2021

Hey, thanks for considering this and ultimately for the README update. I totally understand not wanting to kowtow to newfangled minifiers, webpack builtin defaults or not. I will see if I can get a PR in for adding the keep_classnames and keep_fnames flags in the webpack example.

You'd have to make those changes across a whole bunch of classes/files, leading to unnecessary code-churn, and it'd do nothing to prevent similar issues elsewhere in the future. Furthermore, manually listing names like that also makes it easier for errors to accidentally sneak in if there's any re-factoring/re-naming happening later.

I only see two places that constructor.name is used in the repo: BaseException and BaseViewer. But maybe elsewhere there are uses of Function.name - that's a little harder to search for when there could be just regular properties called name. But I mean it wouldn't be terrible to keep the message that you have to keep class/function names, but also harden against basic breakage when someone doesn't.

The area that broke for me (and others based on the issues you linked) is getting a meaningful error type out of the rejected getDocument error. I'm not totally sure why the WorkerTransport checks the error name and creates a new error instance of the same type, but that's right where it gets lost from minification. That code is also is doing what you are trying to avoid, which is using raw manually listed names to match these error names.

If each error had an explicitly defined the name attached to it, then it could be used and not removed by minifiers, and could also be referenced by other things instead of manually typed strings. e.g.:

// util.js
class MissingPDFException extends BaseException {}

// babel would convert `static name = "MissingPDFException"` to the below,
// except the meta properties below match default values on a Function.name
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Function/name
Object.defineProperty(MissingPDFException, "name", {
  value: "MissingPDFException",
  configurable: true,
  enumerable: false,
  writable: false,
});
// api.js
switch(ex.name) {
  case MissingPDFException.name: 
    reason = new MissingPDFException(ex.message);
}

Again this wouldn't solve the problem everywhere, but could just harden against basic breakage and "minimize" (heh) the number of issues reported when this issue and the others get lost in the cracks in github search indexing. But don't worry I won't try requesting a PR doing this! I will however see about a PR later for the keep_ flags in the example, which might also reduce issue reporting.

@Snuffleupagus
Copy link
Collaborator

I'm not totally sure why the WorkerTransport checks the error name and creates a new error instance of the same type,

Because the data is being sent from the worker-thread to the main-thread, which means that it's passed through the structured clone algorithm and what arrives in the API is thus a "regular" Object (and not something which is instanceof Error).
Hence we really have no choice in the matter here, but need use the name, since we'd obviously prefer not having to check the name explicitly.

@alexrothenberg
Copy link

This was fixed for me in v2.11.338 because of 6167566.

Thank you so much for fixing this and this library overall 🥇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants