Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Elide comments in bundles #2420

Merged
merged 2 commits into from
Aug 20, 2024
Merged

feat: Elide comments in bundles #2420

merged 2 commits into from
Aug 20, 2024

Conversation

kriskowal
Copy link
Member

Closes: #2413

Description

Adds an option to the bundler to blank the interior of comments, reducing bundle sizes.

This change does not attempt to apply the elideComments behavior if the user selects noTransforms, since it is piggybacking on the censorship evasion transform. These features could be decoupled, elideComments works with endoZipBase64 and a narrower interpretation of noTransforms (no precompiled module transforms).

Security Considerations

Some care has been taken to ensure that the resulting comments produce programs with the same behavior in the event the comment must be interpreted as a newline for automatic semicolon insertion (ASI).

Scaling Considerations

None.

Documentation Considerations

  • NEWS
  • README

Testing Considerations

  • evasive transform unit tests
  • bundle source unit tests
    • covering endoScript and endoZipBase64
    • composition errors with noTransforms

Uncovered:

  • cache behavior. The new flag participates in the cache and gracefully handles caches from prior versions.
  • command line

Compatibility Considerations

Maintains support for caches from prior versions.

Upgrade Considerations

None

@kriskowal kriskowal requested review from turadg and dckc August 20, 2024 00:23
@kriskowal kriskowal force-pushed the kriskowal-no-comment-2413 branch from fa48cac to fc1eb2f Compare August 20, 2024 04:46
@@ -42,6 +42,14 @@ entry instead of `main` in `package.json`, if not overridden by explicit
The `development` condition additionally implies that the bundle may import
`devDependencies` from the package containing the entry module.

## Comment Elision

The `--no-comment` (`-X`) flag with `--format` `endoScript` or `endoZipBase64`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a cute name but "no" sounds like opting out of behavior, not into one.

The name should convey that it's opting in. E.g. strip-comments as is the common name for this behavior or elide-comments as is the term used throughout the implementation here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally worth the gag, even if lost in a fit of responsibility. I want to reserve --strip-comments for a feature that removes the comment brackets entirely. Please push back if that is undue pedantry and I’ll thread stripComments throughout.

* @param {import('./location-unmapper.js').LocationUnmapper} [unmapLoc]
*/
export const elideComment = (node, unmapLoc) => {
if (node.type === 'CommentBlock') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will also remove copyright notices. I think that should be a requirement but IANAL.

Whichever way we land the tests and docs should make it patently clear.

I think we could get it by checking for some values in the node.value before trying to replace: https://terser.org/docs/miscellaneous/

@kriskowal kriskowal force-pushed the kriskowal-no-comment-2413 branch 2 times, most recently from 18da963 to 432cd82 Compare August 20, 2024 20:45
@kriskowal kriskowal requested a review from turadg August 20, 2024 20:52
};

/**
* Rewrites comments by replacing their interior with an ellipsis, deleting all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect most readers will think ellipsis=

Consider,

Rewrites comments by blanking their interior, deleting all non-newlines before the last line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the catch. I removed the ellipsis and did not update the comment.

@kriskowal kriskowal force-pushed the kriskowal-no-comment-2413 branch from 432cd82 to 7ef5f0f Compare August 20, 2024 21:07
@kriskowal kriskowal force-pushed the kriskowal-no-comment-2413 branch from 7ef5f0f to 0a59732 Compare August 20, 2024 21:07
@kriskowal kriskowal merged commit 1f2257f into master Aug 20, 2024
17 checks passed
@kriskowal kriskowal deleted the kriskowal-no-comment-2413 branch August 20, 2024 21:25
Copy link
Contributor

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM module regular expression efficiency.

if (comment.startsWith('*')) {
// Detect jsdoc style @preserve, @copyright, @license, @cc_on (IE
// cconditional comments)
return /(?:^|\n)\s*\*?\s*@(?:preserve|copyright|license|cc_on)\b/.test(
Copy link
Contributor

@gibson042 gibson042 Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is unnecessarily vulnerable to ReDoS.

Suggested change
return /(?:^|\n)\s*\*?\s*@(?:preserve|copyright|license|cc_on)\b/.test(
return /(?:^|\n)\s*(?:\*\s*)?@(?:preserve|copyright|license|cc_on)\b/.test(

or (relaxing as suggested by Claude below)

Suggested change
return /(?:^|\n)\s*\*?\s*@(?:preserve|copyright|license|cc_on)\b/.test(
return /(?:^|\n)[\s*]*@(?:preserve|copyright|license|cc_on)\b/.test(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For increased availability, the claude suggestion is:

const optimizedRegex = /^[\s*]*@(?:preserve|copyright|license|cc_on)\b/;

I can't seem to access the chatgpt suggestion.

Copy link
Member Author

@kriskowal kriskowal Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Claude suggestion is more liberal in what comments it accepts. Claude would consider this a valid and preservation-worthy JSDoc:

/**
 ** @copyright
 **/

Maybe this is an acceptable relaxation. The existing regex is already more liberal than strictly necessary. That is, it accepts weird jsdoc comment styles.

Can the Claude LLM be prompted that the behavior should be equivalent and do the same work in fewer cycles? For what comments would the proposed regex process the same characters more than once?

  • (?:^|\n) matches the beginnings of lines, initial and non-initial
  • \s* consumes all leading space
  • \*? consumes exactly one asterisk if present. If absent the character under the cursor is neither a space nor an asterisk. This seems like the most likely matcher to cause an iterative walk-back, but I’m not sure how or why.
  • \s* consumes any additional space after an asterisk but should be harmless if there was no asterisk.
  • @ leads every pragma
  • At some loss of legibility, the alternation of matched pragma names could avoid some walkback with a common prefix tree, (?:c(?:c_on|opyright)|preserve|license) but the legibility is probably a bigger win for that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Haiku is free to use at a rate limit. Fwiw, I think it's reasonable that @copyright anywhere in the comment should be considered an intent to preserve.

Copy link
Contributor

@gibson042 gibson042 Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the relaxed suggestion above, but FWIW @kriskowal's intuition is pretty much correct. The most problematic input for the merged regular expression is lots of whitespace followed by an asterisk and then not an at sign, and I think the backtracking is exponential:

$ esbench.mjs -b1 -h 'V8' --arg i~3 \
  -s '
    const S = " ".repeat(10**(i + 1));
    const sMatch = `${S}*@preserve`;
    const sNoMatch = `${S}*X`;
    const r = /(?:^|\n)\s*\*?\s*@(?:preserve|copyright|license|cc_on)\b/;
  ' \
  accept:'if(!r.test(sMatch)) throw Error("unexpected rejection");' \
  reject:'if(r.test(sNoMatch)) throw Error("unexpected acceptance")'
#### V8
accept (0) 7491.209439528024 ops/ms after 31 245760-count observations
reject (0) 3480 ops/ms after 29 122880-count observations
accept (1) 4924.580152671756 ops/ms after 21 245760-count observations
reject (1) 89.9080919080919 ops/ms after 34 2647-count observations
accept (2) 1298.868 ops/ms after 34 38202-count observations
reject (2) 1.2203389830508475 ops/ms after 34 36-count observations
accept (3) 154.3412228796844 ops/ms after 34 4603-count observations
reject (3) 0.0125 ops/ms after 13 1-count observations

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bundler option to strip comments
4 participants