Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: detect ESM syntax by trying to recompile as SourceTextModule #52413

Merged
merged 1 commit into from
Apr 19, 2024

Conversation

joyeecheung
Copy link
Member

Instead of using an async function wrapper, just try compiling code with unknown module format as SourceTextModule when it cannot be compiled as CJS and the error message indicates that it's worth a retry. If it can be parsed as SourceTextModule then it's considered ESM.

Also, move shouldRetryAsESM() to C++ completely so that we can reuse it in the CJS module loader for require(esm).

Drive-by: move methods that don't belong to ContextifyContext out as static methods and move GetHostDefinedOptions to ModuleWrap.

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders
  • @nodejs/vm

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Apr 7, 2024
@guybedford
Copy link
Contributor

This seems like it works. Can we think of any cases where a module would not use top-level await, import or export or import.meta syntax, and does not lexically define any of the CJS variables, where it would parse as ESM and not a script function body? I can't think of anything but it would be great if we could fully confirm the edge cases.

@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 7, 2024

I think no matter what edge cases there are, it's pretty unlikely that reparsing it with a weird async function wrapper (what the main branch currently does) can be any more correct than reparsing as SourceTextModule....I'd prefer to just go ahead with this to unblock #52047

@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 7, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 7, 2024
@nodejs-github-bot
Copy link
Collaborator

@GeoffreyBooth
Copy link
Member

There are two reasons for the current behavior:

  1. If we reparse as ESM and that fails too, presumably the initial parse failed for a reason other than ESM syntax and so we want to throw that error, not whatever error is thrown by attempting to parse as ESM.
  2. There's no need to reparse as ESM for any CommonJS parse error, so for efficiency we want to do the reparse only when it might make a difference.

I'm on my phone so I can't tell if these have been addressed, but if so, then sure let's change the algorithm (and update the docs accordingly). I think the first point has an existing test covering it, but if not then we should add one.

@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 8, 2024

If we reparse as ESM and that fails too, presumably the initial parse failed for a reason other than ESM syntax and so we want to throw that error, not whatever error is thrown by attempting to parse as ESM.

That's not changed. The error getting rethrown from catch block in executeUserEntryPoint() is still the error from the CJS loader. Reparse errors are still swallowed, the only difference is that previously it was re-parsed as wrapped async function, now it's reparsed as ESM. That's covered by test/fixtures/es-modules/package-without-type/commonjs-wrapper-variables.js which still passes as you can see in the CI.

There's no need to reparse as ESM for any CommonJS parse error, so for efficiency we want to do the reparse only when it might make a difference.

That's not changed either. It is still only reparsed when the error is in the throws_only_in_cjs_error_messages set.

@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 8, 2024

then sure let's change the algorithm (and update the docs accordingly).

I don't think the algorithm as documented changes in anyway as it does not mention specifics about how the code is reparsed (would be somewhat funny to mention that async wrapper too). The doc only lists errors used to determine whether the code should be retried as ESM, those aren't changed by this PR. We are still using the same set to determine syntax errors that definitely indicate ESM syntax and don't need a reparse, and the same set to determine syntax errors that warrant a reparse to double-check.

lib/internal/modules/run_main.js Outdated Show resolved Hide resolved
lib/internal/modules/run_main.js Outdated Show resolved Hide resolved
@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 11, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 11, 2024
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

Comment on lines -181 to -182
const { enrichCJSError } = require('internal/modules/esm/translators');
enrichCJSError(error, source, resolvedMain);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is removing these lines not causing any tests to break? There are tests that check the output of this “enriched” error that I’d think should be failing without these lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those only check part of the sentence because the rest keeps changing (typically it's not a good idea to check all of the message, either). The error is now "enriched" in C++.

Copy link
Member Author

@joyeecheung joyeecheung Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I was talking about my followup. In this branch, this part is already just unreachable, even before this patch, because if !retryAsESM, then the code doesn't contains ESM syntax, then the error won't be "enriched" by enrichCJSError() anyway (i.e. skipping inside enrichCJSError() already).

lib/internal/modules/run_main.js Show resolved Hide resolved
try_catch.ReThrow();
return;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this consolidation mean that we no longer need the try/catch at all? In other words, we could revert #52093 to go back to the previous approach of calling containsModuleSyntax, and then running the CJS entry with the CJS loader, and the performance cost will be negligible because the parse gets cached in V8?

Copy link
Member Author

@joyeecheung joyeecheung Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what my follow up does https://github.com/joyeecheung/node/commits/esm-detect-cpp/ which also implements detection support for require(esm): if it's entry point, and the detection flag is enabled, and it looks like ESM, use the path that allow TLA, and ignore the exports because uh no one needs the module.exports of the entry point..at least for something that's already ESM because that would not have been possible anyway? If it's not entry point, and the require flag is enabled, or detection flag is enabled and it looks like ESM, do synchronous require esm and add module.exports. Though I don't think it's a good idea to merge that into this PR, this is a refactoring, that one implement things

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it’s entry point, and the detection flag is enabled, and it looks like ESM, use the path that allow TLA, and ignore the exports because uh no one needs the module.exports of the entry point

What if the entry point gets required or imported a second time? Like node entry.js imports foo.js which imports entry.js? The second import would need the exports from entry.js.

Can we land your follow-up and unflag detect-module while require(esm) is still flagged? Or is require(esm) possibly close enough to being ready to unflag that we could just do them together?

Copy link
Member Author

@joyeecheung joyeecheung Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second import would need the exports from entry.js.

If entry.js is ESM, then that's the namespace, not module.exports, and we'll return the namespace of the cached module in the ESM loader, because import entry.js is not handled by CJS loader when the module job is already cached.

If it's require entry.js again, then we can error or warn that it's the entry point, or return the namespace if it's fully evaluated. But that's talking about require(esm) edge cases in a follow-up now, pretty off topic for this PR.

Copy link
Member Author

@joyeecheung joyeecheung Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we land your follow-up and unflag detect-module while require(esm) is still flagged?

I don't know for sure, locally my followup does pass test/parallel/test-debugger-exceptions.js even when detect-module is unflagged, though, if you think that's the last blocker for unflagging detection. require(esm) will need to be flagged for at least 22, blockers are __esModule interop and customization hooks support, and maybe there are more as we get more feedback from 22. In general I'm not too comfortable with just unflagging an experimental feature that really needs some battle testing from users at least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I'm not too comfortable with just unflagging an experimental feature that really needs some battle testing from users at least.

Especially for a feature like this one. As soon as it can be used without a flag, people will start to publish packages that depend on it. Then we will be blamed if we break them.

src/node_contextify.cc Outdated Show resolved Hide resolved
@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 12, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 12, 2024
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

// In case the entry point is a large file, such as a bundle,
// ensure no further references can prevent it being garbage-collected.
cjsLoader.entryPointSource = undefined;
if (error != null && ObjectGetPrototypeOf(error) === SyntaxErrorPrototype) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential follow-up: Do we control the error being thrown? If so, I'm wondering if we should create a sub-class like MismatchedSyntaxError 🤔

Copy link
Member Author

@joyeecheung joyeecheung Apr 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The follow up is just don't do the check in JS, and check right after parsing in C++ :)

lib/internal/modules/run_main.js Show resolved Hide resolved
@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 14, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 14, 2024
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 17, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 17, 2024
@nodejs-github-bot
Copy link
Collaborator

Instead of using an async function wrapper, just try compiling code with
unknown module format as SourceTextModule when it cannot be compiled
as CJS and the error message indicates that it's worth a retry. If
it can be parsed as SourceTextModule then it's considered ESM.

Also, move shouldRetryAsESM() to C++ completely so that
we can reuse it in the CJS module loader for require(esm).

Drive-by: move methods that don't belong to ContextifyContext
out as static methods and move GetHostDefinedOptions to
ModuleWrap.
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@GeoffreyBooth GeoffreyBooth added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Apr 19, 2024
@joyeecheung joyeecheung added the commit-queue Add this label to land a pull request using GitHub Actions. label Apr 19, 2024
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Apr 19, 2024
@nodejs-github-bot nodejs-github-bot merged commit 651fa04 into nodejs:main Apr 19, 2024
41 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in 651fa04

aduh95 pushed a commit that referenced this pull request Apr 29, 2024
Instead of using an async function wrapper, just try compiling code with
unknown module format as SourceTextModule when it cannot be compiled
as CJS and the error message indicates that it's worth a retry. If
it can be parsed as SourceTextModule then it's considered ESM.

Also, move shouldRetryAsESM() to C++ completely so that
we can reuse it in the CJS module loader for require(esm).

Drive-by: move methods that don't belong to ContextifyContext
out as static methods and move GetHostDefinedOptions to
ModuleWrap.

PR-URL: #52413
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
@marco-ippolito marco-ippolito added the backport-blocked-v20.x PRs that should land on the v20.x-staging branch but are blocked by another PR's pending backport. label Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. backport-blocked-v20.x PRs that should land on the v20.x-staging branch but are blocked by another PR's pending backport. c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants