esm: utility method for detecting ES module syntax #27808
(link to nodejs/modules#330 and nodejs/ecmascript-modules#69) |
It seems more useful to have a method that returns an array of possible parse goals - ie, esm, cjs, or both - which allows for future options like “wasm”, “json”, etc - but “contains module syntax” certainly does what it says on the tin.
i am concerned about the longevity of this api, not that the api itself is a bad thing. is this the right place to put it, is there some other api in the future we might want instead, etc. perhaps flagging it would make sense for now? |
I agree with @devsnek that this API might change again. An alternative to a flag would be to mark it as experimental. |
Does this API really belong in core? Can’t it be an npm module? Especially since its future may be uncertain, out-of-core flexibility seems better to me... |
Of course, any utility method could be outside core. I was thinking that this belonged in core because it’s useful in relation to |
Hmm, I think we should wait until such a usage in core becomes apparent and then discuss adding this. It seems possible right now that a future feature would in fact require a slightly different algorithm for deciding if a file is CJS or a module. |
Ultimately I think there will be multiple algorithms for deciding how a file or a source input string should be treated. There’s no one ideal algorithm that will work for all scenarios. The difficulty in “detecting” CommonJS is that most (all?) CommonJS files that have |
that's a pretty good argument to leave it out of core for now. perhaps you could develop a module with lots of algorithms and such that can be merged into core in the future? |
Yeah, I also think adding it as an npm module makes more sense at this moment rather than having this in the core. |
Having thought about this long enough, I must admit it is neither a core thing nor just another npm thing. Some things are just too intertwined with specific features of core, to the point where it creates too much noise when they are forced to always play catch-up in the wild. So in my opinion, this needs to have:
The last thing we need is a popularity contest over how a package seems to magically detect ES modules, only to discover it just drops safeguards to do so. That is, if we are serious about having an auto-detect mode (assuming it is a loader extension at some point, yes?). |
Does the API here lock us into shipping with a parser module like acorn? If so, I’d also really prefer this to be an npm module. |
@SMotaal If I recall correctly, we own the |
@addaleax I raised this a few months back, and the rationale at the moment was that acorn ships because of the REPL, which I must point out was already causing me issues. Reason still dictated that, since it is there already, that was a different problem space to work out, and that keeping it was more appropriate than adding yet another parser to the mix.
|
I think that this is a very fine-grained special case. When people implement loaders, they will aim for convenience, sell an anything-goes mentality, and claim that "we the mojo will always detect CJS vs ESM formats for you", which is in very high demand these days. I'd like this problem to be solved by an "official" solution so that people do not solve it in ways that overload the loader, or that make it so hard to reason about security that they fail to properly predict what code will be loaded before it is actually loaded. (sorry for the edits, it is annoying but takes |
lib/internal/modules/cjs/loader.js
Outdated
// writing, Acorn doesn't support import() expressions as they are only Stage
// 3; yet Node already supports them.
const acorn = require('internal/deps/acorn/acorn/dist/acorn');
source = stripShebang(source);
The correct thing to use now is `stripShebangOrBOM` if you know that you're parsing CJS. If you're parsing ESM, you should only use `stripShebang`. This obviously presents a problem, and I don't have a solution for it.
`stripShebangOrBOM` was removed in 702331b, because the V8 parser handles stripping shebangs/hashbangs now. Only `stripBOM` remains.
This is a problem, as Acorn doesn’t strip hashbangs. If I remove `stripShebang` from my method, the following throws:
containsModuleSyntax('#!/usr/bin/env node\nimport "./file.js"');
When it should return `true`. I could resurrect `stripShebangOrBOM` for this method, I suppose?
I guess this is the problem with relying on Acorn for parsing in certain places (like the REPL) and V8 everywhere else.
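For readers following the stripping discussion, here is a minimal sketch of what such helpers could look like. These are hypothetical stand-ins written for illustration, not Node's internal `stripBOM`, `stripShebang`, or the removed `stripShebangOrBOM`; the decision to keep the newline after the hashbang line (so line numbers stay stable) and to strip the BOM before the hashbang are assumptions of this sketch.

```javascript
'use strict';

// Hypothetical helpers for illustration only; not Node's internal code.

// Remove a leading byte order mark (U+FEFF), if present.
function stripBOM(source) {
  return source.charCodeAt(0) === 0xFEFF ? source.slice(1) : source;
}

// Remove a leading hashbang line. The line terminator is kept so that
// line numbers reported by a parser are unchanged.
function stripShebang(source) {
  if (!source.startsWith('#!')) return source;
  const end = source.search(/[\r\n\u2028\u2029]/);
  return end === -1 ? '' : source.slice(end);
}

// Assumed order: BOM first, then hashbang.
function stripShebangOrBOM(source) {
  return stripShebang(stripBOM(source));
}
```

With helpers like these, the input from the comment above would reach the parser as a newline followed by the `import` statement, so a parser that rejects `#!` would no longer throw on it.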
Never mind, Acorn has an `allowHashBang` option, and it’s supported by its tokenizer. All is well. I rebased from the latest `master` and added a test with a shebang line.
@GeoffreyBooth can we maybe have a preflight checklist re making sure that acorn configs/plugins... etc remain effective for this feature?
I'm thinking that if it is used sparsely, we might want to sanely keep track of dependencies so that updates don't break features like this.
695d780 to a8a7776
Yes, but Node core already uses Acorn in the REPL and in |
As long as this doesn't expose anything specific to Acorn (such as error messages or objects returned by the library), I'm fine with it. |
It doesn’t. The method either returns |
@devsnek Okay, I removed |
a8a7776 to 750d617
@GeoffreyBooth but we do strip bom for cjs. |
750d617 to 0d7db35
@devsnek Okay, then we should keep Whether or not the source code is runnable is outside the scope of what this method is aiming to tell you. If a string has a BOM and also has module syntax, then this method should return |
@devsnek Whether or not source code contains a BOM or a hashbang is irrelevant to whether or not the source contains module syntax. For the purposes of this method we can strip out anything we want that's not an |
@GeoffreyBooth it's a syntactic difference for esm and cjs. Maybe you could explicitly sniff it instead of stripping it? |
I’m not sure what you’re asking for here. If a BOM is permissible in ESM, and therefore ESM may or may not contain a BOM, then sniffing it doesn’t tell us anything about whether code contains module syntax. The stripping, of BOM or of hashbangs, is only to prevent Acorn from throwing an exception on code that should be parseable. Now that hashbangs are permitted in JavaScript, Acorn shouldn’t require a special option to strip them—but it does. Now that BOMs are permitted in JavaScript, Acorn shouldn’t require us to strip them preemptively—but we need to. These edits are just to work around Acorn’s limitations. |
@devsnek can you elaborate what you mean, would:
I am almost certain we're not considering BOM a disambiguating aspect here so my apologies for needing more clarification please. |
(note that hashbangs are stage 3, so not yet permitted in JS) |
what i'm saying is we don't strip bom for source text modules, so we can use the presence of bom as another indicator for cjs. |
This method doesn’t look for indicators of CommonJS. It only looks for module syntax. |
Thanks @devsnek. Can someone please trigger a build? The last build failed on some crypto tests that have nothing to do with this PR. |
Should this stay open? |
Technically it's still on the modules roadmap, and there's no alternative yet. Until we discuss and agree to table it, I think this stays open. |
Closing per nodejs/modules#465. |
This PR adds a utility method to detect whether input JavaScript source code contains ES module syntax:
The use case for this is any utility that currently uses `vm.Script`’s `runInThisContext` and might also want to support ESM input source code, and therefore would need to know when to use `vm.Script` versus the ES module `vm.SourceTextModule` (and has no definitive signal from the user as to the intended source code type, and where the risk of evaluating in the wrong module system is acceptable for the use case). Two such utilities that I’m familiar with are Babel and CoffeeScript, which each take arbitrary source code as user input (like `--eval`) and use `vm.runInThisContext` for evaluation. I would also imagine that this method would be useful to some loaders or testing frameworks.
Checklist
`make -j4 test` (UNIX), or `vcbuild test` (Windows) passes
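To make the use case in the description concrete, here is a rough sketch of how a tool might pick between `vm.Script` and `vm.SourceTextModule`. It is not the Acorn-based implementation from this PR: it swaps in a different technique, compiling the source as a classic script with `vm.Script` and classifying the resulting `SyntaxError` message. The function names `looksLikeModuleSyntax` and `chooseEvaluator` are hypothetical, and the message patterns are an assumption about what recent V8 versions report, so treat this as illustrative only.

```javascript
'use strict';
const vm = require('vm');

// Assumption: messages recent V8 versions produce when module-only syntax
// is compiled as a classic script. These may change between versions.
const MODULE_SYNTAX_ERRORS = [
  /Cannot use import statement outside a module/,
  /Unexpected token '?export'?/,
  /Cannot use 'import\.meta' outside a module/,
];

// Hypothetical stand-in for the method proposed in this PR.
// Compiles only; the source is never executed.
function looksLikeModuleSyntax(source) {
  try {
    new vm.Script(source);
    return false; // compiled as a script, so no module-only syntax was hit
  } catch (err) {
    if (err.name !== 'SyntaxError') throw err;
    return MODULE_SYNTAX_ERRORS.some((re) => re.test(err.message));
  }
}

// Hypothetical helper: names the evaluator a tool would reach for.
function chooseEvaluator(source) {
  return looksLikeModuleSyntax(source) ? 'vm.SourceTextModule' : 'vm.Script';
}
```

A tool with an `--eval`-style option could call `chooseEvaluator(userInput)` before deciding how to run the input. Like the method discussed in this thread, the sketch only reports module syntax; source without `import` or `export` statements is ambiguous and simply falls through to `vm.Script`.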