Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: New “ESM by default” mode #49432

Open
GeoffreyBooth opened this issue Sep 1, 2023 · 98 comments
Open

Discussion: New “ESM by default” mode #49432

GeoffreyBooth opened this issue Sep 1, 2023 · 98 comments
Labels
cli Issues and PRs related to the Node.js command line interface. esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem. wasi Issues and PRs related to the WebAssembly System Interface. wasm Issues and PRs related to WebAssembly.

Comments

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Sep 1, 2023

Building off of #49295 (comment), we’re considering a new mode where all of the current places where Node defaults to CommonJS would instead default to ESM. I think this should be enabled by flag, and once it ships we can potentially ship a separate binary where it’s enabled by default; and/or we can someday make the new mode Node’s default in a semver-major change.

The use cases solved by such a mode are (and I’ll edit this comment to update this list as people think of more):

  • I want to write ESM syntax by default without needing to opt into it, such as via package.json file or file extension or --input-type=module.

  • I want to write what are typically called shell scripts in ESM JavaScript, where I can have a single extensionless file anywhere on my disk that uses ESM syntax and it doesn’t need a nearby package.json file or a symlink or wrapper file in order to run as ESM.

  • I want consistency in how the CLI handles references to files; currently --import accepts URL strings but the main entry point must be a path string (and therefore can’t be a data: URL or https: URL, such as to work with --experimental-network-imports).

  • If this mode becomes enabled by default in a future version of Node, when using that version of Node I want to be able to start a new project and write ESM syntax without needing to take any steps to opt into enabling ESM support.

To handle these use cases, the changes that this mode should make are (and likewise I’ll edit this to reflect consensus):

  • When a .js file is outside of any defined package.json scope, because there are no package.json files in its folder or any folders above it, it will be treated as ESM JavaScript.

  • When a .js file is in a package scope where its nearest parent package.json lacks a "type" field, it will be interpreted per the conditions we establish in Discussion: In an ESM-first mode, how should a package.json file with no type field be handled? #49494: when the package scope is under a folder named node_modules, the file will be treated as CommonJS; otherwise it will be treated as ESM.

  • When an extensionless file is outside of any defined package.json scope, it can be run as an entry point or imported per the conditions we establish in Discussion: In an ESM-first mode, how should extensionless entry points be handled? #49431: if the file begins with the Wasm magic bytes, it will be evaluated as Wasm, and likewise for any other future file types with defined headers that don’t conflict with JavaScript; otherwise it will be evaluated as ESM JavaScript. (Still subject to --experimental-wasm-modules until that stabilizes.)

  • Input from STDIN or --eval would be treated as ESM unless --input-type=commonjs is passed. Or in other words, the default value of --input-type would flip from commonjs to module.

  • The CLI will parse the main entry point as an URL string, similar to how the value of --import is parsed. process.argv[1] will not be parsed by the CommonJS loader.

This new flag will be named per the discussion in #49541.

Prior art: #32394, #49295, #49407

@LiviaMedeiros @nodejs/loaders @nodejs/wasi @nodejs/tsc

@GeoffreyBooth GeoffreyBooth added module Issues and PRs related to the module subsystem. cli Issues and PRs related to the Node.js command line interface. esm Issues and PRs related to the ECMAScript Modules implementation. wasm Issues and PRs related to WebAssembly. wasi Issues and PRs related to the WebAssembly System Interface. labels Sep 1, 2023
@mcollina
Copy link
Member

mcollina commented Sep 1, 2023

I don't think this is feasible to change the default if package.json is present due to the massive amount of modules available on npm.

My 2 cents on how to solve this in a much easier way: let's make npm init add the necessary field in package.json.

@GeoffreyBooth
Copy link
Member Author

I don’t think this is feasible to change the default if package.json is present due to the massive amount of modules available on npm.

I’ve considered this and I don’t think it’s a blocker. As discussed in #49295 (comment), there are many potential solutions, the simplest of which is to just crawl through subfolders adding "type": "commonjs" to package.json files that lack it. I don’t think this needs to be a worry or something we need to deal with at this point.

Also, the proposal isn’t to go straight to changing Node’s default. That will be months if not years away, once this flag has shipped for a long time and issues have been resolved. The time for considering the impact of changing Node’s default is once this flag has been available for a while and we can see how it works. We may never change Node’s default, and that’s fine; this flag and mode has merit regardless. The point of mentioning that this might lead to changing Node’s default is so that we design with that potential goal in mind, so that whatever we add has a clear way to opt out if/when it becomes the new default.

@bmeck
Copy link
Member

bmeck commented Sep 1, 2023

@GeoffreyBooth how does adding that field work for people using other languages/processes that spawn child processes for existing codebases? I'm mostly curious since it is a much larger breaking change than most we see and affects all sorts of runners of node applications in non-simplistic ways.

@GeoffreyBooth
Copy link
Member Author

how does adding that field work for people using other languages/processes that spawn child processes for existing codebases? I’m mostly curious since it is a much larger breaking change than most we see and affects all sorts of runners of node applications in non-simplistic ways.

First off, adding a flag isn’t a breaking change. The breaking change would only be if we make the flag the new default behavior, which I don’t think we need to worry about just yet. It’s premature to consider larger ecosystem effects of making this new mode the default when we haven’t designed, much less implemented, the flag to enable this new mode.

The flag is no different than achieving these effects via module customization hooks (the changes that the hooks can achieve, anyway). Just as other languages/processes wouldn’t know how you’re customizing the files in your project when using hooks, they wouldn’t know whether your project is meant to run under this new flag or not. This is a problem we already have, and it’s on those other tools to provide a way for the user to signal to them that they should expect Node to be in this new mode, like how tsconfig.json is used. Shipping the flag long before we change Node’s default allows the ecosystem time to adapt.

One thing we can do to help ease the transition is to start making the "type" field mandatory. As in, you don’t need to have a package.json file, but if you have one, it needs a "type" field. Then at least for applications, which are our dominant use case, there’s still a deterministic way for tools that work with apps to know how all the files within that application should be interpreted. We could make this one of the effects of the flag, so that in the new mode we need explicit package.json types.

@bmeck
Copy link
Member

bmeck commented Sep 1, 2023

@GeoffreyBooth I'm not opposing the flag in any way, I'm just concerned about the default swap which has been attempted in the past in #32394

@bmeck
Copy link
Member

bmeck commented Sep 1, 2023

Then at least for applications, which are our dominant use case, there’s still a deterministic way for tools that work with apps to know how all the files within that application should be interpreted. We could make this one of the effects of the flag, so that in the new mode we need explicit package.json types.

I'm more skeptical on this due to files being loaded for config/init like ~/.*rc files outside of the application directory.

@joyeecheung
Copy link
Member

joyeecheung commented Sep 1, 2023

I think the idea of having multiple releases in general is not great - if we want to do it, builds with pointer compression enabled would’ve been a thing already and then we would have another headache about what combinations of releases we want to maintain….

A runtime flag that makes ESM the default sounds good to me though, why can’t we just have that and allow it in NODE_OPTIONS? That should be enough for the use cases mentioned in the OP already

@GeoffreyBooth
Copy link
Member Author

I’m more skeptical on this due to files being loaded for config/init like ~/.*rc files outside of the application directory.

There’s an infinitely long tail of ecosystem tools and libraries that don’t support whatever latest and greatest features we ship. Several years ago there were presumably tools that relied on the fact that .js files were CommonJS, and those tools needed to update to check for the type field before making that assumption; this is no different. If we limit ourselves by what today’s tools expect out of Node, then we’d never be able to change anything, even additive/non-breaking changes. Users can make the choice of deciding to use this flag or deciding to use a tool that's incompatible with it.

I think the idea of having multiple releases in general is not great

What do you mean by “multiple releases”?

@joyeecheung
Copy link
Member

What do you mean by “multiple releases”?

multiple binaries, e.g. if you can have node-esm, then you can also also have node-pc (pointer compression), then you can also have a combination of node-esm-pc, then you can have…other combinations

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Sep 1, 2023

What do you mean by “multiple releases”?

multiple binaries, e.g. if you can have node-esm, then you can also also have node-pc (pointer compression), then you can also have a combination of node-esm-pc, then you can have…other combinations

Yes, that’s perhaps an argument against shipping an alternate binary (see #49407). I think we should worry about the flag first before contemplating binaries; I think the flag stands on its own, and maybe there’s a solution for a binary with that flag enabled that doesn’t involve us needing to ship or maintain it, but we can worry about it later.

A runtime flag that makes ESM the default sounds good to me though, why can’t we just have that and allow it in NODE_OPTIONS? That should be enough for the use cases mentioned in the OP already

Shebangs can’t include flags on many platforms. See #49295 (comment) and linked and following comments. I don’t know if shell scripts have much need for pointer compression, but assuming they don’t, then the shebang case is an argument for a separate binary, as shell scripts being unable to specify flags is a particular constraint that only a separate binary can solve, and hopefully this is the only Node option that such scripts need control over.

Anyway to your larger point, yes, let’s ship the flag first and go from there. Many platforms do support flags in shebangs, and there are other use cases solved by the flag, so we’re still making a lot of progress just with the flag by itself at first.

@mcollina
Copy link
Member

mcollina commented Sep 1, 2023

As discussed in #49295 (comment), there are many potential solutions, the simplest of which is to just crawl through subfolders adding "type": "commonjs" to package.json files that lack it. I don’t think this needs to be a worry or something we need to deal with at this point.

We are not in agreement. The ecosystem concern is something that needs to be at the center of this design. Changing the default of how a module from node_modules is evaluated is so massively breaking that should be off the table. A node.js that cannot run most of the registry is close to useless.
The way this is outlined will create python2 / python3 break, which we should be avoiding at all cost.

Note that I'm not opposed of changing the default when a package.json is not available. I think it would be a significant improvement for shell scripts and the like.. I wish I had it in quite a few occasions. Also: "type": "module" should be the default on npm init at some point in the close future.

@GeoffreyBooth
Copy link
Member Author

We are not in agreement. The ecosystem concern is something that needs to be at the center of this design. Changing the default of how a module from node_modules is evaluated is so massively breaking that should be off the table. A node.js that cannot run most of the registry is close to useless.
The way this is outlined will create python2 / python3 break, which we should be avoiding at all cost.

A new flag will not cause a break. Full stop. There is no breaking change proposed here.

The proposed behavior under the new flag will require an extra patch step until package managers catch up and insert the missing type field during package installation. This is not a big change; users can do it themselves via something like npm install && npx add-missing-type-fields. Calling this a Python 2/3 break is extreme hyperbole. I expect that package managers will begin doing it automatically once we ship the flag and they can test against this mode; or package managers will do it automatically when run by Node in this new mode, or at the very least offer an option to do so, saving users from needing to run the patch command. This is not a big deal. No package maintainers will need to update anything if they don’t want to. Even package managers don’t need to update, since we can publish an add-missing-type-fields script that users can run via npx to do the crawl and patch. We could even have Node throw an error upon encountering a typeless package.json: Error: ./node_modules/foo/package.json lacks a "type" field. Patch it by running npx add-missing-type-fields.

In #49295 (comment) and following comments we discussed the pros and cons of preserving the “missing type field still means CommonJS” behavior in the new mode. Doing so would mean avoiding the need for this patch step and/or avoiding package managers needing to update, but ultimately we wouldn’t be achieving the central goal of making ESM the default: it still wouldn’t be the default if a lack of type field still means CommonJS. And since the fix is so simple, that someone could probably implement in a Bash one-liner or a dozen lines of JavaScript, I think the patch approach is better as it allows us to get to a full ESM default without caveats. And once package managers catch up there won’t be any work for users to do.

There are other approaches. We could treat node_modules as a special case, where in this new mode, subfolders of node_modules preserve the “missing type = CommonJS” behavior. Another idea is that in this new mode, a missing type could be treated as a fallthrough, and Node continues to search up the tree for a package.json with a type, and the user could put a package.json with "type": "commonjs" in the node_modules root folder. I’m sure there are other variations people can think of. Personally though I think it’s better to have the same behavior on both sides of the mode, and just have the flag flip the default: it’s simpler, easier to reason about, and easier for other tools to follow. Yes it means running a patch script, at least at first. That’s a small price to pay, in my opinion, and short-term.

Also: "type": "module" should be the default on npm init at some point in the close future.

Or at the very least, it should start including "type": "commonjs" today and support npm init --module ASAP for the ESM version. But these things are a bit out of scope. Many things about making ESM the default get easier with a supportive package manager, but I want to try to design this so that such buy-in isn’t required, so that we don’t need to depend on every popular package manager signing on to a particular new design.

@mcollina
Copy link
Member

mcollina commented Sep 2, 2023

A new flag will not cause a break. Full stop. There is no breaking change proposed here.

What's the point of creating a mode of running Node.js that cannot run the majority of the registry?

I expect the .js to become recognized as an ESM by default some day, and this behavior as depicted here is unfeasible due to the massive ecosystem breakage.

The proposed behavior under the new flag will require an extra patch step until package managers catch up and insert the missing type field during package installation. This is not a big change; users can do it themselves via something like npm install && npx add-missing-type-fields. Calling this a Python 2/3 break is extreme hyperbole. I expect that package managers will begin doing it automatically once we ship the flag and they can test against this mode; or package managers will do it automatically when run by Node in this new mode, or at the very least offer an option to do so, saving users from needing to run the patch command. This is not a big deal. No package maintainers will need to update anything if they don’t want to.

I think this developer experience is unfeasible. What will happen when new packages that assumes are running in esm-first will be mixed with commonjs packages? Are users supposed to edit them by hand? This would worsen Node.js developer experience significantly and it is a very big deal, considering that the majority of the registry is (still) commonjs.

Moreover, this new mode will break quite a lot of npx and npm create commands, because individual packages would need to be fixed manually.

The ecosystem is already complex enough, and this new mode will actually worsen the situation. We can avoid completely avoid this friction and having "type": "module" there is by far a minor problem as long as it's the default.

Personally though I think it’s better to have the same behavior on both sides of the mode, and just have the flag flip the default: it’s simpler, easier to reason about, and easier for other tools to follow.

I agree. The default should not be changed actually: it has no friction whatsoever for developers. Let's not break everyone.

@LiviaMedeiros
Copy link
Contributor

LiviaMedeiros commented Sep 2, 2023

It seems like there's some confusions there.

Shebangs

IMHO most of linked prior discussions are a bit mislead.

  1. Shebang does support arguments on its own. It works just fine as #!/usr/bin/node --esm.
    EDIT: there is limitation of only one argument, so it can't be used together with any other option
  2. There is a practice of using env(1) in shebangs that basically works as #!$(which node)
  3. env historically has limitation of not being able to pass arguments, which is fixed in modern GNU coreutils and FreeBSD.
  4. Minimalistic distributions that use e.g. busybox may not have split-string feature in their env.

The solution for portable enough shebang without separate binary is to write #!/usr/bin/node --esm, simple as that.

I don't recall any issues with that; even though very same thing always applied to shebangs with --experimental-wasm-modules, --allow-fs-write, or any other option.

Separate binaries

There is a difference between separate binary that is compiler with different source code and/or different compile options, and a copy of the very same binary that has different name.

The first comes with downsides such as increased build time and increased release size.

The second (which is proposed in #49407) comes with restriction that the second binary must have specific filename.

For common real life example of it, one may run find "$(git --exec-path)" -samefile "$(which git)".
If this prints about 137 filepaths, you found it: all the git something commands are apparently separate files referring to same inode, AKA hardlinks.
If it doesn't, ls -l "$(git --exec-path)" will probably show a waste of space with copies.
Symlinks would also work, as long as we don't resolve them (e.g. process.execPath).

The implementation comes with almost no overhead, and node-esm doesn't even have to be included in release build; it's a job for the installer that places node in /usr/bin/.

Breaking change

No, adding new optional command line flag that is disabled by default does not break anything for anyone by itself.
It's just a feature that will instantly benefit anyone who already prefers ESM-first code.

And yes, blindly running everything in this mode with Node.js ecosystem that we have right now is likely to break.

This is not only intentional, this is the intent. We must be able to enable it and see what happens. We must ship it so any developer can enable it and see how it works with their projects. We must ship it so any user can enable it and see what breaks in the packages they use. We must ship it so any maintainer can see which dependencies are breaking.

Only after that, we can decide if it's feasible or not to give everyone time to migrate, then introduce --cjs flag and maybe deprecate the default-cjs behaviour, then give everyone more time to migrate, then maybe change the default to ESM-first.

@mcollina
Copy link
Member

mcollina commented Sep 2, 2023

This is not only intentional, this is the intent. We must be able to enable it and see what happens. We must ship it so any developer can enable it and see how it works with their projects. We must ship it so any user can enable it and see what breaks in the packages they use. We must ship it so any maintainer can see which dependencies are breaking.

Only after that, we can decide if it's feasible or not to give everyone time to migrate, then introduce --cjs flag and maybe deprecate the default-cjs behaviour, then give everyone more time to migrate, then maybe change the default to ESM-first.

This is where we are not in agreement, and any plans that include massive ecosystem breakage should be avoided. I'm strongly -1 on this proposal as currently framed. Moreover, I think any plan involving the discontinuation of commonjs should be avoided.

@tniessen
Copy link
Member

tniessen commented Sep 2, 2023

Shebang does support arguments on its own. It works just fine as #!/usr/bin/node --esm.

To be precise, shebang on Linux supports only a single argument. Anything beyond that needs a modern env.

@bmeck
Copy link
Member

bmeck commented Sep 2, 2023

One thing not mentioned here as well is tooling needing out of band info for flags/separate binaries like this; tsconfig/eslintrc/babelrc etc. would need to account for this flag since it wouldn't be statically analyzable and/or can be varied per entrypoint or in multiple modes for a given directory:

/bin/newer -> #!/usr/bin/env node-esm or node --default-operating-type=esm
/bin/older -> #!/usr/bin/env node

These need not be in shebangs either, these can be sources out of band just by invocation of the executable, package.json#scripts, etc. This tooling problem should probably be considered with them in this discussion as going completely out of sync of tools was a struggle for exports but that was at least statically analyzable.

@LiviaMedeiros
Copy link
Contributor

LiviaMedeiros commented Sep 2, 2023

This is where we are not in agreement, and any plans that include massive ecosystem breakage should be avoided. I'm strongly -1 on this proposal as currently framed. Moreover, I think any plan involving the discontinuation of commonjs should be avoided.

Any suggestions then?

@aduh95
Copy link
Contributor

aduh95 commented Sep 2, 2023

@LiviaMedeiros you are demonstrating that passing a single argument works, it would be more interesting to show wether passing a second argument doesn't work.

This is where we are not in agreement, and any plans that include massive ecosystem breakage should be avoided. I'm strongly -1 on this proposal as currently framed. Moreover, I think any plan involving the discontinuation of commonjs should be avoided.

Any suggestions then?

I guess we'd need to find a way to either restrict the effect of this flag to dev-only (a bit like the Buffer constructor deprecation warning is emitted only outside of node_modules folder), or hide it behind a build flag.

@LiviaMedeiros

This comment was marked as off-topic.

@aduh95

This comment was marked as off-topic.

@mcollina
Copy link
Member

mcollina commented Sep 2, 2023

Any suggestions then?

I think asking the package managers to add "type": "module" when initializing new projects achieve most of the same results.
As I mentioned earlier, not changing the default "type" in package.json will avoid most if not all of the ecosystem breakage.

@LiviaMedeiros
Copy link
Contributor

@tniessen @aduh95 my bad, shebang indeed interprets parameters single argument with spaces. Thanks for pointing this out!

I guess we'd need to find a way to either restrict the effect of this flag to dev-only (a bit like the Buffer constructor deprecation warning is emitted only outside of node_modules folder), or hide it behind a build flag.

Dev-only might work, I guess we can sacrifice performance for that on the first stage.

I think asking the package managers to add "type": "module" when initializing new projects achieve most of the same results.

I think this part is feasible on its own, e.g. marking existing packages without type as using fallback mechanism that should default to commonjs, and setting deadline after which the default fallback value changes to module and/or "type": "module" gets explicitly set by default.

What about files outside of scope? How can we launch non-symlink /usr/local/bin/something.js or /usr/local/bin/something as ESM?

@GeoffreyBooth
Copy link
Member Author

As I mentioned earlier, not changing the default “type” in package.json will avoid most if not all of the ecosystem breakage.

@mcollina If I’m understanding your full comments correctly, this was the only part that you opposed, right?

I have conflicting thoughts on this myself. On the one hand, the more that the flag breaks, the less adoption it’ll get and the longer the timeline to potentially making it the default. On the other hand, like @LiviaMedeiros mentioned, the more that it breaks behind a flag, the more potential gain: if people try out the changes and like them, that builds momentum for the ecosystem adapting.

So I guess the question is, what’s the benefit of changing how “typeless package.jsons” are treated? The only one I can think of is symmetry between the on and off versions of this flag, which is another way of saying user expectations. I can’t think of any use cases enabled by flipping this default. So sure, we can preserve “no type = CommonJS” if we want, and it makes the flag a bit harder to explain but it avoids a set of other problems, the things I was saying would need a patching script for. And maybe that’s what we should do, I don’t know.

There’s no way this flag won’t land without --experimental- in its name. So I also think that we’re free to land the most ambitious version first, just to see how people react and use it, and I’m sure there will be changes before the flag leaves experimental status (long before it becomes a new Node default). We could even split out the “treat typeless package.json as CommonJS” part into a separate flag, so that the primary flag enables everything else and another experimental flag enables that particular behavior, to see if the typeless change has benefits of its own. I don’t think we should be putting down hard blocks against these types of experimentation.

What will happen when new packages that assumes are running in esm-first will be mixed with commonjs packages?

I was thinking that in the new mode, type would be required in package.json files when they’re present. So anyone publishing a package assuming ESM first would still be forced to add "type": "module", keeping it compatible with the legacy mode.

Though I acknowledge that this breaks the symmetry, so maybe there’s no value in flipping how typeless scopes are treated. It would be an unfortunate “this is always this way for legacy reasons” thing.

One thing not mentioned here as well is tooling needing out of band info for flags/separate binaries like this; tsconfig/eslintrc/babelrc etc. would need to account for this flag since it wouldn’t be statically analyzable

I don’t think this is much of an issue. Most tools that I’m familiar with have moved away from the .babelrc pattern into ones like vite.config.js where the file extension behaves like any other file in that scope. And again, we can ship an experimental flag and see which tools are broken under it, and see what those tools say when they open issues with us; would there be a path for them to get their tool working in the new mode, or not? This is part of the point of the experiment.

@bmeck
Copy link
Member

bmeck commented Sep 2, 2023

@GeoffreyBooth they interpret the type of files. I'm not concerned about tsconfig.json being interpreted in a new way but about TS not being able to determine the type of a module without "yet another config option". Even with more config options the entrypoint of a program could alter the type of a file to superimpose the 2 modes of operation.

@GeoffreyBooth
Copy link
Member Author

it would be best to point at such a policy

https://nodejs.org/en/about:

As an asynchronous event-driven JavaScript runtime, Node.js is designed to build scalable network applications.

@GeoffreyBooth
Copy link
Member Author

@aduh95 I've done some digging, and I think I have tentative answers for the questions you posed above regarding parsing the entry as a URL:

  • What value of argv[1] will be given to userland?

So today, running node app.js will yield an argv[1] along the lines of /var/www/app.js. Since it’s already a conversion of an input string into another string, I think it would be fine for something like node --type=module file:///var/www/app.js to yield a path string argv[1], such as in this case the same /var/www/app.js. Then we avoid breaking anything that’s expecting an absolute file path in argv[1].

The argv[1] value is unset for entries that don’t correspond to files on disk, like --eval. So I think for non-file: URLs, we would do the same. node --eval 'console.log(process.argv)' foo bar prints [ '/usr/local/bin/node', 'foo', 'bar' ], so I would expect node --experimental-type=module 'data:text/javascript,console.log(console.argv)' foo bar to also print [ '/usr/local/bin/node', 'foo', 'bar' ].

There’s also process.argv0 that stores the original value of argv[0] before it gets converted. We could make a similar process.argv1 to store the pre-conversion value of the entry point. So in the node --type=module file:///var/www/app.js example, process.argv[1] would be /var/www/app.js and process.argv1 would be file:///var/www/app.js. In the node --type=module 'data:text/javascript,console.log(3)' example, process.argv[1] would be undefined/missing and process.argv1 would be data:text/javascript,console.log(3).

  • Would bare specifiers be interpreted as relative URLs?

I think so. I’m not sure what the use case would be using the full resolution algorithm. Doing so would mean that you could never run node --type=module app.js, only node --type=module ./app.js, and that seems annoying. You could use a bare specifier entry via node --import pkg --eval '' if you wanted that, among other ways.

Disallowing bare specifiers would also be consistent with legacy --type=commonjs behavior, where argv[1] is parsed via path.resolve, not require.resolve.

  • How would that work with shebang? AFAIK the OS will always pass a path, not a URL.

I did some testing. At least on my Mac, the OS passes whatever string was used as the command. I made a file shebang and saved it in ~/Sites/node, and modified https://github.com/nodejs/node/blob/main/lib/internal/process/pre_execution.js to save a new property process.argv1 to preserve argv[1] before this file updates it. Within shebang I logged out process.argv1.

When run via ./shebang it prints './shebang'. When run via $PWD/shebang it prints '/Users/geoffrey/Sites/node/shebang'. So maybe the “OS passing the path” part you’re remembering is the path.resolve replacement we do in pre_execution.js.

At least on my system and maybe all POSIX systems, these pre-replacement strings are parseable as URLs: both new URL('./shebang', pathToFileURL(process.cwd()) and new URL('/Users/geoffrey/Sites/node/shebang', pathToFileURL(process.cwd()) return the same thing, an URL instance for file:///Users/Geoffrey/Sites/node/shebang which is what we want. So basically it seems like it's happenstance that for the shebang case, the strings sent from the OS work as relative URLs. Presumably this wouldn't work in Windows, but do people run shebang scripts in Windows?

  • We should explicitly set the goal of getting rid of some of the legacy behavior; e.g. the fact that we always pass argv[1] to CJS loader first should probably change under the new mode.

Agreed, I’ve updated the top post.

@aduh95
Copy link
Contributor

aduh95 commented Sep 25, 2023

At least on my system and maybe all POSIX systems, these pre-replacement strings are parseable as URLs

I think we've already established that parsing paths as URLs is not an acceptable solution: it breaks as soon as the path contains %, #, or ?. We wouldn't accept a solution that "sometimes works, sometimes doesn't, sometimes loads an entirely different file".

@GeoffreyBooth
Copy link
Member Author

I think we’ve already established that parsing paths as URLs is not an acceptable solution: it breaks as soon as the path contains %, #, or ?.

My examples above aren’t parsing paths, at least for the non-shebang case. They’re parsing relative URLs, based on the file URL of process.cwd(). We would define it as such. node ./file.js?foo=bar should work, as desired in #49295. It happens to be identical to what users in POSIX systems would write as paths, but it’s defined input as a relative URL. In Windows, node C:\folder\file.js would not work; Windows users would need to write with forward slashes, as either relative or absolute URLs. This would just be part of the specification, that we would document.

Just so we’re clear, my goal is to eventually allow node https://example.com/script.js and node 'data:text/javascript,console.log("hello")'. I think it’s okay to tell people something along the lines of “If you wish to run a file whose filename contains characters that requiring escaping in a URL, such as % or # or ?, you need to specify it using an absolute file URL such as node file:///path/to/wha%3F.js.” We can even have an error path where if the entry point isn’t found, Node checks to see if parsing as a path would have worked and if so, suggests the correct file: absolute URL instead.

So that leaves the shebang case. When I run ./shebang, the original argv[0] and argv[1] passed to Node from my OS, before Node does any replacements, are values like 'node' and './shebang'. The first value, of argv[0], is whatever executable I write in the shebang line; if I make the shebang file begin with #!/usr/bin/env /Users/geoffrey/Sites/node/node then argv[0] becomes '/Users/geoffrey/Sites/node/node'. It would seem that there’s no way to know, at least from argv alone, whether Node was launched directly or from a shebang script. (Is there a way to know via some other method?)

If I do mv shebang 'shebang?' and run ./shebang\? I get argv[1] before replacement of './shebang?', on my test branch. So this is perhaps one approach: if the original argv[1] contains characters that require escaping to be allowable in URLs, we could check to see if a file with those characters exists on disk; and if it does, error to tell people to use an absolute file URL with escaped characters. This would mean that we are disallowing shebang files whose filenames have special characters, since you can’t run a shebang file as an absolute file URL, but I think that’s fine. What do you think? Is there a better approach? I would much prefer a solution that allowed input to be URLs, even at the expense of special characters in shebang files, rather than requiring the entry to always be a file path. There’s always going to be a tradeoff there but I think there’s much more benefit to allowing URLs than allowing shebang filenames with special characters.

@GeoffreyBooth
Copy link
Member Author

Folks subscribed to this issue, please tell me what you think. Which is a better UX under the new mode:

  • 🚀 Only URLs are allowed as the main entry point:

    • node entry.js?foo=bar
    • node wha%3F.js
    • node 'data:text/javascript,console.log("Hello")'
    • node file:///app/start%20here.js
  • 👀 The main entry point is still parsed as a path (though with explicit extension required) and if you want to run Node with an URL as the entry, you use --import with no entry point and you need to start with ./ for relative URLs:

    • node --import ./entry.js?foo=bar
    • node 'wha?.js'
    • node --import 'data:text/javascript,console.log("Hello")'
    • node '/app/start here.js'

Assume for the purposes of this question that both can be made to work: we can figure out ways to properly error on ambiguous special characters, etc.

@mcollina
Copy link
Member

I would follow the latter case and minimize the breakage. Please use paths, folks are used to use paths to manipulate files in the filesystem.

@LiviaMedeiros
Copy link
Contributor

I think the terminology is mixed up. We don't "parse the main entry point" here, we parse first positional argument in something that points to main entry point.

My answer is: we absolutely must keep parsing first positional argument only as path, and we should be able to provide main entry point as URL in a different way than as first positional argument.

@targos
Copy link
Member

targos commented Sep 28, 2023

I agree with @LiviaMedeiros and @mcollina.

We must not change how the first positional argument is interpreted by default. We don't want to break the use case and user expectations.

What we can do is create a new flag that either:

  • Takes an URL (or relative URL as defined in 🚀) as a parameter
  • Takes no parameter and changes the behavior of the first positional argument to 🚀.

@GeoffreyBooth
Copy link
Member Author

Okay, so it sounds like people want the second option, where if you want to pass an entry point that’s an URL you do it via node --import ./entry.js?foo=bar. Currently that command launches the REPL, with entry.js loaded beforehand.

We could ship a semver-major change where node --import <entry URL> would behave the same as node --import <entry URL> --eval '' does today, i.e. running the entry URL. I feel like this could just ship on its own separate from the new flag, and land in 21.0.0? And it wouldn’t be backportable, but I don’t think it needs to be; people could just add --eval '' for Node <20. What do people think of this?

@GeoffreyBooth
Copy link
Member Author

And just so we’re clear, it seems like we don’t want the first option, but are people asking for something other than the second option? I was planning to use --import for this to avoid the need to create yet another CLI flag; are people saying that they would prefer a new flag, like --entry-url?

@Qard
Copy link
Member

Qard commented Sep 28, 2023

I don't like the behaviour of not being able to enter the REPL with an --import flag. That seems like it would be confusing.

Personally I think what would make the most sense is an option to change how to interpret that first arg. So your --entry-url would just be a mode changer not a --entry-url=url form.

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Sep 28, 2023

I don’t like the behaviour of not being able to enter the REPL with an --import flag. That seems like it would be confusing.

You would just need to pass -i or --interactive. Like node --import ./dependency.js?foo=bar --interactive would launch the REPL. This would a breaking change, but I'm not sure how many people it would affect.

Personally I think what would make the most sense is an option to change how to interpret that first arg. So your --entry-url would just be a mode changer not a --entry-url=url form.

I intended --entry-url to be a boolean that enables the new behavior for the parsing of the entry point. So like node --entry-url --enable-source-maps ./entry.js?foo=bar, where --entry-url doesn’t take a value.

The issue with --entry-url being a standalone option is that then it needs to work for CommonJS too, unless we just error if the entry is CommonJS?

@aduh95
Copy link
Contributor

aduh95 commented Sep 28, 2023

The issue with --entry-url being a standalone option is that then it needs to work for CommonJS too, unless we just error if the entry is CommonJS?

It shouldn’t be an issue, the ESM loader is able to load CJS, why would we error?

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Sep 28, 2023

It shouldn’t be an issue, the ESM loader is able to load CJS, why would we error?

I was thinking how would we handle something like node --entry-url ./entry.cjs?foo=bar but I guess you’re right, it would get handled however an import() of that would be handled currently, which I assume is along the lines of “convert the URL to a file path and import/require it.” So --entry-url would become another of those “if this is used, the user has opted into having the ESM loader handle the entry point” flags like --import and --loader.

Anyway getting back to the UX question first though, which do we want:

  • 👀 --import is used for this case, like node --import ./entry.js, and if the user wants the REPL with --import they need to add -i or --interactive
  • ❤️ --entry-url is a new boolean flag that causes the entry point string to be parsed as a URL and loaded by the ESM loader: node --entry-url --enable-source-maps ./entry.js?foo=bar

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Sep 29, 2023

I know people are still debating the best approach for supporting URL entry points, but since one of the options under consideration is a semver-major change and the others aren’t, I opened a PR for the semver-major one in case that wins consensus: #49946. Let’s hopefully decide whether or not to go with that approach before the 21.0.0 cutoff on Tuesday.

@LiviaMedeiros
Copy link
Contributor

I think it would be better to open spinoff issue so we can have consensus on more details.
For example, do we want to support relative URLs? Do we want to support bare specifiers? Do we want to support shortcut specifiers? Do we want to allow users to override mime-type? Should we support loading CJS with URL? Should we seek for pjsons for file: URLs?

For the UX question, it might be worth mentioning that third option: providing main entry URL as named argument.

As for when it should land, I see it as a feature tied to import.meta.main rather than feature tied to --default-type. If we are planning to have .main soon, I'd suggest to land this feature directly afterwards (if not in the same PR, assuming that import.meta.main isn't semver-major).
The reasons for that are that it's much harder (if possible at all) to test if it actually works as main entry point, and it would be unclear for users that there's difference between main entry point and import until we provide a mechanism to differentiate.

@GeoffreyBooth
Copy link
Member Author

I think it would be better to open spinoff issue so we can have consensus on more details.

We can, but since the first PR implemented everything else I think the URL entry stuff is all that’s left to design, so maybe we can just finish it here?

For example, do we want to support relative URLs?

Yes. The input would be relative to the file URL of process.cwd().

Do we want to support bare specifiers?

I don’t think so. That’s what --import is for, or one could do node --entry-url data:text/javascript,import('bare-specifier').

Do we want to support shortcut specifiers?

What is a shortcut specifier?

Do we want to allow users to override mime-type?

From an HTTPS URL, you mean? I think not as part of the CLI, that can be achieved via a module customization hook.

Should we support loading CJS with URL? Should we seek for pjsons for file: URLs?

Yes and yes. It would be loaded by the ESM loader, the same as if you had referenced a relative or absolute file URL via import().

@LiviaMedeiros
Copy link
Contributor

What is a shortcut specifier?

Subpath imports, the ones starting from #.

Do we want to allow users to override mime-type?

From an HTTPS URL, you mean? I think not as part of the CLI, that can be achieved via a module customization hook.

From any non-file: URL, to be precise.
For example, we could expand --input-type to enable node --input-type=commonjs --entry-url https://example.com/legacy.cjs and node --input-type=commonjs --entry-url 'data:text/javascript,console.log("am CJS");' (I don't think we should, but it's worth considering).

@GeoffreyBooth
Copy link
Member Author

Subpath imports, the ones starting from #.

No, I think we shouldn’t implement the ESM resolution algorithm here. # would just mean a hash, like a URL.

From any non-file: URL, to be precise.

We already have --experimental-network-imports. The way data: and https: imports work currently is that Node uses the MIME types from the header of the resource. That can be overridden by the load module customization hook; I don’t think we want to provide another way to customize this.

Offroaders123 added a commit to Offroaders123/NBTify that referenced this issue Feb 15, 2024
The more I look into this, the more complex it seems, and the less sure I am with both the way I have things now, as well as whether I like the other options instead.

I think I may just stick with TSC for compiling my package code in general, and if anything I can still do/try out the dual-build setup with ESM/CJS, but maybe just use TSC, since I can ensure that things are built consistently.

Should I provide synchronous APIs from NBTify? I think learning about how to work with async has really helped me progress, and I don't want to prevent someone else from learning that if I go with providing a sync option to my API. Or rather, if anything, that would only work in Node, because the Compression Streams API is only an asynchronous implementation. I'd have to use Node Zlib as a fallback to allow for a synchronous implementation, and it would be for Node only. I'm not sure I like the splintering of that either.

I think I have realized afterall, that I don't specifically want to use ESBuild for building my packages, since it's meant to bundle code, and I think I would only want to use that in a situation of building an ESM bundle for the browser. I may look into that as well though. Maybe I can use pkgbuild for that instead though? It kind of works a bit nicer for plain TS type generation support out of the box, as well as the configuation for things just plain working. Maybe I can look into making my own tool kind of like that, since it feels like all of the tools for this that I find, they all don't quite work exactly like how I want them to, unfortunately.

https://www.reddit.com/r/node/comments/14os7zv/the_esmcjs_situation/
https://github.com/wooorm/npm-esm-vs-cjs
https://stackoverflow.com/questions/46515764/how-can-i-use-async-await-at-the-top-level
https://stackoverflow.com/questions/71517624/because-i-cant-run-await-on-the-top-level-i-have-to-put-it-into-an-async-funct
https://commerce.nearform.com/blog/2021/node-esm-and-exports/
https://dev.to/a0viedo/nodejs-typescript-and-esm-it-doesnt-have-to-be-painful-438e
https://www.reddit.com/r/node/comments/14rg9ym/esm_not_gaining_traction_in_backend_node/
nodejs/node#49432
https://github.com/azu/tsconfig-to-dual-package
https://www.breakp.dev/blog/simple-library-package-setup-with-esbuild/
https://esbuild.github.io/api/#build
evanw/esbuild#263
evanw/esbuild#618
microsoft/TypeScript#46005

https://guitar.com/news/music-news/devin-townsend-shredding-dreams-steve-vai/
https://www.reddit.com/r/DevinTownsend/comments/17s332s/devin_and_the_vai_years/
https://www.reddit.com/r/DevinTownsend/comments/k9jot4/devin_townsend_i_was_unable_to_articulate_my/
https://www.kerrang.com/devin-townsend-i-was-unable-to-articulate-my-discontent-so-i-tended-to-act-up-i-even-took-a-sh-t-in-steve-vais-guitar-case
https://www.reddit.com/r/DevinTownsend/comments/6tia0r/is_vai_comparing_devin_to_zappa/
Offroaders123 added a commit to Offroaders123/NBTify that referenced this issue Feb 21, 2024
The more I look into this, the more complex it seems, and the less sure I am with both the way I have things now, as well as whether I like the other options instead.

I think I may just stick with TSC for compiling my package code in general, and if anything I can still do/try out the dual-build setup with ESM/CJS, but maybe just use TSC, since I can ensure that things are built consistently.

Should I provide synchronous APIs from NBTify? I think learning about how to work with async has really helped me progress, and I don't want to prevent someone else from learning that if I go with providing a sync option to my API. Or rather, if anything, that would only work in Node, because the Compression Streams API is only an asynchronous implementation. I'd have to use Node Zlib as a fallback to allow for a synchronous implementation, and it would be for Node only. I'm not sure I like the splintering of that either.

I think I have realized afterall, that I don't specifically want to use ESBuild for building my packages, since it's meant to bundle code, and I think I would only want to use that in a situation of building an ESM bundle for the browser. I may look into that as well though. Maybe I can use pkgbuild for that instead though? It kind of works a bit nicer for plain TS type generation support out of the box, as well as the configuation for things just plain working. Maybe I can look into making my own tool kind of like that, since it feels like all of the tools for this that I find, they all don't quite work exactly like how I want them to, unfortunately.

https://www.reddit.com/r/node/comments/14os7zv/the_esmcjs_situation/
https://github.com/wooorm/npm-esm-vs-cjs
https://stackoverflow.com/questions/46515764/how-can-i-use-async-await-at-the-top-level
https://stackoverflow.com/questions/71517624/because-i-cant-run-await-on-the-top-level-i-have-to-put-it-into-an-async-funct
https://commerce.nearform.com/blog/2021/node-esm-and-exports/
https://dev.to/a0viedo/nodejs-typescript-and-esm-it-doesnt-have-to-be-painful-438e
https://www.reddit.com/r/node/comments/14rg9ym/esm_not_gaining_traction_in_backend_node/
nodejs/node#49432
https://github.com/azu/tsconfig-to-dual-package
https://www.breakp.dev/blog/simple-library-package-setup-with-esbuild/
https://esbuild.github.io/api/#build
evanw/esbuild#263
evanw/esbuild#618
microsoft/TypeScript#46005

https://guitar.com/news/music-news/devin-townsend-shredding-dreams-steve-vai/
https://www.reddit.com/r/DevinTownsend/comments/17s332s/devin_and_the_vai_years/
https://www.reddit.com/r/DevinTownsend/comments/k9jot4/devin_townsend_i_was_unable_to_articulate_my/
https://www.kerrang.com/devin-townsend-i-was-unable-to-articulate-my-discontent-so-i-tended-to-act-up-i-even-took-a-sh-t-in-steve-vais-guitar-case
https://www.reddit.com/r/DevinTownsend/comments/6tia0r/is_vai_comparing_devin_to_zappa/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli Issues and PRs related to the Node.js command line interface. esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem. wasi Issues and PRs related to the WebAssembly System Interface. wasm Issues and PRs related to WebAssembly.
Projects
None yet
Development

No branches or pull requests