Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: add --module option #49295

Closed

Conversation

LiviaMedeiros
Copy link
Contributor

@LiviaMedeiros LiviaMedeiros commented Aug 23, 2023

This adds an option to directly load ESM entry point by absolute or relative URL.

Allows:

$ node --module file.mjs?qwe=rty#uio
$ node --module relative/url/to/file.mjs?qwe=rty#uio
$ node --module /absolute/url/to/file.mjs?qwe=rty#uio
$ node --module win32_style\\url\\to\\file.mjs?qwe=rty#uio
$ node --module file:///wellformed/url/to/file.mjs?qwe=rty#uio
$ node --module file%20with%20spaces.mjs?qwe=rty#uio
$ node --module "file with spaces.mjs?qwe=rty#uio"
$ node --experimental-network-imports --module "https://example.com/module.js"
$ node --module esm_file_without_extension?qwe=rty#uio
$ node --module "data:text/javascript,console.log(new URL(import.meta.url));"?qwe=rty#uio

Doesn't allow:

use `node --import my-package` instead
$ node --module my-package

I.e. most importantly, it allows to

  • load ES module as entry point with user-defined search and hash
  • load ES module explicitly in ESM mode no matter what file extension it has
  • load it explicitly as URL, with consistent percent-decoding

Without breaking changes to how node path/to/file.js works.

Might be related to: #34049
Fixes: #46009
Fixes: #49204

I'm not sure if this is the most correct way to implement it (or if a better alternative is planned), hence draft.

cc @nodejs/modules

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders
  • @nodejs/modules
  • @nodejs/startup

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Aug 23, 2023
@GeoffreyBooth
Copy link
Member

  • load ES module explicitly in ESM mode no matter what file extension it has

This used to exist, as --entry-type. It was removed in favor of --input-type, because of a cascade of UX problems. Please review that history, and if you think you have a way to avoid all the problems cited against --entry-type, then we can consider this.

  • load ES module as entry point with user-defined search and hash
  • load it explicitly as URL, with consistent percent-decoding

I think this is possible already?

node --eval 'import("./2.mjs?key=value#hash")'

This is as short as it is because dynamic import() is allowed in CommonJS, allowing us to skip --input-type=module.

So I guess the ask is to make this even shorter and more obvious. But before getting to that, what is the use case for this? Why is this common enough to justify a new CLI flag? If this is an edge case, then the existing solution is arguably good enough.

@aduh95
Copy link
Contributor

aduh95 commented Aug 23, 2023

I think this is possible already?

node --eval 'import("./2.mjs?key=value#hash")'

If 2.mjs or any of its dependencies is using top-level await, it would lead to surprising behavior and hidden errors. I wouldn't recommend doing this. node --input-type=module -e 'import "./2.mjs?key=value#hash"' is a much better alternative – but does not address the linked issues.

@GeoffreyBooth
Copy link
Member

If 2.mjs or any of its dependencies is using top-level await, it would lead to surprising behavior and hidden errors. I wouldn’t recommend doing this.

What surprising behavior? What hidden errors? Let's please not spread FUD.

node --input-type=module -e 'import "./2.mjs?key=value#hash"' is a much better alternative – but does not address the linked issues.

What linked issues? I read #49204 but I don’t really see a use case; just “I want to be able to launch Node with a URL” but not why the user wants to do that. I don’t think we should be adding complexity for the sake of adding features, only for adding use cases. What’s the use case for launching Node with a URL?

@LiviaMedeiros
Copy link
Contributor Author

This used to exist, as --entry-type. It was removed in favor of --input-type, because of a cascade of UX problems. Please review that history, and if you think you have a way to avoid all the problems cited against --entry-type, then we can consider this.

AFAICT the linked issue objected --entry-type's behaviour specifically. This option:

  • Does not depend on type in package.json, on --input-type, or file extension. It unconditionally overrides the type. It can be named --force-module or something if the concern is about ambiguous behaviour.
  • Does not aim on solving the "support extensionless files" or "support non-.mjs files" or "support directly loading files over https" issues. It's just how it works, and it's aligned with module loading in browsers (i.e. https://example.com/mycode.es2023module or https://example.com/mycode must work as long as web server provides appropriate Content-Type headers).

If there are problems that are applicable to this version rather than to --entry-type implementation, please point on these specific problems so we can see in which direction it might be improved.

I think this is possible already?

node --eval 'import("./2.mjs?key=value#hash")'

Exactly, this is possible with import, import(), and --import. In fact, this is necessary: we can't use path string in import until we at least encodeURI it.

Supporting search and hash here allows to do parameterized imports such as const hmac = await import('./crypto/myHMAC.mjs?algo=SHA3-768'), import { db } from './flexbb/mydb.mjs#customdbprefix_', etc.

But it's import, not direct load. The question is, if we have this for import, why we don't have this for main entry point?
For the minimal meaningful reproduction, let's say, myHMAC.mjs is:

const params = new URL(import.meta.url).searchParams;
if (!params.get('algo'))
  throw new Error('algo must be supplied!');

if (import.meta.main) {
  console.info(`Oops, this should be imported!\nUsage info: ...`);
  process.exit(-14);
}

// [implementation]

or become

#!/usr/bin/env node
const params = new URL(import.meta.url).searchParams;
// TODO: allow algo to be provided as command line argument
if (!params.get('algo'))
  throw new Error('algo must be supplied!');

// [implementation]

if (import.meta.main) {
  const autogeneratedKey = makeNewKey();
  const hash = myHMAC(process.stdin, autogeneratedKey);
  console.log({ autogeneratedKey, hash });
}

Assuming import.meta.main is supported, its code is reachable and it could be reached directly as node-49295 --module https://example.com/myHMAC.mjs?algo=123 (or as deno run https://example.com/myHMAC.mjs?algo=123).

So, how do I at least test this part with Node.js, without writing a huge loader hook to somehow set import.meta.main to true?

Also, how do I load a any module over https: or data: protocol as main module?

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Aug 24, 2023

If there are problems that are applicable to this version rather than to --entry-type implementation, please point on these specific problems so we can see in which direction it might be improved.

See the --entry-type “cons” listed in nodejs/modules#300 (comment):

  • So far, all users who have read about this have assumed that setting the type of the entry also opts in to that type for all files imported by that entry point. As in, if you set the entry point to be ESM, import statements of .js files should treat those .js files also as ESM. This user expectation is likely to only become stronger as ESM in browsers becomes more widely used, as this is how <script type="module"> behaves in browsers.

  • It is inconsistent for entry.js and dep.js to be side by side, where entry.js imports dep.js, and node --entry-type=module entry.js loads entry.js as ESM but then dep.js is treated as CommonJS. This is the only case where files with the same extension in the same folder are treated differently by Node.

  • There’s no use case for overriding the initial entry point of a project while relying on file extensions or package.json to define the type of all other files in the project.

I would think that your proposal shares these issues? With regard to the last one, yes the query params stuff is a use case but that points toward a different goal, of having Node treat the entry as a URL rather than a path string, which is separate from forcing it to be treated as ESM. There are other ways to solve the “treat as URL” request, such as the options mentioned above, or we can propose new ways.

Something to consider is that Node already supports Wasm files, and someone might want the equivalent “interpret the entry as a URL” option for a Wasm entry. So --module isn’t really the right name if that’s what this flag is doing; it could be --entry-url, for example, to opt into parsing the entry the same way that --import parses its value. Another option is to just allow running node with no entry point, like node --import entry.js?foo=bar. Currently this runs the imported script and then opens the REPL, but we could perhaps change that.

@LiviaMedeiros
Copy link
Contributor Author

  • So far, all users who have read about this have assumed that setting the type of the entry also opts in to that type for all files imported by that entry point.
  • I guess "all users" is a bit of exaggeration, especially in modern days when we can import wasm and json.
  • I'm not sure if I'll be able to make perfect documentation for this option by myself but I believe that problem of users who misread or misunderstand that option can be solved by documenting it, and that concern of people misusing this shouldn't be blocking.
  • This is already listed in cons from that issue, but --input-type is also affected by this. It's the most promising option for user who wants to make standalone .js or extensionless file just work in strict mode, and it doesn't give any warning when used as node --input-type=module file.js until something ESM-only is used.
  • It is inconsistent for entry.js and dep.js to be side by side, where entry.js imports dep.js, and node --entry-type=module entry.js loads entry.js as ESM but then dep.js is treated as CommonJS. This is the only case where files with the same extension in the same folder are treated differently by Node.
  • If it was so simple and robust, we wouldn't need preserve-symlinks and --preserve-symlinks-main flags. By default, if I have "type":"module" in adjacent package.json and write import './dep.js';, I can't be sure how it's treated: it depends on if dep.js is symlink or not, and in what kind of scope its realpath lies.
  • Yes, it explicitly changes how one particular entry point is loaded, and doesn't change anything about its import resolution. I guess it's not only desired, but also the only possible thing that may happen; unless there are users who believe in magical --make-all-legacy-code-work-in-strict-mode kind of feature.

Also, if users want to misuse it and bypass type resolution, they will do that anyway. They will find a way like cat file.js | node --input-type=module, node --import "data:text/javascript,$(cat file.js)", etc. But in case of --module, we can at least print warning or hint; while tampering with file contents is impossible to detect.

  • There’s no use case for overriding the initial entry point of a project while relying on file extensions or package.json to define the type of all other files in the project.

This is argument against --entry-type misuse. --module is not intended for usage with CJS-first projects, and overriding type for them won't be useful. However,

  • It would help with ESM-first projects, as it would allow to load entry point with the same rights as imported modules, as URL (enabling data: and https: protocols) with user-defined parameters.
  • It would be extremely helpful for non-scoped files such as standalone CLI tools. User's scripts in /usr/local/bin can be a mix of CJS and ESM, they can be completely unrelated to each other, they are not handled by package managers, and they shouldn't have package scope.

With regard to the last one, yes the query params stuff is a use case but that points toward a different goal, of having Node treat the entry as a URL rather than a path string, which is separate from forcing it to be treated as ESM. There are other ways to solve the “treat as URL” request, such as the options mentioned above, or we can propose new ways.

Agreed. The reasons why it's a combined option are:

  • CJS modules are path-based and there is probably no reason to load them like this.
  • No reason to load JSON modules as entry point, either.
  • This is closer to how existing URL-based non-file: imports work: node --experimental-network-imports --import https://example.com/file.cjs loads in ESM mode despite file extension and node --import "data:text/javascript,console.log(import.meta.url)" loads in ESM mode despite lack of --input-type.

IMHO this is still more intuitive and more convenient that resolution algorithm that handles URL specifier differently depending on its protocol.

Of course, if I'm missing something and there are more reasons to apply path-oriented resolution here, it's possible to rework it into --entry-url.

Something to consider is that Node already supports Wasm files, and someone might want the equivalent “interpret the entry as a URL” option for a Wasm entry. So --module isn’t really the right name if that’s what this flag is doing; it could be --entry-url, for example, to opt into parsing the entry the same way that --import parses its value.

Right, this approach is applicable for Wasm entry points, too. The choices I see here are:

  1. Keep it as is, i.e. --module/--module-url/--entry-esm/--load-by-url-and-force-esm-type whichever sounds better.
    Separately, we can add --wasm/--wasm-url/--entry-wasm option that does similar thing. One option per url-based type that can work as main entry point won't clutter command line options.
  2. Modify it into --entry-url[:type], so it can be typed as --entry-url:wasm https://example.com/blob.wasm.
    Basically same thing, but this form might also support :commonjs and :json for the sake of completeness
  3. Modify it to --entry-url and keep resolution algorithm.
    I agree that we can separate launch-by-url and launch-as-module, but in this case I'd also like to know what proposed launch-as-module mechanism would be. So far, I see that options like --entry-type or separate node-esm binary are rejected. If there is no better alternative, this option can be a good opportunity to make a step forward to resolve extensionless files support and similar issues.

Another option is to just allow running node with no entry point, like node --import entry.js?foo=bar. Currently this runs the imported script and then opens the REPL, but we could perhaps change that.

This probably wouldn't work. There can be multiple --imports, it's misleading to say "import" for main entry point, and opening REPL with --import ./temporal.mjs --import poteto --import https://example.com/map-emplace-polyfill.js is valid usecase.


After checking for related issues, I'm linking #46009 as one that would be fixed by this PR, let me know if this is incorrect.

@GeoffreyBooth
Copy link
Member

  • It is inconsistent for entry.js and dep.js to be side by side, where entry.js imports dep.js, and node --entry-type=module entry.js loads entry.js as ESM but then dep.js is treated as CommonJS. This is the only case where files with the same extension in the same folder are treated differently by Node.
  • If it was so simple and robust, we wouldn’t need preserve-symlinks and --preserve-symlinks-main flags. By default, if I have "type":"module" in adjacent package.json and write import './dep.js';, I can’t be sure how it’s treated: it depends on if dep.js is symlink or not, and in what kind of scope its realpath lies.
  • Yes, it explicitly changes how one particular entry point is loaded, and doesn’t change anything about its import resolution. I guess it’s not only desired, but also the only possible thing that may happen; unless there are users who believe in magical --make-all-legacy-code-work-in-strict-mode kind of feature.

Also, if users want to misuse it and bypass type resolution, they will do that anyway. They will find a way like cat file.js | node --input-type=module, node --import "data:text/javascript,$(cat file.js)", etc. But in case of --module, we can at least print warning or hint; while tampering with file contents is impossible to detect.

I don’t see how any of these points address the issue. In a folder with no package.json, if you run node --module a.js, and a.js imports CommonJS file b.js, which then calls require('a.js'), does Node error on a.js being an ES module that can’t be required? Or does Node try to evaluate a.js as a CommonJS module?

Yes this is a contrived example but it’s illustrating the point that you have two files in the same folder with the same file extension being treated differently; the only way this application could work is with the knowledge that a.js needs to be run with --module, which feels a lot like a footgun. This potential source of confusion was why were leaning away from --entry-type toward --package-type, a flag that would set the current package scope to be either CommonJS or ESM (so node --package-type=module entry.js would more or less be equivalent to echo '{"type": "module"}' > package.json; node entry.js). Ultimately we didn’t end up shipping that either because there weren’t really any use cases for such a flag (if I remember correctly).

I think there’s an argument to be made that it would be nice to launch Node with an entry point that’s a URL. I just don’t know if that argument is compelling enough to justify creating a new flag, when you can achieve the same via node --input-type=module --eval 'import <url>'. The idea of using query params as a way to pass input into a module feels very hacky to me; like a workaround for just importing a function from the module and running it.

@LiviaMedeiros
Copy link
Contributor Author

I don’t see how any of these points address the issue.

These points mostly address the issue being an issue.

What is practical benefit of having same type for same extension in same directory? I can see two factors:

  • If I know the scope of directory (e.g. checked explicitly with jq .type ./package.json), I can be 100% sure what happens if I run ./1.js and ./2.js.
  • If I write import with relative path that is known to be in known scope, I can be 100% sure what type will be applied to it.

Both are not working by default because 2.js can be a symlink pointing at file in any scope, or become symlink at some point, and it will be resolved using that unknown scope. On top of that,

  • What about network imports, even if we serve from localhost? I guess, this rule will stop working if the very same package with the very same directory structure is served over http: rather than file:? And even if we crawl for package.jsons, how do we know if file is symlink or not on server, and where its realpath is located?
  • What about data: imports? These are not located anywhere on filesystem by design, and AFAICT their import is just as force-module as proposed here.

Moreover, this "rule" contradicts with the idea of having /usr/bin/ populated with a mix of .js, .mjs and extensionless files that all behave differently because they all are symlinks to scoped files installed by package manager.

Tl;dr I think that for URL-based resolution (that is native for module and not native for commonjs) the very concept of "being in same directory" doesn't work as good as it did with cjs-only ecosystem, and this rule shouldn't be enforced here.

In a folder with no package.json, if you run node --module a.js, and a.js imports CommonJS file b.js, which then calls require('a.js'), does Node error on a.js being an ES module that can’t be required? Or does Node try to evaluate a.js as a CommonJS module?

Simple: main entry must be evaluated as ESM and imports must be evaluated as usual.

If there is a chance that it may happen in real life, we can probably detect this and print a warning. This usage doesn't seem to be intentional to me, unless for some reason they want to do some weird trick like

(() => {
  try { 09; } catch { return; }
  /* let's use sloppy mode magic! */
})();

Yes this is a contrived example but it’s illustrating the point that you have two files in the same folder with the same file extension being treated differently; the only way this application could work is with the knowledge that a.js needs to be run with --module, which feels a lot like a footgun.

If someone designs an application that relies on the fact that a.js must be evaluated as ESM and then via circular dependency again as CJS on purpose, they don't even need a footgun. They probably already did this using ln -s a.js a.mjs with --preserve-symlinks or using data:text/javascript,$(cat a.js) or with a dozen of similar workarounds.

This potential source of confusion was why were leaning away from --entry-type toward --package-type, a flag that would set the current package scope to be either CommonJS or ESM (so node --package-type=module entry.js would more or less be equivalent to echo '{"type": "module"}' > package.json; node entry.js). Ultimately we didn’t end up shipping that either because there weren’t really any use cases for such a flag (if I remember correctly).

The alternatives to proposed --module behaviour are exactly this: to either do it in --entry-type style that will override type for "file" (or, to be more precise, with the module with URL specifier; IMHO mixing it with CJS-style concept of file on filesystem is the actual source of confusion here) whenever it's imported again and be ambiguous with type specified in package.json if any; or to be like --package-type (which wouldn't work well as we must specify scope which is applicable only to file: modules).

The proposed option doesn't share these issues because it does not mess with any imports at all, it's only applied to main entry point.

Also it wouldn't endorse developers to design "packages that work only with this option", as all imports in package will continue working the same, and importing the package as dependency will also work the same.

I think there’s an argument to be made that it would be nice to launch Node with an entry point that’s a URL. I just don’t know if that argument is compelling enough to justify creating a new flag, when you can achieve the same via node --input-type=module --eval 'import <url>'.

It's not the same as it's not the main entry point.

The idea of using query params as a way to pass input into a module feels very hacky to me; like a workaround for just importing a function from the module and running it.

This concern might go offtopic but the idea of using querystring to parametrize a module in JavaScript feels even more natural than process.argv and process.env to me, since it works the same in browsers and other runtimes (and in Node.js itself, but only for imports and not for main entry point for some reason).

"Just importing a function" doesn't always work. Let's say, when we import myHMAC.mjs?algo=sha9, it performs a process of initialization, e.g. collecting enough entropy for autogenerating salt, loads certificate chains from CA for signing functions, pre-allocates buffers, etc.
Whenever we import it again with same parameters, we prefer to have the same already-prepared instance, and module cache will do the trick for us. When we import it with different parameters, we get "different module" that is initialized separately and cached by separate specifier.
This approach is not as popular as DIY instance caching or singleton pattern and has some downsides, but I don't see why it shouldn't be viable nor how users can "just import a function" if third-party module uses this approach, without forking and rewriting it.

@GeoffreyBooth
Copy link
Member

These points mostly address the issue being an issue.

You’re essentially arguing that what many people in that earlier thread found confusing isn’t actually confusing. It’s not a “rule” that we decided upon back then; it was a question of “if we allow this override, will users be confused by how it behaves?” and the consensus was that users would be confused; and the benefit of the override didn’t outweigh the cost of the bad UX. I think that that still applies today.

It would be nice if Node’s entries were always treated as URLs, the way they are for --import, but we just can’t make that change because of backward compatibility. As for network imports and so on, Node cares about file extensions; our version of a network server that decides what Content-Type MIME to use is to read the file extension.

It’s not the same as it’s not the main entry point.

Imports have no way of knowing whether they’re the main entry point. We don’t support import.meta.main.

The bar to adding new flags should be high, as it’s both bad UX the more we add and it adds downstream complexity, like in this flag’s case how this special “treat as module override” is supposed to carry through into the resolve and load module customization hooks. This proposed flag seems messy in that it’s trying to solve two possibly unrelated use cases, that haven’t been common asks; and there are existing solutions to achieve equivalent or very similar results.

I would suggest that you take a step back and focus on one use case or the other, and open an issue asking for ideas to achieve it, with some that you’ve considered. Building new features is a search for consensus, both that the problem deserves to be solved and the use case supported, and also agreement that your proposed solution is the best one. This PR seems like a jump straight to the end of the process, which can sometimes work if there are lots of people already in agreement on all of the decisions that you made on the way (both that the use case deserves support and that this is the best solution) but based on the prior art around --entry-type and the lack of others commenting on this thread, I don’t think that that consensus is here at the moment.

@LiviaMedeiros
Copy link
Contributor Author

You’re essentially arguing that what many people in that earlier thread found confusing isn’t actually confusing. It’s not a “rule” that we decided upon back then; it was a question of “if we allow this override, will users be confused by how it behaves?” and the consensus was that users would be confused; and the benefit of the override didn’t outweigh the cost of the bad UX. I think that that still applies today.

Please point it out where many people found confusing about the behaviour proposed in this PR. What was discussed in that earlier thread is --entry-type, not this.

It would be nice if Node’s entries were always treated as URLs, the way they are for --import, but we just can’t make that change because of backward compatibility.

This is correct and this is why I propose a command line option instead of changing how node's first positional argument is interpreted.

As for network imports and so on, Node cares about file extensions; our version of a network server that decides what Content-Type MIME to use is to read the file extension.

This is wrong. Node.js's import only cares about Content-Type header, and there is no separate MIME types for ESM and CJS.
What web server uses to decide is irrelevant.

Imports have no way of knowing whether they’re the main entry point. We don’t support import.meta.main.

We don't support it yet. In fact, I think it would make sense to ship this option together with the introduction of import.meta.main.
If there will be decision to not support it in foreseeable future, it would be a good argument against this option.

The bar to adding new flags should be high, as it’s both bad UX the more we add and it adds downstream complexity, like in this flag’s case how this special “treat as module override” is supposed to carry through into the resolve and load module customization hooks. This proposed flag seems messy in that it’s trying to solve two possibly unrelated use cases, that haven’t been common asks; and there are existing solutions to achieve equivalent or very similar results.

Please point out which parts of proposed implementation are confusing or too complex; perhaps we can solve the specific issues.

For resolve hook, it works like this:

  • on resolve, we get (await nextResolve(url)).format === 'module'.
  • if we override it with result.format = 'commonjs', the module will be require()'d, i.e. by default .cjs will be loaded as CJS and .mjs will throw ERR_REQUIRE_ESM.

For load hook, it works like this:

  • when we load something via --module, we get (await nextLoad(url)).format === 'module'.
  • if we override it with result.format = 'commonjs', the module will be loaded as CJS.

So basically, hook gets .format === 'module' hint, gets full URL as specifier, can mutate the result and the outcome will be the same as with regular load. Unless I missed something, it doesn't add complexity and doesn't lower capabilities of hooks.

I would suggest that you take a step back and focus on one use case or the other, and open an issue asking for ideas to achieve it, with some that you’ve considered.

There are two issues linked: #46009 and #49204. Wouldn't opening a new one be a duplicate?

This PR is mainly to discuss this particular solution and its implementation, and to see exactly how complex it is, is it feasible, are there any unobvious nuances (right now I see that it needs adjustments in how process.argv is built, but otherwise it looks stable), or is there a better approach code-wise.

I'll look into possibility of making --entry-url alternative since it looks less controversial, although for now I'm

  • not really convinced that it would be better than this version
  • afraid that if we add it as "first step", we'll get similar problems again: need new option, works well only in synergy with --entry-url because specifier must be URL and not path, etc. etc.

Building new features is a search for consensus, both that the problem deserves to be solved and the use case supported, and also agreement that your proposed solution is the best one. This PR seems like a jump straight to the end of the process, which can sometimes work if there are lots of people already in agreement on all of the decisions that you made on the way (both that the use case deserves support and that this is the best solution) but based on the prior art around --entry-type and the lack of others commenting on this thread, I don’t think that that consensus is here at the moment.

Agreed. Of course there is no rush, I understand that consensus, enough of time and enough amount of explicit approvals is required; as well as possibility of new alternate solutions, or bugs that can't be fixed with this approach. If you feel like the part of discussion that is about issue itself and alternate solutions should be continued in any of the linked issues or a new one, I wouldn't mind to move. Or maybe you would like to suggest something to get more feedback?

Either way, thanks for discussion so far, I hope it will serve as good ground for everyone else to agree, counter, or propose a new suggestion. :)

Copy link
Member

@GeoffreyBooth GeoffreyBooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll just make this explicit: we shouldn’t have a single CLI flag that does both force the Node entry point to be treated as an ES module and force it to be interpreted as an URL string. The first part was already proposed and rejected in --entry-type and I don’t find the arguments listed here persuasive in light of the previous discussion.

With regard to forcing the entry to be parsed as a URL string, I’m not opposed per se but I’m unconvinced that we need such a feature. I’m not sure that such an ability is worth adding a new flag for. The discussion in #46009 was more one of surprise that Node didn’t support URL strings, not a suggestion that it begin to do so (since apparently it can’t be done without adding a new flag). No one on that thread suggested adding a new flag to support such a feature.

@aduh95
Copy link
Contributor

aduh95 commented Aug 28, 2023

What is practical benefit of having same type for same extension in same directory? I can see two factors:

  • If I know the scope of directory (e.g. checked explicitly with jq .type ./package.json), I can be 100% sure what happens if I run ./1.js and ./2.js.
  • If I write import with relative path that is known to be in known scope, I can be 100% sure what type will be applied to it.

Both are not working by default because 2.js can be a symlink pointing at file in any scope, or become symlink at some point, and it will be resolved using that unknown scope.

The benefit is not to help humans disambiguate the type of file – for most JS files, any JS dev can very easily guess if it's CJS or ESM at a glance without looking at the file extension or the related package.json – it's for the computer: by having a set of well defined and stable rule to tell if a file is CJS or ESM, all the tools of the ecosystem that follow the same rules will agree on how such file should be parsed.

The goal is not to have a implication "same extension + same directory => same type", it's to have a deterministic way to tell if such or such file should be parsed as ESM or CJS.

If someone designs an application that relies on the fact that a.js must be evaluated as ESM and then via circular dependency again as CJS on purpose, they don't even need a footgun.

It's possible to write code that can be parsed as either ESM or CJS, and output different things depending how it's parsed.

// a.js
import('./b.js');
console.log(`Hello from ${this===undefined ? 'ESM' : 'CJS'}`);
// b.js
require('./a.js');
$ node ./a.js
Hello from CJS
$ node --module ./a.js
Hello from ESM
Hello from CJS

If we introduce a way for someone to parse twice the same file, given enough time someone is eventually going to fall in that trap by accident, and it's probably going to be very confusing and hard to even understand what's happening.

  • What about network imports, even if we serve from localhost? I guess, this rule will stop working if the very same package with the very same directory structure is served over http: rather than file:? And even if we crawl for package.jsons, how do we know if file is symlink or not on server, and where its realpath is located?
  • What about data: imports? These are not located anywhere on filesystem by design, and AFAICT their import is just as force-module as proposed here.

CJS is simply not supported with network imports and data: URLs – because there is no MIME type for CJS I believe, and CJS-over-the-network does not seem like a good idea anyway.
In general, I don't think we should try to make file: "not special", because FS doesn't provide a MIME it's always going to be a completely different algorithm to determine if we're dealing with CJS or ESM. Parsing file extension and package.json only makes sense on file: URLs, otherwise the MIME should be the only thing needed.

@pinko-fowle
Copy link

pinko-fowle commented Aug 29, 2023

As a user, I rather like this option & how it gets Node into such a WinterCG mode, makes it behave more like how modules behave everywhere else on the planet. It seems like a big win.

It also resolves #34049 which is one of the longest saddest issues around, which has had more issues merged into it than is to believed. At a start we have #32316, #33223, #37512, #37848, #41522. And address https://github.com/orgs/nodejs/discussions/37857 .

It seems like the opinion is turning against this idea. I'd beg to please let us do something. It keeps seemingly like every option has some kind of problem, and that lets us excuse getting no where. Or we stand on precedent. @GeoffreyBooth:

The first part was already proposed and rejected in --entry-type [nodejs/modules#300] and I don’t find the arguments listed here persuasive in light of the previous discussion," referring to the incident where all these problems were created.

But to quote a user in 34049,

I'm surprised at the fact that writing an esm script is still such a huge pita.
#34049 (comment)

The resolution in nodejs/modules#300 created a disaster. A huge trail of making it impossible to create esm escripts in node.js. Users have been begging for redress for years. The 5 issues I listed above & long story of 34049 itself are a testament to how clearly & obviously help is wanted.

If not this PR... one thing that's still super unclear to me is why --input-type is restricted to stdin/eval. Quoting @GeoffreyBooth in nodejs/modules#300 (in two different comments):

Users would probably expect [--input-type] to apply to files as well. It’s not obvious why it wouldn’t.

Even with --input-type, whatever its final name becomes, there’s still the issue that people will probably expect/want it to work on files, and if it should work on files, then it should behave like --package-type. I still feel that way; but I’m willing to ship --input-type for now and wait for that feedback. When users complain that --input-type doesn’t work on files, we can ask them why they want it to; what are their use cases, beyond the two I listed above? If they’re compelling enough, that might lead us to support a flag for files, either like --package-type or maybe somehow different. But I see the logic that maybe we should wait for such feedback rather than building a flag we don’t know the use case for just yet.

The result of 300 created an impossibility to create esm scripts. It sure seems straightforward & obvious to users to expect --input-type to just work, but it doesn't. This PR seems like an even better option that also enables a more cross-platform like behavior from Node. I don't have a strong opinion what solution there should be, but I'd really like to see some resolve to let scripts be ESM, and not need special extensions to do it: very few people write extensions on their binaries, and it should not have to be. Please, I beg that we make progress in this arena; it feels like the call keeps being heard then everyone decides, no, it was decided a long time ago & we're sticking to that. But 300 created a disaster.

@GeoffreyBooth
Copy link
Member

The result of 300 created an impossibility to create esm scripts. It sure seems straightforward & obvious to users to expect --input-type to just work, but it doesn’t.

There are multiple use cases being condensed here:

  • Being able to run standalone files as ESM, that might be extensionless or have a .js extension outside of a package.
  • Being able to run an ESM entry point with query parameters.
  • Being able to force Node to interpret an entry point as ESM when the regular resolution algorithm wouldn’t assign it such.

There’s probably not a single solution that works for all of these.

There are many problems with --entry-type that --module type shares; the most problematic from my point of view is that in our experience, users expect the “force as ESM” aspect to continue to imported files, which it wouldn’t. But that’s not the only issue, they’ve all been rehashed above.

The solution for extensionless script files, like that would get a shebang line, that seems most promising is to ship a second binary like node-esm or something so that you could do a shebang like #!/bin/node-esm. This other binary could have ESM-first behaviors such as interpreting the entry point as a URL and interpreting extensionless files as ESM. This would allow us to essentially flip the defaults without breaking changes. This solution still wouldn’t provide the “force ESM” feature but I’m not sure that that feature has value per se; it seems to be requested more as a means to an end to do things like run extensionless ESM scripts or launch URLs.

@pinko-fowle
Copy link

pinko-fowle commented Aug 29, 2023

I agree that having a separate binary would be a good option. I'm affiliated & partial but #37512 seemed promising.

There are many problems with --entry-type that --module type shares; the most problematic from my point of view is that in our experience, users expect the “force as ESM” aspect to continue to imported files, which it wouldn’t.

It just makes me very sad that imagined concerns about users misusing what seems to so many here like a straightforward simple ask that would help improve a half decade old solution is the primary objection. Slap an experimental prefix on the flag & let it sit for two years & find out. #NodeFwd, then now and always.

You're right that there are multiple intertangled of concerns, and your breakdown is great. I realize there's a balance in trying to not create more problems as we go, trying to create consistency & clarity, which requires extreme dilligence. And I'm so happy there are so many sharp knowing minds able to break these problems down. My impatience & sadness over trying to create standalone ESM scripts has been growing for a long time however, and I want the impulse to make happen to have a voice, which I don't see in so many PRs.

@LiviaMedeiros
Copy link
Contributor Author

LiviaMedeiros commented Aug 29, 2023 via email

@WebReflection
Copy link
Contributor

WebReflection commented Aug 29, 2023

it was a question of “if we allow this override, will users be confused by how it behaves?” and the consensus was that users would be confused

no poll, survey, or community involvement, was used to reach such consensus, IIRC, and this issue and PRs keep piling up since about ever.

the energy spent to re-hydrate and re-enforce discussions born around the "must use .mjs extension" era, still something mostly nobody cares about in the real world, or in other runtimes (Deno, Bun, gjs who decided to break everything until they move to ESM for GNOME 45), is amazing ... are you all folks really convinced this topic is over because .mjs was introduced years ago? It's quite entertaining to see same effort over and over again, but alternatives are becoming more popular and current (frankly stubborn and dystopian) position around this topic never has been beneficial for anyone involved in this project, or actual users demanding ESM opt-in default for years (which dare I say it's the standard ECMAScript suggestion, btw).

Other runtimes got this solved, the main project around JS in console can't move forward (ever, apparently) about this topic ... I know I am adding nothing to this issue, but the fact it keeps coming up with actual PRs that "just works" and get rejected is something to follow/study/entertain current state of JS around teams/projects.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Aug 29, 2023

Did i read that right? The bar of adding new binary is lower than adding new command line option?

No. A binary is the only solution for the extensionless shell script use case, because many platforms don’t allow flags in shebang lines.

We’ve never added a new binary before, so it might very well be a nonstarter more broadly. We would need to find some way to create it such that it doesn’t double the size of the Node.js download, for starters. I’m sure people would raise other concerns.

The tone on this thread is feeling more argumentative than productive. The solution proposed in this PR has been proposed before and rejected before, so I suggest you move on. Issues can be used for discussing use cases and ideas for solutions.

@WebReflection
Copy link
Contributor

WebReflection commented Aug 29, 2023

@GeoffreyBooth

We’ve never added a new binary before

you don't need a binary at all if there is a flag ... you need a regular bash / sh script, executable wherever node lands, that starts node with that flag on ... this argument about double binary needed, but no flag wanted, doesn't really look like a smart argument from the very smart people behind this project, sorry.

@aduh95
Copy link
Contributor

aduh95 commented Aug 29, 2023

@WebReflection can you please phrase your argument in a way that doesn't sound insulting?

you don't need a binary at all if there is a flag ... you need a regular bash / sh script, executable wherever node lands, that starts node with that flag on

That's already possible, a dedicated flag would simplify said bash script, but if you're fine making your own executable nothing's stopping you to make it work already. But I think most folks asking for this would actually want to not have a bash script.

@LiviaMedeiros
Copy link
Contributor Author

No.

Then what, exactly, is the proposed alternative?

I guess there is consensus on the fact that this PR addresses two, partially related, issues. This fact comes to be the main blocking reason.

Your suggestion is to extract the "entry-url" part of it, which is reasonable and feasible on its own.

But then what do we do with the "entry-module" part? Do we reintroduce --entry-type, do we ship node-esm, do we replace --entry-url with --entry-module, or do we do something else?

A binary is the only solution for the extensionless shell script use case, because many platforms don’t allow flags in shebang lines.

Which shebang-supporting platforms don't allow flags in 2023?

The tone on this thread is feeling more argumentative than productive.

This is PR thread and this is about specific solution, so questions I'd like to see addressed here are:

  • Is there a better alternative in terms of both -url and -module parts?
  • If not, can we proceed with improving and landing this one?
  • If not, what exactly is actionable problem with this one?

I think it would be more productive to see how we can fix the issue and what is the best solution and then implement the best solution; rather than blocking all possible choices just because each of them has its own cons.

@GeoffreyBooth
Copy link
Member

Your suggestion is to extract the “entry-url” part of it, which is reasonable and feasible on its own.

Yes, an --entry-url flag that opted into treating the CLI entry point argument as an URL string rather than a path string would at least not cause breaking changes and avoid UX issues, since the regular resolution algorithm would apply. I’m not sure it wouldn’t have its own issues, but it’s at least something that hasn’t been proposed before and doesn’t have any obvious problems that I’m aware of.

But then what do we do with the “entry-module” part? Do we reintroduce --entry-type, do we ship node-esm, do we replace --entry-url with --entry-module, or do we do something else?

By “the entry-module part” I assume you mean what --entry-type did? I don’t see any way to support that that makes sense, for the reasons listed above and in the linked issues. But it’s also a means to an end, not a use case in itself. If you want to force an arbitrary file to be treated as ESM, you can use module customization hooks to do so. I don’t think Node needs to provide a built-in low-level API for this, especially considering all the problems that would result if we did so.

It would be nice to have a built-in way to run extensionless ESM scripts, and there are several old issues debating various approaches for this. The binary solution came from one of them; to my recollection it’s the idea that seemed the most feasible, though it’s already possible today if you’re willing to add some BASH code. Search for “shebang ESM” in this repo and you’ll see the various old issues, such as #32316.

@GeoffreyBooth
Copy link
Member

So from that point, what would the actionable way to proceed with it, and lift the PR block to ensure that it won’t be a futile work?

The version of --module that is --entry-type, where all it does is change the interpretation of the entry point, won’t get my approval. It doesn’t make sense for the UX reasons cited above.

I think if we’re going to have a flag called --module, it should go big: it should flip all the defaults to ESM. So not just the entry point: it would be as if you had a package.json at the root of your volume and it contained "type": "module". The default value of --input-type would be module. Extensionless files would be treated as ESM JavaScript. (This means we still couldn’t have extensionless Wasm, but 🤷 .) The CLI entry point string would be interpreted as a URL. Basically anything that currently defaults to CommonJS, if possible it should default to ESM instead.

There would be the question of what to do about package.json files with no "type" field; flipping to an ESM default would break any CommonJS package in your node_modules. We could preserve the current “no "type" = CommonJS” behavior, either for packages under node_modules or everywhere; or be strict about it and leave it to package managers to insert "type": "commonjs" in installed packages that lack it.

This would also set the stage for Node potentially flipping to ESM by default in a semver-major change. Then we’d need a flag to opt out, to return to the legacy behavior: --commonjs, say, or we could have this flag take a value like --type=module / --type=commonjs.

We could add this flag and ship an alternate binary that essentially just runs node with this flag set; or if @WebReflection is correct and it’s not needed anymore, we could just go with the flag. But I think we would want at least the flag, as there are cases like Docker images where the binary is the entry point and so you’d want to enable this mode via a flag rather than a separate binary.

@WebReflection
Copy link
Contributor

WebReflection commented Aug 30, 2023

@GeoffreyBooth beside the fact I kinda like the "go big or go home" you suggested, you mentioned me so I'd like to clarify:

if @WebReflection is correct and it’s not needed anymore, we could just go with the flag

it's not me eventually being correct, it's explained in the official gnu documentation but I haven't tested it across all possible OS or WSL although I expect these having an env greater than 8.30 or equal to it.

if anyone is willing to give it a shot, my example should just run node and show its version, then exit instead of hanging out foever or throw errors.

#!/usr/bin/env -S node -v

as complete test:

TEST_NODE_ESM='/tmp/node-esm'
echo '#!/usr/bin/env -S node -v' > $TEST_NODE_ESM
chmod +x $TEST_NODE_ESM
$TEST_NODE_ESM
rm $TEST_NODE_ESM

@jirutka
Copy link
Contributor

jirutka commented Aug 30, 2023

however, today this would work pretty much anywhere with 8.30 or greater env

As I already wrote in #34049 (comment), busybox env (used by default e.g. on Alpine Linux) doesn’t support -S. The same for OpenBSD env and NetBSD env. This is a non-standard GNU option, i.e. not defined in POSIX or similar.

@GeoffreyBooth
Copy link
Member

As I already wrote in #34049 (comment), busybox env (used by default e.g. on Alpine Linux) doesn’t support -S. The same for OpenBSD env and NetBSD env. This is a non-standard GNU option, i.e. not defined in POSIX or similar.

Thanks. Yeah I suspected there was still some strong reason why a flag wouldn’t work all on its own. However I think we can have both: the binary is only required for the shebang use case, whereas the flag covers a whole bunch of others. I would suggest starting with the flag as that’s relatively straightforward and not novel like creating a new binary, and once the flag ships we can investigate the binary option.

@WebReflection
Copy link
Contributor

WebReflection commented Aug 30, 2023

@jirutka apologies for not reading that comment out of a different issue but thanks for sharing that ... like I've said I just tested locally and based my assumption out of official gnu documentation (9.3 there) but I wonder at which version busybox or other mentioned OS are, when it comes to env, and if there's hope they'll ever update to the current gnu (standard?) behavior.

@jirutka
Copy link
Contributor

jirutka commented Aug 30, 2023

apologies for not reading that comment out of a different issue but thanks for sharing that…

#34049 is directly related to this PR (and it’s linked in its description).

I wonder at which version busybox or other mentioned OS are

The latest version, they simply don’t use GNU implementation of env.

if there's hope they'll ever update to the current gnu (standard?) behavior.

What, why should they? GNU coreutils (the source of GNU env) is just one of the multiple implementations of basic unix utilities, it’s not a standard that everyone should follow. The standard here is POSIX.

@WebReflection
Copy link
Contributor

WebReflection commented Aug 30, 2023

@jirutka I haven't read all comments linked to this ... my bad.

What, why should they?

for the same reason every other OS solved this issue, in a way or another, or never had it to start with ... are you saying those OSs decided explicitly to never be able to have multiple arguments @ shebang line/definition?


edit all I could find is a standard from 5 years ago that suggests env only has a -i flag although examples look like it would support multiple arguments there, if I'm not mistaken ... so, if that's the case, it's a matter of creating a different first line of the file to use the flag accordingly with the target which is still easier at both build time and distribution (I suppose). If instead these OSs decided this issue should never be solved I rise my hands but still agree solving with a flag first would be already a win. If those OSs are used mostly in Docker and Serverless though, I suppose they can also use a {"type":"module"} at the root of their node execution if desired/needed, solving this issue in a way or another.

@jirutka
Copy link
Contributor

jirutka commented Aug 30, 2023

for the same reason every other OS solved this issue

By every other you mean GNU(/Linux) and OpenBSD…?

are you saying those OSs decided explicitly to never be able to have multiple arguments @ shebang line/definition?

They probably didn’t decide that, they just probably didn’t feel the need to add another option and diverge from the standard.

I’m not saying this option isn’t useful and others shouldn’t implement it, just stating the fact that it’s not as universally supported as some people in this PR think it is. And I definitely didn’t mean it as an argument against adding --module <path> option. Tbh, I got a bit lost in this PR, so not sure who is proposing which variant.

@GeoffreyBooth
Copy link
Member

@WebReflection and @jirutka can you please take this discussion of shebang stuff to #49407 or a new issue? This PR is about a --module flag.

@jirutka
Copy link
Contributor

jirutka commented Aug 30, 2023

If those OSs are used mostly in Docker and Serverless though

Sigh, Alpine Linux might be often used in containers, but OpenBSD and NetBSD definitely not.

@WebReflection and @jirutka can you please take this discussion of shebang stuff to #49407 or a new issue? This PR is about a --module flag.

Exactly, and the --module flag is gonna be used in shebang… It’s tightly related, not a separate issue.

@LiviaMedeiros
Copy link
Contributor Author

The version of --module that is --entry-type, where all it does is change the interpretation of the entry point, won’t get my approval. It doesn’t make sense for the UX reasons cited above.

By not getting approval do you mean absence of explicit approval, verbal disapproval, or explicit block on the PR?

I think if we’re going to have a flag called --module, it should go big:

This sounds great and I'd absolutely like to see the end goal of flipping defaults in semver-major release to be reached.
Please clarify a few details to see if this PR can be evolved into this, or they should coexist as non mutually exclusive features that cover different issues:

(This means we still couldn’t have extensionless Wasm, but 🤷 .)

In any previous issues I couldn't find the reason why we don't want extensionless Wasm anymore. Can you elaborate?
Support for extensionless Wasm was a very explicit reason for a very explicit drop of extensionless ESM. See #31388, #31415 and #34177.
The "but 🤷" is incomprehensible to me.

The CLI entry point string would be interpreted as a URL.

Please elaborate this as well.
IIUC the entry point of ES module is always defined by URL by design, and with #47999 we can have URL internally even when loading CJS. User's ability to provide URL as entry point will be ensured separately by --entry-url option that should already exist if we land this part first.

There would be the question of what to do about package.json files with no "type" field; flipping to an ESM default would break any CommonJS package in your node_modules. We could preserve the current “no "type" = CommonJS” behavior, either for packages under node_modules or everywhere; or be strict about it and leave it to package managers to insert "type": "commonjs" in installed packages that lack it.

Can we generalize this to any files with no explicit type, including those that do not belong to any existing package.json?

Sure, there is this question.
IIUC, this is applicable not only to node_modules, this will break any import './b.js'; from ESM, if there is no scope and b.js happens to be CJS.
There are many CJS projects, and many of them use .js instead of .cjs. It's not really flipping defaults if we keep "CJS if no type", so it has to be done, and I agree that at the end goal they all have to set "type": "commonjs" explicitly, or use .cjs extension, or find other workaround if any. But before that, it will cause breakages across in ecosystem.
What's the actionable answer you suggest here?

@GeoffreyBooth
Copy link
Member

The version of --module that is --entry-type, where all it does is change the interpretation of the entry point, won’t get my approval. It doesn’t make sense for the UX reasons cited above.

By not getting approval do you mean absence of explicit approval, verbal disapproval, or explicit block on the PR?

Block. There are many footguns created by overriding just the entry point and not the larger scope. This has been explained numerous times above and in linked issues. I would consider what was earlier proposed as --package-type, which essentially overrides the "type" field of the nearest parent package.json (or of the default, if there is no package.json) but it has some of the same footguns and others might block it.

(This means we still couldn’t have extensionless Wasm, but 🤷 .)

In any previous issues I couldn’t find the reason why we don’t want extensionless Wasm anymore. Can you elaborate? Support for extensionless Wasm was a very explicit reason for a very explicit drop of extensionless ESM. See #31388, #31415 and #34177. The “but 🤷” is incomprehensible to me.

It’s not that we don’t want it, it’s just that it’s unclear how to achieve it. Right now, extensionless files are parsed as CommonJS. Early on we had them controlled by the "type" field, so that in a "type": "module" package scope they would be parsed as ESM, but we had to remove that ability because of compatibility problems with bin scripts. So if we want to change how extensionless files are parsed in ESM under this new flag, we’d need to find a solution that somehow permits bin scripts to continue to work. Breaking all of them will hinder adoption of this new mode, and discourage the team from ever making it the default.

Once we find a solution for how Node should know to interpret extensionless files as ESM, we would need to find another solution for permitting extensionless Wasm. One idea that was proposed was that if an extensionless entry point file failed to parse as a string, Node would fall back to trying to detect if it had magic number (the first few bytes defining the type of the file, like how PNG starts with the letters PNG and Wasm starts with \0ASM). There are also out-of-band solutions, like "type": "wasm" was once proposed but I dislike that idea. We should figure this out from the start, because it's tricky to add later.

The CLI entry point string would be interpreted as a URL.

Please elaborate this as well. IIUC the entry point of ES module is always defined by URL by design, and with #47999 we can have URL internally even when loading CJS. User’s ability to provide URL as entry point will be ensured separately by --entry-url option that should already exist if we land this part first.

I’m not sure we need --entry-url at all, regardless of whatever else lands. It feels like extra complexity for an extremely niche use case that can be achieved via other means.

All I mean is that the entry point string would be interpreted in the same way the value of --import is interpreted. So you could have a Node entry point that’s a data: URL, etc.

There would be the question of what to do about package.json files with no “type” field; flipping to an ESM default would break any CommonJS package in your node_modules. We could preserve the current “no “type” = CommonJS” behavior, either for packages under node_modules or everywhere; or be strict about it and leave it to package managers to insert “type”: “commonjs” in installed packages that lack it.

Can we generalize this to any files with no explicit type, including those that do not belong to any existing package.json?

I don’t know what you mean. Generalize what to any files with no explicit type?

Currently every package.json file defines a “package scope,” the folder that the package.json is in as well as any subfolders that don’t themselves contain a package.json which would start a new scope. For any package scope where the controlling package.json lacks a "type" field, Node behaves as if there was "type": "commonjs" for backward compatibility. This is the default that we would flip: the lack of "type" would be interpreted instead as "type": "module", meaning that any packages relying on the other default would need an explicit "type": "commonjs" to restore the previous behavior.

Sure, there is this question. IIUC, this is applicable not only to node_modules, this will break any import ‘./b.js’; from ESM, if there is no scope and b.js happens to be CJS. There are many CJS projects, and many of them use .js instead of .cjs. It’s not really flipping defaults if we keep “CJS if no type”, so it has to be done, and I agree that at the end goal they all have to set “type”: “commonjs” explicitly, or use .cjs extension, or find other workaround if any. But before that, it will cause breakages across in ecosystem. What’s the actionable answer you suggest here?

Since this mode is opt-in via flag it wouldn’t break anything yet, which gives the ecosystem time to react and update to this potentially becoming the new default. The npm registry already inserts a few extra metadata fields to every installed package’s package.json; it might be feasible to ask them to insert "type": "commonjs" to packages that lack a "type" field, or updated package managers themselves could do so; in the short term it would be pretty trivial to write a CLI tool that crawls all your subfolders to find package.json files lacking "type" and add "type": "commonjs". Shipping the flag first and announcing the intention to make this the default eventually puts the ecosystem on notice to make these fairly minor updates to prepare for the change.

@guybedford
Copy link
Contributor

Another proposal along these lines would be one of the previous hopes that we could switch the default operating mode of Node.js at some point in future to an ESM first behaviour. Effectively Node.js today defaults to treating extensionless entry points as CommonJS and to treating packages without a "type" field as CommonJS. These defaults if switched to the corresponding ESM versions would imply not needing a "type" field for modules as .js files and also being able to execute extensionless ESM entry points. If we had a flag like this - node --default-type=module or similar`, that could also help test out such a path to some long term future where such a switch might be possible.

@LiviaMedeiros
Copy link
Contributor Author

It’s not that we don’t want it, it’s just that it’s unclear how to achieve it.

Is there consensus that in that "new defaults" mode by default extensionless file is ES module?
I'm a bit lost in the order of events:

  1. module: drop support for extensionless files in ESM #31415 landed
  2. [we are here]
  3. We should add flag that will allow loading extensionless files as ESM
  4. We should make this flag enabled by default
  5. We should add fallback to parsing entry point as Wasm if loading as ESM fails

Why don't we support extensionless files as ESM right now without flags? How about adding it before flipping the defaults so the adaptation can go smoother for the ecosystem?

Also, what about unknown non-empty extensions? Should there be any difference in running myfile or myfile.exe or myfile.jabbascript?

I’m not sure we need --entry-url at all, regardless of whatever else lands. It feels like extra complexity for an extremely niche use case that can be achieved via other means.

All I mean is that the entry point string would be interpreted in the same way the value of --import is interpreted. So you could have a Node entry point that’s a data: URL, etc.

Errm, this is self-contradictory. Why support for an extremely niche use case should become default behaviour. 😄

Since this mode is opt-in via flag it wouldn’t break anything yet, which gives the ecosystem time to react and update to this potentially becoming the new default. The npm registry already inserts a few extra metadata fields to every installed package’s package.json; it might be feasible to ask them to insert "type": "commonjs" to packages that lack a "type" field, or updated package managers themselves could do so; in the short term it would be pretty trivial to write a CLI tool that crawls all your subfolders to find package.json files lacking "type" and add "type": "commonjs". Shipping the flag first and announcing the intention to make this the default eventually puts the ecosystem on notice to make these fairly minor updates to prepare for the change.

All of this sound good to me. Two more points to clarify:

  1. We generalize typeless scope to scopeless files. Which means, we treat files that do not belong to package.json in the same way as if they were in package.json with no type. Which means, these files will be affected by flipping the defaults as well.
    I don't see anything wrong or unobvious about this, but I'd like to point out that we can not update package.json that does not exist. If this is intentional, all good.

  2. Providing all the listed measures would effectively defy the change itself. If both tools and registry will start inserting commonjs automatically, we will return to exactly what we have now, except the still-default commonjs will be injected in every package.json that didn't explicitly set module instead of being implied.
    The whole point of changing how typeless package.json is applicable only to case when type is omitted. If I write ESM package and want to omit type because it's ESM by default, but npm will override it with commonjs, it will be very weird.

My opinion on how to deal with it

We should give developers a choice.
First of all, after the change, we absolutely shouldn't provide any automatic measures to new projects. No type must always mean current default which will be ESM.
For existing projects, I believe that a significant part of CJS in ecosystem is a code that already works in strict mode and uses CJS-specific features at bare minimum. Every maintainer of such code should make a choice: to make step back and explicitly mark their packages as commonjs, or to make step forward and replace module.exports with export. If they can stay still and the choice will be made for them automatically, there will be no stimulus to make even the most trivial migration.

All users should have awareness, and end users should have easier way to avoid breakage.
A CLI way to force any locally installed packages to become CJS-first is indeed as simple as

jq '. += if (has("type") | not) then { type: "commonjs" } end'

I'd raise a question "but how to undo it?" but it's also feasible.
Another question is "but why should it assume that it's not ESM package utilizing new defaults?" and it's less trivial.
To do that with packages handled by npm, I guess they can introduce a temporary option to inject type in pjsons on install, so users can add it to npmrc; and npm itself should decide what was the intended default value. Something like "published before 2024-05-01T00:00:00 is CJS by default, after that it's ESM by default".
Packaging systems can do it in their own way. For example, net-libs/nodejs package in Gentoo can have USE flag that enables the injection mechanism in npm, and any questionable packages can have their own USE flags that will enable their compatibility with Node.js after flipping defaults. After EOL deadline, unmaintained packages that don't support any newer nodejs will get yeeted.

Also, we have to keep developers informed that some kind of action is required from them, and to keep users informed that they are using a package that wasn't updated and might eventually stop working. I'd suggest the following:

  • The automatic injection should never set commonjs type. Instead, it should be some new fallback-commonjs type, or "fallback-type": "commonjs" field.
  • npm can warn developers and users that package currently works in this compatibility mode.
  • Node.js should yield a deprecation warning whenever it handles this fallback.

All of this is applicable only to "unflagged" semver-major step and should have prior discussion and consensus with npm and packaging system maintainers and community in general, of course.

These last questions are about how do we deal with phase 2 after flipping defaults. It's important to have general outline, but now I'm going back to the phase 1 which is optional flag that doesn't break the ecosystem.


After considering all of the above, I agree that this default flipping plan is feasible and on the long run more desirable than what was proposed in this PR originally. I'm glad to see actionable way to proceed, and hope we can move forward with it. Hence, I plan to:

  • As discussed above, extract --entry-url part and make it into separate PR that is only about URLs. As suggested in module: add --module option #49295 (comment), file: protocol will be threated as special, i.e. the type will be resolved in the same way as with paths because we don't have MIME.
  • Change --module PR to propose the boolean flag as suggested in module: add --module option #49295 (comment). Code-wise it's basically the same as build,module: add node-esm support #49407, except instead of guessing binary name it will check boolean --module option in isNodeESM().
    The currently proposed implementation (the one that works as --entry-type=module) will be dropped.
  • Keep build,module: add node-esm support #49407 onhold for now, and if --module option lands, rewrite it to basically just enable this option.
  • (optional, see the question about extensionless files above) Open independent PR that restores loading extensionless files as ESM by default. When the solution for Wasm is found, it should be implemented over it. ES modules in their current form can not have magic number associated with them, so we don't have luxury of guessing the type using magic.

The discussion would be also simplified, as we won't have to have URLs, module types and shebangs tangled together. :)

Does this plan sound good?

@aduh95
Copy link
Contributor

aduh95 commented Sep 1, 2023

5. We should add fallback to parsing entry point as Wasm if loading as ESM fails

That doesn't sound ideal, and would probably have security implications. According to https://webassembly.github.io/spec/core/binary/modules.html#binary-module, WASM modules must start with a magic string (0x00 0x61 0x73 0x6D), in all likelyhood that's what we should use to decide if a file is ES or WASM.

Also, what about unknown non-empty extensions? Should there be any difference in running myfile or myfile.exe or myfile.jabbascript?

Historically, node would parse anything as CJS unless for .json, .node. When introducing "type":"module", it was decided we should take the opportunity to restrict what extension was allowed in case we want to add support for it later – e.g. do we really want to parse file.ts as ESM? Wouldn't that close the door to support TS out-of-the-box in a later version without being a breaking change?


Regarding the whole plan, I'm not sure where I stand. It seems to me that the only POSIX compliant way forward would be to have a separate binary name, and if we introduce a non-POSIX way that happens to work for some wide-spread OSs, it will very likely start to be used on popular npm packages (either because their authors are not aware or they don't care), putting our users in an awkward situation. The status quo (either symlink your extensionless executable to a .mjs file, or use CJS) is portable, and when it errors out it produces relatively easy to understand error message; for most folks, it's an OK tradeoff.

@bmeck
Copy link
Member

bmeck commented Sep 1, 2023 via email

@LiviaMedeiros
Copy link
Contributor Author

Regarding the whole plan, I'm not sure where I stand. It seems to me that the only POSIX compliant way forward would be to have a separate binary name, and if we introduce a non-POSIX way that happens to work for some wide-spread OSs, it will very likely start to be used on popular npm packages (either because their authors are not aware or they don't care), putting our users in an awkward situation.

Tbh as someone who is using fgrep every day (history | fgrep fgrep | wc -l shows 49 out of 502) in interactive shell but keeps grep -F in scripts, I'd say that sometimes POSIX compliance or most-Unix-way can be counterintuitive. Especially when we also have usecases like node as entry point in Docker.
Search across GitHub for #!/usr/bin/env node - shows twice as much results as #!/usr/bin/env -S node -, so I guess we are doomed either way.

But why can't we just have both options available, command line flag and named binary?

The status quo (either symlink your extensionless executable to a .mjs file, or use CJS) is portable, and when it errors out it produces relatively easy to understand error message; for most folks, it's an OK tradeoff.

Maybe it would be worth adding to tsc-agenda Issues and PRs to discuss during the meetings of the TSC. to decide, when we have both choices implemented and if there's no critical flaw in them?
Making a breaking change here is a long-term goal that might happen who-knows-when (if happen at all), but keeping status quo as in "Node.js does not have convenient ESM-first support out of the box even as option" is a bit saddening.

@GeoffreyBooth GeoffreyBooth added module Issues and PRs related to the module subsystem. esm Issues and PRs related to the ECMAScript Modules implementation. labels Sep 1, 2023
@GeoffreyBooth
Copy link
Member

Is there consensus that in that “new defaults” mode by default extensionless file is ES module?

There isn’t consensus, but this very well might become one. I think this is one of the main sticking points to figure out, and deserves its own issue.

My recollection of #31415 was that it was done because we wanted to preserve an option for extensionless Wasm in the future, and we didn’t like the "type": "wasm" approach that had landed without much review (and was reverted). The most obvious options that I can think of for supporting extensionless ESM in a new mode are:

  1. Extensionless is just always ESM unless inside a "type": "commonjs" scope. There’s just no ability for extensionless Wasm entry points; Node.js is a JavaScript runtime, after all.
  2. Use the Wasm magic byte to disambiguate between Wasm and ESM entry. Either always check for magic byte before trying to parse, or check after failure to parse as ESM JavaScript.

Maybe there are other approaches. This should probably get its own discussion, because if we can’t reach a consensus on this aspect, there’s not much point in creating the new mode since one of its primary use cases is to enable extensionless ESM scripts.

I’m a bit lost in the order of events:

I think it’s much simpler: we just add a flag which flips as many defaults as possible to ESM. Once that ships, we can possibly add the binary as an alternate way of running Node in this mode. And at some point, in a semver-major change we make the new mode the default and provide a way to opt into the CommonJS-default mode. We should design things such that it’s not awkward to flip back.

I don’t think we need a collection of new flags for various smaller pieces of this. I wouldn’t block such efforts, but I feel like it would be harder to get consensus on --entry-url and --extensionless-type and so on than it would to just ship a single flag that does it all. In general we want to limit the number of CLI flags; I know there are already a huge number but that’s all the more reason for it to not get even further out of hand.

  1. We generalize typeless scope to scopeless files. Which means, we treat files that do not belong to package.json in the same way as if they were in package.json with no type.

Yes, this was my intent. The current algorithm searches up the current volume to the root, and if it never finds a package.json, it acts as if it had found a package.json at the root with "type": "commonjs". The new mode would just flip that, so the imaginary default package.json would act like "type": "module".

2. Providing all the listed measures would effectively defy the change itself.

I don’t think so. Changing how typeless package scopes are interpreted and discouraging typeless packages go hand-in-hand. There’s no point in having ./node_modules/some-commonjs-dependency being simply broken; a tool should just go in there and add the missing "type": "commonjs". There are countless dependencies that people use every day that aren’t actively maintained anymore; we don’t want to rely on maintainers to need to go back and publish new versions of packages they haven’t touched in years just to add "type": "commonjs". This kind of disruption doesn’t benefit our users; our goal here is just to make ESM easier to use in Node for end users, not to stamp out CommonJS code from the npm registry.

We don’t need to work out the question of how to patch CommonJS dependencies just yet. Whether it’s something at the npm registry level, something at the package manager level or handled by end users directly via a shell command or some script like npx add-missing-package-types, there are many solutions that can work and some of them don’t require acquiescence by other parties. I would build the new mode and ship it as a flag first, so that these other teams can see it with their own eyes and use it, and then they’ll probably be motivated without our prodding to build solutions that they can test against Node in this mode.

Hence, I plan to:

  • As discussed above, extract --entry-url part and make it into separate PR that is only about URLs. As suggested in module: add --module option #49295 (comment), file: protocol will be threated as special, i.e. the type will be resolved in the same way as with paths because we don’t have MIME.

I wouldn’t bother doing --entry-url as a standalone PR. There’s a good chance it gets blocked, and on its own it enables very little.

I would just do one PR that includes everything, unless that’s too big to handle in one piece (in which case we can use a build flag to hide things until they’re all in place, or land them on main but mark them as “don’t backport” to prevent releasing until they’re all ready).

Before we start, though, I think we should reach consensus on the open questions. I’ll open two issues:

  1. How extensionless files should be handled in this new ESM-first mode, including whether/how to support extensionless Wasm and/or other potential extensionless formats.
  2. What this new flag should be—--module, --type=module, --experimental-module or --experimental-type, etc.—especially considering what it would be when the overall Node default flips.

There are also questions about how to migrate the ecosystem (adding "type": "commonjs", etc.) and whether/when Node should make this new mode the default, but I would postpone those discussions until we ship the flag. It’ll be easier to reach consensus on those topics once people can see what the new mode looks like.

The other thing we need is a list of concrete use cases, and not solutions presented as use cases. As in, “I want to be able to launch Node with an URL entry point” isn’t a use case; why do you want to do that? Whereas something like “I want to write what are typically called shell scripts in ESM JavaScript, where I can have a single file anywhere on my disk that uses ESM syntax and it doesn’t need a nearby package.json file or a symlink or wrapper file in order to run as ESM” is a specific use case where it’s not possible today and the new mode enables it. Ideally this new mode would satisfy most if not all of the outstanding “can’t do it in ESM” use cases in the various issues you’ve referenced.

@GeoffreyBooth
Copy link
Member

however, today this would work pretty much anywhere with 8.30 or greater env

As I already wrote in #34049 (comment), busybox env (used by default e.g. on Alpine Linux) doesn’t support -S. The same for OpenBSD env and NetBSD env. This is a non-standard GNU option, i.e. not defined in POSIX or similar.

@jirutka Do you mind giving me some examples of where flags in shebang lines do and don’t work? With the most popular/prominent platforms included. I’m trying to write documentation along the lines of:

The flag can be used in shebang lines on platforms that support env -S, such as _, _, _; some popular platforms where this is not supported as of this writing are _, _, _.

In particular, we should include macOS and Windows (Windows Subsystem for Linux?) in this sentence, whichever list they fall into, as those two probably account for the majority of our users. For Linux, at least Debian and Alpine, as those are the two used in the official Docker images.

@ljharb
Copy link
Member

ljharb commented Sep 17, 2023

I'm pretty sure dash and sh don't support them, as I had to account for that in nvm.

@GeoffreyBooth
Copy link
Member

@LiviaMedeiros Is there a reason to keep this open? I think #49869 covers everything except the URL stuff, and #49975 is in progress for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. esm Issues and PRs related to the ECMAScript Modules implementation. lib / src Issues and PRs related to general changes in the lib or src directory. module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Launching main ESM module with search and hash Entry points specified as absolute URLs fail to load
10 participants