Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BREAKING: deprecate loading files with unknown or missing extensions #15365

Closed
wants to merge 4 commits into from

Conversation

bmeck
Copy link
Member

@bmeck bmeck commented Sep 12, 2017

This prevents treating files as CJS when they are not specified as
.js explicitly. Package managers create links without the extension
already so the use of "bin" still works. Also, node continues to do
path searching and can find a .js file with the same location via
command line.

This fixes some problems with ESM loading from CLI arguments since we cannot reliably know if a file is supposed to be loaded via CJS normally. This will also be desired once loader hooks come around so that they can capture all modules loaded via CLI arguments.

Right now we have a very simplistic check for if a main entry point should be loaded as ESM. However, this still has collisions and doesn't have a path forward for loader hooks wishing to change how main entry points/preloading via --require works.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

module

This prevents treating files as CJS when they are not specified as
.js explicitly. Package managers create links without the extension
already so the use of "bin" still works. Also, node continues to do
path searching and can find a .js file with the same location via
command line.
@nodejs-github-bot nodejs-github-bot added the module Issues and PRs related to the module subsystem. label Sep 12, 2017
Copy link
Contributor

@jkrems jkrems left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -0.5 on this change. I know of quite a few cases where we actively use this. E.g. it's one of the few ways to "hide" files from aggressive auto-loading by something like mocha.

lib/module.js Outdated
if (!Module._extensions[extension]) extension = '.js';
var extension = path.extname(filename);
if (!extension || !Module._extensions[extension]) {
internalUtil.deprecate(() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is re-wrapping a fresh function each time, will this log a deprecation warning on each require?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh gosh, yes we should move that out.

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

@jkrems Is there an example I can look at / does mocha have some ignore file option?

@jkrems
Copy link
Contributor

jkrems commented Sep 12, 2017

Example use: https://github.com/groupon/gofer/blob/master/test/_remote-server

There's definitely cleaner solutions for this (e.g. just putting fixtures and other potentially side-effect files into different directories). But afaik mocha force-includes any .js files. You can add more extensions to its search but you can't exclude any. It's definitely a work-around. Just pointing out that there's ecosystem impact of this deprecation and I'm not fully convinced its worth the churn.

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

@jkrems this is mostly to create a path for userland loaders to affect CLI arguments for ESM. If you can think of a way without breakage I'd be fine taking a different approach.

@Fishrock123 Fishrock123 added semver-major PRs that contain breaking changes and should be released in the next major version. tsc-review labels Sep 12, 2017
@jkrems
Copy link
Contributor

jkrems commented Sep 12, 2017

Would it be acceptable to exclude the "no extension" case from the deprecation? It's the thing that I'm aware of for "chmod +x" files and these kinds of test helpers. Afaik made-up file extensions loaded as vanilla JS are less common in the current ecosystem and at the same time maybe more interesting for custom loaders..?

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

@jkrems without the extension we could fall back to something like nodejs/node-eps#62

@Fishrock123
Copy link
Contributor

Fishrock123 commented Sep 12, 2017

I like this change but we'll need pretty good data that this won't be too severe.

@nodejs/tsc PTAL

Kicking off a citgm but don't expect to find anything related in those modules: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/980/

Edit: TBQH, personally I'm not sure if I've ever seen direct non-js-as-js loading myself in the wild.

@jkrems
Copy link
Contributor

jkrems commented Sep 12, 2017

@bmeck But that would only ever affect the entry module? Since this is a new feature, it feels cleaner to just say "entry ES6 modules need to have a valid file extension". Otherwise we end up in the awkward situation where a file can only ever be loaded iff it's the entry module.

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

@jkrems correct, I have -1 to allowing require('./no-extension') to continue to work. This causes all sorts of future compat problems (WASM, BinaryAST, WebPackage, etc.)

@jkrems
Copy link
Contributor

jkrems commented Sep 12, 2017

Is anything else using "no extension at all"? If not, what exactly is the issue with treating "no extension" the same as "explicit .js", always?

@mcollina
Copy link
Member

I am a bit lost in this proposal.

Would I have to change:

require('./lib/mymodule')

into:

require('./lib/mymodule.js')

Or it is just:

$ node myentrypoint

Into

$ node myentrypoint.js

I fear the breakage would be too high even if it's just the latter case.

@Fishrock123
Copy link
Contributor

To be clear, does an extensionless file with a shebang still work, or would that require --ext=.js in the shebang?

@Fishrock123
Copy link
Contributor

Fishrock123 commented Sep 12, 2017

Ok so, from twitter it appears this would break require()-ing in the following circumstance:

require('./my-module') where './my-module is is a .js file instead of a folder with an index.js. I know that myself and others have used this pattern quite extensively and the breakage would be widespread.

Edit: looks like I was mistaken.

@retrohacker
Copy link

retrohacker commented Sep 12, 2017

Edit by @Fishrock123 - duplicate comment I understand the motivation here. I'm neither 👍 or 👎 (not that my vote counts 😛) but in most of my work, I use `require('./thingy')` as shorthand for `thingy.js`, it keeps things consistent and allows me to break a single file out into a folder without having to run a `sed`. I suspect a lot of people do the same.

I also don't understand the effect this has on shebangs, none of my Node.js cli tools have a .js extension.

What is the advantage of this over, say, always assuming extensionless files are .js for backwards compatibility?

lib/module.js Outdated

const DEPRECATED_EXTENSION_FILLING = internalUtil.deprecate(() => {
return '.js';
}, 'Loading files without an extension or unknown extension is deprecated', 'DEP0077');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line has more than 80 characters.

@@ -684,6 +684,13 @@ difference is that `querystring.parse()` does url encoding:
{ '%E5%A5%BD': '1' }
```

<a id="DEP0077"></a>
### DEP0077: Loading files without a known file extension
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numbers should be assigned only during the commit landing process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We landed one with XX already, unsure how to ensure that gets done? Who assigns these/when?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmeck Whoever is landing the commit, they will assign the latest available number usually.


Type: Runtime

Files without file extensions, or unknown file extensions will not be loaded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think oxford comma will not be applicable here.

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

@mcollina
@Fishrock123
@retrohacker

This change is for files as they are stored on disk/the result after resolution; this change is not about file resolution algorithms/the string passed to module lookup.

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

@jkrems

Is anything else using "no extension at all"? If not, what exactly is the issue with treating "no extension" the same as "explicit .js", always?

You get collisions still, imagine bin/foo with the following:

#!/usr/bin/env node --entry-extension=mjs
// ...

Using it via node bin/foo works fine.
Using it via import('bin/foo')/require('bin/foo') doesn't since it would act as CJS.

@mcollina
Copy link
Member

Can you make an example of what will break?

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

@mcollina

echo 'console.log("I am CJS")' > ./has-no-extension
node ./has-no-extension

@thlorenz
Copy link
Contributor

thlorenz commented Sep 12, 2017

@bmeck as indicated in our twitter discussion I'm not a great fan of breaking the above. (node ./foo)

Let's say I have a Makefile with the following:

build:
  ./build-script

At this point the build-script might be a bash script with #!/usr/bin/env bash indicating how to run it.
In the future I may want to rewrite it in JavaScript and all I'd have to do is change the first line to #!/usr/bin/env node. If I keep the filename the same the Makefile doesn't need to know about this change.
Basically I think of the tool used to execute the script as an implementation detail.

Now if I have to change the extension this implementation detail becomes part of its public interface (the filename).
As a result I'll have to find all the other scripts, Makefiles, etc. that execute it and change the filename there as well.

Therefore this change stands contrary to Node.js becoming a very powerful scripting tool which is unfortunate. Especially with *Sync methods and the async await feature more people start using Node.js this way since non-nested scripts became much easier to write.

I'd argue that we should agree on a default on how scripts directly executed with Node.js should be treated and allowing to change that via a command line flag.

I like the idea you stated above:

#!/usr/bin/env node --entry-extension=mjs

and don't see the fact that they can't be imported/loaded as a problem.

After all these are CLI scripts and as such will have side effects and therefore shouldn't be loaded in the first place. So CLI scripts being treated as CJS by default is fine.
People should put all the script code they want to load from elsewhere inside a separate file and load+exercise that from the CLI script instead.

This also solves the case of people wanting wo use mjs syntax in their script (without supplying a flag) as all they'd have to do is write a simple CJS wrapper script and all it does is load the actual script.mjs.

@thlorenz
Copy link
Contributor

thlorenz commented Sep 12, 2017

Yes, I'm advocating against having to write wrapper scripts for the ones that currently don't have to be wrapped.
With your current proposal if I don't want to name my file .mjs|.js then I'm forced to write a wrapper script + I'll have to do so for all already existing modules without an extension (or rename them).

On the other hand with my proposal if people must use MJS syntax directly in their script file then they have two options:

  • a) use the command line flag
  • b) write a wrapper script

I agree with you that none of these options is perfect, but am just trying to suggest one that affects fewer people/cases negatively. Obviously there is no hard data on that which is why deciding on the better solution will have to be subjective. I'm just providing my opinion here.

@jkrems
Copy link
Contributor

jkrems commented Sep 12, 2017

#!/usr/bin/env node --entry-extension=mjs

Worth nothing that passing additional options into the binary isn't portable. So I don't think that approach is viable in general (even though it would work for some developer scripts).

@thlorenz
Copy link
Contributor

@jkrems in those cases people could resort to option b (write a wrapper script) and only if they must use .mjs features directly in their script.

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

We haven't yet brought up the fact that shebangs don't work on windows, and would mean all people support windows in these cases should use wrappers.

@thlorenz
Copy link
Contributor

Good point, again we'd have to think about which option will cause the least pain here.
So all people who will feel the pain could be either of the two below:

  • A) all people == all people who want to use .mjs features directly in their script (what I'm proposing)
  • B) all people == all people that currently have any scripts without an extension (the alternative)

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

This also deals with cases where you have other entry point types, like https://github.com/WICG/webpackage or https://github.com/syg/ecmascript-binary-ast/ , keeping these as CJS is just making the problem continue on into the future when several situations already can't rely on the shebang style of doing this / defaulting to CJS.

@thlorenz
Copy link
Contributor

The links you posted both are web related.
Obviously we want to support web features/conventions into the future as much as possible.
However we need to be careful not to sacrifice the versatility of Node.js as a tool (in this case as a great way to author CLI scripts) in the process.

Finding the compromise will not be easy.
Since I spend most my time in the terminal and see Node.js's great potential there I naturally will try ensure the solution doesn't sacrifice too much on that end.

Hopefully others will add their opinion so we get a better picture of what the priorities should be.

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

@thlorenz

BinaryAST is a TC39 JavaScript language proposal for all environments: https://github.com/tc39/proposals

As per WebPackage, it is a suitable format for offline distribution/environments and that is one of it's primary use cases even if the name is somewhat misleading in that aspect.

@thlorenz
Copy link
Contributor

BinaryAST is a TC39 JavaScript language proposal for all environments

I understand that, but if I want to use those don't I have to name my module accordingly anyways, i.e. with .mjs ext?

So what's the problem with just running modules without any extension as if they had a .js ext and treat them as CJS? Especially if the alternative seems to be not to run them at all?

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

@thlorenz mainly that it prioritizes one evaluation mode over others; it treats one as the first class module system even if others are more suitable for both portability and performance. For hooks, this causes an issue as well since all unknowns are treated as CJS meaning that hooks seeking to be supersets of Node should not use any other evaluation mode.

I understand that you like how it works currently, but I argue that the way it works currently is going to continue causing problems as types of evaluation grow over time.

@thlorenz
Copy link
Contributor

thlorenz commented Sep 12, 2017

I like how it works currently since that is how any other CLI tool works.

For example neither bash, perl, python nor ruby require you to specify a specific extension in order to execute a file. Right now Node.js doesn't either.

I feel like regressing away from that is a huge mistake, so we'll have to agree to disagree.
I hope others may sway the decision enough so a good compromise can be made (whatever that may be).

@mcollina
Copy link
Member

I'm 👎 on breaking the current behavior.

Another option might be to try to parse it in one way (cjs) and then parse it as the other (mjs) if the first fail. This would not be needed if there is an extensions of course. As this is specific for the entrypoint, it will not effect loading so much, and I think it will be a step forward.

@bmeck
Copy link
Member Author

bmeck commented Sep 12, 2017

@mcollina there are syntax collisions so I would avoid doing that. I'd rather change nothing than make it more ambiguous.

@cxreg
Copy link
Contributor

cxreg commented Sep 12, 2017

As far as other formats are concerned, both WASM and WebPackage have known magic numbers and should be unambiguously identifiable. I couldn't find one for BinaryAST but it looks like it's in early draft form.

I think the only significant ambiguity here is between CJS and ESM

@cxreg
Copy link
Contributor

cxreg commented Sep 12, 2017

there are syntax collisions so I would avoid doing that

Maybe if you are prone to writing code that trips up the parser, you should just use the command line flag or an environment variable that specifies the intended format

@medikoo
Copy link

medikoo commented Sep 13, 2017

I'm 👎 It'll break requires of binary files (ones intentionally left without as extension as e.g. https://github.com/mochajs/mocha/blob/master/bin/mocha ), and there are loads of them in node.js ecosystem

e.g. it aims at breaking:

$ node bin/mocha
$ bin/mocha

And in module requires to binary files, which are very rare, but happen for some reason, as e.g. here: https://github.com/serverless/serverless/blob/master/bin/test#L22

I would also put emphasis on what @thlorenz said

neither bash, perl, python nor ruby require you to specify a specific extension in order to execute a file. Right now Node.js doesn't either.

This proposal seems a consequence of agreeing on .mjs, maybe that should be revisited?

@targos
Copy link
Member

targos commented Sep 13, 2017

Package managers create links without the extension already so the use of "bin" still works

It still works only if the file that the links points to has a .js extension, right?
There are potentially a lot of modules where the actual file doesn't have an extension:

$ ls -l node_modules/.bin
acorn -> ../acorn/bin/acorn
babel -> ../babel-cli/bin/babel.js
babel-doctor -> ../babel-cli/bin/babel-doctor.js
babel-external-helpers -> ../babel-cli/bin/babel-external-helpers.js
babel-node -> ../babel-cli/bin/babel-node.js
babylon -> ../babylon/bin/babylon.js
errno -> ../errno/cli.js
escodegen -> ../escodegen/bin/escodegen.js
esgenerate -> ../escodegen/bin/esgenerate.js
esparse -> ../esprima/bin/esparse.js
esvalidate -> ../esprima/bin/esvalidate.js
handlebars -> ../handlebars/bin/handlebars
he -> ../he/bin/he
jest -> ../jest/bin/jest.js
jest-runtime -> ../jest-runtime/bin/jest-runtime.js
jsesc -> ../jsesc/bin/jsesc
json5 -> ../json5/lib/cli.js
js-yaml -> ../js-yaml/bin/js-yaml.js
loose-envify -> ../loose-envify/cli.js
mkdirp -> ../mkdirp/bin/cmd.js
mocha -> ../mocha/bin/mocha
_mocha -> ../mocha/bin/_mocha
rimraf -> ../rimraf/bin.js
sane -> ../sane/src/cli.js
semver -> ../semver/bin/semver
sshpk-conv -> ../sshpk/bin/sshpk-conv
sshpk-sign -> ../sshpk/bin/sshpk-sign
sshpk-verify -> ../sshpk/bin/sshpk-verify
uglifyjs -> ../uglify-js/bin/uglifyjs
user-home -> ../user-home/cli.js
uuid -> ../uuid/bin/uuid
which -> ../which/bin/which

@bmeck
Copy link
Member Author

bmeck commented Sep 13, 2017

@targos correct, the target needs .js. This is a deprecation intended to cause a transition, not a removal until the ecosystem seems safe.

@cxreg

As far as other formats are concerned, both WASM and WebPackage have known magic numbers and should be unambiguously identifiable. I couldn't find one for BinaryAST but it looks like it's in early draft form.

Correct, they are not ambiguous source text. However, that would mean node would have multiple ways to determine what a file is + still requires the CLI flag for guarding type of source text / disambiguation. I would not recommend using parsing to determine the type of file in those cases since the flag is still needed for various reasons. Adding more ways to select evaluation mode adds complexity and things to learn. If these were rules added to https://mimesniff.spec.whatwg.org/ sniffing rules I would say that it is fine to do parsing to determine evaluation mode, but I don't think that is planned / web seems to be moving away from sniffing on content.

Maybe if you are prone to writing code that trips up the parser, you should just use the command line flag or an environment variable that specifies the intended format

Some people might not know they wrote something ambiguous. I do quick testing all the time, and with some cases you get different results:

// ./no-extension
x = 1;
y = {};
console.log(x + y);

@medikoo the goal here is a non-breaking deprecation.

This proposal seems a consequence of agreeing on .mjs, maybe that should be revisited?

This would still be subject to discussion with .wasm / .webpackage / .binaryast like mentioned above. It is not solely affected by .mjs.

@bmeck
Copy link
Member Author

bmeck commented Sep 13, 2017

To all: The breaking bit here is the logged message, not that code stops running.

@medikoo
Copy link

medikoo commented Sep 13, 2017

@medikoo the goal here is a non-breaking deprecation.

Deprecations are never breaking. It's the change they announce (scheduled to be introduced in a future) that are breaking, and this what we're discussing here (I believe).

If there's no intention to deliver announced change, then there's no point for deprecation message.

@addaleax
Copy link
Member

Just so everybody is on the same page, Node does treat runtime-deprecations (i.e. printing a message) as semver-major:

  1. When run with --throw-deprecation, they actually do turn into exceptions
  2. The changed stderr output is breaking on its own, and in the past has actually broken test suites, and
  3. Its semver-major-ness ensures that people have sufficient time to update their code

I don’t see any reason to make this an exception to these rules.

@evanlucas
Copy link
Contributor

I think I'm -1 on this change. Yes, there are ecosystem breakages, but I'm more concerned with the breakages that we can't quantify.

Disclaimer: I have a ton of scripts that rely on this specific behavior

@jasnell
Copy link
Member

jasnell commented Sep 13, 2017

Runtime Deprecations can certainly be breaking because they alter the stderr output of the application. We've run into this many times in the past which is why we treat runtime deprecations as semver-major events.

@bmeck ... in this particular case, I'm leaning strongly against the proposed change here. I would rather continue to treat cases like node foo as if foo is CJS by default, and instead of requiring specification of the extension, I'd rather have something like a command-line flag ... e.g. node -esm foo in order to treat extensionless names as defaulting to .mjs.

@bmeck
Copy link
Member Author

bmeck commented Sep 13, 2017

I think we have significant objections; I'll close for now but may re-open as number of evaluation modes increases over time.

@bmeck bmeck closed this Sep 13, 2017
@jasnell
Copy link
Member

jasnell commented Sep 13, 2017

@bmeck ... thank you for working on this stuff, btw

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module Issues and PRs related to the module subsystem. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.