-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
esm: treat modules with no extension as .js #34177
base: main
Are you sure you want to change the base?
Conversation
Note this PR should also include a change to the docs / spec in the esm docs as well. This seems like a good compromise on the problem of extensionless binary files to me, although won't support binaries like //cc @nodejs/modules-active-members |
This was prevented (at the moment) because @bmeck and others wanted a way to preserve extensionless main entry points as Wasm, and potentially other formats in the future. In the case of Wasm, at least, that file type has a magic string we could detect. So there could be an algorithm like:
The downside of this algorithm is that any future format that we want to support as an extensionless file must include a detectable magic string. But if people are okay with that limitation, then I think this algorithm could work (and therefore this PR). |
There’s a third option, which would be a “type” value that only controlled extensionless files - either a “wasm” value, or a separate key entirely. It seems a little concerning to me to have “type” ever determine anything besides “what a .js file is”. |
@ljharb that approach was taken in #31388 which is what caused the revert to drop support for extension-less files. There was objection to adding more "type" field values resulting in #31415 . I will state that I am personally not comfortable with magic byte detection, some file types have trailing forms of detection so you could have a dual match e.g. files that are both valid Zip archives and JPEG as a classic example. In fact things like HTTP Signed Exchanges also support trailing detection. I think adding a new "type" would be preferable and seems a fine path. To me, "type" was never about just determining a single file extension type (including the empty extension for extension-less files). |
Right - what I'm saying is, This PR is totally viable - if what we want is to make |
@ljharb often WASM includes a JS wrapper around it for glue and things like WASM feature detection to have different builds for more modern WASM engines. I can imagine people wanting to use
You get into all sorts of situations so I think trying to enumerate them all would be difficult. |
Regardless of whatever horrible things we do with |
@devsnek yes, i agree! that's why i'm suggesting that we should support extensionless files in any module format, and that it perhaps deserves a distinct mechanism than "type", which is only for |
@ljharb I mean it should support it in the absence of configuration, using wasm bytecode magic |
@ljharb why can't we use |
@bmeck i agree "type"'s naming doesn't box it in - but it seems concerning to me to conflate two kinds of files in the same key (extensionless, and |
@bmeck, to try and summarize your concern here, it seems you are saying that And your concern with checking the wasm header for the magic string seems to be that not all possible binary formats in future that Node.js might want to support such as web bundles will support header detection? I was not aware that web bundles were undetectable from their binary format. Do you have further information here? |
I was intending to state quite the opposite. I think "type" is fine to overload; right now it is a string which does not have any direct naming related to a specific file extension (or that it even relates to a file extension / a singular one at that) in either the key or the values, and we could easily transform it into an object of some kind similar to how "exports" works. |
From https://mimesniff.spec.whatwg.org/#sniffing-a-mislabeled-binary-resource |
@bmeck to be clear, I think having it be an object is totally fine; my concern is overloading the string form to mean two things. |
@ljharb to me the string form mapping multiple values seems fine personally. I'm don't understand why the value in a string form should be limited only to mapping 1 extension but the object form could map many. If the value of the field can map many in 1 form, why not the other? |
To be fair, if the object form exists, then it's not as important to me what the string form does. |
Thanks @bmeck for the clarifications. To dig in a little more here...
Surely even zip files and web bundles have a well-defined detectable file header? And if we know we have one of these file types, then interpreting the internals appropriately seems like a fine approach to me (eg using an index file mechanism, where that index file can in turn be sniffed). In all of the above sniffing continues to work out fine, even in these hypothetical future use cases IMO. I still don't see where anything might break down here around header sniffing.
Binary files have standard headers. Sniffing works.
I really do not understand this argument. Running |
Yes, but that "header" is not guaranteed to be at the start of the file (E.G. in the case of things like https://en.wikipedia.org/wiki/Self-extracting_archive) and sometimes is even towards the end of the archive format (E.G. Zip archives).
I do not know what you mean "work out fine".
It does something, qualifying that it does something isn't really a note about if it is a good choice or not. In the current implementation, files are always associated with meta-data somehow and not content. If we choose sniffing, and only sniffing some file extensions we are changing quite a few designs in ways that are not simply about "working". We now need to have all the relevant content body parts to perform sniffing rather than just URL + metadata. I think having sniffing only for some files also complicates the issue, instead of asking about "what is the current metadata, and what is url" it becomes "what is the current metadata, what urls do sniffing, what is the url, and if the url is sniffed what is the relevant body for the url". This all seems to be much more complicated than just stating that you can have the metadata define what the URL pattern is. We already do this for file extensions. We also would not have a good way to state for
The idea put forth that caused that callout is a threat of privilege escalation, a similar escalation concern is put forth as the driving factor of https://github.com/tc39/proposal-import-conditions , and my concern matches that. The reliance on sniffing leads towards a path that could become a collision where a file is valid for multiple sniffing patterns. Overall, I remain steadfast that we should not do sniffing. It seems to be much more complicated for questionable gain versus alternatives:
Both of those alternatives satisfy the use case of this PR without causing either a new dimension required to analyze what format a resource is in and without concerns of collision. We explicitly knew that |
Alternatively we could look at something like X-Content-Type-Options and defaulting it to a |
I think that using the example of Node.js executing a self-extracting archive in order to argue against a feature which has historically been supported is quite a low-leverage argument personally.
We are talking about file paths here specifically. Nothing about this format detection applies to URLs.
You can only have a privilege of escalation when you start from a lower level privilege.
I am strongly against both alternatives unfortunately. |
It is merely an example of how archives are not necessarily read from start of file. This kind of stuff about archives being trailing rather than leading in the body of a file shows up all over with security concerns and can be seen when you try to upload archives to various email providers that they reject them. I'm not convinced that mime sniffing is a good idea as well considering that similar methods are not used by browser and/or even operating systems (in totality, some detection does occur E.G. https://en.wikipedia.org/wiki/File_format#Magic_number but limited even then regarding things like execve )
I think we should have some solution that allows resources to be generated for non-file formats.
WASM touts that it is designed to execute with limited and sandboxed privileges by only using injected capabilities unlike JS, that alone appears to make there be different levels of privilege.
Are you also against the Alternatively we could allow files without extensions to be treated as ESM in |
@guybedford can you elaborate on why you're strongly against both alternatives? (i'm mildly against a "type" string also applying to extensionless, ftr, so i'm in particular asking about the map) |
I'm not saying sniffing should be used in all cases - we are specifically talking about extensionless files and the "Node.js main runpath" which is the Let's define somewhat the security constraints though. There are two cases This was why my original solution to this problem was always to explicitly special case
MIME types for http/https URLs seems the sensible way to do this. This can be implemented with loaders and the getFormat hook. I would like to focus the discussion here to file URLs specifically which do use file extensions as the source of truth.
It just feels a little bit unfair to expect a full discussion over a comprehensive WASM security model in order to consider progress on this use case.
I could get behind this. It still seems to leave behind the use case of
@ljharb |
People distribute wasm cli binaries without an extension. It can be the only type we sniff by default, and nothing else is allowed to lack an extension, but it really needs to be there. |
There is a 3rd case with worker_threads spawning
I am not convinced that this should only apply to files. The extensions is not even the source of truth given package types altering formats for a given extension.
I am just pointing out the claims of different levels of trust as a counterpoint to the claim that loading everything is of the same authority.
This would make me uncomfortable as I do not want sniffing per multiple reasons listed above. If the goal is to keep either solution as the future I do not think we should take that path and we must resolve not to pursue either path in the near or even mid term future unless something changes.
We currently don't even expose the package type, so this doesn't seem to match up with the state of determining formats.
what are you seeking to see / what work can be done? It seems if formats do not have well defined semantics with our current internal database adding sniffing also would make things more complex semantically.
This is an already existing problem with our loader designs. I do not see how exposing the database and/or allowing alternative databases changes the existing issue with loaders returning non-builtin formats.
Node, likely referencing WHATWG and/or IANA. I don't see how why this is different from the fact that
I do not understand this sentence. MIME is not actually associated with any 1-1 mapping and only lists common file extensions. If this is about the ambiguity of looking up a format for a given extension, we already do not expose this capability and which is already ambiguous if you do not read package.json files.
This already is a problem with |
My counter arguments are simply to try and shift this in the direction of debating solutions over debating problems. We already support extensionless mains today for CommonJS - Unfortunately I don't see any proposal which offers a valid counter here. |
To be very clear - extensionless bin files should not have to rely on out-of-band metadata since they are "file-system-portable" unlike packages which comprise a number of files and hence can contain their own metadata in the package.json. We either solve this use case for Node.js or we do not. |
What makes an argument valid? I don't understand what is being sought. I think all 5 mentioned solutions all allow
Extension-less files are not currently portable for a variety of reasons. If we do use |
Would you object to allowing the —input-type flag to apply to the main
entry?
…On Thu, Jul 16, 2020 at 11:37 Bradley Farias ***@***.***> wrote:
Unfortunately I don't see any proposal which offers a valid counter here.
What makes an argument valid? I don't understand what is being sought. I
think all 5 mentioned solutions all allow node x to evaluate under ESM,
some don't allow x to be WASM though.
To be very clear - extensionless bin files should not have to rely on
out-of-band metadata since they are "file-system-portable" unlike packages
which comprise a number of files and hence can contain their own metadata
in the package.json.
Extension-less files are not currently portable for a variety of reasons.
If we do use "type" to alter how they are seen they would also remain
non-universally portable due to needing to know the package scope in which
they reside. Regardless of which path we move forward with, package.json
metadata is seeking to be used for "type".
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#34177 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAESFSQE27VDX4L6QWBA5KDR35CFHANCNFSM4OPJQTGA>
.
|
@guybedford I would be uncomfortable with that as it means CLI determines the format of something rather than it being static; with something like #34177 (comment) , even though there is a flag it could be dynamically checked and doesn't apply to a single exception point within the system but rather configures the system as a whole. That has fewer overall possible permutations for a graph's interpretation than a single node being treated specially. |
I still don’t follow the argument against sniffing extensionless wasm. If
we could just go with that we’d have a static solution.
But even then we don’t even need consensus on that now to ship this PR do
we?
…On Thu, Jul 16, 2020 at 12:55 Bradley Farias ***@***.***> wrote:
@guybedford <https://github.com/guybedford> I would be uncomfortable with
that as it means CLI determines the format of something rather than it
being static; with something like #34177 (comment)
<#34177 (comment)> , even
though there is a flag it could be dynamically checked and doesn't apply to
a single exception point within the system but rather configures the system
as a whole. That has fewer overall possible permutations for a graph's
interpretation than a single node being treated specially.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#34177 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAESFSR6B763PRXBWVPBEL3R35LMFANCNFSM4OPJQTGA>
.
|
@guybedford we dropped support for extension-less imported files due to WASM it seems like this PR shouldn't land unless we come to some agreement about a directions for WASM. I've stated my concerns with sniffing, but remain unclear likewise about exposing a MIME database since we already use one internally and loaders already suffer from the arguments about standardizing/coordinating formats. If we wanted to try and state that standardization and coordination of formats is a necessity for our loaders, we should move towards altering our loader API; I don't see the same pushback about our existing APIs that I'm seeing in this thread regarding lack of those guarantees. |
@bmeck my primary concern with a MIME database is that it doesn't solve the issue of being able to execute a file directly. The concept of a universal executor is that you can apply it to a binary. This is what operating systems do. If we require that files must always be located alongside appropriate metadata that places a direct constraint on the possibilities of the future of Node.js in not supporting a simple |
I still continue to not understand the arguments against sniffing. The two cases which you claim are possible escalations are:
In both cases, JS has the ability to import fs functions, and has full permissions to everything in Node.js. WASI does not have any permissions over and above these permissions. Thus there is no escalation. Where am I wrong? |
@devsnek do you have an example at hand? |
This is not what other runtimes do and I remain concerned about type confusion in this space. Node is not the same as an operating system. I agree that using things like Java has Python has zipapp and to drop the Ruby has Traveling Ruby as the most common thing these days. Also the executing archive approach. Node has pkg and the like as well generating bundles. I even gave a talk about this idea years ago for node. To me, it appears this need to have a runtime able to execute a standalone is solved either by making a full binary or providing metadata. A big thing for me here is that bundling methods ensure that the runtime is compatible with the code since it was bundled at the time of build. Even if Node were to be considered a "universal executor" it wouldn't necessarily be safe to update node if it breaks these non-runtime bearing executables. I think the idea that we are trying to solve a problem of single file executables is false and we do need to take into account existing solutions that other ecosystems have and are not trying to change and that we don't even have a way to do this due to using metadata anyway (see how
I would state the fact that On that Note, I think that moving to need not just the metadata of the CJS/ESM split but also to actually read the file is a much larger concern for me than you. By using file bodies for format determination, static tools must now analyze all those files' bodies and we still don't have a clean way to deal with the fact that we also need the metadata still.
The problem is you are setting up exact claims and I have a general unease given that I do see things exploited by mime sniffing regarding executables. My claims are that this is not as simple as the first file ever loaded. You could I do believe that the fact that we have forced ourselves to rely on metadata isn't necessarily terrible which it appears to have some disagreement here. However, a lot of things simply are not installed as a single file and are generally in a directory with a I truly do think if the problem is about executing single files in a configuration less manner it likely needs to be a different issue and not part of this. |
@targos I know https://github.com/wasmerio/wasmer will execute files without extensions as WASM but do not see it doing any sniffing, it just runs and errors on wrong file formats. |
@targos you can find some stuff on wapm.io. the idea is that you throw the binaries in your PATH like any other binary, and register them to run with wasmtime/lucet/etc using binfmt_misc. |
@devsnek that works for linux but not for Windows or OSX though. |
@bmeck there is a binfmt compat thing people use on macos (i used to use it but i don't own a mac anymore), i don't know about windows. |
I'm trying to think of a relevant parallel to
How do you define type confusion? Sniffing not being able to determine the format? I think the algorithm proposed above is simple enough: if extensionless, sniff for known magic strings and run if detected as a known type; else run as JavaScript, either as CommonJS or as ESM per
Which formats? And I guess the question is, what's the alternative? I think I'd rather have sniffing for formats that are detectable over not sniffing at all, or requiring a |
Python has In general most VM based runtimes have some form of compiled vs interpreted format.
Well... we can declare sniffing works anyway we want, even if it doesn't support every file of a specific type. E.G. https://drive.google.com/file/d/15gY2pB_kmlI8SXFV0tPjtnVaIldmluWQ/view?usp=sharing is both a valid JPEG and a valid ZIP file. I uploaded this as a
If we sniff WASM we can state we have WASM regardless of the package scope. However, we don't use sniffing for example to determine ESM vs CJS. If we sniff WASM it allows for WASM and either CJS or ESM in a given situation, but you cannot have CJS and ESM in the same. If we did things like allow configuring default package type in the CLI ( rejected in #32394 ) we could still serve both, but we are once again using metadata.
I think we should re-evaluate what we are trying to solve first of all. Having node applications able to be distributed as single files can be satisfied without assuming that an extension-less file that is sniffed to determine format. See the examples above.
I simply don't think sniffing for WASM is a good idea regardless. I also don't think sniffing for WASM is going to be a use case that really falls into the single file as a full binary situation usually. For some cases a WASM file will be the entire application but there are a lot of cases where it won't match up since many application uses multiple files.
I'd agree! I just don't think a single file works for the majority of actual use cases to distribute a binary! Usually you want an application in a bundle of some kind, such as a zip archive approach used by all those places I linked above.
This is the most common form of distributing Node applications, most node applications have multiple files which means they are using a folder somehow. In fact, if the application use
This is what NPM does on windows actually instead of a symlink (look for .cmd files). |
Yes, but all of those examples are a runtime taking as input the text, compiled or compressed version of its own source code (like plaintext Java source, compiled Java, or zipped Java). I'm asking if there's an example of a runtime taking as input different types of sources, like if the I think the original ask of this PR is valid: since we allow Node to execute extensionless JavaScript as CommonJS, we should allow the same as ESM via |
is, i think, unarguably valid! the
part i don't think is quite as straightforward or obvious. |
I don't think we should land this feature until plans for WASM or the use case are more clarified. This smells like a mistake to me for a quick fix rather than understanding some things that don't match up in the arguments. For example, the claim is that:
However, for some, the intent of such support is to allow so called configuration-less executables / single file applications; to others, the intent is merely to support files in a package context (thus not single file by nature) to be extension-less and have the ability to load first class module formats. These are 2 very different use cases being conflated and I think this is starting to show as a main point of contention. Single file application by nature is about having a file without a
In fact, people are stating that meta-data is undesirable for such single file applications. This conflict needs to be teased out as like @ljharb I am not convinced of this argument style that |
with this change, extensionless modules loaded with esm loader will be
loaded with the same behavior as .js files:
"type": "module"
is specified, load as an esm moduleFixes: #34049
Fixes: #33226
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes