-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
esm: The getPackageType
utility function is not exposed
#30514
Comments
@nodejs/modules-active-members |
I'm against exposing machinery like this. We are planning to expose functions to let you call up to the previous loader. I think this is also a really good example of why we shouldn't be adding new fields to package.json but oh well |
Package type does not imply the module format, but rather the module format database, so note that the usage example is wrong as it doesn't support eg The source of truth to follow for determining the module format of a file is the specification for the resolver (https://nodejs.org/dist/latest-v13.x/docs/api/esm.html#esm_resolver_algorithm), under the function Exposing or having userland implementations of the spec would be very useful indeed. If Node core were to expose something here it should likely be a JS |
Does Node already do the |
I think exposing something is likely useful here since it is possible to sniff / can be looked up on disk and there isn't a clear workaround to make it easy to deal with. The main contention here for myself is what to expose; as @guybedford points out, just exposing the |
To give a concrete use case, I’m considering creating a CoffeeScript loader. So for a given So for me, I’d want to know either what Node would determine the package scope type to be if my file were a |
I feel like this points to an inherent problem that tooling needs to know way too much about the specific details of node's esm resolution system. In core we've been really careful to avoid adding new stuff like that to the cjs loader because of the burden on the ecosystem to support it and the node maintainers to keep it working for the ecosystem. even if we expose all the needed APIs, that doesn't guarantee tooling uses them, or uses them at the correct times (matching the behaviour of bugs is important too). |
@GeoffreyBooth that sounds like you need to know the format of a file, not the package type. |
What do you mean? The format is CoffeeScript.
Well, why does the loader API need to be given the type? Ideally that would be optional, and if it’s unspecified the regular Node logic would apply; and unknown types (e.g. |
As you stated, you want to know what format a |
Here is an implementation for getPackageType: function getPackageType (checkPath) {
const rootSeparatorIndex = checkPath.indexOf(path.sep);
let separatorIndex;
while (
(separatorIndex = checkPath.lastIndexOf(path.sep)) > rootSeparatorIndex
) {
checkPath = checkPath.slice(0, separatorIndex);
if (checkPath.endsWith(path.sep + 'node_modules'))
return 'commonjs';
try {
const pjson = JSON.parse(fs.readFileSync(checkPath));
return pjson.type || 'commonjs';
}
catch (e) {
if (e.code === 'ENOENT') continue;
throw e;
}
}
return 'commonjs';
} This follows the specification (which is fully provided in the Node.js documentation exactly for this reason). Because it is specified and relatively stable, there is nothing wrong with copy-pasting such a snippet. DRY does not apply when there are lots of decisions like API / async or async / caching concerns. Copy paste is good for this stuff. Or a userland packages with this code would be useful. Please be useful. |
Thanks @guybedford, that certainly solves the need; though if this needs to happen for many/most loaders we might want to consider providing this as a utility, or reusing Node’s lookup rather than having this happen twice for each file. This also makes me think of nodejs/modules#436; I wasn’t aware we treated a |
folders within node_modules aren't guaranteed to have package.json, that's just a side effect of package managers. it's really important to highlight that prior to our esm loader, package.json didn't infer a "boundary" to node. |
/to @guybedford
The goal of the example provided was to support use-cases like @GeoffreyBooth's, which don't use file extensions to determine module format; sorry, I should've clarified.
Although the specification in the documentation may not need /to @GeoffreyBooth
I will provide an example of a project directory structure that currently uses this pattern. Since it's more relevant to the other issue I opened, I'll put it in there.
Ditto. I'm guessing that it would mean that there would need to be a /to @devsnek
There are actually two boundaries now: |
to the comment above, @GeoffreyBooth 's use is to map to what format the |
/to @bmeck
Going off of what he wrote in the following excerpt.
A file extension-based solution is not viable because CoffeeScript files use the |
@DerekNonGeneric I don't understand the reasoning you give.
This is simply not true; any exposure of the format database mapping would suffice and likely lead to clearer more robust code, e.g. getFileFormatsForExtensions(directoryOfFileInQuestion).get('.js'); This is clearer in wanting to match |
/to @bmeck
If CoffeeScript files must use the I already proposed
Is this module format database hypothetical or does it actually exist? I would like to know more. It would be nice if it could expose an API that would enable the querying of hypothetical file URL strings. Like: Hey database,
if this extensionless file URL string had a `.js` extension,
what module context would you say it's in? Similarly: Hey database,
if this `.coffee` file URL string had a `.js` extension,
what module format would you say it is? |
They don't need to use an extension or even put something on disk, they just need loaders to map the location to a format (the same format as
We should probably not encourage multi
the databases already exist within our runtime, but some design considerations about them would need to be discussed regarding "custom" mappings.
I'm not sure I understand the "hypothetical URL strings" bit. You would just query the database relative to a location. This is what node does already with package types. |
I think that’s just “Hey Node, if path And yes, if there was some API that Node offered that could answer a question like this, in a way that I knew what string to return for |
I'm not sure I understand the distinction between syntax and format this makes. CommonJS and ES modules are different in terms of syntax. They have a lot of overlap but in the end their grammar (syntax) is different. There's no such thing as a return statement in an ES module body just like there's no such thing as an export statement in CommonJS. Some tools have parsers that accept a cover grammar over both but the end result is that they accept input that's neither. E.g. P.S.: Babel seems to have fixed some of the issues it had in the past. It doesn't support CommonJS but it supports script which is mostly a clean subset of CommonJS and module. Which means it doesn't allow mixing HTML comments and |
"parsed happily" isn't the same as "doesn't have an error that simply isn't reported". A parser should be error tolerant when used in editor scenarios... |
@weswigham TypeScript does have great reasons for the current behavior - there's a lot of code that assumes |
At risk of being pedantic, here's further clarification if it's still needed. Correction:
We cannot use file extensions to store metadata used to determine |
I don't understand this at all. We have counter examples above. I don't understand what |
@bmeck for TS, at least, since |
I disagree; even with the multitude of file extensions being a possible UX that TS chooses to take, it would still ultimately be trying to understand the formats within the package scope. Nothing we have discussed prevents customization of these and there have been multiple discussions on allowing augmentation to be explicit and in package.json such that tools can coordinate properly. If TS were to implement its own augmentation any Node loaders for TS would need a way to be aware of these (either by hard coding, or coordination) and the same would be true for other tooling like linters or build tooling. Leaving it up to tooling doesn't simplify things, and the argument that tools might want to associate their own mappings is not prevented by exposing mappings. If we consider real world examples of things like how eslint is being forced to work around how to load files due to not knowing the formats at a location and/or the format of files being misaligned with package.json it seems that a need is being left unfulfilled, not that we should leave all tooling to find their own workaround. |
Using require.resolve(request[, options)] as a template, here's an idea that may fill the needs expressed. import.meta.resolve(request[, options])
Use the internal A note I'd like to make is that if this function were internally implemented in userland JS (as |
I just played around with the resolve hook a bit. The third parameter is the default ES resolver. I tried writing a loader like this: export async function resolve(specifier, parentModuleUrl, defaultResolver) {
console.log(defaultResolver(specifier)); And I ran it via {
url: 'file:///app/loader/test/index.coffee',
format: 'module'
} I put a Rewriting the specifier to something that doesn’t exist, like replacing |
@GeoffreyBooth it seems you've found a bug actually. I've posted #30632. Following the specification, there is supposed to be an unsupported module format error thrown for this case when "type": "module" is set. It is supposed to support format: 'commonjs' in the non module case though. |
Okay. Well as far as I understand this {
url: 'file:///app/loader/test/index.coffee',
format: 'module'
} Could something in this return object tell the loader to do the “determine the package type as if That way we wouldn’t need to expose this utility method, and potentially cause this logic to run twice for every specifier, but custom loaders could still take advantage of this internal method. |
Also, it's problematic that a resolve hook would return a format. It assumes that at URL resolution time the content-type behind a URL is already known. I would hope that a final loader hook API only returns that information when actually fetching the resource contents. |
/to @jkrems
I second this hope and I wonder if we share the same concern. With the current API, if the loader needs any of the source contents to determine the module format (e.g. @module JSDoc and containsModuleSyntax), it will end up reading file contents twice (once when doing the parsing of the format in the loader, and another when being added to the cache internally). /to @GeoffreyBooth
It seems like the dynamic instantiate hook can be leveraged for that purpose. No guarantees, but I'll try to put together a demo as a proof of concept. I've been looking for a reason to get familiar with this hook. Please let me know if you find success before I do. 😃 /to @guybedford
Here are the three functions associated with Userland JS implementation of Node's ESM resolver spec I will also be doing the remaining three and write unit tests for them. Hope this helps! I must know how we can better collaborate. Where would these go in the Node core repo? I need more guidance about contributing things like this. |
Absolutely amazing work to see @DerekNonGeneric! We discussed having an open test suite for the resolver at the last meeting, if you want to work with core to get the unit tests made as a suite that can be tested against userland resolvers as well that could be an interesting effort, but not pressure on that either. Just having those functions available to users spec compatible as a resource to install or fork in their own apps unlocks a huge amount of value. |
I'd still like to try and decouple determining format/metadata from determining location. There are a few ways we could do that, but if we do go with an API like |
@bmeck, did you mean to say would be solved? (Unclear if you're proposing a solution or stating that one already exists.) |
I've removed the modules agenda label here as this has been discussed a number of times now. If anyone wants to re-add though feel free. |
@guybedford, what was the conclusion? |
We just merged #30986 which I think makes the need for this function unnecessary. Within // Ask Node to determine the format as if the current URL had a .js extension
return defaultGetFormat(`${url}.js`); |
@GeoffreyBooth, sounds good to me. |
Build tooling in need of this feature, please refer to #49446. |
Is your feature request related to a problem? Please describe.
Assist in determining module format by using internal machinery.
Describe the solution you'd like
Expose the
getPackageType
utility function.Describe alternatives you've considered
The alternative is what I am currently doing, which is passing the
--expose-internals
flag (no bueno) and acquiring the utility function viainternalBinding('module_wrap')
. Exposing it would allow shaving off one dangerous flag and five lines from the following example.As you may have noticed, one use-case is for custom loaders, which necessitate this info. Amongst others, one advantage of using the internal machinery (over writing it in JS) is that it is implemented in C++, which is important because this particular algorithm can be very expensive to perform and ( ! ) resolve hooks are on the hot code path of every module.
Additionally, exposing this function may also improve the viability of using loaders to fill the current gap in format autodetection until the
--es-module-resolution
flag's proposedauto
mode is ready.cc @nodejs/modules-active-members
bcc @GeoffreyBooth (please cc above)
The text was updated successfully, but these errors were encountered: