Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: resolver & spec hardening /w refactoring #40510

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
96 changes: 52 additions & 44 deletions doc/api/esm.md
Original file line number Diff line number Diff line change
Expand Up @@ -1077,59 +1077,63 @@ The resolver can throw the following errors:
> 1. Note: _specifier_ is now a bare specifier.
> 2. Set _resolved_ the result of
> **PACKAGE\_RESOLVE**(_specifier_, _parentURL_).
> 6. If _resolved_ contains any percent encodings of _"/"_ or _"\\"_ (_"%2f"_
> and _"%5C"_ respectively), then
> 1. Throw an _Invalid Module Specifier_ error.
> 7. If the file at _resolved_ is a directory, then
> 1. Throw an _Unsupported Directory Import_ error.
> 8. If the file at _resolved_ does not exist, then
> 1. Throw a _Module Not Found_ error.
> 9. Set _resolved_ to the real path of _resolved_.
> 10. Let _format_ be the result of **ESM\_FORMAT**(_resolved_).
> 11. Load _resolved_ as module format, _format_.
> 12. Return _resolved_.
> 6. Let _format_ be **undefined**.
> 7. If _resolved_ is a _"file:"_ URL, then
> 1. If _resolved_ contains any percent encodings of _"/"_ or _"\\"_ (_"%2F"_
> and _"%5C"_ respectively), then
> 1. Throw an _Invalid Module Specifier_ error.
> 2. If the file at _resolved_ is a directory, then
> 1. Throw an _Unsupported Directory Import_ error.
> 3. If the file at _resolved_ does not exist, then
> 1. Throw a _Module Not Found_ error.
> 4. Set _resolved_ to the real path of _resolved_, maintaining the
> same URL querystring and fragment components.
> 5. Set _format_ to the result of **ESM\_FILE\_FORMAT**(_resolved_).
> 8. Otherwise,
> 1. Set _format_ the module format of the content type associated with the
> URL _resolved_.
> 9. Load _resolved_ as module format, _format_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is describing Node’s internal behavior, but does this section correspond only with the user loader resolve hook, or both the resolve and load hooks? Because for user loaders, resolve returning format is optional; it’s load that does the definitive determination of format, to handle cases like an HTTPS loader where we don’t know the format until we get a network response with a Content-Type header.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpreted this the same as Geoffrey—that resolve's format is authoritative rather than suggestive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this isn't really specifyin a resolve hook at this point so much as just specifying how to load overall. I wouldn't interpret resolve as being equivalent to any concept of a resolve hook here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t have a suggestion for how to do so, but if you can think of a way to make it clearer that this doesn’t correspond directly with the resolve hook, I think that would make this better. The fact that the section heading is **ESM\_RESOLVE**(_specifier_, _parentURL_), when those are two of the arguments passed into the resolve hook, seems to strongly imply a similarity.

Or maybe the spec could correlate with the resolve and load hooks, describing the hooks’ “default” behavior? Then when people write their own versions, they have a reference of how those hooks “should” be if they were to follow Node’s example. I guess that would mean splitting this section into two, one for resolve and one for load.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way to think of this is more like HOST_RESOLVE_IMPORTED_MODULE in being a single resolver that covers the requirements like it does in ECMA262, as opposed to any fine-grained hooks.


**PACKAGE\_RESOLVE**(_packageSpecifier_, _parentURL_)

> 1. Let _packageName_ be **undefined**.
> 2. If _packageSpecifier_ is an empty string, then
> 1. Throw an _Invalid Module Specifier_ error.
> 3. If _packageSpecifier_ does not start with _"@"_, then
> 3. If _packageSpecifier_ is a Node.js builtin module name, then
> 1. Return the string _"node:"_ concatenated with _packageSpecifier_.
> 4. If _packageSpecifier_ does not start with _"@"_, then
> 1. Set _packageName_ to the substring of _packageSpecifier_ until the first
> _"/"_ separator or the end of the string.
> 4. Otherwise,
> 5. Otherwise,
> 1. If _packageSpecifier_ does not contain a _"/"_ separator, then
> 1. Throw an _Invalid Module Specifier_ error.
> 2. Set _packageName_ to the substring of _packageSpecifier_
> until the second _"/"_ separator or the end of the string.
> 5. If _packageName_ starts with _"."_ or contains _"\\"_ or _"%"_, then
> 6. If _packageName_ starts with _"."_ or contains _"\\"_ or _"%"_, then
> 1. Throw an _Invalid Module Specifier_ error.
> 6. Let _packageSubpath_ be _"."_ concatenated with the substring of
> 7. Let _packageSubpath_ be _"."_ concatenated with the substring of
> _packageSpecifier_ from the position at the length of _packageName_.
> 7. If _packageSubpath_ ends in _"/"_, then
> 8. If _packageSubpath_ ends in _"/"_, then
> 1. Throw an _Invalid Module Specifier_ error.
> 8. Let _selfUrl_ be the result of
> 9. Let _selfUrl_ be the result of
> **PACKAGE\_SELF\_RESOLVE**(_packageName_, _packageSubpath_, _parentURL_).
> 9. If _selfUrl_ is not **undefined**, return _selfUrl_.
> 10. If _packageSubpath_ is _"."_ and _packageName_ is a Node.js builtin
> module, then
> 1. Return the string _"node:"_ concatenated with _packageSpecifier_.
> 10. If _selfUrl_ is not **undefined**, return _selfUrl_.
> 11. While _parentURL_ is not the file system root,
> 1. Let _packageURL_ be the URL resolution of _"node\_modules/"_
> concatenated with _packageSpecifier_, relative to _parentURL_.
> 2. Set _parentURL_ to the parent folder URL of _parentURL_.
> 3. If the folder at _packageURL_ does not exist, then
> 1. Continue the next loop iteration.
> 4. Let _pjson_ be the result of **READ\_PACKAGE\_JSON**(_packageURL_).
> 5. If _pjson_ is not **null** and _pjson_._exports_ is not **null** or
> **undefined**, then
> 1. Return the result of **PACKAGE\_EXPORTS\_RESOLVE**(_packageURL_,
> _packageSubpath_, _pjson.exports_, _defaultConditions_).
> 6. Otherwise, if _packageSubpath_ is equal to _"."_, then
> 1. If _pjson.main_ is a string, then
> 1. Return the URL resolution of _main_ in _packageURL_.
> 7. Otherwise,
> 1. Return the URL resolution of _packageSubpath_ in _packageURL_.
> 1. Let _packageURL_ be the URL resolution of _"node_modules/"_
> concatenated with _packageSpecifier_, relative to _parentURL_.
> 2. Set _parentURL_ to the parent folder URL of _parentURL_.
> 3. If the folder at _packageURL_ does not exist, then
> 1. Continue the next loop iteration.
> 4. Let _pjson_ be the result of **READ\_PACKAGE\_JSON**(_packageURL_).
> 5. If _pjson_ is not **null** and _pjson_._exports_ is not **null** or
> **undefined**, then
> 1. Return the result of **PACKAGE\_EXPORTS\_RESOLVE**(_packageURL_,
> _packageSubpath_, _pjson.exports_, _defaultConditions_).
> 6. Otherwise, if _packageSubpath_ is equal to _"."_, then
> 1. If _pjson.main_ is a string, then
> 1. Return the URL resolution of _main_ in _packageURL_.
> 7. Otherwise,
> 1. Return the URL resolution of _packageSubpath_ in _packageURL_.
> 12. Throw a _Module Not Found_ error.

**PACKAGE\_SELF\_RESOLVE**(_packageName_, _packageSubpath_, _parentURL_)
Expand Down Expand Up @@ -1239,18 +1243,20 @@ _internal_, _conditions_)
> _"/"_ and is not a valid URL, then
> 1. If _pattern_ is **true**, then
> 1. Return **PACKAGE\_RESOLVE**(_target_ with every instance of
> _"\*"_ replaced by _subpath_, _packageURL_ + _"/"_)\_.
> _"\*"_ replaced by _subpath_, _packageURL_ + _"/"_).
> 2. Return **PACKAGE\_RESOLVE**(_target_ + _subpath_,
> _packageURL_ + _"/"_)\_.
> _packageURL_ + _"/"_)_.
> 2. Otherwise, throw an _Invalid Package Target_ error.
> 3. If _target_ split on _"/"_ or _"\\"_ contains any _"."_, _".."_ or
> _"node\_modules"_ segments after the first segment, throw an
> _Invalid Package Target_ error.
> _"node\_modules"_ segments after the first segment, case insensitive and
> including percent encoded variants, throw an _Invalid Package Target_
> error.
> 4. Let _resolvedTarget_ be the URL resolution of the concatenation of
> _packageURL_ and _target_.
> 5. Assert: _resolvedTarget_ is contained in _packageURL_.
> 6. If _subpath_ split on _"/"_ or _"\\"_ contains any _"."_, _".."_ or
> _"node\_modules"_ segments, throw an _Invalid Module Specifier_ error.
> _"node\_modules"_ segments, case insensitive and including percent
> encoded variants, throw an _Invalid Module Specifier_ error.
> 7. If _pattern_ is **true**, then
> 1. Return the URL resolution of _resolvedTarget_ with every instance of
> _"\*"_ replaced with _subpath_.
Expand Down Expand Up @@ -1283,19 +1289,21 @@ _internal_, _conditions_)
> 4. Otherwise, if _target_ is _null_, return **null**.
> 5. Otherwise throw an _Invalid Package Target_ error.

**ESM\_FORMAT**(_url_)
**ESM\_FILE\_FORMAT**(_url_)

> 1. Assert: _url_ corresponds to an existing file.
> 2. Let _pjson_ be the result of **READ\_PACKAGE\_SCOPE**(_url_).
> 3. If _url_ ends in _".mjs"_, then
> 1. Return _"module"_.
> 4. If _url_ ends in _".cjs"_, then
> 1. Return _"commonjs"_.
> 5. If _pjson?.type_ exists and is _"module"_, then
> 5. If _url_ ends in _".json"_, then
> 1. Return _"json"_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn’t the case unless --experimental-json-modules is enabled; and even then, assert { type: 'json' } will be required, which I’d think should be mentioned here. Perhaps this would be better added when #40250 lands?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this spec doesn't mandate the assertion doesn't mean Node.js core can't mandate the assertion. It's more a question of layering I think. I'm definitely open to explicitly calling out that the assertion is mandated if we have full consensus on that, just wanted to do the minimal change initially.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is more that this spec seems to imply that Node can’t mandate the assertion. As in, if the URL ends in .json, that’s the end of the story: the spec says so. I would maybe revise this to:

> 5. If _url_ ends in _".json"_, and the import assertion _type_ is _"json"_, then
     1. Return _"json"_.

It already went to the TSC that the assertion will be mandated, at least at first (when the PR that adds support lands). If/when it ever becomes optional, we can update this spec accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would need to also include the initial _assertions_ parameter as an explicit argument into the resolution function to do this, perhaps a follow-up PR then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that’s part of #40250. We could maybe include the spec update in that PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I interpreted that was file name ending in ".json" is the browser equivalent of having a response header Content-Type: application/json. To me, the fact a .json file is considered as JSON by Node.js spec does not mean it can be imported without an assertion (the same way the HTML spec do not accept to load a module if the Content-Type is application/json). I don't think we should make the _assertions_ parameter as an explicit argument into the resolution function, a .json file should always be considered as JSON, the assertion check should come after that.

> 6. If _pjson?.type_ exists and is _"module"_, then
> 1. If _url_ ends in _".js"_, then
> 1. Return _"module"_.
> 2. Throw an _Unsupported File Extension_ error.
> 6. Otherwise,
> 7. Otherwise,
> 1. Throw an _Unsupported File Extension_ error.

**READ\_PACKAGE\_SCOPE**(_url_)
Expand Down
77 changes: 32 additions & 45 deletions lib/internal/modules/esm/resolve.js
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ const {
ObjectGetOwnPropertyNames,
ObjectPrototypeHasOwnProperty,
RegExp,
RegExpPrototypeExec,
RegExpPrototypeSymbolReplace,
RegExpPrototypeTest,
SafeMap,
Expand Down Expand Up @@ -360,9 +361,10 @@ const encodedSepRegEx = /%2F|%5C/i;
/**
* @param {URL} resolved
* @param {string | URL | undefined} base
* @param {boolean} preserveSymlinks
* @returns {URL | undefined}
*/
function finalizeResolution(resolved, base) {
function finalizeResolution(resolved, base, preserveSymlinks) {
if (RegExpPrototypeTest(encodedSepRegEx, resolved.pathname))
throw new ERR_INVALID_MODULE_SPECIFIER(
resolved.pathname, 'must not include encoded "/" or "\\" characters',
Expand Down Expand Up @@ -393,6 +395,17 @@ function finalizeResolution(resolved, base) {
path || resolved.pathname, base && fileURLToPath(base), 'module');
}

if (!preserveSymlinks) {
const real = realpathSync(path, {
[internalFS.realpathCacheKey]: realpathCache
});
const { search, hash } = resolved;
resolved =
pathToFileURL(real + (StringPrototypeEndsWith(path, sep) ? '/' : ''));
resolved.search = search;
resolved.hash = hash;
}
Comment on lines +403 to +407
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I missed the ship (I was away this weekend): I think a code comment would be useful here, something like

Suggested change
resolved =
pathToFileURL(real + (StringPrototypeEndsWith(path, sep) ? '/' : ''));
resolved.search = search;
resolved.hash = hash;
}
// preserve the original extras onto the true fileUrl
resolved =
pathToFileURL(real + (StringPrototypeEndsWith(path, sep) ? '/' : ''));
resolved.search = search;
resolved.hash = hash;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed more comments are always good - let's follow up on this.


return resolved;
}

Expand Down Expand Up @@ -444,7 +457,8 @@ function throwInvalidPackageTarget(
internal, base && fileURLToPath(base));
}

const invalidSegmentRegEx = /(^|\\|\/)(\.\.?|node_modules)(\\|\/|$)/;
const invalidSegmentRegEx = /(^|\\|\/)((\.|%2e)(\.|%2e)?|(n|%6e|%4e)(o|%6f|%4f)(d|%64|%44)(e|%65|%45)(_|%5f)(m|%6d|%4d)(o|%6f|%4f)(d|%64|%44)(u|%75|%55)(l|%6c|%4c)(e|%65|%45)(s|%73|%53))(\\|\/|$)/i;
const invalidPackageNameRegEx = /^\.|%|\\/;
const patternRegEx = /\*/g;

function resolvePackageTargetString(
Expand Down Expand Up @@ -777,13 +791,9 @@ function parsePackageName(specifier, base) {
specifier : StringPrototypeSlice(specifier, 0, separatorIndex);

// Package name cannot have leading . and cannot have percent-encoding or
// separators.
for (let i = 0; i < packageName.length; i++) {
if (packageName[i] === '%' || packageName[i] === '\\') {
validPackageName = false;
break;
}
}
// \\ separators.
if (RegExpPrototypeExec(invalidPackageNameRegEx, packageName) !== null)
validPackageName = false;

if (!validPackageName) {
throw new ERR_INVALID_MODULE_SPECIFIER(
Expand All @@ -803,6 +813,9 @@ function parsePackageName(specifier, base) {
* @returns {URL}
*/
function packageResolve(specifier, base, conditions) {
if (NativeModule.canBeRequiredByUsers(specifier))
return new URL('node:' + specifier);

const { packageName, packageSubpath, isScoped } =
parsePackageName(specifier, base);

Expand Down Expand Up @@ -879,9 +892,10 @@ function shouldBeTreatedAsRelativeOrAbsolutePath(specifier) {
* @param {string} specifier
* @param {string | URL | undefined} base
* @param {Set<string>} conditions
* @param {boolean} preserveSymlinks
* @returns {URL}
*/
function moduleResolve(specifier, base, conditions) {
function moduleResolve(specifier, base, conditions, preserveSymlinks) {
// Order swapped from spec for minor perf gain.
// Ok since relative URLs cannot parse as URLs.
let resolved;
Expand All @@ -896,7 +910,9 @@ function moduleResolve(specifier, base, conditions) {
resolved = packageResolve(specifier, base, conditions);
}
}
return finalizeResolution(resolved, base);
if (resolved.protocol !== 'file:')
return resolved;
return finalizeResolution(resolved, base, preserveSymlinks);
}

/**
Expand Down Expand Up @@ -968,28 +984,6 @@ function defaultResolve(specifier, context = {}, defaultResolveUnused) {
}
}
}
let parsed;
try {
parsed = new URL(specifier);
if (parsed.protocol === 'data:') {
return {
url: specifier
};
}
} catch {}
if (parsed && parsed.protocol === 'node:')
return { url: specifier };
if (parsed && parsed.protocol !== 'file:' && parsed.protocol !== 'data:')
throw new ERR_UNSUPPORTED_ESM_URL_SCHEME(parsed);
if (NativeModule.canBeRequiredByUsers(specifier)) {
return {
url: 'node:' + specifier
};
}
if (parentURL && StringPrototypeStartsWith(parentURL, 'data:')) {
// This is gonna blow up, we want the error
new URL(specifier, parentURL);
}

const isMain = parentURL === undefined;
if (isMain) {
Expand All @@ -1008,7 +1002,8 @@ function defaultResolve(specifier, context = {}, defaultResolveUnused) {
conditions = getConditionsSet(conditions);
let url;
try {
url = moduleResolve(specifier, parentURL, conditions);
url = moduleResolve(specifier, parentURL, conditions,
isMain ? preserveSymlinksMain : preserveSymlinks);
} catch (error) {
// Try to give the user a hint of what would have been the
// resolved CommonJS module
Expand All @@ -1032,17 +1027,9 @@ function defaultResolve(specifier, context = {}, defaultResolveUnused) {
throw error;
}

if (isMain ? !preserveSymlinksMain : !preserveSymlinks) {
const urlPath = fileURLToPath(url);
const real = realpathSync(urlPath, {
[internalFS.realpathCacheKey]: realpathCache
});
const old = url;
url = pathToFileURL(
real + (StringPrototypeEndsWith(urlPath, sep) ? '/' : ''));
url.search = old.search;
url.hash = old.hash;
}
if (url.protocol !== 'file:' && url.protocol !== 'data:' &&
url.protocol !== 'node:')
throw new ERR_UNSUPPORTED_ESM_URL_SCHEME(url);

return { url: `${url}` };
}
Expand Down
4 changes: 4 additions & 0 deletions test/es-module/test-esm-exports.mjs
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,10 @@ import fromInside from '../fixtures/node_modules/pkgexports/lib/hole.js';
// Even though 'pkgexports/sub/asdf.js' works, alternate "path-like"
// variants do not to prevent confusion and accidental loopholes.
['pkgexports/sub/./../asdf.js', './sub/./../asdf.js'],
// Cannot reach into node_modules, even percent encoded
['pkgexports/sub/no%64e_modules', './sub/no%64e_modules'],
// Cannot backtrack below exposed path, even with percent encoded "."
['pkgexports/sub/%2e./asdf', './asdf'],
]);

for (const [specifier, subpath] of undefinedExports) {
Expand Down