Skip to content
This repository was archived by the owner on Sep 2, 2023. It is now read-only.

Loaders: resolved vs. referencing URL #140

Closed
SMotaal opened this issue Jun 27, 2018 · 17 comments
Closed

Loaders: resolved vs. referencing URL #140

SMotaal opened this issue Jun 27, 2018 · 17 comments

Comments

@SMotaal
Copy link

SMotaal commented Jun 27, 2018

The current experimental module resolve loader hook returns an Object { format, url } resulting in a mapping of the (original) referencing URL to a new resolved URL which satisfies idempotency requirements for subsequent calls to the same (original) host resolved URL.

While this is sufficient for many cases, sometimes it resolved URL's merely point to cached files which are generated for temporary use. In such cases, the specifiers inside those files may be invalid, that is to say they too must first be resolved relative to the (original) referencing URL which might follow the same resolution path or more importantly may resolve through a completely different path.

One way to over come this is to let the custom loaders handle it, which imho from all my attempts very degrading to performance, reliability and maybe even security.

The other recommendation is to allow resolvers to return Object { format, url, mapped? } which adds an optional mapped field that if present and is true then resolving within the module itself would be relative to the (original) referencing URL instead of the actual resolved URL.

A closing thought to ponder, while static imports can be rewritten (somewhat of pain), dynamic imports bring about a whole level of complexity when left to be handled by custom loaders (compared to the recommended solution).

@SMotaal
Copy link
Author

SMotaal commented Jun 27, 2018

Re: #127 (comment)

@bmeck
Copy link
Member

bmeck commented Jun 27, 2018

One way to over come this is to let the custom loaders handle it, which imho from all my attempts very degrading to performance, reliability and maybe even security.

Can you clarify the "it" in this section.

A closing thought to ponder, while static imports can be rewritten (somewhat of pain), dynamic imports bring about a whole level of complexity when left to be handled by custom loaders (compared to the recommended solution).

You can instrument dynamic loaders like https://bmeck.github.io/node-sw-compat-loader-test/dist/ but I agree that it is not pleasant.

My general fix it suggestion for the import.meta.url problem is to just rewrite the import.meta.url at the top of your transpiled file:

import.meta.url = "https://example.invalid";
// transpiled code here

However, this is actually part of a series of concerns that fall out of the issues described in #62 and #83 . In particular, the web will be doing caching and setting of import.meta.url differently than Node.

We need to expose the cache key to store a resource under which is currently not available in our implemented Loader design. However, I think that we can fix this by exposing the cache key that will story the related resource. This is part of the design I am wishing to talk about in #135 . The altered design to return the cache key does retain the problem of setting import.meta.url but that is able to be done at the top of a transpiled file.

@SMotaal
Copy link
Author

SMotaal commented Jun 27, 2018

https://github.com/nodejs/node/blob/4750ce26f2e6079b5fee92bdee5356c279171d22/lib/internal/modules/esm/translators.js#L26-L34

// Strategy for loading a standard JavaScript module
translators.set('esm', async (url) => {
  const source = `${await readFileAsync(new URL(url))}`;
  debug(`Translating StandardModule ${url}`);
  return {
    module: new ModuleWrap(stripShebang(source), url),
    reflect: undefined
  };
});

This is the part where I normally distinguish between the resolvedURL vs referencingURL when experimenting beyond the safety zone.

This effectively allows the wrapped module to know of it's referencingURL and not know about the resolvedURL (ie mapping in Node's Loader)

This will require some change in the mapping abstraction in Node's Loader to effectively recognize mapped versus unmapped (ie a mapped module is one that always resolves relative to the original URL irrespective of the resolvedURL which was returned by resolve(specifier, parentURL) as an Object { format: 'esm', url: ${resolvedURL}, mapped: true }.

@SMotaal
Copy link
Author

SMotaal commented Jun 27, 2018

Can you clarify the "it" in this section.

I was referring to how with the current Loader hooks, custom loaders that wish to resolve to an arbitrary file (namely one in something like /.cache/<hash>) will have to rewrite relative static and/or dynamic specifiers by implementing it using their own wisdom and magical powers.

@SMotaal
Copy link
Author

SMotaal commented Jun 27, 2018

However, this is actually part of a series of concerns that fall out of the issues described in #62 and #83 . In particular, the web will be doing caching and setting of import.meta.url differently than Node.

I was completely blown away when you introduced Loader last year not just because we all want real ESM but more importantly because of how your implementation of all static aspects was completely non-intrusive. So obviously for experimental purposes I have not issues with source text padding, but for real-world, I think that anything that is not runtime (aside from annotations like source maps) should not alter the source text because this opens the door to unresolved issues that tend to go across stream.

@bmeck
Copy link
Member

bmeck commented Jun 27, 2018

I'm not super happy with my solutions that I'm coming up with that enforce a more tightly couple aspect to import.meta.url and the cache; but, if loaders ship a full response including the expected cache key it should still stay static and able to be replayed unless I missed something.

@guybedford
Copy link
Contributor

@bmeck my only worry is that RESOLVE(pkgA, /pkgB/index.js) and RESOLVE(pkgA, /pkgC/index.js) is two calls to resolve that won't necessarily know which is called first when it comes to providing the data for pkgA, leading to possible unnecessary work / race condition issues.

@bmeck
Copy link
Member

bmeck commented Jun 27, 2018

@guybedford can you clarify? I'm not entirely sure I understand.

@guybedford
Copy link
Contributor

For a module already loaded, when resolving it from a new parent, data in resolve might be doing unnecessary work.

@bmeck
Copy link
Member

bmeck commented Jun 27, 2018

@guybedford I would assume the extra data would probably be able to be loaded from some sort of cache. I added it as a mechanism to allow loaders to preserve the data being generated, so it seems to be at odds if the loader does not intentionally cache it in some way. I think we can create recommendations on how to avoid duplication, but some things that Node may not directly integrate with like putting an AST in there greatly can speed things up when passing between loaders.

@SMotaal
Copy link
Author

SMotaal commented Jun 27, 2018

We need to expose the cache key to store a resource under which is currently not available in our implemented Loader design. However, I think that we can fix this by exposing the cache key that will story the related resource. This is part of the design I am wishing to talk about in #135 . The altered design to return the cache key does retain the problem of setting import.meta.url but that is able to be done at the top of a transpiled file.

So I will catch up on #135 to keep the related matters there, but I just wanted to point out a special aspect in the scenarios I mentioned in this issue.

Here the only control a hook has is limited to opt to declare a resolution as either symbolic mapping (import.meta.url retains the original resolve(referenceURL = new URL(specifier, referrer))) instead of the concrete mapping (import.meta.url becomes the resolvedURL = resolve(specifier, referrer))

This means that they control this boolean and the URL that will be passed to the loader, but they do not control the referencingURL which allows the Loader to lock in the mappings as soon as the ModuleWrap for the resolvedURL is constructed (free of static errors).

From my limited familiarity with how cache keys can potentially operate, I am of the opinion that they might be overkill and possibly not as effective in eliminating unnecessary redundancies, but I can totally be ignorant here.

@SMotaal
Copy link
Author

SMotaal commented Jun 27, 2018

I added it as a mechanism to allow loaders to preserve the data being generated

@bmeck can you elaborate on this more...

From my own perspective, as the creator/user of custom loaders, I imagine that for replay I node --loader <my chaining loader if Node does not yet chain> <my parentURL-centric resolving loaders> <my recording loader> <my symbolic mapping loaders> and the recording loader would be something like this:

const records = new Map();

export function capture(record) {
  const { referencingURL, resolved: { format, url, ...} } = record

  // logic that saves resources as needed 
}

export function resolve(specifier, referrer, nextResolver) {
  let referencingURL = new URL(specifier, referrer);
  let record = records.get(referencingURL);

  if (!record) {
    const resolved = nextResolver(referencingURL);
    record = { referencingURL, resolved };

    // No errors so far
    capture(record);

    records.set(referencingURL, record);
  }

  return record.resolved;
}

@SMotaal
Copy link
Author

SMotaal commented Jun 27, 2018

So I guess that for chaining purposes I would actually revise my initial thoughts on mapping being just a boolean to mapping being a string that must be identical to the original referencingURL, which means that chains symbolic mappings remain idempotent.

@bmeck
Copy link
Member

bmeck commented Jun 27, 2018

@SMotaal

per #140 (comment)

I'm not sure you want let referencingURL = new URL(specifier, referrer);, since the parent loader you call out to might not want those combined (for example if the specifier /foo removed the pathname from the referrer).

So I guess that for chaining purposes I would actually revise my initial thoughts on mapping being just a boolean to mapping being a string that must be identical to the original referencingURL, which means that chains symbolic mappings remain idempotent.

I'm confused due to comment above this. You shouldn't be combining data in a lossy manner to form a decisions about if the request has been seen before.

I'd expect it to be more like:

export function save(record) {
  // logic that saves a record to the cache (in memory/disk/db/etc.)
}

export function load(key) {
  // logic that returns a record or null
}

export function resolve({specifier, referrer, data}, nextResolver) {
  // only need to key off of data we pass to the parent
  // needs to stay lossless
  let key = someKindOfCompositeKeyOperation(specifier, referrer, data);

  let record = load(key)
  if (!record) {
    const resolved = nextResolver({specifier, referrer, data});
    record = { key, resolved };

    // No errors so far
    save(record);
  }

  return record.resolved;
}

@SMotaal
Copy link
Author

SMotaal commented Jul 20, 2018

@bmeck Just say this... now that I explored new chaining ideas, I was wondering if the passed arguments should be stateful (like a miniature request object but not nearly as intense complicated)

@MylesBorins
Copy link
Contributor

Is there more to be discussed in this issue?

@SMotaal
Copy link
Author

SMotaal commented Sep 26, 2018

I think this is no longer relevant for the time being… Will reopen if needed 👍

@SMotaal SMotaal closed this as completed Sep 26, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants