Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESM Loader Hooks (20.8) should support a more testable, less stateful design #50042

Closed
ggoodman opened this issue Oct 4, 2023 · 12 comments · May be fixed by #50044
Closed

ESM Loader Hooks (20.8) should support a more testable, less stateful design #50042

ggoodman opened this issue Oct 4, 2023 · 12 comments · May be fixed by #50044
Labels
feature request Issues that request new features to be added to Node.js. stale

Comments

@ggoodman
Copy link
Contributor

ggoodman commented Oct 4, 2023

What is the problem this feature will solve?

The current ESM Loader Hooks are defined as modules having the following named exports: initialize(), resolve() and load(). The initialize() function is passed data specified when calling register(). The intent is that the other two hooks can thus also leverage this data in extensible, future-proof ways.

The problem that this introduces is that the ESM Load Hook module is now a stateful singleton because it encourages designs wherein initialization state is shared at the module level. This means that any tests looking to exercise the loader module needs to deal with ESM cache invalidation. It also makes it slightly less ergonomic (and type-safe!) to set and access any needed shared state.

What is the feature you are proposing to solve the problem?

I propose a slight change (or addition) to the proposed design. The change I'd like to propose is that the initialize() function should be able to produce the object implementing the other two components of the interface: resolve() and load().

In this way:

  1. Tests looking to test the loader's implementation can easily produce multiple instances of the loader for testing different inputs and scenarios in isolation.
  2. State can be passed in a simple, immutable and type-safe manner between the initialize() call and the resolve() and load() hooks.
  3. A Loader could then be implemented as a class that is instantiated and returned by the initialize() function. The class can hold shared state in a private field or instance properties. Alternatively, the initialize() closure could also be used to hold this state in a way that the two other hooks can easily access.
export async function initialize(data) {
  return new MyLoader(data);
}

class MyLoader {
  #port;
  
  constructor(data) {
    this.#port = data.port;
  }
  
  resolve(specifier, context, nextResolve) {
    this.#port.postMessage(['resolve', specifier, context]);
    // No-op, defer to built-in
    return nextResolve(specifier, context);
  }
  
  load(url, context, nextLoad) {
    this.#port.postMessage(['load', url, context]);
    // No-op, defer to built-in
    return nextLoad(url, context);
  } 
}

What alternatives have you considered?

No response

@ggoodman ggoodman added the feature request Issues that request new features to be added to Node.js. label Oct 4, 2023
@github-project-automation github-project-automation bot moved this to Pending Triage in Node.js feature requests Oct 4, 2023
ggoodman added a commit to ggoodman/node that referenced this issue Oct 4, 2023
This commit allows the `initialize()` hook to optionally return an
object having the `resolve()` and `load()` hooks as properties.
This allows state passed into `initialize()` to be shared with the
`resolve()` and `load()` hooks either via closure or class
instance.

In addition to developer ergonomics, supporting this model will
make it easier to write tests against a loader module. The
existing design forces state to be shared at the module level
which puts the burden of invalidating the ESM module cache
on anyone hoping to write isolated tests against a loader
module.

Fixes: nodejs#50042
ggoodman added a commit to ggoodman/node that referenced this issue Oct 4, 2023
This commit allows the `initialize()` hook to optionally return an
object having the `resolve()` and `load()` hooks as properties.
This allows state passed into `initialize()` to be shared with the
`resolve()` and `load()` hooks either via closure or class
instance.

In addition to developer ergonomics, supporting this model will
make it easier to write tests against a loader module. The
existing design forces state to be shared at the module level
which puts the burden of invalidating the ESM module cache
on anyone hoping to write isolated tests against a loader
module.

Fixes: nodejs#50042
@GeoffreyBooth
Copy link
Member

@nodejs/loaders

ggoodman added a commit to ggoodman/node that referenced this issue Oct 4, 2023
This commit allows the `initialize()` hook to optionally return an
object having the `resolve()` and `load()` hooks as properties.
This allows state passed into `initialize()` to be shared with the
`resolve()` and `load()` hooks either via closure or class
instance.

In addition to developer ergonomics, supporting this model will
make it easier to write tests against a loader module. The
existing design forces state to be shared at the module level
which puts the burden of invalidating the ESM module cache
on anyone hoping to write isolated tests against a loader
module.

Fixes: nodejs#50042
@ggoodman
Copy link
Contributor Author

ggoodman commented Oct 4, 2023

BTW, I threw #50044 together to explore what a back-compatible approach would look like for this.

ggoodman added a commit to ggoodman/node that referenced this issue Oct 4, 2023
This commit allows the `initialize()` hook to optionally return an
object having the `resolve()` and `load()` hooks as properties.
This allows state passed into `initialize()` to be shared with the
`resolve()` and `load()` hooks either via closure or class
instance.

In addition to developer ergonomics, supporting this model will
make it easier to write tests against a loader module. The
existing design forces state to be shared at the module level
which puts the burden of invalidating the ESM module cache
on anyone hoping to write isolated tests against a loader
module.

Fixes: nodejs#50042
@ggoodman
Copy link
Contributor Author

ggoodman commented Oct 19, 2023

@GeoffreyBooth, here are some higher-level observations about the current loader API:

  1. The load hook doesn't have access to a hooked resolve function. This means a load hook can't, for example, attempt to resolve a tsconfig.json or other config file using the loader's own machinery. It must use a parallel approach.
  2. The resolve hook doesn't have access to a hooked load function. Similar to the above, the resolve hook might want to load the content of a config file that would influence module resolution. Using the loader's own machinery would be advantageous over having a parallel construction.
  3. The initialize, load and resolve hooks are top-level functions whose only option for extensibility is through new arguments. By extensibility, I mean unforeseen use-cases or data that needs to be passed into the hook. This is not strictly bad but feels like a missed opportunity.
  4. The use of a module instance whose exports form the loader interface means:
    a. Sharing state between the initialize and resolve / load hooks must rely on module-level state that is hard to evict. This makes testing and de-registering loaders unnecessarily complicated.
    b. It is difficult to cleanly add type-safety to a loader since the interface is the module. We don't have facilities for asserting that an entire modules's exports adhere to a desired interface.
  5. Subjectively, the shortCircuit vs nextResolve / nextLoad interaction is very confusing and clunky. Why do we need shortCircuit?

I've started speccing out what I think an alternative design might look like and as I'm attempting to port the CoffeScript loader from the docs over, I'm seeing some of the motivation for the nextResolve and nextLoad functions. I have run out of time and brain-power trying to make that work today. Hope to pick it up tomorrow.

@GeoffreyBooth
Copy link
Member

@ggoodman I’m warming to the idea of initialize returning something whose resolve and load properties define those hooks. That can be an elegant way to provide a loader “interface” as you’re describing, and if we still permit the top-level exports then it’s not breaking. This API has been around for a few years now and we’ve gotten some strong pushback the previous times we’ve changed it, so I think at this point all changes should be additive unless there’s a strong motivation otherwise. “We could’ve thought of something better” isn’t quite strong enough in my opinion. But we can support both styles.

As for load having access to resolve and vice versa, they do have access in the sense that either hook’s body can call import() which would kick off a new module resolution and loading pipeline. They can’t access the resolve portion or the load portion of that pipeline individually, but I’m not sure how it would make sense to do so; how would that work with chained hooks?

And for your #5, chaining is the reason we need those things. We want the ability to register multiple sets of hooks, like for example just as people today often do --require tsconfig-paths/register --require ts-node/register we want to support the equivalent via --import/register().

@isaacs
Copy link
Contributor

isaacs commented Oct 20, 2023

@ggoodman

a. Sharing state between the initialize and resolve / load hooks must rely on module-level state that is hard to evict. This makes testing and de-registering loaders unnecessarily complicated.
b. It is difficult to cleanly add type-safety to a loader since the interface is the module. We don't have facilities for asserting that an entire modules's exports adhere to a desired interface.

You might find node-tap's loader interfaces informative here.

I've been putting these hook methods in a hooks.mts file (so that I can export one set with initialize/resolve/load, and another with globalPreload/resolve/load), and testing with tap's t.mockImport(). This makes it pretty much the same difficulty to test as it would be to test a regular function, since each invocation has a new unique context and even its own mocked dependencies.

The exported types in @types/node cover the loader hooks interface pretty comprehensively, and if you have type checking on in your tests, I'm not sure how much more type-tested it could really be. Maybe in a perfect world, node would have a way to verify the types at run time, but that's not how ts works. 🤷‍♂️

@giltayar
Copy link
Contributor

If it was just to help with testing, I would not have been for this change, as there are other ways to deal with tests here.

But!

We still have an open issue regarding communication with hook (e.g. an api that changes thing in the hook), when there are multiple worker threads.

If we call initialize for each worker thread, and each worker thread has their own port and resolve and initialize, then this is a problem solved as each worker thread will have its own port, and the hook will have their own state per worker thread.

So not am I only not for this idea, I think it's a must have to solve an open issue in loaders.

@ggoodman
Copy link
Contributor Author

and testing with tap's t.mockImport(). This makes it pretty much the same difficulty to test as it would be to test a regular function

@isaacs That sounds quite awesome TBH because it's much higher fidelity than the sorts of tests I'm proposing. At the same time, it relies on using a smaller subset of test frameworks that support this sort of isolation. I don't think Node's own test runner even has this sort of advanced functionality yet.

@ggoodman
Copy link
Contributor Author

We still have an open issue regarding communication with hook (e.g. an api that changes thing in the hook), when there are multiple worker threads.

If we call initialize for each worker thread, and each worker thread has their own port and resolve and initialize, then this is a problem solved as each worker thread will have its own port, and the hook will have their own state per worker thread.

Agreed @giltayar, I don't think the current design has a good story for instantiation. In other words, the forced reliance on module-level state means that the same loader can't be instantiated multiple times with different configs.

Of course, there are probably some tricky ways to get around this but the current API seems unnecessarily difficult for such use-cases.

@ggoodman
Copy link
Contributor Author

Happy to rebase #50044 and spike out the docs changes if there seems to be momentum in favour of the change.

@GeoffreyBooth
Copy link
Member

If we call initialize for each worker thread, and each worker thread has their own port and resolve and initialize, then this is a problem solved as each worker thread will have its own port, and the hook will have their own state per worker thread.

Can the PR add some tests for this? This would be good to have coverage around.

We might want to add an example in the docs for worker threads if supporting them is something that hook authors will need to explicitly enable.

As for the initialize return value, I would just like to be able to continue adding to it in the future. So for now maybe it supports only { resolve: Function; load: Function } but if we discover new needs for it in the future, we can add more properties.

Happy to rebase #50044 and spike out the docs changes if there seems to be momentum in favour of the change.

Yes please!

ggoodman added a commit to ggoodman/node that referenced this issue Oct 24, 2023
This commit allows the `initialize()` hook to optionally return an
object having the `resolve()` and `load()` hooks as properties.
This allows state passed into `initialize()` to be shared with the
`resolve()` and `load()` hooks either via closure or class
instance.

In addition to developer ergonomics, supporting this model will
make it easier to write tests against a loader module. The
existing design forces state to be shared at the module level
which puts the burden of invalidating the ESM module cache
on anyone hoping to write isolated tests against a loader
module.

Fixes: nodejs#50042
ggoodman added a commit to ggoodman/node that referenced this issue Oct 24, 2023
This commit allows the `initialize()` hook to optionally return an
object having the `resolve()` and `load()` hooks as properties.
This allows state passed into `initialize()` to be shared with the
`resolve()` and `load()` hooks either via closure or class
instance.

In addition to developer ergonomics, supporting this model will
make it easier to write tests against a loader module. The
existing design forces state to be shared at the module level
which puts the burden of invalidating the ESM module cache
on anyone hoping to write isolated tests against a loader
module.

Fixes: nodejs#50042
Copy link
Contributor

There has been no activity on this feature request for 5 months. To help maintain relevant open issues, please add the never-stale Mark issue so that it is never considered stale label or close this issue if it should be closed. If not, the issue will be automatically closed 6 months after the last non-automated comment.
For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Apr 18, 2024
Copy link
Contributor

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants