Maintaining hook module registration for on-thread hooks #203

guybedford · 2024-06-20T21:35:28Z

One of the benefits of having a module.register hook module registration like we currently do for loaders in that it applies for all spawned workers is that by default, resolver, network and transpilation loaders automatically can be applied to all spawned workers.

It is being discussed to migrate to same-thread hooks as an option of the discussion in #201, while possibly also unifying with the synchronous hooks proposal in #198.

Currently the synchronous hooks proposal does not use register to register a hooks module, but instead takes arbitrary functions.

I'd like to suggest we consider maintaining something similar to our existing register API in providing a module for hook definitions, as this more naturally allows hooks to be shared between different loading contexts in a portable way.

This is separate to the question of hook customization across contexts, which could also in turn have manual configuration - for example some kind of pattern like new Worker(path, { register: 'custom-module' }) or new Worker(path, { hooks: 'blank' }) could be used at worker instantiation time. In addition it might even be possible to register hooks that don't get shared by default something like registerHooks(import.meta.resolve('./hooks.js'), { shared: false }).

Specifically, the default story of hooks using the same code implementation by default is what we get for workers today and were looking to get for a shared hooks worker. I think this would be a really valuable property to keep where the default user experience for writing loaders from transpilers to APM is that they do work across contexts, without that being a major structural change to the loader code only to be discovered too late in the loader authoring process.

The text was updated successfully, but these errors were encountered:

joyeecheung · 2024-06-20T22:10:23Z

To elaborate the current module.register() API looks like this:

// register.js
module.register('/path/to/hook.js');  // inherited by all child workers by default

// hook.js
export function resolve() { ... }

Where as I am proposing in #198 to switch to take the functions directly in the API, with a separate API to configure the inheritance:

// This tells Node.js that from now on all the child worker/process
// spawned from the current instance should preload this file.
process.preloadInChildren(__filename);  // or import.meta.filename

module.registerHooks({
  resolve() { ... }
});

process.preloadInChildren is an imaginary name and can be changed, or we can split or provide more granular control over child worker / child process behaviors (especially fork behaviors). This makes the inheritance opt-in and explicit, and also allows more control, e.g. if the hooks need to spawn workers via a dependency, but don't want the workers spawned by its dependency to load themselves, they can run the code that spawns the worker before calling process.preloadInChildren() with themselves, or temporarily disables the preload of itself via a different function when loading third-party plugins that may spawn workers.

This can also be used in other use cases that are not related to module hooks (e.g. APM hooks that only cares about generic stats like v8 heap statistics/event loop utilization in all children instances, or not patching modules to do tracing when it's turned off). We can also add APIs to allow more control for the preload scripts (querying, disabling).

GeoffreyBooth · 2024-06-20T22:44:11Z

cc @nodejs/loaders plus the participants of #103: @cspotcode @giltayar @arcanis @bumblehead @bizob2828 plus participants of nodejs/node#53332: @VoltrexKeyva @Flarna @alan-agius4

mcollina · 2024-06-21T06:56:07Z

I think we need granular control on how the loaders are passed from parent to children. Moreover, we need to consider the case of a child process/thread changing the hooks too. There are a few of edge cases to consider, and how a developer could "reset the chain".

ShogunPanda · 2024-06-21T08:17:23Z

I agree with @mcollina. I like the approach taken by @joyeecheung but we need to be able to diverge from it, even during runtime.
Thinks, for instance a library that spawns a worker thread with some hooks and then this thread spawns the real user code. The user might also want to extend or clean the hooks chain rather than just inheriting it.

joyeecheung · 2024-06-21T14:07:48Z

for instance a library that spawns a worker thread with some hooks and then this thread spawns the real user code. The user might also want to extend or clean the hooks chain rather than just inheriting it.

Do you mean:

Main thread: no hooks, spawns a worker with hooks (controlled by library)
Worker 2: has hooks (controlled by library), spawns user code
Worker 3: no hooks or extend hooks (controlled by end user)

I think with a flexible API it can be done as:

// Main thread
require('hooks.js')
new Worker('worker2.js', ...);

// hooks.js
process.preloadInChildren(__filename);
module.register({
  ...
  deregister() {
    process.removePreloadInChildren(__filename);
  }
})

// Worker 2
// Has preloaded hooks set by main thread
new Worker('user.js', ...);

// Worker 3
// Has preloaded hooks set by main thread
for (const hook of module.registeredHooks()) {
  hook.deregister();
}
// Or register new hooks with module.registeredHooks()
// Or wrap the hook manipulation code in a separate file
// and set it as preload in children if different behavior needs to be inherited

joyeecheung · 2024-06-21T14:47:36Z

Actually if we want to make it possible for the hooks to be dynamically enabled/disabled, we should add callbacks for users to customize what should be done when the hook is enabled/disabled.

mcollina · 2024-06-21T16:46:08Z

I'm more thinking something like:

new Workers('user.js', { hooks: [] })

or similar.

guybedford · 2024-06-21T17:56:00Z

@mcollina something like that would be great to see. I think there are two kinds of use cases we have here:

Spawning a worker and also creating a new compartment (to use TC39 terminology), where you want to control the hooks for that environment.
Spawning a worker that is within the same overall application compartment and hooks model, so that things like resolver hooks, transpilation hooks and APM attachments also work across the worker spawning.

My main concern here is in the case of (2) where it shouldn't be a requirement that every worker instantiation must be updated to set the hooks, instead having the default of compartment sharing for workers will significantly improve APM ergonomics.

So my preference would be for allowing customization, but with a good default experience.

Qard · 2024-06-21T18:48:43Z

I would have separate APIs for explicit hook passing and implicit sharing.

Explicit:

new Workers('user.js', { hooks: [] })

Implicit:

Worker.setDefaultHooks([])

I might even go as far as only allowing implicit hook inheritance via a cli flag. In my opinion it's a bit of an anti-pattern applying implicit changes to sub-environments without explicit action from the parent. While APM would like to inject itself into every thread, it's a bit of a security concern if child threads inherit code which was not explicitly shared with it and therefore perhaps not intended to be shared.

guybedford · 2024-06-21T19:15:59Z

In principle I'm all for a Worker.setDefaultHooks and a hooks argument to a worker but the thing is that that they both require registration of a hooks module - we can't just pass direct functions in.

My concern is that the existing status quo of the register API solves for hook portability, in a way that is compatible with setDefaultHooks and a hooks argument to worker, but if we drop this API for a direct function registration API then this again adds portability friction for worker use cases.

That is, I simply want us to maintain the existing design of a registered hooks module.

I could be okay with not making hook registration for workers the default, but the important point is that changing that default should be easy. For example one could imagine the register API supporting a custom option for this:

module.registerHooks(import.meta.resolve('./hooks/mjs'), { childWorkers: true });

While APM would like to inject itself into every thread, it's a bit of a security concern if child threads inherit code which was not explicitly shared with it and therefore perhaps not intended to be shared.

Worker threads are quite frankly not a security boundary. Can you elaborate on the specific security concern here? If anything security based loaders and global attenuation that acts as enforcing security properties of the environment would very much want to be setting childWorkers: true as a security guarantee.

guybedford · 2024-06-21T19:26:46Z

If there are concerns around module.registerHooks being async, perhaps we can make it a sync call like require(ESM) and disable top-level await for hooks modules and their dependencies similarly.

ShogunPanda · 2024-06-22T07:31:16Z

Do you mean:

Main thread: no hooks, spawns a worker with hooks (controlled by library)

Worker 2: has hooks (controlled by library), spawns user code

Worker 3: no hooks or extend hooks (controlled by end user)

Exactly, that was the scenario I was thinking about.

I think with a flexible API it can be done as:

...

I love that!

Flarna · 2024-06-24T06:36:07Z

I don't think we should rely on a single user to decide on propagating hooks to workers or not.
Loader hooks are often just an implementation detail of some tool. And these tools may be applied at different levels from different persons.
e.g.

App developer might decide to use ts-node or similar which might use loader hooks
Some framework like angular (?) used as dependency might use workers and hooks internally
App deployer might add an APM tool (e.g. without code change, via NODE_OPTIONS) which uses loader hooks internally
Dev might user workers/hooks for it's own application logic

I think each individual tool/library should have the possibility to set the hooks in a way it is required to operated correct. Delegating all into a single user tends to be tedious.

guybedford · 2024-06-24T17:01:47Z

To be clear - this is exactly what we have currently and I am only advocating to retain the status quo in Node.js. Each tool and library can choose how it sets hooks, there is no single hooks "owner".

All I am requesting is that we either make it a default to allow hooks to work on workers, or as a compromise, allow a simple flag to enable this feature such as workers: true when registering the hooks. Any code is still in a position to be able to register hooks though without having to go through some single gatekeeping process.

The constraint specifically is that hook functions are not transferrable to workers, and if workers are to have sync hooks on their own threads, we should design a worker hooks system that is by default transferrable, as opposed to by default not transferrable. The ultimate reason being one of encouraging workers, not adding more frictions to them (there are enough frictions with workers for JS, and it is our job to remove those to make the language more multi threaded and faster).

jsumners-nr · 2024-06-24T17:17:33Z

I can imagine an application that relies on an APM tool:

Not knowing how the APM tool does what it does
Spawning a worker and expecting to get traces from the code in the worker

Something like:

require('cool-apm')

console.log('app started')
const worker = new Worker('./some.js')

// wait for worker to do things

I know that the tool I work on promotes omitting that first line and starting the app like: node -r cool-apm app.js. So that's a thing to consider.

But we also don't really support threads.

In short, users are going to expect things to just work.

Qard · 2024-06-24T21:56:12Z

Most APMs don't really do much of anything about threads currently, or just fallback on -r loading a separate instance of their library into every thread, so I think we're fine so long as the --require/--import flags continue to work as they already do. It would be nice to have some additional mechanism for APMs to internally wire up worker support when loaded programmatically though. So a require('some-apm') could internally do Worker.useLoaderHooks(...) to auto-load the library into any spawned worker. Or better, having a method to bootstrap into every worker with an explicit separate connection from the main MessagePort.

joyeecheung · 2024-08-05T16:15:16Z

I noticed that with nodejs/node#53787 a hook-specific inheritance setting can be somewhat awkward, if we want to make that config to support "preload this file for all Node.js instances, main thread or workers". The best experience overall I think would be something like users specifying a noderc entry to preload cool-apm and by default, this gets loaded in all threads, which is going to be more akin to relying on -r inheritance described by @Qard just that it's configured with a file on disk. And then we could have some JS API for removing/querying loaded hooks in the current instance or changing preloads when spawning a worker. If the preloaded file registers the inheritance as part of module.registerHooks() it would be weird since there is already a preload config, so either we do some internal deduplication, which seems awkward, of users get the hooks loaded twice, which can be a bug.

joyeecheung · 2024-08-14T09:41:05Z

We discussed in the meeting this week and we agreed that we should let a dedicated preload configuration to handle this, ideally via something like nodejs/node#53787

To summarize why specifier + parentURL based inheritance doesn't work:

This adds the question of "how to resolve the loader" - module.register assumes the loader is resolved as ESM e.g. the matching exports condition is import, which doesn't really make sense for a loading API that's actually neither import() nor require(). It also follows the ESM resolution rules which are different from CJS resolution rules (e.g. NODE_PATH support) so it doesn't work well with the universalness that we are aiming. The resolved URL is also not deterministic (can be affected by registered loaders), so it doesn't really serve what I thought would be nice to have - a unique, predictable identifier for the hook.
For a dependency chain like instrumentation.js -> open-telemetry -> import-in-the-middle/hook.mjs (which gets passed to the registration API), automatically preloading import-in-the-middle/hook.mjs in child workers is meaningless without also preloading instrumentation.js and open-telemetry. But if the entire chain of instrumentation.js -> open-telemetry -> import-in-the-middle/hook.mjs is configured to be preloaded in child workers, having import-in-the-middle/hook.mjs automatically preloaded too is a conflict and leads to double registration.
Actually in the case of 2, IIUC that the registration model is already pretty broken for module.register, that is why to actually inherit the consuming code of the low-level hooks the loader worker has to be a singleton, but then a singleton means you cannot have different configurations for different workers/the main thread which was problematic and why there were regressions. The only way I can think of for this model to work is for all the consuming code to also take options that they are supposed to load via a URL instead of a user-provided object, like what module.register does, so that everything can be sent over to a new worker via URLs to code that should be evaluated again, and it should not have states that cannot be serialized and copied over to another worker. But from what I can tell no existing hook users actually do this pattern (this also requires them to use import() or require() to get the options, which is another can of worms), and asking them to change to this pattern is certainly breaking.

At the end of the day I think for auto registration of end module hooks to work it still needs automatic inheritance for arbitrary scripts that consume the low-level hook, so automatic inheritance of a low-level hook module itself is just redundant.

This was referenced Jun 22, 2024

ESM loaders cannot be defined via Worker option execArgv in v20 nodejs/node#47747

Closed

process: port on-exit-leak-free to core nodejs/node#53239

Merged

guybedford mentioned this issue Jun 23, 2024

doc: add synchronous hooks proposal #198

Open

GeoffreyBooth mentioned this issue Jun 24, 2024

Node.js Technical Steering Committee (TSC) Meeting 2024-06-26 nodejs/TSC#1584

Closed

mhdawson mentioned this issue Jun 24, 2024

Node.js Technical Steering Committee (TSC) Meeting 2024-06-26 nodejs/TSC#1585

Closed

joyeecheung mentioned this issue Aug 27, 2024

Support .noderc or similar file-based initialization configurations? nodejs/node#53787

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Maintaining hook module registration for on-thread hooks #203

Maintaining hook module registration for on-thread hooks #203

guybedford commented Jun 20, 2024 •

edited

Loading

joyeecheung commented Jun 20, 2024 •

edited

Loading

GeoffreyBooth commented Jun 20, 2024

mcollina commented Jun 21, 2024

ShogunPanda commented Jun 21, 2024

joyeecheung commented Jun 21, 2024 •

edited

Loading

joyeecheung commented Jun 21, 2024

mcollina commented Jun 21, 2024

guybedford commented Jun 21, 2024

Qard commented Jun 21, 2024

guybedford commented Jun 21, 2024

guybedford commented Jun 21, 2024

ShogunPanda commented Jun 22, 2024

Flarna commented Jun 24, 2024

guybedford commented Jun 24, 2024 •

edited

Loading

jsumners-nr commented Jun 24, 2024

Qard commented Jun 24, 2024

joyeecheung commented Aug 5, 2024 •

edited

Loading

joyeecheung commented Aug 14, 2024 •

edited

Loading

Maintaining hook module registration for on-thread hooks #203

Maintaining hook module registration for on-thread hooks #203

Comments

guybedford commented Jun 20, 2024 • edited Loading

joyeecheung commented Jun 20, 2024 • edited Loading

GeoffreyBooth commented Jun 20, 2024

mcollina commented Jun 21, 2024

ShogunPanda commented Jun 21, 2024

joyeecheung commented Jun 21, 2024 • edited Loading

joyeecheung commented Jun 21, 2024

mcollina commented Jun 21, 2024

guybedford commented Jun 21, 2024

Qard commented Jun 21, 2024

guybedford commented Jun 21, 2024

guybedford commented Jun 21, 2024

ShogunPanda commented Jun 22, 2024

Flarna commented Jun 24, 2024

guybedford commented Jun 24, 2024 • edited Loading

jsumners-nr commented Jun 24, 2024

Qard commented Jun 24, 2024

joyeecheung commented Aug 5, 2024 • edited Loading

joyeecheung commented Aug 14, 2024 • edited Loading

guybedford commented Jun 20, 2024 •

edited

Loading

joyeecheung commented Jun 20, 2024 •

edited

Loading

joyeecheung commented Jun 21, 2024 •

edited

Loading

guybedford commented Jun 24, 2024 •

edited

Loading

joyeecheung commented Aug 5, 2024 •

edited

Loading

joyeecheung commented Aug 14, 2024 •

edited

Loading