Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entrypoint Hooks (carry over discussion from Austin Collab Summit) #43408

Open
jasnell opened this issue Jun 13, 2022 Discussed in #43384 · 57 comments
Open

Entrypoint Hooks (carry over discussion from Austin Collab Summit) #43408

jasnell opened this issue Jun 13, 2022 Discussed in #43384 · 57 comments
Labels
discuss Issues opened for discussions and feedbacks. loaders Issues and PRs related to ES module loaders

Comments

@jasnell
Copy link
Member

jasnell commented Jun 13, 2022

Originally had this as a discussion in #43384


At the Austin Collaborator Summit, there was significant discussion around the need for a more well-defined startup lifecycle with a clearer boundary between the preload phase and the loading/evaluation of the user entry point. The use cases include more reliable handling for APMs, dynamic transpilers, diagnostic tooling, and more. I took the task of working up an initial proposal. Here is that proposal:


Entrypoint Hooks

Currently, the Node.js startup process consists of a single bootstrap phase in which the Node.js core internal mechanisms and environment are set up followed by the loading and instantiation of the user-provided entry point script.

stateDiagram-v2
  state "Node.js Startup" as A
  state "Preloads (sync eval)" as B
  state "User entry point script (sync eval)" as C
  state "Start event loop" as D
  state "Process preload and entry point async tasks" as E
  state "Run event loop" as F
  [*] --> A
  A --> B
  B --> C
  C --> D
  D --> E
  E --> F
  F --> [*]
Loading

The User Entry point here is the script that is provided as the argument to the node binary (e.g. node foo.js, foo.js is the User Entry point).

Historically with Node.js, there have always been scenarios where it is desirable to load and run code before the User Entry point performs any actions. This can be accomplished with several methods:

  • Require-first: By strategically positioning require and import statements at the beginning of the User Entry point so that they are loaded and evaluated before anything else. This is the mechanism used typically by many Node.js APMs.
  • Wrapper/Loader: By using an alternative user entry point that ensures that certain code is loaded and evaluated first before the actual user entry point is loaded. This is the mechanism used typically by certain test frameworks, serverless environments like lambda.
  • Preloads: By using the Node.js -r command-line argument, Node.js can be instructed to load and evaluate one or more CommonJS scripts synchronously before loading and evaluating the user entry point script. This is used, for instance, by tools like Node.js Clinic to preload diagnostic tooling into the Node.js process.
  • Module Loaders: By providing an alternative module loader implementation using the still experimental loader API, it is possible to execute startup code the first time a module is loaded – including the user entry point module. This is the mechanism used by tools such as ts-node, for instance.

While each of these have historically been effective, they each suffer from a number of limitations, not the least of which is the lack of a clear separation between the execution of the preload code and the user entry point. Take, for instance, the following example:

Imagine a preload script with a simple one-line of code:

// preload.js
setImmediate(() => console.log('preload');

And a User Entrypoint script with the following:

// entry.js
console.log('entrypoint');

Now run the node binary as:

node -r ./preload.js entry.js

The order of the statements printed will be:

entrypoint
preload

This is because while the preload script does run before entry point script, it schedules async activity that does not get invoked until after the event loop has started, after the entry point script has been evaluated. While waiting for the preload script to complete, a lot of user code can run.

In other words, while there is a clear boundary at which preload can begin, there is no such boundary for when preload completes.

This is a proposal for establishing a clearer lifecycle boundary

Proposal

In the proposed new model, a new Entrypoint Hook phase is introduced into the Node.js startup following the completion of the bootstrap. During the Entrypoint Hook phase, one or more preload scripts can be loaded and evaluated in a user-defined order, in precisely the same way that preload scripts (using the -r argument) are loaded except for one very important distinction: Immediately after loading and evaluating these preload scripts, the Node.js event loop will be started to allow any asynchronous operations initiated by those to be run to completion. When there are no further async tasks for that first run of the event loop to complete, the entry point hook phase of the bootstrap will be considered to be complete, the event loop will be reset, and the user entry point will be loaded and evaluated, continuing the Node.js startup just as it does today. If there are no preload scripts to run, this entire new phase is skipped.

stateDiagram-v2
  state "Node.js Startup" as A
  state "Preloads (sync eval)" as B
  state "Start event loop" as C
  state "Process preload async tasks" as D
  state "Stop event loop" as E
  state "User entry point script (sync eval)" as F
  state "Start event loop" as G
  state "Process entry point async tasks" as H
  state "Run event loop" as I
  state "Entry point hook phase" as J
  state "User entry point run phase" as K
  [*] --> A
  A --> B
  state J {
    B --> C
    C --> D
    D --> E
    E --> F
  }
  state K {
    F --> G
    G --> H
    H --> I
    I --> [*]
  }
Loading

With this approach, the preload scripts run during the Entrypoint Hook phase are permitted to fully complete and can alter the user entry point before it begins.

Importantly, at the end of the entry point hook phase, there are no pending async tasks of any kind carrying over into the evaluation of the user entry point script. The entry point hooks may allocate handles that persist across the boundary between phases (e.g. network handles, file descriptors, etc) but those will have no pending i/o by the end of the phase.

Use Case: Serverless

In the serverless use case, a serverless host environment can use the entry point hook phase to load any supporting framework code and initialization process it needs before completing the actual user entry point script.

Use Case: APMs/Diagnostic Tools

In the APM use case, diagnostic tools can use the entry point hook phase to load any diagnostic instrumentation it needs to prepare, even if that tooling is initialized asynchronously (e.g. to query file system or network for license or configuration data)

Use Case: Dynamic Transpilers

Because the entry point hook is guaranteed to run to completion before the start of the user entry point, they can be used to implement dynamic transpilation of the user entry point before it completes. For instance, a TypeScript entry point hook can transpile a typescript file passed in as the user entry point and trigger Node.js to load and execute the compiled JavaScript result rather than trying to run the typescript file that was provided:

What about startup time? Cold starts?

Entrypoint Hook scripts will have an impact on Node.js binary startup time when used. There are, fortunately, mechanisms for mitigating such costs. It would be possible, for instance, to capture a snapshot of the preloads such that loading and initial evaluation cost is reduced in exactly the same way that we have created snapshots of the Node.js bootstrap and are working to create snapshots of the user entry point. Preloads, however, are not trivial and effort will need to be made to ensure a minimal performance cost.

What is the relationship to Loaders?

Pluggable loaders are invoked as a result of require() or import (static or dynamic). The entry point hooks run once immediately upon start of the Node.js process or worker thread startup, and that is it. As such, they serve two entirely different purposes.

@jasnell
Copy link
Member Author

jasnell commented Jun 13, 2022

@joyeecheung @trevnorris @addaleax ... would absolutely love your input on this.

@F3n67u F3n67u added the discuss Issues opened for discussions and feedbacks. label Jun 14, 2022
@joyeecheung
Copy link
Member

The proposal makes sense to me in general, though regarding:

The entry point hooks may allocate handles that persist across the boundary between phases (e.g. network handles, file descriptors, etc) but those will have no pending i/o by the end of the phase.

This sounds a bit unsafe to me. I'd suggest to make this a non-goal initially and then try to see if we can actually support it safely.

@jasnell
Copy link
Member Author

jasnell commented Jun 15, 2022

It's a bit tricky, yes, but I'm not sure how else we best support a few of the use cases. APMs, for instance, are likely to want to initiate network connections or file descriptors that persist through the lifetime of the process.

@mhdawson
Copy link
Member

I wonder if in order to handle the cases mentioned in the previous comment, there need to be 2 hooks. One which is the APMs can use for tasks that must be done before user code starts and one that runs in parallel once user code runs, the expectation being that handles that need to persist would be only be created in the second one.

@jasnell
Copy link
Member Author

jasnell commented Jul 5, 2022

I don't think a second hook would really help matters or make things any less complicated, and after thinking about it further, I'm not convinced that allowing the entrypoint hooks to allocate handles is at all unsafe. A bit tricky, perhaps, but not unsafe.

@mcollina
Copy link
Member

mcollina commented Jul 6, 2022

What's the process of making this a reality? I think we should really care about the dynamic transpilers goal here.

@jasnell
Copy link
Member Author

jasnell commented Jul 6, 2022

We should likely make this a strategic initiative and make sure it's on an upcoming tsc agenda. The biggest piece tho is just going to be getting it implemented.

@bnb
Copy link
Contributor

bnb commented Jul 6, 2022

Massive +1 to this conceptually, especially given the TS point.

While I personally deeply dislike using TS, the ecosystem has widely adopted it and being able to support something along the lines of node index.ts natively would be beneficial to the continued success of Node.js in an environment where every other runtime I'm aware of directly supports similar functionality.

Happy to do whatever I can to support this moving forward ❤️

@GeoffreyBooth
Copy link
Member

Next time please tag @nodejs/loaders. The transpilation use case is already covered by that API, and it’s not clear to me how a separate entrypoint phase would be a better alternative. The loaders API can handle dynamic import(), which an entrypoint phase presumably could not.

For the use case of instrumentation that wants to load configuration over the network, I would think that top-level await within a loader should be able to load such data, and the loader would wait for it before processing any resolution and loading of user code. If not, that’s probably a change we could make. I would think the same is probably true for any serverless frameworks.

@GeoffreyBooth GeoffreyBooth added the loaders Issues and PRs related to ES module loaders label Jul 12, 2022
@Qard
Copy link
Member

Qard commented Jul 13, 2022

Loaders doesn't help us for CJS, which is the vast majority of code out there.

I feel we should also bring up the mechanism by which this additional phase is triggered. I think it would be good to be able to configure it through package.json for the cloud functions case where you generally do not have direct control of the run command to add args, and often don't have env control either to use NODE_OPTIONS. If the node binary could automatically pull configuration from the package.json, if present, it could sidestep that limitation. And to be clear, I suggest that approach as a fallback to explicit cli args, not as a replacement.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Jul 13, 2022

Loaders doesn’t help us for CJS, which is the vast majority of code out there.

The current loaders API can be extended to support CommonJS. It’s been a low priority because --require already covers what you’d want loaders for in CommonJS, since in CommonJS regular packages can do things like monkey-patch Node; but for completeness the plan is for the loaders API to eventually apply to both.

I think it would be good to be able to configure it through package.json for the cloud functions case

We’ve been discussing this for loaders as well. Someone more familiar with cloud functions than I am told me that the way those systems work is that Node is already running before the user code for a particular cloud function is loaded, so anything that that cloud function wants to do regarding patching Node has to be stuff that can happen after Node has fully started up. We’ve discussed ideas like import { registerLoader } from 'module' to add a loader after an app is already running, which would only apply to code loaded after that point (so via dynamic import(), say). But unfortunately as far as I’m aware, what you really want, to be able to somehow pass configuration/flags to Node itself that would apply to your cloud function, doesn’t seem to be possible for most cloud function environments.

@Qard
Copy link
Member

Qard commented Jul 13, 2022

APM doesn't really care that cloud function platforms have their own bootstrap stuff in there. We want to be to run when the platform starts the Node.js process because otherwise it will likely load modules for its own use and leave us unable to patch them at app code load time because it's already in the cache.

@mcollina
Copy link
Member

My goal for this hook is to reach a state where the following user experience is possible:

$ node script.ts
Typescript support is missing, install it with:
npm i --location=global typescript-node-core
$ npm i --location=global typescript-node-core
...
$ node script.ts
"Hello World"

(I picked the typescript-node-core name at random, I would be extremely happy if that could be ts-node)

Note that script.ts could evaluate to either cjs or mjs depending on the tsconfig.json file.

@GeoffreyBooth
Copy link
Member

My goal for this hook is to reach a state where the following user experience is possible

The user experience of ts-node is literally ts-node script.ts. I'm not sure how we would get a helpful prompt like that even with this proposal, unless we hard-code something about .ts files into Node.

@mcollina
Copy link
Member

The user experience of ts-node is literally ts-node script.ts. I'm not sure how we would get a helpful prompt like that even with this proposal, unless we hard-code something about .ts files into Node.

The idea we discussed at the collab summit was to use the mechanism described here to implement the use case described in #43408 (comment).

@bmeck
Copy link
Member

bmeck commented Jul 13, 2022 via email

@jasnell
Copy link
Member Author

jasnell commented Jul 13, 2022

This all just sounds like a loader with preloadCode setup to spin the even loop...

I guess I have a different mental model of loaders. The way I envision it is essentially:

  1. Loaders deal with the actual resolution, loading and parsing of scripts/modules regardless whether they are preload or main phase.
  2. The entry point hook proposal would "simply" introduce a clear boundary between the preload and main phase such that preload async tasks are run to completion before evaluating the user entry point.

... but not the discarded existing features that seem to cover more than the issue implies

It's important to clarify that this proposal does not discard anything. The proposal here is about a concept not an implementation of it. If there are existing capabilities that can be extended/modified to cover this, then fantastic. If those need to be re-evaluated or modified to cover this, then ok. But absolutely no prior work is being discarded here.

@bmeck
Copy link
Member

bmeck commented Jul 13, 2022 via email

@jasnell
Copy link
Member Author

jasnell commented Jul 13, 2022

This issue explicitly states the US case isn't covered. I'm not sure how to reconcile that with your comment

Which use case isn't covered? The proposal just states that loaders are addressing a different problem, which they do.

I'll just stress again, there's no prior work being discarded here. This proposal might require some of the prior work to be changed a bit but nothing is being discarded or ignored.

@bmeck
Copy link
Member

bmeck commented Jul 13, 2022 via email

@GeoffreyBooth
Copy link
Member

The proposal just states that loaders are addressing a different problem, which they do.

@jasnell I’m sure you didn’t intend this, but from the perspective of those who’ve worked on loaders, this proposal comes across as either ignorant of or dismissive of that work; primarily because this purports to solve the same use cases as loaders: https://github.com/nodejs/loaders/blob/main/doc/use-cases.md. It sounds like you have motivation and energy to devote to these issues, and if so we would welcome your help and input in improving loaders and better satisfying these use cases. The loaders design and road map are both at https://github.com/nodejs/loaders; if there’s something lacking in the design that this proposal can address, like something involving restarting the event loop, I would welcome proposals there.

Regarding use cases, all three of the ones mentioned here are already solved. The package ts-node supports runtime transpilation of TypeScript for both CommonJS and ESM, using loaders. Datadog has an instrumentation package that supports both CommonJS and ESM, using loaders. I’m not sure what needs of serverless platforms we aren’t supporting, but AWS lambda, at least, supports ESM lambda functions.

None of this is to knock this proposal. I’m not clear on what the advantages are of stopping and restarting the event loop in this way, since the use cases you list here are already solved without needing to change anything about startup or the event loop. If there are benefits to be had from that, such changes would seem fine to add. However even that seems arguably possible today, by adding top-level await to a loader:

// loader.js
await new Promise(resolve => setTimeout(resolve, 1000))
console.log('hello from loader.js')
// app.js
console.log('hello from app.js')
node --no-warnings --loader ./loader.js app.js
hello from loader.js
hello from app.js

Of course this won’t work in CommonJS, but the loaders API could be extended to include CommonJS if desired.

@jasnell
Copy link
Member Author

jasnell commented Jul 13, 2022

Then please consider this input to the loaders work and feel free to do with this issue as you will.

@bnb
Copy link
Contributor

bnb commented Jul 13, 2022

@GeoffreyBooth given the context you provided here, what do you think the possibility of addressing @mcollina's thoughts on supporting node index.ts are? I believe the (high level) avenue to that functionality which was proposed through the concept of entrypoint hooks was for Node.js to look for a system TS and kill the process if there wasn't one. This would mean we don't have to ship support for TS nor care about the version that's running, ideally. Could/would loaders be a good fit for that?

@GeoffreyBooth
Copy link
Member

Starting with your last question first:

Could/would loaders be a good fit for that?

This is what loaders are. They’re user-defined logic for customizing module resolution and loading, and also for injecting preload code. We’ve been discussing renaming “loaders” to something like “hooks” because some of the newer proposals go beyond just module loading to involve customizing other parts of Node, like all file I/O for example; but for now loaders are the ESM equivalent of require.extensions. One of the two example loaders in the docs is a transpiler loader: https://nodejs.org/api/esm.html#transpiler-loader, which is what you’d want for TypeScript.

what do you think the possibility of addressing @mcollina’s thoughts on supporting node index.ts are?

I think first we need to compare against the status quo. You can run this code today:

mkdir test && cd test
echo '{"type": "module"}' > package.json
npm install --save-dev typescript ts-node
echo "console.log('hello!' as string)" > script.ts
export NODE_OPTIONS="--no-warnings --loader ts-node/esm"
node script.ts  # Prints 'hello!'

So really the question is, what about this do you want to improve, and how? Providing the user with some kind of hint where they can figure out the above steps (or similar ones) so that node script.ts or node --loader ts-node/esm script.ts work for them? Or is the goal to eliminate some of these steps entirely, like by having Node automatically install some of these dependencies (and if so, which ones?). Or have configuration like a package.json define that a particular loader should be used, to avoid the need for --loader? (That raises all sorts of other issues; search for “package loaders” here and in https://github.com/nodejs/modules to get a sense of the complexity.)

This is probably a discussion that’s better had in #43818, as that issue is specifically about this, whereas this thread is about the “entrypoint hooks” concept. I don’t think TypeScript or transpilation generally is a good fit for the “entrypoint hooks” idea because transpilation needs to continue to be possible after the app event loop has started, to handle dynamic import.

@cspotcode
Copy link

One note about the NODE_OPTIONS example, it gets messier in practice. ts-node/esm is resolved relative to the working directory, not the target script nor a global install dir. Keeping in mind common shell scripting use-cases with symlinks on the PATH, where working dir is vastly different than dir containing the script, dir containing nodejs, dir containing globally installed modules.

If the user wants a scripting runtime with native TS support, then the required NODE_OPTIONS variable can get messy.

@GeoffreyBooth
Copy link
Member

If the user wants a scripting runtime with native TS support, then the required NODE_OPTIONS variable can get messy.

I agree; I think in practice most people would just pass the flag, like node --loader ts-node/esm script.ts, and define that in package.json "scripts" as the command to start the app. But the broader question still stands: is this about removing the need for passing a flag, or something else? Also let’s please continue discussing in #43818 so as not to pollute this thread.

@cspotcode
Copy link

Do you think something like this pseudocode can work to support CJS?

// entrypoint-hooks.cjs
const {waitForTask} = require('entrypoint-hooks-api');
waitForTask(new Promise((res, rej) => { /* initialization logic */});

@mcollina
Copy link
Member

I don't understand it.

@cspotcode
Copy link

cspotcode commented Jul 27, 2022

I might be misunderstanding the goals, but I think there were two competing requirements:

a) allow hooks a clear way to finish their async initialization before the entrypoint runs
b) allow hooks to start some sort of background work and leave it running while the app / the entrypoint runs

(a) is covered by top-level await if the hooks file is ESM, and (b) doesn't work well if the event loop is drained before the entrypoint runs.

So to grant CommonJS hooks the ability to implement (a) while still supporting (b), we expose an API that hooks can use, telling node to wait for a task to complete before running the entrypoint.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Jul 27, 2022

I’m not sure if we could do that without stopping the event loop.

If code that’s running in --import includes top-level await, wouldn’t that be equivalent to stopping and starting the event loop? So if you ran something like this:

node --import a.mjs b.cjs

Where a.mjs had top-level await, I would think that that would do the same thing as having a special mode where the event loop is stopped and restarted?

And if what you wanted to run in this “before the CommonJS entrypoint is evaluated” phase is CommonJS, then a.mjs could just import it before its top-level await. So you would need an ESM wrapper, but I think the same goal would be achieved.

@bmeck
Copy link
Member

bmeck commented Jul 27, 2022 via email

@cspotcode
Copy link

Might be some confusion here between "hooks installer" and "loader hooks," since the former will probably execute within the main thread, and the latter will probably move off-thread.

I am assuming that the "hooks installer" script runs in the main thread on the same event loop that will eventually run the entrypoint.

So yeah, pretty sure event loop blocking is not the goal and not a requirement here. Perfectly fine to allow hooks installer to run async stuff, with a normal spinning event loop. And perfectly fine to use that same event loop to run the entrypoint.

Issue to be solved is giving hooks the ability to make node wait for some async operation to finish before it runs the entrypoint. Top level await and waitForTask give hooks that ability.

Forcing node to wait for a sync task is a non-issue: just do it and node will wait, since the hooks installer runs in the main thread and can block it.


I guess I'm a tad unclear on why to stop the event loop since it will immediately resume when the entrypoint runs. Maybe something to do with node's C++ internals or whatnot.

@mcollina
Copy link
Member

Might be some confusion here between "hooks installer" and "loader hooks," since the former will probably execute within the main thread, and the latter will probably move off-thread.

Aggreed! Running inside the same thread/isolate think it's a fundamental part of this proposal.

So yeah, pretty sure event loop blocking is not the goal and not a requirement here. Perfectly fine to allow hooks installer to run async stuff, with a normal spinning event loop. And perfectly fine to use that same event loop to run the entrypoint.

If we don't stop the event loop some of the order of the events for the first run of the event loop would be different. We might be ok with that, but it's an important side effect.

@cspotcode
Copy link

some of the order of the events for the first run of the event loop would be different.

Interesting, where would the authors of this ticket suggest I go to learn more about these events, how they will be re-ordered, and whether or not that re-ordering is a problem? If it was already covered in this ticket, then that's my fault, but re-skimming I could not find it.

@guybedford
Copy link
Contributor

I'm not sure exactly which timing Matteo is referring to, but in the initial loaders PR, we attempted to make loaders apply for all module loads, but had to back this out to only apply it to ES modules with a special check due to async hooks core tests failing. You can basically see it if you change the variable in https://github.com/nodejs/node/blob/main/lib/internal/modules/run_main.js#L76 of useESMLoader to always being true and a bunch of async_hooks core tests will fail due to a change of async timings.

@GeoffreyBooth
Copy link
Member

You can basically see it if you change the variable in https://github.com/nodejs/node/blob/main/lib/internal/modules/run_main.js#L76 of useESMLoader to always being true and a bunch of async_hooks core tests will fail due to a change of async timings.

Maybe this is worth investigating. If we can find a way to both allow loaders to affect CommonJS and get the async_hooks core tests to pass again, that would be ideal, but assuming that’s not possible then I would be interested to know what tests we would need to change (and/or what async hooks or Node bootstrap changes we would need to make) and what breaking changes to Node that would imply. Maybe we’re okay with those changes as part of a semver-major change, as part of a tradeoff of getting loaders to apply to CommonJS, and that just ships as part of Node 19. It seems worth a deep dive, if anyone has the time.

@Qard
Copy link
Member

Qard commented Jul 28, 2022

Interesting. I'll have to remember to find some time to investigate that. 🤔

@mcollina
Copy link
Member

I didn't know about the failing tests, but there are fundamental event-loop differences between ESM and commonjs. Checkout https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/. In event loop lingo, ESM starts with a loop that's already processing (we did load files and let the event loop spin), while CJS starts with a timer.

@GeoffreyBooth
Copy link
Member

In event loop lingo, ESM starts with a loop that’s already processing (we did load files and let the event loop spin), while CJS starts with a timer.

What if in the next major version of Node, the root entrypoint is always an ESM script? So either the entry point given if it were an ESM file, or in the case of node entry.cjs we would create a fake entry point like import('./entry.cjs') to be the “real” entry point that wraps the user-supplied one. That way the root of the app is always ESM, and we avoid distinctions between how the event loop is or isn’t running on startup.

And yes, I’m proposing that we start optimizing for the ESM case and not the CommonJS one.

@aduh95
Copy link
Contributor

aduh95 commented Jul 29, 2022

What if in the next major version of Node, the root entrypoint is always an ESM script?

A lot of async hooks tests start failing when the bootstrap is asynchronous, which is why it hasn't been done until now – and AFAIK, why we still can't do it now.

@mcollina
Copy link
Member

In addition to what @aduh95 said, moving to ESM entry point will completely change the diagnostics use case and force everybody to use some level of experimental hook or loader.

This is unlikely to happen before loaders have became the go-to solution for this use case - and this include CJS.

I will keep repeating this, no one plans to deprecate, break or stop supporting commonjs.

@GeoffreyBooth
Copy link
Member

A lot of async hooks tests start failing

Sure, but what does this mean? Just that we need to update tests, or will there be user-facing breaking changes if bootstrap goes async?

@mcollina
Copy link
Member

There will user-facing breaking changes to most users of async_hooks.

@Qard
Copy link
Member

Qard commented Jul 29, 2022

Unless the startup phase loop can be resolved and a new one started with the application entrypoint, there's not likely even a path to a solution for async_hooks.

@GeoffreyBooth
Copy link
Member

There will user-facing breaking changes to most users of async_hooks.

Can you provide an example?

All of async_hooks is experimental, so we could cause changes to it even without waiting for the next major release. Reading https://nodejs.org/api/async_hooks.html it’s not clear to me when or why I would use this API (but I’m happy to be educated) so I struggle to imagine what the userland use cases are for it, in terms of who’s depending on it and for what. If async_hooks behaves significantly differently whether the Node entry point is CommonJS or ESM, that feels like a significant bug or shortcoming in async_hooks that should be addressed. Whether this API is used or not shouldn’t be locking someone into one module system or the other, or preventing us from adding an ESM wrapper to CommonJS entry points so that we can make loaders apply to CommonJS.

There’s an almost-finished PR that adds --import as a counterpart to --require (#43942). This might also solve some of these issues, as then someone can run node --import foo entry.js and foo can contain top-level await, so whatever is in foo could do any async operations it wants to before the entry point is evaluated. If the desired library to “preload” with awaiting is CommonJS, it could be passed into --import via a one-line ESM wrapper like await import('foo').

@mcollina
Copy link
Member

All the APM product category for Node.js is based on async_hooks.

@Qard
Copy link
Member

Qard commented Jul 30, 2022

Also, AsyncLocalStorage, which is built on top of async_hooks, is marked as stable. We can't just break compatibility with async_hooks because it would break AsyncLocalStorage too. There is a rather significant chunk of the ecosystem which directly or indirectly relies on async_hooks continuing to work.

@vdeturckheim
Copy link
Member

Yes, I don't think it is feasible to break AsyncLocalStorage anytime soon without breaking the ecosystem. We marked it as stable actually to speed up its adoption :/

@Flarna
Copy link
Member

Flarna commented Aug 1, 2022

Are async hooks really broken by using an ESM entry point?
Clearly several async operations have been executed already at this time but usually noone relies that asyncId starts with value 1.

Or is it expected that during this startup phase already transactions are created which need to be tracked by AsyncLocalStorage?

@GeoffreyBooth
Copy link
Member

Are async hooks really broken by using an ESM entry point?

This is what I'd like more clarification on. Basically, involving ESM at startup (so an ESM entry, using --loader or --import, or if core starts using promises as part of bootstrap beyond the ESM loader) means that unsettled promises already exist when the first line of user code runs. When I read https://nodejs.org/api/async_hooks.html I don't find any mention of concerns about startup, but apparently it's an undocumented requirement of async_hooks that they be initialized before the first promise of any kind—user or Node internal—is created. That's the assumed premise of the proposal on this thread. If anyone can point me to any resources explaining this, especially what can't be achieved if this requirement is unmet, I would appreciate it.

At the collaborator summit someone said that Datadog had an experimental instrumentation library that worked with Node ESM, that would leave experimental once loaders did. Can anyone explain more about this, too? Does this ESM library have any limitations as compared with the CommonJS equivalent, especially as related to async_hooks?

@mcollina
Copy link
Member

mcollina commented Aug 2, 2022

We are lacking quite a few user friendly explanations of async_hooks. When async_hooks are active, every asynchronous activity in Node.js has an id and a parent id. In other terms, there is a causality link between all asynchronous activities. async_hooks enable developers to attach custom behavior and data to all asynchronous actions. The fundamental premise of this approach is to be able to set up the hooks before any asynchronous activity, otherwise they won't have a complete instrumentation, with hard to debug side effects.

@GeoffreyBooth
Copy link
Member

I appreciate everyone trying to convey the importance of async_hooks being registered before any async activity occurs, but I struggle to understand why it’s so vital, or if it’s even happening. Perhaps some code might help. When I tried the useESMLoader = true experiment mentioned above, I got lots of failing tests. The first one in the list was https://github.com/nodejs/node/blob/main/test/parallel/test-async-hooks-correctly-switch-promise-hook.js, so I looked into that one. Here’s a simplified version:

const async_hooks = require('async_hooks');
const process = require('process');

const promises = new Map();

async_hooks.createHook({
  init(asyncId, _type, triggerAsyncId, _resource) {
    promises.set(asyncId, { asyncId, triggerAsyncId });
  },
  before(asyncId) {
    if (promises.has(asyncId)) {
      promises.get(asyncId).before = true;
    }
  },
  after(asyncId) {
    if (promises.has(asyncId)) {
      promises.get(asyncId).after = true;
    }
  },
  promiseResolve(asyncId) {
    if (promises.has(asyncId)) {
      promises.get(asyncId).promiseResolve = true;
    }
  }
}).enable();

async function main() {
  return Promise.resolve();
}

main();

process.on('exit', () => {
  console.log(promises.values());
});

I saved this as test-async-hooks.cjs and ran it under Node 18.7.0 as node ./test-async-hooks.cjs. It printed this output:

[Map Iterator] {
  {
    asyncId: 6,
    triggerAsyncId: 1,
    before: true,
    after: true,
    promiseResolve: true
  },
  { asyncId: 7, triggerAsyncId: 1, promiseResolve: true },
  {
    asyncId: 8,
    triggerAsyncId: 7,
    before: true,
    promiseResolve: true,
    after: true
  }
}

Next I copied it to test-async-hooks.mjs and rewrote the require statements to import statements, leaving everything else identical. Running that as node ./test-async-hooks.mjs printed this output:

[Map Iterator] {
  {
    asyncId: 11,
    triggerAsyncId: 0,
    before: true,
    after: true,
    promiseResolve: true
  },
  { asyncId: 12, triggerAsyncId: 0, promiseResolve: true },
  {
    asyncId: 14,
    triggerAsyncId: 13,
    before: true,
    promiseResolve: true,
    after: true
  },
  {
    asyncId: 15,
    triggerAsyncId: 12,
    before: true,
    promiseResolve: true,
    after: true
  }
}

The hooks captured 3 promises in the CommonJS version versus 4 promises in the ESM version, but in both versions there was one promise which triggered the init and promiseResolve hooks but not the before or after hooks. This leads me to a few questions:

  • Is this a promise created within Node core, that’s pending when the async hooks are registered?
  • If it’s so important that async hooks are registered before any async activity, doesn’t this pending promise already violate that requirement? If so, what specific consequences entail as a result?
  • Neither version captures a promise with an asyncId of 0. Are there promises created during Node bootstrap that are already resolved by the time the async hooks are registered? Is it a problem that these, too, aren’t captured?

In short, at least from this tiny test it seems like Node already violates the rule that async hooks must be registered before any async activity occurs, even for CommonJS. If I had to guess, I would think that all that this means is that instrumentation libraries simply can’t display that activity in their dashboards; and that some async activity can’t be traced back to its parents, but since the parent is something within Node core it’s probably not of much interest to the average application developer. What else am I missing?

@mcollina
Copy link
Member

mcollina commented Aug 2, 2022

If it’s so important that async hooks are registered before any async activity, doesn’t this pending promise already violate that requirement? If so, what specific consequences entail as a result?

Likely nothing because that promise is very likely a one-off, without children. However if that generated a tree of asynchronous activity, that would be problematic.


Most people in this thread that worked on async_hooks would classify them as problematic to maintain. A significant refactor means that we will play whack-a-mole for bugs for a while.


async_hooks is not just about promises, we call them resources and this includes everything that can be asynchronous is Node.js. async_hooks is designed so that hooks can be initialize before everything is even loaded.

In commonjs, we can always execute something before any other code is loaded. In ESM, our entry point is executed last, after the initialization of all the modules we imported is completed. If any of those other modules starts something asynchronous in their initialization, we won't be able to track them.

@bengl can likely explain better how all of this currently works, and if/how having a clear entrypoint hook would make their integration significantly easier too.

@GeoffreyBooth
Copy link
Member

In commonjs, we can always execute something before any other code is loaded. In ESM, our entry point is executed last, after the initialization of all the modules we imported is completed. If any of those other modules starts something asynchronous in their initialization, we won’t be able to track them.

If the requirement is just that users can register async hooks before executing any user code, in order to ensure that all user code-generated async activity can be tracked, that’s a huge difference. We can achieve that today, either through the just-landed --import or through a carefully written ESM entry like this:

await import('./register-async-hooks.js')
await import('./app.js')

Because these are dynamic import()s, no user code anywhere in the tree descending from app.js will be evaluated until after register-async-hooks.js runs and resolves. The brand-new --import flag should allow the same thing at the CLI level, like node --import ./register-async-hooks.js ./app.js.

Or to put it another way, in ESM too we can always execute something before any other code is loaded. If that’s all that’s needed, then I think the module systems are already equivalent with regard to async_hooks?

@Qard
Copy link
Member

Qard commented Aug 3, 2022

Just to clarify the behaviour of that specific test: the promise without a before/after is not created before async_hooks is registered. If it was it never would have run the init at all and would therefore get skipped entirely in the other hook functions because of the has(...) checks.

That promise actually represents the bit of code between the start of an async function and its first await. Due to spec weirdness, a promise is generated representing that part of the function, but it runs synchronously without a continuation so it never triggers the before and after. If you look at the last promise, you'll notice the triggerAsyncId matches the asyncId of that different promise which would normally only be the case if it ran during a continuation callback of it, but in this case it gets connected directly without a continuation.

What matters to avoid crashes with async_hooks is that new init events can correctly resolve the triggerAsyncId edge back to its initiator. If async_hooks was not loaded when that parent was initiated it may produce a resource stack which doesn't reflect reality and then try to unwind past what it knows how to unwind.

Generally async_hooks has a bunch of safety checks to try and gracefully handle conditions where it doesn't understand the state, but they've proven insufficient numerous times in the past so I wouldn't trust it to be able to recover safely.

Also, it has been my experience that most users of async_hooks have generally failed at making their own use of it safe, making assumptions like a before will always have an after, which isn't true if you stop it in a callback before the after for that callback is reached. More commonly I've seen an expectation that before/after should have had an init that was seen, which in your case of having an unresolved promise from before the entrypoint top-level could result in a before/after appearing with no seen init.

These implicit connections between events are why I've always strongly believed async_hooks was a bad design with far too many footguns. This is why I pushed for things like AsyncLocalStorage in core to have APIs which would behave in a much more predictable and understandable way. Unfortunately, ALS doesn't solve all the use cases of async_hooks, and there's been little effort to introduce better APIs for those other use cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues opened for discussions and feedbacks. loaders Issues and PRs related to ES module loaders
Projects
None yet
Development

No branches or pull requests