-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hooks thread direction #201
Comments
My preference is 3. I don't feel we're getting much from loaders being off-thread, certainly not enough to justify the significant complexity. Also, we don't necessarily need a thread to support async code behind a require, we just need related code to be on a separate event loop. Anna previously did a bunch of work in this realm with synchronous-worker. We could build a similar construct in to Node.js to contain related work we need to syncify. Alternatively, we can also just look at what cases need async presently and figure out if we can just provide ways to do them synchronously. For example, http imports only need async because we don't have a sync http client. |
The work towards (2) is very impressive and seems like it is close to fruition in finding a consistent and sensible model. (3) is also viable provided that we could have a top-level async hook like preImport that runs for the main process import as well as for every dynamic I could support either option (2) or (3). |
I think (3) results in the easiest to understand and maintainable solution for node core and hooks maintainers. I see not that much of a gain we got from moving loaders off-thread. But I'm for sure biased a bit as an APM user. One target of the off thread setup - to get rid of the memory allocated by hooks (e.g. transpilers,...) - is not even on the plate anymore. And I doubt this will be ever done looking at the complexity the off thread solution has already now. |
As for 2, I think it's the only way to keep the off-thread model. It's incredibly complex. I think 3 is possible if we add an API or module to "deasync" based on spawned worker thread. Essentially my preference is 3, 2, 1. |
It sounds like many of us (myself included) don't know what the use-cases for async hooks are. My gut says there surely are legit ones. There is for sure complexity in supporting it, but that's mostly done. In that vein, I think: 2, 3, 1. Perhaps Anna's solution is a better option for syncifying things. But I think we are all too tired of this to invest in a completely new approach. If we can definitely say async hooks are not necessary (or too bespoke to warrant supporting in core): 3, 2, 1. |
It feels like the sentiment is option 3. And while some loaders need to be async, my guess is that most of them can be totally sync, and these sync loaders are today paying the overhead needed to support async loaders. Feels like making loaders need sync, and forcing async loaders to syncify their stuff by them creating worker threads makes sense. Makes simple things simple, make complex things possible. So, and I say it with a heavy hear, I prefer option 3. If not, option 2. Option 1, for my use case (mocking modules) makes things complex and really heavy on memory, so is not really an option in my opinion. |
But if option 3 is implemented, you should re-invite me to give the ESM loader talk again 😂. |
I'd prefer 3 as well. The current async off-thread model has been revealed relatively fragile, with various edge cases difficult to solve in userland, or even wrap our heads around. |
While I worker on the multiple chain for 2, I confirm it was insanely complex before my changes and it will stay insanely complex after them. I agree with @mcollina that 2 is the only viable option for fully-async hooks. But I do also prefer 3 as well, with a caveat: we need to another a new API that allows to synchronously wait for another spawned thread. This way we have the following benefits:
If can't provide this, then I suggest to keep 2. |
I think either 3 (preferred) or 2 (if there are usecases that are covered only by 2 that are absolutely necessary). But the latter needs more details/decisions concerning corner cases. Also, IMO the maintenance difficulties for this part of the code that we get by using the off-thread solution need to be analyzed. Getting changes to work in a stable way is difficult with the off-thread variant, especially with the increased api surface because of the module.register() being allowed only on some of the threads with inheritance rules for descendants, etc. In case of the latest variant of the single thread for many workers I think the contributions of @ShogunPanda from #53332 go in the right direction for solving the problem but some cases on that branch and also on #53200 are not yet defined:
fwiw, I think if the new proposal from @joyeecheung will get to cover the esm and cjs cases and adding the |
It seems you are describing |
@joyeecheung I mean that using |
Unless there is a standardized API for doing this, I think in general it's best to let the user land be creative when developing a helper that doesn't have to be implemented in Node.js core. When it gets mature enough or becomes a de-facto standard we can consider bringing it into core i.e. undici style. But developing it in core from scratch is suboptimial (think of release churn, CI flakes, compatibility across release lines, etc.). It can be a user-land package under the Node.js organization if necessary. Also since it likely only relies on Web and JS APIs this can be reused in browsers and other runtimes which can be better for the JS ecosystem. |
@joyeecheung I agree. But I think we should have a standard way to do it ourself in order not to repeat in several parts of the codebase. We could even have this in NPM and eventually add it as a dep. WDYT? |
Having something on npm SGTM if someone is up for designing an API and implementing it (at this point I would be surprised if there aren’t already some existing helpers on npm TBH). Although that sounds like a something that can happen in parallel, not really a blocker for synchronous module hooks IMO. |
Sounds fair. I'll develop an API soon. |
Possibly related: https://github.com/un-ts/synckit |
Vitest was hoping to use the Node.js loader API when it stabilizes. Unfortunately, Vitest uses Vite plugins to resolve import paths and load modules and they are always async - this is actually the main reason why Vitest exists in the first place (jest's sync nature was too limiting), so this is very much our use case:
I am unaware of the current internal implementation, so I have a few questions that might be important to us. What are the possible performance issues with manual |
The in-thread hooks being proposed in #198 will only support synchronous hook functions, since it intends to replace the wide-spread monkey patching in the user land which are generally in-thread, synchronous hooks (e.g. see require-in-the-middle/pirates), as For a "loading plugins that are ESM" scenario, with the in-thread hooks users could still spawn their own worker and From a performance perspective, in-thread synchronous hooks will run typically faster than asynchronous off-thread hooks if the asynchronous work doesn't involve network access, because worker startup + locking + messaging overhead would be non-trivial, and asynchronous FS APIs have a lot of overhead that the synchronous FS APIs don't have. #198 (comment) might be an intersting data point (tsx's synchronous hooks based on CommonJS loader monkey patching is ~2.5x faster than |
So for the last six months or so, the top priority of the loaders work has been fixing the last bug standing in the way of stability, where the separate thread we spawn to run module customization hooks is unintentionally duplicated for each application worker thread rather than shared between all of them. A (stable) fix might now be almost at hand, but it was raised that we should perhaps reevaluate whether the intended behavior is what we want. As I see things, we have three options, with varying pros and cons:
Keep things as they are, where there’s one hooks thread per application thread.
NodeRealm
,ShadowRealm
or module compartments.)Run the hooks in a dedicated thread that supports all threads in the process, with a separate chain of hooks for each application thread.
Move the module customization hooks back on the main thread and change the
resolve
andload
hooks to be synchronous. (A big reason that the threads are off-thread currenly is so that async hooks can affect sync CommonJSrequire
calls.) Similar to (or could be combined with/replaced by) @joyeecheung’s proposal for synchronous main thread hooks.I’m not sure if it would be technically feasible to support async hooks for
import
and sync hooks forrequire
; and on a product level I don’t think we want separate hooks APIs forrequire
andimport
. If others diagree with either of those assessments then I guess we could have fourth and/or fifth options to consider. I honestly don’t have a preference between any of the options; I just want to get this API stable, ideally before Node 23 in October.Aside from the overall “which option do you prefer” question, some other questions I have for the folks who have been with us on this journey:
How necessary is it for the
resolve
andload
hooks to be async? Do you commonly write asyncresolve
and/orload
hooks, where refactoring it to be sync would be difficult or impossible? (We are considering taking the “syncify async work via a separate thread” code from the hooks thread and splitting it off into a userland package that libraries could use as a general utility. We might also provide some kind of sync network request API similar toreadFileSync
.)How necessary is isolation for hooks code, and on what level is it desired? Even the current API doesn’t isolate hooks from other hooks running in the same chain; they’re only isolated from application code.
@nodejs/loaders plus the participants of #103: @cspotcode @giltayar @arcanis @bumblehead @bizob2828 plus participants of nodejs/node#53332: @VoltrexKeyva @Flarna @alan-agius4
The text was updated successfully, but these errors were encountered: