Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Experiment] async_hooks: optimize fast-path promise hook for ALS #34512

Conversation

puzpuzpuz
Copy link
Member

@puzpuzpuz puzpuzpuz commented Jul 25, 2020

This experiment attempts to remove unnecessary native-to-JS code switches in fast-path for PromiseHooks introduced in #32891. Those switches happen even if certain type of hook (say, before) is not installed, which may lead to sub-optimal performance in the AsyncLocalStorage scenario (as ALS uses only init hook).

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Jul 25, 2020
@puzpuzpuz puzpuzpuz added async_hooks Issues and PRs related to the async hooks subsystem. performance Issues and PRs related to the performance of Node.js. labels Jul 25, 2020
@puzpuzpuz puzpuzpuz force-pushed the experiment/optimize-promise-hook-for-als branch from ed4df22 to bfe3a68 Compare July 25, 2020 18:37
src/async_wrap.cc Outdated Show resolved Hide resolved
src/async_wrap.cc Outdated Show resolved Hide resolved
@addaleax
Copy link
Member

Also, not sure if this was inspired by this issue, but I think you can add: Fixes: https://github.com/nodejs/node/issues/34493?

Remove unnecessary native-to-JS code switches in fast-path for
PromiseHooks. Those switches happen even if a certain type of
hook (say, before) is not installed, which may lead to sub-optimal
performance in the AsyncLocalStorage scenario, i.e. when there is
only an init hook.
@puzpuzpuz puzpuzpuz force-pushed the experiment/optimize-promise-hook-for-als branch from bfe3a68 to 1274aad Compare July 25, 2020 20:11
@puzpuzpuz puzpuzpuz marked this pull request as ready for review July 25, 2020 20:11
@puzpuzpuz
Copy link
Member Author

puzpuzpuz commented Jul 25, 2020

Also, not sure if this was inspired by this issue, but I think you can add: Fixes: https://github.com/[/issues/34493](https://github.com/nodejs/node/issues/34493)?

@addaleax
No, I was recently thinking about potential optimizations which could reduce ALS overhead and came to this idea. As for the issue, I haven't seen it before, but thanks for the heads-up. Still, I'm not sure if that's OK to claim that this PR fixes the issue, as it assumes a performance improvement, but it's not very big. WDYT?

@puzpuzpuz
Copy link
Member Author

Here are the results of benchmark/async_hooks/promises.js benchmark run on CI (https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/652/):

22:14:35                                                                     confidence improvement accuracy (*)   (**)  (***)
22:14:35  async_hooks/promises.js asyncHooks='disabled' n=1000000                            0.90 %       ±2.32% ±3.10% ±4.05%
22:14:35  async_hooks/promises.js asyncHooks='enabled' n=1000000                     **     -5.27 %       ±3.43% ±4.58% ±5.97%
22:14:35  async_hooks/promises.js asyncHooks='enabledWithDestroy' n=1000000           *     -2.64 %       ±2.54% ±3.38% ±4.40%
22:14:35  async_hooks/promises.js asyncHooks='enabledWithInitOnly' n=1000000        ***      5.10 %       ±2.72% ±3.63% ±4.73%
22:14:35 
22:14:35 Be aware that when doing many comparisons the risk of a false-positive
22:14:35 result increases. In this case there are 4 comparisons, you can thus
22:14:35 expect the following amount of false-positive results:
22:14:35   0.20 false positives, when considering a   5% risk acceptance (*, **, ***),
22:14:35   0.04 false positives, when considering a   1% risk acceptance (**, ***),
22:14:35   0.00 false positives, when considering a 0.1% risk acceptance (***)

As expected, there is an improvement in the init hook only case (~5%), but on the other hand it's not that large. So, I'd like to gather some opinions here.

cc @nodejs/async_hooks

@puzpuzpuz
Copy link
Member Author

@addaleax many thanks for the quick review and helpful comments!

@puzpuzpuz
Copy link
Member Author

BTW I've just tried to replace FastPromiseHook with the following no-op implementation which does almost nothing:

static void FastPromiseHook(PromiseHookType type, Local<Promise> promise,
                            Local<Value> parent) {
  Local<Context> context = promise->CreationContext();
  Environment* env = Environment::GetCurrent(context);
  if (env == nullptr) return;

  if (type == PromiseHookType::kInit) {
    promise->Set(context, env->async_id_symbol(),
        v8::Number::New(v8::Isolate::GetCurrent(), 42)).FromJust();
  }
}

On master I got the following:

$ ./node benchmark/async_hooks/promises.js
async_hooks/promises.js asyncHooks="enabled" n=1000000: 185,945.55495370456
async_hooks/promises.js asyncHooks="enabledWithDestroy" n=1000000: 158,913.00117897094
async_hooks/promises.js asyncHooks="enabledWithInit" n=1000000: 182,217.81928900545
async_hooks/promises.js asyncHooks="disabled" n=1000000: 408,466.1162559881

With the dummy PromiseHook:

$ ./node benchmark/async_hooks/promises.js 
async_hooks/promises.js asyncHooks="enabled" n=1000000: 231,443.61998399772
async_hooks/promises.js asyncHooks="enabledWithDestroy" n=1000000: 151,732.07311244417
async_hooks/promises.js asyncHooks="enabledWithInit" n=1000000: 243,183.30819966158
async_hooks/promises.js asyncHooks="disabled" n=1000000: 404,664.95117296715

As you may see, there is a slight improvement (which has to be properly measured in multiple runs, but let's assume it's ~20-25% for now), but the main cost seems to be associated with the PromiseHook being installed in V8.

Are there any optimizations that could be made in V8's PromiseHook implementation? I'm not familiar with V8 internals, so I may be wrong here. Maybe collaborators who are good at V8 can provide any inputs here.

@addaleax
Copy link
Member

@puzpuzpuz Trying to reproduce your results, during profiling there’s two few things that are noticeable:

  • About 40 % of the benchmark is spent creating the Error object in the benchmark, because capturing the stack trace is expensive – and an additional 10 % spent throwing the exception
  • The dummy promise hook is still expensive, but not on the calling V8 side. About 85 % its cost comes from the Object::Set() call, and the rest from the other calls into V8 that we perform.

Generally, reducing the # of calls to Get(), Set() or Call() is a great way to improve performance. That’s certainly not trivial here, though.

@puzpuzpuz
Copy link
Member Author

puzpuzpuz commented Jul 26, 2020

@addaleax
Yes, capturing stack trace is costly. Yet, the benchmark does it in all scenarios, so it's still fair to compare disabled scenario with other ones. If we change the benchmark to plain Promise.resolve(), it should make the overhead only worse.

As for the V8 calls made within the dummy hook, that's a good finding. Unfortunately, I don't see any workarounds that would allow us to avoid Get calls. But I'm glad to hear that the main cost is not on the V8 side. I'll try to do some profiling myself.

addaleax added a commit to addaleax/node that referenced this pull request Jul 26, 2020
Stack trace capturing currently accounts for 40 % of the benchmark
running time. Always throwing the same exception object removes
that overhead and lets the benchmark be more focused on what it is
supposed to measure.

Refs: nodejs#34512 (comment)
@puzpuzpuz
Copy link
Member Author

puzpuzpuz commented Jul 27, 2020

@addaleax I'd like to do a follow-up on further PromiseHook optimizations topic. I could create a separate GH issue, but for now I'm fine with an ad-hoc discussion here.

What I'm thinking of is using Object::SetInternalField()/GetInternalField() methods which seems to have much less overhead than the Set()/Get() methods. They are already used by the slow-path PromiseWrap:
https://github.com/nodejs/node/blob/master/src/async_wrap.cc#L230

With this approach we could assign async id on native side and then pass it into the fastPromiseHook JS hook, so that JS code doesn't need to read it itself.

To implement this approach we could expand count of internal fields to 2 instead of 1 as it is now. But I failed to find the exact place where it's set for native promises. Could you point me at the place?

Update. It appears that we should be fine with a single internal field, so please ignore the above question.

@nodejs-github-bot
Copy link
Collaborator

CI: https://ci.nodejs.org/job/node-test-pull-request/32514/

@addaleax
Copy link
Member

@puzpuzpuz Yeah, that should work, we just need to be very careful about not accidentally reading the async id and interpreting it as a PromiseWrap* or vice versa.

The place where the internal field count is configured is here:

node/configure.py

Line 1325 in 62198d2

o['variables']['v8_promise_internal_field_count'] = 1 # Add internal field to promises for async hooks.

@puzpuzpuz
Copy link
Member Author

@addaleax Yes, fast/slow path hooks interop has to be kept. I'm going to submit a draft experiment PR to follow-up on the idea. And thanks for pointing me at the configuration file.

@puzpuzpuz puzpuzpuz added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jul 27, 2020
@danielgindi
Copy link

I'm trying to understand something here. V8 knows how to generate a stack trace that considers await calls, but the cost is paid when delivered right?
So when we want to query ALS store, can't use the same technique to find the path we went through, and add some additional metadata over the breadcrumbs as we walk through it?

As I noted in the issue, I achieved a much more performant ALS (actually Zone) using a JS only technique that did basically that. It generated a stack trace when querying the Zone.current, looked at ids that were added to the Error stack (by appending them to the wrapped function called with .run), and mapped those ids to the relevant Zone.
I really don't understand why a possible implementation for Zone/ALS should cripple Node by this much?

@Flarna
Copy link
Member

Flarna commented Jul 27, 2020

v8 is able to show the origin of a await (not full long stack traces) but not more (see https://v8.dev/blog/fast-async#improved-developer-experience). If you use plain Promises or callback based APIs there is nothing like for await.

ALS doesn't has this limitation, it follows all async transactions signaled by async hooks - and this comes with a cost.

@danielgindi
Copy link

v8 is able to show the origin of a await (not full long stack traces) but not more (see https://v8.dev/blog/fast-async#improved-developer-experience). If you use plain Promises or callback based APIs there is nothing like for await.

ALS doesn't has this limitation, it follows all async transactions signaled by async hooks - and this comes with a cost.

That's right. But my Zone implementation filled the rest by wrapping callback functions and patching Promises, much like Angular's polyfill did.
The overall impact was between ~10%-~25%. And it was all userland. JS. It was not 97% impact.

puzpuzpuz added a commit that referenced this pull request Jul 28, 2020
Remove unnecessary native-to-JS code switches in fast-path for
PromiseHooks. Those switches happen even if a certain type of
hook (say, before) is not installed, which may lead to sub-optimal
performance in the AsyncLocalStorage scenario, i.e. when there is
only an init hook.

PR-URL: #34512
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
@puzpuzpuz
Copy link
Member Author

Landed in 53870dd

@puzpuzpuz puzpuzpuz closed this Jul 28, 2020
@puzpuzpuz puzpuzpuz deleted the experiment/optimize-promise-hook-for-als branch July 28, 2020 08:10
@puzpuzpuz
Copy link
Member Author

@danielgindi probably it makes sense to continue the conversation in #34493.

@danielgindi
Copy link

@puzpuzpuz Could you run the benchmark snippet in #34493 on your PR and put the results here?

@puzpuzpuz
Copy link
Member Author

@danielgindi sure. Already did that.

puzpuzpuz added a commit to puzpuzpuz/node that referenced this pull request Jul 28, 2020
ruyadorno pushed a commit that referenced this pull request Jul 28, 2020
Remove unnecessary native-to-JS code switches in fast-path for
PromiseHooks. Those switches happen even if a certain type of
hook (say, before) is not installed, which may lead to sub-optimal
performance in the AsyncLocalStorage scenario, i.e. when there is
only an init hook.

PR-URL: #34512
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
@ruyadorno ruyadorno mentioned this pull request Jul 28, 2020
ruyadorno pushed a commit that referenced this pull request Jul 28, 2020
Remove unnecessary native-to-JS code switches in fast-path for
PromiseHooks. Those switches happen even if a certain type of
hook (say, before) is not installed, which may lead to sub-optimal
performance in the AsyncLocalStorage scenario, i.e. when there is
only an init hook.

PR-URL: #34512
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
puzpuzpuz pushed a commit that referenced this pull request Jul 29, 2020
Stack trace capturing currently accounts for 40 % of the benchmark
running time. Always throwing the same exception object removes
that overhead and lets the benchmark be more focused on what it is
supposed to measure.

Refs: #34512 (comment)

PR-URL: #34523
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Andrey Pechkurov <apechkurov@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
ruyadorno pushed a commit that referenced this pull request Jul 29, 2020
Remove unnecessary native-to-JS code switches in fast-path for
PromiseHooks. Those switches happen even if a certain type of
hook (say, before) is not installed, which may lead to sub-optimal
performance in the AsyncLocalStorage scenario, i.e. when there is
only an init hook.

PR-URL: #34512
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
puzpuzpuz added a commit to puzpuzpuz/node that referenced this pull request Jul 31, 2020
Native side of fast-path promise hook was not calling JS fastPromiseHook
function when there were no async ids previously assigned to the promise.
Because of that already created promises could not get id assigned in
situations when an async hook without a before listener function is enabled
after their creation. As the result executionAsyncId could return wrong id
when called within promise's .then().

Refs: nodejs#34512
puzpuzpuz added a commit that referenced this pull request Aug 3, 2020
Native side of fast-path promise hook was not calling JS
fastPromiseHook function when there were no async ids
previously assigned to the promise. Because of that already
created promises could not get id assigned in situations
when an async hook without a before listener function is
enabled after their creation. As the result executionAsyncId
could return wrong id when called within promise's .then().

Refs: #34512

PR-URL: #34548
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
codebytere pushed a commit that referenced this pull request Aug 5, 2020
Stack trace capturing currently accounts for 40 % of the benchmark
running time. Always throwing the same exception object removes
that overhead and lets the benchmark be more focused on what it is
supposed to measure.

Refs: #34512 (comment)

PR-URL: #34523
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Andrey Pechkurov <apechkurov@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
codebytere pushed a commit that referenced this pull request Aug 6, 2020
Native side of fast-path promise hook was not calling JS
fastPromiseHook function when there were no async ids
previously assigned to the promise. Because of that already
created promises could not get id assigned in situations
when an async hook without a before listener function is
enabled after their creation. As the result executionAsyncId
could return wrong id when called within promise's .then().

Refs: #34512

PR-URL: #34548
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
codebytere pushed a commit that referenced this pull request Aug 11, 2020
Native side of fast-path promise hook was not calling JS
fastPromiseHook function when there were no async ids
previously assigned to the promise. Because of that already
created promises could not get id assigned in situations
when an async hook without a before listener function is
enabled after their creation. As the result executionAsyncId
could return wrong id when called within promise's .then().

Refs: #34512

PR-URL: #34548
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
addaleax added a commit that referenced this pull request Sep 22, 2020
Stack trace capturing currently accounts for 40 % of the benchmark
running time. Always throwing the same exception object removes
that overhead and lets the benchmark be more focused on what it is
supposed to measure.

Refs: #34512 (comment)

PR-URL: #34523
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Andrey Pechkurov <apechkurov@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
addaleax added a commit that referenced this pull request Sep 22, 2020
Stack trace capturing currently accounts for 40 % of the benchmark
running time. Always throwing the same exception object removes
that overhead and lets the benchmark be more focused on what it is
supposed to measure.

Refs: #34512 (comment)

PR-URL: #34523
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Andrey Pechkurov <apechkurov@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. performance Issues and PRs related to the performance of Node.js.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants