src: use EnqueueMicrotask for tls writes #20287
Instead of using SetImmediate, use EnqueueMicrotask as it appears to be significantly more performant in certain cases and even in the optimal case yields roughly 10% higher throughput.
this moves its execution forward in time a bit but otherwise i don't think there should be any problems (i've always wanted to see us move more stuff into the microtask queue like this)
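To illustrate what "forward in time" means here, a minimal sketch from the JavaScript side. It assumes the public `queueMicrotask()` and `setImmediate()` globals behave comparably to the C++ `EnqueueMicrotask` and `SetImmediate` calls touched by this PR; the internals are not identical, so this only shows relative ordering, not the actual TLS write path.

```javascript
// Illustration only: queueMicrotask() and setImmediate() are the public JS
// counterparts, not the internal C++ calls changed by the PR.
const order = [];

const orderDone = new Promise((resolve) => {
  // Runs in the check phase of the next event loop iteration.
  setImmediate(() => {
    order.push('immediate');
    resolve(order);
  });
  // Runs as soon as the current synchronous code has finished.
  queueMicrotask(() => order.push('microtask'));
  order.push('sync');
});

orderDone.then((result) => console.log(result.join(' -> ')));
// prints "sync -> microtask -> immediate"
```

The microtask runs before control returns to the event loop, which is why work scheduled this way happens earlier than the same work scheduled with an immediate.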
Yeah, that's better anyway. I'm just trying to think of edge cases but there shouldn't be any. (Hypothetically if any C++ called this code directly rather than via MakeCallback then that could be an issue if there were never any future MakeCallbacks. In that case microtasks would never flush again. Not sure if that's possible.)
/cc @nodejs/crypto
Not blocking and +1 on the PR - I'd love to have those benchmarks as part of our actual benchmarks and to see the results with statistical significance.
I still haven't figured out what makes this benchmark's performance profile different from the one we have in
It definitely should be possible in the future. I don’t think it’s an issue right now, but it’s worth at least a
Other than the one you mentioned, the
Wouldn't surprise me if that's what's breaking Windows. 😅 Back to the drawing board.
LGTM. I suppose this could be extended to SetImmediate() calls in e.g. src/stream_pipe.cc?
Is the wrap object weak? (I hope the answer is 'no.')
@bnoordhuis Sorry to disappoint you, but the answer is 'yes' (and has been for a long time). That does seem correct to me, though, for something that doesn’t have its own I/O mechanisms?
I'd design it so the lifetime is explicitly managed; in the long run that's almost always easier to reason about. The fact that this innocuous looking pull request introduces a bug underscores that, IMO. A
@bnoordhuis When there is a But you’re definitely right in that more obvious/explicit lifetime management would be a good idea here.
@apapirovski Just fyi, while looking into this more, there is one case that we probably want to be aware of: When the TLS impl reads data from the underlying socket, and that prompts some protocol-level response (i.e. no extra payload data), then we do end up in this block of code without any JS stack beneath it. I’m not sure whether that is a non-issue or whether we just don’t catch it because our tests just aren’t written to expose that kind of problem.
@addaleax Thanks for looking into it. I'm working on a slightly different take on this now. Will have something tomorrow maybe.
728cb33 to f89600d
How exactly are you doing streams buffering? I was implementing it on my own but I don't get anywhere near 11 MB/s. Maybe send a PR to https://github.com/logtrust/tls-server-demo?
If server and client are on the same process then they will fight for the CPU, and the client will usually be the bottleneck. The client needs to be on a separate worker process to avoid this, and probably use more than one worker to really saturate the server. Also, the server needs to be doing something else (in my test proxying data to another server) to really feel the effect on performance of multiplying the number of events received.
@alexfernandez To be clear, I'm not talking about the issue you have with In that particular example, one can switch from something like
Instead of using SetImmediate, use EnqueueMicrotask as it appears to be significantly more performant in certain cases and even in the optimal case yields roughly 10% higher throughput.
Not sure if there are any potential downsides here (in terms of using EnqueueMicrotask) that we need to watch out for.
Here are some rough stats when using write callbacks as the original test:
With SetImmediate:
With EnqueueMicrotask:
And here's the performance when using drain and the usual streams buffering:
With SetImmediate:
With EnqueueMicrotask:
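For readers unfamiliar with the "drain and the usual streams buffering" approach, here is a minimal sketch of the general pattern. It is not the benchmark used for the numbers in this PR: `writeAll` and `makeSlowSink` are hypothetical helpers, and the slow `Writable` is only a stand-in for a TLS socket.

```javascript
const { Writable } = require('stream');
const { once } = require('events');

// Hypothetical helper: writes every chunk, pausing whenever write() reports
// that the internal buffer is full and resuming once 'drain' is emitted.
async function writeAll(stream, chunks) {
  let drains = 0;
  for (const chunk of chunks) {
    if (!stream.write(chunk)) {
      await once(stream, 'drain');
      drains++;
    }
  }
  stream.end();
  await once(stream, 'finish');
  return drains;
}

// Hypothetical stand-in for a TLS socket: a sink with a small buffer that
// completes each write asynchronously.
function makeSlowSink(received) {
  return new Writable({
    highWaterMark: 16,
    write(chunk, encoding, callback) {
      received.push(chunk.length);
      setImmediate(callback);
    },
  });
}

const received = [];
const chunks = Array.from({ length: 8 }, () => Buffer.alloc(16));
const done = writeAll(makeSlowSink(received), chunks);

done.then((drains) => {
  console.log(`wrote ${received.length} chunks, waited for drain ${drains} times`);
});
```

Compared with issuing the next write from each write callback, this lets the stream buffer up to its high-water mark and only yields when the buffer is actually full.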
Refs: #20263
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes