
src: use EnqueueMicrotask for tls writes #20287

Closed

Conversation

apapirovski (Member)

Instead of using SetImmediate, use EnqueueMicrotask: it appears to be significantly more performant in certain cases, and even in the optimal case it yields roughly 10% higher throughput.

Not sure if there are any potential downsides here (in terms of using EnqueueMicrotask) that we need to watch out for.
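
For reference, the shape of the change is roughly as follows (illustrative sketch only; the helper names and exact call sites in src/tls_wrap.cc are approximated here, not quoted from the diff):

```cpp
// Before (approximate): defer invoking the queued write callbacks to the
// next event-loop iteration via node's native SetImmediate.
//
//   env()->SetImmediate([](Environment* env, void* data) {
//     static_cast<TLSWrap*>(data)->InvokeQueued(0);
//   }, this);

// After (approximate): run them at the next microtask checkpoint, i.e.
// right after the current JS callback stack unwinds, instead of waiting
// for a full event-loop turn.
env()->isolate()->EnqueueMicrotask([](void* data) {
  static_cast<TLSWrap*>(data)->InvokeQueued(0);
}, this);
```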

Here are some rough stats when using write callbacks, as in the original test:

With SetImmediate:

Elapsed 5 s, sent 8 MB (2 MB/s), received 8 MB (2 MB/s)
Elapsed 10 s, sent 16 MB (2 MB/s), received 16 MB (2 MB/s)
Elapsed 15 s, sent 23 MB (2 MB/s), received 23 MB (2 MB/s)
Elapsed 20 s, sent 31 MB (2 MB/s), received 31 MB (2 MB/s)

With EnqueueMicrotask:

Elapsed 5 s, sent 17 MB (3 MB/s), received 16 MB (3 MB/s)
Elapsed 10 s, sent 32 MB (3 MB/s), received 32 MB (3 MB/s)
Elapsed 15 s, sent 48 MB (3 MB/s), received 48 MB (3 MB/s)
Elapsed 20 s, sent 64 MB (3 MB/s), received 63 MB (3 MB/s)

And here's the performance when using drain and the usual streams buffering:

With SetImmediate:

Elapsed 5 s, received 53 MB (11 MB/s)
Elapsed 10 s, received 108 MB (11 MB/s)
Elapsed 15 s, received 162 MB (11 MB/s)
Elapsed 20 s, received 217 MB (11 MB/s)

With EnqueueMicrotask:

Elapsed 5 s, received 59 MB (12 MB/s)
Elapsed 10 s, received 120 MB (12 MB/s)
Elapsed 15 s, received 180 MB (12 MB/s)
Elapsed 20 s, received 241 MB (12 MB/s)

Refs: #20263

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@apapirovski added the tls label on Apr 25, 2018
@nodejs-github-bot added the c++ and tls labels on Apr 25, 2018
devsnek (Member) commented Apr 25, 2018

this moves its execution forward in time a bit but otherwise i don't think there should be any problems (i've always wanted to see us move more stuff into the microtask queue like this)

apapirovski (Member Author)

> this moves its execution forward in time a bit but otherwise i don't think there should be any problems

Yeah, that's better anyway. I'm just trying to think of edge cases but there shouldn't be any.

(Hypothetically, if any C++ called this code directly rather than via MakeCallback, that could be an issue if there were never any future MakeCallbacks; in that case, microtasks would never flush again. Not sure if that's possible.)
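
To spell the concern out (a simplified sketch of the machinery, with names assumed rather than taken from the source): node flushes the microtask queue as part of the callback scope that wraps every MakeCallback-driven trip into JS, roughly like this:

```cpp
#include <v8.h>

// Simplified stand-in for node's MakeCallback machinery: after the JS
// callback returns, a microtask checkpoint runs, which is what makes
// EnqueueMicrotask'd work execute.
void MakeCallbackLike(v8::Isolate* isolate, v8::Local<v8::Context> context,
                      v8::Local<v8::Function> fn) {
  v8::MaybeLocal<v8::Value> result =
      fn->Call(context, context->Global(), 0, nullptr);
  (void)result;              // error handling elided for brevity
  isolate->RunMicrotasks();  // queued TLS write callbacks would fire here
}
```

A pure C++ path that enqueues the microtask but never re-enters JS through this machinery would never reach that checkpoint, leaving the queued callbacks stranded.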

apapirovski (Member Author)

/cc @nodejs/crypto

benjamingr (Member)

Not blocking and +1 on the PR - I'd love to have those benchmarks as part of our actual benchmarks and to see the results with statistical significance.

apapirovski (Member Author)

> Not blocking and +1 on the PR - I'd love to have those benchmarks as part of our actual benchmarks and to see the results with statistical significance.

I still haven't figured out what makes this benchmark's performance profile different from the one we have in benchmark/tls/throughput.js when using a small chunk. I mean, the original I get, but once I changed it to use drain, it still shows a clear 10% improvement over the course of several minutes with a steady rate of transfer, whereas our throughput benchmark is all over the place.

addaleax (Member)

> Hypothetically, if any C++ called this code directly rather than via MakeCallback, that could be an issue if there were never any future MakeCallbacks; in that case, microtasks would never flush again. Not sure if that's possible.

It definitely should be possible in the future. I don’t think it’s an issue right now, but it’s worth at least a // TODO() or // FIXME().

> Not sure if there are any potential downsides here (in terms of using EnqueueMicrotask) that we need to watch out for.

Other than the one you mentioned, the SetImmediate() feature of keeping the object alive asynchronously makes the code a lot more obviously correct. As in: With this change in its current form, how can we be sure that no GC collects the TLSWrap* object before the microtask queue gets to run?
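
Concretely, the difference looks something like this (hedged sketch; the native SetImmediate signature and the `object()` accessor are approximations of the internal API, not quotes from it):

```cpp
// SetImmediate (approximate) can be handed the wrap's JS object, which
// node keeps strongly referenced until the immediate actually runs:
//
//   env()->SetImmediate(InvokeQueuedCb, this, object());

// EnqueueMicrotask only stores a raw pointer; nothing roots the JS
// wrapper in the meantime. If the weak TLSWrap wrapper is collected
// before the next microtask checkpoint, this is a use-after-free:
env()->isolate()->EnqueueMicrotask([](void* data) {
  static_cast<TLSWrap*>(data)->InvokeQueued(0);  // may run after GC
}, this);
```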

apapirovski (Member Author)

> Other than the one you mentioned, the SetImmediate() feature of keeping the object alive asynchronously makes the code a lot more obviously correct. As in: With this change in its current form, how can we be sure that no GC collects the TLSWrap* object before the microtask queue gets to run?

Wouldn't surprise me if that's what's breaking Windows. 😅 Back to the drawing board.

bnoordhuis (Member) left a review comment

LGTM. I suppose this could be extended to SetImmediate() calls in e.g. src/stream_pipe.cc?

bnoordhuis (Member) commented Apr 25, 2018

> the SetImmediate() feature of keeping the object alive asynchronously makes the code a lot more obviously correct

Is the wrap object weak? (I hope the answer is 'no.')

addaleax (Member)

> Is the wrap object weak? (I hope the answer is 'no.')

@bnoordhuis Sorry to disappoint you, but the answer is 'yes' (and has been for a long time). That does seem correct to me, though, for something that doesn’t have its own I/O mechanisms?
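
For anyone following along, "weak" here means the C++ object's persistent handle doesn't keep the JS wrapper alive: V8 may collect the wrapper once no strong references remain, and the weak callback then tears down the C++ side. An illustrative sketch of the pattern (plain V8 API, not the exact node code):

```cpp
#include <v8.h>

struct WrapLike { /* stand-in for node's TLSWrap */ };

// Mark the persistent handle weak: the JS object becomes collectable,
// and the weak callback deletes the C++ object alongside it.
void MakeWeakLike(v8::Global<v8::Object>& handle, WrapLike* wrap) {
  handle.SetWeak(
      wrap,
      [](const v8::WeakCallbackInfo<WrapLike>& info) {
        delete info.GetParameter();  // C++ side dies with its JS wrapper
      },
      v8::WeakCallbackType::kParameter);
}
```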

bnoordhuis (Member)

I'd design it so the lifetime is explicitly managed; in the long run that's almost always easier to reason about. The fact that this innocuous-looking pull request introduces a bug underscores that, IMO.

A TLSWrap is ultimately a construct to encrypt and decrypt socket data on the fly and sockets have a well-defined life cycle.

addaleax (Member)

@bnoordhuis When there is a TLSWrap object that is not consuming a real network socket (or other thing with its own lifetime management), that becomes pretty hard.

But you’re definitely right in that more obvious/explicit lifetime management would be a good idea here.

addaleax (Member)

@apapirovski Just FYI, while looking into this more, there is one case that we probably want to be aware of: when the TLS implementation reads data from the underlying socket, and that prompts some protocol-level response (i.e., no extra payload data), then we end up in this block of code without any JS stack beneath it.

I’m not sure whether that is a non-issue or whether we just don’t catch it because our tests just aren’t written to expose that kind of problem.

apapirovski (Member Author)

@addaleax Thanks for looking into it. I'm working on a slightly different take on this now. Will have something tomorrow maybe.

@apapirovski force-pushed the patch-tls-use-enqueue-microtasks branch from 728cb33 to f89600d on April 25, 2018
alexfernandez commented Apr 25, 2018

@apapirovski

> And here's the performance when using drain and the usual streams buffering:

How exactly are you doing streams buffering? I was implementing it on my own, but I don't get anywhere near 11 MB/s. Maybe send a PR to https://github.com/logtrust/tls-server-demo?

> I still haven't figured out what makes this benchmark's performance profile different from the one we have in benchmark/tls/throughput.js when using a small chunk.

If the server and client are in the same process, they will fight for the CPU, and the client will usually be the bottleneck. The client needs to be in a separate worker process to avoid this, and probably more than one worker is needed to really saturate the server. Also, the server needs to be doing something else (in my test, proxying data to another server) to really feel the performance effect of multiplying the number of events received.

apapirovski (Member Author)

@alexfernandez To be clear, I'm not talking about the issue you have with SecurePair, I was just testing your simple example and noticed that we had a regression from 9.6.0 to 9.7.0. That's what this PR is about.

In that particular example, one can switch from something like `conn.write(chunk, sendPacket)` to instead writing until `conn.write` returns false and then using `'drain'` to resume writing again. What I was getting at is that we had a huge regression for the callback form of writing that you were using, but only a slight regression for the drain version.

@apapirovski deleted the patch-tls-use-enqueue-microtasks branch on April 27, 2018