Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: use EnqueueMicrotask for tls writes #20287

Closed

Conversation

apapirovski
Copy link
Member

Instead of using SetImmediate, use EnqueueMicrotask as it appears to be significantly more performant in certain cases and even in the optimal case yields roughly 10% higher throughput.

Not sure if there are any potential downsides here (in terms of using EnqueueMicrotask) that we need to watch out for.

Here are some rough stats when using write callbacks as the original test:

With SetImmediate:

Elapsed 5 s, sent 8 MB (2 MB/s), received 8 MB (2 MB/s)
Elapsed 10 s, sent 16 MB (2 MB/s), received 16 MB (2 MB/s)
Elapsed 15 s, sent 23 MB (2 MB/s), received 23 MB (2 MB/s)
Elapsed 20 s, sent 31 MB (2 MB/s), received 31 MB (2 MB/s)

With EnqueueMicrotask:

Elapsed 5 s, sent 17 MB (3 MB/s), received 16 MB (3 MB/s)
Elapsed 10 s, sent 32 MB (3 MB/s), received 32 MB (3 MB/s)
Elapsed 15 s, sent 48 MB (3 MB/s), received 48 MB (3 MB/s)
Elapsed 20 s, sent 64 MB (3 MB/s), received 63 MB (3 MB/s)

And here's the performance when using drain and the usual streams buffering:

With SetImmediate:

Elapsed 5 s, received 53 MB (11 MB/s)
Elapsed 10 s, received 108 MB (11 MB/s)
Elapsed 15 s, received 162 MB (11 MB/s)
Elapsed 20 s, received 217 MB (11 MB/s)

With EnqueueMicrotask:

Elapsed 5 s, received 59 MB (12 MB/s)
Elapsed 10 s, received 120 MB (12 MB/s)
Elapsed 15 s, received 180 MB (12 MB/s)
Elapsed 20 s, received 241 MB (12 MB/s)

Refs: #20263

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

Instead of using SetImmediate, use EnqueueMicrotask as it appears to
be significantly more performant in certain cases and even in the
optimal case yields roughly 10% higher throughput.
@apapirovski apapirovski added the tls Issues and PRs related to the tls subsystem. label Apr 25, 2018
@apapirovski apapirovski requested a review from addaleax April 25, 2018 13:25
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. tls Issues and PRs related to the tls subsystem. labels Apr 25, 2018
@apapirovski
Copy link
Member Author

apapirovski commented Apr 25, 2018

@devsnek
Copy link
Member

devsnek commented Apr 25, 2018

this moves its execution forward in time a bit but otherwise i don't think there should be any problems (i've always wanted to see us move more stuff into the microtask queue like this)

@apapirovski
Copy link
Member Author

this moves its execution forward in time a bit but otherwise i don't think there should be any problems

Yeah, that's better anyway. I'm just trying to think of edge cases but there shouldn't be any.

(Hypothetically if any C++ called this code directly rather than via MakeCallback then that could be an issue if there were never any future MakeCallbacks. In that case microtasks would never flush again. Not sure if that's possible.)

@apapirovski
Copy link
Member Author

/cc @nodejs/crypto

@benjamingr
Copy link
Member

Not blocking and +1 on the PR - I'd love to have those benchmarks as part of our actual benchmarks and to see the results with statistical significance.

@apapirovski
Copy link
Member Author

Not blocking and +1 on the PR - I'd love to have those benchmarks as part of our actual benchmarks and to see the results with statistical significance.

I still haven't figured out what makes this benchmark's performance profile different from the one we have in benchmark/tls/throughput.js when using a small chunk. I mean, the original I get but once I changed it to use drain, it still shows a clear 10% improvement over the course of several minutes with steady rate of transfer whereas our throughput benchmark is all over the place.

@addaleax
Copy link
Member

Hypothetically if any C++ called this code directly rather than via MakeCallback then that could be an issue if there were never any future MakeCallbacks. In that case microtasks would never flush again. Not sure if that's possible.

It definitely should be possible in the future. I don’t think it’s an issue right now, but it’s worth at least a // TODO() or // FIXME().

Not sure if there are any potential downsides here (in terms of using EnqueueMicrotask) that we need to watch out for.

Other than the one you mentioned, the SetImmediate() feature of keeping the object alive asynchronously makes the code a lot more obviously correct. As in: With this change in its current form, how can we be sure that no GC collects the TLSWrap* object before the microtask queue gets to run?

@apapirovski
Copy link
Member Author

Other than the one you mentioned, the SetImmediate() feature of keeping the object alive asynchronously makes the code a lot more obviously correct. As in: With this change in its current form, how can we be sure that no GC collects the TLSWrap* object before the microtask queue gets to run?

Wouldn't surprise me if that's what's breaking Windows. 😅 Back to the drawing board.

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I suppose this could be extended to SetImmediate() calls in e.g. src/stream_pipe.cc?

@bnoordhuis
Copy link
Member

bnoordhuis commented Apr 25, 2018

the SetImmediate() feature of keeping the object alive asynchronously makes the code a lot more obviously correct

Is the wrap object weak? (I hope the answer is 'no.')

@addaleax
Copy link
Member

Is the wrap object weak? (I hope the answer is 'no.')

@bnoordhuis Sorry to disappoint you, but the answer is 'yes' (and has been for a long time). That does seem correct to me, though, for something that doesn’t have its own I/O mechanisms?

@bnoordhuis
Copy link
Member

I'd design it so the lifetime is explicitly managed; in the long run that's almost always easier to reason about. The fact that this innocuous looking pull request introduces a bug underscores that, IMO.

A TLSWrap is ultimately a construct to encrypt and decrypt socket data on the fly and sockets have a well-defined life cycle.

@addaleax
Copy link
Member

@bnoordhuis When there is a TLSWrap object that is not consuming a real network socket (or other thing with its own lifetime management), that becomes pretty hard.

But you’re definitely right in that more obvious/explicit lifetime management would be a good idea here.

@addaleax
Copy link
Member

@apapirovski Just fyi, while looking into this more, there is one case that we probably want to be aware of: When the TLS impl reads data from the underlying socket, and that prompts some protocol-level response (i.e. no extra payload data), then we do end up in this block of code without any JS stack beneath it.

I’m not sure whether that is a non-issue or whether we just don’t catch it because our tests just aren’t written to expose that kind of problem.

@apapirovski
Copy link
Member Author

@addaleax Thanks for looking into it. I'm working on a slightly different take on this now. Will have something tomorrow maybe.

@apapirovski apapirovski force-pushed the patch-tls-use-enqueue-microtasks branch from 728cb33 to f89600d Compare April 25, 2018 19:38
@alexfernandez
Copy link

alexfernandez commented Apr 25, 2018

@apapirovski

And here's the performance when using drain and the usual streams buffering:

How exactly are you doing streams buffering? I was implementing it on my own but I don't get anywhere near 11 MB/s. Maybe send a PR to https://github.com/logtrust/tls-server-demo?

I still haven't figured out what makes this benchmark's performance profile different from the one we have in benchmark/tls/throughput.js when using a small chunk.

If server and client are on the same process then they will fight for the CPU, and the client will usually be the bottleneck. The client needs to be on a separate worker process to avoid this, and probably use more than one worker to really saturate the server. Also, the server needs to be doing something else (in my test proxying data to another server) to really feel the effect on performance of multiplying the number of events received.

@apapirovski
Copy link
Member Author

@alexfernandez To be clear, I'm not talking about the issue you have with SecurePair, I was just testing your simple example and noticed that we had a regression from 9.6.0 to 9.7.0. That's what this PR is about.

In that particular example, one can switch from something like conn.write(chunk, sendPacket) to instead writing until the conn.write returns false and then using drain to resume writing again. What I was getting at is that we had a huge regression for the callback form of writing that you were using but only a slight regression for the drain version.

@apapirovski apapirovski deleted the patch-tls-use-enqueue-microtasks branch April 27, 2018 07:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. tls Issues and PRs related to the tls subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants