
Rework to remove implicit flushes #357

Merged · 5 commits · May 28, 2023

Conversation

@alexheretic (Contributor) commented May 24, 2023

My context: I've been working on a websocket load-generator service that writes lots of small JSON message events to its ws clients. I was interested in having a high peak throughput from this service to allow me to test downstream load performance. It uses axum->tokio-tungstenite->tungstenite. In localhost testing I got ~1.2M msg/s peak throughput. This PR improves that to ~2M msg/s.

Because of this improvement I thought this PR may be worthwhile.

Summary: Reduce flushing and allow better control of when flushes happen

I noticed there was a lot of flushing happening. Currently flushing is done something like:

  • Before writing each message.
  • After writing each message.
  • Before reading each message.

This seems excessive, particularly when trying to send batches of many small messages. This PR minimises flushes so that a flush only occurs if write_pending is called, or if a "system message" (pong/close) should be dispatched. So users can call write_message many times without flushing (which is more like how I expected it to behave), then call write_pending, which will flush.
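For illustration, a minimal sketch of the batching pattern this enables (a hypothetical helper; method names as they stand at this point in the PR, before the rename to write/flush discussed further down):

use std::net::TcpStream;
use tungstenite::{Message, WebSocket};

// Hypothetical batch sender: buffer many small messages without flushing,
// then flush the whole batch once.
fn send_batch(ws: &mut WebSocket<TcpStream>, events: &[String]) -> tungstenite::Result<()> {
    for event in events {
        // Queues the frame in the write buffer; no flush happens here.
        ws.write_message(Message::Text(event.clone()))?;
    }
    // One flush pushes the entire batch to the underlying stream.
    ws.write_pending()
}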

Impl notes

My first approach was to remove the flush before each frame write currently in WebSocketContext::write_pending

// First, make sure we have no pending frame sending.
self.frame.write_pending(stream)?;

And remove the flush after each FrameCodec::write_frame. Then wire up write_message to not call flush and write_pending to essentially be flush.

Flush before write & send_queue

I noticed that after these changes the WebSocketContext::send_queue no longer really seemed to do anything. The logic as it stands actually requires flushing before each write to work; otherwise a message is simply added to the send_queue, then immediately taken off it and put in the FrameCodec::out_buffer.

So I removed the send_queue since it no longer does anything. It didn't seem to do much previously either, as messages could only build up there if flushing before each write returned an error. If the write itself returned an error the send_queue would be empty and the message would simply be stored in the FrameCodec::out_buffer. I'm not sure what the benefit of having both a send_queue and an out_buffer is/was.

In this PR I replace the send_queue with the existing out_buffer. Writes are buffered there without flushing before each one. We can still buffer many writes, as before, but without any separate send queue.

I replaced the max_send_queue config with max_write_buffer_size, which can bound the previously unbounded out_buffer. If a write would exceed this bound the message is instead returned as an error, much as the previous logic handled the send_queue bound. The one difference is that the message is returned as a Message::Frame, since it has already been serialized at this point. This is worse than before, where the message was returned exactly as it was given. However, logic to re-send should still work. I'm not sure preventing this difference is worth an extra flush call. It might be possible to convert back to the original message somehow, but I'm not sure it's worth it.
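As a rough sketch of how the new bound could be configured (the 1 MiB figure is an arbitrary example, not a recommended default):

use tungstenite::protocol::WebSocketConfig;

// Illustrative only: cap the bytes that may sit unflushed in out_buffer.
// A write that would push the buffer past this limit returns
// Error::WriteBufferFull containing the already-serialized frame, so the
// caller can flush and retry.
fn bounded_config() -> WebSocketConfig {
    let mut config = WebSocketConfig::default();
    config.max_write_buffer_size = 1024 * 1024;
    config
}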

Flush after each write

Removing this seems straightforward. We now require that the caller call write_pending after any writes to guarantee they are flushed. This requirement is similar to most write interfaces.

An exception is "system messages" like pong responses. If these have been scheduled we'll still automatically flush.

Flush on each read

Currently WebSocketContext::read_message will flush writes on each call.

// Since we may get ping or close, we need to reply to the messages even during read.
// Thus we call write_pending() but ignore its blocking.
self.write_pending(stream).no_block()?;

This means that if my service is polling for incoming messages I can still end up flushing writes a lot more often than I'd expect. This PR changes this to only write if we actually have a pong/close "system message" to send. Since system messages don't scale with load, this minimises the extra flush impact while still writing and flushing these messages as soon as possible.
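For example, a non-blocking read poll (a hypothetical helper, assuming a non-blocking TcpStream) no longer flushes buffered user writes on every call; it only writes and flushes a scheduled pong/close:

use std::net::TcpStream;
use tungstenite::{Error, Message, WebSocket};

// Hypothetical polling helper: reading no longer implies flushing user writes.
fn poll_incoming(ws: &mut WebSocket<TcpStream>) -> tungstenite::Result<Option<Message>> {
    match ws.read_message() {
        Ok(msg) => Ok(Some(msg)),
        // WouldBlock just means no complete message is available yet.
        Err(Error::Io(e)) if e.kind() == std::io::ErrorKind::WouldBlock => Ok(None),
        Err(e) => Err(e),
    }
}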

micro-bench

I added a micro-bench, benches/write.rs, to reproduce the flush cost. It simulates a Write impl that takes 100ns to flush. The scenario is 100k write_message calls followed by one write_pending, i.e. how I would approach writing a batch of small messages with the current API.

The improvement is clear since this PR will call only one flush per batch.

write 100k small texts then flush
        time:   [6.1961 ms 6.2280 ms 6.2360 ms]
        change: [-66.921% -66.617% -66.309%] (p = 0.07 > 0.05)
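Roughly, the mock stream could look like the sketch below (illustrative only, not the actual benches/write.rs): writes are nearly free while each flush pays a fixed ~100ns penalty, so the flush count dominates.

use std::io::{self, Write};
use std::time::{Duration, Instant};

// Illustrative mock: buffers writes cheaply, charges a fixed cost per flush.
struct SlowFlushWriter {
    buf: Vec<u8>,
}

impl Write for SlowFlushWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        // Busy-wait ~100ns to model the per-flush cost (thread::sleep is too coarse).
        let start = Instant::now();
        while start.elapsed() < Duration::from_nanos(100) {}
        self.buf.clear();
        Ok(())
    }
}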

Issues

  • The micro-bench is quite artificial. Perhaps it could be made more realistic without too much bloat?
  • Error::WriteBufferFull will return the message as a Message::Frame, whereas SendQueueFull would return the message as given.
  • This is a breaking change.
  • It may be worth considering some API changes to reflect the changes. E.g. rename write_pending to flush?

@agalakhov (Member)

Thank you very much for the improvement! Technically this looks fine. But there are two small issues.

First, does the automatic Pong reply still work correctly? It used to rely on the automatic flush while reading. Also, does it work correctly if there is already some message in the outgoing buffer? I'm quite sure I introduced the outgoing queue mostly for handling pings correctly.

Second, I'd rather rename write_message to something else in order to break old code. There are probably pieces of code here and there on the web that still rely on implicit flushing, so let's make them non-compilable. Renaming write_pending to flush is a good idea as well.

@alexheretic (Contributor, Author)

First, does the automatic Pong reply still work correctly? It used to rely on the automatic flush while reading. Also, does it work correctly if there is already some message in the outgoing buffer? I'm quite sure I introduced the outgoing queue mostly for handling pings correctly.

Auto pong replies should still auto flush.

Previously read_message->read_message_frame would set WebSocketContext::pong to Some. The next call to write_message, write_pending or read_message would then write+flush it, since they all flushed writes. I don't think the send_queue was used for this, and indeed deduping auto-pongs is desirable.

In this PR read_message->read_message_frame sets WebSocketContext::additional_send to Some in a similar way (generalised to allow setting close frames here too). As before, the next call to write_message, write_pending or read_message will write+flush the pong. The difference is that calls to write_message & read_message won't flush in the general case where WebSocketContext::additional_send is None.

So we flush eagerly whenever we have an auto-generated pong or close to send; otherwise we only flush on a user call.

Related: if the user sends a manual pong there is no implicit flushing, since if the user writes they can also flush as normal. My thinking was that implicit/auto flushing only really makes sense for auto-generated messages outside the user's direct control.

Second, I'd rather rename write_message to something else in order to break old code. There are probably pieces of code here and there on the web that still rely on implicit flushing, so let's make them non-compilable. Renaming write_pending to flush is a good idea as well.

That's a good idea. Perhaps we have methods write & flush to align with the Write trait? There could also be another method that simply does both write+flush, a bit like SinkExt::send vs feed. This could be called send. I'm not 100% sure how useful that is downstream, just an idea.

We could also keep a deprecated write_message & write_pending that both work similarly as before (write+flush) to ease migration. Then presumably remove the deprecated code on the next breaking change.

@agalakhov (Member)

Yes, write and flush sound good. Then probably read as well. And one has to change tokio_tungstenite accordingly.

How does Pong work now if there is something not yet flushed and a Ping is incoming? This is the only thing that has to be verified. It is probably covered by the Autobahn test suite, but I'd like to double-check before merging.

@alexheretic (Contributor, Author)

Yes, write and flush sound good. Then probably read as well. And one has to change tokio_tungstenite accordingly.

Ok I'll make these modifications when I get a moment.

How does Pong work now if there is something not yet flushed and a Ping is incoming?

Once a ping is read, a pong is scheduled to be written and flushed on the next call to read or write. On that call the pong is appended to the out_buffer, and all of the out_buffer is written and flushed. So if anything already resides in that buffer, e.g. because of a previous write error, the pong frame is simply appended on top and we try to write & flush all of it.

@alexheretic (Contributor, Author) commented May 27, 2023

I have renamed the methods to better reflect the changes. We now use read, send, write & flush (send is write+flush and seems nice for simple usage and for migrating previous usage).

I've added entries to the changelog

To aid migration I have deprecated WebSocket::read_message, write_message & write_pending, and deprecated WebSocketConfig::max_send_queue. This should prevent many users from seeing "hard" breaks, and the warnings should be easy to act on. I didn't do the same for WebSocketContext or FrameSocket, as they seem lower level and so less of an issue.
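A minimal migration sketch (hypothetical helper; the deprecated calls keep compiling with warnings):

use std::net::TcpStream;
use tungstenite::{Message, WebSocket};

// `send` = write + flush, matching the old write_message behaviour; batching
// callers can instead call `write` repeatedly and `flush` once.
fn notify(ws: &mut WebSocket<TcpStream>, text: String) -> tungstenite::Result<()> {
    // Before this PR: ws.write_message(Message::Text(text))
    ws.send(Message::Text(text))
}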

Commits:
  • Refine docs
  • Add `send` method
  • Add deprecated versions of write_message, write_pending, read_message
  • Handle pong WriteBufferFull error
  • Add changelog
@agalakhov (Member)

This looks good, passes all tests, so merged. Let's see how it behaves in real life.

alexheretic deleted the flush-writes-less branch on May 28, 2023, 18:20
@alexheretic (Contributor, Author)

Great to see this merged, thanks for reviewing!

I've prepared snapview/tokio-tungstenite#284 for the merged 0.20 API. Do you have an idea of when the new release could be published?
