
Rework to remove implicit flushes #357

Merged · 5 commits · May 28, 2023

Conversation

@alexheretic (Contributor) commented May 24, 2023

My context: I've been working on a websocket load-generator service that writes lots of small JSON message events to its ws clients. I was interested in having a high peak throughput from this service to allow me to test downstream load performance. It uses axum->tokio-tungstenite->tungstenite. In localhost testing I got ~1.2M msg/s peak throughput. This PR improves that to ~2M msg/s.

Because of this improvement I thought this PR may be worthwhile.

Summary: Reduce flushing and allow better control of when flushes happen

I noticed there was a lot of flushing happening. Currently flushing is done something like:

  • Before writing each message.
  • After writing each message.
  • Before reading each message.

This seems excessive, particularly when trying to send batches of many small messages. This PR minimises flushes so that a flush only occurs if write_pending is called, or if a "system message" (pong/close) should be dispatched. So users can call write_message many times without flushing (which is more like how I expected it to behave), then call write_pending, which will flush.
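For illustration, a minimal sketch of the batching pattern this enables (a hypothetical helper; method names as they stand at this point in the PR, before the rename to write/flush discussed further down):

use std::net::TcpStream;
use tungstenite::{Message, WebSocket};

// Hypothetical batch sender: buffer many small messages without flushing,
// then flush the whole batch once.
fn send_batch(ws: &mut WebSocket<TcpStream>, events: &[String]) -> tungstenite::Result<()> {
    for event in events {
        // Queues the frame in the write buffer; no flush happens here.
        ws.write_message(Message::Text(event.clone()))?;
    }
    // One flush pushes the entire batch to the underlying stream.
    ws.write_pending()
}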

Impl notes

My first approach was to remove the flush before each frame write currently in WebSocketContext::write_pending

// First, make sure we have no pending frame sending.
self.frame.write_pending(stream)?;

And remove the flush after each FrameCodec::write_frame. Then wire up write_message to not call flush and write_pending to essentially be flush.

Flush before write & send_queue

I noticed that after these changes the WebSocketContext::send_queue no longer really seemed to do anything. The logic as it stands actually requires flushing before each write to work; otherwise a message is simply added to the send_queue, then immediately taken off it and put in the FrameCodec::out_buffer.

So I removed the send_queue since it no longer does anything. It didn't seem to do much previously either, as messages could only build up there if flushing before each write returned an error. If the write itself returned an error the send_queue would be empty and the message would simply be stored in the FrameCodec::out_buffer. I'm not sure what the benefit of having both a send_queue and an out_buffer is/was.

In this PR I replace the send_queue with the existing out_buffer. Writes are buffered there without flushing before each one. We can still buffer many writes, as before, but without any separate send queue.

I replaced the max_send_queue config with max_write_buffer_size, which can bound the previously unbounded out_buffer. If a write would exceed this bound the message is instead returned as an error, much as the previous logic handled the send_queue bound. The one difference is that the message is returned as a Message::Frame, since it has already been serialized at this point. This is worse than before, where the message was returned exactly as it was given. However, logic to re-send should still work. I'm not sure preventing this difference is worth an extra flush call. It might be possible to convert back to the original message somehow, but I'm not sure it's worth it.
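As a rough sketch of how the new bound could be configured (the 1 MiB figure is an arbitrary example, not a recommended default):

use tungstenite::protocol::WebSocketConfig;

// Illustrative only: cap the bytes that may sit unflushed in out_buffer.
// A write that would push the buffer past this limit returns
// Error::WriteBufferFull containing the already-serialized frame, so the
// caller can flush and retry.
fn bounded_config() -> WebSocketConfig {
    let mut config = WebSocketConfig::default();
    config.max_write_buffer_size = 1024 * 1024;
    config
}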

Flush after each write

Removing this seems straightforward. We now require that the caller call write_pending after any writes to guarantee they are flushed. This requirement is similar to most write interfaces.

An exception is "system messages" like pong responses. If these have been scheduled we'll still automatically flush.

Flush on each read

Currently WebSocketContext::read_message will flush writes on each call.

// Since we may get ping or close, we need to reply to the messages even during read.
// Thus we call write_pending() but ignore its blocking.
self.write_pending(stream).no_block()?;

This means that if my service is polling for incoming messages I can still end up flushing writes a lot more often than I'd expect. This PR changes this to only write if we actually have a pong/close "system message" to send. Since system messages don't scale with load, this minimises the extra flush impact while still writing and flushing these messages as soon as possible.
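For example, a non-blocking read poll (a hypothetical helper, assuming a non-blocking TcpStream) no longer flushes buffered user writes on every call; it only writes and flushes a scheduled pong/close:

use std::net::TcpStream;
use tungstenite::{Error, Message, WebSocket};

// Hypothetical polling helper: reading no longer implies flushing user writes.
fn poll_incoming(ws: &mut WebSocket<TcpStream>) -> tungstenite::Result<Option<Message>> {
    match ws.read_message() {
        Ok(msg) => Ok(Some(msg)),
        // WouldBlock just means no complete message is available yet.
        Err(Error::Io(e)) if e.kind() == std::io::ErrorKind::WouldBlock => Ok(None),
        Err(e) => Err(e),
    }
}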

micro-bench

I added a micro-bench, benches/write.rs, to reproduce the flush cost. It simulates a Write impl that takes 100ns to flush. The scenario is 100k write_message calls followed by one write_pending, i.e. how I would approach writing a batch of small messages with the current API.

The improvement is clear since this PR will call only one flush per batch.

write 100k small texts then flush
        time:   [6.1961 ms 6.2280 ms 6.2360 ms]
        change: [-66.921% -66.617% -66.309%] (p = 0.07 > 0.05)
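Roughly, the mock stream could look like the sketch below (illustrative only, not the actual benches/write.rs): writes are nearly free while each flush pays a fixed ~100ns penalty, so the flush count dominates.

use std::io::{self, Write};
use std::time::{Duration, Instant};

// Illustrative mock: buffers writes cheaply, charges a fixed cost per flush.
struct SlowFlushWriter {
    buf: Vec<u8>,
}

impl Write for SlowFlushWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        // Busy-wait ~100ns to model the per-flush cost (thread::sleep is too coarse).
        let start = Instant::now();
        while start.elapsed() < Duration::from_nanos(100) {}
        self.buf.clear();
        Ok(())
    }
}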

Issues

  • The micro-bench is quite artificial. Perhaps it could be made more realistic without too much bloat?
  • Error::WriteBufferFull will return the message as a Message::Frame, whereas SendQueueFull would return the message as given.
  • This is a breaking change.
  • It may be worth considering some API changes to reflect the changes. E.g. rename write_pending to flush?

@agalakhov (Member)

Thank you very much for the improvement! Technically this looks fine. But there are two small issues.

First, does the automatic Pong reply still work correctly? It used to rely on the automatic flush while reading. Also, does it work correctly if there is already some message in the outgoing buffer? I'm quite sure I introduced the outgoing queue mostly for handling pings correctly.

Second, I'd rather rename write_message to something else in order to break old code. There are probably pieces of code here and there on the web that still rely on implicit flushing, so let's make them non-compilable. Renaming write_pending to flush is a good idea as well.

@alexheretic (Contributor, Author)

First, does the automatic Pong reply still work correctly? It used to rely on the automatic flush while reading. Also, does it work correctly if there is already some message in the outgoing buffer? I'm quite sure I introduced the outgoing queue mostly for handling pings correctly.

Auto pong replies should still auto flush.

Previously read_message->read_message_frame would set WebSocketContext::pong to Some. The next call to write_message, write_pending or read_message would then write+flush it, since they all flushed writes. I don't think the send_queue was used for this, and indeed deduping auto-pongs is desirable.

In this PR read_message->read_message_frame sets WebSocketContext::additional_send to Some in a similar way (generalised to allow setting close frames here too). As before, the next call to write_message, write_pending or read_message will write+flush the pong. The difference is that calls to write_message & read_message won't flush in the general case where WebSocketContext::additional_send is None.

So we flush eagerly whenever we have an auto-generated pong or close to send; otherwise we only flush on a user call.

Related: if the user sends a manual pong there is no implicit flushing, since if the user writes they can also flush as normal. My thinking was that implicit/auto flushing only really makes sense for auto-generated messages outside the user's direct control.

Second, I'd rather rename write_message to something else in order to break old code. There are probably pieces of code here and there on the web that still rely on implicit flushing, so let's make them non-compilable. Renaming write_pending to flush is a good idea as well.

That's a good idea. Perhaps we have methods write & flush to align with the Write trait? There could also be another method that simply does both write+flush, a bit like SinkExt::send vs feed. This could be called send. I'm not 100% sure how useful that is downstream, just an idea.

We could also keep a deprecated write_message & write_pending that both work similarly as before (write+flush) to ease migration. Then presumably remove the deprecated code on the next breaking change.

@agalakhov (Member)

Yes, write and flush sound good. Then probably read as well. And one has to change tokio_tungstenite accordingly.

How does Pong work now if there is something not yet flushed and a Ping is incoming? This is the only thing that has to be verified. It is probably covered by the Autobahn test suite, but I'd like to double-check before merging.

@alexheretic (Contributor, Author)

Yes, write and flush sound good. Then probably read as well. And one has to change tokio_tungstenite accordingly.

Ok I'll make these modifications when I get a moment.

How does Pong work now if there is something not yet flushed and a Ping is incoming?

Once a ping is read, a pong is scheduled to be written and flushed on the next call to read or write. On that call the pong is appended to the out_buffer, and all of the out_buffer is written and flushed. So if anything already resides in that buffer, e.g. because of a previous write error, the pong frame is simply appended on top and we try to write & flush all of it.

@alexheretic (Contributor, Author) commented May 27, 2023

I have renamed the methods to better reflect the changes. We now use read, send, write & flush (send is write+flush and seems nice for simple usage and for migrating previous usage).

I've added entries to the changelog

To aid migration I have deprecated WebSocket::read_message, write_message & write_pending, and deprecated WebSocketConfig::max_send_queue. This should prevent many users from seeing "hard" breaks, and the warnings should be easy to act on. I didn't do the same for WebSocketContext or FrameSocket, as they seem lower level and so less of an issue.
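A minimal migration sketch (hypothetical helper; the deprecated calls keep compiling with warnings):

use std::net::TcpStream;
use tungstenite::{Message, WebSocket};

// `send` = write + flush, matching the old write_message behaviour; batching
// callers can instead call `write` repeatedly and `flush` once.
fn notify(ws: &mut WebSocket<TcpStream>, text: String) -> tungstenite::Result<()> {
    // Before this PR: ws.write_message(Message::Text(text))
    ws.send(Message::Text(text))
}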

Commits:
  • Refine docs
  • Add `send` method
  • Add deprecated versions of write_message, write_pending, read_message
  • Handle pong WriteBufferFull error
  • Add changelog
@agalakhov (Member)

This looks good, passes all tests, so merged. Let's see how it behaves in real life.

alexheretic deleted the flush-writes-less branch on May 28, 2023, 18:20
@alexheretic (Contributor, Author)

Great to see this merged, thanks for reviewing!

I've prepared snapview/tokio-tungstenite#284 for the merged 0.20 API. Do you have an idea of when the new release could be published?
