
Buffer writes before writing to the underlying stream #358

Merged · 8 commits into snapview:master · Jun 2, 2023

Conversation

@alexheretic (Contributor) commented May 30, 2023

Adds the ability to buffer multiple writes before writing to the underlying stream, controlled by WebSocketConfig::write_buffer_size (default: 128 KiB). This improves batch message write performance.

This PR comes out of some further investigation this morning into writing many small messages. After #357 we no longer flush on each write, which gives a big boost. I wondered what would happen if we also avoided/minimised write calls to the underlying stream and let messages accumulate in the out_buffer/write buffer.

In short, it does provide a boost, though not as dramatic as the flush rework. With #357 my load-generator could write ~2.15M msg/s; with a 128 KiB write_buffer_size this goes up to ~2.44M msg/s.

Larger sizes didn't yield a benefit for me, so 128 KiB seems a decent default, and that's what I'm proposing here. The previous code essentially acted as if this value were zero, which could also be a reasonable default, though users would then need to opt in to get this benefit.

Note: this will only affect users that call write multiple times and then flush, since on flush all write-buffer data is written regardless of this config.
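To make the write-then-flush behaviour concrete, here is a minimal std-only sketch of the buffering described above. All names here (BufferedSocket, CountingWriter, run) are illustrative, not the actual tungstenite internals: frames accumulate in an out_buffer and are only written to the underlying stream once at least write_buffer_size bytes are buffered, while flush always drains everything.

```rust
use std::io::{self, Write};

/// Test double that counts calls hitting the "underlying stream".
struct CountingWriter {
    writes: usize,
    flushes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        self.flushes += 1;
        Ok(())
    }
}

/// Frames accumulate in `out_buffer`; the underlying stream is only
/// written once the buffer holds at least `write_buffer_size` bytes.
struct BufferedSocket<W: Write> {
    stream: W,
    out_buffer: Vec<u8>,
    write_buffer_size: usize,
}

impl<W: Write> BufferedSocket<W> {
    fn write(&mut self, frame: &[u8]) -> io::Result<()> {
        self.out_buffer.extend_from_slice(frame);
        if self.out_buffer.len() >= self.write_buffer_size {
            self.stream.write_all(&self.out_buffer)?;
            self.out_buffer.clear();
        }
        Ok(())
    }

    /// Drain any buffered bytes, then flush the underlying stream.
    fn flush(&mut self) -> io::Result<()> {
        if !self.out_buffer.is_empty() {
            self.stream.write_all(&self.out_buffer)?;
            self.out_buffer.clear();
        }
        self.stream.flush()
    }
}

/// Write `n` frames of `frame_len` bytes then flush; return how many
/// (write, flush) calls the underlying stream saw.
fn run(n: usize, frame_len: usize, write_buffer_size: usize) -> (usize, usize) {
    let mut socket = BufferedSocket {
        stream: CountingWriter { writes: 0, flushes: 0 },
        out_buffer: Vec::new(),
        write_buffer_size,
    };
    let frame = vec![0u8; frame_len];
    for _ in 0..n {
        socket.write(&frame).unwrap();
    }
    socket.flush().unwrap();
    (socket.stream.writes, socket.stream.flushes)
}

fn main() {
    // 100k 16-byte frames with a 128 KiB buffer: the stream sees only 13
    // large writes and 1 flush instead of 100_000 small writes.
    let (writes, flushes) = run(100_000, 16, 128 * 1024);
    println!("writes: {}, flushes: {}", writes, flushes);
    // With write_buffer_size = 0 every frame hits the stream immediately.
    let (writes0, _) = run(100_000, 16, 0);
    println!("writes: {}", writes0);
}
```

With write_buffer_size: 0 this degenerates to the pre-PR behaviour of one underlying write per message, which is why the config only matters for callers that batch writes before flushing.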

Naming

There is a slight naming-clarity issue between write_buffer_size and max_write_buffer_size. I've tried to clarify as much as possible in the docs. Perhaps this can be improved?

@alexheretic (Contributor, Author)

I can probably make a similar micro-bench to show the benefits, or update the current one to include a small simulated io::Write::write cost.

@alexheretic (Contributor, Author)

> I can probably make a similar micro-bench to show the benefits, or update the current one to include a small simulated io::Write::write cost.

Yep, I re-calibrated the simulated io so that underlying write calls take ~80ns and underlying flush takes ~8µs, still keeping flush the expensive operation but adding some cost to underlying stream writes. This slows the master-branch micro-benches to ~18ms on my setup.

The PR's 128 KiB buffer minimises underlying write calls and so performs much better:

write 100k small texts then flush
        time:   [6.5226 ms 6.6930 ms 6.7356 ms]
        change: [-63.907% -63.298% -62.689%] (p = 0.07 > 0.05)
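The simulated io described above can be sketched as follows. SimulatedStream is a hypothetical stand-in for the benchmark's mock, not code from the repo: a Write impl that busy-waits ~80ns per write and ~8µs per flush, so flush stays the dominant cost while writes are no longer free.

```rust
use std::io::{self, Write};
use std::time::{Duration, Instant};

/// Busy-wait for roughly `d`; thread sleep is far too coarse at
/// nanosecond scale, so we spin on the clock instead.
fn spin(d: Duration) {
    let start = Instant::now();
    while start.elapsed() < d {}
}

/// Mock stream with the calibrated costs: ~80ns per write, ~8µs per flush.
struct SimulatedStream;

impl Write for SimulatedStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        spin(Duration::from_nanos(80));
        Ok(buf.len()) // pretend the whole buffer was accepted
    }
    fn flush(&mut self) -> io::Result<()> {
        spin(Duration::from_micros(8));
        Ok(())
    }
}

/// Time `n` small writes followed by a single flush.
fn timed_run(n: usize) -> Duration {
    let mut stream = SimulatedStream;
    let start = Instant::now();
    for _ in 0..n {
        stream.write_all(b"payload").unwrap();
    }
    stream.flush().unwrap();
    start.elapsed()
}

fn main() {
    // 1000 writes at >=80ns each plus one >=8µs flush: at least ~88µs.
    println!("elapsed: {:?}", timed_run(1_000));
}
```

Under this cost model, a buffer that turns 100k small underlying writes into a dozen large ones removes most of the per-write overhead, which matches the ~63% improvement in the criterion output above.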

@alexheretic (Contributor, Author)

@agalakhov wdyt?

@agalakhov (Member)

We used to buffer messages and now we're effectively switching to buffering raw data instead. This seems to be a good idea.

My only concern is: what happens if some data gets buffered without actually being sent, and then a Ping is received? How do we deal with this ordering?

@alexheretic (Contributor, Author)

> My only concern is: what happens if some data get buffered without actually sending them and then a Ping is received? How do we deal with this ordering?

Basically the same as before, since we still auto-flush when we schedule a pong response. On the next read/write/flush, if there is any additional_send we append it to the out_buffer, then flush: write the whole out_buffer to the underlying stream and flush it. The only difference is that the out_buffer may have more data in it to write, whereas before (or now, with write_buffer_size: 0) each message was eagerly written to the underlying stream.

This optimisation is a particularly good fit since we already wrote all message frames into the out_buffer. The out_buffer can already hold many messages, which is necessary to handle write errors properly. So all we're doing here is delaying underlying stream writes until there are more bytes to write, which is more efficient.
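That ordering guarantee can be sketched with a toy model (all names here are illustrative, not the actual tungstenite internals): a scheduled pong goes through the same out_buffer as buffered frames, so it is appended behind whatever is already buffered and nothing is reordered.

```rust
/// Toy model: the "underlying stream" is a Vec recording the bytes it
/// receives, frames wait in `out_buffer`, and a received ping schedules
/// a pong in `additional_send` rather than sending it inline.
struct Socket {
    stream: Vec<u8>,
    out_buffer: Vec<u8>,
    additional_send: Option<Vec<u8>>,
}

impl Socket {
    fn buffer_frame(&mut self, frame: &[u8]) {
        self.out_buffer.extend_from_slice(frame);
    }

    /// Called when a ping arrives: the pong is scheduled, not sent inline.
    fn schedule_pong(&mut self, payload: &[u8]) {
        self.additional_send = Some(payload.to_vec());
    }

    /// Append any scheduled pong to the out_buffer, then write the whole
    /// buffer to the underlying stream in one go.
    fn flush(&mut self) {
        if let Some(pong) = self.additional_send.take() {
            self.out_buffer.extend_from_slice(&pong);
        }
        self.stream.extend_from_slice(&self.out_buffer);
        self.out_buffer.clear();
    }
}

/// Buffer two frames with a ping arriving in between, then flush;
/// return everything the underlying stream saw, in order.
fn demo() -> Vec<u8> {
    let mut s = Socket {
        stream: Vec::new(),
        out_buffer: Vec::new(),
        additional_send: None,
    };
    s.buffer_frame(b"msg1|");
    s.schedule_pong(b"PONG"); // ping received while data sits buffered
    s.buffer_frame(b"msg2|");
    s.flush();
    s.stream
}

fn main() {
    // Buffered frames keep their order; the pong lands at flush time.
    println!("{}", String::from_utf8(demo()).unwrap()); // → msg1|msg2|PONG
}
```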

@alexheretic (Contributor, Author) commented Jun 2, 2023

> We used to buffer messages

Just for clarity: from when I started looking at the code, we didn't actually buffer messages in this way, though it kind of looked like we did. They were always immediately written to the underlying stream. The old message queue would only build up on flush errors, since flush used to be called before each write attempt. That's why I removed the send_queue in #357.

Messages can build up in the out_buffer, now as before my changes, on underlying-write errors. This PR just makes extra use of the out_buffer so that the underlying write is called fewer times with more bytes.

@agalakhov merged commit 8f23e17 into snapview:master on Jun 2, 2023
@agalakhov (Member)

Thank you!
