Optimizing latency? #82
At the moment I don't have plans to implement connection pooling. I haven't looked at using TCP no delay either. It's possible that latency could be lowered by using a `BufWriter` and buffering the request line and headers. I'd be open to a change that implements this if latency is indeed improved by the change.
I'm doing my own research here (though I used to know this better a long time ago), so take everything with a grain of salt.

Re: connection re-use, I totally understand not adding it, and in many setups making a new connection is a ~1 ms affair, so it truly doesn't matter much. TCP delaying the sending of data is a bigger problem. I am not sure if attohttpc, the way it is being used now, is affected. I have seen such delays in the past in other software. It seems like the socket flags help on some setups.

Without connection pooling, I would expect that simply shutting down the sending side of the socket with https://doc.rust-lang.org/std/net/struct.TcpStream.html#method.shutdown after all the data has been written would work as a flush. Maybe. I would hope. Using a buffered writer would potentially save on syscalls, and together with nodelay would fix this hypothetical issue. Seems like other HTTP client libraries are doing nodelay: …
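The shutdown-as-flush idea could be sketched like this (my own minimal sketch against a toy local server, not attohttpc's actual code; `roundtrip` is a name I made up):

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

// Write a request, shut down the sending half to signal "no more data"
// (which also pushes out anything Nagle's algorithm might still hold),
// then read the response until EOF.
fn roundtrip() -> std::io::Result<String> {
    // Toy server: read the whole request (until EOF), then answer.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (mut conn, _) = listener.accept()?;
        let mut req = Vec::new();
        conn.read_to_end(&mut req)?; // sees EOF once the client shuts down its write side
        conn.write_all(b"HTTP/1.1 204 No Content\r\n\r\n")
    });

    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(b"GET / HTTP/1.1\r\nHost: example\r\nConnection: close\r\n\r\n")?;
    stream.shutdown(Shutdown::Write)?; // the "flush": no further writes will follow

    let mut resp = String::new();
    stream.read_to_string(&mut resp)?;
    server.join().unwrap()?;
    Ok(resp)
}

fn main() {
    let resp = roundtrip().unwrap();
    println!("{}", resp.lines().next().unwrap());
}
```

The obvious caveat is that this only works because the connection is used for exactly one request; with pooling, shutting down the write half would break reuse.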
Also an interesting read: https://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable , though considering that after …
Would an option to enable TCP no delay be okay with you? We'd let users choose what best fits their use case.
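For reference, the standard library already exposes this per socket, so such an option would boil down to a call like the following (a minimal sketch with a throwaway local listener so that `connect` has a peer; not attohttpc's actual API):

```rust
use std::net::{TcpListener, TcpStream};

// Connect a socket and disable Nagle's algorithm on it.
fn nodelay_demo() -> std::io::Result<bool> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let stream = TcpStream::connect(listener.local_addr()?)?;
    stream.set_nodelay(true)?; // per-connection TCP_NODELAY
    stream.nodelay() // read the flag back
}

fn main() {
    println!("nodelay enabled: {}", nodelay_demo().unwrap());
}
```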
Personally, I think this is a compromise at best, as disabling Nagle's algorithm also means sending out half-full packets whenever the default buffer size (the 8 kB used by `BufWriter`) does not line up with the maximum segment size.

I think using `TCP_CORK` would fit better here: as we currently use one connection per request, i.e. no pipelining is possible, corking and uncorking will never be wrong and we would not need to expose an option. The main downside is that it is a Linux-specific API. Another issue is that it requires unsafe code to call into `libc`. (Setting …)
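The required unsafe call could look roughly like this (my own sketch: the `set_cork` helper and `cork_roundtrip` are made-up names, the constants are hard-coded for Linux, and the raw `setsockopt` declaration stands in for the `libc` crate; this is not portable code):

```rust
use std::net::{TcpListener, TcpStream};
use std::os::raw::{c_int, c_void};
use std::os::unix::io::AsRawFd;

extern "C" {
    // Provided by the C library, which Rust links against on Linux anyway.
    fn setsockopt(fd: c_int, level: c_int, name: c_int, value: *const c_void, len: u32) -> c_int;
}

// Linux values; other platforms differ or lack TCP_CORK entirely.
const IPPROTO_TCP: c_int = 6;
const TCP_CORK: c_int = 3;

// Enable or disable TCP_CORK on a connected socket.
fn set_cork(stream: &TcpStream, corked: bool) -> std::io::Result<()> {
    let flag: c_int = corked as c_int;
    let rc = unsafe {
        setsockopt(
            stream.as_raw_fd(),
            IPPROTO_TCP,
            TCP_CORK,
            &flag as *const c_int as *const c_void,
            std::mem::size_of::<c_int>() as u32,
        )
    };
    if rc == 0 { Ok(()) } else { Err(std::io::Error::last_os_error()) }
}

// Cork before writing the request, uncork after the last byte so the
// final partial segment is pushed out immediately.
fn cork_roundtrip() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let stream = TcpStream::connect(listener.local_addr()?)?;
    set_cork(&stream, true)?; // hold back partial segments while writing
    // ... write request line, headers and body here ...
    set_cork(&stream, false) // uncork: flush whatever is still queued
}

fn main() {
    cork_roundtrip().unwrap();
    println!("corked and uncorked without error");
}
```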
It is hard to measure this using a real network due to noise, but it seems like using … I would therefore suggest calling …

The edge case of the full request length hitting a multiple of the buffer size exactly seems unfortunate, but I am not sure fixing that is worth the downsides of integrating `TCP_CORK`.
Ok, I have done some more measurements using a container running … to enforce a 1 s delay before any response packet is sent. At least running kernel 5.10, I see no measurable effect of using either option.

So at least with contemporary kernels, it does not seem worth it to try to second-guess the TCP stack. The interaction of Nagle's algorithm and delayed acknowledgements does not seem to be an issue either, as the first segment is never delayed and our request line is always the first segment since we do not reuse connections.

@dpc Could you describe a specific setup one could reproduce to investigate your initial issue?

P.S.: So concerning your initial questions:
As written above, I think this is not a problem at all indeed.
We already use Rust's default 8 kB buffer and flush it at the end. Setting `TCP_NODELAY` on top of that does not appear to be necessary.

P.P.S.: One interesting effect happens when the data is larger than the internal buffer: Rust's `BufWriter` then bypasses its buffer and hands such writes to the operating system directly.
Thanks for investigating. Behavior might be different on Mac/Windows; it might be worth setting `TCP_NODELAY` for those. Since we use a `BufWriter`, it would probably be fine to do it. But I'm happy with letting the OS handle it, they surely have optimizations for this case.
But only before the last write system call then? Internal buffer size and maximum segment size not matching up looks like an issue with how TCP works to me, not with the implementation of TCP that is part of the Linux kernel.
I'm not sure I understand the issue with the maximum segment size and the `BufWriter`. The `BufWriter` buffers up to 8 kB, so when we flush, it should always result in full segments except for the last one. I'm mostly thinking of how we write the headers here. I haven't checked how we write the body.
Everything goes through the same `BufWriter`.
The problem is that usually each write system call (say, handing over 8 kB of data to the kernel) will result in multiple packets being sent, each up to the maximum segment size (MSS). But 8 kB is usually not a multiple of the MSS, so each write will produce many full segments and one less-than-full segment. If `TCP_NODELAY` is set, every one of these less-than-full segments is sent out immediately.

Only for the very last write system call do we want this less-than-full segment to be sent immediately, because we know that we will not write any more data. (Which is also why we would want to "uncork" the socket after that last write.)

Maybe as an illustration, where W means "write system call issued", F means "full segment sent" and L means "less-than-full segment sent": writing a large request with `TCP_NODELAY` will look like

W F F L W F F L W F F L

whereas without it, we will get something like

W F F W F F F W F F F L

because Nagle's algorithm holds each trailing partial segment back and coalesces it with the data of the following write.
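The pattern above can be sketched with a toy model (my own simplification: real Nagle waits for an ACK rather than unconditionally coalescing, and the `segments` function and its numbers are purely illustrative):

```rust
// Simulate how a sequence of write() calls is cut into TCP segments.
// With nodelay, every write's leftover partial segment is sent at once;
// without it, leftovers are carried over into the next write's data.
fn segments(writes: &[usize], mss: usize, nodelay: bool) -> Vec<usize> {
    let mut out = Vec::new();
    let mut pending = 0usize;
    for &w in writes {
        pending += w;
        while pending >= mss {
            out.push(mss); // full segment
            pending -= mss;
        }
        if nodelay && pending > 0 {
            out.push(pending); // partial segment pushed out immediately
            pending = 0;
        }
    }
    if pending > 0 {
        out.push(pending); // final partial segment at flush/close
    }
    out
}

fn main() {
    // Three 8 kB BufWriter flushes over a typical 1448-byte MSS.
    let writes = [8192usize; 3];
    println!("nodelay: {} segments", segments(&writes, 1448, true).len());
    println!("nagle:   {} segments", segments(&writes, 1448, false).len());
}
```

In this toy model, nodelay produces three mid-stream partial segments where plain Nagle produces only the final one, which is exactly the W F F L pattern sketched above.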
Ah yes, thank you, I see the issue. Best to just leave it as it is now then. Like you said before, the kernels should be smart enough to flush Nagle's buffer once we try reading from the socket.
Just to be clear: I have no issue with my Rust project. We've been suspecting an issue in another project (one that was reusing a plain TCP connection for multiple requests), and it made me think about the Rust project where I'm using attohttpc, where latency is also somewhat important (though we never did any measurements, nor was latency ever an issue). So I just wanted to put this up for consideration. Thanks! If there are no actionable items, we might just close this issue.
So yes, I think we should close this for now. The whole question needs reconsideration if we start to reuse connections. |
I was just recently investigating an issue caused by https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment and it occurred to me to investigate latency optimization in the project I use attohttpc in.

I think right now the library doesn't reuse the connection at all. Should there be a way to do it in the future, or is that already out of scope for this library?
Since there's no connection reuse, is delayed ACK etc. a problem at all? Would it make sense for attohttpc to set (or allow setting) nodelay + use a large buffer and flush it at the end?
https://doc.rust-lang.org/std/net/struct.TcpStream.html#method.set_nodelay