rpc: join small packets in send_msg and recv_msg
#16892
Combining multiple

Here I am sending 3

The total time is 50.587595 - 50.553828 = 0.033766 sec.

Here I am sending 3

The total time is 52.224281 - 52.190746 = 0.033535 sec.

In both cases we are bound by the network latency, which is ~33 ms. In other words, combining multiple

For this simple example the difference is less than 1 ms for a very small tensor. For large tensors it'd be worse due to the additional copies being made.
Yeah, it didn't help me at all when I tried this last week and:

The extra

The only reason I tried it again was to get it working with the volatile cache idea: it was sending the first 9 bytes as 2 packets and then spending significant time hashing the payload to send, and I suspected joining the packets might help more there.

It was only when I tried it with your other graph-reuse PR that I seemed to gain an extra 2-3 tokens/s on top of the already big increase from that PR (e.g. 15.5 --> 19.5 --> 22.5), but I need to test much more carefully whether this is due to some other change in the codebase since I ran the last tests.

I will try and run
I'm closing this, as it doesn't seem to help, for the reason @rgerganov pointed out regarding latency.
This PR just joins the 1-byte command and 8-byte size packets with the main payload.
It didn't seem to make much difference for me to start with, but possibly different network setups handle `TCP_NODELAY` differently.

Combined with #15405 it does seem to give a large speedup to TG now (more testing needed to be sure).
It may be better to create the buffer once outside of the function if it doesn't need to be thread-safe, so I'm just making this a draft for now to see if it's worthwhile for others.
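
For readers who want to picture the change, here is a minimal sketch of the packing step described above. It is not the code from this PR: the helper name `pack_rpc_frame`, the frame layout as written, and copying the size in host byte order are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (illustration only, not taken from the PR):
// packs [1-byte command][8-byte size][payload] into one contiguous buffer,
// so the whole message could be handed to a single send() call
// instead of three separate ones.
// Assumption: the size field is copied in host byte order.
static void pack_rpc_frame(std::vector<uint8_t> & buf, uint8_t cmd,
                           const void * payload, uint64_t size) {
    // resize() keeps the existing allocation when capacity is sufficient,
    // so a caller-owned buffer can be reused across calls
    buf.resize(1 + sizeof(size) + size);
    buf[0] = cmd;
    std::memcpy(buf.data() + 1, &size, sizeof(size));
    if (size > 0) {
        // this is the extra payload copy that joining the packets costs
        std::memcpy(buf.data() + 1 + sizeof(size), payload, size);
    }
}
```

Passing the same `std::vector` on every call corresponds to creating the buffer once outside of the function; as noted, that is only safe if the buffer is not shared between threads. The payload `memcpy` is also the additional copy that makes this approach more expensive for large tensors.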