transport: refactor to reduce lock contention and improve performance #1962
Conversation
98874de to 8dd7549
Please add benchmark results before/after to the PR description.
// In case this is triggered because clientConn.Close()
// was called, we want to immediately close the transport
// since no other goroutine might notice it for a while.
t.Close()
@jadekler note this new requirement for the grpc layer to manually close the transport. This will need to be done by the onError callback in your PR.
Thanks! I'll add it.
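For readers following along, here is a hypothetical illustration of the contract being discussed. The clientTransport interface, fakeTransport, and the onError signature below are placeholders, not grpc-go's actual API; the real callback and its wiring belong to the referenced PR.

```go
package main

import (
	"errors"
	"fmt"
)

// clientTransport stands in for the transport the grpc layer owns; the real
// type and method set live in the transport package.
type clientTransport interface {
	Close() error
}

type fakeTransport struct{ closed bool }

func (f *fakeTransport) Close() error { f.closed = true; return nil }

// onError sketches the new contract: when the transport reports a fatal
// error, the caller closes the transport itself, since the transport's other
// goroutines might not notice the error for a while.
func onError(t clientTransport, err error) {
	fmt.Println("transport error:", err)
	// ...surface the error to pending RPCs, trigger reconnection, etc...
	t.Close()
}

func main() {
	t := &fakeTransport{}
	onError(t, errors.New("connection reset by peer"))
	fmt.Println("transport closed:", t.closed)
}
```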
Free stream (RPC) goroutines from the onus of checking transport-level quotas such as flow control, and remove all the global locks. Streams instead schedule their messages and headers by adding them to controlBuf, which is read by a dedicated goroutine that checks these quotas and writes the bytes on the wire. This eliminates friction in highly concurrent environments.

Furthermore, this dedicated writing goroutine is tuned to maximise the batch size for each syscall by yielding its processor (runtime.Gosched()) when there isn't much data to write. This lets stream goroutines get scheduled and write more data onto controlBuf, which eventually increases the batch size per syscall. This also improves QPS (benchmark not covered above) by about 20%.

A rough sketch of this pattern is included below, followed by the benchmark results.
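To make the description concrete, here is a minimal, hedged sketch of the controlBuf / dedicated-writer pattern. It is not the actual grpc-go implementation: the names (controlBuf, loopyWriter, item) mirror the description above, but every detail below is illustrative only.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// item is anything a stream goroutine wants put on the wire
// (headers, data frames, window updates, ...).
type item struct{ payload []byte }

// controlBuf is the queue stream goroutines append to; they never touch
// transport-level quotas or a global write lock themselves.
type controlBuf struct {
	mu   sync.Mutex
	list []*item
	ch   chan struct{} // wakes the dedicated writer goroutine
}

func newControlBuf() *controlBuf {
	return &controlBuf{ch: make(chan struct{}, 1)}
}

// put enqueues an item and wakes the writer if it is idle.
func (c *controlBuf) put(it *item) {
	c.mu.Lock()
	c.list = append(c.list, it)
	c.mu.Unlock()
	select {
	case c.ch <- struct{}{}:
	default: // a wakeup is already pending
	}
}

// get returns the next queued item, or nil if the buffer is empty.
func (c *controlBuf) get() *item {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.list) == 0 {
		return nil
	}
	it := c.list[0]
	c.list = c.list[1:]
	return it
}

// loopyWriter is the single goroutine that drains controlBuf, would check
// quotas such as flow control, and hands one large batch to each write syscall.
func loopyWriter(c *controlBuf, done <-chan struct{}, flush func([]byte)) {
	var batch []byte
	for {
		select {
		case <-c.ch:
		case <-done:
			return
		}
		batch = batch[:0]
		for {
			it := c.get()
			if it == nil {
				// Buffer drained: yield so stream goroutines can run and
				// enqueue more data, growing the batch per syscall.
				runtime.Gosched()
				if it = c.get(); it == nil {
					break
				}
			}
			// Transport-level quota checks (flow control, etc.) would go here.
			batch = append(batch, it.payload...)
		}
		if len(batch) > 0 {
			flush(batch) // ideally one write syscall per batch
		}
	}
}

func main() {
	cb := newControlBuf()
	done := make(chan struct{})
	go loopyWriter(cb, done, func(b []byte) {
		fmt.Printf("flushed %d bytes in one batch\n", len(b))
	})
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ { // many concurrent "stream" goroutines
		wg.Add(1)
		go func() {
			defer wg.Done()
			cb.put(&item{payload: []byte("frame")})
		}()
	}
	wg.Wait()
	time.Sleep(100 * time.Millisecond) // crude: let the writer drain (sketch only)
	close(done)
}
```

In this sketch the non-blocking wakeup send keeps stream goroutines from ever waiting on the writer, and the runtime.Gosched() call before flushing a drained buffer is what lets more producers run and grow the batch handed to each syscall.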
Pinger benchmark:
mmukhi@mmukhi:~/sandbox/pinger$ ./pinger -p 100 -n 1 -d 10s -t grpc
Internal benchmarks:
gRPC-Go OSS benchmarks: