Commit
rpc: reduce context switches and receive calls
* When queueing a task to execute on the reactor, avoid writing to the
  eventfd to wake it up if such a write has already been done. This
  should reduce the number of read/write syscalls to the eventfd and
  avoid "spurious" wakeups of the reactor.

* When reading inbound data, read an extra 4 bytes, and if it's
  available, loop around to read another call without putting the
  reactor back to sleep.

The effect on context switches is clearly visible using
rpc-bench --gtest_filter=\*Async

Before:
  I0305 12:50:56.463312  7468 rpc-bench.cc:128] Ctx Sw. per req: 0.640409
  I0305 12:50:58.015260  7542 rpc-bench.cc:128] Ctx Sw. per req: 0.613172
  I0305 12:50:59.563201  7587 rpc-bench.cc:128] Ctx Sw. per req: 0.589479
  I0305 12:51:01.014848  7662 rpc-bench.cc:128] Ctx Sw. per req: 0.562744
  I0305 12:51:02.666339  7736 rpc-bench.cc:128] Ctx Sw. per req: 0.569126

After:
  I0305 12:52:03.567790  9005 rpc-bench.cc:128] Ctx Sw. per req: 0.383251
  I0305 12:52:05.050909  9079 rpc-bench.cc:128] Ctx Sw. per req: 0.454404
  I0305 12:52:06.626401  9138 rpc-bench.cc:128] Ctx Sw. per req: 0.3308
  I0305 12:52:08.123154  9198 rpc-bench.cc:128] Ctx Sw. per req: 0.317752
  I0305 12:52:09.666586  9272 rpc-bench.cc:128] Ctx Sw. per req: 0.391739

And on system CPU:

Before:
  I0305 12:50:56.463310  7468 rpc-bench.cc:127] Sys CPU per req: 16.5524us
  I0305 12:50:58.015259  7542 rpc-bench.cc:127] Sys CPU per req: 16.1158us
  I0305 12:50:59.563199  7587 rpc-bench.cc:127] Sys CPU per req: 17.3184us
  I0305 12:51:01.014847  7662 rpc-bench.cc:127] Sys CPU per req: 16.7911us
  I0305 12:51:02.666337  7736 rpc-bench.cc:127] Sys CPU per req: 15.7659us

After:
  I0305 12:52:03.567787  9005 rpc-bench.cc:127] Sys CPU per req: 13.0533us
  I0305 12:52:05.050906  9079 rpc-bench.cc:127] Sys CPU per req: 13.7925us
  I0305 12:52:06.626399  9138 rpc-bench.cc:127] Sys CPU per req: 11.6987us
  I0305 12:52:08.123152  9198 rpc-bench.cc:127] Sys CPU per req: 11.9214us
  I0305 12:52:09.666584  9272 rpc-bench.cc:127] Sys CPU per req: 13.4031us

And on syscalls:

  todd@turbo:~/kudu$ grep recvfr /tmp/before /tmp/after
  /tmp/before:  1458969  syscalls:sys_enter_recvfrom  ( +- 1.99% )
  /tmp/before:  1458969  syscalls:sys_exit_recvfrom   ( +- 1.99% )
  /tmp/after:   1252328  syscalls:sys_enter_recvfrom  ( +- 1.82% )
  /tmp/after:   1252328  syscalls:sys_exit_recvfrom   ( +- 1.82% )
  todd@turbo:~/kudu$ grep epoll_ctl /tmp/before /tmp/after
  /tmp/before:   915862  syscalls:sys_enter_epoll_ctl ( +- 1.47% )
  /tmp/before:   915862  syscalls:sys_exit_epoll_ctl  ( +- 1.47% )
  /tmp/after:    475978  syscalls:sys_enter_epoll_ctl ( +- 3.61% )
  /tmp/after:    475978  syscalls:sys_exit_epoll_ctl  ( +- 3.61% )

On a more macro-benchmark (TSBS single-groupby-1-1-1 16 workers on an
8-core machine) this also reduces syscalls a bit, though the end-to-end
improvement is minimal.

Before:
  Performance counter stats for 'system wide' (10 runs):

         340,444  cs                            ( +- 0.30% )
         144,024  syscalls:sys_enter_recvfrom   ( +- 0.00% )
          94,379  syscalls:sys_enter_epoll_ctl  ( +- 0.06% )
         129,376  syscalls:sys_enter_epoll_wait ( +- 0.10% )

     2.025755946 seconds time elapsed           ( +- 0.43% )

After:
  Performance counter stats for 'system wide' (10 runs):

         333,865  cs                            ( +- 0.27% )
         119,216  syscalls:sys_enter_recvfrom   ( +- 0.04% )
          88,731  syscalls:sys_enter_epoll_ctl  ( +- 0.08% )
         104,149  syscalls:sys_enter_epoll_wait ( +- 0.08% )

     2.005614271 seconds time elapsed           ( +- 0.19% )

Change-Id: I32c5e4d146c25be8e90665a0cb8385fcd017b15c
Reviewed-on: http://gerrit.cloudera.org:8080/15440
Reviewed-by: Andrew Wong <awong@cloudera.com>
Tested-by: Andrew Wong <awong@cloudera.com>
Reviewed-by: Bankim Bhavsar <bankim@cloudera.com>
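
The first change described above (skipping the eventfd write when a wakeup is already pending) can be illustrated with a minimal sketch. This is not the code from the commit: `CoalescingQueue`, `Schedule`, `Drain`, and `wake_fn` are hypothetical names, and the wakeup is modelled as a plain callback rather than a real eventfd write so the example stays portable.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reactor's task queue. In a real reactor,
// wake_fn would write to an eventfd; here it is only a callback.
class CoalescingQueue {
 public:
  explicit CoalescingQueue(std::function<void()> wake_fn)
      : wake_fn_(std::move(wake_fn)) {}

  // May be called from any thread. Only the producer that flips
  // wake_pending_ from false to true pays for the wakeup.
  void Schedule(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> l(mu_);
      tasks_.push_back(std::move(task));
    }
    if (!wake_pending_.exchange(true)) {
      wake_fn_();
    }
  }

  // Called on the reactor thread after it wakes up. The flag is cleared
  // before the queue is swapped out, so a task queued during the drain
  // either lands in this batch or triggers a fresh wakeup.
  std::size_t Drain() {
    wake_pending_.store(false);
    std::vector<std::function<void()>> batch;
    {
      std::lock_guard<std::mutex> l(mu_);
      batch.swap(tasks_);
    }
    for (auto& task : batch) task();
    return batch.size();
  }

 private:
  std::function<void()> wake_fn_;
  std::atomic<bool> wake_pending_{false};
  std::mutex mu_;
  std::vector<std::function<void()>> tasks_;
};
```

With this arrangement, several tasks scheduled before the reactor runs cost one wakeup between them. The trade-off is that a task arriving between the flag reset and the swap may cause one extra wakeup, which is harmless.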
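
The second change (reading an extra 4 bytes along with each payload) can be sketched in the same spirit. The wire format here is an assumption, a 4-byte big-endian length prefix followed by the payload, and `FakeSocket`, `Frame`, `DecodeLen`, and `ReadAvailableCalls` are hypothetical helpers rather than anything from the commit. `FakeSocket::Recv` imitates a non-blocking `recv` by returning only the bytes already buffered.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr std::size_t kLenPrefix = 4;  // assumed size of the length prefix

// Hypothetical in-memory stand-in for a non-blocking socket.
struct FakeSocket {
  std::string pending;
  int recv_calls = 0;
  // Copies up to n already-buffered bytes; a short count means "would block".
  std::size_t Recv(char* buf, std::size_t n) {
    ++recv_calls;
    std::size_t got = std::min(n, pending.size());
    std::memcpy(buf, pending.data(), got);
    pending.erase(0, got);
    return got;
  }
};

// Decodes the assumed 4-byte big-endian length prefix.
inline std::uint32_t DecodeLen(const char* p) {
  const unsigned char* u = reinterpret_cast<const unsigned char*>(p);
  return (std::uint32_t{u[0]} << 24) | (std::uint32_t{u[1]} << 16) |
         (std::uint32_t{u[2]} << 8) | std::uint32_t{u[3]};
}

// Builds one frame in the assumed format (test helper).
inline std::string Frame(const std::string& payload) {
  std::uint32_t n = static_cast<std::uint32_t>(payload.size());
  std::string out;
  for (int shift = 24; shift >= 0; shift -= 8) {
    out.push_back(static_cast<char>((n >> shift) & 0xff));
  }
  return out + payload;
}

// Reads every complete call that is already buffered. Each payload read
// asks for len + 4 bytes; if the extra 4 bytes arrive, they are the next
// call's length prefix and the loop continues without going back to the
// event loop. A real implementation would keep partial bytes for the
// next readiness event; this sketch simply stops.
std::vector<std::string> ReadAvailableCalls(FakeSocket& sock) {
  std::vector<std::string> calls;
  char prefix[kLenPrefix];
  if (sock.Recv(prefix, kLenPrefix) < kLenPrefix) return calls;
  for (;;) {
    std::uint32_t len = DecodeLen(prefix);
    std::string buf(len + kLenPrefix, '\0');
    std::size_t got = sock.Recv(&buf[0], buf.size());
    if (got < len) return calls;         // payload incomplete
    calls.emplace_back(buf.data(), len);
    if (got < buf.size()) return calls;  // no complete next prefix yet
    std::memcpy(prefix, buf.data() + len, kLenPrefix);
  }
}
```

For three back-to-back frames, this sketch issues four receive calls where a prefix-then-payload loop would issue six, which is the kind of reduction the `recvfrom` counts above reflect.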