Poor __inet_check_established() implementation #1419

Open
krizhanovsky opened this issue Jun 10, 2020 · 3 comments
Labels
kernel The Linux mainstream issues low priority performance

@krizhanovsky
Contributor

krizhanovsky commented Jun 10, 2020

Linux kernel 4.19 (standard Debian 9.12 kernel). A couple of runs of wrk such as

wrk -H 'Connection: close' -c 32768 -d 30 -t 8

can easily degrade performance on the client machine significantly, sometimes by a factor of 20-30, with the following perf profile:

  36.28%  [kernel]            [k] __inet_check_established
  20.68%  [kernel]            [k] _raw_spin_lock_bh
  14.76%  [kernel]            [k] _raw_spin_lock
  11.17%  [kernel]            [k] native_queued_spin_lock_slowpath
   9.23%  [kernel]            [k] __inet_hash_connect
   3.14%  [kernel]            [k] inet_ehashfn
   1.32%  [kernel]            [k] tcp_twsk_unique
   1.15%  [kernel]            [k] __local_bh_enable_ip
   0.64%  [kernel]            [k] _cond_resched
   0.43%  [kernel]            [k] _raw_spin_unlock_bh
   0.42%  [kernel]            [k] __indirect_thunk_start
   0.19%  [kernel]            [k] rcu_all_qs

We need to check the current kernel implementation. The last time I checked, it was a simple hash table with chained lists. Our recent research on high-performance concurrent hash tables for MariaDB can probably be applied here. Also see how VPP TCP manages per-CPU/per-thread connection hash tables.
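To illustrate why a hash table with chained lists becomes expensive under load, here is a minimal toy model in Python. It is not the kernel code: `ChainedTable` and its methods are hypothetical names, and locking is left out entirely. It only shows that a lookup has to walk every entry in a bucket's chain, so the cost grows with the number of entries that land in the same bucket.

```python
class ChainedTable:
    """Toy hash table with separate chaining (illustration only)."""

    def __init__(self, n_buckets):
        self.n_buckets = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # Map a key (e.g. a connection 4-tuple) to a bucket index.
        return hash(key) % self.n_buckets

    def insert(self, key):
        self.buckets[self._index(key)].append(key)

    def lookup(self, key):
        """Return (found, entries_scanned) for the given key."""
        chain = self.buckets[self._index(key)]
        for scanned, entry in enumerate(chain, start=1):
            if entry == key:
                return True, scanned
        # A miss scans the whole chain before giving up.
        return False, len(chain)
```

In this model a miss is the worst case, since the whole chain is walked. A uniqueness check before inserting a new connection is exactly such a miss when the connection does not exist yet.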

@krizhanovsky
Contributor Author

krizhanovsky commented Jun 12, 2020

While the hash table tcp_hashinfo is really naive (the function spends most of its time scanning a linked list), __inet_check_established() is called only on the connect() path, and the issue was observed on a system with many TIME-WAIT sockets. The problem can be mitigated by enabling the tcp_tw_reuse sysctl or lowering the tcp_max_tw_buckets sysctl.
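To check whether a client machine is in this state, something like the following Python sketch could be used. It counts TIME-WAIT sockets in the text of /proc/net/tcp (state value 06 in the fourth column) and reads the two sysctls mentioned above from /proc/sys. The helper names `count_time_wait` and `read_sysctl` are made up for this example, and /proc/net/tcp covers IPv4 only (IPv6 sockets are listed in /proc/net/tcp6).

```python
from pathlib import Path

TCP_TIME_WAIT = 0x06  # TIME-WAIT state value as shown in the "st" column


def count_time_wait(proc_net_tcp_text):
    """Count TIME-WAIT sockets in the contents of /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            state = int(fields[3], 16)  # "st" column, hexadecimal
        except ValueError:
            continue  # header line or anything else that is not a socket row
        if state == TCP_TIME_WAIT:
            count += 1
    return count


def read_sysctl(name, root="/proc/sys"):
    """Read a sysctl such as 'net.ipv4.tcp_tw_reuse' via procfs."""
    return Path(root, *name.split(".")).read_text().strip()
```

On a Linux host, `count_time_wait(Path("/proc/net/tcp").read_text())` gives the current number of IPv4 TIME-WAIT sockets, and `read_sysctl("net.ipv4.tcp_tw_reuse")` and `read_sysctl("net.ipv4.tcp_max_tw_buckets")` show the current settings.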

Meanwhile, the hash table is also scanned in the main receive routine tcp_v4_rcv(), so it can still cause contention on a multi-core system serving thousands of TCP connections.

The problem was also reported by CloudFlare.

BTW, it seems FreeBSD/F-Stack use a very similar approach with inpcbinfo (see netinet/in_pcb.h); see, for example, tcp_input() -> in_pcblookup_mbuf(&V_tcbinfo, ...).

@defanator

TWIMC, https://lore.kernel.org/netdev/20210220110356.84399-1-redsky110@gmail.com/#t

@krizhanovsky
Contributor Author

The same issue is relevant for server-side connections. We saw 4 ms delays in __inet_lookup_established().
