Commit 306f911
bpf: Introduce SK_BPF_MEMCG_FLAGS and SK_BPF_MEMCG_EXCLUSIVE.
If a socket has sk->sk_memcg with SK_MEMCG_EXCLUSIVE, it is decoupled
from the global protocol memory accounting.
This is controlled by net.core.memcg_exclusive sysctl, but it lacks
flexibility.
Let's support flagging (and clearing) SK_MEMCG_EXCLUSIVE via
bpf_setsockopt() at the BPF_CGROUP_INET_SOCK_CREATE hook.
  u32 flags = SK_BPF_MEMCG_EXCLUSIVE;

  bpf_setsockopt(ctx, SOL_SOCKET, SK_BPF_MEMCG_FLAGS,
                 &flags, sizeof(flags));
As with net.core.memcg_exclusive, the flag is inherited by child sockets,
and BPF always takes precedence over the sysctl at socket(2) and accept(2).
SK_BPF_MEMCG_FLAGS is only supported at BPF_CGROUP_INET_SOCK_CREATE
and not on other hooks, for the following reasons:
  1. UDP charges memory under sk->sk_receive_queue.lock instead
     of lock_sock()
  2. For TCP child sockets, memory accounting is adjusted only in
     __inet_accept(), to which sk->sk_memcg allocation is deferred
  3. Modifying the flag after an skb is charged to sk would require
     such an adjustment during bpf_setsockopt() and complicate the
     logic unnecessarily
We can support other hooks later if a real use case justifies that.
Most of the changes are inlined and hard to trace, but a microbenchmark
on __sk_mem_raise_allocated() during neper/tcp_stream showed that more
samples completed faster with SK_MEMCG_EXCLUSIVE. The difference should
be more visible under tcp_mem pressure.
  # bpftrace -e 'kprobe:__sk_mem_raise_allocated { @start[tid] = nsecs; }
    kretprobe:__sk_mem_raise_allocated /@start[tid]/
    { @end[tid] = nsecs - @start[tid]; @times = hist(@end[tid]); delete(@start[tid]); }'
  # tcp_stream -6 -F 1000 -N -T 256
Without bpf prog:
[128, 256) 3846 | |
[256, 512) 1505326 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512, 1K) 1371006 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[1K, 2K) 198207 |@@@@@@ |
[2K, 4K) 31199 |@ |
With bpf prog in the next patch (must be attached before tcp_stream):
  # bpftool prog load sk_memcg.bpf.o /sys/fs/bpf/sk_memcg type cgroup/sock_create
  # bpftool cgroup attach /sys/fs/cgroup/test cgroup_inet_sock_create pinned /sys/fs/bpf/sk_memcg
[128, 256) 6413 | |
[256, 512) 1868425 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512, 1K) 1101697 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[1K, 2K) 117031 |@@@@ |
[2K, 4K) 11773 | |
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>

1 parent 260bd76, commit 306f911
File tree
4 files changed: +49, -0 lines
- include/uapi/linux
- mm
- net/core
- tools/include/uapi/linux