Add tcp window clamp example #172

Merged · 4 commits · Dec 7, 2022
2 changes: 1 addition & 1 deletion examples/Makefile
@@ -18,7 +18,7 @@ CLANG_BPF_SYS_INCLUDES = $(shell $(CC) -v -E - </dev/null 2>&1 \
| sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }')

$(OBJ): %.bpf.o: %.bpf.c
-	$(CC) -g -O2 -Wall -Werror -D__TARGET_ARCH_$(ARCH) $(CLANG_BPF_SYS_INCLUDES) -I../include/$(ARCH) -c -target bpf $< -o $@
+	$(CC) -g -O2 -Wall -Werror -D__TARGET_ARCH_$(ARCH) $(CFLAGS) $(CLANG_BPF_SYS_INCLUDES) -I../include/$(ARCH) -c -target bpf $< -o $@

.PHONY: clean
clean:
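The `$(CFLAGS)` hook added above is what lets callers opt into the fentry path at build time. A sketch of the invocation, assuming the repository's `examples/` directory and its Makefile:

```shell
# Build the example with the fentry/fexit probe enabled (sketch; assumes
# clang and the repository checkout with the examples/ Makefile).
make -C examples CFLAGS=-DFENTRY_SUPPORT tcp-window-clamps.bpf.o
```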
90 changes: 90 additions & 0 deletions examples/tcp-window-clamps.bpf.c
@@ -0,0 +1,90 @@
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#include "maps.bpf.h"

/* Minimum value for tp->rcv_ssthresh that is not considered a clamp */
#define MIN_CLAMP (32 * 1024)

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, u32);
	__type(value, u64);
} tcp_window_clamps_total SEC(".maps");

static int handle_tcp_sock(struct tcp_sock *tp)
{
	u32 zero = 0, rcv_ssthresh;

	if (!tp) {
		return 0;
	}

	rcv_ssthresh = BPF_CORE_READ(tp, rcv_ssthresh);

	if (rcv_ssthresh < MIN_CLAMP) {
		increment_map(&tcp_window_clamps_total, &zero, 1);
	}

	return 0;
}

#ifdef FENTRY_SUPPORT
// If fentry/fexit is supported, use it for a simpler and faster probe.
// Pass -DFENTRY_SUPPORT in the compiler flags to enable this.

SEC("fexit/tcp_try_rmem_schedule")
int BPF_PROG(tcp_try_rmem_schedule_exit, struct sock *sk)
{
	return handle_tcp_sock((struct tcp_sock *) sk);
}

#else
// Otherwise, fall back to good old kprobe.

struct {
	__uint(type, BPF_MAP_TYPE_LRU_HASH);
	__uint(max_entries, 1024);
	__type(key, u64);
	__type(value, struct sock *);
} tcp_rmem_schedule_enters SEC(".maps");

static u64 enter_key()
{
	u64 pid = bpf_get_current_pid_tgid();
	if (pid) {
		return pid;
	}

	return bpf_get_smp_processor_id();
}

SEC("kprobe/tcp_try_rmem_schedule")
int BPF_KPROBE(tcp_try_rmem_schedule, struct sock *sk)
{
	u64 key = enter_key();

	bpf_map_update_elem(&tcp_rmem_schedule_enters, &key, &sk, BPF_NOEXIST);

	return 0;
}

SEC("kretprobe/tcp_try_rmem_schedule")
wenlxie (Contributor) commented on Nov 28, 2022:
I hit a performance issue when using kretprobe to probe the function ipt_do_table() in kernel 5.4.0, so I'd suggest running an LnP test before enabling this in production.

bobrik (Contributor, Author) replied:
What do you mean by LNP test?

wenlxie (Contributor) replied:
load and performance, stress test

bobrik (Contributor, Author) replied:
That's a given. I did a quick test with a kernel source download in a VM:

ivan@vm:~$ curl -o /dev/null https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.0.11.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  127M  100  127M    0     0  44.5M      0  0:00:02  0:00:02 --:--:-- 44.5M

I see ~4k packets going via iptables for this:

[3973:134454161] -A INPUT -i eth0 -p tcp -m tcp --sport 443 -j ACCEPT

There are 1286 bpf program runs and ~8ms of CPU time spent:

# HELP ebpf_exporter_ebpf_program_info Info about ebpf programs
# TYPE ebpf_exporter_ebpf_program_info gauge
ebpf_exporter_ebpf_program_info{config="tcp-window-clamps",id="552",program="tcp_try_rmem_schedule",tag="f14b021593f58e05"} 1
ebpf_exporter_ebpf_program_info{config="tcp-window-clamps",id="553",program="tcp_try_rmem_schedule_ret",tag="d88afa963de02adb"} 1

# HELP ebpf_exporter_ebpf_program_run_count_total How many times has the program been executed
# TYPE ebpf_exporter_ebpf_program_run_count_total counter
ebpf_exporter_ebpf_program_run_count_total{id="552"} 1286
ebpf_exporter_ebpf_program_run_count_total{id="553"} 1286

# HELP ebpf_exporter_ebpf_program_run_time_seconds How long has the program been executing
# TYPE ebpf_exporter_ebpf_program_run_time_seconds counter
ebpf_exporter_ebpf_program_run_time_seconds{id="552"} 0.00434047
ebpf_exporter_ebpf_program_run_time_seconds{id="553"} 0.003490548

With fexit based probe it drops to 1190 runs and 3ms of CPU time:

# HELP ebpf_exporter_ebpf_program_run_count_total How many times has the program been executed
# TYPE ebpf_exporter_ebpf_program_run_count_total counter
ebpf_exporter_ebpf_program_run_count_total{id="547"} 1190

# HELP ebpf_exporter_ebpf_program_run_time_seconds How long has the program been executing
# TYPE ebpf_exporter_ebpf_program_run_time_seconds counter
ebpf_exporter_ebpf_program_run_time_seconds{id="547"} 0.002882893

The number of runs depends on how buffers are drained.

With curl spending 711ms of combined system and user time, that comes down to 1.13% overhead for kprobe + kretprobe and 0.42% for fexit. Whether that's an acceptable overhead is up to each consumer (as with any other config).
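The overhead percentages above follow directly from the numbers in the metrics dump; a quick sketch of the arithmetic:

```python
# Reproduce the overhead math from the discussion above: eBPF program CPU
# time as a fraction of curl's combined user+system time (711 ms).
curl_cpu_seconds = 0.711

# run_time_seconds for ids 552 + 553 (kprobe entry + kretprobe)
kprobe_seconds = 0.00434047 + 0.003490548
# run_time_seconds for id 547 (fexit)
fexit_seconds = 0.002882893

kprobe_overhead = kprobe_seconds / curl_cpu_seconds
fexit_overhead = fexit_seconds / curl_cpu_seconds

print(f"kprobe+kretprobe: {kprobe_overhead:.2%}")  # ~1.1%
print(f"fexit:            {fexit_overhead:.2%}")   # ~0.4%
```

The small gap versus the quoted 1.13% is within rounding of the 711ms figure.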

wenlxie (Contributor) commented on Dec 9, 2022:
@bobrik Thanks for the data.
In old kernel versions (at least 5.4.0), kretprobe acquires a global lock (raw_spin_lock_irqsave(&rp->lock, flags)) before getting a free instance in pre_handler_kretprobe(). Under high traffic (different 5-tuples handled by different CPUs), this can trigger the issue: you may see high si (softirq) usage and high latency for TCP packets.

int BPF_KRETPROBE(tcp_try_rmem_schedule_ret)
{
	u64 key = enter_key();
	struct sock **skp = bpf_map_lookup_elem(&tcp_rmem_schedule_enters, &key);

	if (!skp) {
		return 0;
	}

	bpf_map_delete_elem(&tcp_rmem_schedule_enters, &key);

	return handle_tcp_sock((struct tcp_sock *) *skp);
}

#endif

char LICENSE[] SEC("license") = "GPL";
8 changes: 8 additions & 0 deletions examples/tcp-window-clamps.yaml
@@ -0,0 +1,8 @@
# Linux memory management can overestimate memory pressure and punish well behaving
# TCP sockets with a window clamp. The clamp limits max throughput.
# See: https://lore.kernel.org/netdev/CABWYdi0G7cyNFbndM-ELTDAR3x4Ngm0AehEp5aP0tfNkXUE+Uw@mail.gmail.com/
metrics:
  counters:
    - name: tcp_window_clamps_total
      help: Number of times that TCP window was clamped to a low value
      labels: []
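Once the config is deployed, the counter could be queried in Prometheus along these lines (a sketch; the metric name assumes the exporter's usual ebpf_exporter_ prefix):

```
rate(ebpf_exporter_tcp_window_clamps_total[5m])
```

A sustained non-zero rate would suggest the kernel is clamping receive windows under perceived memory pressure.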