Wrap BPF_CORE_* helper calls with kernel version checks #21

Closed
vincentmli opened this issue Oct 20, 2021 · 19 comments
Labels
enhancement, help wanted

Comments

@vincentmli

Hi,

I am trying to use pwru to troubleshoot issue cilium/cilium#17528. This is on a 5.4 kernel, and I got the following error:

[root@centos-dev pwru]# ./pwru --filter-dst-ip=10.169.72.236 --filter-dst-port=8472 --filter-proto=udp --output-stack

2021/10/20 13:24:35 Loading objects: field KprobeSkb1: program kprobe_skb_1: load program: invalid argument: ; int kprobe_skb_1(struct pt_regs *ctx) {
0: (bf) r6 = r1
; struct sk_buff *skb = (struct sk_buff *) PT_REGS_PARM1(ctx);
1: (79) r9 = *(u64 *)(r6 +112)
2: (b7) r1 = 0
; struct event_t event = {};
3: (7b) *(u64 *)(r10 -56) = r1
last_idx 3 first_idx 0
regs=2 stack=0 before 2: (b7) r1 = 0
4: (7b) *(u64 *)(r10 -64) = r1
5: (7b) *(u64 *)(r10 -72) = r1
6: (7b) *(u64 *)(r10 -80) = r1
7: (7b) *(u64 *)(r10 -88) = r1
8: (7b) *(u64 *)(r10 -96) = r1
9: (7b) *(u64 *)(r10 -104) = r1
10: (7b) *(u64 *)(r10 -112) = r1
11: (7b) *(u64 *)(r10 -120) = r1
12: (7b) *(u64 *)(r10 -128) = r1
13: (7b) *(u64 *)(r10 -136) = r1
; u32 index = 0;
14: (63) *(u32 *)(r10 -140) = r1
15: (bf) r2 = r10
; 
16: (07) r2 += -140
; struct config *cfg = bpf_map_lookup_elem(&cfg_map, &index);
17: (18) r1 = 0xffff9c67d9e55400
19: (85) call bpf_map_lookup_elem#1
20: (bf) r7 = r0
; if (cfg) {
21: (15) if r7 == 0x0 goto pc+430
 R0_w=map_value(id=0,off=0,ks=4,vs=48,imm=0) R6_w=ctx(id=0,off=0,imm=0) R7_w=map_value(id=0,off=0,ks=4,vs=48,imm=0) R9_w=inv(id=0) R10=fp0 fp-56_w=00000000 fp-64_w=00000000 fp-72_w=00000000 fp-80_w=00000000 fp-88_w=00000000 fp-96_w=00000000 fp-104_w=00000000 fp-112_w=00000000 fp-120_w=00000000 fp-128_w=00000000 fp-136_w=00000000 fp-144=mmmm????
; if (cfg->mark) {
22: (71) r1 = *(u8 *)(r7 +1)
 R0_w=map_value(id=0,off=0,ks=4,vs=48,imm=0) R6_w=ctx(id=0,off=0,imm=0) R7_w=map_value(id=0,off=0,ks=4,vs=48,imm=0) R9_w=inv(id=0) R10=fp0 fp-56_w=00000000 fp-64_w=00000000 fp-72_w=00000000 fp-80_w=00000000 fp-88_w=00000000 fp-96_w=00000000 fp-104_w=00000000 fp-112_w=00000000 fp-120_w=00000000 fp-128_w=00000000 fp-136_w=00000000 fp-144=mmmm????
23: (67) r1 <<= 8
24: (71) r2 = *(u8 *)(r7 +0)
 R0_w=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0,umax_value=65280,var_off=(0x0; 0xff00)) R6_w=ctx(id=0,off=0,imm=0) R7_w=map_value(id=0,off=0,ks=4,vs=48,imm=0) R9_w=inv(id=0) R10=fp0 fp-56_w=00000000 fp-64_w=00000000 fp-72_w=00000000 fp-80_w=00000000 fp-88_w=00000000 fp-96_w=00000000 fp-104_w=00000000 fp-112_w=00000000 fp-120_w=00000000 fp-128_w=00000000 fp-136_w=00000000 fp-144=mmmm????
25: (4f) r1 |= r2
26: (71) r2 = *(u8 *)(r7 +2)
 R0_w=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R2_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R6_w=ctx(id=0,off=0,imm=0) R7_w=map_value(id=0,off=0,ks=4,vs=48,imm=0) R9_w=inv(id=0) R10=fp0 fp-56_w=00000000 fp-64_w=00000000 fp-72_w=00000000 fp-80_w=00000000 fp-88_w=00000000 fp-96_w=00000000 fp-104_w=00000000 fp-112_w=00000000 fp-120_w=00000000 fp-128_w=00000000 fp-136_w=00000000 fp-144=mmmm????
27: (71) r3 = *(u8 *)(r7 +3)
 R0_w=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R2_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R6_w=ctx(id=0,off=0,imm=0) R7_w=map_value(id=0,off=0,ks=4,vs=48,imm=0) R9_w=inv(id=0) R10=fp0 fp-56_w=00000000 fp-64_w=00000000 fp-72_w=00000000 fp-80_w=00000000 fp-88_w=00000000 fp-96_w=00000000 fp-104_w=00000000 fp-112_w=00000000 fp-120_w=00000000 fp-128_w=00000000 fp-136_w=00000000 fp-144=mmmm????
28: (67) r3 <<= 8
29: (4f) r3 |= r2
30: (67) r3 <<= 16
31: (4f) r3 |= r1
; if (cfg->mark) {
32: (15) if r3 == 0x0 goto pc+19
 R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1=inv(id=0) R2=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=48,imm=0) R9=inv(id=0) R10=fp0 fp-56=00000000 fp-64=00000000 fp-72=00000000 fp-80=00000000 fp-88=00000000 fp-96=00000000 fp-104=00000000 fp-112=00000000 fp-120=00000000 fp-128=00000000 fp-136=00000000 fp-144=mmmm????
33: (b7) r1 = 164
34: (bf) r3 = r9
35: (0f) r3 += r1
36: (bf) r1 = r10
; 
37: (07) r1 += -24
; mark = BPF_CORE_READ(skb, mark);
38: (b7) r2 = 4
39: (85) call unknown#113
invalid func unknown#113
processed 39 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
@vincentmli
Author

It works on a 5.5 kernel:

[root@centos-dev pwru]# ./pwru --filter-dst-ip=10.169.72.236 --filter-dst-port=8472 --filter-proto=udp --output-tuple

2021/10/20 14:19:13 Attaching kprobes...
1060 / 1060 [--------------------------------------------------------------------------------------------------] 100.00% 29 p/s
Attached (ignored 0)
2021/10/20 14:19:50 Listening for events..
               SKB         PROCESS                     FUNC        TIMESTAMP
0xffff9a1407361b00          [ping]             ip_local_out     979036123709 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]           __ip_local_out     979036131143 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]             nf_hook_slow     979036136262 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]                ip_output     979036595661 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]             nf_hook_slow     979036611100 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]         ip_finish_output     979037170778 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]       __ip_finish_output     979037174425 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]        ip_finish_output2     979037176770 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]     neigh_resolve_output     979037179575 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]       __neigh_event_send     979037181829 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]               eth_header     979037184534 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]                 skb_push     979037186678 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]           dev_queue_xmit     979037189303 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]         __dev_queue_xmit     979037191497 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]      netdev_core_pick_tx     979037193712 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]           netdev_pick_tx     979037197088 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]           __skb_get_hash     979037199463 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]          sch_direct_xmit     979037205394 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]   validate_xmit_skb_list     979037207919 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]        validate_xmit_skb     979037209902 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]       netif_skb_features     979037212076 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]     skb_network_protocol     979037214080 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]       validate_xmit_xfrm     979037216424 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00          [ping]      dev_hard_start_xmit     979037218649 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00 [containerd-shim]      __dev_kfree_skb_any     979037351640 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00 [containerd-shim]              consume_skb     979037359014 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00 [containerd-shim]          skb_release_all     979037361258 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00 [containerd-shim]   skb_release_head_state     979037363262 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00 [containerd-shim]               sock_wfree     979037365517 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00 [containerd-shim]         skb_release_data     979037368672 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00 [containerd-shim]            skb_free_head     979037370867 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a1407361b00 [containerd-shim]             kfree_skbmem     979037373521 10.169.72.233:48805->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]             ip_local_out     984083409078 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]           __ip_local_out     984083419898 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]             nf_hook_slow     984083422523 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]                ip_output     984083845844 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]             nf_hook_slow     984083852967 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]         ip_finish_output     984084177621 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]       __ip_finish_output     984084183542 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]        ip_finish_output2     984084186678 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]     neigh_resolve_output     984084191337 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]               eth_header     984084194212 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]                 skb_push     984084196467 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]           dev_queue_xmit     984084198721 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]         __dev_queue_xmit     984084200695 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]      netdev_core_pick_tx     984084203079 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]           netdev_pick_tx     984084206535 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]           __skb_get_hash     984084209481 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]          sch_direct_xmit     984084216775 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]   validate_xmit_skb_list     984084219189 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]        validate_xmit_skb     984084221554 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]       netif_skb_features     984084223718 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]     skb_network_protocol     984084225792 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]       validate_xmit_xfrm     984084228216 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00   [ksoftirqd/6]      dev_hard_start_xmit     984084230371 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00       [<empty>]      __dev_kfree_skb_any     984084258243 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00       [<empty>]              consume_skb     984084261850 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00       [<empty>]          skb_release_all     984084265387 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00       [<empty>]   skb_release_head_state     984084267831 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00       [<empty>]         skb_release_data     984084270717 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00       [<empty>]            skb_free_head     984084273452 10.169.72.233:33754->10.169.72.236:8472(udp)
0xffff9a13af5e6e00       [<empty>]             kfree_skbmem     984084277480 10.169.72.233:33754->10.169.72.236:8472(udp)

@aditighag
Member

aditighag commented Oct 20, 2021

; mark = BPF_CORE_READ(skb, mark);
38: (b7) r2 = 4
39: (85) call unknown#113
invalid func unknown#113 

The bpf_core_read.h that defined BPF_CORE_READ was added in 5.5 - https://elixir.bootlin.com/linux/v5.5/source/tools/lib/bpf/bpf_core_read.h#L117. We'll need to use bpf_probe_read for kernels <5.5.
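A version gate like the one suggested above could be sketched roughly as follows. This is a minimal illustration in Python, not pwru's actual logic: `parse_kernel_release` and `pick_read_helper` are hypothetical names, and the 5.5 threshold is the upstream release that introduced `bpf_probe_read_kernel` (helper ID 113). Distribution kernels may backport helpers, so a version number alone is only a heuristic.

```python
import re

# Upstream Linux release that introduced bpf_probe_read_kernel (helper ID 113).
# Assumption: distro kernels may backport it, so treat this as a heuristic.
PROBE_READ_KERNEL_MIN = (5, 5)


def parse_kernel_release(release):
    """Return (major, minor) from a `uname -r` style string, e.g. '5.4.0-117-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def pick_read_helper(release):
    """Hypothetical chooser: name the probe-read helper to rely on for this kernel."""
    if parse_kernel_release(release) >= PROBE_READ_KERNEL_MIN:
        return "bpf_probe_read_kernel"
    # Older kernels only provide the generic bpf_probe_read helper.
    return "bpf_probe_read"
```

On a live system the input could come from `platform.release()`; for the kernels mentioned in this thread, `5.4.0-117-generic` falls below the threshold and `5.5.0` meets it.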

aditighag changed the title from "load program: invalid argument: int kprobe_skb_1(struct pt_regs *ctx)" to "Wrap BPF_CORE_* helper calls with kernel version checks" on Oct 20, 2021
@vincentmli
Author

; mark = BPF_CORE_READ(skb, mark);
38: (b7) r2 = 4
39: (85) call unknown#113
invalid func unknown#113 

The bpf_core_read.h that defined BPF_CORE_READ was added in 5.5 - https://elixir.bootlin.com/linux/v5.5/source/tools/lib/bpf/bpf_core_read.h#L117. We'll need to use bpf_probe_read for kernels <5.5.

OK, that sounds good. My issue happens to be on 5.4, and I can't think of another tool I could use to troubleshoot it, so I hope pwru can help here.

@duanjiong
Contributor

; mark = BPF_CORE_READ(skb, mark);
38: (b7) r2 = 4
39: (85) call unknown#113
invalid func unknown#113 

The bpf_core_read.h that defined BPF_CORE_READ was added in 5.5 - https://elixir.bootlin.com/linux/v5.5/source/tools/lib/bpf/bpf_core_read.h#L117. We'll need to use bpf_probe_read for kernels <5.5.

In this case we should update the README, because it says kernel version 5.3.

@brb
Member

brb commented Oct 21, 2021

@vincentmli For your debugging you could revert 00de303 and build the tool yourself (please refer to README.md how to do that). Let me know if you have problems with this.

I think for older kernels we could rely on bpf_probe_read() and __sk_buff instead (UPDATE: the latter seems to be not available for kprobes. However, it's safe to assume that the offset / size of the relevant sk_buff fields does not change on <5.5).

brb added the bug label on Oct 21, 2021
@vincentmli
Author

@vincentmli For your debugging you could revert 00de303 and build the tool yourself (please refer to README.md how to do that). Let me know if you have problems with this.

I think for older kernels we could rely on bpf_probe_read() and __sk_buff instead (UPDATE: the latter seems to be not available for kprobes. However, it's safe to assume that the offset / size of the relevant sk_buff fields does not change on <5.5).

@brb Thanks. git revert had some conflicts, so I changed the code manually, and it works on 5.4. FYI, I got different output for my issue; do you see any problem there? :) cilium/cilium#17528 (comment)

@zhangbo1882

I added a PR to fix it: #27

brb added the enhancement and help wanted labels and removed the bug label on Jan 24, 2022
@brb
Member

brb commented Jun 20, 2022

@vincentmli Just stumbled on this issue again, as I am able to run pwru on a 5.4 kernel (Ubuntu 20.04).

39: (85) call unknown#113 means that the following function was compiled out on your kernel:

static long (*bpf_probe_read_kernel)(void *dst, __u32 size, const void *unsafe_ptr) = (void *) 113;
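To make this reading of the verifier log repeatable, here is a small Python sketch that pulls the rejected helper IDs out of a log and names them. The `HELPER_NAMES` table is a hand-copied subset of the helper IDs in the kernel's `include/uapi/linux/bpf.h`, and `unknown_helpers` is a hypothetical function name used only for illustration.

```python
import re

# Hand-copied subset of helper IDs from include/uapi/linux/bpf.h.
HELPER_NAMES = {
    4: "bpf_probe_read",
    112: "bpf_probe_read_user",
    113: "bpf_probe_read_kernel",
    114: "bpf_probe_read_user_str",
    115: "bpf_probe_read_kernel_str",
}


def unknown_helpers(verifier_log):
    """Return sorted (id, name) pairs for helpers the verifier rejected as unknown."""
    ids = {int(m) for m in re.findall(r"invalid func unknown#(\d+)", verifier_log)}
    # IDs missing from the small table above are reported with a placeholder name.
    return [(i, HELPER_NAMES.get(i, "<not in table>")) for i in sorted(ids)]
```

Feeding it the last lines of the log from this issue (`39: (85) call unknown#113` followed by `invalid func unknown#113`) yields a single entry naming `bpf_probe_read_kernel`.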

Could you attach your kernel configuration and bpftool feature output?

brb added a commit that referenced this issue Jun 20, 2022
We came too early to a conclusion regarding >= 5.5 min kernel. The tool
works on 5.3 too. See [1]

[1]: #21 (comment)

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@vincentmli
Author

@brb I attached the bpftool feature output and the default Ubuntu 5.4 kernel config. Yes, it would be really nice to run pwru on the default Ubuntu 5.4 kernel :)

bpftool-feature.txt
config-5.4.0-117-generic.txt

@brb
Member

brb commented Jun 20, 2022

@vincentmli Thanks. Interesting, you might be running into the lockdown issues (iovisor/bcc#2565). I am running on the following:

vagrant@vagrant:~$ uname -a
Linux vagrant 5.4.0-110-generic #124-Ubuntu SMP Thu Apr 14 19:46:19 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
vagrant@vagrant:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal

Could you try running pwru and then attaching the dmesg output?

@vincentmli
Author

# pwru version
2022/06/20 15:14:58 Loading objects: field KprobeSkb1: program kprobe_skb_1: load program: invalid argument: ; int kprobe_skb_1(struct pt_regs *ctx) {
.........
39: (85) call unknown#113
invalid func unknown#113
processed 39 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1

dmesg.txt

By the way, I tried to rebuild pwru from the most recent master branch and got this error:

[root@centos-dev pwru]# make
go generate
Generating for amd64
# github.com/cilium/ebpf
vendor/github.com/cilium/ebpf/marshalers.go:102:10: undefined: unsafe.Slice
main_amd64.go:5: running "go": exit status 2
make: *** [Makefile:15: pwru] Error 1

@vincentmli
Author

Also FYI, https://github.com/ehids/ecapture and Cilium Tetragon run fine on the same Ubuntu machine.

@vincentmli
Author

vincentmli commented Jun 20, 2022

Also FYI, https://github.com/ehids/ecapture and Cilium Tetragon run fine on the same Ubuntu machine.

I guess these two projects do not use bpf_probe_read_kernel.

@vincentmli
Author

So far, after reading through online resources, I have been unable to determine whether my Ubuntu VM is in lockdown mode :)
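One way to check this, sketched here in Python under the assumption that the kernel exposes the lockdown LSM state at `/sys/kernel/security/lockdown` (the file may be absent when the LSM is not built in or securityfs is not mounted). The active mode is the bracketed entry, for example `[none] integrity confidentiality`. The function names are illustrative only.

```python
import re
from pathlib import Path

# Assumed location of the lockdown LSM state; missing on kernels without the LSM.
LOCKDOWN_FILE = Path("/sys/kernel/security/lockdown")


def parse_lockdown(text):
    """Return the active mode, i.e. the bracketed entry, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def current_lockdown_mode(path=LOCKDOWN_FILE):
    """Read the lockdown state file; return None if it cannot be read."""
    try:
        return parse_lockdown(path.read_text())
    except OSError:
        return None
```

A result of `none` would suggest lockdown is not the cause, while `None` only means the file could not be read.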

@vincentmli
Author

[root@centos-dev pwru]# make
go generate
Generating for amd64
# github.com/cilium/ebpf
vendor/github.com/cilium/ebpf/marshalers.go:102:10: undefined: unsafe.Slice
main_amd64.go:5: running "go": exit status 2
make: *** [Makefile:15: pwru] Error 1

I needed to upgrade Go to 1.18.3, and the above issue is resolved.
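For context, `unsafe.Slice` was added to the Go standard library in Go 1.17, so any older toolchain fails on the vendored cilium/ebpf code in this way. A pre-build check could look roughly like the Python sketch below; the function names are hypothetical, and the input is assumed to be the text printed by `go version`.

```python
import re

# unsafe.Slice was introduced in Go 1.17.
UNSAFE_SLICE_MIN = (1, 17)


def parse_go_version(output):
    """Return (major, minor) from `go version` output such as 'go version go1.18.3 linux/amd64'."""
    match = re.search(r"\bgo(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized go version output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def supports_unsafe_slice(output):
    """True if the reported toolchain is new enough to provide unsafe.Slice."""
    return parse_go_version(output) >= UNSAFE_SLICE_MIN
```

The string could be obtained with `subprocess.run(["go", "version"], capture_output=True, text=True).stdout`.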

@vincentmli
Author

@brb The issue is resolved after I built the most recent pwru from the master branch. It might have been because I was using an old pwru build on this newly installed Ubuntu 20.04.

@brb
Member

brb commented Jun 21, 2022

the issue is resolved after I built the most recent pwru from the master branch

Do you mean that pwru is able to run on your machine with 5.4 kernel?

@vincentmli
Author

Do you mean that pwru is able to run on your machine with 5.4 kernel?

correct

@brb
Member

brb commented Jun 21, 2022

Cool, then closing this issue!

@brb brb closed this as completed Jun 21, 2022