Serious eBPF driver regression starting with 0.16.x libs #1890

Closed
incertum opened this issue May 30, 2024 · 10 comments · Fixed by #1896
Labels
kind/bug Something isn't working

Comments

@incertum
Contributor

incertum commented May 30, 2024

When I last forked off upstream (libs 0.15.x), the eBPF drivers (legacy and modern_ebpf) continued to work wonderfully, as they had for almost 1.5 years.

However, there now appear to be serious issues and regressions on older kernels, and I have also observed issues with modern_ebpf depending on the compiler version used.

I tried clang 12, 14, 16, and 18 with some updated builder containers. Although the eBPF probes compiled, I observed eBPF verifier errors (once for sys_poll, another time for sys_preadv). I also observed scap errors during initialization, as well as "map type not allowed" errors. In summary, it is not clear what the root cause is, especially because there have been a lot of changes.

@incertum incertum added the kind/bug Something isn't working label May 30, 2024
@incertum
Contributor Author

I initially posted eBPF compiler issues for the VM test suite; however, they were caused by commit 209243e.

I can now run the test suite against the past releases for comparison and will post the results here later.

@incertum
Contributor Author

I ran the VM test suites against the past two release branches and observed a few consistent verifier issues, which we might be able to address, in the following tail calls:

  • sys_recvfrom_x
  • sys_readv_preadv_x
  • sys_poll filler (I had also observed issues here, but not within the VM test suite)

release/0.16.x

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled]

kernel_uname_r clang-7 clang-12 clang-14 clang-16 gcc-5 gcc-9 gcc-11 gcc-13
3.10.0-1160.49.1.el7.x86_64 🔵
4.14.296-222.539.amzn2.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
4.16.18-041618-generic 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
4.19.296-0419296-generic 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.4.247-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.10.9-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.14.15-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.19.17-051917-generic 🔵 🔵 🔵 🔵
6.5.0-060500-generic 🔵
6.5.8-1.el7.elrepo.x86_64 🔵 🔵 🔵

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled + success]

kernel_uname_r clang-7 clang-12 clang-14 clang-16 gcc-5 gcc-9 gcc-11 gcc-13
4.14.296-222.539.amzn2.x86_64 🟢 🟢 🟢 🟢 🟢 🟢
4.16.18-041618-generic 🟢 🟢 🟢 🟢 🟢 🟢
4.19.296-0419296-generic 🟢 🟢 🟢
5.4.247-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢 🟢
5.10.9-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢 🟢
5.14.15-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢 🟢
5.19.17-051917-generic 🟢 🟢 🟢 🟢
6.5.0-060500-generic 🟢
6.5.8-1.el7.elrepo.x86_64 🟢 🟢 🟢

libscap: bpf_load_program() event=raw_tracepoint/filler/sys_recvfrom_x: Operation not permitted (1)
[STATUS] FAILED /libs/test/vm/build/driver/clang-14/4.19.296-0419296-generic

release/0.17.x

With clang-12, nothing ran anymore; I manually added the red crosses below.

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled]

kernel_uname_r clang-7 clang-12 clang-14 clang-16 gcc-5 gcc-9 gcc-11 gcc-13
3.10.0-1160.49.1.el7.x86_64 🔵
4.14.296-222.539.amzn2.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
4.16.18-041618-generic 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
4.19.296-0419296-generic 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.4.247-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.10.9-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.14.15-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.19.17-051917-generic 🔵 🔵 🔵 🔵
6.5.0-060500-generic 🔵
6.5.8-1.el7.elrepo.x86_64 🔵 🔵 🔵

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled + success]

kernel_uname_r clang-7 clang-12 clang-14 clang-16 gcc-5 gcc-9 gcc-11 gcc-13
4.14.296-222.539.amzn2.x86_64 🟢 🟢 🟢 🟢 🟢 🟢
4.16.18-041618-generic 🟢 🟢 🟢 🟢 🟢 🟢 🟢
4.19.296-0419296-generic 🟢 🟢 🟢
5.4.247-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢
5.10.9-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢
5.14.15-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢
5.19.17-051917-generic 🟢 🟢 🟢
6.5.0-060500-generic 🟢
6.5.8-1.el7.elrepo.x86_64 🟢 🟢 🟢

libscap: bpf_load_program() event=raw_tracepoint/filler/sys_readv_preadv_x: Operation not permitted (1)

[STATUS] FAILED /libs/test/vm/build/driver/clang-12/5.14.15-1.el7.elrepo.x86_64
[STATUS] DONE 5.14.15-1.el7.elrepo.x86_64

libscap: bpf_load_program() event=raw_tracepoint/filler/sys_readv_preadv_x: Operation not permitted (1)

[STATUS] FAILED /libs/test/vm/build/driver/clang-12/5.19.17-051917-generic
[STATUS] DONE 5.19.17-051917-generic

libscap: bpf_load_program() event=raw_tracepoint/filler/sys_readv_preadv_x: Operation not permitted (1)
[STATUS] FAILED /libs/test/vm/build/driver/clang-16/4.19.296-0419296-generic
[STATUS] DONE 4.19.296-0419296-generic

@incertum
Contributor Author

Regarding the scap errors I observed, I need to look into them further next week, when I will go through the test conditions more systematically. I had never encountered this type of scap error related to driver loading before.

@incertum
Contributor Author

@leogr Yes, I was on branch release/0.17.x; see #1890 (comment).

@incertum
Contributor Author

incertum commented Jun 5, 2024

@FedeDP Here is the output for your PR. I ran it for clang only; the primary issues seem to be fixed now, especially for clang-12.

I know you are still working on the improvements for the 4.14 kernels.

[Note that the output matrix auto-adjusts. For the 6.x kernels I need to add new or better builder containers, as the probes just don't compile right now, so ignore those kernels; only focus on whether each blue dot turns green.]

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled]

kernel_uname_r clang-7 clang-12 clang-14 clang-15 clang-16
4.14.296-222.539.amzn2.x86_64 🔵 🔵 🔵 🔵 🔵
4.16.18-041618-generic 🔵 🔵 🔵 🔵 🔵
4.19.296-0419296-generic 🔵 🔵 🔵 🔵 🔵
5.4.247-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵
5.10.9-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵
5.14.15-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵
5.19.17-051917-generic 🔵 🔵 🔵 🔵

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled + success]

kernel_uname_r clang-7 clang-12 clang-14 clang-15 clang-16
4.14.296-222.539.amzn2.x86_64 🟢 🟢 🟢 🟢
4.16.18-041618-generic 🟢 🟢 🟢 🟢 🟢
4.19.296-0419296-generic 🟢 🟢 🟢 🟢 🟢
5.4.247-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢
5.10.9-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢
5.14.15-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢
5.19.17-051917-generic 🟢 🟢 🟢 🟢

@incertum
Contributor Author

incertum commented Jun 5, 2024

@FedeDP I tried updating the builder containers to check the 6.5.0-060500 Ubuntu test kernels, but I still get legitimate compile errors that are not the fault of the builder container.

Edit: I get exactly the same issues when compiling the eBPF probe for 6.5.8-1.el7.elrepo.x86_64.

By the way, my IDE also flags that line (struct mm_struct *mm: expression must have struct or union type but it has type "struct percpu_counter *" C/C++(154)).

/libs/build/driver/bpf/src/fillers.h:923:56: error: member reference base type 'struct percpu_counter[4]' is not a structure or union
        bpf_probe_read_kernel(&val, sizeof(val), &mm->rss_stat.count[member]);
                                                  ~~~~~~~~~~~~^~~~~~
/libs/build/driver/bpf/src/fillers.h:2447:48: warning: passing 'volatile long *' to parameter of type 'long *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
                res = bpf_accumulate_argv_or_env(data, argv, &args_len);
                                                             ^~~~~~~~~
/libs/build/driver/bpf/src/fillers.h:1985:19: note: passing argument to parameter 'args_len' here
                                                      long *args_len)
                                                            ^
1 warning and 1 error generated.
make[6]: *** [/libs/build/driver/bpf/src/Makefile:74: /libs/build/driver/bpf/src/probe.o] Error 1
make[5]: *** [/headers/6.5.0-060500-generic/usr/src/linux-headers-6.5.0-060500/Makefile:2038: /libs/build/driver/bpf/src] Error 2
make[4]: *** [Makefile:234: __sub-make] Error 2
make[4]: Leaving directory '/headers/6.5.0-060500-generic/usr/src/linux-headers-6.5.0-060500'
make[3]: *** [Makefile:23: all] Error 2
make[3]: Leaving directory '/libs/build/driver/bpf/src'
make[2]: *** [driver/bpf/CMakeFiles/bpf.dir/build.make:70: driver/bpf/CMakeFiles/bpf] Error 2
make[2]: Leaving directory '/libs/build'
make[1]: *** [CMakeFiles/Makefile2:646: driver/bpf/CMakeFiles/bpf.dir/all] Error 2
make[1]: Leaving directory '/libs/build'
make: Leaving directory '/libs/build/driver/bpf'
make: *** [Makefile:136: all] Error 2

@incertum
Contributor Author

incertum commented Jun 5, 2024

I updated the test/vm setup in #1897 (currently on master). Given the updates in #1890 (comment), the matrix of what compiled oddly changed a bit, and the issue outlined in #1890 (comment) is now even more prevalent across multiple clang versions.

@FedeDP
Contributor

FedeDP commented Jun 11, 2024

/libs/build/driver/bpf/src/fillers.h:923:56: error: member reference base type 'struct percpu_counter[4]' is not a structure or union
bpf_probe_read_kernel(&val, sizeof(val), &mm->rss_stat.count[member]);

We have a bpf configure module for this: https://github.com/falcosecurity/libs/tree/master/driver/bpf/configure/RSS_STAT_ARRAY
That is, it should always be able to tell whether the rss_stat array is present and compile fine. Are you building against master? Perhaps your test suite is not running the bpf build through cmake?
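The compile-time selection described above can be sketched in plain userspace C. This is a mock, not the libs source: `HAS_RSS_STAT_ARRAY`, `struct mm_struct_mock`, `rss_counter_addr()`, `read_rss_counter()` and `write_rss_counter()` are hypothetical names, the macro actually emitted by the RSS_STAT_ARRAY configure step may be named differently, and `memcpy` stands in for `bpf_probe_read_kernel()`. It assumes the kernel change behind the error above: on newer kernels (6.2 onwards, to my understanding) `mm->rss_stat` became an array of `struct percpu_counter`, so the counter lives at `rss_stat[member].count` rather than `rss_stat.count[member]`.

```c
#include <assert.h>
#include <string.h>

/* Mock size matching the 'struct percpu_counter[4]' in the error above. */
#define NR_MM_COUNTERS 4

#ifdef HAS_RSS_STAT_ARRAY /* hypothetical flag set by a configure probe */
/* Newer layout (assumed): one percpu_counter per member. Only the central
 * 'count' field is mocked; per-CPU deltas are not modelled. */
typedef long long rss_val_t;
struct percpu_counter_mock {
	rss_val_t count;
};
struct mm_struct_mock {
	struct percpu_counter_mock rss_stat[NR_MM_COUNTERS];
};
#else
/* Older layout (assumed, simplified): one struct holding a counter array. */
typedef long rss_val_t;
struct mm_rss_stat_mock {
	rss_val_t count[NR_MM_COUNTERS];
};
struct mm_struct_mock {
	struct mm_rss_stat_mock rss_stat;
};
#endif

/* Address of the counter for 'member' in whichever layout was compiled in,
 * mirroring the two access expressions the driver has to choose between. */
static void *rss_counter_addr(struct mm_struct_mock *mm, int member)
{
	assert(member >= 0 && member < NR_MM_COUNTERS);
#ifdef HAS_RSS_STAT_ARRAY
	return &mm->rss_stat[member].count;
#else
	return &mm->rss_stat.count[member];
#endif
}

/* Reads one counter; memcpy stands in for bpf_probe_read_kernel(). */
static long long read_rss_counter(struct mm_struct_mock *mm, int member)
{
	rss_val_t val = 0;
	memcpy(&val, rss_counter_addr(mm, member), sizeof(val));
	return (long long)val;
}

/* Test helper: stores a value into one counter in either layout. */
static void write_rss_counter(struct mm_struct_mock *mm, int member, long long v)
{
	rss_val_t tmp = (rss_val_t)v;
	memcpy(rss_counter_addr(mm, member), &tmp, sizeof(tmp));
}
```

Compiling the same file with and without `-DHAS_RSS_STAT_ARRAY` exercises both access paths. In this sketch, using the old expression against the new layout fails to compile with the same kind of "is not a structure or union" error shown above, which is presumably why the question is whether the build goes through the cmake configure step.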

@FedeDP
Contributor

FedeDP commented Jun 11, 2024

I saw in test/vm/scripts/compile_drivers.sh that you are indeed configuring the libs sources with cmake, so that should work fine!
But you are using

make LLC=${LLC} CLANG=${CLANG} \
    KERNELDIR=${SOURCES} -B -C "${LIBS_DIR}/build/driver/bpf" || true

i.e. the ${LIBS_DIR}/build/driver/bpf folder; locally, I need to use ~/Work/libs/build/driver/bpf/src instead. Could that be the root cause? That folder was fine before we merged #1709.

EDIT: I was wrong; it works fine from the ~/Work/libs/build/driver/bpf folder too. Sorry for the noise.

3 participants