Socket tracer unable to start on 6.10 and later kernels #2035

Closed
ddelnano opened this issue Oct 1, 2024 · 2 comments · Fixed by #2041
Labels
area/datacollector Issues related to Stirling (datacollector)

Comments

@ddelnano
Member

ddelnano commented Oct 1, 2024

Describe the bug
While working with someone in the community, I received a report that their openSUSE MicroOS instances were failing to start the socket tracer. Their PEMs fail to compile multiple BPF programs (as seen below). Their instances are running a 6.11 kernel, while our newest kernel headers are 6.1.x.

I'm in the process of verifying that newer kernel headers resolve their problems. If they do, our Linux kernel headers should be updated through 6.11.

Logs

pixie_logs_20241001120707.zip

The relevant logs from that PEM are the following:

E20241001 18:31:33.085537 91215 task_struct_resolver.cc:330] Internal : Unable to initialize BCC BPF program: Unable to initialize BPF program
I20241001 18:31:33.150326 91217 bcc_wrapper.cc:166] Initializing BPF program ...
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:33:27: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                        return BITS_PER_LONG - PAGE_SHIFT;
                                               ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:35:22: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                if (size < (1UL << PAGE_SHIFT))
                                   ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:38:30: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                return ilog2((size) - 1) - PAGE_SHIFT + 1;
                                           ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:42:11: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
        size >>= PAGE_SHIFT;
                 ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from src/stirling/bpf_tools/bcc_bpf/task_struct_mem_read.c:24:
In file included from src/stirling/bpf_tools/bcc_bpf/system-headers/linux/sched.h:1:
In file included from include/linux/sched.h:14:
In file included from include/linux/pid.h:5:
In file included from include/linux/rculist.h:11:
In file included from include/linux/rcupdate.h:30:
arch/arm64/include/asm/processor.h:314:16: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
        return addr < TASK_SIZE;
                      ^
arch/arm64/include/asm/processor.h:68:5: note: expanded from macro 'TASK_SIZE'
                                TASK_SIZE_32 : TASK_SIZE_64)
                                ^
arch/arm64/include/asm/processor.h:65:42: note: expanded from macro 'TASK_SIZE_32'
#define TASK_SIZE_32            (UL(0x100000000) - PAGE_SIZE)
                                                   ^
arch/arm64/include/asm/page-def.h:15:35: note: expanded from macro 'PAGE_SIZE'
#define PAGE_SIZE               (_AC(1, UL) << PAGE_SHIFT)
                                               ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from src/stirling/bpf_tools/bcc_bpf/task_struct_mem_read.c:24:
In file included from src/stirling/bpf_tools/bcc_bpf/system-headers/linux/sched.h:1:
In file included from include/linux/sched.h:32:
include/linux/mm_types_task.h:19:10: fatal error: 'asm/tlbbatch.h' file not found
#include <asm/tlbbatch.h>
         ^~~~~~~~~~~~~~~~
6 errors generated.
I20241001 18:31:34.551831 91217 scoped_timer.h:48] Timer(init_bpf_program) : 1.40 s
E20241001 18:31:34.551985 91217 task_struct_resolver.cc:330] Internal : Unable to initialize BCC BPF program: Unable to initialize BPF program
W20241001 18:31:34.552109 91084 bcc_wrapper.cc:149] Failed to obtain task_struct offsets, will not override the task_struct offsets, error: Internal : Resolution failed in subprocess. Check subprocess logs for the error.
I20241001 18:31:34.552258 91084 bcc_wrapper.cc:166] Initializing BPF program ...
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:33:27: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                        return BITS_PER_LONG - PAGE_SHIFT;
                                               ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:35:22: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                if (size < (1UL << PAGE_SHIFT))
                                   ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:38:30: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                return ilog2((size) - 1) - PAGE_SHIFT + 1;
                                           ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:42:11: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
        size >>= PAGE_SHIFT;
                 ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from src/stirling/source_connectors/proc_exit/bcc_bpf/proc_exit_trace.c:24:
In file included from ./src/stirling/bpf_tools/bcc_bpf/task_struct_utils.h:26:
In file included from src/stirling/bpf_tools/bcc_bpf/system-headers/linux/sched.h:1:
In file included from include/linux/sched.h:14:
In file included from include/linux/pid.h:5:
In file included from include/linux/rculist.h:11:
In file included from include/linux/rcupdate.h:30:
arch/arm64/include/asm/processor.h:314:16: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
        return addr < TASK_SIZE;
                      ^
arch/arm64/include/asm/processor.h:68:5: note: expanded from macro 'TASK_SIZE'
                                TASK_SIZE_32 : TASK_SIZE_64)
                                ^
arch/arm64/include/asm/processor.h:65:42: note: expanded from macro 'TASK_SIZE_32'
#define TASK_SIZE_32            (UL(0x100000000) - PAGE_SIZE)
                                                   ^
arch/arm64/include/asm/page-def.h:15:35: note: expanded from macro 'PAGE_SIZE'
#define PAGE_SIZE               (_AC(1, UL) << PAGE_SHIFT)
                                               ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from src/stirling/source_connectors/proc_exit/bcc_bpf/proc_exit_trace.c:24:
In file included from ./src/stirling/bpf_tools/bcc_bpf/task_struct_utils.h:26:
In file included from src/stirling/bpf_tools/bcc_bpf/system-headers/linux/sched.h:1:
In file included from include/linux/sched.h:32:
include/linux/mm_types_task.h:19:10: fatal error: 'asm/tlbbatch.h' file not found
#include <asm/tlbbatch.h>
         ^~~~~~~~~~~~~~~~
6 errors generated.

App information (please complete the following information):

  • Pixie version: v0.14.11
  • K8s cluster version:
  • Node Kernel version: 6.11.0-1-default
  • OS distro: openSUSE MicroOS
  • Browser version: N/A
@ddelnano
Member Author

ddelnano commented Oct 2, 2024

Even after I supplied one-off kernel headers built from #2036, this community user's ARM64 and x86 PEMs are still seeing BPF compilation issues. I ran one of the trace BPF tests in QEMU with a 6.11.1 kernel and the new headers, and I can reproduce the same error message they see.

observability__vizier-pem-dv96n__pem.log
qemu_dns_trace_bpf_test.log

@ddelnano changed the title from "Socket tracer unable to start on 6.11 kernel" to "Socket tracer unable to start on 6.10 and later kernels" on Oct 3, 2024
@ddelnano
Member Author

ddelnano commented Oct 7, 2024

I tracked down the problem: upgrading bcc fixes the issue.

BCC has certain "virtual" files that it includes behind the scenes. The compat/linux/virtual_bpf.h file in particular needs to be kept in sync with libbpf, and it uses the same header guard as the kernel's include/uapi/linux/bpf.h file. Since the two files share a header guard, whichever is included first masks the other. So even though our Linux headers were updated, our older bcc install was inserting an older copy of uapi/linux/bpf.h, one that didn't contain the bpf_wq declaration.

I need to double-check that my rebase onto bcc's upstream changes is correct and update our fork (pixie-io/bcc) first, but I should be able to open a PR for this soon.

ddelnano added a commit that referenced this issue Oct 11, 2024
…er kernels (#2041)

Summary: Upgrade bcc and libbpf to fix BPF program compilation on 6.10
and later kernels

Bcc injects some
"[virtual](https://github.com/iovisor/bcc/blob/cb1ba20f4800f556dc940682ba7016c50bd0a3ac/src/cc/exported_files.cc#L28-L48)"
includes into BPF programs. The `compat/linux/virtual_bpf.h` file in
particular needs to be kept in sync with libbpf, and it uses the same [header
guard](https://github.com/iovisor/bcc/blob/cb1ba20f4800f556dc940682ba7016c50bd0a3ac/src/cc/compat/linux/virtual_bpf.h#L9)
as the `include/uapi/linux/bpf.h` file. This means that although our Linux
headers were updated, our older bcc install was inserting an older copy
of the `uapi/linux/bpf.h` file, one that didn't contain the `bpf_wq`
declaration.

```
  include/linux/bpf.h:348:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_wq'
                  return sizeof(struct bpf_wq);
                         ^     ~~~~~~~~~~~~~~~
  include/linux/bpf.h:348:24: note: forward declaration of 'struct bpf_wq'
                  return sizeof(struct bpf_wq);
                                       ^
  include/linux/bpf.h:377:10: error: invalid application of '__alignof' to an incomplete type 'struct bpf_wq'
                  return __alignof__(struct bpf_wq);
                         ^          ~~~~~~~~~~~~~~~
  include/linux/bpf.h:377:29: note: forward declaration of 'struct bpf_wq'
                  return __alignof__(struct bpf_wq);
```

Note: while this fixes the 6.10 compilation issue, our 6.10 qemu build
still fails unless [this
logic](https://github.com/pixie-io/pixie/blob/3c41d554215528e688328aef94192e696db617dc/src/stirling/source_connectors/socket_tracer/socket_trace_connector.cc#L464-L472)
is disabled. 6.10 kernels added BPF token support, which changes the BPF
permission model slightly and makes the BPF instruction limit depend on
the permissions of the BPF syscall caller ([linux
source](https://elixir.bootlin.com/linux/v6.11.1/source/kernel/bpf/syscall.c#L2757)).

This new BPF token logic, coupled with our qemu setup, causes our 6.10
build to fall back to the 4096 instruction limit. I'll be addressing this
in #2040 and #2042. Those issues shouldn't block this change, since that
loop limit code can be bypassed at runtime with our current CLI flags.

Relevant Issues: Closes #2035

Type of change: /kind bugfix

Test Plan: Built 6.10 and 6.11 kernels and the associated linux headers
from #2036 and verified that a local qemu build passes
- [x] Verify `#ci:bpf-build-all-kernels` build passes

Changelog Message: Upgraded bcc and libbpf to support kernels 6.10 and
later

---------

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>