fix(ebpf): treat sched_process_exit corner case #4557
Force-pushed from 7882acb to bf9fd8c
@geyslan did you understand why -237 was showing up as the syscall number? From what I understand, it should always be -1 when not in syscall context.
I believe this is due to kernel signal handling. These syscalls (futex, nanosleep) are triggered by the Go runtime, and by the time they return, the process group has already received a signal.
Relevant references: arch/x86/kernel/signal.c#L333-L341
--- EDIT
Since
--- EDIT
Some debugging output
LGTM
The sched_process_exit event may be triggered by a standard exit, such as a syscall, or by alternative kernel paths, so it is unsafe to assume that it is always associated with a syscall exit.

do_exit and do_group_exit, while typically invoked by the exit and exit_group syscalls, can also be reached through internal kernel mechanisms such as signal handling.

A concrete example occurs when a syscall returns, enters signal handling, and subsequently calls do_exit after get_signal. Both get_signal and do_exit contain tracepoints.

A real execution flow illustrating this scenario in the kernel is as follows:

entry_SYSCALL_64
├── do_syscall_64
├── syscall_exit_to_user_mode
├── __syscall_exit_to_user_mode_work
├── exit_to_user_mode_prepare
├── exit_to_user_mode_loop
├── arch_do_signal_or_restart
├── get_signal (has signal_deliver tracepoint)
├── do_group_exit
└── do_exit (has sched_process_exit tracepoint)
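
The guard this description implies can be sketched in plain C. This is only an illustration, not the code from this PR: the helper name normalize_syscall_id is hypothetical, and the value -1 for NO_SYSCALL is an assumption taken from the review comment above ("it should always be -1 when not in syscall context"). The idea is that any negative number seen at sched_process_exit (such as the -237 reported here) is not a real syscall and should be reported as NO_SYSCALL.

```c
/* Assumption: -1 is the "not in a syscall" sentinel, per the review comment. */
#define NO_SYSCALL (-1)

/*
 * Hypothetical helper: a syscall number read at sched_process_exit may be
 * negative when do_exit is reached through signal handling rather than the
 * exit/exit_group syscalls. Collapse every negative value to NO_SYSCALL and
 * pass valid (non-negative) numbers through unchanged.
 */
static int normalize_syscall_id(int raw_id)
{
    return raw_id < 0 ? NO_SYSCALL : raw_id;
}
```

With this sketch, normalize_syscall_id(-237) yields NO_SYSCALL, while a valid number such as 202 (futex on x86_64) is returned as-is.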
Force-pushed from bf9fd8c to cccaf7f
LGTM
/fast-forward
Close: #4558
1. Explain what the PR does
cccaf7f fix(ebpf): treat sched_process_exit corner cases
2. Explain how to test it
Run this on main first to reproduce the sporadic errors, which show up as negative syscall numbers (triggered by signals):
INSTTESTS="WRITABLE_DATA_SOURCE" ./tests/e2e-inst-test.sh
After that, test this PR by running the same command and make sure there are no errors, since sched_process_exit will just submit NO_SYSCALL in such cases.
3. Other comments