Notes on various debugging tools and techniques.
See perf.md
file.
A nice GUI utility with graphs and stuff, supporting both system-wide and not profiling, profiling CPU, memory allocations, IO accesses, power consumption, and other things. Can be used without GUI with sysprof-cli
As of latest nice tutorial is here https://blogs.gnome.org/chergert/2020/03/14/how-to-use-sysprof-to/ and https://blogs.gnome.org/chergert/2020/03/15/how-to-use-sysprof-to-part-ii/
See gdb.md
file.
Well integrated with gdb, so you can use the usual gdb commands.
For faster execution either set sudo sysctl -w kernel.perf_event_paranoid=1
, or make sure to have CAP_SYS_ADMIN
on your user.
Dir to save record is _RR_TRACE_DIR
(from what I saw in the source, you can't modify the record name, only the directory). I set it to _RR_TRACE_DIR=./
rr record app app_args
to recordrr replay
to play the last record (one pointed bylatest-trace
symlink)
Cool stuff you can do:
- run record till the point, where you know, some value is wrong. Then watchpoint it
wa -l some_value
, and run reverse-execution until the watchpoint gets hit:rc
. - sometimes watchpoints may cause more confusion, even combined with reverse-execution. Instead you can exploit another feature: every
rr replay
has same memory addresses. So you can set a bunch of breakpoints in suspicious places, and acommands
to each one that e.g. only stops when variable has specific memory address.
In general it can be found here https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm In short:
- sanitizer uses some "shadow memory", where each byte shows state of 8-bytes block of app's memory.
- before or after every app's memory accesses inserted a guard that checks validity of the access.
- my experiments show, by default the "shadow memory" always resides at address
0x7fff8000
. This address is mentioned in asan docs, and is hardcoded in the compiler-generated code.
While debugging a complex sanitizer bug I wrote down below the actual implementation of stack-access checks by GCC 8.3.0 version.
Given this code:
arr[i] = 0;
The assembly snip my GCC generated is:
movq -232(%rbp), %rdx # arr, tmp127
movq -240(%rbp), %rax # i, tmp128
leaq (%rdx,%rax), %rcx #, _1
# ../iscsi/asan_testcase.c:23: arr[i] = 0;
.loc 1 23 16 discriminator 3
movq %rcx, %rax # _1, _19
movq %rax, %rdx # _19, _20
shrq $3, %rdx #, _20
addq $2147450880, %rdx #, _21
movzbl (%rdx), %edx # *_22, _23
testb %dl, %dl # _23
setne %sil #, _24
movq %rax, %rdi # _19, _25
andl $7, %edi #, _25
cmpb %dl, %dil # _23, _26
setge %dl #, _27
andl %esi, %edx # _24, _28
testb %dl, %dl # _28
je .L6 #,
# ../iscsi/asan_testcase.c:23: arr[i] = 0;
.loc 1 23 16 is_stmt 0
movq %rax, %rdi # _19,
call __asan_report_store1@PLT #
.L6:
# ../iscsi/asan_testcase.c:23: arr[i] = 0;
.loc 1 23 16 discriminator 3
movb $0, (%rcx) #, *_1
pseudo-Haskell breakdown:
let ptr_shadow_mem_base = Ptr 0x7fff8000
let ptr_arr_elem = address arr + i -- arr and i appears somewhere higher
ptr_shadow_arr = (binary_shift_r 3 ptr_arr_elem) + ptr_shadow_mem_base
shadow_byte = get_byte ptr_shadow_arr
is_shadow_byte_set = shadow_byte != 0
smthng4 = shadow_byte >= (ptr_arr_elem & 7)
if is_shadow_byte_set != smthng4 then
__asan_report_store1 ptr_arr_elem
else
store_at 0 ptr_arr_elem
Works as follows: records every allocation/free, then before the end of the process scans entire process memory for pointers to recorded memory. Reports memory without pointers to it as leaked.
- silently doesn't work under debugger. Including hooks like
__lsan_do_recoverable_leak_check
(calling which may result in weird behavior such as exiting the process). - leak detection may be run manually by calling
__lsan_do_recoverable_leak_check()
or__lsan_do_leak_check()
, but note that the latter is confusing. It does not shut down the process, but instead it makes it so any further calls to leak checks (including one that sanitizer will always do at the end of the process) will no longer work. Better use__lsan_do_recoverable_leak_check()
. - "out-of-scope" leaks are mostly undetected. That is, if a pointer is still in memory but no longer reachable, it won't be reported. The only exception is if "out-of-scope" pointer is inside a function the code returned from and
detect_stack_use_after_return=1
set (default since GCC-13). - a possible reason for sporadic leaks may be that a thread was cancelled and didn't free memory nobody else has access to. Since threads have shared process memory, it is a valid leak for other threads.
Probes, kprobes, tracepoint, etc. Refer to the table https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md#terminology
An analog to Solaris dtrace
. Syntax and options are different though, so scripts have to be ported between them.
bpftrace
supports passing arguments to the script on command line. To print the first one, useprintf("%s", str($1));
. It's funny, that$1
by default is an integer, i.e. you can print it with%d
and you can not do it with%s
. But applying astr()
to it magically converts it from an integer to string. No idea.- Missing prints: are usually caused by lack of
\n
at printf call. bpftrace has printfs newline-buffered, so it won't appear until either ^C or newline.
- Trace all processes created in the system:
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_exec*{ printf("pid: %d, comm: %s, args: ", pid, comm); join(args->argv); }'
- Trace all threads created in the system:
sudo bpftrace -e 'kprobe:_do_fork{ printf("pid = %d, comm = %s\n", pid, comm); }'
- Trace all
connect()
calls made in the system:sudo bpftrace -e 'tracepoint:syscalls:sys_enter_connect{ printf("pid = %d, comm = %s\n", pid, comm); }'
Example strace -fe "openat,open" -T ls
. -f
is for "follow forks", -e
limits tracing to specific calls, -T
shows time calls took in seconds resolution.