seastar-addr2line can take minutes to decode a backtrace #1035

travisdowns · 2022-03-31T19:42:05Z

A fairly vanilla Redpanda backtrace with about a dozen decoded frames took 16 minutes to decode. Most frames take more than a minute on their own.

I traced this down to a quadratic algorithm in addr2line (bug filed) combined with many function and inline function debug info sections in some source files (e.g., reactor.cc results in about 114,000 entries for functions and inlined functions, so any quadratic algorithm processing this many elements is going to hurt).
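To put the quadratic cost in perspective, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that a quadratic pass does on the order of n * n units of work; the helper name `quadratic_steps` is hypothetical and the real constant factors in addr2line are not known here.

```python
def quadratic_steps(n):
    """Rough work estimate for an O(n^2) pass over n entries (illustrative only)."""
    return n * n

# ~114,000 function and inlined-function entries, as reported for reactor.cc
entries = 114_000
print(f"{quadratic_steps(entries):.2e}")  # on the order of 1e10 steps
```

Even at a nanosecond per step that is many seconds of work, which is at least in line with frames taking a minute or more each.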

My fix (I can submit a patch if you'd like) is to extend the script slightly to support the use of llvm-addr2line which doesn't have this problem: my 16-minute decode was reduced to less than 4 seconds.

Prior to this change the addr2line.py script always used addr2line as the binary to decode stack frames. This change allows the path and/or name of the binary to be provided explicitly on the command line. This allows the use of llvm-addr2line, an alternative implementation based on llvm-symbolizer, which in my experiments is over *200* times as fast as addr2line in decoding some Redpanda stack traces. See https://sourceware.org/bugzilla/show_bug.cgi?id=29010 for more on the addr2line slowness. This change also slightly changes the 'dummy line' strategy used to detect when addr2line has finished outputting frames for the current address, to make it compatible with llvm-addr2line. Fixes scylladb#1035.

nyh · 2022-03-31T19:55:52Z

Nice catch! Sounds like a good idea. Even better if the script can try llvm-addr2line, but fall back to the regular addr2line if llvm-addr2line doesn't exist (e.g., someone is using gcc only).

travisdowns · 2022-03-31T20:07:35Z

> Nice catch! Sounds like a good idea. Even better if the script can try llvm-addr2line, but fall back to the regular addr2line if llvm-addr2line doesn't exist (e.g., someone is using gcc only).

I'm happy to implement it that way (currently I have it so you use a command line argument to point to your desired addr2line implementation, which also lets you provide a patched binutils addr2line if you wish), though I cannot be sure that llvm-addr2line is in all ways equivalent to addr2line (I did notice some small differences in output, which I accommodate in the patch). So there's some risk that this approach would break some people's workflows.
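A minimal sketch of how the two ideas could combine: an explicit command line argument wins, otherwise the script tries llvm-addr2line and falls back to addr2line. The flag name `--addr2line` and the helper names `resolve_addr2line` and `build_parser` are assumptions for illustration, not necessarily what the patch uses.

```python
import argparse
import shutil

def resolve_addr2line(explicit=None,
                      candidates=("llvm-addr2line", "addr2line"),
                      which=shutil.which):
    """Pick the symbolizer binary: explicit choice first, then the first candidate on PATH."""
    if explicit:
        # The user asked for a specific binary (e.g. a patched binutils build).
        return explicit
    for name in candidates:
        path = which(name)
        if path:
            return path
    raise FileNotFoundError("none of %s found on PATH" % ", ".join(candidates))

def build_parser():
    """Hypothetical argument parser exposing the override flag."""
    parser = argparse.ArgumentParser(description="decode backtraces (sketch)")
    parser.add_argument("--addr2line", default=None,
                        help="path or name of the addr2line-compatible binary")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_addr2line(args.addr2line))
```

Keeping the explicit override means anyone bitten by an output difference between the two tools can still pin the original addr2line.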

travisdowns · 2022-03-31T21:33:25Z

BTW, I am also curious why Scylla hasn't run into this, I've been asking about that on slack, I'll update here too if there's a conclusion.

psarna · 2022-04-01T06:34:07Z

> BTW, I am also curious why Scylla hasn't run into this, I've been asking about that on slack, I'll update here too if there's a conclusion.

As you discovered in https://sourceware.org/bugzilla/show_bug.cgi?id=29010#c2, if the regression was introduced in addr2line 2.36, then we simply haven't started using it that much, because Fedora 34's default is 2.35. It was only a matter of time, so thanks for solving the problem ahead of time :)

nyh · 2022-04-10T15:39:30Z

As far as I can tell, this issue was incorrectly closed when a fix was merged into a Seastar fork (redpanda-data/seastar), not the upstream Seastar project (this one). So I'm reopening.

travisdowns · 2022-04-10T20:28:38Z

@psarna - right, and thanks for updating this issue with our conclusion from the other thread.

@nyh - good catch, that was not my intent at all. It was closed automatically by GitHub after I merged the fix to our fork, but I wouldn't expect that to close this issue on upstream. I'm not sure if this is by design or a bug in GitHub (quite an obvious one to remain unfixed so long if it is a bug, though). I've asked about it on the forums.

Re-opening this makes sense: I plan to submit a PR.

I guess as a workaround we should only mention issues, like "issue #1035", rather than mark them fixed, like "fixes #1035", to avoid the auto-close, then close the issues by hand with a note after the PR, if any, is accepted.

avikivity · 2022-04-11T16:22:46Z

Can one just spam-fix all issues by cloning the repo and committing

Fixes #1
Fixes #2
...
Fixes #999999

?

psarna · 2022-04-11T16:41:16Z

@avikivity I think (and deeply hope) that it's only possible because Travis is also the author of the issue he closed.

travisdowns · 2022-04-11T18:04:38Z

@psarna - yeah, it would be a straight-up vulnerability if I could close issues via this trick that I couldn't otherwise close.

We can do an experiment: I'm about to submit a change to our fork that decodes the kernel backtraces in the dumps, maybe someone else can create a seastar issue for it, so we see if that gets closed when I submit to our fork?

Suggested text could be something like:

Currently, seastar-addr2line doesn't decode the kernel callstack: sections if present. We can decode these sections using /proc/kallsyms, though this only works until the next reboot if KASLR is enabled (and is not portable across machines).
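A rough sketch of what kallsyms-based decoding could look like, assuming the usual "address type name [module]" line layout of /proc/kallsyms. The function names `parse_kallsyms` and `resolve` are hypothetical, and this is not the code from the fork.

```python
import bisect

def parse_kallsyms(text):
    """Parse /proc/kallsyms-style text into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def resolve(symbols, addr):
    """Return 'name+0xoffset' for the nearest symbol at or below addr, or None."""
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return None  # addr is below every known symbol
    base, name = symbols[i]
    return f"{name}+{addr - base:#x}"

if __name__ == "__main__":
    with open("/proc/kallsyms") as f:
        table = parse_kallsyms(f.read())
    # Example address only; real addresses would come from the dump.
    print(resolve(table, 0xffffffff81000000))
```

Note that unprivileged reads of /proc/kallsyms typically show all addresses as zero, so the lookup is only meaningful when the file is read with sufficient privileges on the machine (and boot) that produced the dump.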

travisdowns mentioned this issue Mar 31, 2022

Allow use of llvm-addr2line as the command redpanda-data/seastar#18

Merged

travisdowns closed this as completed in redpanda-data/seastar#18 Apr 7, 2022

nyh reopened this Apr 10, 2022

travisdowns mentioned this issue Apr 11, 2022

Allow use of llvm-addr2line as the command #1042

Closed

avikivity closed this as completed in d6abba4 Apr 14, 2022
