Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seastar-addr2line can take minutes to decode a backtrace #1035

Closed
travisdowns opened this issue Mar 31, 2022 · 9 comments · Fixed by redpanda-data/seastar#18
Closed

seastar-addr2line can take minutes to decode a backtrace #1035

travisdowns opened this issue Mar 31, 2022 · 9 comments · Fixed by redpanda-data/seastar#18

Comments

@travisdowns
Copy link
Contributor

On a fairly vanilla Redpanda backtrace with about about a dozen decoded frames took 16 minutes to decode. Most frames take more than a minute alone.

I traced this down a quadratic algorithm in addr2line (bug filed) combined with many function and inline function debug info sections in some source files (e.g., reactor.cc results in a about 114,000 entries for functions and inlined functions so any quadratic algo processing this many elements is going to hurt).

My fix (I can submit a patch if you'd like) is to extend the script slightly to support the use of llvm-addr2line which doesn't have this problem: my 16-minute decode was reduced to less than 4 seconds.

travisdowns added a commit to travisdowns/seastar that referenced this issue Mar 31, 2022
Prior to this change the addr2line.py script always uses addr2line as the
binary to decode stack frames. This change allows the path and/or name
of the binary to be provided explicitly on the command line.

This allows the of llvm-addr2line, an alternative implementation
based on llvm-symbolizer, which in my experiments is over *200* times as
fast as addr2line in decoding some redpanda stack traces.

See https://sourceware.org/bugzilla/show_bug.cgi?id=29010 for more
on the addr2line slowness.

This change also slightly changes the 'dummy line' strategy used to
detect when addr2line has finished outputting frames for the current
address to make it compatible with llvm-addr2line.

Fixes scylladb#1035.
@nyh
Copy link
Contributor

nyh commented Mar 31, 2022

Nice catch! Sounds like a good idea. Even better if the script can try llvm-addr2line, but fall back to the regular addr2line if llvm-addr2line doesn't exist (e.g., somone is using gcc only).

@travisdowns
Copy link
Contributor Author

travisdowns commented Mar 31, 2022

Nice catch! Sounds like a good idea. Even better if the script can try llvm-addr2line, but fall back to the regular addr2line if llvm-addr2line doesn't exist (e.g., somone is using gcc only).

I'm happy to implement it that way (currently I have it so you use a command line argument to point to your desired addr2line implementation, which also lets you provide a patched binutils addr2line if you wish), though I cannot be sure that llvm-addr2line is in all ways equivalent to addr2line (I did notice some small differences in output which I accomodate in the patch). So there's some risk that this approach would break some people/workflows.

@travisdowns
Copy link
Contributor Author

BTW, I am also curious why Scylla hasn't run into this, I've been asking about that on slack, I'll update here too if there's a conclusion.

@psarna
Copy link
Contributor

psarna commented Apr 1, 2022

BTW, I am also curious why Scylla hasn't run into this, I've been asking about that on slack, I'll update here too if there's a conclusion.

As you discovered in https://sourceware.org/bugzilla/show_bug.cgi?id=29010#c2, if the regression was introduced in addr2line 2.36, then we simply haven't started using it that much, because Fedora 34's default is 2.35. It was only a matter of time, so thanks for solving the problem ahead of time :)

travisdowns added a commit to redpanda-data/seastar that referenced this issue Apr 7, 2022
Prior to this change the addr2line.py script always uses addr2line as the
binary to decode stack frames. This change allows the path and/or name
of the binary to be provided explicitly on the command line.

This allows the of llvm-addr2line, an alternative implementation
based on llvm-symbolizer, which in my experiments is over *200* times as
fast as addr2line in decoding some redpanda stack traces.

See https://sourceware.org/bugzilla/show_bug.cgi?id=29010 for more
on the addr2line slowness.

This change also slightly changes the 'dummy line' strategy used to
detect when addr2line has finished outputting frames for the current
address to make it compatible with llvm-addr2line.

Fixes scylladb#1035.
@nyh
Copy link
Contributor

nyh commented Apr 10, 2022

As far as I can tell, this issue was incorrectly closed when a fix was merged into a Seastar fork (redpanda-data/seastar), not the upstream Seastar project (this one). So I'm reopening.

@nyh nyh reopened this Apr 10, 2022
@travisdowns
Copy link
Contributor Author

@psarna - right and thanks for updating this issue with our conclusion from other thread.

@nyh - good catch, that was not my intent at all. It was closed automatically by github after I merged the fix to our fork, but I wouldn't expect it to close this issue on upstream. I'm not sure if this is by design or a bug in GitHub (quite an obvious one to remain unfixed so long if is a bug, though). I've asked about it on the forums.

Re-opening this makes sense: I plan to submit a PR.

I guess as a workaround we should only mention issues like issue #1035 not fix them like fixes #1035, to avoid the auto-close, the close issues by hand with a note after the PR, if any, is accepted.

@avikivity
Copy link
Member

Can one just spam-fix all issues by cloning the repo and committing

Fixes #1
Fixes #2
...
Fixes #999999

?

@psarna
Copy link
Contributor

psarna commented Apr 11, 2022

@avikivity I think (and deeply hope) that it's only possible because Travis is also the author of the issue he closed.

@travisdowns
Copy link
Contributor Author

@psarna - yeah, it would be a straight-up vulnerability if I could close issues via this trick that I couldn't otherwise close.

We can do an experiment: I'm about to submit a change to our fork that decodes the kernel backtraces in the dumps, maybe someone else can create a seastar issue for it, so we see if that gets closed when I submit to our fork?

Suggested text could be something like:

Currently, seastar-addr2line doesn't decode the kernel callstack: sections if present. We can decode these sections using /proc/kallsyms, though this only works until the next reboot if KASLR is enabled (and is not portable across machines).

travisdowns added a commit to travisdowns/seastar that referenced this issue Apr 11, 2022
Prior to this change the addr2line.py script always uses addr2line as the
binary to decode stack frames. This change allows the path and/or name
of the binary to be provided explicitly on the command line.

This allows the of llvm-addr2line, an alternative implementation
based on llvm-symbolizer, which in my experiments is over *200* times as
fast as addr2line in decoding some redpanda stack traces.

See https://sourceware.org/bugzilla/show_bug.cgi?id=29010 for more
on the addr2line slowness.

This change also slightly changes the 'dummy line' strategy used to
detect when addr2line has finished outputting frames for the current
address to make it compatible with llvm-addr2line.

Fixes scylladb#1035.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants