-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
seastar-addr2line can take minutes to decode a backtrace #1035
Comments
Prior to this change the addr2line.py script always uses addr2line as the binary to decode stack frames. This change allows the path and/or name of the binary to be provided explicitly on the command line. This allows the of llvm-addr2line, an alternative implementation based on llvm-symbolizer, which in my experiments is over *200* times as fast as addr2line in decoding some redpanda stack traces. See https://sourceware.org/bugzilla/show_bug.cgi?id=29010 for more on the addr2line slowness. This change also slightly changes the 'dummy line' strategy used to detect when addr2line has finished outputting frames for the current address to make it compatible with llvm-addr2line. Fixes scylladb#1035.
Nice catch! Sounds like a good idea. Even better if the script can try llvm-addr2line, but fall back to the regular addr2line if llvm-addr2line doesn't exist (e.g., somone is using gcc only). |
I'm happy to implement it that way (currently I have it so you use a command line argument to point to your desired |
BTW, I am also curious why Scylla hasn't run into this, I've been asking about that on slack, I'll update here too if there's a conclusion. |
As you discovered in https://sourceware.org/bugzilla/show_bug.cgi?id=29010#c2, if the regression was introduced in addr2line 2.36, then we simply haven't started using it that much, because Fedora 34's default is 2.35. It was only a matter of time, so thanks for solving the problem ahead of time :) |
Prior to this change the addr2line.py script always uses addr2line as the binary to decode stack frames. This change allows the path and/or name of the binary to be provided explicitly on the command line. This allows the of llvm-addr2line, an alternative implementation based on llvm-symbolizer, which in my experiments is over *200* times as fast as addr2line in decoding some redpanda stack traces. See https://sourceware.org/bugzilla/show_bug.cgi?id=29010 for more on the addr2line slowness. This change also slightly changes the 'dummy line' strategy used to detect when addr2line has finished outputting frames for the current address to make it compatible with llvm-addr2line. Fixes scylladb#1035.
As far as I can tell, this issue was incorrectly closed when a fix was merged into a Seastar fork (redpanda-data/seastar), not the upstream Seastar project (this one). So I'm reopening. |
@psarna - right and thanks for updating this issue with our conclusion from other thread. @nyh - good catch, that was not my intent at all. It was closed automatically by github after I merged the fix to our fork, but I wouldn't expect it to close this issue on upstream. I'm not sure if this is by design or a bug in GitHub (quite an obvious one to remain unfixed so long if is a bug, though). I've asked about it on the forums. Re-opening this makes sense: I plan to submit a PR. I guess as a workaround we should only mention issues like |
Can one just spam-fix all issues by cloning the repo and committing
? |
@avikivity I think (and deeply hope) that it's only possible because Travis is also the author of the issue he closed. |
@psarna - yeah, it would be a straight-up vulnerability if I could close issues via this trick that I couldn't otherwise close. We can do an experiment: I'm about to submit a change to our fork that decodes the kernel backtraces in the dumps, maybe someone else can create a seastar issue for it, so we see if that gets closed when I submit to our fork? Suggested text could be something like:
|
Prior to this change the addr2line.py script always uses addr2line as the binary to decode stack frames. This change allows the path and/or name of the binary to be provided explicitly on the command line. This allows the of llvm-addr2line, an alternative implementation based on llvm-symbolizer, which in my experiments is over *200* times as fast as addr2line in decoding some redpanda stack traces. See https://sourceware.org/bugzilla/show_bug.cgi?id=29010 for more on the addr2line slowness. This change also slightly changes the 'dummy line' strategy used to detect when addr2line has finished outputting frames for the current address to make it compatible with llvm-addr2line. Fixes scylladb#1035.
On a fairly vanilla Redpanda backtrace with about about a dozen decoded frames took 16 minutes to decode. Most frames take more than a minute alone.
I traced this down a quadratic algorithm in
addr2line
(bug filed) combined with many function and inline function debug info sections in some source files (e.g., reactor.cc results in a about 114,000 entries for functions and inlined functions so any quadratic algo processing this many elements is going to hurt).My fix (I can submit a patch if you'd like) is to extend the script slightly to support the use of
llvm-addr2line
which doesn't have this problem: my 16-minute decode was reduced to less than 4 seconds.The text was updated successfully, but these errors were encountered: