tasks-debugging: make it possible to get the backtrace of a task #32283

Merged: vtjnash merged 1 commit into master from jn/task-bt on Dec 17, 2020

Member

@vtjnash vtjnash commented Jun 10, 2019

Debugging only: this adds some internal utility functions for introspecting task state while the process is stopped.

This should work for any non-copy-stack task,
although gdb installs a special hook on longjmp,
which causes a crash when gdb returns :/

To work around that gdb issue, you may want to
use the more compatible task-switching backend:

diff --git a/src/julia_threads.h b/src/julia_threads.h
index 1da831bafe..8fe282d731 100644
--- a/src/julia_threads.h
+++ b/src/julia_threads.h
@@ -22,0 +23,2 @@
+#define JL_HAVE_UCONTEXT
+

This also exports the list of all live (currently running or suspended)
tasks which have real stacks (the non-copy-stack tasks) on the current
thread, so that they can be read from gdb or Julia.
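
For readers unfamiliar with what a ucontext-style task-switching backend does, here is a minimal standalone sketch of switching between a root context and one task with the POSIX `getcontext`/`makecontext`/`swapcontext` calls. It is not Julia's implementation: the names `run_task_switches`, `task_body`, and `STACK_SIZE` are made up for illustration, and it assumes a Linux/glibc-like system where `<ucontext.h>` is available.

```c
#define _GNU_SOURCE            /* assumption: glibc-style feature macros */
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024) /* hypothetical stack size for the task */

static ucontext_t root_ctx;    /* context of the "root task" (the caller) */
static ucontext_t task_ctx;    /* context of the child task */
static int steps;              /* how many times the child task has run */

/* Body of the child task: do one step, then yield back to the root. */
static void task_body(void)
{
    for (;;) {
        steps++;
        swapcontext(&task_ctx, &root_ctx); /* save task, resume root */
    }
}

/* Switch into the child task n times; returns the number of steps it
 * ran, or -1 if the stack could not be allocated. */
int run_task_switches(int n)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    steps = 0;
    getcontext(&task_ctx);                 /* initialize the context */
    task_ctx.uc_stack.ss_sp = stack;       /* give the task its own stack */
    task_ctx.uc_stack.ss_size = STACK_SIZE;
    task_ctx.uc_link = &root_ctx;          /* used only if task_body returns */
    makecontext(&task_ctx, task_body, 0);

    for (int i = 0; i < n; i++)
        swapcontext(&root_ctx, &task_ctx); /* save root, resume task */

    free(stack);                           /* the task is never resumed again */
    return steps;
}
```

Calling `run_task_switches(3)` from `main` switches into the task three times. Because each task here has a real stack of its own, a debugger or unwinder can in principle walk a suspended task's frames, which is the property the PR relies on for non-copy-stack tasks.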

@vtjnash vtjnash changed the title tasks-debugging: make is possible to get the backtrace of a task tasks-debugging: make it possible to get the backtrace of a task Jun 10, 2019
@vchuravy
Member

As an aside, I would love to be able to send SIGQUIT to a julia program and dump the stacktraces for every task. It seems Go may be using both SIGABRT and SIGQUIT, and there is some precedent in JVM that SIGQUIT only prints the backtraces, but doesn't actually terminate the program.

@JeffBezanson JeffBezanson added the multithreading Base.Threads and related functionality label Jun 10, 2019
@vtjnash
Member Author

vtjnash commented Jun 10, 2019

We've usually done that on SIGINFO (or SIGUSR1 on Linux).

@vchuravy
Member

The nice thing about SIGQUIT is that it is Ctrl+\ on Unix
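
The signal-based idea discussed above can be sketched in plain C. This is only an illustration of the pattern (a handler that prints a backtrace and then returns, so the process keeps running), not what Julia does internally. It assumes glibc's `<execinfo.h>`; the function names `install_backtrace_signal` and `backtrace_dump_count` are hypothetical. It prints only the stack of the thread that receives the signal, not one per task, and `backtrace` is not guaranteed to be async-signal-safe, so treat this as a debugging aid.

```c
#define _GNU_SOURCE            /* assumption: glibc-style feature macros */
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

static volatile sig_atomic_t dumps_done; /* number of dumps printed */

/* Print the current thread's backtrace to stderr, then return so the
 * program continues (unlike the default action for SIGQUIT). */
static void dump_backtrace_handler(int sig)
{
    static const char msg[] = "backtrace requested by signal:\n";
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    ssize_t written = write(STDERR_FILENO, msg, sizeof(msg) - 1);

    (void)sig;
    (void)written;
    /* backtrace_symbols_fd writes straight to the descriptor, without
     * allocating a result buffer. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    dumps_done++;
}

/* Install the handler for sig (for example SIGQUIT or SIGUSR1).
 * Returns 0 on success, -1 on failure, like sigaction. */
int install_backtrace_signal(int sig)
{
    void *warmup[1];
    struct sigaction sa;

    /* Call backtrace once up front so that any lazy setup it needs
     * happens outside the signal handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = dump_backtrace_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  /* restart interrupted system calls */
    return sigaction(sig, &sa, NULL);
}

/* Number of backtrace dumps printed so far. */
int backtrace_dump_count(void)
{
    return (int)dumps_done;
}
```

After `install_backtrace_signal(SIGQUIT)`, pressing Ctrl+\ in the terminal would print the backtrace instead of terminating the program.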

Member

@c42f c42f left a comment

This looks useful. I won't claim to follow what's really going on in jl_backtrace_fiber 😬

@staticfloat
Member

What needs to be done here to get this merged in? I would really like to be able to debug my tasks to see if there are tasks escaping, taking up memory, etc...

@StefanKarpinski
Member

Bump. Having this would be really useful for debugging pkg server issues which are fairly urgent for 1.5 (yes, this won't be in 1.5 but the server doesn't have to run 1.5, it's the fact that once 1.5 is released, all Pkg clients will talk to the server by default that makes this urgent).

@vtjnash
Member Author

vtjnash commented Jul 16, 2020

Okay, I took a slightly different approach and fixed the JL_HAVE_UNW_CONTEXT backend, and made that the default for most platforms. The downside is that it's not quite as fast:

julia> t = @task while true; yieldto(Base.roottask); end; @btime yieldto($t)
  562.092 ns (0 allocations: 0 bytes) # PR
  179.916 ns (0 allocations: 0 bytes) # MASTER

Compare with #13099 (comment): this gives up most of the speedup we got from avoiding COPY_STACKS.

But compare also to:

julia> @btime Base.process_events()
  493.830 ns (0 allocations: 0 bytes)

which we do on (nearly) every task switch, so the overall impact should not be too bad.

With additional effort per-platform (already done on Windows, where libunwind doesn't exist), we can resume using the current setjmp.
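
To get a feel for how much a context-switch primitive costs outside of Julia, a round trip can be timed directly in C. The sketch below measures `swapcontext` (root to task and back), which is a different mechanism from both `setjmp`/`longjmp` and libunwind's `unw_resume`, so its numbers are not comparable to the ones quoted above. The names `ns_per_round_trip` and `bench_body` are made up for this example, and it assumes a Linux/glibc-like system.

```c
#define _GNU_SOURCE            /* assumption: glibc-style feature macros */
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#include <ucontext.h>

#define BENCH_STACK_SIZE (64 * 1024) /* hypothetical task stack size */

static ucontext_t bench_root; /* context of the measuring loop */
static ucontext_t bench_task; /* context of the task that yields back */

/* The task does nothing except yield straight back to the root. */
static void bench_body(void)
{
    for (;;)
        swapcontext(&bench_task, &bench_root);
}

/* Average nanoseconds per round trip (root -> task -> root) over
 * iters iterations, or -1.0 on invalid input or allocation failure. */
double ns_per_round_trip(long iters)
{
    struct timespec t0, t1;
    char *stack = malloc(BENCH_STACK_SIZE);

    if (stack == NULL || iters <= 0) {
        free(stack);
        return -1.0;
    }

    getcontext(&bench_task);
    bench_task.uc_stack.ss_sp = stack;
    bench_task.uc_stack.ss_size = BENCH_STACK_SIZE;
    bench_task.uc_link = &bench_root;
    makecontext(&bench_task, bench_body, 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        swapcontext(&bench_root, &bench_task); /* one full round trip */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(stack);
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

Like the `yieldto($t)` benchmark, each iteration covers a switch into the task and a switch back. The context that `swapcontext` saves and restores includes the signal mask, which is one reason it tends to be slower than a bare `setjmp`/`longjmp` pair.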

@staticfloat
Member

What constitutes a 'live' task?

julia> ccall(:jl_live_tasks, Vector, ())
3-element Vector{Any}:
 Task (runnable) @0x00007f8905200010
 Task (runnable) @0x00007f8905200450
 Task (runnable) @0x00007f890599fb90

julia> @async nothing
Task (done) @0x00007f8908950450

julia> ccall(:jl_live_tasks, Vector, ())
4-element Vector{Any}:
 Task (runnable) @0x00007f8905200010
 Task (runnable) @0x00007f8905200450
 Task (runnable) @0x00007f890599fb90
 Task (done) @0x00007f8908950450

I was kind of expecting that 4th task to not be returned.
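
The expectation here matches the PR description's definition of live as "currently running or suspended". A minimal sketch of that filtering rule follows, using a made-up task record rather than Julia's actual `jl_task_t` layout; `task_t`, `task_state_t`, and `collect_live_tasks` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task states, loosely mirroring what the REPL prints. */
typedef enum { TASK_RUNNABLE, TASK_DONE, TASK_FAILED } task_state_t;

/* Hypothetical task record, not Julia's real structure. */
typedef struct {
    int id;
    task_state_t state;
} task_t;

/* Copy pointers to the tasks that are still live (not done or failed)
 * from all[0..n) into out, which must have room for n entries.
 * Returns the number of live tasks found. */
size_t collect_live_tasks(task_t *const *all, size_t n, task_t **out)
{
    size_t live = 0;

    for (size_t i = 0; i < n; i++) {
        if (all[i] != NULL && all[i]->state == TASK_RUNNABLE)
            out[live++] = all[i];
    }
    return live;
}
```

Under this rule, a finished task such as the `@async nothing` one above would be skipped.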

@vtjnash
Member Author

vtjnash commented Dec 16, 2020

Okay, I've fixed that (dead tasks are now removed), though the Linux performance cost of making this the default seems unacceptable (the numbers above were for macOS):

julia> t = @task while true; yieldto(Base.roottask); end; @btime yieldto($t)
  198.808 ns (0 allocations: 0 bytes) # master
  12.426 μs (0 allocations: 0 bytes) # PR

The profile shows that libunwind is doing a lot of dumb stuff inside unw_resume.

This should work for any non-copy stack task. To make it work better,
this now switches to the JL_HAVE_UNW_CONTEXT by default for some
platforms.

Also export the list of all live (currently running or suspended)
tasks which have real stacks (the non-copy-stack tasks) which were
started by the current thread.
@vtjnash vtjnash merged commit 5327824 into master Dec 17, 2020
@vtjnash vtjnash deleted the jn/task-bt branch December 17, 2020 21:41
__tsan_destroy_fiber(ctx->tsan_state);
static inline void tsan_destroy_ctx(jl_ptls_t ptls, void *state) {
if (state != &ptls->root_task->state) {
__tsan_destroy_fiber(ctx->state);

This line, and line 60, refer to ctx, but that variable is no longer defined, so this cannot possibly compile. What's supposed to happen here?

ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
…iaLang#32283)
