Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make profiling more robust with many tasks #42978

Merged
merged 14 commits into from
Nov 14, 2021
Merged

Conversation

tkf
Copy link
Member

@tkf tkf commented Nov 7, 2021

As I mentioned in #42973, there seems to be a deadlock problem in the profiler. A bit of rring points to that there seems to be a problem in lock ordering of jl_lock_profile (threadsafe) and jl_thread_suspend_and_get_state.

Here's an illustrative rr replay session of the execution recorded by OPENBLAS_NUM_THREADS=1 rr record --num-cores=8 julia-debug script.jl and terminated by SIGHUP once the process is idle.

Thread 1 received signal SIGHUP, Hangup.
0x0000000070000002 in ?? ()
(rr) i thr
  Id   Target Id         Frame
* 1    Thread 72033.72033 (mmap_hardlink_3) 0x0000000070000002 in ?? ()
  2    Thread 72033.72034 (mmap_hardlink_3) 0x0000000070000002 in ?? ()
(rr) where
#0  0x0000000070000002 in ?? ()
#1  0x00007f1052731cb3 in _raw_syscall () at /home/arakaki/repos/watch/rr/src/preload/raw_syscall.S:120
...
#10 __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x7f1050919bc0 <threadsafe>) at pthread_rwlock_common.c:803
#11 __GI___pthread_rwlock_wrlock (rwlock=0x7f1050919bc0 <threadsafe>) at pthread_rwlock_wrlock.c:27
#12 0x00007f10505de692 in uv_rwlock_wrlock (rwlock=0x7f1050919bc0 <threadsafe>) at src/unix/thread.c:461
#13 0x00007f1050536345 in jl_profile_atomic<JuliaJITEventListener::_NotifyObjectEmitted(const llvm::object::ObjectFile&, const llvm::RuntimeDyld::LoadedObjectInfo&, llvm::RTDyldMemoryManager*)::<lambda()> >(JuliaJITEventListener::<lambda()>) (
    f=...) at /home/arakaki/repos/watch/julia/src/debuginfo.cpp:73
#14 0x00007f1050538520 in JuliaJITEventListener::_NotifyObjectEmitted (this=0x562d04718690, Object=..., L=...,
    memmgr=0x562d047484d0) at /home/arakaki/repos/watch/julia/src/debuginfo.cpp:393
#15 0x00007f1050532faa in ORCNotifyObjectEmitted (Listener=0x562d04718690, Object=..., L=..., memmgr=0x562d047484d0)
    at /home/arakaki/repos/watch/julia/src/debuginfo.cpp:426
...
(rr)
(rr) thr 2
[Switching to thread 2 (Thread 72033.72034)]
#0  0x0000000070000002 in ?? ()
(rr) where
#0  0x0000000070000002 in ?? ()
#1  0x00007f1052731cb3 in _raw_syscall () at /home/arakaki/repos/watch/rr/src/preload/raw_syscall.S:120
...
#12 0x00007f10513c99e8 in jl_thread_suspend_and_get_state (tid=0, ctx=0x7f104bbe4cd8)
    at /home/arakaki/repos/watch/julia/src/signals-unix.c:375
#13 0x00007f10513ca3d4 in signal_listener (arg=0x0) at /home/arakaki/repos/watch/julia/src/signals-unix.c:760
#14 0x00007f10523116db in start_thread (arg=0x7f104bbe5700) at pthread_create.c:463
#15 0x00007f105203a71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

As you can see, thread 1 is executing jl_profile_atomic:

julia/src/debuginfo.cpp

Lines 393 to 406 in 564ddfe

jl_profile_atomic([&]() {
if (codeinst)
linfomap[Addr] = std::make_pair(Size, codeinst->def);
if (first) {
ObjectInfo tmp = {&debugObj,
(size_t)SectionSize,
(ptrdiff_t)(SectionAddr - SectionLoadAddr),
*Section,
nullptr,
};
objectmap[SectionLoadAddr] = tmp;
first = false;
}
});

and thread 2 is executing jl_thread_suspend_and_get_state:

julia/src/signals-unix.c

Lines 753 to 760 in 564ddfe

jl_lock_profile();
if (!critical)
jl_shuffle_int_array_inplace(profile_round_robin_thread_order, jl_n_threads, &profile_cong_rng_seed);
for (int idx = jl_n_threads; idx-- > 0; ) {
// Stop the threads in the random round-robin order.
int i = critical ? idx : profile_round_robin_thread_order[idx];
// notify thread to stop
jl_thread_suspend_and_get_state(i, &signal_context);

My patch 71041f7 moves jl_lock_profile after jl_thread_suspend_and_get_state to avoid suspending the worker thread while it's waiting for the profiler threadsafe lock.

close #42975

@tkf tkf added backport 1.7 kind:bugfix This change fixes an existing bug domain:multithreading Base.Threads and related functionality labels Nov 7, 2021
@tkf tkf requested a review from vtjnash November 7, 2021 00:34
src/signals-unix.c Show resolved Hide resolved
@tkf tkf removed the request for review from vtjnash November 7, 2021 06:32
@tkf tkf closed this Nov 7, 2021
@vtjnash
Copy link
Sponsor Member

vtjnash commented Nov 8, 2021

Definite no to this, but it has appeared to me that we need to make jl_thread_suspend_and_get_state execute a loop, since there seem to be conditions where that signal gets lost (I noticed that happening particularly under TSAN, for example)

@tkf
Copy link
Member Author

tkf commented Nov 8, 2021

Yeah, I closed the issue since I realized I might be breaking the invariance that the lock is meant to protect.

Can a signal get lost? It explains the behavior I see, but, if so, we can't call pthread_cond_wait before confirming that the worker thread received the signal, right? Otherwise, there's nothing to wake up the signal listener thread. So would it mean to spin in the signal listener thread until an atomic flag is set by the worker thread?

Somewhat aside and I'm no expert in signal programming, but aren't there too many synchronizations in the signal handler? Why not fill the profiling buffer in a lock-free manner directly in the signal handler of each worker? Maybe a naive thought, but it seems to simplify a lot of tricky communications.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Nov 9, 2021

A signal cannot get lost by the kernel, but some libc APIs may discard them, and ptrace especially may end up in them being accidentally lost. Yes, we would likely switch to pthread_cond_timedwait.

That would be generally valid also, but less flexible.

This makes it difficult for `usr2_handler` to observe null ptls.
It makes sure the current task to have valid ptls.
@tkf tkf reopened this Nov 11, 2021
@tkf tkf changed the title Take profile lock after suspending thread Make profiling more robust with many tasks Nov 11, 2021
@tkf
Copy link
Member Author

tkf commented Nov 11, 2021

I think it'd be nice to separate the commits for the edge case handling (use pthread_cond_timedwait e001bc7) and the improvement for updating ptls (do lastt->ptls = NULL after jl_set_pgcstack(&t->gcstack) 17a70d2). But the history of this PR is now rather cluttered. Should I rebase things to create a tidy git history here or open a different PR? (Squashing this is OK too though.)

@tkf
Copy link
Member Author

tkf commented Nov 11, 2021

Or we could just revert e001bc7 and also #42973 if we assume 17a70d2 fixed all the bugs here.

Comment on lines -427 to -438
lastt->ptls = NULL;
}

// set up global state for new task and clear global state for old task
t->ptls = ptls;
jl_atomic_store_relaxed(&ptls->current_task, t);
JL_GC_PROMISE_ROOTED(t);
jl_signal_fence();
jl_set_pgcstack(&t->gcstack);
jl_signal_fence();
lastt->ptls = NULL;
#ifdef MIGRATE_TASKS
ptls->previous_task = lastt;
#endif
jl_set_pgcstack(&t->gcstack);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a debugging session today, Jameson figured out that the request was not delivered because we are hitting if (ptls == NULL) branch in usr2_handler

julia/src/signals-unix.c

Lines 469 to 471 in 569d56f

jl_ptls_t ptls = ct->ptls;
if (ptls == NULL)
return;

It then would "look like" the signal is lost (but actually we were just ignoring it).

The fix was to ensure setting lastt->ptls = NULL after jl_set_pgcstack(&t->gcstack). Since lastt is still the current task until jl_set_pgcstack(&t->gcstack) takes effect, we were previously observing jl_get_current_task()->ptls == NULL. This is fixed by the above patch.

test/threads.jl Outdated Show resolved Hide resolved
test/threads.jl Outdated
@@ -147,3 +147,39 @@ end

# We don't need the watchdog anymore
close(proc.in)

# https://github.com/JuliaLang/julia/pull/42973
@testset "spawn and wait *a lot* of tasks in @profile" begin
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@testset "spawn and wait *a lot* of tasks in @profile" begin
Sys.islinux() && @testset "spawn and wait *a lot* of tasks in @profile" begin

perhaps limit this to linux, then merge, and open a new PR to remove the conditional and then we can work on any issues found by other platforms?

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@tkf tkf merged commit 8131580 into JuliaLang:master Nov 14, 2021
@tkf tkf deleted the fix-spawnmany branch November 14, 2021 01:12
KristofferC pushed a commit that referenced this pull request Nov 15, 2021
This patch includes two sets of changes.

(1) `jl_thread_suspend_and_get_state` uses `pthread_cond_timedwait` to
recover from the case where the request is not received by the signal
handler. This is required because `usr2_handler` contains some paths
for the case where it is not possible to obtain `ptls`.

(2) `ctx_switch` now makes sure to null out `ptls` of the last task
(`lastt->ptls = NULL`) after changing the current task by updating
pgcstack (`jl_set_pgcstack(&t->gcstack)`).  This closes the gap in
which `usr2_handler` can observe the null `ptls`.

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
(cherry picked from commit 8131580)
daviehh pushed a commit to daviehh/julia that referenced this pull request Nov 16, 2021
This patch includes two sets of changes.

(1) `jl_thread_suspend_and_get_state` uses `pthread_cond_timedwait` to
recover from the case where the request is not received by the signal
handler. This is required because `usr2_handler` contains some paths
for the case where it is not possible to obtain `ptls`.

(2) `ctx_switch` now makes sure to null out `ptls` of the last task
(`lastt->ptls = NULL`) after changing the current task by updating
pgcstack (`jl_set_pgcstack(&t->gcstack)`).  This closes the gap in
which `usr2_handler` can observe the null `ptls`.

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
(cherry picked from commit 8131580)
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
This patch includes two sets of changes.

(1) `jl_thread_suspend_and_get_state` uses `pthread_cond_timedwait` to
recover from the case where the request is not received by the signal
handler. This is required because `usr2_handler` contains some paths
for the case where it is not possible to obtain `ptls`.

(2) `ctx_switch` now makes sure to null out `ptls` of the last task
(`lastt->ptls = NULL`) after changing the current task by updating
pgcstack (`jl_set_pgcstack(&t->gcstack)`).  This closes the gap in
which `usr2_handler` can observe the null `ptls`.

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
This patch includes two sets of changes.

(1) `jl_thread_suspend_and_get_state` uses `pthread_cond_timedwait` to
recover from the case where the request is not received by the signal
handler. This is required because `usr2_handler` contains some paths
for the case where it is not possible to obtain `ptls`.

(2) `ctx_switch` now makes sure to null out `ptls` of the last task
(`lastt->ptls = NULL`) after changing the current task by updating
pgcstack (`jl_set_pgcstack(&t->gcstack)`).  This closes the gap in
which `usr2_handler` can observe the null `ptls`.

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:multithreading Base.Threads and related functionality kind:bugfix This change fixes an existing bug profiler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants