Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up atomic access to sleep_check_state #36785

Merged
merged 1 commit into from
Jul 24, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 12 additions & 14 deletions src/partr.c
Original file line number Diff line number Diff line change
Expand Up @@ -330,13 +330,10 @@ static int sleep_check_after_threshold(uint64_t *start_cycles)
static void wake_thread(int16_t tid)
{
jl_ptls_t other = jl_all_tls_states[tid];
if (jl_atomic_load(&other->sleep_check_state) != not_sleeping) {
int16_t state = jl_atomic_exchange(&other->sleep_check_state, not_sleeping); // prohibit it from sleeping
if (state == sleeping) { // see if it was possibly sleeping before now
uv_mutex_lock(&other->sleep_lock);
uv_cond_signal(&other->wake_signal);
uv_mutex_unlock(&other->sleep_lock);
}
if (jl_atomic_bool_compare_exchange(&other->sleep_check_state, sleeping, not_sleeping)) {
uv_mutex_lock(&other->sleep_lock);
uv_cond_signal(&other->wake_signal);
uv_mutex_unlock(&other->sleep_lock);
}
}

Expand All @@ -358,7 +355,7 @@ JL_DLLEXPORT void jl_wakeup_thread(int16_t tid)
JULIA_DEBUG_SLEEPWAKE( wakeup_enter = cycleclock() );
if (tid == self || tid == -1) {
// we're already awake, but make sure we'll exit uv_run
if (ptls->sleep_check_state != not_sleeping)
if (jl_atomic_load_relaxed(&ptls->sleep_check_state) == sleeping)
jl_atomic_store(&ptls->sleep_check_state, not_sleeping);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this pair of operations looks like a CAS, but done non-atomically. Can that be a problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a data race, because the only other writer is monotonic (makes the same change). In fact, we can probably relax the atomic store here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right I think that makes sense. I wondered whether the store could be less strict.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon further reflection the atomic does guard the memory contents of the work queue, so we might want something stronger. I need to think about it.

Copy link
Member

@vtjnash vtjnash Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this now reads like an CAS, but it's not. I'd suggest writing this as != sleeping, so it's clearer that it's just a regular (relaxed) store.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the version which avoids the double negative. To be more explanatory it would be helpful to factor the uses of sleep_check_state into a minimum of locations and expand the comments describing which state/invariant it's meant to protect and how this implies the atomic ops which are used.

if (uvlock == system_self)
uv_stop(jl_global_event_loop());
Expand Down Expand Up @@ -405,7 +402,10 @@ static jl_task_t *get_next_task(jl_value_t *trypoptask, jl_value_t *q)

static int may_sleep(jl_ptls_t ptls)
{
return ptls->sleep_check_state == sleeping;
// sleep_check_state is only transitioned from not_sleeping to sleeping
// by the thread itself. As a result, if this returns false, it will
// continue returning false. If it returns true, there are no guarantees.
return jl_atomic_load_relaxed(&ptls->sleep_check_state) == sleeping;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need something stronger than relaxed ordering here, because sleep_check_state is meant to ensure consistency of other loads/stores?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual condition variable is protected by a lock, so we don't depend on any memory access dependencies here, I don't think. That's why I picked relaxed. Do you disagree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you disagree?

I don't disagree, I'm just unsure.

The condition variable seems fine. But I noticed that may_sleep is also used in a branch related to the event loop which is protected with jl_uv_mutex. So may_sleep doesn't always go along with the same lock. Which seems potentially confusing but perhaps harmless.

}

extern volatile unsigned _threadedregion;
Expand Down Expand Up @@ -476,8 +476,7 @@ JL_DLLEXPORT jl_task_t *jl_task_get_next(jl_value_t *trypoptask, jl_value_t *q)
JL_UV_UNLOCK();
// optimization: check again first if we may have work to do
if (!may_sleep(ptls)) {
if (ptls->sleep_check_state != not_sleeping)
jl_atomic_store(&ptls->sleep_check_state, not_sleeping); // let other threads know they don't need to wake us
assert(ptls->sleep_check_state == not_sleeping);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just make every access of sleep_check_state atomic as a matter of course?

Suggested change
assert(ptls->sleep_check_state == not_sleeping);
assert(jl_atomic_load_relaxed(&ptls->sleep_check_state) == not_sleeping);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The change to use assert also makes sense (since that's currently the implementation of may_sleep), but assumes that we don't alter the implementation of may_sleep to possibly have disjoint conditions that we want to check (i.e. the original partr code had a global sleep_check_state, so that bulk operations were also attempted).

start_cycles = 0;
continue;
}
Expand All @@ -495,7 +494,7 @@ JL_DLLEXPORT jl_task_t *jl_task_get_next(jl_value_t *trypoptask, jl_value_t *q)
if (!_threadedregion && active && ptls->tid == 0) {
// thread 0 is the only thread permitted to run the event loop
// so it needs to stay alive
if (ptls->sleep_check_state != not_sleeping)
if (jl_atomic_load_relaxed(&ptls->sleep_check_state) != not_sleeping)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (jl_atomic_load_relaxed(&ptls->sleep_check_state) != not_sleeping)
if (jl_atomic_load_relaxed(&ptls->sleep_check_state) == sleeping)

jl_atomic_store(&ptls->sleep_check_state, not_sleeping); // let other threads know they don't need to wake us
start_cycles = 0;
continue;
Expand All @@ -510,8 +509,7 @@ JL_DLLEXPORT jl_task_t *jl_task_get_next(jl_value_t *trypoptask, jl_value_t *q)
uv_cond_wait(&ptls->wake_signal, &ptls->sleep_lock);
// TODO: help with gc work here, if applicable
}
if (ptls->sleep_check_state != not_sleeping)
jl_atomic_store(&ptls->sleep_check_state, not_sleeping); // let other threads know they don't need to wake us
assert(ptls->sleep_check_state == not_sleeping);
uv_mutex_unlock(&ptls->sleep_lock);
JULIA_DEBUG_SLEEPWAKE( ptls->sleep_leave = cycleclock() );
jl_gc_safe_leave(ptls, gc_state); // contains jl_gc_safepoint
Expand Down