Commit 8b61871

Rollup merge of #138874 - Zoxc:waiter-race, r=SparrowLii,davidtwco
Batch mark waiters as unblocked when resuming in the deadlock handler

This fixes a race when resuming multiple threads to resolve query cycles: all threads are now marked as unblocked before any of them is resumed. Previously, waiters were resumed and marked as unblocked one at a time. The first thread resumed could fall asleep again, at which point Rayon would detect a second, false deadlock while the initial deadlock handler thread had yet to resume the remaining threads.

This also reverts the workaround added in #137731.

cc `@SparrowLii` `@lqd`
2 parents a883b23 + 38c39ff commit 8b61871
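The race described in the commit message can be illustrated with a minimal sketch. This is not rustc code: the `BLOCKED` counter is a hypothetical stand-in for Rayon's internal count of blocked threads (the state touched by `rayon_core::mark_unblocked`), used here only to show why all waiters must be marked unblocked before any of them is woken.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for Rayon's internal blocked-thread counter.
// A (false) deadlock would be detected when every live thread is blocked.
static BLOCKED: AtomicUsize = AtomicUsize::new(0);

fn mark_unblocked() {
    BLOCKED.fetch_sub(1, Ordering::SeqCst);
}

fn main() {
    // Three waiter threads are blocked on query latches.
    BLOCKED.store(3, Ordering::SeqCst);

    // New behavior: mark *every* waiter unblocked before resuming any of
    // them. A deadlock check running at any point after this loop sees
    // BLOCKED == 0, so a resumed thread that immediately goes back to
    // sleep cannot trigger a second, false deadlock.
    for _ in 0..3 {
        mark_unblocked();
    }
    assert_eq!(BLOCKED.load(Ordering::SeqCst), 0);

    // Only after this point would the waiters' condvars be notified.
    println!("blocked after batch mark: {}", BLOCKED.load(Ordering::SeqCst));
}
```

Under the old one-at-a-time sequence, the counter would instead hover above zero between individual wake-ups, leaving a window for the false detection.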

File tree

  • compiler/rustc_query_system/src/query

1 file changed (+12 −25 lines)

compiler/rustc_query_system/src/query/job.rs

+12 −25
@@ -163,13 +163,6 @@ struct QueryWaiter {
     cycle: Mutex<Option<CycleError>>,
 }
 
-impl QueryWaiter {
-    fn notify(&self, registry: &rayon_core::Registry) {
-        rayon_core::mark_unblocked(registry);
-        self.condvar.notify_one();
-    }
-}
-
 #[derive(Debug)]
 struct QueryLatchInfo {
     complete: bool,
@@ -232,7 +225,8 @@ impl QueryLatch {
         info.complete = true;
         let registry = rayon_core::Registry::current();
         for waiter in info.waiters.drain(..) {
-            waiter.notify(&registry);
+            rayon_core::mark_unblocked(&registry);
+            waiter.condvar.notify_one();
         }
     }
 
@@ -477,8 +471,8 @@ fn remove_cycle(
 /// Detects query cycles by using depth first search over all active query jobs.
 /// If a query cycle is found it will break the cycle by finding an edge which
 /// uses a query latch and then resuming that waiter.
-/// There may be multiple cycles involved in a deadlock, but we only search
-/// one cycle at a call and resume one waiter at once. See `FIXME` below.
+/// There may be multiple cycles involved in a deadlock, so this searches
+/// all active queries for cycles before finally resuming all the waiters at once.
 pub fn break_query_cycles(query_map: QueryMap, registry: &rayon_core::Registry) {
     let mut wakelist = Vec::new();
     let mut jobs: Vec<QueryJobId> = query_map.keys().cloned().collect();
@@ -488,19 +482,6 @@ pub fn break_query_cycles(query_map: QueryMap, registry: &rayon_core::Registry)
     while jobs.len() > 0 {
         if remove_cycle(&query_map, &mut jobs, &mut wakelist) {
             found_cycle = true;
-
-            // FIXME(#137731): Resume all the waiters at once may cause deadlocks,
-            // so we resume one waiter at a call for now. It's still unclear whether
-            // it's due to possible issues in rustc-rayon or instead in the handling
-            // of query cycles.
-            // This seem to only appear when multiple query cycles errors
-            // are involved, so this reduction in parallelism, while suboptimal, is not
-            // universal and only the deadlock handler will encounter these cases.
-            // The workaround shows loss of potential gains, but there still are big
-            // improvements in the common case, and no regressions compared to the
-            // single-threaded case. More investigation is still needed, and once fixed,
-            // we can wake up all the waiters up.
-            break;
         }
     }
 
@@ -519,9 +500,15 @@ pub fn break_query_cycles(query_map: QueryMap, registry: &rayon_core::Registry)
         );
     }
 
-    // FIXME: Ensure this won't cause a deadlock before we return
+    // Mark all the threads we're about to wake up as unblocked. This needs to be done before
+    // we wake the threads up as otherwise Rayon could detect a deadlock if a thread we
+    // resumed fell asleep and this thread had yet to mark the remaining threads as unblocked.
+    for _ in 0..wakelist.len() {
+        rayon_core::mark_unblocked(registry);
+    }
+
     for waiter in wakelist.into_iter() {
-        waiter.notify(registry);
+        waiter.condvar.notify_one();
     }
 }

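The fixed wake-up sequence in `break_query_cycles` can be modeled outside of rustc with standard-library primitives. A hedged sketch: the `blocked` counter stands in for Rayon's internal bookkeeping behind `rayon_core::mark_unblocked`, and a plain `Condvar` stands in for the query-latch waiters; the real code notifies each waiter's own condvar with `notify_one`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

fn main() {
    // Stand-in for Rayon's count of blocked threads.
    let blocked = Arc::new(AtomicUsize::new(0));
    // Stand-in for the query latches the waiters sleep on.
    let pair = Arc::new((Mutex::new(false), Condvar::new()));

    let mut handles = Vec::new();
    for _ in 0..3 {
        let blocked = Arc::clone(&blocked);
        let pair = Arc::clone(&pair);
        blocked.fetch_add(1, Ordering::SeqCst);
        handles.push(thread::spawn(move || {
            let (lock, cvar) = &*pair;
            let mut resumed = lock.lock().unwrap();
            // Standard condvar wait loop, tolerating spurious wakeups.
            while !*resumed {
                resumed = cvar.wait(resumed).unwrap();
            }
        }));
    }

    // Phase 1: batch-mark every waiter unblocked, mirroring the
    // `for _ in 0..wakelist.len()` loop in the patched code.
    for _ in 0..handles.len() {
        blocked.fetch_sub(1, Ordering::SeqCst);
    }
    // A deadlock detector reading the counter here already sees zero
    // blocked threads, even though no waiter has run yet.
    assert_eq!(blocked.load(Ordering::SeqCst), 0);

    // Phase 2: resume the waiters (the real code loops notify_one).
    let (lock, cvar) = &*pair;
    *lock.lock().unwrap() = true;
    cvar.notify_all();

    for h in handles {
        h.join().unwrap();
    }
    println!("all waiters resumed; blocked = {}", blocked.load(Ordering::SeqCst));
}
```

Separating the two phases is the whole fix: once phase 1 completes, no interleaving of resumed threads falling back asleep can make the counter look like a deadlock.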
0 commit comments