Skip to content

Commit

Permalink
fix(subscriber): mitigate race in Callsites::contains
Browse files Browse the repository at this point in the history
The `ConsoleLayer` uses the `Callsites` struct to store and check for
callsites for specific kinds of traces, for example spawn spans or waker
events.

`Callsites` stores a fixed size array of pointers to the `Metadata` for
each callsite and a length to indicate how many callsites it has
registered. The length and each individual pointer are stored in
atomics.

Since it is possible for these values to change individually, if a
callsite lookup fails, we check if the length of the array has changed
while we were checking the pointers, if it has, the lookup is started
again.

However, there is still a possible race condition. If the length
changes, but the lookup occurs before the callsite pointer is actually
written, then we may miss a callsite that is in the process of being
registered. In this case, the pointer which is loaded from the
`Callsites` array will be null.

This change adds a check for this case (null ptr), and reperforms the
lookup if it occurs.

This race condition was found while chasing down the source of #473. It
doesn't solve the flakiness, but it can reduce the likelihood of it
occuring, thus it is a mitigation only.

In reality, neither of these race condition checks should be needed, as
we would expect that `tracing` guarantees that `ConsoleLayer` completes
`register_callsite()` before `on_event()` or `new_span()` are called.
  • Loading branch information
hds committed Oct 11, 2023
1 parent 6dd661d commit 78483b4
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion console-subscriber/src/callsites.rs
Original file line number Diff line number Diff line change
Expand Up @@ -46,8 +46,13 @@ impl<const MAX_CALLSITES: usize> Callsites<MAX_CALLSITES> {
let mut len = self.len.load(Ordering::Acquire);
loop {
for cs in &self.ptrs[start..len] {
if ptr::eq(cs.load(Ordering::Acquire), callsite) {
let recorded = cs.load(Ordering::Acquire);
if ptr::eq(recorded, callsite) {
return true;
} else if ptr::eq(recorded, ptr::null_mut()) {
// We have read a recorded callsite before it has been
// written. We need to check again.
continue;
}
}

Expand Down

0 comments on commit 78483b4

Please sign in to comment.