Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dropping DefaultGuard in another thread doesn't cause it to get cleaned up properly #2482

Open
bgw opened this issue Feb 21, 2023 · 3 comments

Comments

@bgw
Copy link

bgw commented Feb 21, 2023

Bug Report

Crates / Version

tracing_core 0.1.30 and 0.2.x (master)

Platform

This issue is not platform specific. I've reproduced it on the rust playground and replit (Linux).

Description

The Drop impl of DefaultGuard modifies the CURRENT_STATE thread local, assuming it to be the same thread local it was constructed with.

thread_local! {
static CURRENT_STATE: State = State {
default: RefCell::new(None),
can_enter: Cell::new(true),
};
}

#[cfg(feature = "std")]
impl Drop for DefaultGuard {
#[inline]
fn drop(&mut self) {
SCOPED_COUNT.fetch_sub(1, Ordering::Release);
if let Some(dispatch) = self.0.take() {
// Replace the dispatcher and then drop the old one outside
// of the thread-local context. Dropping the dispatch may
// lead to the drop of a collector which, in the process,
// could then also attempt to access the same thread local
// state -- causing a clash.
let prev = CURRENT_STATE.try_with(|state| state.default.replace(Some(dispatch)));
drop(prev)
}
}
}

However, because DefaultGuard is Send, that may be a different thread local.

This bug is further obfuscated by an optimization: The dispatcher keeps track of a global SCOPED_COUNT, which must be non-zero to reproduce this bug:

pub fn get_default<T, F>(mut f: F) -> T
where
F: FnMut(&Dispatch) -> T,
{
if SCOPED_COUNT.load(Ordering::Acquire) == 0 {
// fast path if no scoped dispatcher has been set; just use the global
// default.
return f(get_global());
}
get_default_slow(f)
}

In my repro example, I create a NoopCollect to ensure SCOPED_COUNT is non-zero.

Repro

Code for the 0.2.x repro:

use std::fmt::Debug;

use tracing_core::dispatch::set_default;
use tracing_core::field::{Field, Visit};
use tracing_core::span::{Attributes, Current, Id, Record};
use tracing_core::{Collect, Dispatch, Event, Metadata};

/// A Collect that does nothing (doesn't print anything)
struct NoopCollect;

impl Collect for NoopCollect {
    fn enabled(&self, _metadata: &Metadata<'_>) -> bool {
        true
    }
    fn new_span(&self, _span: &Attributes<'_>) -> Id {
        Id::from_u64(1234)
    }
    fn record(&self, _span: &Id, _values: &Record<'_>) {}
    fn record_follows_from(&self, _span: &Id, _follows: &Id) {}
    fn event(&self, _event: &Event<'_>) {}
    fn enter(&self, _span: &Id) {}
    fn exit(&self, _span: &Id) {}
    fn current_span(&self) -> Current {
        Current::unknown()
    }
}

/// A collect that prints event messages to stdout
struct PrintCollect;

impl Collect for PrintCollect {
    fn enabled(&self, _metadata: &Metadata<'_>) -> bool {
        true
    }
    fn new_span(&self, _span: &Attributes<'_>) -> Id {
        Id::from_u64(1234)
    }
    fn record(&self, _span: &Id, _values: &Record<'_>) {}
    fn record_follows_from(&self, _span: &Id, _follows: &Id) {}
    fn event(&self, event: &Event<'_>) {
        event.record(&mut PrintVisit)
    }
    fn enter(&self, _span: &Id) {}
    fn exit(&self, _span: &Id) {}
    fn current_span(&self) -> Current {
        Current::unknown()
    }
}

struct PrintVisit;

impl Visit for PrintVisit {
    fn record_debug(&mut self, field: &Field, value: &dyn Debug) {
        if field.name() == "message" {
            println!("{:?}", value);
        }
    }
}

fn main() {
    // this is needed so that dispatch's SCOPED_COUNT doesn't go to zero,
    // which would shortcut the logic we're testing
    let _noop_guard = set_default(&Dispatch::new(NoopCollect));

    // this works as expected:
    let guard = set_default(&Dispatch::new(PrintCollect));
    tracing::info!("this is printed");
    drop(guard);
    tracing::info!("this is not printed");

    // however, this fails to clean up properly:
    let guard = set_default(&Dispatch::new(PrintCollect));
    tracing::info!("this is printed");

    // dropping the guard on another thread causes the wrong thread local to be modified
    std::thread::spawn(move || {
        drop(guard);
    })
    .join()
    .unwrap();
    
    // the current guard should be `_noop_guard`
    tracing::info!("this should not be printed, but it is");
}

Suggestions

Either make DefaultGuard !Send, which is what was done for span Entered guards in #698, or make DefaultGuard track the CURRENT_STATE associated with the thread it was constructed in.

Even though it would be a breaking API change (and would need to wait for 0.2.x), I prefer the !Send approach because:

  • This avoid footguns with Send futures. Somebody might write an async function that thinks it's within a certain guard, but isn't if work stealing happens.
  • Making DefaultGuard track the CURRENT_STATE it was constructed with would likely require an Arc and a Mutex instead of the current RefCell that's used. There would be a performance penalty.
@bgw
Copy link
Author

bgw commented Feb 21, 2023

This is related to #1657, but is a different bug with a different root cause.

@hawkw
Copy link
Member

hawkw commented Feb 21, 2023

We should definitely make the guard !Send in v0.2.x.

In the meantime, though, we may want to investigate alternate solutions for 0.1.x. I agree that the potential performance penalty for making the guard handle the case where it's sent across threads is quite unfortunate, especially if it's paid in all uses of the scoped dispatcher API...

Alternative approaches include adding a new version of the set_default API in v0.1.x (under a different name) that returns a !Send guard, and deprecating set_default in favor of that. This wouldn't be as good as just fixing the existing API, as it requires user action, but it's not technically breaking, although deprecating set_default could be quite annoying, especially for users who fail builds on deprecations. We could also add a big warning in the documentation, which...doesn't really solve the problem, but has no negative impacts and would be a good stop-gap until we can solve this properly...

@bgw
Copy link
Author

bgw commented Feb 22, 2023

I can make a PR for v0.2.x in the next few days, as that seems straightforward. v0.1.x is a little less clear.

Span::enter has a warning in the documentation, which we should replicate: https://docs.rs/tracing/latest/tracing/span/struct.Span.html#method.enter

However, there is no alternative API for Span::enter that returns a !Send guard.


Also, making an implementation note here:

#1001 used (a trivial) unsafe to create a !Send type

unsafe impl<'a> Sync for Entered<'a> {}

I believe it is possible to do something similar without unsafe (and without requiring std):

use std::marker::PhantomData;

trait NotSend: Sync {}

#[derive(Default)]
struct Foo {
    _not_send: PhantomData<dyn NotSend>,
}

fn main() {
    let f = Foo::default();
    std::thread::spawn(move || {
        drop(f);
    });
}
error[[E0277]](https://doc.rust-lang.org/stable/error_codes/E0277.html): `(dyn NotSend + 'static)` cannot be sent between threads safely
  --> src/main.rs:12:24
   |
12 |       std::thread::spawn(move || {
   |       ------------------ ^------
   |       |                  |
   |  _____|__________________within this `[closure@src/main.rs:12:24: 12:31]`
   | |     |
   | |     required by a bound introduced by this call
13 | |         drop(f);
14 | |     });
   | |_____^ `(dyn NotSend + 'static)` cannot be sent between threads safely
   |
   = help: within `[closure@src/main.rs:12:24: 12:31]`, the trait `Send` is not implemented for `(dyn NotSend + 'static)`
   = note: required because it appears within the type `PhantomData<(dyn NotSend + 'static)>`

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5d630f35fb0b69c2b4e9d4e559afd487

bgw added a commit to bgw/tracing that referenced this issue Feb 26, 2023
The `Drop` impl of `DefaultGuard` modifies the `CURRENT_STATE` thread
local, assuming it to be the same thread local it was constructed with.

However, because `DefaultGuard` is `Send`, that may be a different
thread local.

Therefore, we should mark `DefaultGuard` as `!Send`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants