Incorrect `Interest::Never` is cached and never invalidated on v0.1 #2874

plietar · 2024-02-08T21:30:32Z

Bug Report

Version

tracing-core 0.1.32. The master branch does not have the relevant optimization and is probably fine.

Platform

x86_64 Linux

Crates

tracing-core

Description

Under the right set of circumstances, it is possible for all events and spans to be silenced, regardless of the subscriber's configuration.

The particular sequence of events is as follows:

A Dispatch object is created, wrapping any kind of subscriber (an fmt subscriber with maximum logging will do). The object must not be set as the global or thread-local dispatcher yet.
A tracing macro is called. Since no dispatcher is configured yet, nothing is printed, as expected.
The dispatch object from step 1 is registered as a thread-local dispatcher, using tracing::dispatcher::with_default
The same line of code as step 2 is executed again. One would expect this to print, since a dispatcher is now configured, but instead nothing happens again.

The following piece of code demontrates the bug is a deterministic way. You can achieve a similar result using only the higher level tracing::subscriber::with_default API, but that requires multiple threads and a bit of luck to get the right order of operations.

fn is_enabled() -> bool {
    tracing::info!("log me maybe");
    return tracing::enabled!(tracing::Level::DEBUG);
}

fn main() {
    let subscriber = tracing_subscriber::fmt()
        .with_max_level(tracing::Level::TRACE)
        .finish();
    let dispatch = tracing::Dispatch::new(subscriber);

    // This returns false and doesn't print anything, as expected
    is_enabled();

    tracing::dispatcher::with_default(&dispatch, || {
        // This should have returned true, but does not.
        assert!(is_enabled());
    });
}

I've tracked the bug down to an optimization in tracing-core's callsite module, specifically the has_just_one atomic boolean.

At step 1, a dispatcher is created and registered using the Dispatchers::register_dispatch method. Since it is the first one to be registered, the has_just_one flag is set to true.
At step 2, the call site of the tracing macro needs to be registered, using DefaultCallsite::register. In that process, the callsite's interest is calculated by rebuild_callsite_interest. DefaultCallsite::register had created and passed in a Rebuilder object, and based on the value of has_just_one, a Rebuilder::JustOne is used.
The rebuild_callsite_interest function calls Rebuilder::for_each to iterate over all the registered dispatchers. Since the rebuilder is a JustOne, the for_each function uses dispatcher::get_default as an optimization which should point to the one and only dispatcher in existence.
However, because the only registered dispatcher is not active yet, dispatcher::get_default actually returns the GLOBAL_DISPATCH object, which points to a no-op subscriber.
The no-op subscriber returns a Never interest. This interest is cached by the callsite module.
At step 3, the original dispatcher is made active. Since it had already been created and registered in step 1, the interest cache is not invalidated.
Finally at step 4, the cached Never interest is used again when the same macro call is made, even though that is now incorrect. No log line is printed.

The core of the issue comes from the has_just_one optimization, which is in contradiction with the fact that there are in fact two dispatchers: the global no-op one and the registered but not yet active one. One fix would be to register the global no-op to make the has_just_one flag more accurate. However I believe that by doing this, has_just_one would pretty much never be true. I think an easier solution would be to remove that optimization altogether. Indeed, it doesn't even exist on the master branch anymore.

If there had been other dispatchers around, has_just_one would have been false and the fallback code path would have worked fine. This can be tested by adding let _dont_drop_me = tracing::Dispatch::new(tracing::subscriber::NoSubscriber::new()); as the first line of the main function, to create a dummy dispatcher, enough to avoid that optimization.

This bug is probably not present on the master branch given that the optimization doesn't exist on it. I don't know what your policy is, but given that 0.1 is still the only published branch, it would be nice to have a bugfix for this issue. If you think dropping the has_just_one optimization is acceptable and reasonable I'll happily make a PR for it.

The text was updated successfully, but these errors were encountered:

ThomWright · 2024-09-06T21:59:58Z

For what it's worth, I believe this is what is making my tests flaky.

Is this related to, or the same bug as, #2743?

tokio-rs/tracing#2874 Looks like a bug in tracing is making the tests flaky.

ThomWright added a commit to ThomWright/batch-aint-one that referenced this issue Sep 6, 2024

Put plaster over test flakiness

5d5389c

tokio-rs/tracing#2874 Looks like a bug in tracing is making the tests flaky.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect `Interest::Never` is cached and never invalidated on v0.1 #2874

Incorrect `Interest::Never` is cached and never invalidated on v0.1 #2874

plietar commented Feb 8, 2024

ThomWright commented Sep 6, 2024

Incorrect Interest::Never is cached and never invalidated on v0.1 #2874

Incorrect Interest::Never is cached and never invalidated on v0.1 #2874

Comments

plietar commented Feb 8, 2024

Bug Report

Version

Platform

Crates

Description

ThomWright commented Sep 6, 2024

Incorrect `Interest::Never` is cached and never invalidated on v0.1 #2874

Incorrect `Interest::Never` is cached and never invalidated on v0.1 #2874