Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: avoid scoped dispatch overhead when scoped dispatchers are not in use #861

Closed
hawkw opened this issue Jul 29, 2020 · 1 comment · Fixed by #1017
Closed

core: avoid scoped dispatch overhead when scoped dispatchers are not in use #861

hawkw opened this issue Jul 29, 2020 · 1 comment · Fixed by #1017
Labels
area/no-std Related to improving support for no-std. crate/core Related to the `tracing-core` crate kind/perf Performance improvements

Comments

@hawkw
Copy link
Member

hawkw commented Jul 29, 2020

Currently, the global default dispatcher is essentially implemented as a layer on top of the scoped default dispatcher. This means that using the global default dispatcher still pays some of the overhead of the scoped default dispatcher. In particular:

  • Atomic reference counting. The global default dispatcher will never be dropped, as it lives for the entire lifetime of the program. However, we still Arc it, since the scoped default dispatcher requires Arcing. This means we have to bump a reference count to clone it that will never be 0. We could potentially change the internal representation to an enum of either Arc<dyn Subscriber + ...> (when scoped) or &'static dyn Subscriber + ... when global. That way, cloning the global dispatch is essentially free. This also means we could feature flag the Arc use, moving us one step closer to working on no_std targets without liballoc.

  • Currently, we always check the thread-local storage for the scoped dispatcher before checking the global default. This is because a scoped default takes priority over a global default. However, this also means that getting the dispatch always involves a TLS hit, even when the TLS dispatcher is not being used. As I suggested here:

    [...] or because tracing requires an additional thread-local storage hit to check if there is a scoped dispatcher. A potentially worthwhile optimization is having a flag indicating whether scoped thread local dispatchers are in use, and skipping the TLS hit if it's unset...

    Originally posted by @hawkw in Tracing std::io::err performance #859 (comment)
    we could add a flag tracking whether any scoped default dispatchers exist, and skip checking TLS if it's unset. The easiest approach is just to have an AtomicBool flag that's set once a scoped dispatcher is created, and never unset --- this way, we would avoid checking TLS in programs that only use set_global_default (which is a fairly common case!). However, if we wanted to be fancier, we could also track a count of scoped default dispatchers. That would let us avoid checking TLS if scoped defaults are used briefly and then no longer used.

The additional complexity of this should, of course, be benchmarked.

@hawkw hawkw added crate/core Related to the `tracing-core` crate kind/perf Performance improvements labels Jul 29, 2020
@hawkw
Copy link
Member Author

hawkw commented Jul 29, 2020

The overhead of adding an enum to dispatchers turns out to be more noticeable than one would have thought!

@hawkw hawkw added the area/no-std Related to improving support for no-std. label Sep 29, 2020
hawkw added a commit that referenced this issue Oct 7, 2020
## Motivation

Presently, `tracing` and `tracing-core` allow disabling the Rust
standard library. However, even when used with the standard library
disabled, they still require the use of `alloc`, which means some form
of global allocator must exist. This can be a significant limitation on
some very constrained embedded devices. Additionally, the use of any
heap allocations on these devices ought to be avoided.

Now that the callsite registry is no longer a `Vec`, we don't need heap
allocations for storing registered callsites. However, `Dispatch`s are
currently _always_ stored in `Arc`s, and the global registry always
contains a `Vec` of `Weak` refs to them. This is required in order to
support thread-local scoped dispatcher API, where multiple `Dispatch`es
can coexist. However, on `no-std` platforms without threads, we already
don't support the thread-local scoped dispatch API. In this case, we can
reduce the complexity and the number of allocations required by the
dispatcher.

Additionally, even when the standard library is in use, if the
dispatcher is a global default rather than a scoped dispatcher, it
remains set forever. This means it can never be dropped. When this is
the case, it's possible to clone the dispatcher by cloning a `'static`
reference, rather than a (comparatively) costly `Arc` bump.

## Solution

This branch changes the global default dispatcher to be represented as a
`&'static dyn Subscriber` reference on all platforms. In addition, the
dependency on `liballoc` is now feature flagged, allowing it to be
disabled along with the dependency on `libstd`. This means that
`tracing` can now be used with _no_ heap allocations required.

In order to use `tracing` without `liballoc`, `Dispatch`es can only be
created from `&'static dyn Subscriber` references, since this is the
only way to erase their types that exists without boxed trait objects.
Therefore, I've added a new `Dispatch::from_static` constructor that is
always available. This is more limiting than `Dispatch::new` --- some
ways of modeling runtime composition of the subscriber are much harder
(although it could still contain state). However, users without `alloc`
are probably fairly used to this --- the `log` crate [has the same
limitation](https://docs.rs/log/0.4.11/log/#use-with-std) when used
without `std`.

Also, I've changed the global subscriber registry when `std` _is_
enabled to use a `RwLock` rather than a mutex. Now, multiple callsites
can be registered simultaneously, and a write lock is only necessary to
add a new subscriber or deregister old ones. It's now fine to use
`RwLock` here, because we no longer use any locking at all when `std` is
not enabled --- we always use the single global default subscriber, so
we don't require a list of currently live subscribers.

I've modified the `tracing` and `tracing-core` crates to add new feature
flags for enabling `std` and `alloc` separately.

Closes #861

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/no-std Related to improving support for no-std. crate/core Related to the `tracing-core` crate kind/perf Performance improvements
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant