Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: move macro callsite impls out of macro expansion #869

Merged
merged 8 commits into from
Jul 31, 2020

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Jul 31, 2020

Motivation

Currently, every tracing macro generates a new implementation of the
Callsite trait for a zero-sized struct created for that particular
callsite. This callsite accesses several statics defined in the macro
expansion.

This means that each tracing macro expands to a lot of code — check
out the cargo expand output:

eliza on butterfly in tracing/examples on  master [$?] is v0.0.0 via ⚙️ v1.44.0
:; cargo expand --example all-levels | wc -l
    Checking tracing-examples v0.0.0 (/home/eliza/code/tracing/examples)
    Finished check [unoptimized + debuginfo] target(s) in 0.20s

463

More code in the macro expansion means more code in the function
invoking the macro, which may make that function harder for rustc to
optimize. This effects the performance of other code in the function,
not the tracing code, so this isn't necessarily visible in tracing's
microbenchmarks, which only contain tracing code.

In rustc itself, there is a small but noticeable performance impact
from switching from log to tracing even after making changes that
should make the filtering overhead equivalent:
rust-lang/rust#74726 (comment).
This appears to be due to more complex generated code impacting
optimizer behavior.

Solution

This branch moves the callsite generated by each macro out of the macro
expansion and into a single MacroCallsite private API type in the
__macro_support module. Instead of creating a zero-sized Callsite
static and multiple statics for the Metadata, the Once cell for
registration, and the Interest atomic, these are all now fields on the
Callsite struct. This shouldn't result in any real change, but makes
the implementation simpler. All the hot filtering functions on
MacroCallsite are #[inline(always)], so we shouldn't be adding stack
frames to code that was previously generated in the macro expansion.

After making this change, the expanded output is about half as long
as it was before:

eliza on butterfly in tracing/examples on  eliza/smaller-macros [$?] is v0.0.0 via ⚙️ v1.44.0 
:; cargo expand --example all-levels | wc -l
    Checking tracing-examples v0.0.0 (/home/eliza/code/tracing/examples)
    Finished check [unoptimized + debuginfo] target(s) in 0.75s

233

This change appears to fix most of the remaining rustc performance
regressions:
rust-lang/rust#74726 (comment)

Additionally, it has some other side benefits. I imagine it probably
improves compile times a bit for crates using tracing (although I
haven't tested this), since the compiler is generating fewer callsite
implementations. Reducing the number of branches in the macro expansion
probably helps make the pesky cognitive_complexity Clippy lint show up
less often, and improves maintainability for the macros as well.

hawkw added 6 commits July 31, 2020 09:31
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw hawkw added crate/tracing Related to the `tracing` crate kind/perf Performance improvements labels Jul 31, 2020
@hawkw hawkw requested review from yaahc, davidbarsky and a team July 31, 2020 17:16
@hawkw hawkw changed the title Eliza/smaller macros tracing: move macro callsite impls out of macro expansion Jul 31, 2020
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw
Copy link
Member Author

hawkw commented Jul 31, 2020

We could factor more code out of the macros by encapsulating the ValueSet construction in a closure that's passed to some function. However, adding a closure to every callsite probably makes build times worse, so I'd prefer to avoid this unless necessary.

Copy link
Member

@davidbarsky davidbarsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but just asking some corrections on the doc (private) comments.

tracing/src/lib.rs Outdated Show resolved Hide resolved
tracing/src/lib.rs Outdated Show resolved Hide resolved
tracing/src/lib.rs Outdated Show resolved Hide resolved
tracing/src/lib.rs Outdated Show resolved Hide resolved
Thanks, @davidbarsky, that _was_, in fact, what I meant...

Co-authored-by: David Barsky <me@davidbarsky.com>
Copy link
Collaborator

@yaahc yaahc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😎

@hawkw hawkw merged commit 90dfcca into master Jul 31, 2020
hawkw added a commit that referenced this pull request Jul 31, 2020
Fixed

- Fixed a bug where `LevelFilter::OFF` (and thus also the
  `static_max_level_off`  feature flag) would enable *all* traces,
  rather than *none* (#853)
- **log**: Fixed `tracing` macros and `Span`s not checking
  `log::max_level` before emitting `log` records (#870)

Changed

- **macros**: Macros now check the global max level
  (`LevelFilter::current`) before the per-callsite cache when
  determining if a span or event is enabled. This significantly improves
  performance in some use cases (#853)
- **macros**: Simplified the code generated by macro expansion
  significantly, which may improve compile times and/or `rustc`
  optimizatation of surrounding code (#869, #869)
- **macros**: Macros now check the static max level before checking any
  runtime filtering, improving performance when a span or event is
  disabled by a `static_max_level_XXX` feature flag (#868)
- `LevelFilter` is now a re-export of the `tracing_core::LevelFilter`
  type, it can now be used interchangably with the versions in
  `tracing-core` and `tracing-subscriber` (#853)
- Significant performance improvements when comparing `LevelFilter`s and
  `Level`s (#853)
- Updated the minimum `tracing-core` dependency to 0.1.12 (#853)

Added

- **macros**: Quoted string literals may now be used as field names, to
  allow fields whose names are not valid Rust identifiers (#790)
- **docs**: Several documentation improvements (#850, #857, #841)
- `LevelFilter::current()` function, which returns the highest level
  that any subscriber will enable (#853)
- `Subscriber::max_level_hint` optional trait method, for setting the
  value returned by `LevelFilter::current()` (#853)

Thanks to new contributors @cuviper, @ethanboxx, @ben0x539, @dignati,
@colelawrence, and @rbtcollins for helping out with this release!

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw added a commit that referenced this pull request Jul 31, 2020
### Fixed

- Fixed a bug where `LevelFilter::OFF` (and thus also the
  `static_max_level_off`  feature flag) would enable *all* traces,
  rather than *none* (#853)
- **log**: Fixed `tracing` macros and `Span`s not checking
  `log::max_level` before emitting `log` records (#870)

### Changed

- **macros**: Macros now check the global max level
  (`LevelFilter::current`) before the per-callsite cache when
  determining if a span or event is enabled. This significantly improves
  performance in some use cases (#853)
- **macros**: Simplified the code generated by macro expansion
  significantly, which may improve compile times and/or `rustc`
  optimizatation of surrounding code (#869, #869)
- **macros**: Macros now check the static max level before checking any
  runtime filtering, improving performance when a span or event is
  disabled by a `static_max_level_XXX` feature flag (#868)
- `LevelFilter` is now a re-export of the `tracing_core::LevelFilter`
  type, it can now be used interchangably with the versions in
  `tracing-core` and `tracing-subscriber` (#853)
- Significant performance improvements when comparing `LevelFilter`s and
  `Level`s (#853)
- Updated the minimum `tracing-core` dependency to 0.1.12 (#853)

### Added

- **macros**: Quoted string literals may now be used as field names, to
  allow fields whose names are not valid Rust identifiers (#790)
- **docs**: Several documentation improvements (#850, #857, #841)
- `LevelFilter::current()` function, which returns the highest level
  that any subscriber will enable (#853)
- `Subscriber::max_level_hint` optional trait method, for setting the
  value returned by `LevelFilter::current()` (#853)

Thanks to new contributors @cuviper, @ethanboxx, @ben0x539, @dignati,
@colelawrence, and @rbtcollins for helping out with this release!

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw added a commit to linkerd/linkerd2-proxy that referenced this pull request Jul 31, 2020
This picks up upstream changes tokio-rs/tracing#853,
tokio-rs/tracing#868, and tokio-rs/tracing#869 which improve performance
in some use cases. The overhead removed by these changes may already be
amortized enough in the proxy that it's not a problem, but it seems
worth picking up regardless.
hawkw added a commit to linkerd/linkerd2-proxy that referenced this pull request Aug 4, 2020
This picks up upstream changes tokio-rs/tracing#853,
tokio-rs/tracing#868, and tokio-rs/tracing#869 which improve performance
in some use cases. The overhead removed by these changes may already be
amortized enough in the proxy that it's not a problem, but it seems
worth picking up regardless.


Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate/tracing Related to the `tracing` crate kind/perf Performance improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants