tracing: ensmallerate assembly generated by macro expansions #943
Conversation
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Are you piping the assembly through sed or something like that? It looks... weird in some places (e.g. the stray commas). But this looks great. Let me know if you want to get this run by perf.rlo; I am happy to help out with that.
This is from
Yeah, there's some "good" reason
Clearly I should have just used objdump. This saved me from having to recompile in a separate command, and I didn't notice the commas until you pointed them out. Now it's driving me crazy. :)
A perf run would be great. Our microbenchmarks were not terribly helpful here, and I imagine the impact is a lot more noticeable when there's actual non-
```rust
///
/// [dispatcher]: ../dispatcher/struct.Dispatch.html
#[cfg(feature = "std")]
#[doc(hidden)]
```
Adding this was necessary since constructing the `ValueSet` in a closure means we needed a `FnOnce`, so that values can be moved into the closure (which happens primarily when users wrap values in `field::display`/`field::debug` without borrowing). Not supporting this would break at least some code. I opted to add a new function that takes a `FnOnce`, rather than capturing an `Option<FnOnce>` into the `FnMut` and `take`-ing it, because that seemed significantly less efficient.

We could probably commit to making this a public API (and even deprecate `get_default`, the version that takes a `FnMut`). This is probably a more useful API than `get_default` taking a `FnMut`... I didn't make it public here, though, because introducing a new public API seemed like a job for another PR.

Also, in 0.2, the `Value` changes will make `Value::debug` and `Value::display` also borrow rather than move. So, the `FnOnce` will no longer be necessary, and we could, alternatively, remove this function in 0.2, rather than deprecating the existing `get_default`.

So, there should probably be an actual design discussion... :)
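For illustration, here is a minimal sketch of the trade-off described above. The function names are hypothetical stand-ins, not the actual `dispatcher` API:

```rust
// A minimal sketch (hypothetical names, not the real `dispatcher` API) of the
// two options discussed above: a dedicated `FnOnce`-taking entry point versus
// smuggling a `FnOnce` through an `FnMut`-taking one via `Option::take`.

// Option A: a new function that accepts an `FnOnce` and calls it exactly once.
fn with_current_once<T>(f: impl FnOnce(&str) -> T) -> T {
    let dispatch = "current dispatcher"; // stand-in for the thread-local `Dispatch`
    f(dispatch)
}

// Option B: reuse an existing `FnMut`-taking function...
fn with_current_mut<T>(mut f: impl FnMut(&str) -> T) -> T {
    let dispatch = "current dispatcher";
    f(dispatch)
}

// ...by wrapping the `FnOnce` in an `Option` and `take`-ing it. This needs an
// extra `Option` for the closure, an extra `Option` for the result, and a
// branch on every call, which is the inefficiency mentioned above.
fn call_once_via_fnmut<T>(f: impl FnOnce(&str) -> T) -> T {
    let mut f = Some(f);
    let mut result = None;
    with_current_mut(|dispatch| {
        if let Some(f) = f.take() {
            result = Some(f(dispatch));
        }
    });
    result.expect("the closure is invoked exactly once")
}
```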
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
I replaced the
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
LGTM! This was a really neat PR to read :) Left a nit and a question, but nothing blocking.
```rust
pub fn register(&'static self) -> Interest {
    self.registration
        .call_once(|| crate::callsite::register(self));
    match self.interest.load(Ordering::Relaxed) {
```
Is it actually worse to call the inlined `interest` fn here?
It would be fine for it to be inlined into this function, since this function shouldn't be inlined. The reason it doesn't call `interest` is that the `interest` function calls `register` (this function) if the cached value is not 0, 1, or 2. That means that if we register the callsite and the cached interest value is still not one of those, we could recurse again, potentially infinitely.

This shouldn't happen ever, but if there's a bug elsewhere, I'd rather just assume the cached value is invalid and return `Interest::sometimes()` (which is what this method does), rather than overflowing the stack.
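To make the recursion concern concrete, here is a hedged sketch of the caching scheme being described, using simplified stand-in types rather than the exact `tracing-core` source:

```rust
// Simplified sketch of the interest cache; stand-in types, not the real crate.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

// Stand-in for `tracing_core::subscriber::Interest`.
#[derive(Debug)]
struct Interest(u8);

impl Interest {
    fn never() -> Self { Interest(0) }
    fn sometimes() -> Self { Interest(1) }
    fn always() -> Self { Interest(2) }
}

struct Callsite {
    // 0, 1, 2 = cached interest; anything else = "not registered yet".
    interest: AtomicUsize,
    registration: Once,
}

impl Callsite {
    // Hot path: small enough to stay inline at the call site.
    #[inline(always)]
    fn interest(&'static self) -> Interest {
        match self.interest.load(Ordering::Relaxed) {
            0 => Interest::never(),
            1 => Interest::sometimes(),
            2 => Interest::always(),
            // Anything else: fall through to the slow path.
            _ => self.register(),
        }
    }

    // Cold path: registers the callsite once, then re-reads the cache.
    // Crucially, it never calls back into `interest()`, so an invalid cached
    // value yields `Interest::sometimes()` instead of infinite recursion.
    #[inline(never)]
    fn register(&'static self) -> Interest {
        self.registration.call_once(|| {
            // In the real crate this calls `crate::callsite::register(self)`,
            // which asks subscribers and stores the result in `self.interest`.
        });
        match self.interest.load(Ordering::Relaxed) {
            0 => Interest::never(),
            2 => Interest::always(),
            _ => Interest::sometimes(),
        }
    }
}
```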
Co-authored-by: Lucio Franco <luciofranco14@gmail.com>
### Fixed

- When combining `Interest` from multiple subscribers, if the interests differ, the current subscriber is now always asked if a callsite should be enabled (#927)

### Added

- Internal API changes to support optimizations in the `tracing` crate (#943)
- **docs**: Multiple fixes and improvements (#913, #941)
## Motivation

In PR #943, the construction of `ValueSet`s for events and spans was moved out of the code expanded at the callsite and into a closure, in order to reduce the amount of assembly generated in functions containing tracing macros. However, this introduced an accidental breaking change for some dependent crates. Borrowing values inside a closure meant that when a field of a struct, array, or tuple was used as a field value, the closure had to borrow the _entire_ struct, array, or tuple, rather than the individual field. This broke code where another, unrelated field of that struct, array, or tuple was then mutably borrowed or moved elsewhere.

## Solution

This branch fixes the breaking change by moving `ValueSet` construction back out of a closure and into the code expanded at the callsite. This _does_ regress the amount of assembly generated a bit: a function containing a single `event!` macro generates 32 instructions in release mode on master, and after this change, it generates 83 instructions. However, this is better than reverting PR #943 entirely, which generates 103 instructions for the same function. This change allows us to continue to benefit from *some* of the changes made in #943, although we can no longer benefit from the most significant one.

Rather than trying to further optimize the macro-expanded code now, I think we should wait for the `ValueSet` rework that will land in `tracing` 0.2, where we could potentially generate significantly less code by virtue of constructing a `ValueSet` with a much simpler array literal (no `FieldSet` iteration required).

Fixes #954
Closes #986

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
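A minimal illustration of the breakage described above, with no tracing involved. Under the closure-capture rules in effect at the time (pre-2021 editions), a closure that uses one field captures the whole struct:

```rust
// Simplified sketch of the breaking change: a closure that touches `req.id`
// borrows all of `req`, not just that field (pre-2021-edition capture rules).
struct Request {
    id: u64,
    body: String,
}

fn broken(req: Request) {
    // Building the values inside a closure: the closure borrows all of `req`.
    let record = || println!("id = {}", req.id);
    // error[E0505]: cannot move out of `req.body` because `req` is borrowed
    // let body = req.body;
    record();
}

fn fixed(req: Request) {
    // Building the values at the callsite (no closure) borrows only `req.id`,
    // so other fields stay free to be moved or mutably borrowed.
    let id = &req.id;
    let body = req.body; // fine
    println!("id = {}, body length = {}", id, body.len());
}
```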
Motivation
Currently, tracing's macros generate a lot of code at the expansion site, most of which is not hidden behind function calls. Additionally, several functions called in the generated code are inlined when they probably shouldn't be. This results in a really gross disassembly for code that calls a `tracing` macro.

Inlining code can be faster, by avoiding function calls. However, inlining this much code has some negative impacts. Obviously, the output binary is much larger. Perhaps more importantly, though, this code may optimize much less well; for example, functions that contain tracing macros are unlikely to be inlined themselves, since they are quite large.
Solution
This branch "outlines" most of the macro code, with the exception of the two hottest paths for skipping a disabled callsite (the global max level and the per-callsite cache). Actually recording the span or event, performing runtime filtering via `Subscriber::enabled`, and registering the callsite for the first time are behind a function call. This means that the fast paths still don't require a stack frame, but the amount of code they jump past is significantly shorter.
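As a rough, hypothetical sketch of that shape (stand-in names, not the actual macro expansion or `tracing` APIs):

```rust
// Rough sketch of the "outlined" shape: the cheap checks stay inline at the
// callsite, while everything that actually records the event sits behind a
// cold, never-inlined function.

#[inline(always)]
fn enabled_fast_path() -> bool {
    // Stand-in for the global max-level check and the per-callsite interest
    // cache: in the real code these are a couple of loads and compares.
    true
}

#[inline(never)]
#[cold]
fn record_event_slow_path(message: &str) {
    // Runtime filtering (`Subscriber::enabled`), callsite registration, and
    // dispatching the event would all happen here, out of line.
    println!("event: {}", message);
}

fn expanded_callsite(message: &str) {
    // Roughly what the macro leaves behind: a short fast path that can skip a
    // disabled callsite without setting up a stack frame for the slow path.
    if enabled_fast_path() {
        record_event_slow_path(message);
    }
}
```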
Also, while working on this, I noticed that the event macro expansions were hitting the dispatcher thread-local two separate times for enabled events. That's not super great. I folded these together into one `LocalKey::with` call, so that we don't access the thread-local twice.
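A minimal sketch of that folding, using a stand-in thread-local rather than the real dispatcher state:

```rust
// Folding two thread-local accesses into a single `LocalKey::with` call.
use std::cell::Cell;

thread_local! {
    static COUNTER: Cell<u64> = Cell::new(0);
}

// Before: the expansion touched the thread-local twice per enabled event.
fn two_accesses() -> (u64, u64) {
    let before = COUNTER.with(|c| c.get());
    let after = COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    });
    (before, after)
}

// After: both operations happen inside one `LocalKey::with` call.
fn one_access() -> (u64, u64) {
    COUNTER.with(|c| {
        let before = c.get();
        c.set(before + 1);
        (before, c.get())
    })
}
```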
Building `ValueSet`s to actually record a span or event is the path that generates the most code at the callsite currently. In order to put this behind a function call, it was necessary to capture the local variables that comprise the `ValueSet` using a closure. While closures have impacted build times in the past, my guess is that this may be offset a bit by generating way less code overall.

It was also necessary to add new `dispatcher`-accessing functions that take a `FnOnce` rather than a `FnMut`. This is because fields are sometimes moved into the `ValueSet` by value rather than by ref, when they are inside of a call to `tracing::field::debug` or `field::display` that doesn't itself borrow the value. Not supporting this would be a breaking change, so we had to make it work. Capturing a `FnOnce` into the `FnMut` in an `&mut Option` seemed inefficient, so I did this instead.
Before & After
Here's the disassembly of the following function:
On master:
And on this branch:
So, uh...that seems better.