tracing::{info, error, warn, trace, debug} is slower by factor of 30 #6072

Closed
pmnoxx opened this issue Jan 14, 2022 · 4 comments · Fixed by #6470
Labels
C-discussion Category: Discussion, leading to research or enhancement

Comments

@pmnoxx
Contributor

pmnoxx commented Jan 14, 2022

We are currently using tracing for logging, via the info, error, warn, trace, and debug macros. By default, tracing writes to stderr, or the output is redirected through a pipe to a file.

I did a few measurements which show that logging takes a considerable amount of time and can be improved.

For example, tracing::info! takes around 1600 ns to execute when its output is logged to a file, while println! takes "only" 800 ns.

// tracing::info!     time:   [1.5989 us 1.6017 us 1.6065 us]
// log::info!         time:   [1.8204 us 1.8230 us 1.8256 us]
// println!           time:   [809.49 ns 810.59 ns 811.76 ns]

However, when the output is redirected to /dev/null, these calls take less time to execute.

// to /dev/null
// println!           time:   [265.99 ns 267.48 ns 269.30 ns]
// tracing::info!     time:   [742.95 ns 743.34 ns 743.79 ns]
// log::info!         time:   [992.79 ns 993.24 ns 993.76 ns]
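For reference, a minimal Criterion-style sketch of this kind of comparison could look as follows (the bench names are illustrative and this is not the exact harness used; it assumes a plain fmt subscriber writing to stderr, which you would redirect to a file or /dev/null to reproduce the variants):

```
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_logging(c: &mut Criterion) {
    // Install a fmt subscriber writing to stderr; redirect stderr to a file or
    // /dev/null when running the benchmark to reproduce the different variants.
    tracing_subscriber::fmt().with_writer(std::io::stderr).init();

    let val = 42u64;
    c.bench_function("println!", |b| b.iter(|| println!("abc: {}", val)));
    c.bench_function("tracing::info!", |b| b.iter(|| tracing::info!("abc: {}", val)));
}

criterion_group!(benches, bench_logging);
criterion_main!(benches);
```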

This shows that tracing::info! is blocking: it makes a syscall that blocks the calling thread, and in practice the thread can be frozen for a longer period of time while the disk write happens.

On the other hand, doing only the format! takes around 40 ns. Because tracing is inefficient at both formatting the data and writing it out, the time to write a log line increases from 40 ns to 1600 ns.

// only format!
// format!            time:   [38.676 ns 38.705 ns 38.740 ns]

I did some experiments where, instead of using tracing and writing the data to stderr, we write the data to a concurrent queue which then gets consumed by another thread. The result is that writing a log takes only around 60 ns instead of 1600 ns.

// use format! and send result to conqueue
// format+conqueue    time:   [61.043 ns 61.050 ns 61.059 ns]
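A minimal sketch of that experiment, using std::sync::mpsc as the queue and a plain thread as the consumer (the real benchmark may have used a different concurrent queue):

```
use std::sync::mpsc;
use std::thread;

fn main() {
    // The producer side only formats and enqueues; the consumer thread is the
    // only one that pays for the write syscalls.
    let (tx, rx) = mpsc::channel::<String>();
    let writer = thread::spawn(move || {
        for line in rx {
            eprintln!("{}", line);
        }
    });

    let val = 42;
    // Hot path: roughly format!-cost only, no I/O.
    tx.send(format!("abc: {}", val)).unwrap();

    drop(tx); // close the channel so the writer thread exits
    writer.join().unwrap();
}
```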

My suggestion is to do the following.

  1. Replace tracing::{info, error, warn, trace, debug} with our own code, which formats the data and sends an event through an MPSC queue to another thread.
  2. On the other thread we have two choices:
    a) write the data using println!, which is what tracing does internally. It will still take around 700 ns, but at least we won't block the originating thread
    b) send the data over UDP to syslog (https://en.wikipedia.org/wiki/Syslog)
    A syslog service would then be responsible for writing the data to disk, and our process would no longer be blocked by disk writes.

Drawbacks:

  • The queue can potentially grow larger than we can handle.
    However, that's still better than the current behaviour, which is to block the thread writing the log.
    We can implement a custom strategy of dropping logs to mitigate this; a bounded queue with try_send, sketched below, is one option. The advantage of this solution is that we keep the current process running responsively, at the cost of dropping some logs.
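A minimal sketch of such a drop-on-overflow strategy, using a bounded std::sync::mpsc channel (the capacity of 10,000 is an arbitrary illustrative value):

```
use std::sync::mpsc;

fn main() {
    // Bounded channel: when the consumer falls behind, drop log lines instead
    // of blocking the caller.
    let (tx, _rx) = mpsc::sync_channel::<String>(10_000);

    let line = format!("abc: {}", 42);
    if tx.try_send(line).is_err() {
        // Queue is full (or the consumer is gone): drop the log line rather
        // than stall the hot path.
    }
}
```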
@pmnoxx pmnoxx added the C-discussion Category: Discussion, leading to research or enhancement label Jan 14, 2022
@pmnoxx pmnoxx self-assigned this Jan 14, 2022
@pmnoxx pmnoxx mentioned this issue Jan 14, 2022
@pmnoxx pmnoxx changed the title tracing::{info, error, warn, trace, debug} take long time tracing::{info, error, warn, trace, debug} is slower by factor of 30 Jan 14, 2022
@nagisa
Collaborator

nagisa commented Jan 18, 2022

> For example tracing::info! takes around 1600 ns to execute when its output is logged to a file, while println! takes "only" 800 ns.

Just to make sure, have you verified all of these are doing the same kind of work?

println! will naturally be faster since it's going to write its entire output to an io::StdoutLock, which not only avoids taking a potentially contended lock across format segments, but is also line-buffered, so you can expect only one syscall per line in most cases. io::Stderr, on the other hand, is not buffered at all, and also internally takes an exclusive resource lock for every single format segment.

Just making sure because this is a common gotcha people hit when benchmarking similar stuff.
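For illustration, a minimal sketch of locking and buffering stderr explicitly, so that a line costs at most one write syscall (comparable to what println! gets from the line-buffered stdout):

```
use std::io::{self, Write};

fn main() -> io::Result<()> {
    // Take the stderr lock once and buffer the output, so writing a line costs
    // at most one syscall instead of one per format segment.
    let stderr = io::stderr();
    let mut out = io::BufWriter::new(stderr.lock());
    writeln!(out, "abc: {}", 42)?;
    out.flush()
}
```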


The performance of emitting tracing events and spans is heavily dependent on the implementation of the subscribers that have been registered.

Right now we set up the formatting subscriber to use the io::stderr writer – possibly the least efficient writer to use (from the ones that can be considered built-in; in part for the reasons outlined above). Consider running the same test with different sorts of writers and subscribers in place and see how that goes.

Of particular interest may be the tracing_appender::non_blocking::NonBlocking.
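For illustration, a minimal setup sketch using tracing_appender::non_blocking together with the tracing-subscriber fmt builder (exact crate versions and builder calls may differ from what neard uses):

```
fn main() {
    // The non_blocking wrapper moves the actual write syscalls onto a dedicated
    // worker thread; keep the guard alive for the program's lifetime so
    // buffered lines are flushed on shutdown.
    let (writer, _guard) = tracing_appender::non_blocking(std::io::stderr());
    tracing_subscriber::fmt().with_writer(writer).init();

    tracing::info!("hello from the non-blocking writer");
}
```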

@pmnoxx
Contributor Author

pmnoxx commented Jan 18, 2022

> > For example tracing::info! takes around 1600 ns to execute when its output is logged to a file, while println! takes "only" 800 ns.
>
> Just to make sure, have you verified all of these are doing the same kind of work?

I did the benchmark, and I wrote my own formatting subscriber. They are not doing exactly the same amount of work.

For example, info!(?a, ?b, "abc: {}", val); does std::fmt::Debug formatting three times. Two are requested explicitly with ?a, ?b, and there is an extra one for the message argument that isn't needed.

At the same time, if a and b were String, format!("abc: {} a={} b={}", val, a, b) would work.
We could also rewrite the call as info!("abc: {} a={} b={}", val, a, ?b);. However, tracing still does one extra call to std::fmt::Debug, which makes things slower. I think this is a bug.

While I agree that the amount of work is not the same, there is overhead caused by this extra formatting. It comes down to this piece of code:

    fn record_debug(&mut self, field: &tracing::field::Field, value: &dyn std::fmt::Debug) {
        self.0.push((field.name(), Value::Debug(format!("{:?}", value))));
    }

Formatting dyn std::fmt::Debug is simply slow.

Though the other 800 ns is taken by println!.

I tried creating a tracing subscriber that does nothing except format!("...") with a dummy argument, without using println!, and it took around 40-60 ns. So the overhead is definitely split between println! and the extra formatting of dyn std::fmt::Debug.

> Right now we set up the formatting subscriber to use the io::stderr writer – possibly the least efficient writer to use (from the ones that can be considered built-in; in part for the reasons outlined above). Consider running the same test with different sorts of writers and subscribers in place and see how that goes.
>
> Of particular interest may be the tracing_appender::non_blocking::NonBlocking.

I should probably publish my benchmark, and then maybe we can find a way to optimize the subscriber, or file a bug report with tracing and get it fixed.

@pmnoxx
Contributor Author

pmnoxx commented Jan 19, 2022

> println! will naturally be faster since it's going to write its entire output to an io::StdoutLock, which not only avoids taking a potentially contended lock across format segments, but is also line-buffered, so you can expect only one syscall per line in most cases. io::Stderr, on the other hand, is not buffered at all, and also internally takes an exclusive resource lock for every single format segment.

Actually, that's a good point.
I ran tests using info! with the output going to stdout, and it took 1500 ns per call.

However, using info! with the output going to stderr, with the default tracing subscriber as in production, takes 2100 ns per call.

In summary:

  • With the current setup used by neard in production (default tracing subscriber, output to stderr), each info! takes around 2100 ns.
  • A single println! takes around 800 ns.
  • The solution I'm proposing, which hands the data off to another thread, takes around 60 ns.

Also, keep in mind that Google Cloud machines are around 2-3x slower than my local desktop, so you need to multiply those values by a factor of 2-3x.

@pmnoxx pmnoxx changed the title tracing::{info, error, warn, trace, debug} is slower by factor of 30 tracing::{info, error, warn, trace, debug} is slower by factor of 50 Jan 19, 2022
@pmnoxx pmnoxx changed the title tracing::{info, error, warn, trace, debug} is slower by factor of 50 tracing::{info, error, warn, trace, debug} is slower by factor of 30 Jan 19, 2022
@pmnoxx pmnoxx removed their assignment Mar 7, 2022
@pmnoxx pmnoxx closed this as completed Mar 7, 2022
@bowenwang1996 bowenwang1996 reopened this Mar 9, 2022
@bowenwang1996
Collaborator

@nagisa volunteered to make an improvement on this

near-bulldozer bot pushed a commit that referenced this issue Mar 25, 2022
This will utilize a separate thread to write out the spans and events
while letting the main computation proceed with its business.
Additionally, we are buffering the output by lines, thus reducing the
frequency of syscalls that occur when the subscriber is writing out
parts of the message.

This should mitigate concerns about enabling debug logging, as its impact
on performance should now be minimal (putting an event structure onto an
MPSC queue). There are still costs associated with logging everything,
however. Most notably, formatting and construction of the
`tracing_core::ValueSet`s still occur on the caller side, so if
constructing those is expensive, the logging might remain expensive.
An example of sketchy code like that (although silly) could be:

```
debug!(message = { std::thread::sleep(std::time::Duration::from_secs(1)); "hello" })
```

or for a less silly example:

```
debug!("{}", my_vector.iter().map(|...| {
  do_expensive_stuff()
}).collect::<String>())
```

These should be considered a bug (alas, one that `tracing` does not have
any tooling to detect).
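One possible caller-side mitigation, assuming a tracing version that provides
the `enabled!` macro, is to guard expensive argument construction explicitly;
a sketch:

```
use tracing::{debug, enabled, Level};

fn main() {
    tracing_subscriber::fmt().init();

    let my_vector = vec![1, 2, 3];
    // Only pay for the expensive formatting when DEBUG events are actually enabled.
    if enabled!(Level::DEBUG) {
        let rendered: String = my_vector.iter().map(|x| format!("{},", x)).collect();
        debug!("{}", rendered);
    }
}
```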

I opted to add a new crate dedicated to observability utilities. From my
experience, using things like prometheus will eventually result in a
variety of utilities being written, so this crate would likely expand
in scope over time...

Fixes #6072 (though I haven't made any actual measurements as to what the improvement really is)