tracing::{info, error, warn, trace, debug} is slower by factor of 30 #6072

Closed
pmnoxx opened this issue Jan 14, 2022 · 4 comments · Fixed by #6470
Labels
C-discussion Category: Discussion, leading to research or enhancement

Comments

@pmnoxx
Contributor

pmnoxx commented Jan 14, 2022

We are currently using tracing for logging, via the info, error, warn, trace, and debug macros. By default, tracing writes to stderr, or the output is redirected through a pipe to a file.

I did a few measurements which show that logging takes a considerable amount of time and can be improved.

For example, tracing::info! takes around 1600 ns to execute when its output is logged to a file, while println! takes "only" 800 ns.

// tracing::info!     time:   [1.5989 us 1.6017 us 1.6065 us]
// log::info!         time:   [1.8204 us 1.8230 us 1.8256 us]
// println!           time:   [809.49 ns 810.59 ns 811.76 ns]

However, when the output is redirected to /dev/null, these calls take less time to execute.

// to /dev/null
// println!           time:   [265.99 ns 267.48 ns 269.30 ns]
// tracing::info!     time:   [742.95 ns 743.34 ns 743.79 ns]
// log::info!         time:   [992.79 ns 993.24 ns 993.76 ns]
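For reference, a minimal Criterion-style sketch of this kind of comparison could look as follows (the bench names are illustrative and this is not the exact harness used; it assumes a plain fmt subscriber writing to stderr, which you would redirect to a file or /dev/null to reproduce the variants):

```
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_logging(c: &mut Criterion) {
    // Install a fmt subscriber writing to stderr; redirect stderr to a file or
    // /dev/null when running the benchmark to reproduce the different variants.
    tracing_subscriber::fmt().with_writer(std::io::stderr).init();

    let val = 42u64;
    c.bench_function("println!", |b| b.iter(|| println!("abc: {}", val)));
    c.bench_function("tracing::info!", |b| b.iter(|| tracing::info!("abc: {}", val)));
}

criterion_group!(benches, bench_logging);
criterion_main!(benches);
```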

This shows that tracing::info! is blocking: it makes a syscall that blocks the calling thread, and in practice the thread can be frozen for a longer period of time while the disk write happens.

On the other hand, doing only the format! takes around 40 ns. Because tracing is inefficient at both formatting the data and writing it out, the time to write a log line increases from 40 ns to 1600 ns.

// only format!
// format!            time:   [38.676 ns 38.705 ns 38.740 ns]

I did some experiments where, instead of using tracing and writing the data to stderr, we write the data to a concurrent queue which then gets consumed by another thread. The result is that writing a log takes only around 60 ns instead of 1600 ns.

// use format! and send result to conqueue
// format+conqueue    time:   [61.043 ns 61.050 ns 61.059 ns]
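A minimal sketch of that experiment, using std::sync::mpsc as the queue and a plain thread as the consumer (the real benchmark may have used a different concurrent queue):

```
use std::sync::mpsc;
use std::thread;

fn main() {
    // The producer side only formats and enqueues; the consumer thread is the
    // only one that pays for the write syscalls.
    let (tx, rx) = mpsc::channel::<String>();
    let writer = thread::spawn(move || {
        for line in rx {
            eprintln!("{}", line);
        }
    });

    let val = 42;
    // Hot path: roughly format!-cost only, no I/O.
    tx.send(format!("abc: {}", val)).unwrap();

    drop(tx); // close the channel so the writer thread exits
    writer.join().unwrap();
}
```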

My suggestion is to do the following.

  1. Replace tracing::{info, error, warn, trace, debug} with our own code, which formats the data and sends an event through an MPSC queue to another thread.
  2. On the other thread we have two choices:
    a) write the data using println!, which is what tracing does internally. It will still take around 700 ns, but at least we won't block the originating thread
    b) send the data over UDP to syslog (https://en.wikipedia.org/wiki/Syslog)
    A syslog service would then be responsible for writing the data to disk, and our process would no longer be blocked by disk writes.

Drawbacks:

  • The queue can potentially grow larger than we can handle.
    However, that's still better than the current behaviour, which is to block the thread writing the log.
    We can implement a custom strategy of dropping logs to mitigate this; a bounded queue with try_send, sketched below, is one option. The advantage of this solution is that we keep the current process running responsively, at the cost of dropping some logs.
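A minimal sketch of such a drop-on-overflow strategy, using a bounded std::sync::mpsc channel (the capacity of 10,000 is an arbitrary illustrative value):

```
use std::sync::mpsc;

fn main() {
    // Bounded channel: when the consumer falls behind, drop log lines instead
    // of blocking the caller.
    let (tx, _rx) = mpsc::sync_channel::<String>(10_000);

    let line = format!("abc: {}", 42);
    if tx.try_send(line).is_err() {
        // Queue is full (or the consumer is gone): drop the log line rather
        // than stall the hot path.
    }
}
```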
@pmnoxx pmnoxx added the C-discussion Category: Discussion, leading to research or enhancement label Jan 14, 2022
@pmnoxx pmnoxx self-assigned this Jan 14, 2022
@pmnoxx pmnoxx mentioned this issue Jan 14, 2022
@pmnoxx pmnoxx changed the title tracing::{info, error, warn, trace, debug} take long time tracing::{info, error, warn, trace, debug} is slower by factor of 30 Jan 14, 2022
@nagisa
Collaborator

nagisa commented Jan 18, 2022

> For example tracing::info! takes around 1600 ns to execute when its output is logged to a file, while println! takes "only" 800 ns.

Just to make sure, have you verified all of these are doing the same kind of work?

println! will naturally be faster since it's going to write its entire output to an io::StdoutLock, which not only avoids taking a potentially contended lock across format segments, but is also line-buffered, so you can expect only one syscall per line in most cases. io::Stderr, on the other hand, is not buffered at all, and also internally takes an exclusive resource lock for every single format segment.

Just making sure because this is a common gotcha people hit when benchmarking similar stuff.
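For illustration, a minimal sketch of locking and buffering stderr explicitly, so that a line costs at most one write syscall (comparable to what println! gets from the line-buffered stdout):

```
use std::io::{self, Write};

fn main() -> io::Result<()> {
    // Take the stderr lock once and buffer the output, so writing a line costs
    // at most one syscall instead of one per format segment.
    let stderr = io::stderr();
    let mut out = io::BufWriter::new(stderr.lock());
    writeln!(out, "abc: {}", 42)?;
    out.flush()
}
```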


The performance of emitting tracing events and spans is heavily dependent on the implementation of the subscribers that have been registered.

Right now we set up the formatting subscriber to use the io::stderr writer – possibly the least efficient writer to use (from the ones that can be considered built-in; in part for the reasons outlined above). Consider running the same test with different sorts of writers and subscribers in place and see how that goes.

Of particular interest may be the tracing_appender::non_blocking::NonBlocking.
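For illustration, a minimal setup sketch using tracing_appender::non_blocking together with the tracing-subscriber fmt builder (exact crate versions and builder calls may differ from what neard uses):

```
fn main() {
    // The non_blocking wrapper moves the actual write syscalls onto a dedicated
    // worker thread; keep the guard alive for the program's lifetime so
    // buffered lines are flushed on shutdown.
    let (writer, _guard) = tracing_appender::non_blocking(std::io::stderr());
    tracing_subscriber::fmt().with_writer(writer).init();

    tracing::info!("hello from the non-blocking writer");
}
```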

@pmnoxx
Contributor Author

pmnoxx commented Jan 18, 2022

> > For example tracing::info! takes around 1600 ns to execute when its output is logged to a file, while println! takes "only" 800 ns.
>
> Just to make sure, have you verified all of these are doing the same kind of work?

I did the benchmark, and I wrote my own formatting subscriber. They are not doing exactly the same amount of work.

For example, info!(?a, ?b, "abc: {}", val); does std::fmt::Debug formatting three times. Two are requested explicitly with ?a, ?b, and there is an extra one for the message argument that isn't needed.

At the same time, if a and b were String, format!("abc: {} a={} b={}", val, a, b) would work.
We could also rewrite the call as info!("abc: {} a={} b={}", val, a, ?b);. However, tracing still does one extra call to std::fmt::Debug, which makes things slower. I think this is a bug.

While I agree that the amount of work is not the same, there is overhead caused by this extra formatting. It comes down to this piece of code:

    fn record_debug(&mut self, field: &tracing::field::Field, value: &dyn std::fmt::Debug) {
        self.0.push((field.name(), Value::Debug(format!("{:?}", value))));
    }

Formatting dyn std::fmt::Debug is simply slow.

Though the other 800 ns is taken by println!.

I tried creating a tracing subscriber that does nothing except format!("...") with a dummy argument, without using println!, and it took around 40-60 ns. So the overhead is definitely split between println! and the extra formatting of dyn std::fmt::Debug.

> Right now we set up the formatting subscriber to use the io::stderr writer – possibly the least efficient writer to use (from the ones that can be considered built-in; in part for the reasons outlined above). Consider running the same test with different sorts of writers and subscribers in place and see how that goes.
>
> Of particular interest may be the tracing_appender::non_blocking::NonBlocking.

I should probably publish my benchmark, and then maybe we can find a way to optimize the subscriber, or file a bug report with tracing and get it fixed.

@pmnoxx
Contributor Author

pmnoxx commented Jan 19, 2022

> println! will naturally be faster since it's going to write its entire output to an io::StdoutLock, which not only avoids taking a potentially contended lock across format segments, but is also line-buffered, so you can expect only one syscall per line in most cases. io::Stderr, on the other hand, is not buffered at all, and also internally takes an exclusive resource lock for every single format segment.

Actually, that's a good point.
I ran tests using info! with the output going to stdout, and it took 1500 ns per call.

However, using info! with the output going to stderr, with the default tracing subscriber as in production, takes 2100 ns per call.

In summary:

  • With the current setup used by neard in production (default tracing subscriber, output to stderr), each info! takes around 2100 ns.
  • A single println! takes around 800 ns.
  • The solution I'm proposing, which hands the data off to another thread, takes around 60 ns.

Also, keep in mind that Google Cloud machines are around 2-3x slower than my local desktop, so you need to multiply those values by a factor of 2-3x.

@pmnoxx pmnoxx changed the title tracing::{info, error, warn, trace, debug} is slower by factor of 30 tracing::{info, error, warn, trace, debug} is slower by factor of 50 Jan 19, 2022
@pmnoxx pmnoxx changed the title tracing::{info, error, warn, trace, debug} is slower by factor of 50 tracing::{info, error, warn, trace, debug} is slower by factor of 30 Jan 19, 2022
@pmnoxx pmnoxx removed their assignment Mar 7, 2022
@pmnoxx pmnoxx closed this as completed Mar 7, 2022
@bowenwang1996 bowenwang1996 reopened this Mar 9, 2022
@bowenwang1996
Collaborator

@nagisa volunteered to make an improvement on this

near-bulldozer bot pushed a commit that referenced this issue Mar 25, 2022
This will utilize a separate thread to write out the spans and events
while letting the main computation proceed with its business.
Additionally, we are buffering the output by lines, thus reducing the
frequency of syscalls that occur when the subscriber is writing out
parts of the message.

This should mitigate concerns about enabling debug logging, as its impact
on performance should now be minimal (putting an event structure onto an
MPSC queue). There are still costs associated with logging everything,
however. Most notably, formatting and construction of the
`tracing_core::ValueSet`s still occur on the caller side, so if
constructing those is expensive, the logging might remain expensive.
An example of sketchy code like that (although silly) could be:

```
debug!(message = { std::thread::sleep(std::time::Duration::from_secs(1)); "hello" })
```

or for a less silly example:

```
debug!("{}", my_vector.iter().map(|...| {
  do_expensive_stuff()
}).collect::<String>())
```

These should be considered a bug (alas, one that `tracing` does not have
any tooling to detect).
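One possible caller-side mitigation, assuming a tracing version that provides
the `enabled!` macro, is to guard expensive argument construction explicitly;
a sketch:

```
use tracing::{debug, enabled, Level};

fn main() {
    tracing_subscriber::fmt().init();

    let my_vector = vec![1, 2, 3];
    // Only pay for the expensive formatting when DEBUG events are actually enabled.
    if enabled!(Level::DEBUG) {
        let rendered: String = my_vector.iter().map(|x| format!("{},", x)).collect();
        debug!("{}", rendered);
    }
}
```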

I opted to add a new crate dedicated to observability utilities. From my
experience, using things like prometheus will eventually result in a
variety of utilities being written, so this crate would likely expand
in scope over time...

Fixes #6072 (though I haven't made any actual measurements as to what the improvement really is)