Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: Use slog #3010

Merged
merged 14 commits into from
May 14, 2018
Merged

*: Use slog #3010

merged 14 commits into from
May 14, 2018

Conversation

Hoverbear
Copy link
Contributor

@Hoverbear Hoverbear commented May 1, 2018

After speaking with @siddontang we think this is a better option than #2997.

Motivation

We noted with some users that logging was taking a considerable amount of time during system operation. This is due to the fact that the log crate operates syncronously. We noted when exploring #2997 that moving to asyncronous logging was measurably faster.

During our discussion it was decided that slog was a good alternative that offered built in asyncronous logging capabilities. It offers other benefits as well, such as structured logging and fancy terminal output.

Implementation

This PR moves us to using slog via the slog_stdlog crate which essentially adapts the log based macros to use slog. In the future we can place work into migrating to the slog based logging macros which offer the option for structure data.

With this PR it was possible to completely remove the StdErrLogger and use the provided slog_term terminal logger. The RotatingFileLogger was altered so it simply implements Write and can be called from the slog decorators. Additionally the distinction between the Logger and the LoggerCore was removed, as well as the mutex which was involved.

A future PR will implement file rotation by size.

Benchmarks

On master:

Begin to run: raftstore
benching Set on raft cluster (channel), nodes: 1, value len:    8       ...     29,214 ns/iter (+/- 5,149)
benching Set on raft cluster (channel), nodes: 1, value len:  128       ...     41,056 ns/iter (+/- 7,479)
benching Set on raft cluster (channel), nodes: 1, value len: 1024       ...     45,485 ns/iter (+/- 12,575)
benching Set on raft cluster (channel), nodes: 1, value len: 4096       ...     63,040 ns/iter (+/- 35,221)
benching Get on raft cluster (channel), nodes: 1        ...     16,488 ns/iter (+/- 872)
benching Delete on raft cluster (channel),      nodes: 1        ...     38,510 ns/iter (+/- 9,025)
benching Set on raft cluster (channel), nodes: 3, value len:    8       ...     87,967 ns/iter (+/- 12,569)
benching Set on raft cluster (channel), nodes: 3, value len:  128       ...     88,835 ns/iter (+/- 12,777)
benching Set on raft cluster (channel), nodes: 3, value len: 1024       ...    100,263 ns/iter (+/- 14,892)
benching Set on raft cluster (channel), nodes: 3, value len: 4096       ...    121,325 ns/iter (+/- 51,883)
benching Get on raft cluster (channel), nodes: 3        ...     15,778 ns/iter (+/- 1,064)
benching Delete on raft cluster (channel),      nodes: 3        ...     89,725 ns/iter (+/- 12,090)
benching Set on raft cluster (channel), nodes: 5, value len:    8       ...    110,969 ns/iter (+/- 8,464)
benching Set on raft cluster (channel), nodes: 5, value len:  128       ...    113,681 ns/iter (+/- 9,638)
benching Set on raft cluster (channel), nodes: 5, value len: 1024       ...    123,668 ns/iter (+/- 12,713)
benching Set on raft cluster (channel), nodes: 5, value len: 4096       ...    149,208 ns/iter (+/- 76,215)
benching Get on raft cluster (channel), nodes: 5        ...     16,907 ns/iter (+/- 1,410)
benching Delete on raft cluster (channel),      nodes: 5        ...    110,360 ns/iter (+/- 11,433)
benching Set on raft cluster (tcp),     nodes: 1, value len:    8       ...     37,840 ns/iter (+/- 8,776)
benching Set on raft cluster (tcp),     nodes: 1, value len:  128       ...     39,376 ns/iter (+/- 12,806)
benching Set on raft cluster (tcp),     nodes: 1, value len: 1024       ...     45,441 ns/iter (+/- 11,976)
benching Set on raft cluster (tcp),     nodes: 1, value len: 4096       ...     68,510 ns/iter (+/- 15,056)
benching Get on raft cluster (tcp),     nodes: 1        ...     16,730 ns/iter (+/- 1,119)
benching Delete on raft cluster (tcp),  nodes: 1        ...     36,457 ns/iter (+/- 12,851)
benching Set on raft cluster (tcp),     nodes: 3, value len:    8       ...    214,640 ns/iter (+/- 18,331)
benching Set on raft cluster (tcp),     nodes: 3, value len:  128       ...    216,273 ns/iter (+/- 23,011)
benching Set on raft cluster (tcp),     nodes: 3, value len: 1024       ...    226,920 ns/iter (+/- 32,660)
benching Set on raft cluster (tcp),     nodes: 3, value len: 4096       ...    167,235 ns/iter (+/- 17,127)
benching Get on raft cluster (tcp),     nodes: 3        ...     17,650 ns/iter (+/- 1,727)
benching Delete on raft cluster (tcp),  nodes: 3        ...    221,035 ns/iter (+/- 25,323)
benching Set on raft cluster (tcp),     nodes: 5, value len:    8       ...    329,054 ns/iter (+/- 12,363)
benching Set on raft cluster (tcp),     nodes: 5, value len:  128       ...    331,151 ns/iter (+/- 25,594)
benching Set on raft cluster (tcp),     nodes: 5, value len: 1024       ...    343,870 ns/iter (+/- 24,609)
benching Set on raft cluster (tcp),     nodes: 5, value len: 4096       ...    375,979 ns/iter (+/- 28,572)
benching Get on raft cluster (tcp),     nodes: 5        ...     17,735 ns/iter (+/- 1,447)
benching Delete on raft cluster (tcp),  nodes: 5        ...    329,308 ns/iter (+/- 23,500)

With this PR:

Begin to run: raftstore
benching Set on raft cluster (channel), nodes: 1, value len:    8       ...     30,221 ns/iter (+/- 7,197)
benching Set on raft cluster (channel), nodes: 1, value len:  128       ...     30,800 ns/iter (+/- 7,490)
benching Set on raft cluster (channel), nodes: 1, value len: 1024       ...     41,309 ns/iter (+/- 6,337)
benching Set on raft cluster (channel), nodes: 1, value len: 4096       ...     58,102 ns/iter (+/- 35,968)
benching Get on raft cluster (channel), nodes: 1        ...     15,947 ns/iter (+/- 969)
benching Delete on raft cluster (channel),      nodes: 1        ...     32,619 ns/iter (+/- 7,035)
benching Set on raft cluster (channel), nodes: 3, value len:    8       ...     80,111 ns/iter (+/- 13,715)
benching Set on raft cluster (channel), nodes: 3, value len:  128       ...     80,830 ns/iter (+/- 13,125)
benching Set on raft cluster (channel), nodes: 3, value len: 1024       ...     87,357 ns/iter (+/- 11,750)
benching Set on raft cluster (channel), nodes: 3, value len: 4096       ...    113,161 ns/iter (+/- 66,971)
benching Get on raft cluster (channel), nodes: 3        ...     15,388 ns/iter (+/- 1,103)
benching Delete on raft cluster (channel),      nodes: 3        ...     81,159 ns/iter (+/- 13,002)
benching Set on raft cluster (channel), nodes: 5, value len:    8       ...     91,051 ns/iter (+/- 13,179)
benching Set on raft cluster (channel), nodes: 5, value len:  128       ...     92,492 ns/iter (+/- 15,413)
benching Set on raft cluster (channel), nodes: 5, value len: 1024       ...    102,103 ns/iter (+/- 10,981)
benching Set on raft cluster (channel), nodes: 5, value len: 4096       ...    135,170 ns/iter (+/- 62,198)
benching Get on raft cluster (channel), nodes: 5        ...     16,462 ns/iter (+/- 621)
benching Delete on raft cluster (channel),      nodes: 5        ...     94,448 ns/iter (+/- 16,299)
benching Set on raft cluster (tcp),     nodes: 1, value len:    8       ...     32,616 ns/iter (+/- 9,700)
benching Set on raft cluster (tcp),     nodes: 1, value len:  128       ...     35,519 ns/iter (+/- 7,489)
benching Set on raft cluster (tcp),     nodes: 1, value len: 1024       ...     42,009 ns/iter (+/- 10,693)
benching Set on raft cluster (tcp),     nodes: 1, value len: 4096       ...     58,263 ns/iter (+/- 31,890)
benching Get on raft cluster (tcp),     nodes: 1        ...     16,245 ns/iter (+/- 1,830)
benching Delete on raft cluster (tcp),  nodes: 1        ...     32,963 ns/iter (+/- 18,811)
benching Set on raft cluster (tcp),     nodes: 3, value len:    8       ...    207,939 ns/iter (+/- 19,772)
benching Set on raft cluster (tcp),     nodes: 3, value len:  128       ...    214,757 ns/iter (+/- 29,875)
benching Set on raft cluster (tcp),     nodes: 3, value len: 1024       ...    218,234 ns/iter (+/- 32,567)
benching Set on raft cluster (tcp),     nodes: 3, value len: 4096       ...    247,335 ns/iter (+/- 54,398)
benching Get on raft cluster (tcp),     nodes: 3        ...     16,917 ns/iter (+/- 1,437)
benching Delete on raft cluster (tcp),  nodes: 3        ...    208,355 ns/iter (+/- 23,317)
benching Set on raft cluster (tcp),     nodes: 5, value len:    8       ...    315,500 ns/iter (+/- 25,545)
benching Set on raft cluster (tcp),     nodes: 5, value len:  128       ...    313,588 ns/iter (+/- 32,911)
benching Set on raft cluster (tcp),     nodes: 5, value len: 1024       ...    328,193 ns/iter (+/- 23,980)
benching Set on raft cluster (tcp),     nodes: 5, value len: 4096       ...    363,133 ns/iter (+/- 46,415)
benching Get on raft cluster (tcp),     nodes: 5        ...     17,430 ns/iter (+/- 956)
benching Delete on raft cluster (tcp),  nodes: 5        ...    318,073 ns/iter (+/- 22,214)

Samples

Sample of terminal output:

2018-05-03-163205_1868x2050_scrot

Sample of a log file:

May 03 16:30:01.968 INFO starting working thread: storage-scheduler
May 03 16:30:01.968 INFO starting working thread: end-point-worker
May 03 16:30:01.969 INFO starting working thread: snap-handler
May 03 16:30:01.975 INFO TiKV is ready to serve
May 03 16:30:04.326 INFO heartbeat receiver is refreshed.
May 03 16:30:04.326 INFO heartbeat sender is refreshed.
May 03 16:30:44.693 INFO receive signal 2, stopping server...
May 03 16:30:44.693 INFO stoping end-point-worker
May 03 16:30:44.693 INFO stoping snap-handler
May 03 16:30:44.693 INFO stoping storage-scheduler
May 03 16:30:44.694 INFO scheduler stopped
May 03 16:30:44.694 INFO scheduler stopped
May 03 16:30:44.721 INFO storage RaftKv closed.
May 03 16:30:44.734 INFO stop raft store 1 thread
May 03 16:30:44.734 INFO [store 1] receive quit message
May 03 16:30:44.734 INFO start to stop raftstore.
May 03 16:30:44.734 INFO stoping split check worker
May 03 16:30:44.734 INFO stoping snapshot worker
May 03 16:30:44.734 INFO stoping raft gc worker
May 03 16:30:44.734 INFO stoping compact worker
May 03 16:30:44.734 INFO stoping pd worker
May 03 16:30:44.734 INFO stoping consistency check worker
May 03 16:30:44.735 INFO stoping cleanup sst worker
May 03 16:30:44.735 INFO stoping apply worker
May 03 16:30:44.735 INFO stop raftstore finished.
May 03 16:30:44.735 INFO stoping store address resolve worker

@siddontang
Copy link
Contributor

Cool.

What is the benchmark?

src/config.rs Outdated
@@ -696,15 +696,15 @@ impl Default for MetricConfig {
}

#[derive(Serialize, Deserialize)]
#[serde(remote = "LogLevelFilter")]
#[serde(remote = "Level")]
#[serde(rename_all = "kebab-case")]
pub enum LogLevel {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change.

slog does not have the concept of an Off setting, and changes the name of Warn, while introducing a Critical level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off is barely used, I think it's OK to just removing it. But Warn is commonly used, better make it compatible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do a conversion internally? some users now use warn in production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes of course!

@Hoverbear
Copy link
Contributor Author

Using root_typed to create loggers instead of root allows for static dispatch, resulting in slightly better performance.

Begin to run: raftstore                                                                      
benching Set on raft cluster (channel), nodes: 1, value len:    8       ...     27,903 ns/ite
benching Set on raft cluster (channel), nodes: 1, value len:  128       ...     29,409 ns/ite
benching Set on raft cluster (channel), nodes: 1, value len: 1024       ...     38,013 ns/ite
benching Set on raft cluster (channel), nodes: 1, value len: 4096       ...     52,222 ns/ite
benching Get on raft cluster (channel), nodes: 1        ...     14,292 ns/iter (+/- 606)
benching Delete on raft cluster (channel),      nodes: 1        ...     37,094 ns/iter (+/- 6
benching Set on raft cluster (channel), nodes: 3, value len:    8       ...     82,538 ns/ite
benching Set on raft cluster (channel), nodes: 3, value len:  128       ...     83,726 ns/ite
benching Set on raft cluster (channel), nodes: 3, value len: 1024       ...     89,531 ns/ite
benching Set on raft cluster (channel), nodes: 3, value len: 4096       ...    117,176 ns/ite
benching Get on raft cluster (channel), nodes: 3        ...     15,463 ns/iter (+/- 267)
benching Delete on raft cluster (channel),      nodes: 3        ...     79,670 ns/iter (+/- 1
benching Set on raft cluster (channel), nodes: 5, value len:    8       ...    100,858 ns/ite
benching Set on raft cluster (channel), nodes: 5, value len:  128       ...    102,090 ns/ite
benching Set on raft cluster (channel), nodes: 5, value len: 1024       ...    112,210 ns/ite
benching Set on raft cluster (channel), nodes: 5, value len: 4096       ...    139,906 ns/ite
benching Get on raft cluster (channel), nodes: 5        ...     16,570 ns/iter (+/- 987)
benching Delete on raft cluster (channel),      nodes: 5        ...    101,935 ns/iter (+/- 1
benching Set on raft cluster (tcp),     nodes: 1, value len:    8       ...     34,674 ns/ite
benching Set on raft cluster (tcp),     nodes: 1, value len:  128       ...     36,785 ns/ite
benching Set on raft cluster (tcp),     nodes: 1, value len: 1024       ...     41,612 ns/ite
benching Set on raft cluster (tcp),     nodes: 1, value len: 4096       ...     60,229 ns/ite
benching Get on raft cluster (tcp),     nodes: 1        ...     16,646 ns/iter (+/- 1,061)
benching Delete on raft cluster (tcp),  nodes: 1        ...     36,733 ns/iter (+/- 6,562)
benching Set on raft cluster (tcp),     nodes: 3, value len:    8       ...    202,911 ns/ite
benching Set on raft cluster (tcp),     nodes: 3, value len:  128       ...    201,478 ns/ite
benching Set on raft cluster (tcp),     nodes: 3, value len: 1024       ...    208,972 ns/ite
benching Set on raft cluster (tcp),     nodes: 3, value len: 4096       ...    239,455 ns/ite
benching Get on raft cluster (tcp),     nodes: 3        ...     16,782 ns/iter (+/- 861)
benching Delete on raft cluster (tcp),  nodes: 3        ...    202,863 ns/iter (+/- 22,222)
benching Set on raft cluster (tcp),     nodes: 5, value len:    8       ...    300,921 ns/ite
benching Set on raft cluster (tcp),     nodes: 5, value len:  128       ...    301,656 ns/ite
benching Set on raft cluster (tcp),     nodes: 5, value len: 1024       ...    314,641 ns/ite
benching Set on raft cluster (tcp),     nodes: 5, value len: 4096       ...    349,159 ns/ite
benching Get on raft cluster (tcp),     nodes: 5        ...     17,086 ns/iter (+/- 791)
benching Delete on raft cluster (tcp),  nodes: 5        ...    302,832 ns/iter (+/- 12,650)

@Hoverbear Hoverbear force-pushed the slog branch 4 times, most recently from b863213 to 8648e36 Compare May 3, 2018 18:43
@Hoverbear
Copy link
Contributor Author

Sample of file based logging:

18/05/03 11:42:11.008 store.rs:3208: [INFO] [store 1] receive quit message ()
18/05/03 11:42:11.008 store.rs:609: [INFO] start to stop raftstore. ()
18/05/03 11:42:11.008 mod.rs:386: [INFO] stoping split check worker ()
18/05/03 11:42:11.009 mod.rs:386: [INFO] stoping snapshot worker ()
18/05/03 11:42:11.009 mod.rs:386: [INFO] stoping raft gc worker ()
18/05/03 11:42:11.009 mod.rs:386: [INFO] stoping compact worker ()
18/05/03 11:42:11.009 future.rs:181: [INFO] stoping pd worker ()
18/05/03 11:42:11.009 mod.rs:386: [INFO] stoping consistency check worker ()
18/05/03 11:42:11.009 mod.rs:386: [INFO] stoping cleanup sst worker ()
18/05/03 11:42:11.009 mod.rs:386: [INFO] stoping apply worker ()
18/05/03 11:42:11.010 store.rs:635: [INFO] stop raftstore finished. ()
18/05/03 11:42:11.011 mod.rs:386: [INFO] stoping store address resolve worker ()

I'm not a huge fan of the empty brackets (which is where the currently unused structured data goes) so I'm currently investigating the best way to work around this. I think we might be able to use a decorator like we do in the terminal.

@Hoverbear
Copy link
Contributor Author

I was able to use the PlainDecorator instead of a custom format command to have a consistent style with the terminal logging and avoid any weird () output.

@siddontang
Copy link
Contributor

PTAL @BusyJay @overvenus

src/bin/setup.rs Outdated
@@ -11,6 +11,9 @@
// See the License for the specific language governing permissions and
// limitations under the License.

use slog::{self, Drain};
use slog_async;
use slog_term;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not import the structs?

src/config.rs Outdated
@@ -695,16 +697,41 @@ impl Default for MetricConfig {
}
}

// This type exists purely for interacting with Serde, and using `slog::Level` should
// be preferred.
#[derive(Serialize, Deserialize)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like not necessary for Deserialize anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to remove it and got an error:

   Compiling tikv v2.1.0-alpha.1 (file:///home/hoverbear/git/tikv)
error[E0308]: match arms have incompatible types
   --> src/config.rs:762:28
    |
762 | #[derive(Clone, Serialize, Deserialize, PartialEq, Debug)]
    |                            ^^^^^^^^^^^
    |                            |
    |                            expected enum `slog::Level`, found enum `config::Level`
    |                            match arm with an incompatible type
    |
    = note: expected type `slog::Level`
               found type `config::Level`
    = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)

error[E0308]: match arms have incompatible types
   --> src/config.rs:762:28
    |
762 | #[derive(Clone, Serialize, Deserialize, PartialEq, Debug)]
    |                            ^^^^^^^^^^^
    |                            |
    |                            expected enum `slog::Level`, found enum `config::Level`
    |                            match arm with an incompatible type
    |
    = note: expected type `slog::Level`
               found type `config::Level`
    = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0308`.
error: Could not compile `tikv`.

To learn more, run the command again with --verbose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried to set config.log-level to warn and get it parsed successfully? I'm afraid something has gone wrong.

Also seems the enum can be removed completely now that you custom the deserialization. You can also check out util::config::compression_type_level_serde to see how does it work.

src/config.rs Outdated
#[serde(rename_all = "kebab-case")]
pub enum LogLevel {
pub enum Level {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use FilterLevel? What's the difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to slog::LevelFilter (which is a Drain wrapper) or are you suggesting to rename this to LevelFilter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is slog::FilterLevel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a Drain which is a target for logging, it is produced when you call Drain::filter_level.

I'm not sure I understand what you are suggesting I do. 😦

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what I referred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drain::filter_level accepts a Level which we call here.

I'm not entirely clear what the difference is, but here is their definition and it looks like Level is preferred.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seem that we need to define a new Drain or pass a filter like https://docs.rs/slog/2.2.3/slog/trait.Drain.html#method.filter if we want to use FilterLevel directly.

Btw, the FilterLevel also uses warning, we still need to do the warn -> warning conversation for the compatibility. If there is no extra benefit and we don't care off, I prefer Level here.

"debug" => Some(Level::Debug),
"trace" => Some(Level::Trace),
"info" => Some(Level::Info),
_ => None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

em, I prefer using info instead of None here. Log Level is not a critical configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning a None here allows the caller to determine if an invalid value was provided, and decide how to act. If we replace this with Level::Info we are forcing a default. Level::Info is should still be the default after this PR.

@siddontang
Copy link
Contributor

Rest LGTM

PTAL @BusyJay

@Hoverbear
Copy link
Contributor Author

While working on #3035 I may have found a bug with log rotation, please DNM while I investigate.

@Hoverbear Hoverbear added DNM and removed DNM labels May 12, 2018
@Hoverbear
Copy link
Contributor Author

Disregard, this is related to #3035 .

@siddontang
Copy link
Contributor

so @Hoverbear

can we merge this now?

@Hoverbear
Copy link
Contributor Author

Hoverbear commented May 13, 2018

@siddontang Yes, no bugs which I know of currently in this PR.

@BusyJay
Copy link
Member

BusyJay commented May 14, 2018

LGTM

@BusyJay
Copy link
Member

BusyJay commented May 14, 2018

/run-all-tests

@siddontang siddontang merged commit ed3768e into master May 14, 2018
@siddontang siddontang deleted the slog branch May 14, 2018 04:52
@Hoverbear Hoverbear mentioned this pull request May 15, 2018
sticnarf pushed a commit to sticnarf/tikv that referenced this pull request Oct 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants