*: support fail points #2354

Merged
merged 30 commits into from
Oct 13, 2017

overvenus
Member

This PR adds raft fail points and a tool for injecting failures.

CC pingcap/kvproto#200

/// # Examples
///
/// ```sh
/// ./tikv-fail -a 127.0.0.1:9090 inject\
Contributor

can we support a failure list?

@@ -90,6 +95,9 @@ branch = "dev"
features = ["profiling"]
optional = true

[dependencies.fail]
git = "https://github.com/pingcap/fail-rs.git"
Member

Why not use the crates.io version?

use kvproto::debugpb;
use kvproto::debugpb_grpc::DebugClient;

fn main() {
Member

Can this be merged with TiKV Control?

Member Author

@overvenus
Member Author

PTAL

#![feature(plugin)]
#![cfg_attr(feature = "dev", plugin(clippy))]
#![cfg_attr(not(feature = "dev"), allow(unknown_lints))]
#![allow(needless_pass_by_value)]
Member

Why allow?

let app = App::new("TiKV fail point")
.author("PingCAP")
.about(
"Distributed transactional key value database powered by Rust and Raft",
Member

Is the about message suitable for this binary?

Arg::with_name("actions")
.short("a")
.takes_value(true)
.help("A list of fail point and action to inject"),
Member

I don't think this is true.

),
)
.subcommand(
SubCommand::with_name("recover")
Member

I think ./tikv-fail recover point1 point2 is more handy.

}
}

fn read_list(matches: &ArgMatches) -> Vec<(String, String)> {
Member

Can you use Arg::multiple instead?

Member

ping

Member

BusyJay left a comment

Can the configured failpoints be printed out at runtime?

SubCommand::with_name("inject")
.about("Inject failures")
.arg(
Arg::with_name("failpoint")
Member

I think just supporting a list should be OK.

Member

I think tikv-fail inject point1 actions point2 actions is simpler. And then -f can be used as the short flag for file.

Member Author

tikv-fail inject p1 a1 p2 a2 is hard to achieve with clap, see demo[1]. Also, I think batch injecting with a file is simpler. If you want f for file, I can use n for "failpoint"; "n" means "name".

[1] https://play.rust-lang.org/?gist=27f38b67c051359a6e28c3dc6e66c4dd&version=stable

Member

I think that's because an additional arg is missing. For example, you can add the following to the subcommand:

.arg(
    Arg::with_name("args")
        .takes_value(true)
        .multiple(true),
)

Member Author

Member

> inject is not an Arg

I didn't say that. What I said was that .arg(...) could be added to the inject subcommand to accept positional args.

> ..., but an App.

I don't get it. So SubCommand::with_name("inject") at L58 creates an App instead of a SubCommand?

Member

Interesting, it turns out that SubCommand::with_name will return an App instead of a SubCommand. Very weird API design. But the approach I proposed should still work.
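
To illustrate the positional form discussed above: once the subcommand collects all trailing values into one flat list, turning `p1 a1 p2 a2` into pairs is a small step. The sketch below uses only the standard library and assumes the values have already been collected into a slice of strings; `pair_args` is a hypothetical helper, not code from this PR.

```rust
/// Groups a flat list of positional arguments into (failpoint, actions) pairs.
/// Hypothetical helper: returns an error if a failpoint name is left without
/// a matching actions string.
fn pair_args(args: &[String]) -> Result<Vec<(String, String)>, String> {
    if args.len() % 2 != 0 {
        return Err(format!(
            "expected <failpoint> <actions> pairs, but got {} arguments",
            args.len()
        ));
    }
    Ok(args
        .chunks(2)
        // Each chunk has exactly two elements because the length is even.
        .map(|pair| (pair[0].clone(), pair[1].clone()))
        .collect())
}
```

With this shape, an odd number of values is rejected up front instead of silently dropping the last failpoint name.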

inject_req.set_name(name);
inject_req.set_actions(actions);

println!("Req {:?}", inject_req);
Member

The message may be annoying.

}
}

fn read_list(matches: &ArgMatches) -> Vec<(String, String)> {
Member

ping

Makefile Outdated
@@ -35,9 +35,9 @@ run:
cargo run --features "${ENABLE_FEATURES}"

release:
-cargo build --release --features "${ENABLE_FEATURES}"
+cargo build --release --features "${ENABLE_FEATURES} no-fail"
Contributor

I prefer using a var to control the fail build like RocksDB does.

E.g., we can support a fail_release to build.

overvenus changed the title from "[DNM] *: support fail points" to "*: support fail points" on Oct 12, 2017
///
/// ```sh
/// ./tikv-fail -a 127.0.0.1:9090 inject\
// -f tikv::raftstore::store::store::raft_between_save\
Contributor

Can we support a regexp, e.g. using "raft_between_save" directly?

Contributor

tikv::raftstore::store::store::raft_between_save is too long.

Member Author

It seems we can't, unless tikv-fail knows all possible failpoints on the remote servers. ListFailPoints only returns failpoints that have already been injected.

Contributor

Maybe you can get the fail point list first, and then use a matching rule: check the action name first, then add the structure name, then the mod name, and so on.

In most cases we will use distinct action names and rarely duplicate them. And if we support passing an injection list in one command, the cost of fetching the fail point list first is negligible.
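
The matching rule proposed here can be sketched as a suffix match on `::`-separated segments. This is only an illustration and assumes a full list of registered failpoint names is available to the client, which the thread goes on to question; `resolve` and any names other than `raft_between_save` are hypothetical.

```rust
/// Resolves a short name such as "raft_between_save" against a list of fully
/// qualified failpoint names by comparing trailing `::`-separated segments.
/// Returns the unique match, or an error if zero or several names match.
fn resolve<'a>(short: &str, known: &[&'a str]) -> Result<&'a str, String> {
    let wanted: Vec<&str> = short.split("::").collect();
    let matches: Vec<&str> = known
        .iter()
        .cloned()
        .filter(|full| {
            let segments: Vec<&str> = full.split("::").collect();
            // Match only on whole trailing segments, never on partial names.
            segments.ends_with(&wanted)
        })
        .collect();
    match matches.len() {
        0 => Err(format!("no failpoint matches {:?}", short)),
        1 => Ok(matches[0]),
        _ => Err(format!("{:?} is ambiguous: {:?}", short, matches)),
    }
}
```

When a bare name is ambiguous, the caller can retry with one more leading segment (for example `store::raft_between_save`), which mirrors the "action name, then structure name, then mod name" order suggested above.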

Member Author

I mean there is no way to get a full list of fail points on a remote server.
ListFailPoints returns nothing if there are no injected[1] failpoints, so I can't get the fail point list first.

[1] injected: not something we hard-code in the TiKV source; it's injected from tikv-fail, e.g.

// raft.rs
fail_point!("a");
fail_point!("b");

tikv-fail inject a off => tikv-fail list -> a
tikv-fail inject b off => tikv-fail list -> a, b
tikv-fail recover a => tikv-fail list -> b
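
The behaviour described in this comment can be modelled with a toy table that only knows about names that were explicitly injected. This is a minimal sketch for illustration, not the real server-side implementation; `Registry` and its methods are hypothetical.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the server-side failpoint table (hypothetical type).
/// It records only what was injected, so it cannot enumerate the failpoints
/// that exist in the source but were never configured.
#[derive(Default)]
struct Registry {
    injected: BTreeMap<String, String>,
}

impl Registry {
    /// Records (or overwrites) the actions configured for a failpoint.
    fn inject(&mut self, name: &str, actions: &str) {
        self.injected.insert(name.to_string(), actions.to_string());
    }

    /// Removes a previously injected failpoint; unknown names are ignored.
    fn recover(&mut self, name: &str) {
        self.injected.remove(name);
    }

    /// Lists injected failpoint names in sorted order.
    fn list(&self) -> Vec<&str> {
        self.injected.keys().map(|k| k.as_str()).collect()
    }
}
```

In this model, a freshly started registry lists nothing, which is why a client cannot use the list call to discover names for short-name matching.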

Contributor

So we have no way to get the registered fail points?

Member Author

Yes

Member

BusyJay left a comment

rest LGTM

let mut buffer = String::new();
fs::File::open(path)
.unwrap()
.read_to_string(&mut buffer)
Member

Can you use BufReader instead? That way you can collect lines without allocating twice the memory. Reading lines lazily is better.
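
A rough sketch of the lazy, line-by-line approach suggested here, using only the standard library. The file format is not shown in this thread, so the `name=actions` line format is an assumption, and `read_list_from` is a hypothetical name rather than the function in the PR.

```rust
use std::io::{self, BufRead};

/// Parses failpoint entries from any buffered reader, one line at a time,
/// so the whole file is never held in a single String.
/// ASSUMPTION: each non-empty line has the form `name=actions`.
fn read_list_from<R: BufRead>(reader: R) -> io::Result<Vec<(String, String)>> {
    let mut list = Vec::new();
    for line in reader.lines() {
        let line = line?;
        let line = line.trim();
        if line.is_empty() {
            continue; // skip blank lines
        }
        // Split on the first '=' only, so actions may contain '=' themselves.
        let mut parts = line.splitn(2, '=');
        match (parts.next(), parts.next()) {
            (Some(name), Some(actions)) => {
                list.push((name.trim().to_string(), actions.trim().to_string()))
            }
            _ => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    format!("malformed line: {:?}", line),
                ))
            }
        }
    }
    Ok(list)
}
```

For a file on disk this would be called as `read_list_from(io::BufReader::new(fs::File::open(path)?))`. Taking a generic `BufRead` also lets the parser be exercised on an in-memory byte slice.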

@siddontang
Contributor

LGTM

@overvenus
Member Author

/run-all-test

Member

BusyJay left a comment

LGTM

overvenus merged commit 89002f6 into master on Oct 13, 2017
overvenus deleted the ov/raft-fail-point branch on October 13, 2017, 13:32

4 participants