-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: support fail points #2354
*: support fail points #2354
Conversation
/// # Examples | ||
/// | ||
/// ```sh | ||
/// ./tikv-fail -a 127.0.0.1:9090 inject\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we support a failure list?
@@ -90,6 +95,9 @@ branch = "dev" | |||
features = ["profiling"] | |||
optional = true | |||
|
|||
[dependencies.fail] | |||
git = "https://github.com/pingcap/fail-rs.git" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use crates.io
version?
src/bin/tikv-fail.rs
Outdated
use kvproto::debugpb; | ||
use kvproto::debugpb_grpc::DebugClient; | ||
|
||
fn main() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be merged with TiKV Control?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it has been added to TODO. https://github.com/pingcap/tikv/pull/2354/files#diff-3a911291a9738b9e1ccccbde951efdbcR21
Currently, @hicqu is refactoring tikv-ctl.
PTAL |
src/bin/tikv-fail.rs
Outdated
#![feature(plugin)] | ||
#![cfg_attr(feature = "dev", plugin(clippy))] | ||
#![cfg_attr(not(feature = "dev"), allow(unknown_lints))] | ||
#![allow(needless_pass_by_value)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why allow?
src/bin/tikv-fail.rs
Outdated
let app = App::new("TiKV fail point") | ||
.author("PingCAP") | ||
.about( | ||
"Distributed transactional key value database powered by Rust and Raft", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the about message suitable for this binary?
src/bin/tikv-fail.rs
Outdated
Arg::with_name("actions") | ||
.short("a") | ||
.takes_value(true) | ||
.help("A list of fail point and action to inject"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is true.
src/bin/tikv-fail.rs
Outdated
), | ||
) | ||
.subcommand( | ||
SubCommand::with_name("recover") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ./tikv-fail recover point1 point2
is more handy.
src/bin/tikv-fail.rs
Outdated
} | ||
} | ||
|
||
fn read_list(matches: &ArgMatches) -> Vec<(String, String)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use Arg::multiple
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the configured failpoints be printed out at runtime?
src/bin/tikv-fail.rs
Outdated
SubCommand::with_name("inject") | ||
.about("Inject failures") | ||
.arg( | ||
Arg::with_name("failpoint") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just support list should be ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think tikv-fail inject point1 actions point2 actions
is simpler. And then -f
can be use as short flag for file
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tikv-fail inject p1 a1 p2 a2
is hard to achive with clap, see demo[1]. Also I think batch injecting with a file is more simpler. If you want f
for file
, I can set n
for "failpoint", "n" means "name".
[1] https://play.rust-lang.org/?gist=27f38b67c051359a6e28c3dc6e66c4dd&version=stable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's because an additional arg is missing. For example, you can add following to the subcommand:
.arg(
Arg::with_name("args")
.takes_value(true)
.multiple(true),
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inject
is not an Arg
, but an App
.
See more:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inject
is not anArg
I didn't say that. What I said was .arg(...)
could be added to subcommand inject
to accept positional args.
..., but an App.
I don't get it. So SubCommand::with_name("inject")
at L58 create an App
instead of a SubCommand
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, it turns out that SubCommand::with_name
will return an App
instead of a SubCommand
. Very weird API design. But the approach I proposed should still work.
src/bin/tikv-fail.rs
Outdated
inject_req.set_name(name); | ||
inject_req.set_actions(actions); | ||
|
||
println!("Req {:?}", inject_req); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message may be annoying.
src/bin/tikv-fail.rs
Outdated
} | ||
} | ||
|
||
fn read_list(matches: &ArgMatches) -> Vec<(String, String)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
c211a5b
to
c4cbf5d
Compare
Makefile
Outdated
@@ -35,9 +35,9 @@ run: | |||
cargo run --features "${ENABLE_FEATURES}" | |||
|
|||
release: | |||
cargo build --release --features "${ENABLE_FEATURES}" | |||
cargo build --release --features "${ENABLE_FEATURES} no-fail" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer using a var to control the fail build like RocksDB does.
E.g., we can support a fail_release
to build.
src/bin/tikv-fail.rs
Outdated
/// | ||
/// ```sh | ||
/// ./tikv-fail -a 127.0.0.1:9090 inject\ | ||
// -f tikv::raftstore::store::store::raft_between_save\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we support regexp like using "raft_between_save" directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tikv::raftstore::store::store::raft_between_save
is too long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems can't, unless tikv-fail knows all possible failpoints on remote servers. ListFailPoints
returns failpoints that are injected already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you can get the fail point list at first, and use a match rule like checking action name at first, then plus structure name, then plus mod name, etc.
In most cases, we will use different action names directly, rarely use duplicated names. And If we support passing injection list in one command, the cost of getting the fail list at first can be neglected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean there is no way to get a full list of fail points on a remote server.
ListFailpoints
returns nothing if there is no injected[1] failpoints, so I can't get the fail point list at first.
[1] injected: it is not something we hard code in tikv source file, it's injected from tikv-fail. eg.
// raft.rs
fail_points("a");
fail_points("b");
tikv-fail inject a off => tik-fail list -> a
tikv-fail inject b off => tik-fail list -> a, b
tikv-fail recover a => tik-fail list -> b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we have no way to get the registered fail point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rest LGTM
src/bin/tikv-fail.rs
Outdated
let mut buffer = String::new(); | ||
fs::File::open(path) | ||
.unwrap() | ||
.read_to_string(&mut buffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use BufReader
instead? So you can collect lines without allocate memory twice the size. Read line lazily is better.
LGTM |
/run-all-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR adds raft fail points and a tool for injecting failures.
CC pingcap/kvproto#200