*: support fail points #2354

Merged 30 commits on Oct 13, 2017
Commits
fcd3767 add fail point (lishuai87, Sep 19, 2017)
97f2ad0 add grpc fail point (lishuai87, Sep 20, 2017)
f60ecb7 add tikv-fail (lishuai87, Sep 21, 2017)
48adf0e Cargo: update fail (overvenus, Sep 26, 2017)
3315d2a *: get rid of useless closures (overvenus, Sep 26, 2017)
a42b931 Cargo: update fail (overvenus, Sep 27, 2017)
5435fec Merge branch 'master' into ov/raft-fail-point (overvenus, Sep 27, 2017)
9c807b4 Fix build (overvenus, Sep 27, 2017)
e371910 Cargo, Makefile: add a feature for disabling failpoint (overvenus, Sep 28, 2017)
4ed0993 tests: remove fail point test (overvenus, Sep 28, 2017)
7f6cd5d tikv-fail: support inject and recover failures (overvenus, Sep 28, 2017)
62a43ae *: clean up (overvenus, Sep 28, 2017)
55b31cb Merge branch 'master' into ov/raft-fail-point (overvenus, Sep 29, 2017)
9b4c3cd Cargo: update fail (overvenus, Sep 29, 2017)
3767a8d Address comments (overvenus, Sep 30, 2017)
62309b9 tests: add a fail point test (overvenus, Sep 30, 2017)
794ec70 tikv-fail: fix copyright (overvenus, Sep 30, 2017)
28c6778 Merge branch 'master' into ov/raft-fail-point (overvenus, Sep 30, 2017)
728109c Address comments (overvenus, Oct 2, 2017)
4ca4c37 Merge branch 'master' into ov/raft-fail-point (overvenus, Oct 2, 2017)
8677a4f Address comments (overvenus, Oct 10, 2017)
c4cbf5d Address comments (overvenus, Oct 10, 2017)
cb577ab Cargo: update fail (overvenus, Oct 11, 2017)
fbc0614 Merge branch 'master' into ov/raft-fail-point (overvenus, Oct 12, 2017)
633ffa8 Address comments and test list fail points (overvenus, Oct 12, 2017)
8b54140 Address comments (overvenus, Oct 13, 2017)
ecf3fc7 Address comments (overvenus, Oct 13, 2017)
f8e32a3 Address comments (overvenus, Oct 13, 2017)
106d58d Merge branch 'master' into ov/raft-fail-point (overvenus, Oct 13, 2017)
be054bc Merge branch 'master' into ov/raft-fail-point (overvenus, Oct 13, 2017)
18 changes: 15 additions & 3 deletions Cargo.lock

8 changes: 8 additions & 0 deletions Cargo.toml
@@ -14,6 +14,7 @@ static-link = ["rocksdb/static-link"]
portable = ["rocksdb/portable"]
sse = ["rocksdb/sse"]
mem-profiling = ["jemallocator"]
no-fail = ["fail/no_fail"]

[lib]
name = "tikv"
@@ -28,6 +29,9 @@ path = "benches/bin/bench-tikv.rs"
[[bin]]
name = "tikv-ctl"

[[bin]]
name = "tikv-fail"

[[test]]
name = "tests"

@@ -75,6 +79,7 @@ git = "https://github.com/pingcap/rust-rocksdb.git"

[dependencies.kvproto]
git = "https://github.com/pingcap/kvproto.git"
branch = "ov/fail-point"

[dependencies.tipb]
git = "https://github.com/pingcap/tipb.git"
@@ -90,6 +95,9 @@ branch = "dev"
features = ["profiling"]
optional = true

[dependencies.fail]
git = "https://github.com/pingcap/fail-rs.git"
Review comment (Member): Why not use the crates.io version?


[profile.dev]
opt-level = 0 # Controls the --opt-level the compiler builds with
debug = true # Controls whether the compiler passes `-g`
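
The `no-fail` feature added above forwards to `fail/no_fail`, and the Makefile change in this PR turns it on unless `FAIL_POINT=1` is set, so ordinary release builds carry no fail points. A minimal sketch of that kind of compile-time gate, assuming a local feature named `no-fail`; `fail_points_enabled` and `maybe_fail` are hypothetical helpers, and how the `fail` crate gates its macro internally is not shown in this diff:

```rust
/// Returns true when fail points are compiled in.
/// `cfg!` is resolved at compile time, so with the feature enabled the
/// branch in `maybe_fail` is a constant `false` the compiler can drop.
fn fail_points_enabled() -> bool {
    !cfg!(feature = "no-fail")
}

/// Hypothetical hook: runs `action` only when fail points are compiled in,
/// and reports whether it ran.
fn maybe_fail<F: FnOnce()>(action: F) -> bool {
    if fail_points_enabled() {
        action();
        true
    } else {
        false
    }
}
```

Built without the feature, `maybe_fail` runs its closure; built with `--features no-fail`, it does nothing.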
9 changes: 8 additions & 1 deletion Makefile
@@ -12,6 +12,10 @@ ifeq ($(ROCKSDB_SYS_SSE),1)
ENABLE_FEATURES += sse
endif

ifneq ($(FAIL_POINT),1)
ENABLE_FEATURES += no-fail
endif

PROJECT_DIR:=$(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))

DEPS_PATH = $(CURDIR)/tmp
@@ -37,7 +41,7 @@ run:
release:
	cargo build --release --features "${ENABLE_FEATURES}"
	@mkdir -p ${BIN_PATH}
-	cp -f ${CARGO_TARGET_DIR}/release/tikv-ctl ${CARGO_TARGET_DIR}/release/tikv-server ${BIN_PATH}/
+	cp -f ${CARGO_TARGET_DIR}/release/tikv-ctl ${CARGO_TARGET_DIR}/release/tikv-fail ${CARGO_TARGET_DIR}/release/tikv-server ${BIN_PATH}/

static_release:
	ROCKSDB_SYS_STATIC=1 ROCKSDB_SYS_PORTABLE=1 ROCKSDB_SYS_SSE=1 make release
@@ -48,6 +52,9 @@ static_unportable_release:
static_prof_release:
	ENABLE_FEATURES=mem-profiling make static_release

static_fail_release:
	FAIL_POINT=1 make static_release

# unlike test, this target will trace tests and output logs when fail test is detected.
trace_test:
	export CI=true && \
161 changes: 161 additions & 0 deletions src/bin/tikv-fail.rs
@@ -0,0 +1,161 @@
// Copyright 2017 PingCAP, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// See the License for the specific language governing permissions and
// limitations under the License.

#![feature(plugin)]
#![cfg_attr(feature = "dev", plugin(clippy))]

/// Inject failures into `TiKV`.
///
/// TODO: Integrate into tikv-ctl.
///
/// # Examples
///
/// ```sh
/// ./tikv-fail -a 127.0.0.1:9090 inject\
Review comment (Contributor): Can we support a failure list?

///     tikv::raftstore::store::store::raft_between_save=panic
/// ```
///

extern crate tikv;
extern crate clap;
extern crate grpcio as grpc;
extern crate protobuf;
extern crate kvproto;

use std::fs;
use std::io::{BufRead, BufReader};
use std::str;
use std::time::Duration;
use std::sync::Arc;

use clap::{App, Arg, SubCommand};
use grpc::{CallOption, ChannelBuilder, EnvBuilder};
use kvproto::debugpb;
use kvproto::debugpb_grpc::DebugClient;

fn main() {
    let app = App::new("TiKV fail point")
        .author("PingCAP")
        .about("A tool for injecting failures into TiKV and recovering from them")
        .arg(
            Arg::with_name("addr")
                .short("a")
                .takes_value(true)
                .help("set tikv ip:port"),
        )
        .subcommand(
            SubCommand::with_name("inject")
                .about("Inject failures")
                .arg(
                    Arg::with_name("args")
                        .multiple(true)
                        .takes_value(true)
                        .help(
                            "Inject pairs of fail points and actions. \
                             E.g. tikv-fail inject fail::a=off fail::b=panic",
                        ),
                )
                .arg(
                    Arg::with_name("file")
                        .short("f")
                        .takes_value(true)
                        .help("Read a file of fail points and actions to inject"),
                ),
        )
        .subcommand(
            SubCommand::with_name("recover")
                .about("Recover failures")
                .arg(
                    Arg::with_name("args")
                        .multiple(true)
                        .takes_value(true)
                        .help("Recover fail points. E.g. tikv-fail recover fail::a fail::b"),
                )
                .arg(
                    Arg::with_name("file")
                        .short("f")
                        .takes_value(true)
                        .help("Recover from a file of fail points"),
                ),
        )
        .subcommand(SubCommand::with_name("list").about("List all fail points"));
    let matches = app.clone().get_matches();
    // Note: this panics when `-a` is not given.
    let addr = matches.value_of("addr").unwrap();
    let addr = addr.trim_left_matches("http://");

    let env = Arc::new(EnvBuilder::new().name_prefix("tikv-fail").build());
    let channel = ChannelBuilder::new(env).connect(addr);
    let client = DebugClient::new(channel);

    if let Some(matches) = matches.subcommand_matches("inject") {
        // Entries from the file come first, then those given on the command line.
        let mut list = matches.value_of("file").map_or_else(Vec::new, read_file);
        if let Some(ps) = matches.values_of("args") {
            // Each argument has the form `name=actions`.
            for pair in ps {
                let mut parts = pair.split('=');
                list.push((
                    parts.next().unwrap().to_owned(),
                    parts.next().unwrap_or("").to_owned(),
                ))
            }
        }

        for (name, actions) in list {
            if actions.is_empty() {
                println!("No action for fail point {}", name);
                continue;
            }
            let mut inject_req = debugpb::InjectFailPointRequest::new();
            inject_req.set_name(name);
            inject_req.set_actions(actions);

            let option = CallOption::default().timeout(Duration::from_secs(10));
            client.inject_fail_point_opt(inject_req, option).unwrap();
        }
    } else if let Some(matches) = matches.subcommand_matches("recover") {
        let mut list = matches.value_of("file").map_or_else(Vec::new, read_file);
        if let Some(fps) = matches.values_of("args") {
            // Recovering needs only the name, so the action is left empty.
            for fp in fps {
                list.push((fp.to_owned(), "".to_owned()))
            }
        }

        for (name, _) in list {
            let mut recover_req = debugpb::RecoverFailPointRequest::new();
            recover_req.set_name(name);

            let option = CallOption::default().timeout(Duration::from_secs(10));
            client.recover_fail_point_opt(recover_req, option).unwrap();
        }
    } else if matches.is_present("list") {
        let list_req = debugpb::ListFailPointsRequest::new();
        let option = CallOption::default().timeout(Duration::from_secs(10));
        let resp = client.list_fail_points_opt(list_req, option).unwrap();
        println!("{:?}", resp.get_entries());
    }
}

fn read_file(path: &str) -> Vec<(String, String)> {
    let f = fs::File::open(path).unwrap();
    let f = BufReader::new(f);

    let mut list = vec![];
    for line in f.lines() {
        let line = line.unwrap();
        let mut parts = line.split('=');
        list.push((
            parts.next().unwrap().to_owned(),
            parts.next().unwrap_or("").to_owned(),
        ))
    }
    list
}
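
Both the argument loop in `main` and `read_file` split each entry with `split('=')`, which keeps only the text between the first and second `=` as the action, and `read_file` also produces an entry with an empty name for a blank line. A minimal sketch of a shared helper, assuming it is acceptable to split at the first `=` only and to skip blank entries; `parse_pair` and `read_pairs` are hypothetical names that are not part of this PR, and whether an action string can contain `=` at all is an assumption:

```rust
use std::io::BufRead;

/// Splits `name=actions` at the first `=` only, so any later `=` stays in
/// the action string. Returns `None` for blank entries.
fn parse_pair(entry: &str) -> Option<(String, String)> {
    let mut parts = entry.trim().splitn(2, '=');
    let name = parts.next()?.trim();
    if name.is_empty() {
        return None;
    }
    // A missing `=` yields an empty action, as in the original code.
    let actions = parts.next().unwrap_or("").trim();
    Some((name.to_owned(), actions.to_owned()))
}

/// Reads one `name=actions` entry per line from any buffered reader,
/// returning I/O errors to the caller instead of panicking.
fn read_pairs<R: BufRead>(reader: R) -> std::io::Result<Vec<(String, String)>> {
    let mut list = Vec::new();
    for line in reader.lines() {
        if let Some(pair) = parse_pair(&line?) {
            list.push(pair);
        }
    }
    Ok(list)
}
```

Taking a `BufRead` rather than a path lets the same function be exercised with an in-memory byte slice in tests.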
2 changes: 2 additions & 0 deletions src/lib.rs
@@ -74,6 +74,8 @@ extern crate toml;
extern crate sys_info;
#[cfg(test)]
extern crate utime;
#[macro_use]
extern crate fail;

#[macro_use]
pub mod util;
2 changes: 2 additions & 0 deletions src/raftstore/store/peer.rs
@@ -735,6 +735,7 @@ impl Peer {
// The leader can write to disk and replicate to the followers concurrently
// For more details, check raft thesis 10.2.1.
if self.is_leader() {
fail_point!("raft_before_leader_send");
let msgs = ready.messages.drain(..);
self.send(ctx.trans, msgs, &mut ctx.metrics.message)
.unwrap_or_else(|e| {
@@ -770,6 +771,7 @@ impl Peer {
let apply_snap_result = self.mut_store().post_ready(invoke_ctx);

if !self.is_leader() {
fail_point!("raft_before_follower_send");
self.send(trans, ready.messages.drain(..), &mut metrics.message)
.unwrap_or_else(|e| {
warn!("{} follower send messages err {:?}", self.tag, e);
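
The `fail_point!` calls in this hunk mark named places where a failure can be switched on at runtime, here just before the leader or a follower sends its raft messages. As a rough mental model only (this is not the `fail` crate's implementation, and `Action`, `inject`, `recover`, `hit`, and `sketch_fail_point!` are all hypothetical names), a fail point can be thought of as a lookup in a global name-to-action table:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical subset of actions, for illustration only.
#[derive(Clone, Debug, PartialEq)]
enum Action {
    Off,
    Print(String),
    Panic,
}

// Global table mapping fail point names to their configured action.
fn registry() -> &'static Mutex<HashMap<String, Action>> {
    static REGISTRY: OnceLock<Mutex<HashMap<String, Action>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

// Configures an action for a named fail point.
fn inject(name: &str, action: Action) {
    registry().lock().unwrap().insert(name.to_owned(), action);
}

// Removes any action configured for a named fail point.
fn recover(name: &str) {
    registry().lock().unwrap().remove(name);
}

// Evaluates a fail point: returns true if an action fired, panics for `Panic`.
fn hit(name: &str) -> bool {
    // Clone the action so the lock is released before acting on it.
    let action = registry().lock().unwrap().get(name).cloned();
    match action {
        None | Some(Action::Off) => false,
        Some(Action::Print(msg)) => {
            println!("fail point {}: {}", name, msg);
            true
        }
        Some(Action::Panic) => panic!("fail point {} triggered", name),
    }
}

// Call-site marker in the spirit of `fail_point!("name")`.
macro_rules! sketch_fail_point {
    ($name:expr) => {
        hit($name)
    };
}
```

In this model, the `inject` and `recover` subcommands of `tikv-fail` correspond to remote calls that add or remove an entry in the server's table.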
3 changes: 3 additions & 0 deletions src/raftstore/store/peer_storage.rs
@@ -1097,12 +1097,15 @@ impl PeerStorage {
let snapshot_index = if raft::is_empty_snap(&ready.snapshot) {
0
} else {
fail_point!("raft_before_apply_snap");
self.apply_snapshot(
&mut ctx,
&ready.snapshot,
&ready_ctx.kv_wb,
&ready_ctx.raft_wb,
)?;
fail_point!("raft_after_apply_snap");

last_index(&ctx.raft_state)
};

3 changes: 3 additions & 0 deletions src/raftstore/store/store.rs
@@ -1108,6 +1108,7 @@ impl<T: Transport, C: PdClient> Store<T, C> {
// apply_snapshot, peer_destroy will clear_meta, so we need write region state first.
// otherwise, if program restart between two write, raft log will be removed,
// but region state may not changed in disk.
fail_point!("raft_before_save");
if !kv_wb.is_empty() {
// RegionLocalState, ApplyState
let mut write_opts = WriteOptions::new();
@@ -1118,6 +1119,7 @@ impl<T: Transport, C: PdClient> Store<T, C> {
panic!("{} failed to save append state result: {:?}", self.tag, e);
});
}
fail_point!("raft_between_save");

if !raft_wb.is_empty() {
// RaftLocalState, Raft Log Entry
@@ -1129,6 +1131,7 @@ impl<T: Transport, C: PdClient> Store<T, C> {
panic!("{} failed to save raft append result: {:?}", self.tag, e);
});
}
fail_point!("raft_after_save");

let mut ready_results = Vec::with_capacity(append_res.len());
for (mut ready, invoke_ctx) in append_res {
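
The three fail points in this file bracket the two writes, so a test can stop the process after the region state is persisted but before the raft log is, which is the restart window the code comment above warns about. A self-contained sketch of that scenario, assuming a toy `Disk` struct and a static flag standing in for a `panic` action on `raft_between_save` (both hypothetical, as are `save` and `crash_between_writes`):

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical stand-in for a `panic` action configured on a fail point.
static PANIC_BETWEEN_SAVE: AtomicBool = AtomicBool::new(false);

// Toy model of the two pieces of persisted state.
#[derive(Default, Debug)]
struct Disk {
    region_state: Option<u64>,
    raft_log: Option<u64>,
}

// Writes the region state first, then the raft log, as in the hunk above.
fn save(disk: &mut Disk, index: u64) {
    disk.region_state = Some(index);
    // Corresponds to the position of `fail_point!("raft_between_save")`.
    if PANIC_BETWEEN_SAVE.load(Ordering::SeqCst) {
        panic!("simulated crash between the two writes");
    }
    disk.raft_log = Some(index);
}

// Simulates a crash between the writes and returns what reached the disk.
fn crash_between_writes() -> Disk {
    let mut disk = Disk::default();
    PANIC_BETWEEN_SAVE.store(true, Ordering::SeqCst);
    let result = panic::catch_unwind(AssertUnwindSafe(|| save(&mut disk, 7)));
    PANIC_BETWEEN_SAVE.store(false, Ordering::SeqCst);
    assert!(result.is_err());
    disk
}
```

A recovery test would then restart from the returned state and check that a region state without the matching raft log entry is handled correctly.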