
raftstore/*: gc stale peer out of region #1003

Merged: 3 commits, Sep 12, 2016

Conversation

hhkbp2 (Contributor) commented on Aug 29, 2016:

Fix issue #804

@@ -158,6 +158,8 @@ pub struct Peer {
// if we remove ourself in ChangePeer remove, we should set this flag, then
// any following committed logs in same Ready should be applied failed.
pending_remove: bool,
pub leader_missing: bool,
Contributor:
You can use an Option instead of these two fields.
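A minimal sketch of what that refactor could look like (field and helper names are illustrative, not necessarily the ones used in the PR):

use std::time::{Duration, Instant};

pub struct Peer {
    // ... other fields elided ...
    pending_remove: bool,
    // A single Option replaces a `leader_missing` flag plus a separate timestamp:
    // None means a leader is currently known; Some(t) means the leader has been
    // missing since instant `t`.
    leader_missing_time: Option<Instant>,
}

impl Peer {
    // Hypothetical helper: how long the leader has been missing
    // (zero if a leader is known).
    pub fn since_leader_missing(&self) -> Duration {
        self.leader_missing_time
            .map_or(Duration::from_secs(0), |t| t.elapsed())
    }
}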

if let Some(peer) = self.region_peers.get_mut(&region_id) {
let duration = peer.since_leader_missing();
if duration >= self.cfg.max_leader_missing_duration {
info!("leader for peer {} missing for a long time, check with pd whether \
Contributor:
Put the log tag first.

Contributor:
Do we need to reset the leader missing state?
If the peer can't connect to PD, or the PD worker doesn't handle it, we may send this message on every Ready.

Contributor Author:
Any better idea?
There is another case where the corresponding region info is not available in PD for some time, so the PD worker should retry later (hence this message being resent). And raftstore doesn't know how the worker is progressing.

siddontang (Contributor), Aug 30, 2016:
I think we don't need to worry about PD problems here.
IMO, if we find a peer is stale but can't get the result from PD, we should check the peer again after the checking interval.

hhkbp2 force-pushed the hhkbp2/gc-stale-peer-out-of-region branch from d635d70 to 584c172 on August 30, 2016 08:43
if let Some(peer) = self.region_peers.get_mut(&region_id) {
let duration = peer.since_leader_missing();
if duration >= self.cfg.max_leader_missing_duration {
info!("peer {} leader missing for a long time, check with pd whether \
Contributor:
The tag should be at the start of the log.

Contributor Author:
Address comment.

hhkbp2 (Contributor Author) commented on Aug 30, 2016:

@BusyJay @siddontang
PTAL closely.

@@ -444,13 +447,36 @@ impl Peer {
ready.hs.take();
}

if let Some(ref soft_state) = ready.ss {
Contributor:
When we receive a message, we will create a replicated peer, and if this peer doesn't receive any message later, the peer can be considered stale too.
But this peer can't produce a Ready, so I think we should check this in the tick function.
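A rough standalone sketch of that tick-driven check (field and function names are assumptions; the real store keeps peers in region_peers with a configurable max_leader_missing_duration):

use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Peer {
    leader_missing_time: Option<Instant>,
}

// Collect the regions whose peer has not seen a leader for longer than the
// threshold. Because an uninitialized peer may never produce a Ready, this
// scan has to run from the periodic tick rather than from Ready handling.
fn stale_peer_regions(peers: &HashMap<u64, Peer>, max_missing: Duration) -> Vec<u64> {
    peers
        .iter()
        .filter(|&(_, p)| {
            p.leader_missing_time
                .map_or(false, |t| t.elapsed() >= max_missing)
        })
        .map(|(&region_id, _)| region_id)
        .collect()
}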

Contributor Author:
Address comment.

hhkbp2 (Contributor Author) commented on Sep 1, 2016:

@siddontang @BusyJay @ngaut PTAL

// for peer B in case 2 above
// directly destroy peer without data since it doesn't have region range,
// so that it doesn't have the correct region start_key to validate peer with PD
region_to_be_destroyed.push((region_id, peer.peer.clone()))
Contributor:
Do we have any test to cover this?

Contributor:
Maybe there is no need to push region_id, because peer already has it.

Contributor Author:
A test case is added for the GC of an uninitialized peer.
The region_id is cached so destroy_peer() can be called directly without a reference to the peer. I think we should keep it this way.

hhkbp2 (Contributor Author) commented on Sep 5, 2016:

@BusyJay @siddontang PTAL

local_region.get_id(),
peer.get_id(),
local_region.get_start_key());
warn!("xxx local region: {:?}", local_region);
Contributor:
Remove the unnecessary logs.

Contributor Author:
Address comment.

siddontang (Contributor):
LGTM
PTAL @BusyJay

hhkbp2 (Contributor Author) commented on Sep 7, 2016:

PTAL @BusyJay @zhangjinpeng1987 @ngaut

// before it could successfully receive snapshot from the leader and
// apply that snapshot, no raft ready event will be triggered,
// so that we could not detect the leader is missing for it at here.
if self.is_initialized() && self.leader_missing_time.is_some() {
Member:
It's unnecessary to check whether self.leader_missing_time is Some or not.

Contributor Author:
Address comment.

@@ -273,13 +273,65 @@ impl<T: Transport, C: PdClient> Store<T, C> {
}

fn on_raft_base_tick(&mut self, event_loop: &mut EventLoop<Self>) {
let mut region_to_be_destroyed = vec![];
for (&region_id, peer) in &mut self.region_peers {
if !peer.get_store().is_applying_snap() {
peer.raft_group.tick();
self.pending_raft_groups.insert(region_id);
Member:
We should not insert into pending_raft_groups if the peer is about to be removed.
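One way the loop above could be rearranged to honour this (a sketch against the quoted snippet; is_stale stands in for whatever staleness check the PR ends up using):

for (&region_id, peer) in &mut self.region_peers {
    if !peer.get_store().is_applying_snap() {
        peer.raft_group.tick();

        // If the peer is about to be garbage-collected, queue it for
        // destruction and skip pending_raft_groups: handling a Ready for a
        // peer that is going away is at best wasted work.
        if peer.is_stale(self.cfg.max_leader_missing_duration) {
            region_to_be_destroyed.push((region_id, peer.peer.clone()));
            continue;
        }

        self.pending_raft_groups.insert(region_id);
    }
}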

Contributor Author:
Address comment.

hhkbp2 force-pushed the hhkbp2/gc-stale-peer-out-of-region branch 3 times, most recently from 541c0d1 to cc697a1, on September 7, 2016 10:43
// in the `leader missing` state. That is because if it's isolated from the leader
// before it could successfully receive snapshot from the leader and
// apply that snapshot, no raft ready event will be triggered,
// so that we could not detect the leader is missing for it at here.
Member:
If the uninitialised peer is isolated rather than removed, it may be destroyed before the network recovers. And further heartbeats can't recreate the peer due to the tombstone, so the region will lose a peer forever. If this happens while the second replica is being created, the region will never work.

hhkbp2 (Contributor Author), Sep 7, 2016:

Raft works at a time granularity of milliseconds. PD's distribution and rebalancing work at a granularity of minutes. And the peer GC triggered by a missing leader works at hours (or days). They complement each other rather than clash.

If a peer could not be initialized after hours (or days) but still counts toward quorum, there must be some fatal error in the system. Destroying such a peer doesn't matter that much.

Member:

I think it's the user's choice how to configure max_leader_missing_duration. It doesn't have to be hours or days.

@ngaut @siddontang

hhkbp2 (Contributor Author), Sep 7, 2016:

I don't see why the configuration is a problem. The default value of max_leader_missing_duration should be a guide to the user. Without knowing enough about how to configure these parameters, a user should not touch the default value.

If the user wants to (or accidentally does) make a mess of the system, there are a million ways to do it. For example, he could set the election timeout to the minimum server RTT, so that no leader would be elected for any raft group and the system would fail.

Member:

The configuration is not the problem; the problem is that destroying an uninitialised peer can cause trouble. The configuration I mentioned above is to show that the complementation should not rely on assumptions about the configuration. If the configuration is valid (it passes the validate check), then the group should work in some way (with good or bad performance).

Besides, even if the region is not working for hours, it won't cause any problem at all if the data in it is not frequently visited.

Member:

Maybe we can use the region_id or the peer_id to ask PD whether the uninitialised peer is still in the region before destroying it.

Contributor:

Yes, the tombstone key will be destroyed too, see #978.

Contributor:

I think we can even destroy the tombstone key when destroying a stale peer that is too old.

hhkbp2 (Contributor Author), Sep 8, 2016:

Maybe we can use the region_id or the peer_id to ask pd if the uninitialised peer is still in the region before destroying.

@BusyJay Currently PD only provides an interface to get the region info of a key.
@siddontang has put a comment above:

For an uninitialized peer, the region range is empty, so if you use the start key to search in PD, you will get the first region.

Member:

Rechecking the code: when an uninitialised peer is destroyed, the epoch won't be written, so the tombstone won't stop this peer from being recreated when it's merely isolated. So it seems fine to destroy the peer here.

hhkbp2 force-pushed the hhkbp2/gc-stale-peer-out-of-region branch from cc697a1 to 5e5c239 on September 7, 2016 12:15
if self.leader_missing_time.is_none() {
self.leader_missing_time = Some(Instant::now())
}
} else if self.is_initialized() {
Contributor:

Even if the peer is uninitialized, if it receives a raft message from the leader, I think we should clear leader_missing_time too.

hhkbp2 (Contributor Author), Sep 8, 2016:

As the comment mentions, if leader_missing_time is set to None even for an uninitialized peer, then when the peer is isolated before it is initialized, no ready event will be triggered and this peer can never be GCed.

It's OK to consider a peer in this situation as leader missing, since the leader missing timeout only triggers after a long time. For uninitialized peers, the leader is "missing" because it fails to finish the replication job within a specified long time, rather than because it fails to send heartbeats.

Contributor:

Why not check this in the raft tick?

Contributor Author:

I don't know what you mean. The leader missing time is checked in the raft tick for every peer.

Contributor:

So why do we check here too?
In the tick, we can check the leader as well: if there is a leader, clear leader_missing_time; if there is no leader, set it.

Contributor Author:

The problem is that, for uninitialized peers:

  1. A ready event notifies that a leader has emerged, so the peer "has leader".
  2. The peer gets isolated, and no ready event will be triggered. The peer still keeps its "has leader" state.

Then we don't know when this peer can be GCed, even if we check it in the raft tick.

Contributor:

The peer gets isolated, and no ready event will be triggered. The peer still keeps its "has leader" state.

In raft, if a follower doesn't receive a message from the leader within the ElectionTimeout, it will begin to campaign and clear the leader.

Contributor Author:

In raft, if a follower doesn't receive a message from the leader within the ElectionTimeout, it will begin to campaign and clear the leader.

That's true for initialized peers, which already have all the cluster's node info.

Contributor:

So we can use is_initialized() && has_leader to clear the leader_missing_time.
For an un-initialized peer, if it doesn't receive any message from the leader for a long time, we can consider it stale.
I suggest checking this in the tick too.
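A small standalone sketch of the rule being agreed on here (names are illustrative, not the actual TiKV API):

use std::time::Instant;

struct Peer {
    initialized: bool,
    has_leader: bool,
    leader_missing_time: Option<Instant>,
}

impl Peer {
    // Only an initialized peer that currently knows a leader clears the timer.
    // An uninitialized peer keeps the timer running, so prolonged silence from
    // the leader (e.g. it never finishes sending the snapshot) still counts as
    // "leader missing" and the peer can eventually be garbage-collected.
    fn update_leader_missing(&mut self) {
        if self.initialized && self.has_leader {
            self.leader_missing_time = None;
        } else if self.leader_missing_time.is_none() {
            self.leader_missing_time = Some(Instant::now());
        }
    }
}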

Contributor Author:

Address comment.

BusyJay (Member) commented on Sep 8, 2016:

LGTM

hhkbp2 (Contributor Author) commented on Sep 8, 2016:

Please help to take a look again. @siddontang @BusyJay @ngaut

for (&region_id, peer) in &mut self.region_peers {
if !peer.get_store().is_applying() {
peer.raft_group.tick();
self.pending_raft_groups.insert(region_id);

// If this peer detects the leader is missing for a long long time,
Contributor:

@queenypingcap
Please review the comments.

BusyJay (Member) commented on Sep 9, 2016:

LGTM

@@ -36,6 +36,9 @@ const DEFAULT_MGR_GC_TICK_INTERVAL_MS: u64 = 60000;
const DEFAULT_SNAP_GC_TIMEOUT_SECS: u64 = 60 * 10;
const DEFAULT_MESSAGES_PER_TICK: usize = 256;
const DEFAULT_MAX_PEER_DOWN_SECS: u64 = 300;
// If the leader missing time exceeds 2 hours,
QueenyJin (Contributor), Sep 9, 2016:

+// If the leader is missing for over 2 hours,

Contributor Author:

Address comment.

hhkbp2 merged commit 10b7b88 into master on Sep 12, 2016
hhkbp2 deleted the hhkbp2/gc-stale-peer-out-of-region branch on September 12, 2016 07:36
sre-bot added the contribution label (This PR is from a community contributor.) on Dec 18, 2019
iosmanthus pushed a commit to iosmanthus/tikv that referenced this pull request Oct 30, 2023
Signed-off-by: Ping Yu <yuping@pingcap.com>