Enhancement Task
Down peers are detected by TiKV and reported to PD through heartbeats. However, when PD checks the placement rules and finds a down peer in a region, it first checks whether the TiKV store hosting that peer is down. If the store is not down, PD takes no action and skips the peer without writing any log.
pd/pkg/schedule/checker/rule_checker.go
Lines 190 to 203 in da5a4e9
However, in some cases a problem inside a region's Raft group on TiKV can leave some replicas unable to maintain Raft heartbeats, which turns them into down peers, while the TiKV store itself still reports heartbeats to PD normally. In this situation the down peers persist, and the affected regions stay short of their full replica count for a long time.
I think that in this situation, if a peer has gone without a heartbeat for a certain period of time (that is, it is a down peer), PD should try to recover it on the scheduling side regardless of whether the TiKV store is down, just like replace-offline-peers.