checker: avoid unnecessary remove disconnected peer with multi orphan peers #7315

Merged
merged 9 commits into from
Nov 6, 2023

Conversation

@lhy1024 lhy1024 commented Nov 3, 2023

What problem does this PR solve?

Issue Number: Close #7249

When a region has multiple orphan peers, a disconnected peer is no longer treated as healthy.
When the checker decides to remove an orphan peer, it picks a disconnected peer first.
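
The first rule can be sketched in isolation. This is only an illustration, not the code in this PR: `OrphanPeer`, its fields, and `countsAsHealthy` are hypothetical names, and the single-orphan behaviour is an assumption read from the phrase "many orphan peers".

```go
package main

import "fmt"

// OrphanPeer is a hypothetical, simplified view of a replica that no
// placement rule claims. It is not a PD type.
type OrphanPeer struct {
	ID           uint64
	Unhealthy    bool // reported as down or pending
	Disconnected bool // its store has recently stopped heartbeating
}

// countsAsHealthy reports whether an orphan may serve as the healthy spare.
// Assumption: with two or more orphans, a disconnected peer no longer
// qualifies; with a single orphan, only Unhealthy disqualifies it.
func countsAsHealthy(p OrphanPeer, orphanCount int) bool {
	if p.Unhealthy {
		return false
	}
	if orphanCount >= 2 && p.Disconnected {
		return false
	}
	return true
}

func main() {
	orphans := []OrphanPeer{
		{ID: 1},
		{ID: 2, Disconnected: true},
	}
	for _, p := range orphans {
		fmt.Printf("peer %d healthy=%v\n", p.ID, countsAsHealthy(p, len(orphans)))
	}
}
```

With two orphans, peer 2 is not counted as healthy, so it cannot be the reason a connected peer gets removed.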

What is changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test

Release note

None.

ti-chi-bot bot commented Nov 3, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • nolouch
  • rleungx

@ti-chi-bot ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Nov 3, 2023
@ti-chi-bot ti-chi-bot bot requested review from nolouch and rleungx November 3, 2023 03:43
@ti-chi-bot ti-chi-bot bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 3, 2023
@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Nov 3, 2023
Signed-off-by: lhy1024 <admin@liudos.us>

codecov bot commented Nov 3, 2023

Codecov Report

Merging #7315 (f12d0b4) into master (ab8bf7b) will decrease coverage by 0.03%.
The diff coverage is 84.61%.

@@            Coverage Diff             @@
##           master    #7315      +/-   ##
==========================================
- Coverage   74.49%   74.46%   -0.03%     
==========================================
  Files         446      446              
  Lines       48346    48352       +6     
==========================================
- Hits        36016    36007       -9     
- Misses       9160     9165       +5     
- Partials     3170     3180      +10     
Flag        Coverage Δ
unittests   74.46% <84.61%> (-0.03%) ⬇️


@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 3, 2023
@@ -534,8 +534,7 @@ func (c *RuleChecker) fixOrphanPeers(region *core.RegionInfo, fit *placement.Reg
 	return operator.CreatePromoteLearnerOperatorAndRemovePeer("replace-down-peer-with-orphan-peer", c.cluster, region, orphanPeer, pinDownPeer)
 case orphanPeerRole == metapb.PeerRole_Voter && destRole == metapb.PeerRole_Learner:
 	return operator.CreateDemoteLearnerOperatorAndRemovePeer("replace-down-peer-with-orphan-peer", c.cluster, region, orphanPeer, pinDownPeer)
-case orphanPeerRole == metapb.PeerRole_Voter && destRole == metapb.PeerRole_Voter &&
-	isDisconnectedPeer(pinDownPeer) && !dstStore.IsDisconnected():
+case orphanPeerRole == destRole && isDisconnectedPeer(pinDownPeer) && !dstStore.IsDisconnected():
Contributor Author

Allow replacing a learner as well.
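
To make the effect of the changed `case` concrete, here is a small sketch comparing the old and new conditions as plain boolean functions. `Role`, `replaceableBefore`, and `replaceableAfter` are hypothetical names; the boolean parameters stand in for `isDisconnectedPeer(pinDownPeer)` and `dstStore.IsDisconnected()` from the diff.

```go
package main

import "fmt"

// Role is a hypothetical stand-in for metapb.PeerRole.
type Role int

const (
	Voter Role = iota
	Learner
)

// replaceableBefore mirrors the old condition: only a voter orphan
// could replace a disconnected voter.
func replaceableBefore(orphanRole, destRole Role, pinnedDisconnected, dstStoreDisconnected bool) bool {
	return orphanRole == Voter && destRole == Voter &&
		pinnedDisconnected && !dstStoreDisconnected
}

// replaceableAfter mirrors the new condition: any orphan whose role
// matches the destination role qualifies, including learners.
func replaceableAfter(orphanRole, destRole Role, pinnedDisconnected, dstStoreDisconnected bool) bool {
	return orphanRole == destRole && pinnedDisconnected && !dstStoreDisconnected
}

func main() {
	// A learner orphan facing a disconnected learner on a connected store.
	fmt.Println(replaceableBefore(Learner, Learner, true, false))
	fmt.Println(replaceableAfter(Learner, Learner, true, false))
}
```

The voter/voter case behaves the same under both conditions; only the learner/learner case newly matches.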

}
if hasHealthPeer {
// there already exists a healthy orphan peer, so we can remove other orphan Peers.
ruleCheckerRemoveOrphanPeerCounter.Inc()
// if there exists a disconnected orphan peer, we will pick it to remove firstly.
Contributor Author

Avoid removing a normal peer.

Member

Suppose we have three orphan peers, two healthy and one disconnected. Is it possible that we remove a healthy peer first?

Member

Should we always remove disconnected peer first?

Contributor Author

@lhy1024 lhy1024 Nov 6, 2023

> Should we always remove disconnected peer first?

I think so.
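
The scenario raised in this thread (three orphan peers, two healthy and one disconnected) can be walked through with a small simulation. This is a sketch under stated assumptions, not the checker's code: `Orphan`, `nextToRemove`, and `removalOrder` are hypothetical, each loop iteration stands for one checker round, and one connected orphan is always kept as the healthy spare.

```go
package main

import "fmt"

// Orphan is a hypothetical, simplified orphan peer.
type Orphan struct {
	ID           uint64
	Disconnected bool
}

// nextToRemove returns the index of the orphan to remove in this round,
// or -1 if nothing should be removed. A disconnected orphan is always
// preferred; otherwise the second connected orphan is removed so that
// the first one stays as the healthy spare.
func nextToRemove(orphans []Orphan) int {
	firstDisconnected, secondConnected := -1, -1
	connected := 0
	for i, o := range orphans {
		if o.Disconnected {
			if firstDisconnected == -1 {
				firstDisconnected = i
			}
			continue
		}
		connected++
		if connected == 2 {
			secondConnected = i
		}
	}
	if connected == 0 {
		return -1 // no healthy spare exists, so this sketch removes nothing
	}
	if firstDisconnected != -1 {
		return firstDisconnected
	}
	return secondConnected // -1 when only the spare is left
}

// removalOrder repeats nextToRemove until no removal is possible and
// returns the IDs in the order they were removed.
func removalOrder(orphans []Orphan) []uint64 {
	rest := append([]Orphan(nil), orphans...) // copy, do not mutate the input
	var order []uint64
	for {
		i := nextToRemove(rest)
		if i == -1 {
			return order
		}
		order = append(order, rest[i].ID)
		rest = append(rest[:i], rest[i+1:]...)
	}
}

func main() {
	orphans := []Orphan{{ID: 1}, {ID: 2}, {ID: 3, Disconnected: true}}
	fmt.Println(removalOrder(orphans)) // prints [3 2]
}
```

In this sketch the disconnected peer 3 goes first, then the extra healthy peer 2, and peer 1 remains as the spare.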

@lhy1024 lhy1024 requested a review from nolouch November 3, 2023 11:46
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>

lhy1024 commented Nov 6, 2023

@rleungx PTAL

Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Nov 6, 2023
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>

lhy1024 commented Nov 6, 2023

The test ran successfully.

No failures in TiDB.

All orphan peers are removed with the down store time configured to 11 min.

lhy1024 commented Nov 6, 2023

/merge

ti-chi-bot bot commented Nov 6, 2023

@lhy1024: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

ti-chi-bot bot commented Nov 6, 2023

This pull request has been accepted and is ready to merge.

Commit hash: f12d0b4

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Nov 6, 2023
@ti-chi-bot ti-chi-bot bot merged commit c332ddc into tikv:master Nov 6, 2023
26 checks passed
lhy1024 added a commit to lhy1024/pd that referenced this pull request Nov 6, 2023
… peers (tikv#7315)

close tikv#7249

Signed-off-by: lhy1024 <admin@liudos.us>
lhy1024 added a commit to lhy1024/pd that referenced this pull request Nov 8, 2023
… peers (tikv#7315)

close tikv#7249

Signed-off-by: lhy1024 <admin@liudos.us>
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Development

Successfully merging this pull request may close these issues.

checker: reduces the probability of deleting normal peers when the store becomes unavailable
3 participants