rule_checker: can replace unhealthPeer with orphanPeer #6831

Merged on Jul 26, 2023 (10 commits)

@nolouch nolouch commented Jul 21, 2023

What problem does this PR solve?

Issue Number: Close #6559

When a Region looks like the following, and the cluster has only 4 stores:

{
  "id": 55929554,
  "start_key": "748000000000000BFF065F728000000042FFEF8B7A0000000000FA",
  "end_key": "748000000000000BFF065F728000000042FFF07AFD0000000000FA",
  "epoch": {
    "conf_ver": 6,
    "version": 109399
  },
  "peers": [
    {
      "id": 55929555,
      "store_id": 1,
      "role_name": "Voter"
    },
    {
      "id": 55929556,
      "store_id": 4,
      "role_name": "Voter"
    },
    {
      "id": 55929557,
      "store_id": 5,
      "role_name": "Voter"
    },
    {
      "id": 55929558,
      "store_id": 2751139,
      "role": 1,
      "role_name": "Learner",
      "is_learner": true
    }
  ],
  "leader": {
    "id": 55929555,
    "store_id": 1,
    "role_name": "Voter"
  },
  "down_peers": [
    {
      "down_seconds": 40307,
      "peer": {
        "id": 55929556,
        "store_id": 4,
        "role_name": "Voter"
      }
    }
  ],
  "pending_peers": [
    {
      "id": 55929556,
      "store_id": 4,
      "role_name": "Voter"
    }
  ],
  "cpu_usage": 0,
  "written_bytes": 0,
  "read_bytes": 0,
  "written_keys": 0,
  "read_keys": 0,
  "approximate_size": 1,
  "approximate_keys": 40960
}

and the region fit looks like:

{
  "rule-fits": [
    {
      "rule": {
        "group_id": "pd",
        "id": "default",
        "start_key": "",
        "end_key": "",
        "role": "voter",
        "is_witness": false,
        "count": 3,
        "location_labels": [
          "region",
          "zone",
          "host"
        ]
      },
      "peers": [
        {
          "id": 55929555,
          "store_id": 1
        },
        {
          "id": 55929557,
          "store_id": 5
        },
        {
          "id": 55929556,
          "store_id": 4
        }
      ],
      "peers-different-role": null,
      "isolation-score": 300
    }
  ],
  "orphan-peers": [
    {
      "id": 55929558,
      "store_id": 2751139,
      "role": 1
    }
  ]
}

the rule checker cannot fix the Region, so it is always reported as unhealthy. Store 4 is gone.

What is changed and how does it work?

Add logic that tries to replace an unhealthy peer with an orphan peer.
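
As a rough illustration of the idea (not the code in this PR), the selection step could be sketched as below. The `Peer` type, `pickReplacement`, and the `storeGone` callback are hypothetical names introduced only for this sketch. The real change also asks `fit.Replace` whether the swap still satisfies the placement rule, which is omitted here.

```go
package main

import "fmt"

// Peer is a simplified, hypothetical stand-in for a Region peer.
type Peer struct {
	ID      uint64
	StoreID uint64
	Role    string // "Voter" or "Learner"
}

// pickReplacement looks for a rule peer that is reported down on a store
// considered gone, and pairs it with the first orphan peer that is not
// itself down. downPeers maps peer ID -> reported down. storeGone is an
// assumed callback, not a PD API.
func pickReplacement(downPeers map[uint64]bool, rulePeers, orphanPeers []Peer,
	storeGone func(storeID uint64) bool) (down, orphan Peer, ok bool) {
	for _, p := range rulePeers {
		if !downPeers[p.ID] || !storeGone(p.StoreID) {
			continue
		}
		for _, o := range orphanPeers {
			if downPeers[o.ID] {
				continue // a down orphan cannot repair the Region
			}
			return p, o, true
		}
	}
	return Peer{}, Peer{}, false
}

func main() {
	// Peers taken from the Region and region fit shown in the description.
	rulePeers := []Peer{
		{ID: 55929555, StoreID: 1, Role: "Voter"},
		{ID: 55929557, StoreID: 5, Role: "Voter"},
		{ID: 55929556, StoreID: 4, Role: "Voter"},
	}
	orphans := []Peer{{ID: 55929558, StoreID: 2751139, Role: "Learner"}}
	downPeers := map[uint64]bool{55929556: true}
	storeGone := func(storeID uint64) bool { return storeID == 4 }

	if down, orphan, ok := pickReplacement(downPeers, rulePeers, orphans, storeGone); ok {
		fmt.Printf("replace down peer %d (store %d) with orphan peer %d (store %d), target role %s\n",
			down.ID, down.StoreID, orphan.ID, orphan.StoreID, down.Role)
	}
}
```

In this sketch the orphan learner on store 2751139 would take over the voter slot of the down peer on store 4, which is the repair the stuck Region above needs.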

Check List

Tests

  • Unit test

Release note

None.

Signed-off-by: nolouch <nolouch@gmail.com>
ti-chi-bot bot commented Jul 21, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • CabinfeverB
  • rleungx

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. labels Jul 21, 2023
@ti-chi-bot ti-chi-bot bot requested review from HunDunDM and JmPotato July 21, 2023 11:06
@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 21, 2023
@nolouch nolouch requested review from CabinfeverB and removed request for HunDunDM July 21, 2023 11:12
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. label Jul 21, 2023
@nolouch nolouch added type/cherry-pick-for-release-6.5 This PR is cherry-picked to release-6.5 from a source PR. and removed type/cherry-pick-for-release-6.5 This PR is cherry-picked to release-6.5 from a source PR. labels Jul 21, 2023
codecov bot commented Jul 24, 2023

Codecov Report

Merging #6831 (b72fbea) into master (de985b8) will increase coverage by 0.04%.
The diff coverage is 75.00%.

❗ Current head b72fbea differs from pull request most recent head a148eda. Consider uploading reports for the commit a148eda to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6831      +/-   ##
==========================================
+ Coverage   74.21%   74.26%   +0.04%     
==========================================
  Files         414      414              
  Lines       43471    43511      +40     
==========================================
+ Hits        32264    32313      +49     
+ Misses       8338     8319      -19     
- Partials     2869     2879      +10     
Flag Coverage Δ
unittests 74.26% <75.00%> (+0.04%) ⬆️

nolouch commented Jul 24, 2023

PTAL @rleungx @CabinfeverB

@CabinfeverB CabinfeverB left a comment

Rest LGTM

pkg/schedule/operator/operator.go (outdated, resolved)
pkg/schedule/operator/create_operator.go (outdated, resolved)
Signed-off-by: nolouch <nolouch@gmail.com>
Signed-off-by: nolouch <nolouch@gmail.com>
@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 25, 2023
@nolouch nolouch requested a review from lhy1024 July 25, 2023 16:32
continue
}
// store should be down.
if !c.isStoreDownTimeHitMaxDownTime(pinDownPeer.GetPeer().GetStoreId()) {
A contributor commented:

Do nothing if the store ID does not exist.

nolouch (author) replied:

isStoreDownTimeHitMaxDownTime returns false in that case.
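
A minimal sketch of the behavior described in this reply, under simplified assumptions: the `store` and `cluster` types, the `storeDownLongEnough` name, and the 1800-second threshold are hypothetical and only mirror the idea that an unknown store ID yields false, so the caller does nothing.

```go
package main

import "fmt"

// store and cluster are hypothetical, simplified stand-ins for this sketch.
type store struct{ downSeconds uint64 }

type cluster struct {
	stores         map[uint64]*store
	maxDownSeconds uint64 // assumed threshold, for illustration only
}

// storeDownLongEnough reports whether the store has been down for at least
// the configured threshold. A store ID that is not known to the cluster
// returns false, so callers skip the replacement in that case.
func (c *cluster) storeDownLongEnough(storeID uint64) bool {
	s, ok := c.stores[storeID]
	if !ok {
		return false
	}
	return s.downSeconds >= c.maxDownSeconds
}

func main() {
	c := &cluster{
		stores: map[uint64]*store{
			1: {downSeconds: 0},
			4: {downSeconds: 40307}, // illustrative value borrowed from down_seconds above
		},
		maxDownSeconds: 1800,
	}
	for _, id := range []uint64{1, 4, 999} {
		fmt.Printf("store %d eligible: %v\n", id, c.storeDownLongEnough(id))
	}
}
```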

// check if down peer can replace with orphan peer.

dstStore := c.cluster.GetStore(orphanPeer.GetStoreId())
if fit.Replace(pinDownPeer.GetPeer().GetStoreId(), dstStore) {
A contributor commented:

What happens if the orphan peer is also a down peer?

nolouch (author) replied:

That case is skipped by an earlier check.

Signed-off-by: nolouch <nolouch@gmail.com>
// check if down peer can replace with orphan peer.
dstStore := c.cluster.GetStore(orphanPeer.GetStoreId())
if fit.Replace(pinDownPeer.GetPeer().GetStoreId(), dstStore) {
destRole := pinDownPeer.GetPeer().Role
@rleungx rleungx Jul 26, 2023

How about using a variable for pinDownPeer.GetPeer() and using GetRole here?

@rleungx rleungx left a comment

The rest LGTM

@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 26, 2023
Signed-off-by: nolouch <nolouch@gmail.com>
Signed-off-by: nolouch <nolouch@gmail.com>
nolouch commented Jul 26, 2023

/merge

ti-chi-bot bot commented Jul 26, 2023

@nolouch: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

ti-chi-bot bot commented Jul 26, 2023

This pull request has been accepted and is ready to merge.

Commit hash: b72fbea

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 26, 2023
ti-chi-bot bot commented Jul 26, 2023

@nolouch: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

@ti-chi-bot ti-chi-bot bot merged commit f916e90 into tikv:master Jul 26, 2023
19 checks passed
ti-chi-bot commented:

In response to a cherrypick label: new pull request created to branch release-6.5: #6843.

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Jul 26, 2023
close tikv#6559

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot commented:

In response to a cherrypick label: new pull request created to branch release-7.1: #6844.

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Jul 26, 2023
close tikv#6559

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@nolouch nolouch deleted the fix-unhealthy-repair branch July 26, 2023 06:32
ti-chi-bot bot pushed a commit that referenced this pull request Jul 26, 2023
close #6559

add logic try to replace unhealthy peer with orphan peer

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: nolouch <nolouch@gmail.com>
ti-chi-bot bot pushed a commit that referenced this pull request Aug 2, 2023
close #6559

add logic try to replace unhealthy peer with orphan peer

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: nolouch <nolouch@gmail.com>
rleungx pushed a commit to rleungx/pd that referenced this pull request Dec 4, 2023
close tikv#6559

add logic try to replace unhealthy peer with orphan peer

Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Labels

  • needs-cherry-pick-release-6.5: Should cherry pick this PR to release-6.5 branch.
  • needs-cherry-pick-release-7.1: Should cherry pick this PR to release-7.1 branch.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT2: Indicates that a PR has LGTM 2.
Development

Successfully merging this pull request may close these issues.

Regions get stuck in 2 voters, 1 down peer, 1 learner state
5 participants