tikv: fix infinite follower/learner retry when network partition only between leader and follower/learner #17441

Merged
merged 1 commit into from
May 27, 2020

lysu
Contributor

@lysu lysu commented May 27, 2020

What problem does this PR solve?

Issue Number: close #17442

Problem Summary:

In #16933 we introduced a mechanism that rechecks store liveness when sending a request fails. It works well for leader-based requests.

For follower or learner requests, however, it can cause infinite retries.

When there is a network partition between the leader and the followers/learners, but TiDB-Server can still reach the followers and learners, the followers and learners return a timeout error because they cannot catch up with the leader. The store liveness recheck still succeeds in that case, so the request keeps going back to the same peer. In this situation it is better to retry other peers immediately.

What is changed and how it works?

What's Changed:

For a follower/learner read, retry immediately instead of checking store liveness.
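
The decision described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `ReplicaKind`, `Action`, `afterSendFailure`, and the `probeStore` callback are hypothetical names, and the leader branch (back off and retry the same store if the probe succeeds, otherwise move on) is an assumption about how the liveness recheck from #16933 behaves.

```go
package main

import "fmt"

// ReplicaKind is a hypothetical label for the kind of peer a read targets.
type ReplicaKind int

const (
	Leader ReplicaKind = iota
	Follower
	Learner
)

// Action is a hypothetical description of what the client does after a
// send failure.
type Action int

const (
	RetryOtherPeer Action = iota
	BackoffAndRetrySamePeer
)

func (a Action) String() string {
	switch a {
	case RetryOtherPeer:
		return "retry-other-peer"
	case BackoffAndRetrySamePeer:
		return "backoff-and-retry-same-peer"
	default:
		return "unknown"
	}
}

// afterSendFailure picks the next step after a request to a peer failed.
// probeStore stands in for the store liveness recheck (assumed behavior).
func afterSendFailure(kind ReplicaKind, probeStore func() bool) Action {
	// Follower/learner read: the store may be reachable from the client
	// while still being cut off from the leader, so the liveness probe
	// tells us nothing useful. Skip it and try another peer right away.
	if kind == Follower || kind == Learner {
		return RetryOtherPeer
	}
	// Leader read: consult the liveness probe before giving up on the store.
	if probeStore() {
		return BackoffAndRetrySamePeer
	}
	return RetryOtherPeer
}

func main() {
	// The store answers the probe, as in the partition scenario above.
	alive := func() bool { return true }
	for _, kind := range []ReplicaKind{Leader, Follower, Learner} {
		fmt.Printf("kind=%d action=%s\n", kind, afterSendFailure(kind, alive))
	}
}
```

The point of the early return is that the probe is never called for follower/learner reads, so a store that is reachable but lagging cannot hold the request in a retry loop.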

Related changes

  • Need to cherry-pick to the release branch 4.0

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • n/a

Release note

  • Fix infinite follower/learner retry when network partition only between the leader and follower/learner


@lysu lysu added type/bugfix This PR fixes a bug. priority/release-blocker This issue blocks a release. Please solve it ASAP. component/tikv needs-cherry-pick-4.0 labels May 27, 2020
@codecov

codecov bot commented May 27, 2020

Codecov Report

Merging #17441 into master will decrease coverage by 0.0129%.
The diff coverage is 50.0000%.

@@               Coverage Diff                @@
##             master     #17441        +/-   ##
================================================
- Coverage   79.8205%   79.8075%   -0.0130%     
================================================
  Files           520        520                
  Lines        139840     139686       -154     
================================================
- Hits         111621     111480       -141     
+ Misses        19259      19256         -3     
+ Partials       8960       8950        -10     

Contributor

@lzmhhh123 lzmhhh123 left a comment

LGTM

@lzmhhh123 lzmhhh123 added the status/LGT1 Indicates that a PR has LGTM 1. label May 27, 2020
Member

@jackysp jackysp left a comment

LGTM

@jackysp
Member

jackysp commented May 27, 2020

/merge

@sre-bot sre-bot added the status/can-merge Indicates a PR has been approved by a committer. label May 27, 2020
@sre-bot
Contributor

sre-bot commented May 27, 2020

/run-all-tests

@sre-bot sre-bot merged commit 322c55a into pingcap:master May 27, 2020
@lysu
Contributor Author

lysu commented May 27, 2020

/run-cherry-picker

@lysu lysu deleted the fix-retry-when-network-partition-between-leader-follower-learner branch May 27, 2020 06:02
sre-bot pushed a commit to sre-bot/tidb that referenced this pull request May 27, 2020
Signed-off-by: sre-bot <sre-bot@pingcap.com>
@sre-bot
Contributor

sre-bot commented May 27, 2020

cherry pick to release-4.0 in PR #17443

sre-bot added a commit that referenced this pull request May 30, 2020
Labels
component/tikv priority/release-blocker This issue blocks a release. Please solve it ASAP. status/can-merge Indicates a PR has been approved by a committer. status/LGT1 Indicates that a PR has LGTM 1. type/bugfix This PR fixes a bug.
Development

Successfully merging this pull request may close these issues.

infinite follower/learner retry when network partition only between leader and follower/learner
5 participants