
tikv: fix infinite follower/learner retry when network partition only between leader and follower/learner (#17441) #17443

Merged: 1 commit merged into pingcap:release-4.0 on May 30, 2020

sre-bot commented May 27, 2020

cherry-pick #17441 to release-4.0


What problem does this PR solve?

Issue Number: close #17442

Problem Summary:

In #16933 we introduced a mechanism that rechecks store liveness when sending a request fails. It works well for leader-based requests.

However, for follower or learner requests, it may cause infinite retries.

Consider a network partition between the leader and the followers/learners, while TiDB-Server can still reach the followers and learners. The followers and learners return a timeout error because the partition keeps them from catching up with the leader, yet the store liveness recheck still succeeds, so it never steers the request away from the lagging peer. In this situation, it's better to retry other peers immediately.
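The failure mode can be modeled with a small Go simulation. Everything in it is hypothetical (the `peer` type, `send`, `storeAlive`, `retryWithLivenessCheck`) and it is not TiDB's actual region request code: a peer that TiDB-Server can reach but that is partitioned from the leader keeps timing out, and because the liveness recheck passes, a policy that switches peers only when the store looks dead never moves on. The `maxAttempts` cap exists only so the demo terminates.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout stands in for the error a lagging follower/learner returns.
var errTimeout = errors.New("timeout: peer cannot catch up with leader")

// peer is a hypothetical model of a replica as seen from TiDB-Server.
type peer struct {
	storeID   uint64
	reachable bool // TiDB-Server can reach the store
	lagging   bool // the store is partitioned from the leader
}

// send simulates a follower/learner read against the peer.
func (p peer) send() error {
	if !p.reachable {
		return errors.New("store unreachable")
	}
	if p.lagging {
		return errTimeout
	}
	return nil
}

// storeAlive models the liveness recheck: it only asks whether
// TiDB-Server can reach the store, not whether the peer is caught up.
func storeAlive(p peer) bool { return p.reachable }

// retryWithLivenessCheck models the problematic policy: after a failure,
// switch peers only if the store looks dead. maxAttempts exists only so
// this demo terminates.
func retryWithLivenessCheck(peers []peer, maxAttempts int) (servedBy uint64, attempts int, err error) {
	idx := 0
	for attempts < maxAttempts {
		attempts++
		p := peers[idx]
		if err = p.send(); err == nil {
			return p.storeID, attempts, nil
		}
		if storeAlive(p) {
			continue // store passes the recheck, so stay on the same peer
		}
		idx = (idx + 1) % len(peers)
	}
	return 0, attempts, err
}

func main() {
	peers := []peer{
		{storeID: 1, reachable: true, lagging: true}, // partitioned from the leader
		{storeID: 2, reachable: true},                // healthy follower
	}
	id, n, err := retryWithLivenessCheck(peers, 5)
	fmt.Println(id, n, err) // prints: 0 5 timeout: peer cannot catch up with leader
}
```

Even though store 2 is healthy, every attempt is spent on store 1.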

What is changed and how it works?

What's Changed:

Retry other peers immediately, instead of rechecking store liveness, when the request is a follower or learner read.
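A minimal sketch of the changed decision, under a hypothetical model rather than TiDB's real code (`replicaKind`, `shouldSwitchPeer`, `readWithRetry` and the `peer` type are illustrative names, not TiDB APIs): follower and learner reads move to the next peer right after a failure, while leader reads keep the liveness recheck introduced in #16933.

```go
package main

import (
	"errors"
	"fmt"
)

// replicaKind says which replica a read targets (hypothetical).
type replicaKind int

const (
	leaderRead replicaKind = iota
	followerRead
	learnerRead
)

var (
	errTimeout     = errors.New("timeout: peer cannot catch up with leader")
	errUnreachable = errors.New("store unreachable")
)

// peer is a hypothetical model of a replica as seen from TiDB-Server.
type peer struct {
	storeID   uint64
	reachable bool // TiDB-Server can reach the store
	lagging   bool // the store is partitioned from the leader
}

// send simulates a read: unreachable stores fail, lagging peers time out.
func (p peer) send() error {
	if !p.reachable {
		return errUnreachable
	}
	if p.lagging {
		return errTimeout
	}
	return nil
}

// storeAlive models the liveness recheck from #16933.
func storeAlive(p peer) bool { return p.reachable }

// shouldSwitchPeer decides whether a failed request moves to the next peer.
func shouldSwitchPeer(kind replicaKind, p peer) bool {
	if kind == followerRead || kind == learnerRead {
		return true // retry another peer immediately; skip the recheck
	}
	return !storeAlive(p) // leader reads keep the liveness recheck
}

// readWithRetry tries peers in order, bounded by maxAttempts.
func readWithRetry(kind replicaKind, peers []peer, maxAttempts int) (uint64, int, error) {
	var err error
	idx := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		p := peers[idx]
		if err = p.send(); err == nil {
			return p.storeID, attempt, nil
		}
		if shouldSwitchPeer(kind, p) {
			idx = (idx + 1) % len(peers)
		}
	}
	return 0, maxAttempts, err
}

func main() {
	peers := []peer{
		{storeID: 1, reachable: true, lagging: true}, // partitioned from the leader
		{storeID: 2, reachable: true},                // healthy follower
	}
	id, n, err := readWithRetry(followerRead, peers, 5)
	fmt.Println(id, n, err) // prints: 2 2 <nil>
}
```

Only the follower/learner read paths skip the recheck; leader-based requests, where the recheck works well, behave as before.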

Related changes

  • Need to cherry-pick to the release branch 4.0

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • n/a

Release note

  • Fix infinite follower/learner retry when network partition only between the leader and follower/learner


Signed-off-by: sre-bot <sre-bot@pingcap.com>
sre-bot commented May 27, 2020

/run-all-tests

sre-bot added the component/tikv, priority/release-blocker, status/PTAL, type/4.0-cherry-pick, and type/bugfix labels May 27, 2020
sre-bot added this to the v4.0.0-ga milestone May 27, 2020
lysu removed the priority/release-blocker label May 27, 2020
lysu modified the milestones: v4.0.0-ga, v4.0.1 May 27, 2020
crazycs520 left a comment

LGTM

sre-bot commented May 29, 2020

@crazycs520, @coocood, @jackysp, @hicqu, @lzmhhh123, PTAL.

jackysp left a comment

LGTM

jackysp commented May 30, 2020

/merge

sre-bot added the status/can-merge label May 30, 2020
sre-bot commented May 30, 2020

/run-all-tests

sre-bot merged commit abe3e72 into pingcap:release-4.0 May 30, 2020
bb7133 modified the milestones: v4.0.1, v4.0.2 Jun 6, 2020
Labels: component/tikv, status/can-merge, type/bugfix, type/4.0-cherry-pick