tikv: fix infinite follower/learner retry when network partition only between leader and follower/learner (#17441) #17443

sre-bot · 2020-05-27T06:03:54Z

cherry-pick #17441 to release-4.0

What problem does this PR solve?

Issue Number: close #17442

Problem Summary:

In #16933 we introduced a mechanism that rechecks store liveness when sending a request fails. It works well for leader-based requests.

However, for follower or learner requests, it may cause infinite retries.

Suppose there is a network partition between the leader and the followers/learners, while TiDB-Server can still reach the followers and learners. The followers and learners then return a timeout error because they cannot catch up with the leader. The store liveness recheck still succeeds, so the request keeps being retried. In this situation it is better to retry other peers immediately.

What is changed and how it works?

What's Changed:

For follower/learner reads, retry other peers immediately instead of rechecking store liveness.
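The retry decision described above can be illustrated with a small simulation. This is a simplified model, not TiDB's actual region request sender. The names `replica`, `send`, `nextPeer`, `partitionScenario` and `readWithRetry` are hypothetical. The model assumes that a successful store liveness recheck makes the client retry the same peer. The `maxAttempts` cap exists only so that the pre-fix behaviour terminates in the demo.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors used only by this sketch.
var (
	errTimeout    = errors.New("follower read timeout")
	errRetryLimit = errors.New("retry limit exceeded")
)

// replica is a hypothetical model of one peer of a Region.
type replica struct {
	storeID   uint64
	reachable bool // TiDB can reach the store, so a liveness recheck succeeds
	caughtUp  bool // false when the peer is partitioned from the leader
}

// send simulates a follower/learner read against one peer.
func send(r replica) error {
	if !r.reachable || !r.caughtUp {
		return errTimeout
	}
	return nil
}

// nextPeer picks the peer to try after a failed send.
// fixed=false models the behaviour before this PR; fixed=true models the fix.
func nextPeer(fixed, followerRead, storeAlive bool, cur, n int) int {
	if fixed && followerRead {
		return (cur + 1) % n // skip the liveness recheck, try another peer now
	}
	if storeAlive {
		return cur // assumed: recheck passed, so the same peer is retried
	}
	return (cur + 1) % n
}

// readWithRetry models follower/learner reads only. It returns the store that
// served the read and the number of attempts used.
func readWithRetry(replicas []replica, fixed bool, maxAttempts int) (uint64, int, error) {
	cur := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		r := replicas[cur]
		if err := send(r); err == nil {
			return r.storeID, attempt, nil
		}
		cur = nextPeer(fixed, true, r.reachable, cur, len(replicas))
	}
	return 0, maxAttempts, errRetryLimit
}

// partitionScenario: store 1 is cut off from the leader but reachable from TiDB.
func partitionScenario() []replica {
	return []replica{
		{storeID: 1, reachable: true, caughtUp: false},
		{storeID: 2, reachable: true, caughtUp: true},
	}
}

func main() {
	fmt.Println(readWithRetry(partitionScenario(), false, 10)) // pre-fix policy
	fmt.Println(readWithRetry(partitionScenario(), true, 10))  // fixed policy
}
```

With the pre-fix policy, the model spins on store 1 until the cap is reached. With the fix, it moves to store 2 on the second attempt.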

Related changes

Needs to be cherry-picked to the release branch 4.0

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

n/a

Release note

Fix infinite follower/learner retry when network partition only between the leader and follower/learner


Signed-off-by: sre-bot <sre-bot@pingcap.com>

sre-bot · 2020-05-27T06:03:56Z

/run-all-tests

crazycs520

LGTM

sre-bot · 2020-05-29T15:38:03Z

@crazycs520, @coocood, @jackysp, @hicqu, @lzmhhh123, PTAL.

jackysp

LGTM

jackysp · 2020-05-30T06:04:07Z

/merge

sre-bot · 2020-05-30T06:04:17Z

/run-all-tests

cherry pick pingcap#17441 to release-4.0

bd2d5d1

Signed-off-by: sre-bot <sre-bot@pingcap.com>

sre-bot mentioned this pull request May 27, 2020

tikv: fix infinite follower/learner retry when network partition only between leader and follower/learner #17441

Merged

sre-bot added component/tikv priority/release-blocker This issue blocks a release. Please solve it ASAP. status/PTAL type/4.0-cherry-pick type/bugfix This PR fixes a bug. labels May 27, 2020

sre-bot requested review from coocood, hicqu, jackysp and lzmhhh123 May 27, 2020 06:03

sre-bot added this to the v4.0.0-ga milestone May 27, 2020

sre-bot assigned lysu May 27, 2020

lysu removed the priority/release-blocker This issue blocks a release. Please solve it ASAP. label May 27, 2020

lysu modified the milestones: v4.0.0-ga, v4.0.1 May 27, 2020

crazycs520 reviewed May 27, 2020


jackysp approved these changes May 30, 2020


sre-bot added the status/can-merge Indicates a PR has been approved by a committer. label May 30, 2020

sre-bot merged commit abe3e72 into pingcap:release-4.0 May 30, 2020

bb7133 modified the milestones: v4.0.1, v4.0.2 Jun 6, 2020
