tikv: fix infinite follower/learner retry when network partition only between leader and follower/learner #17441
Codecov Report
@@ Coverage Diff @@
## master #17441 +/- ##
================================================
- Coverage 79.8205% 79.8075% -0.0130%
================================================
Files 520 520
Lines 139840 139686 -154
================================================
- Hits 111621 111480 -141
+ Misses 19259 19256 -3
+ Partials 8960 8950 -10
LGTM
LGTM
/merge
/run-all-tests
/run-cherry-picker
Signed-off-by: sre-bot <sre-bot@pingcap.com>
cherry pick to release-4.0 in PR #17443
What problem does this PR solve?
Issue Number: close #17442
Problem Summary:
In #16933 we introduced a mechanism that rechecks store liveness when sending a request fails. It works well for leader-based requests,
but for follower or learner requests it may cause infinite retries.
When there is a network partition between the leader and the followers/learners, while TiDB-Server can still reach the followers and learners, the followers and learners return timeout errors because they cannot catch up with the leader. The store liveness recheck still succeeds, so the same peer keeps being retried; in this situation it is better to retry other peers immediately.
What is changed and how it works?
What's Changed:
Retry immediately instead of checking store liveness when the request is a follower/learner read.
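The decision described above can be sketched roughly as follows. This is a minimal illustration only, not the code from this PR: the names `replicaKind`, `action`, and `onSendFail` are hypothetical and do not correspond to identifiers in the TiDB code base.

```go
package main

import "fmt"

// replicaKind is a hypothetical label for the kind of peer a read targets.
type replicaKind int

const (
	leaderRead replicaKind = iota
	followerRead
	learnerRead
)

// action is a hypothetical label for what the client does after a send failure.
type action int

const (
	recheckStoreLiveness action = iota // probe the store before deciding how to retry
	retryOtherPeer                     // skip the probe and try another peer right away
)

// onSendFail sketches the rule from the PR description: follower and learner
// reads retry another peer immediately, because a store that is reachable from
// TiDB but partitioned from the leader would still pass the liveness check.
// Leader-based requests keep the liveness recheck.
func onSendFail(kind replicaKind) action {
	if kind == followerRead || kind == learnerRead {
		return retryOtherPeer
	}
	return recheckStoreLiveness
}

func main() {
	for _, k := range []replicaKind{leaderRead, followerRead, learnerRead} {
		fmt.Println(k, onSendFail(k) == retryOtherPeer)
	}
}
```

The sketch only captures the branch on the request type; how the next peer is chosen and how backoff is applied are left out because the description does not cover them.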
Related changes
Check List
Tests
Side effects
Release note