
When the leader is selected as the target peer first, no other replicas can be retried for stale read #906

Closed
cfzjywxk opened this issue Jul 24, 2023 · 5 comments · Fixed by #942

Comments

@cfzjywxk
Contributor

When the leader peer is first picked as the target (matching the configured labels) and it is unavailable because of region errors, the replica selector keeps retrying the leader peer and never recovers, even though follower replicas may be able to serve the request successfully.

For example:

  1. Consider 3 AZs, each with 1 tidb-server and 1 tikv-server, and a region whose leader peer is in AZ a.
  2. A stale read request is triggered by the tidb-server in AZ a; the tikv-server in AZ a is selected as the target peer because of the label configuration.
  3. Meanwhile, a network partition happens between the tidb and tikv nodes in AZ a. When errors are encountered while processing stale read requests, the leader peer is unconditionally retried.
  4. The leader is unavailable, so tidb keeps raising pseudo "epoch not match" errors and retrying the same peer.
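The steps above can be sketched as a small simulation. This is a hypothetical illustration, not the actual client-go selector: the `replica` type and `pickPeer` function are made up to show why an unconditional fall-back-to-leader loops forever when the leader itself is unreachable, and how a further fallback to followers recovers.

```go
package main

import "fmt"

// replica is an illustrative stand-in for a region peer.
type replica struct {
	id        int
	isLeader  bool
	available bool
}

// pickPeer models the retry decision. Without fallbackToFollower it mirrors
// the buggy behaviour: only the leader is ever retried. With it, an
// unavailable leader is followed by a replica-read attempt on any follower.
func pickPeer(replicas []replica, fallbackToFollower bool) (int, bool) {
	// After the local stale-read attempt fails, the leader is tried first.
	for _, r := range replicas {
		if r.isLeader && r.available {
			return r.id, true
		}
	}
	if !fallbackToFollower {
		// Old behaviour: keep retrying the unavailable leader and never succeed.
		return -1, false
	}
	// Fixed behaviour: fall back to any available follower with replica-read.
	for _, r := range replicas {
		if !r.isLeader && r.available {
			return r.id, true
		}
	}
	return -1, false
}

func main() {
	// Leader in AZ a is partitioned away; followers in AZs b and c are fine.
	replicas := []replica{
		{id: 1, isLeader: true, available: false},
		{id: 2, available: true},
		{id: 3, available: true},
	}
	_, ok := pickPeer(replicas, false)
	fmt.Println("without fallback, request succeeds:", ok)
	id, ok := pickPeer(replicas, true)
	fmt.Println("with fallback, request succeeds:", ok, "peer:", id)
}
```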
@you06
Contributor

you06 commented Aug 4, 2023

#916 fixed this issue; it falls back to follower read when the leader is unavailable.

@cfzjywxk
Contributor Author

cfzjywxk commented Aug 4, 2023

> #916 fixed this issue; it falls back to follower read when the leader is unavailable.

@you06 #916 introduces the fallback to follower when ServerIsBusy is returned; are the unavailable and RPC error cases not handled yet?

@you06
Contributor

you06 commented Aug 4, 2023

@cfzjywxk I also made some changes in #910, which handle the unavailable and RPC error cases.

// In stale-read, the request will fallback to leader after the local follower failure.
// If the leader is also unavailable, we can fallback to the follower and use replica-read flag again,
// The remote follower not tried yet, and the local follower can retry without stale-read flag.
if state.isStaleRead {
	selector.state = &tryFollower{
		fallbackFromLeader: true,
		leaderIdx:          state.leaderIdx,
		lastIdx:            state.leaderIdx,
		labels:             state.option.labels,
	}
	if leaderEpochStale {
		selector.regionCache.scheduleReloadRegion(selector.region)
	}
	return nil, stateChanged{}
}

@cfzjywxk
Contributor Author

cfzjywxk commented Aug 4, 2023

@you06 OK, you could close this issue in #910 then.

@cfzjywxk
Contributor Author

@you06 Please close this issue once the PR to master is merged.
