ebs br: restore could hang if some tikv nodes are killed or restarted #45206
Comments
/assign @YuJuncen
This happens because, while we are in recovery mode, all elections are suspended until BR chooses the leader. The problem arises when a store goes down AFTER BR has chosen the leader: once the store reboots, its leaders are dropped. Because we are still in recovery mode, no new leaders can be elected, so the restore hangs.
A possible solution is to extend the recovery mode so that it has 3 stages:
Once BR detects a TiKV outage (perhaps by opening a no-op TCP connection to the gRPC port of each TiKV), BR will:
cc @hicqu, do you have any good ideas?
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Kill some TiKV nodes during the EBS BR restore phase
2. What did you expect to see? (Required)
EBS BR restore continues and succeeds
3. What did you see instead (Required)
EBS BR restore hangs
4. What is your TiDB version? (Required)
TiDB 6.5 and above