Speed up balance leader #4610
Comments
cc @CabinfeverB, would you like to take a look? PTAL @rleungx
I will take a look.
/assign @CabinfeverB
Motivation
Currently, the MinScheduleInterval parameter determines the balance-leader speed. With MinScheduleInterval equal to 10 ms, balance-leader can generate at most 100 op/s. If 100K regions need their leaders balanced when a big cluster (2M regions) restarts, it will take 30 minutes. This is an unacceptable time cost.
Detailed Design
Considering that the scheduler should not be triggered too frequently, we decided to add a batch field to the balance-leader scheduler. It speeds up leader balancing by increasing the number of operators generated in each scheduling round.
In TiKV, since we believe that the performance overhead of transferring a leader within a raft group is small, the transfer-leader operator does not consume the store limit. This means that regions can be repeatedly selected from the same store, so a priority-queue-like idea can be adopted. Operators are extracted from the store with the highest/lowest leader score, and after each one we calculate its influence to adjust the top of the 'heap'. We keep extracting from that store until it is really impossible to extract another operator, then move on to the next highest/lowest store, and so on.
Usage Desc
Since we think the balance leader is an urgent scheduler, we set the
Development Plan
Subtasks
#4652 must be involved
Test Plan
Under the same cluster size, it should be possible to obtain an approximate linear optimization by testing the time to reach the equilibrium state when the
In order to test whether the original goal can be achieved, it is best to have a large cluster to do the test.
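The priority-queue-like selection described in the design can be sketched roughly as follows. This is a minimal illustration in Go, not the PD implementation: the `store` and `transfer` types, the `scheduleBatch` function, and the use of the raw leader count as the score are all assumptions made for the example, and re-sorting a small slice stands in for adjusting the top of a heap.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a hypothetical, simplified stand-in for a TiKV store.
// The leader score is assumed to be the plain leader count.
type store struct {
	id      uint64
	leaders int
}

// transfer is a hypothetical stand-in for a transfer-leader operator.
type transfer struct {
	from, to uint64
}

// scheduleBatch returns up to batch transfers for one scheduling round.
// After each transfer the counts of both stores are adjusted (the
// "influence") and the ordering is refreshed, so the same store can be
// picked again while it remains the most loaded. The stores slice is
// sorted in place and the leader counts are updated.
func scheduleBatch(stores []*store, batch int) []transfer {
	ops := make([]transfer, 0, batch)
	for len(ops) < batch && len(stores) > 1 {
		// Highest leader count first, lowest last.
		sort.SliceStable(stores, func(i, j int) bool {
			return stores[i].leaders > stores[j].leaders
		})
		src, dst := stores[0], stores[len(stores)-1]
		// Stop when one more transfer would not improve the balance.
		if src.leaders-dst.leaders <= 1 {
			break
		}
		src.leaders--
		dst.leaders++
		ops = append(ops, transfer{from: src.id, to: dst.id})
	}
	return ops
}

func main() {
	stores := []*store{{id: 1, leaders: 10}, {id: 2, leaders: 2}, {id: 3, leaders: 3}}
	ops := scheduleBatch(stores, 4)
	fmt.Println(len(ops), ops)
}
```

A real scheduler would also have to pick a concrete region for each transfer and skip stores or regions that cannot take part, which this sketch leaves out.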
close #4610
add lock to avoid data race in balance-leader-scheduler
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Development Task
Currently, balance leader can reach at most 100 op/s. We want to increase the speed when a big cluster (2M regions) restarts.
One possible way is like #4008.