Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: allow balance-leader-scheduler generate multiple operators #4652

Merged
merged 19 commits into from
Mar 14, 2022

Conversation

CabinfeverB
Copy link
Member

@CabinfeverB CabinfeverB commented Feb 17, 2022

Signed-off-by: Cabinfever_B cabinfeveroier@gmail.com

What problem does this PR solve?

Issue Number: ref #4610
speed up balance leader

What is changed and how it works?

Add a Batch param like #4008 that the scheduler can generate multi operators.

speed up balance leader by batch

Check List

Tests

  • Unit test

Release note

Speed up balance leader

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
@ti-chi-bot
Copy link
Member

ti-chi-bot commented Feb 17, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • nolouch
  • rleungx

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Feb 17, 2022
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
@codecov
Copy link

codecov bot commented Feb 17, 2022

Codecov Report

Merging #4652 (785ade2) into master (6d4d8a5) will increase coverage by 0.02%.
The diff coverage is 97.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4652      +/-   ##
==========================================
+ Coverage   75.03%   75.06%   +0.02%     
==========================================
  Files         284      284              
  Lines       27942    27985      +43     
==========================================
+ Hits        20966    21006      +40     
- Misses       5110     5111       +1     
- Partials     1866     1868       +2     
Flag Coverage Δ
unittests 75.06% <97.84%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
server/schedulers/balance_leader.go 92.50% <97.67%> (+0.83%) ⬆️
server/cluster/coordinator.go 74.17% <100.00%> (ø)
server/schedule/operator_controller.go 84.44% <100.00%> (-0.04%) ⬇️
server/tso/local_allocator.go 64.86% <0.00%> (-6.76%) ⬇️
server/tso/tso.go 61.14% <0.00%> (-5.72%) ⬇️
pkg/etcdutil/etcdutil.go 84.70% <0.00%> (-3.53%) ⬇️
server/tso/allocator_manager.go 63.99% <0.00%> (-1.79%) ⬇️
server/election/lease.go 86.15% <0.00%> (-1.54%) ⬇️
server/schedule/operator/operator.go 93.57% <0.00%> (-1.43%) ⬇️
server/schedule/operator/step.go 74.07% <0.00%> (-0.68%) ⬇️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6d4d8a5...785ade2. Read the comment docs.

@CabinfeverB CabinfeverB marked this pull request as ready for review February 18, 2022 05:41
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 18, 2022
@CabinfeverB CabinfeverB changed the title speed up balance leader scheduler: allow balance-leader-scheduler generate multiple operators Feb 18, 2022
@@ -821,6 +821,14 @@ func (oc *OperatorController) GetFastOpInfluence(cluster Cluster, influence oper
}
}

// AddOpInfluence add operator influence for cluster
func AddOpInfluence(op *operator.Operator, influence operator.OpInfluence, cluster Cluster) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is it used for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change store leader score to sort again

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use this function on L841-845 in the function NewTotalOpInfluence?

@@ -51,6 +51,13 @@ func init() {
}
conf.Ranges = ranges
conf.Name = BalanceLeaderName
conf.Batch = 1
if len(args) > len(ranges)*2 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no need to change it.

@@ -161,48 +170,70 @@ func (l *balanceLeaderScheduler) Schedule(cluster schedule.Cluster) []*operator.
targets[j].LeaderScore(leaderSchedulePolicy, jOp)
})

usedStores := make(map[uint64]struct{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the stores can be used more than once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has changed

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
@CabinfeverB
Copy link
Member Author

/cc @HunDunDM

@ti-chi-bot ti-chi-bot requested a review from HunDunDM March 8, 2022 08:38
return result
}

func sortStores(stores []*core.StoreInfo, pos int, less func(i, j int) bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about resortStores?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

Comment on lines 168 to 169
sourcePoint := 0
targetPoint := 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at least some comments are needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rafactor it

@@ -51,6 +51,7 @@ func init() {
}
conf.Ranges = ranges
conf.Name = BalanceLeaderName
conf.Batch = 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about using a constant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

for sourcePoint < len(sources) || targetPoint < len(targets) {
if sourcePoint < len(sources) {
used := false
plan.source, plan.target = sources[sourcePoint], nil
retryLimit := l.retryQuota.GetLimit(plan.source)
log.Debug("store leader score", zap.String("scheduler", l.GetName()), zap.Uint64("source-store", plan.SourceStoreID()))
l.counter.WithLabelValues("high-score", plan.SourceMetricLabel()).Inc()
for j := 0; j < retryLimit; j++ {
schedulerCounter.WithLabelValues(l.GetName(), "total").Inc()
if ops := l.transferLeaderOut(plan); len(ops) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing it to op?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

for sourcePoint < len(sources) || targetPoint < len(targets) {
if sourcePoint < len(sources) {
used := false
plan.source, plan.target = sources[sourcePoint], nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, updated

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
@@ -821,6 +821,14 @@ func (oc *OperatorController) GetFastOpInfluence(cluster Cluster, influence oper
}
}

// AddOpInfluence add operator influence for cluster
func AddOpInfluence(op *operator.Operator, influence operator.OpInfluence, cluster Cluster) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use this function on L841-845 in the function NewTotalOpInfluence?

sort.Slice(bc.targets, func(i, j int) bool {
iOp := bc.plan.GetOpInfluence(bc.targets[i].GetID())
jOp := bc.plan.GetOpInfluence(bc.targets[j].GetID())
return bc.targets[i].LeaderScore(bc.plan.kind.Policy, iOp) <
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about defining two functions? With the name of the function, the maintainer can easily find the slice is ordered by desc or asc.


// hasOptionalStore returns whether there is remaining and optional stores
func (bc *batchController) hasOptionalStore() bool {
return bc.sourceIndex < len(bc.sources) || bc.targetIndex < len(bc.targets)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return bc.sourceIndex < len(bc.sources) || bc.targetIndex < len(bc.targets)
return bc.hasOptionalSourceStore() || bc.hasOptionalTargetStore()

return bc.sourceIndex < len(bc.sources) || bc.targetIndex < len(bc.targets)
}

// hasOptionalSourceStore returns whether there is remaining and optional source stores
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

`hasSourceStore` returns true when there are leftover source stores.

@rleungx
Copy link
Member

rleungx commented Mar 9, 2022

Some functions are only used once and it's so easy. For these functions, adding some comments is enough.

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Copy link
Contributor

@nolouch nolouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 14, 2022
store := cs.getStore()
retryLimit := l.retryQuota.GetLimit(store)
var creator func(*balancePlan) *operator.Operator
log.Debug("store leader score", zap.String("scheduler", l.GetName()), zap.Uint64(dir, store.GetID()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no score field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement didn't have it before, but it's better to add it.

l.counter.WithLabelValues("low-score", plan.TargetMetricLabel()).Inc()
creator = l.transferLeaderIn
}
var op *operator.Operator = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var op *operator.Operator = nil
var op *operator.Operator

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
}
}

func leaderScore(store *core.StoreInfo, plan *balancePlan) float64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to remove this function and move GetOpInfluence directly into xxxOption.

Copy link
Member

@rleungx rleungx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest LGTM

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Mar 14, 2022
@nolouch
Copy link
Contributor

nolouch commented Mar 14, 2022

/merge

@ti-chi-bot
Copy link
Member

@nolouch: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 9a32d17

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 14, 2022
@ti-chi-bot ti-chi-bot merged commit c78c652 into tikv:master Mar 14, 2022
ti-chi-bot added a commit that referenced this pull request Mar 16, 2022
ref #4610, ref #4652

Add `balance-leader-leader` config API.

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
ti-chi-bot added a commit that referenced this pull request Mar 16, 2022
ref #4610, ref #4652, ref #4655

pdctl supports update balance-leader-scheduler config

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
rleungx pushed a commit to rleungx/pd that referenced this pull request Mar 17, 2022
ref tikv#4610, ref tikv#4652, ref tikv#4655

pdctl supports update balance-leader-scheduler config

Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants