operator: rewrite move region related functions #1667
@Connor1996 PTAL
Please add and remove peer one by one. And also make sure the leader is the last to be removed.
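The ordering requested above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: `Step` and `planMove` are hypothetical names, steps are plain strings rather than real `OpStep` values, and the caller is assumed to have already chosen `newLeader`.

```go
package main

import "fmt"

// Step is a hypothetical stand-in for an operator step; it only carries a description.
type Step string

// planMove alternates "add peer" and "remove peer" steps so the region changes by
// one peer at a time. If the leader's store is among the stores to remove, it is
// moved to the end of the removal order and preceded by a leader transfer.
func planMove(leader uint64, toAdd, toRemove []uint64, newLeader uint64) []Step {
	// Reorder removals so that the leader's store comes last.
	ordered := make([]uint64, 0, len(toRemove))
	leaderRemoved := false
	for _, s := range toRemove {
		if s == leader {
			leaderRemoved = true
			continue
		}
		ordered = append(ordered, s)
	}
	if leaderRemoved {
		ordered = append(ordered, leader)
	}

	steps := make([]Step, 0, len(toAdd)+len(ordered)+1)
	for i := 0; i < len(toAdd) || i < len(ordered); i++ {
		if i < len(toAdd) {
			steps = append(steps, Step(fmt.Sprintf("add peer on store %d", toAdd[i])))
		}
		if i < len(ordered) {
			if ordered[i] == leader {
				// The leader must hand over leadership before its peer is removed.
				steps = append(steps, Step(fmt.Sprintf("transfer leader from store %d to store %d", leader, newLeader)))
			}
			steps = append(steps, Step(fmt.Sprintf("remove peer on store %d", ordered[i])))
		}
	}
	return steps
}

func main() {
	// Move a region from stores {1 (leader), 2} to stores {4, 5}.
	for _, s := range planMove(1, []uint64{4, 5}, []uint64{1, 2}, 4) {
		fmt.Println(s)
	}
}
```

With these inputs the sketch adds store 4, removes store 2, adds store 5, transfers the leader from store 1 to store 4, and only then removes store 1.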
/rebuild
@Connor1996 PTAL
server/schedule/operator/operator.go
Outdated
var steps = make([]OpStep, 0, len(addPeerSteps)*2+len(rmPeerSteps)+len(tlSteps))
i, j := 0, 0
for ; i < len(addPeerSteps) && j < len(rmPeerSteps); i, j = i+1, j+1 {
Could you add some comments to explain why we want to generate the steps like this?
Actually, I'm not very clear about why we should add and remove peers one by one. Could you give some reasons? I will add them to the comments.
I think this is not as clear as before.
server/schedule/operator/operator.go
Outdated
}

// transferLeaderToAnySteps returns the first suitable store to become region leader,
func transferLeaderToAnySteps(leaderID uint64, storeIDs []uint64, cluster Cluster) (OpKind, []OpStep) {
How about just returning OpStep?
If we regard this function as a black box, we should not assume it will only return a TransferLeader step if it succeeds.
What's more, I'd like to make the *Steps functions have similar returns. Now there are 3 such functions that have different returns:
* transferLeaderToAnySteps (this function): no error.
* CreateAddPeerSteps and CreateAddLightPeerSteps: no OpKind and no error.
I will add an error for this function, because it may fail.
The other two functions are exported, so I didn't touch them.
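A minimal sketch of the return shape discussed above, where the kind, the steps, and an error are returned together. Everything here is a simplified assumption for illustration: `OpKind`, `OpStep`, and `TransferLeader` are cut-down stand-ins for the real types, `transferLeaderToAnySketch` is a hypothetical name, and the `accept` predicate replaces the `cluster Cluster` argument as a placeholder for whatever check decides that a store may become leader.

```go
package main

import (
	"errors"
	"fmt"
)

// OpKind, OpStep and TransferLeader are simplified stand-ins, not the real PD types.
type OpKind uint32

const OpLeader OpKind = 1

type OpStep interface{ String() string }

type TransferLeader struct{ FromStore, ToStore uint64 }

func (t TransferLeader) String() string {
	return fmt.Sprintf("transfer leader from store %d to store %d", t.FromStore, t.ToStore)
}

// transferLeaderToAnySketch returns steps that move the leader to the first store
// in storeIDs that is not the current leader and that accept() allows.
// It returns an error when no store qualifies, so callers do not have to infer
// failure from an empty slice.
func transferLeaderToAnySketch(leaderID uint64, storeIDs []uint64, accept func(storeID uint64) bool) (OpKind, []OpStep, error) {
	for _, id := range storeIDs {
		if id == leaderID || !accept(id) {
			continue
		}
		return OpLeader, []OpStep{TransferLeader{FromStore: leaderID, ToStore: id}}, nil
	}
	return 0, nil, errors.New("no suitable store to become region leader")
}

func main() {
	// Assumption for the example: store 2 must not hold leaders.
	acceptsLeader := func(storeID uint64) bool { return storeID != 2 }

	kind, steps, err := transferLeaderToAnySketch(1, []uint64{1, 2, 3}, acceptsLeader)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(kind, steps[0])
}
```

Returning a slice rather than a single step keeps the caller from assuming that success always means exactly one TransferLeader step.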
@Connor1996 @rleungx PTAL
Co-Authored-By: Ryan Leung <rleungx@gmail.com>
/rebuild
Codecov Report
@@            Coverage Diff             @@
##           master    #1667      +/-   ##
==========================================
- Coverage   76.69%   76.68%   -0.02%
==========================================
  Files         157      157
  Lines       15489    15485       -4
==========================================
- Hits        11880    11875       -5
  Misses       2593     2593
- Partials     1016     1017       +1
/rebuild
rest LGTM.
@Connor1996 @nolouch @rleungx PTAL
LGTM
// c == [opA1, opA2, opB1, opA3, opB2, opA4, opA5, opA6, opB3, opB4, opB5, opB6]
//
// sizeHint is a hint for the length of returned slice.
func interleaveStepGroups(a, b [][]OpStep, sizeHint int) []OpStep {
Why do we need this hint?
This hint is just like the second and third arguments of make(): it is there for performance and is not strictly necessary.
/rebuild
/rebuild
/rebuild
@Connor1996 PTAL.
LGTM
/run-all-tests
* *: unify get store function everywhere (#1671) Signed-off-by: Ryan Leung <rleungx@gmail.com> * remove unnecessary parentheses * server: use leader lease to determine tso service validity (#1676) Signed-off-by: disksing <i@disksing.com> * change internal stat values to float64 * add pending operator influence * add metrics of pending influence * fix metrics * fix panic * adjust pending influence of balanceHotWrite * change weight of pending influence * test: fix tests (#1696) * test: fix region syncer test Signed-off-by: disksing <i@disksing.com> * decrease region rolling window; store pending influence in scheduler * add config-check flag for pd-server (#1695) Signed-off-by: cwen0 <cwenyin0@gmail.com> * decrease possiblility transfer hot write leader * change pending influence weight * add unstarted op metrics * add logs for debug * add log for debug * add logs for debug * add logs for debug * add logs for debug * add logs for debug * add logs for debug * add logs for debug * Revert "add logs for debug" This reverts commit e74c7a9. 
* add metrics for hotspot operators * operator: rewrite move region related functions (#1667) * add metrics for pending operators * *: support setting endKey for ScanRange (#1700) Signed-off-by: disksing <i@disksing.com> * fix bug * fix bug * fix bug * fix metrics thread-safe bug * fix logic bug * *: reduce some unnecessary parameters (#1698) Signed-off-by: Ryan Leung <rleungx@gmail.com> * schedule: Do not send an operator of a region wth a stale epoch (#1659) * schedule: Do not send an operator of a region wth a stale epoch Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * schedule: check the version changed by the operator self Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * schedule: fix unit test Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * schedule: fix to avoid dispatching a stale opstep Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * dispatch: refactor "ConsumeConfVer() int" to "ExpectConfVerChange() bool" Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * dispatch: fix typo in comment Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * fix typo Co-Authored-By: Ryan Leung <rleungx@gmail.com> * dispatch: fix unittest Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * dispatch: refine format Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * server: fix the dead lock in scatter region (#1706) Signed-off-by: Ryan Leung <rleungx@gmail.com> * add drop time for operator * use IsDropped to recognize canceled ops * try to fix trans leader burst * try to fix trans leader burst * add zombie influence * change select src dst strategy; improve op_controller * change select src strategy * fix bug * tools: fix set namespace in pd-ctl (#1701) Signed-off-by: Ryan Leung <rleungx@gmail.com> * tools: fix parse url without http prefix (#1703) Signed-off-by: Ryan Leung <rleungx@gmail.com> * tests: support deadlock detection in make test (#1704) Signed-off-by: Ryan Leung <rleungx@gmail.com> * Makefile: fix failpoint enable (#1722) Signed-off-by: 
nolouch <nolouch@gmail.com> * checker: fix the issue that a region does not merge to the sibling with smaller size (#1723) Signed-off-by: disksing <i@disksing.com> * tools: balance region simulator (#1708) * scheduler: do not remove the operator when the step does not finish (#1715) Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * operator: fix the AddLearner config version judgment (#1732) Signed-off-by: nolouch <nolouch@gmail.com> * tools: fix TLS in pd control (#1729) Signed-off-by: Ryan Leung <rleungx@gmail.com> * syncer: support TLS for region syncer (#1728) Signed-off-by: Ryan Leung <rleungx@gmail.com> * schedule: fix a thread-safe bug and improve code (#1719)
* *: unify get store function everywhere (#1671) Signed-off-by: Ryan Leung <rleungx@gmail.com> * server: use leader lease to determine tso service validity (#1676) Signed-off-by: disksing <i@disksing.com> * test: fix tests (#1696) * test: fix region syncer test Signed-off-by: disksing <i@disksing.com> * add config-check flag for pd-server (#1695) Signed-off-by: cwen0 <cwenyin0@gmail.com> * operator: rewrite move region related functions (#1667) * *: support setting endKey for ScanRange (#1700) Signed-off-by: disksing <i@disksing.com> * *: reduce some unnecessary parameters (#1698) Signed-off-by: Ryan Leung <rleungx@gmail.com> * schedule: Do not send an operator of a region wth a stale epoch (#1659) * schedule: Do not send an operator of a region wth a stale epoch Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * schedule: check the version changed by the operator self Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * schedule: fix unit test Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * schedule: fix to avoid dispatching a stale opstep Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * dispatch: refactor "ConsumeConfVer() int" to "ExpectConfVerChange() bool" Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * dispatch: fix typo in comment Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * fix typo Co-Authored-By: Ryan Leung <rleungx@gmail.com> * dispatch: fix unittest Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * dispatch: refine format Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * server: fix the dead lock in scatter region (#1706) Signed-off-by: Ryan Leung <rleungx@gmail.com> * tools: fix set namespace in pd-ctl (#1701) Signed-off-by: Ryan Leung <rleungx@gmail.com> * tools: fix parse url without http prefix (#1703) Signed-off-by: Ryan Leung <rleungx@gmail.com> * tests: support deadlock detection in make test (#1704) Signed-off-by: Ryan Leung <rleungx@gmail.com> * Makefile: fix failpoint enable (#1722) Signed-off-by: nolouch 
<nolouch@gmail.com> * checker: fix the issue that a region does not merge to the sibling with smaller size (#1723) Signed-off-by: disksing <i@disksing.com> * tools: balance region simulator (#1708) * scheduler: do not remove the operator when the step does not finish (#1715) Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> * operator: fix the AddLearner config version judgment (#1732) Signed-off-by: nolouch <nolouch@gmail.com> * tools: fix TLS in pd control (#1729) Signed-off-by: Ryan Leung <rleungx@gmail.com> * syncer: support TLS for region syncer (#1728) Signed-off-by: Ryan Leung <rleungx@gmail.com> * schedule: fix a thread-safe bug and improve code (#1719) * statistics: fix region flow calculation (#1688) Signed-off-by: jiyingtk <jiyingtk@mail.ustc.edu.cn> * makefile: improve deadlock-enable/disable (#1736) * api: fix missing keys statistic in region information (#1741) Signed-off-by: nolouch <nolouch@gmail.com> * *: update go version to 1.13 (#1742) Signed-off-by: disksing <i@disksing.com> * coordinator: add the operator cost time in log field (#1748) Signed-off-by: nolouch <nolouch@gmail.com>
What problem does this PR solve?
* matchPeerSteps much more clear.
* CreateMoveRegionOperator and matchPeerSteps don't check RejectLeader label.
What is changed and how it works?
* moveRegionSteps and reuse it in matchPeerSteps.
* transferLeaderToAnySteps to select new leader.
Check List
Tests