server, core: support add learner node #896

Merged
merged 56 commits into from
Apr 13, 2018

Connor1996
Member

@Connor1996 Connor1996 commented Dec 29, 2017

This PR does the following:

  1. If raft learner is enabled, PD uses AddLearnerPeer and PromoteLearnerPeer in place of AddNode.
  2. The scheduler will not select a region that has learners.
  3. To recover from an unfinished operator, PD checks for regions that have learners but no operator, and adds a PromoteLearnerPeer operator for each of them.

Note that peers = voters + learners.
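
As a rough illustration of item 1, the sketch below shows how the choice between a direct add and the two-step learner flow might look. The `OperatorStep` type and the `addPeerSteps` helper are hypothetical names used only for this example; they are not PD's actual types.

```go
package main

// OperatorStep is a hypothetical, simplified stand-in for a PD schedule step.
type OperatorStep struct {
	Kind    string // "AddPeer", "AddLearnerPeer", or "PromoteLearnerPeer"
	ToStore uint64
	PeerID  uint64
}

// addPeerSteps returns the steps needed to end up with a new voter on a store.
// With learners enabled, the peer first joins as a non-voting learner and is
// promoted in a second step; otherwise it is added as a voter directly.
func addPeerSteps(enableRaftLearner bool, storeID, peerID uint64) []OperatorStep {
	if enableRaftLearner {
		return []OperatorStep{
			{Kind: "AddLearnerPeer", ToStore: storeID, PeerID: peerID},
			{Kind: "PromoteLearnerPeer", ToStore: storeID, PeerID: peerID},
		}
	}
	return []OperatorStep{{Kind: "AddPeer", ToStore: storeID, PeerID: peerID}}
}
```

Splitting the operation into two steps means the new peer can catch up on data before it counts toward the voting quorum.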

@Connor1996 Connor1996 changed the title server, core: support add learner node [WIP]server, core: support add learner node Dec 29, 2017
server/config.go Outdated
@@ -333,6 +333,8 @@ type ScheduleConfig struct {
ReplicaScheduleLimit uint64 `toml:"replica-schedule-limit,omitempty" json:"replica-schedule-limit"`
// TolerantSizeRatio is the ratio of buffer size for balance scheduler.
TolerantSizeRatio float64 `toml:"tolerant-size-ratio,omitempty" json:"tolerant-size-ratio"`
// EnableRaftLearner is the switch for using AddLearnerNode instead of AddNode
Contributor

switch -> option.

@@ -472,6 +472,32 @@ func (c *coordinator) sendScheduleCommand(region *core.RegionInfo, step schedule
},
}
c.hbStreams.sendMsg(region, cmd)
case schedule.AddLearnerPeer:
Contributor

Can we remove a learner peer? I believe we need this feature in some cases.

Member Author

I think we can reuse RemovePeer

DownPeers []*pdpb.PeerStats
PendingPeers []*metapb.Peer
PendingLearnerPeers []*metapb.Peer
CompleteLearnerPeers []*metapb.Peer
Contributor

Any document about how the CompleteLearner works? Specifically, when will a peer be added to the list and when will it be removed?


// Influence calculates the store difference that current step make
func (plp PromoteLearnerPeer) Influence(opInfluence OpInfluence, region *core.RegionInfo) {
return
Contributor

You can just leave the method body empty.

// IsFinish checks if current step is finished.
func (alp AddLearnerPeer) IsFinish(region *core.RegionInfo) bool {
if p := region.GetStorePeer(alp.ToStore); p != nil {
return region.GetPendingLearnerPeer(p.GetId()) == nil && region.GetCompleteLearnerPeer(p.GetId()) != nil
Contributor

AFAIU, a learner peer will be included in the peers list. This will affect some existing utilities, such as RandomFollower. In some cases we should not select a learner as a follower; for example, we cannot transfer leadership to a learner peer.
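
A minimal sketch of the concern, assuming a simplified `Peer` struct with an `IsLearner` flag (a hypothetical stand-in, not the real `metapb.Peer`): any helper that picks a leader-transfer target from the full peers list needs to skip learners.

```go
package main

// Peer is a hypothetical, simplified stand-in for a region peer.
type Peer struct {
	ID        uint64
	StoreID   uint64
	IsLearner bool
}

// transferLeaderCandidates returns the peers that may receive leadership:
// voters other than the current leader. Learners are skipped because they
// do not vote and cannot become leader.
func transferLeaderCandidates(peers []Peer, leaderID uint64) []Peer {
	var out []Peer
	for _, p := range peers {
		if p.IsLearner || p.ID == leaderID {
			continue
		}
		out = append(out, p)
	}
	return out
}
```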

@disksing
Contributor

disksing commented Jan 2, 2018

We should consider how to recover from an unfinished operator. If the PD leader changes, we need to either remove the learner or promote it.
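
One way to picture the recovery path is the sketch below. It follows the PR description's choice of promoting leftover learners; the `Region` struct, the `hasOperator` map, and `learnersToPromote` are hypothetical names for illustration, not PD's API.

```go
package main

// Region is a hypothetical, simplified view of a region's learner state.
type Region struct {
	ID         uint64
	LearnerIDs []uint64 // peer IDs of the learners in this region
}

// learnersToPromote scans regions and returns, keyed by region ID, the learner
// peer IDs that should get a promote step: learners in regions that currently
// have no running operator (for example, after a PD leader change dropped it).
func learnersToPromote(regions []Region, hasOperator map[uint64]bool) map[uint64][]uint64 {
	out := make(map[uint64][]uint64)
	for _, r := range regions {
		if len(r.LearnerIDs) == 0 || hasOperator[r.ID] {
			continue
		}
		// Copy the slice so callers cannot mutate the region's own list.
		out[r.ID] = append([]uint64(nil), r.LearnerIDs...)
	}
	return out
}
```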

@Connor1996
Member Author

PTAL @disksing @nolouch

@Connor1996 Connor1996 changed the title [WIP]server, core: support add learner node [DNM]server, core: support add learner node Feb 9, 2018
}

// ClassifyVoterAndLearner sorts out voter and learner from peers into different slice.
func ClassifyVoterAndLearner(region *RegionInfo) {
Contributor

do we need to export this function?

server/config.go Outdated
@@ -392,6 +395,7 @@ const (
defaultReplicaScheduleLimit = 32
defaultMergeScheduleLimit = 20
defaultTolerantSizeRatio = 2.5
defaultEnableRaftLearner = false
Contributor

remove this unused variable.

@@ -119,6 +157,26 @@ func (r *RegionInfo) GetDownPeer(peerID uint64) *metapb.Peer {
return nil
}

// GetDownVoter returns the down peer with specified peer id.
Contributor

returns the down voter

@disksing
Contributor

disksing commented Apr 3, 2018

https://github.com/pingcap/pd/blob/e221ffb59f7b3433ee2e0a617f616dc92b02d007/server/schedule/replica_checker.go#L188
We need to check whether the region has any learners here. Consider maxReplica=1: if a region has an offline leader and a learner, the leader will be removed.
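
The scenario can be made concrete with a small sketch. It assumes, without confirmation from this thread, that the removal is triggered by comparing a replica count against maxReplica; counting only voters would then keep a lone learner from making the region look over-replicated. `Peer` and `hasExtraReplica` are hypothetical names.

```go
package main

// Peer is a hypothetical, simplified stand-in for a region peer.
type Peer struct {
	ID        uint64
	StoreID   uint64
	IsLearner bool
}

// hasExtraReplica reports whether the region holds more voting replicas than
// maxReplicas. Learners are not counted, so a region with one voter and one
// learner is not treated as having a spare replica when maxReplicas is 1.
func hasExtraReplica(peers []Peer, maxReplicas int) bool {
	voters := 0
	for _, p := range peers {
		if !p.IsLearner {
			voters++
		}
	}
	return voters > maxReplicas
}
```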

@disksing
Contributor

disksing commented Apr 3, 2018

@disksing
Contributor

disksing commented Apr 4, 2018

LGTM.

followers map[uint64]*regionMap // storeID -> regionID -> regionInfo
learners map[uint64]*regionMap // storeID -> regionID -> regionInfo
pendingPeers map[uint64]*regionMap // storeID -> regionID -> regionInfo
pendingLearners map[uint64]*regionMap // storeID -> regionID -> regionInfo
Contributor

remove pendingLearners

Contributor

@nolouch nolouch left a comment

LGTM

@@ -79,7 +79,7 @@ func (ap AddPeer) IsFinish(region *core.RegionInfo) bool {
log.Warnf("expect %v, but obtain voter %v", ap.String(), p.GetId())
return false
}
- return region.GetPendingVoter(p.GetId()) == nil && p.GetId() != ap.PeerID
+ return region.GetPendingVoter(p.GetId()) == nil
Contributor

Because AddNode won't have any subsequent operators?

Contributor

I think the if block above already checks whether their IDs match, so we don't need it here.

@disksing disksing added the priority/P1 The issue has P1 priority. label Apr 12, 2018
@Connor1996
Member Author

/run-all-tests

@disksing
Contributor

LGTM.

@disksing disksing added the status/tests-passed The PR has passed all tests. label Apr 13, 2018
@disksing disksing merged commit 2c8e7d7 into tikv:master Apr 13, 2018
@Connor1996 Connor1996 deleted the learner-node branch July 19, 2018 11:57
Labels
priority/P1 The issue has P1 priority. status/tests-passed The PR has passed all tests.

6 participants