Make reparents more robust #5391

enisoc · 2019-11-01T07:04:10Z

This is the implementation of the plan discussed in #5172. The main features of the new implementation include:

* Tablets that think they're masters now watch the shard record and automatically demote themselves if they see another tablet has won election as master. This should make the unintended multiple-master case self-healing as long as global topology is available.
* PlannedReparentShard (PRS) can be run to attempt graceful fix-up of replication across all the tablets in a shard, even if the requested -new_master tablet is already the master. This means, for example, that if PRS reports partial failure (e.g. some replicas couldn't be reached to reparent them), you can run it again to retry any failed operations.
* PRS will first measure whether the requested -new_master is able to make progress replicating from the current master before setting the current master read-only. This avoids disrupting the current master when the candidate master is too far behind on replication to catch up within the timeout of the reparent operation.

Also fixes #4700 (abort PlannedReparent if master_elect replication lag is more than specified amount).
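The pre-check that PRS runs before setting the current master read-only can be pictured with a small sketch. This is not the code from this PR: `positionFunc`, `canCatchUp`, and `errTooFarBehind` are hypothetical names, and replication positions are modeled as plain integers instead of GTID sets. The sketch assumes the check means "the candidate reaches a snapshot of the master's position before the deadline".

```go
package main

import (
	"context"
	"errors"
	"time"
)

// positionFunc is a hypothetical stand-in for an RPC that reports how far a
// tablet has applied the replication stream. Real positions are GTID sets;
// an integer keeps the sketch small.
type positionFunc func(ctx context.Context) (int64, error)

// errTooFarBehind means the candidate did not reach the snapshot in time.
var errTooFarBehind = errors.New("candidate did not reach the master's position in time")

// canCatchUp snapshots the master's position while the master keeps serving
// writes, then polls the candidate until it reaches that snapshot or ctx
// expires. Nothing here changes the master's state, so a failure leaves the
// shard untouched.
func canCatchUp(ctx context.Context, master, candidate positionFunc, poll time.Duration) error {
	snapshot, err := master(ctx)
	if err != nil {
		return err
	}
	for {
		pos, err := candidate(ctx)
		if err != nil {
			return err
		}
		if pos >= snapshot {
			return nil
		}
		select {
		case <-ctx.Done():
			return errTooFarBehind
		case <-time.After(poll):
		}
	}
}

// exampleCatchUp runs the check against two fake candidates: one that keeps
// advancing and one that is stuck behind the master.
func exampleCatchUp() (advancing, stuck error) {
	master := func(context.Context) (int64, error) { return 10, nil }

	applied := int64(7)
	moving := func(context.Context) (int64, error) { applied++; return applied, nil }
	frozen := func(context.Context) (int64, error) { return 3, nil }

	ctx1, cancel1 := context.WithTimeout(context.Background(), time.Second)
	defer cancel1()
	advancing = canCatchUp(ctx1, master, moving, time.Millisecond)

	ctx2, cancel2 := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel2()
	stuck = canCatchUp(ctx2, master, frozen, time.Millisecond)
	return advancing, stuck
}
```

In `exampleCatchUp`, the advancing candidate passes the check and the frozen one yields `errTooFarBehind`. In both cases the fake master is never asked to do anything other than report its position.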

RELEASE NOTE: ACTION REQUIRED

When updating from a version before this PR to a version after it, it is critical that you follow the recommended upgrade order. In particular, you must upgrade all the vttablets in the cluster before upgrading any of the vtctlds.

Similarly, if you need to downgrade from a version after this PR to a version before it, you must downgrade in the reverse order: downgrade all vtctlds before downgrading any vttablets.

* PlannedReparentShard: Allow retrying PRS to the existing master.
  This is an incremental first step toward making PRS more useful for repairing situations when replication across a shard is not fully consistent. The main thing this enables is retrying the step of reconfiguring all replicas (including the old master) to point to the new master.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* Fix PRS test: Old master should have no slave status.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* Fix comment.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

In particular, if we know we're master but the shard record is wrong, update it. And if another tablet takes over the shard record by having a more recent master term start time, we know we need to stop claiming to be master. Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

tabletmanager: Keep tablet and shard in sync.
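The shard-sync rule in this commit message can be sketched as a pure decision function. The types and names below (`shardRecord`, `tabletRecord`, `decideSync`) are hypothetical, trimmed-down stand-ins and not the tabletmanager code. Resolving equal timestamps in favor of the local tablet is an arbitrary choice made for this sketch.

```go
package main

import "time"

// shardRecord is a hypothetical, reduced view of the shard record in topo.
type shardRecord struct {
	MasterAlias         string
	MasterTermStartTime time.Time
}

// tabletRecord is a hypothetical, reduced view of the local tablet's state.
type tabletRecord struct {
	Alias               string
	IsMaster            bool
	MasterTermStartTime time.Time
}

type syncAction int

const (
	noAction          syncAction = iota // tablet and shard already agree
	updateShardRecord                   // we are master, the shard record is stale
	demoteSelf                          // another tablet has a newer master term
)

// decideSync compares the local tablet with the shard record and says what a
// tablet that believes it is master should do next.
func decideSync(t tabletRecord, s shardRecord) syncAction {
	switch {
	case !t.IsMaster:
		return noAction
	case s.MasterAlias != t.Alias && s.MasterTermStartTime.After(t.MasterTermStartTime):
		return demoteSelf
	case s.MasterAlias != t.Alias:
		// Covers an empty MasterAlias as well as an older term held by
		// some other tablet.
		return updateShardRecord
	default:
		return noAction
	}
}

// exampleSync evaluates three situations for the same master tablet: an empty
// shard record, a newer master elsewhere, and a record that already matches.
func exampleSync() [3]syncAction {
	t0 := time.Date(2019, time.November, 1, 7, 0, 0, 0, time.UTC)
	self := tabletRecord{Alias: "zone1-100", IsMaster: true, MasterTermStartTime: t0}
	return [3]syncAction{
		decideSync(self, shardRecord{}),
		decideSync(self, shardRecord{MasterAlias: "zone1-101", MasterTermStartTime: t0.Add(time.Minute)}),
		decideSync(self, shardRecord{MasterAlias: "zone1-100", MasterTermStartTime: t0}),
	}
}
```

Keeping the decision separate from the topo reads and writes makes each case easy to test without a running topology server.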

* Fix PlannedReparentShard unit tests
  We should not explicitly call SetMaster on the old master because PromoteSlaveWhenCaughtUp sets newMaster's tablet type to MASTER, which leads ShardSync to update the Shard record, which notifies the oldMaster's ShardSync, which calls SetMaster
  Signed-off-by: deepthi <deepthi@planetscale.com>
* PromoteSlave should use a separate context and not reuse remoteCtx
  Signed-off-by: deepthi <deepthi@planetscale.com>

* Remove obsolete comments.
  These are talking about the serving graph, which no longer exists. Instead of storing serving state of each tablet in topo, we now have vtgate directly query serving state of every tablet.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* Make DemoteMaster idempotent.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

* PlannedReparentShard: Fix more known-recoverable problems.
  PlannedReparentShard should be able to fix replication as long as all tablets are reachable and all replication positions are in a mutually-consistent state. PRS also no longer trusts that the shard record contains up-to-date information on the master, because we update that record asynchronously now. Instead, it looks at MasterTermStartTime values stored in each master tablet's record, so it makes the same choice of master as vtgates.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* PlannedReparentShard: Add -lag_threshold flag.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* Fix expected error in reparent test.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* PRS: Add test case for graceful recovery.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* PRS: Measure replication progress instead of lag.
  Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
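Choosing the master from tablet records rather than from the shard record might look like the sketch below. `tabletInfo` and `currentMaster` are hypothetical names used only for illustration, and the tablet type is modeled as a plain string. The only rule taken from the commit message is that the most recent MasterTermStartTime wins.

```go
package main

import "time"

// tabletInfo is a hypothetical, reduced tablet record.
type tabletInfo struct {
	Alias               string
	Type                string // e.g. "MASTER" or "REPLICA"
	MasterTermStartTime time.Time
}

// currentMaster returns the tablet of type MASTER whose master term started
// most recently. The boolean is false when no tablet claims to be master.
func currentMaster(tablets []tabletInfo) (tabletInfo, bool) {
	var best tabletInfo
	found := false
	for _, t := range tablets {
		if t.Type != "MASTER" {
			continue
		}
		if !found || t.MasterTermStartTime.After(best.MasterTermStartTime) {
			best, found = t, true
		}
	}
	return best, found
}

// exampleCurrentMaster models a shard where a stale master has not yet
// noticed that a newer one was elected.
func exampleCurrentMaster() (string, bool) {
	t0 := time.Date(2019, time.November, 1, 7, 0, 0, 0, time.UTC)
	tablets := []tabletInfo{
		{Alias: "zone1-100", Type: "MASTER", MasterTermStartTime: t0},
		{Alias: "zone1-101", Type: "REPLICA"},
		{Alias: "zone1-102", Type: "MASTER", MasterTermStartTime: t0.Add(time.Hour)},
	}
	m, ok := currentMaster(tablets)
	return m.Alias, ok
}
```

Because every observer applies the same comparison to the same tablet records, two callers that see the same records pick the same master even while the shard record lags behind.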

enisoc · 2019-11-01T17:33:54Z

@sougou Before merging this, please make sure you change from "Squash and merge" to "Create a merge commit" so we don't lose individual authorship. We already reviewed and squashed along the way as we merged PRs into the dev branch.

sougou

Great work @enisoc and @deepthi!

sougou · 2019-11-03T00:23:21Z

go/vt/topo/shard.go

+// WatchShard will set a watch on the Shard object.
+// It has the same contract as conn.Watch, but it also unpacks the
+// contents into a Shard object
+func (ts *Server) WatchShard(ctx context.Context, keyspace, shard string) (*WatchShardData, <-chan *WatchShardData, CancelFunc) {


We'll eventually need to harden this to make sure it stays connected to the topo.
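One way to approach the hardening mentioned in this review comment is a loop that re-establishes the watch whenever it breaks. The sketch below is only an illustration and is not part of this PR. `watchData`, `watchFunc`, and `keepWatching` are hypothetical, and the sketch assumes a contract similar to the one the doc comment above refers to: an initial value, a channel of changes that ends after a final error, and a cancel function.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// watchData is a hypothetical stand-in for WatchShardData, with the shard
// contents reduced to a string.
type watchData struct {
	Value string
	Err   error
}

// watchFunc starts a watch. If the returned initial value carries an error,
// the channel and cancel function are assumed to be nil.
type watchFunc func(ctx context.Context) (*watchData, <-chan *watchData, func())

// consume forwards changes to onUpdate until the watch reports an error, the
// channel is closed, or ctx is done.
func consume(ctx context.Context, changes <-chan *watchData, onUpdate func(string)) {
	for {
		select {
		case <-ctx.Done():
			return
		case wd, ok := <-changes:
			if !ok || wd.Err != nil {
				return
			}
			onUpdate(wd.Value)
		}
	}
}

// keepWatching restarts the watch after retryDelay every time it fails, and
// returns only when ctx is done.
func keepWatching(ctx context.Context, watch watchFunc, retryDelay time.Duration, onUpdate func(string)) {
	for ctx.Err() == nil {
		current, changes, cancel := watch(ctx)
		if current.Err == nil {
			onUpdate(current.Value)
			consume(ctx, changes, onUpdate)
			cancel()
		}
		select {
		case <-ctx.Done():
			return
		case <-time.After(retryDelay):
		}
	}
}

// exampleKeepWatching drives the loop with a fake watch that fails once,
// then breaks mid-stream, then stays healthy.
func exampleKeepWatching() []string {
	ctx, cancelCtx := context.WithCancel(context.Background())
	defer cancelCtx()

	attempts := 0
	fake := func(context.Context) (*watchData, <-chan *watchData, func()) {
		attempts++
		switch attempts {
		case 1:
			return &watchData{Err: errors.New("topo unavailable")}, nil, nil
		case 2:
			ch := make(chan *watchData, 2)
			ch <- &watchData{Value: "v2"}
			ch <- &watchData{Err: errors.New("watch interrupted")}
			close(ch)
			return &watchData{Value: "v1"}, ch, func() {}
		default:
			return &watchData{Value: "v3"}, make(chan *watchData), func() {}
		}
	}

	var seen []string
	keepWatching(ctx, fake, time.Millisecond, func(v string) {
		seen = append(seen, v)
		if v == "v3" {
			cancelCtx() // stop the example once the watch has recovered
		}
	})
	return seen
}
```

A production version would likely add backoff with jitter and logging, which are left out here to keep the retry structure visible.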

deepthi and others added 30 commits September 26, 2019 15:01

Reparent: add ability to watch shard data

13494ed

Signed-off-by: deepthi <deepthi@planetscale.com>

Reparent: Move TER vtctl command from vttablet to wrangler

ef8ce9e

Signed-off-by: deepthi <deepthi@planetscale.com>

Merge pull request #5235 from planetscale/ds-move-ter-to-wr

21aa666

Reparent: Move TER vtctl command from vttablet to wrangler

Merge pull request #5236 from planetscale/ds-add-shard-watch

346688f

Reparent: add ability to watch shard data

Merge branch 'master' into reparent-refactor

a9b6898

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

topodata.proto: Add master_term_start_time field to Shard record.

2f5a57e

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

Merge pull request #5264 from planetscale/tablet-shard-sync

7e5e103

tabletmanager: Keep tablet and shard in sync.

Fix reparent test. (#5266)

aa81ae3

The new TER in wrangler skipped setting the master term start time. Now we start a master term if ChangeType() is called with type MASTER. Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

Merge branch 'master' into reparent-refactor

4b12b1c

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

shard_sync: Add logging and use independent timeouts.

66a00b4

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

Fix vtgate_buffer test.

abdc821

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

Merge pull request #5291 from planetscale/fix-reparent-refactor

d5eee3d

Fix vtgate_buffer test

Unit tests for wrangler version of TabletExternallyReparented.

f383512

Duplicated relevant RPC tests for wrangler. Moved unrelated tests to a different file, fixed RPC tests to not error out during SetMaster Signed-off-by: deepthi <deepthi@planetscale.com>

reads and writes to _shardSyncChan and _shardSyncCancel need to be pr…

2b6599e

…otected by mutex Signed-off-by: deepthi <deepthi@planetscale.com>

unit tests for shard watch

639f1e3

Signed-off-by: deepthi <deepthi@planetscale.com>

use mutex properly to control access to _shardSyncChan and _shardSync…

214d7ea

…Cancel Signed-off-by: deepthi <deepthi@planetscale.com>

fix doc comment

352d31d

Signed-off-by: deepthi <deepthi@planetscale.com>

Merge branch 'master' into reparent-refactor

75f5471

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

Merge pull request #5292 from planetscale/ds-ter-tests

b43e6f5

Unit tests for wrangler version of TabletExternallyReparented

Merge branch 'master' into reparent-refactor

0e78d81

Signed-off-by: deepthi <deepthi@planetscale.com>

Merge pull request #5296 from planetscale/ds-shard-watch-tests

6abaa03

unit tests for shard watch

Make SetMaster idempotent. (#5300)

1bab089

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

Merge branch 'master' into reparent-refactor

191d5e5

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

Merge branch 'master' into reparent-refactor

d200c23

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

store master_term_start_time in tablet record. InitTablet should not …

57a6f48

…update shard master Signed-off-by: deepthi <deepthi@planetscale.com>

shard_sync should check for nil MasterAlias, which can happen when st…

e847d5e

…arting with InitTablet Signed-off-by: deepthi <deepthi@planetscale.com>

deepthi and others added 12 commits October 22, 2019 12:27

vtctl InitTablet should set tablet.MasterTermStartTime under all

67d2921

applicable conditions vttablet InitTablet should check MasterTermStartTime and take over if necessary fix unit test setup to work with changes to InitTablet functions Signed-off-by: deepthi <deepthi@planetscale.com>

InitTablet: only set MasterTermStartTime on tablet record if it is no…

69f2523

…n-zero Signed-off-by: deepthi <deepthi@planetscale.com>

Merge branch 'master' into reparent-refactor

fc168b6

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

set masterTermStartTime on tablet correctly whenever tablet type changes

8b6cca0

Signed-off-by: deepthi <deepthi@planetscale.com>

fix tests

7006212

Signed-off-by: deepthi <deepthi@planetscale.com>

ChangeType should not update tablet if nothing changed

5e57501

Signed-off-by: deepthi <deepthi@planetscale.com>

changes from review

f2b8269

Signed-off-by: deepthi <deepthi@planetscale.com>

clean up ChangeType to avoid 2 topo calls, and to adhere to contract …

25e574d

…that new tablet is returned only if there is no error Signed-off-by: deepthi <deepthi@planetscale.com>

Merge pull request #5316 from planetscale/ds-init-tablet-master-ts

d9fa8cd

InitTablet should not update master alias on shard record

EmergencyReparentShard does not need to update shard master, new mast…

789a0e3

…er will do it (#5363) Signed-off-by: deepthi <deepthi@planetscale.com>

Merge branch 'master' into reparent-refactor

e8dd5a4

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

enisoc requested a review from deepthi November 1, 2019 07:04

enisoc requested a review from sougou as a code owner November 1, 2019 07:04

sougou approved these changes Nov 3, 2019

sougou merged commit 7afc5ee into master Nov 3, 2019

morgo mentioned this pull request Nov 4, 2019

Make reparents more robust #5172

Closed

spark4 mentioned this pull request Nov 12, 2019

Serry deploy tinyspeck/vitess#140

Closed

spark4 mentioned this pull request Nov 22, 2019

Slack sync upstream 2019 11 09.r0 tinyspeck/vitess#142

Merged

rafael mentioned this pull request Dec 11, 2019

Slack sync upstream 2019 12 11.r0 tinyspeck/vitess#143

Merged

morgo deleted the reparent-refactor branch December 19, 2019 20:24

enisoc mentioned this pull request May 20, 2020

Make emergency reparents more robust. #6206

Closed

13 tasks

enisoc commented Nov 1, 2019 •

edited by deepthi
