Make reparents more robust #5391

Merged
merged 42 commits into from
Nov 3, 2019

enisoc (Member) commented Nov 1, 2019

This is the implementation of the plan discussed in #5172. The main features of the new implementation include:

  • Tablets that think they're masters now watch the shard record and automatically demote themselves if they see another tablet has won election as master. This should make the unintended multiple-master case self-healing as long as global topology is available.
  • PlannedReparentShard (PRS) can be run to attempt graceful fix-up of replication across all the tablets in a shard, even if the requested -new_master tablet is already the master. This means, for example, if PRS reports partial failure (e.g. some replicas couldn't be reached to reparent them), you can run it again to retry any failed operations.
  • PRS will first measure whether the requested -new_master is able to make progress replicating from the current master before setting the current master read-only. This avoids causing any disruption to the current master in the case when the candidate master is too far behind on replication to catch up within the timeout of the reparent operation.
  • Also fixes #4700 (abort PlannedReparent if master_elect replication lag is more than specified amount).

RELEASE NOTE: ACTION REQUIRED

When updating from a version before this PR to a version after it, it is critical that you follow the recommended upgrade order. In particular, you must upgrade all the vttablets in the cluster before upgrading any of the vtctlds.

Similarly, if you need to downgrade from a version after this PR to a version before it, you must downgrade in the reverse order: downgrade all vtctlds before downgrading any vttablets.

deepthi and others added 30 commits September 26, 2019 15:01
Signed-off-by: deepthi <deepthi@planetscale.com>
Reparent: Move TER vtctl command from vttablet to wrangler
Reparent: add ability to watch shard data
* PlannedReparentShard: Allow retrying PRS to the existing master.

This is an incremental first step toward making PRS more useful for
repairing situations when replication across a shard is not fully
consistent.

The main thing this enables is retrying the step of reconfiguring all
replicas (including the old master) to point to the new master.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

* Fix PRS test: Old master should have no slave status.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

* Fix comment.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
In particular, if we know we're master but the shard record is wrong,
update it. And if another tablet takes over the shard record by having a
more recent master term start time, we know we need to stop claiming to
be master.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
tabletmanager: Keep tablet and shard in sync.
The new TER in wrangler skipped setting the master term start time.
Now we start a master term if ChangeType() is called with type MASTER.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* Fix PlannedReparentShard unit tests
We should not explicitly call SetMaster on the old master because
PromoteSlaveWhenCaughtUp sets newMaster's tablet type to MASTER,
which leads ShardSync to update the Shard record, which notifies
the oldMaster's ShardSync, which calls SetMaster.

Signed-off-by: deepthi <deepthi@planetscale.com>
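The chain of events in this commit message (the new master becomes MASTER, its shard sync updates the Shard record, the old master's watch fires, and the old master's ShardSync calls SetMaster) can be modeled with channels. Everything in this Go sketch is a hypothetical in-memory stand-in: `topo`, `SetShardMaster`, and `oldMasterSync` are illustrative names, not Vitess APIs.

```go
package main

import (
	"fmt"
	"sync"
)

// topo is a hypothetical in-memory stand-in for the global topology server.
// It stores the shard's master alias and pushes every change to watchers.
type topo struct {
	mu       sync.Mutex
	master   string
	watchers []chan string
}

// Watch registers a watcher and returns its channel of master-alias updates.
// The buffer of 1 is enough for this single-update demo.
func (t *topo) Watch() <-chan string {
	t.mu.Lock()
	defer t.mu.Unlock()
	ch := make(chan string, 1)
	t.watchers = append(t.watchers, ch)
	return ch
}

// SetShardMaster updates the shard record and notifies every watcher.
func (t *topo) SetShardMaster(alias string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.master = alias
	for _, ch := range t.watchers {
		ch <- alias
	}
}

// oldMasterSync mimics the old master's shard-sync loop: when the record
// names a different master, it calls setMaster to replicate from it.
func oldMasterSync(self string, updates <-chan string, setMaster func(string), done chan<- struct{}) {
	for alias := range updates {
		if alias != self {
			setMaster(alias)
			close(done)
			return
		}
	}
}

func main() {
	shardTopo := &topo{master: "old-master"}
	updates := shardTopo.Watch()
	done := make(chan struct{})
	var pointedAt string
	go oldMasterSync("old-master", updates, func(a string) { pointedAt = a }, done)

	// Promoting the new master to type MASTER would cause its shard sync to
	// write the shard record; here that write is done directly.
	shardTopo.SetShardMaster("new-master")
	<-done
	fmt.Println(pointedAt) // new-master
}
```

The point of the design, as the commit describes it, is that the reparent code does not need a separate SetMaster call on the old master: the watch delivers the same outcome.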

* PromoteSlave should use a separate context and not reuse remoteCtx

Signed-off-by: deepthi <deepthi@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Duplicated relevant RPC tests for wrangler.
Moved unrelated tests to a different file and fixed RPC tests so they
do not error out during SetMaster.

Signed-off-by: deepthi <deepthi@planetscale.com>
…otected by mutex

Signed-off-by: deepthi <deepthi@planetscale.com>
* Remove obsolete comments.

These are talking about the serving graph, which no longer exists.
Instead of storing serving state of each tablet in topo, we now have
vtgate directly query serving state of every tablet.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

* Make DemoteMaster idempotent.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
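One way to read "Make DemoteMaster idempotent" is that each step checks whether it has already been done, so repeating the call is harmless. This Go sketch models that idea with a hypothetical `demotableTablet` type; the actual RPC is not reproduced here, and the position string is made up.

```go
package main

import "fmt"

// demotableTablet is a hypothetical model of a master tablet's local state.
type demotableTablet struct {
	serving     bool   // query service accepting writes
	readOnly    bool   // MySQL read_only flag
	position    string // replication position reported to the caller
	transitions int    // number of real state changes, for illustration
}

// DemoteMaster stops serving and sets read-only, then returns the current
// position. Each step is skipped if already done, so a second call makes
// no further changes and returns the same position without error.
func (t *demotableTablet) DemoteMaster() (string, error) {
	if t.serving {
		t.serving = false
		t.transitions++
	}
	if !t.readOnly {
		t.readOnly = true
		t.transitions++
	}
	return t.position, nil
}

func main() {
	tab := &demotableTablet{serving: true, position: "pos-42"}
	p1, _ := tab.DemoteMaster()
	p2, _ := tab.DemoteMaster()            // retry, e.g. after a timed-out RPC
	fmt.Println(p1 == p2, tab.transitions) // true 2
}
```

Idempotence is what lets a caller retry after an ambiguous failure without first having to find out whether the previous attempt took effect.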
Signed-off-by: deepthi <deepthi@planetscale.com>
…Cancel

Signed-off-by: deepthi <deepthi@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Unit tests for wrangler version of TabletExternallyReparented
Signed-off-by: deepthi <deepthi@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
…update shard master

Signed-off-by: deepthi <deepthi@planetscale.com>
…arting with InitTablet

Signed-off-by: deepthi <deepthi@planetscale.com>
deepthi and others added 12 commits October 22, 2019 12:27
applicable conditions
vttablet InitTablet should check MasterTermStartTime and take over if
necessary
fix unit test setup to work with changes to InitTablet functions

Signed-off-by: deepthi <deepthi@planetscale.com>
…n-zero

Signed-off-by: deepthi <deepthi@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: deepthi <deepthi@planetscale.com>
…that new tablet is returned only if there is no error

Signed-off-by: deepthi <deepthi@planetscale.com>
InitTablet should not update master alias on shard record
…er will do it (#5363)

Signed-off-by: deepthi <deepthi@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* PlannedReparentShard: Fix more known-recoverable problems.

PlannedReparentShard should be able to fix replication as long as all
tablets are reachable and all replication positions are in a
mutually-consistent state.

PRS also no longer trusts that the shard record contains up-to-date
information on the master, because we update that record asynchronously
now. Instead, it looks at MasterTermStartTime values stored in each
master tablet's record, so it makes the same choice of master as
vtgates.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

* PlannedReparentShard: Add -lag_threshold flag.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

* Fix expected error in reparent test.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

* PRS: Add test case for graceful recovery.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

* PRS: Measure replication progress instead of lag.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
enisoc requested a review from deepthi November 1, 2019 07:04
enisoc requested a review from sougou as a code owner November 1, 2019 07:04
enisoc (Member Author) commented Nov 1, 2019

@sougou Before merging this, please make sure you change from "Squash and merge" to "Create a merge commit" so we don't lose individual authorship. We already reviewed and squashed along the way as we merged PRs into the dev branch.

sougou (Contributor) left a comment

Great work @enisoc and @deepthi!

```go
// WatchShard will set a watch on the Shard object.
// It has the same contract as conn.Watch, but it also unpacks the
// contents into a Shard object
func (ts *Server) WatchShard(ctx context.Context, keyspace, shard string) (*WatchShardData, <-chan *WatchShardData, CancelFunc) {
	// (body not shown in this review excerpt)
}
```

We'll eventually need to harden this to make sure it stays connected to the topo.


Successfully merging this pull request may close these issues.

abort PlannedReparent if master_elect replication lag is more than specified amount
3 participants