
PlannedReparentShard: Fix more known-recoverable problems. #5376

Merged on Oct 31, 2019 (5 commits)

Conversation

@enisoc (Member) commented Oct 29, 2019

PlannedReparentShard should be able to fix replication as long as all
tablets are reachable and all replication positions are in a
mutually-consistent state.

PRS also no longer trusts that the shard record contains up-to-date
information on the master, because we update that record asynchronously
now. Instead, it looks at MasterTermStartTime values stored in each
master tablet's record, so it makes the same choice of master as
vtgates.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
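
A minimal sketch of the master-selection rule described above, for illustration (the tablet struct below is a simplified stand-in for the real topodata.Tablet record, and the helper is hypothetical):

package main

import (
	"fmt"
	"time"
)

// tablet is a simplified stand-in for the real topodata.Tablet record; only
// the fields this sketch needs are included.
type tablet struct {
	alias               string
	isMaster            bool
	masterTermStartTime time.Time
}

// chooseMaster picks the tablet whose master term started most recently,
// matching vtgate's rule, instead of trusting the (asynchronously updated)
// shard record.
func chooseMaster(tablets []tablet) (tablet, bool) {
	var best tablet
	found := false
	for _, t := range tablets {
		if !t.isMaster {
			continue
		}
		if !found || t.masterTermStartTime.After(best.masterTermStartTime) {
			best, found = t, true
		}
	}
	return best, found
}

func main() {
	now := time.Now()
	tablets := []tablet{
		{alias: "zone1-100", isMaster: true, masterTermStartTime: now.Add(-time.Hour)}, // stale master
		{alias: "zone1-101", isMaster: true, masterTermStartTime: now},                 // newest term wins
	}
	if m, ok := chooseMaster(tablets); ok {
		fmt.Println("current master:", m.alias)
	}
}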

@enisoc enisoc requested a review from deepthi October 29, 2019 23:39
@enisoc enisoc requested a review from sougou as a code owner October 29, 2019 23:39
// catch up before we time out. This assumes a replica can work through
// its backlog at approximately the same rate that the transactions
// happened on the master.
if float64(status.SecondsBehindMaster) >= waitReplicasTimeout.Seconds() {
@deepthi (Member) commented Oct 30, 2019

One concern with piggy-backing the replication lag check on waitReplicasTimeout is that it doesn't allow people to opt out of this check. We are arguing that this is OK because a replica with lag > waitReplicasTimeout is unlikely to catch up during that time.

SecondsBehindMaster is in fact not reliable: there are cases where MySQL doesn't update it often enough, so it's reported as much higher than the real lag. This seems to happen in precisely the situation we are creating here, where there are no more writes to the master.
See the last comment in #5000 (comment).

We should consider adding another flag (-skip_replication_lag_check) to the PRS command.

@enisoc (Member, Author)

Good catch. What about making a flag to customize the SecondsBehindMaster threshold, instead of an on/off thing?

@deepthi (Member)

That is what I had originally planned to implement for #4700, so I'll vote yes on that.
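
For illustration, a minimal sketch of how such a threshold flag could be wired up (the flag name and default here are hypothetical, not what the PR ultimately shipped):

package main

import (
	"flag"
	"fmt"
	"time"
)

// Hypothetical flag; the real name and default would be whatever PRS settles on.
var masterElectLagThreshold = flag.Duration("lag_threshold", 30*time.Second,
	"refuse to promote a master-elect that reports more replication lag than this")

// checkLag mirrors the check in the diff above, but against a user-tunable
// threshold rather than piggy-backing on waitReplicasTimeout.
func checkLag(secondsBehindMaster uint32) error {
	if float64(secondsBehindMaster) > masterElectLagThreshold.Seconds() {
		return fmt.Errorf("replication lag (%v seconds) is greater than lag_threshold (%v)",
			secondsBehindMaster, *masterElectLagThreshold)
	}
	return nil
}

func main() {
	flag.Parse()
	fmt.Println(checkLag(5))  // within threshold: <nil>
	fmt.Println(checkLag(60)) // over threshold: error
}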

@deepthi (Member) left a comment

Nice! I just have a nit in one of the error messages.

}
// Check if it's behind by a small enough amount.
if float64(status.SecondsBehindMaster) > masterElectLagThreshold.Seconds() {
return vterrors.Errorf(vtrpcpb.Code_FAILED_PRECONDITION, "replication lag on master-elect %v (%v seconds) is greater than the specified lag threshold (%v); let replication catch up first or try again with a higher threshold", masterElectTabletAliasStr, status.SecondsBehindMaster, masterElectLagThreshold)
@deepthi (Member)

lag threshold -> lag_threshold, to be consistent with the user-visible flag.

@enisoc (Member, Author)

I removed this flag entirely since we don't check lag anymore, as discussed offline.

if topoproto.TabletAliasEqual(shardInfo.MasterAlias, masterElectTabletAlias) {
// If the master is already the one we want, we just try to fix replicas (below).
rp, err := wr.tmc.MasterPosition(remoteCtx, masterElectTabletInfo.Tablet)
if currentMaster == nil {
@deepthi (Member)

Since PRS is now handling the cases of no master / multi-master, that means situations where ERS is required should become rare to nonexistent. Let us make sure to document that in the eventual PR for merging upstream.

@enisoc (Member, Author)

Good point. To summarize, my goal for PRS is that eventually it should be able to fix almost any problem as long as:

  1. All tablets are reachable, so we can check global state.
    AND
  2. The global replication state (relative positions) is consistent and compatible with making the chosen tablet the master.

You should then only need ERS in the following cases:

  1. The current master is unreachable.
    OR
  2. The relative replication positions have become inconsistent (e.g. alternative futures).
    OR
  3. It's unclear who the current master is, and some tablets are unreachable, which means we can't be sure if the global state is consistent.

@@ -184,7 +184,7 @@ type TabletManagerClient interface {
// SetMaster tells a tablet to make itself a slave to the
// passed in master tablet alias, and wait for the row in the
// reparent_journal table (if timeCreatedNS is non-zero).
- SetMaster(ctx context.Context, tablet *topodatapb.Tablet, parent *topodatapb.TabletAlias, timeCreatedNS int64, forceStartSlave bool) error
+ SetMaster(ctx context.Context, tablet *topodatapb.Tablet, parent *topodatapb.TabletAlias, timeCreatedNS int64, waitPosition string, forceStartSlave bool) error
@deepthi (Member)

Doesn't changing the interface here cause problems during upgrade?
An old vtctld's wrangler will call the old version of SetMaster, which won't work on an already-upgraded vttablet.

@enisoc (Member, Author)

When the call crosses process boundaries, it gets encoded as protobuf on the wire. The protobuf level is thus where we need to ensure compatibility when changing existing RPCs.

Adding a new, optional field in the Request struct like this should be safe. The old vtctld will not try to use the new field because it doesn't know about it. The new vttablet will simply receive a Request protobuf with the new field unset, so it will be left on the zero value.
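
A hedged illustration of that property (the request struct is a simplified stand-in for the generated protobuf message, not Vitess's real type):

package main

import "fmt"

// setMasterRequest stands in for the real SetMasterRequest proto message.
type setMasterRequest struct {
	TimeCreatedNs   int64
	WaitPosition    string // new optional field; proto3 decodes it as "" when an old vtctld omits it
	ForceStartSlave bool
}

// handleSetMaster shows how a new vttablet stays compatible: an unset
// WaitPosition arrives as the zero value, so the handler just skips the wait.
func handleSetMaster(req setMasterRequest) {
	if req.WaitPosition != "" {
		fmt.Println("new caller: wait for replication to reach", req.WaitPosition)
	} else {
		fmt.Println("old caller: no wait position, behave as before")
	}
}

func main() {
	handleSetMaster(setMasterRequest{TimeCreatedNs: 1})                                  // from an old vtctld
	handleSetMaster(setMasterRequest{TimeCreatedNs: 1, WaitPosition: "MariaDB/0-1-123"}) // from a new vtctld
}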

@deepthi (Member) left a comment

LGTM
