Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix vtgate_buffer test #5291

Merged
merged 2 commits into from
Oct 11, 2019

Conversation

enisoc
Copy link
Member

@enisoc enisoc commented Oct 10, 2019

The main problem was that we now do a full DemoteMaster on the old master, even in the case of TER, as long as disable_active_reparent is false. This takes a lot longer (because shutting down the queryservice takes 30s before in-progress transactions are cancelled), so the test was timing out at a couple different levels (contexts and waiting for vtgate to find enough healthy tablets).

This fixes the test by fixing the timeouts, but I think we should also avoid doing a full DemoteMaster in the case of TER. I'll optimize that in a future PR though. For now, let's at least get the tests green.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@enisoc enisoc requested a review from deepthi October 10, 2019 22:31
@enisoc enisoc requested a review from sougou as a code owner October 10, 2019 22:31
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -187,25 +189,34 @@ func syncShardMaster(ctx context.Context, ts *topo.Server, tablet *topodatapb.Ta
// If active reparents are disabled, we don't touch our MySQL.
// We just directly update our tablet type to REPLICA.
func (agent *ActionAgent) abortMasterTerm(ctx context.Context, masterAlias *topodatapb.TabletAlias) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at some point we should rename this to endMasterTerm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name was meant to convey that this is not the "graceful" way for a master term to come to an end. If we hit this code path, it means we should sort of "force demote" ourselves because another master has already taken over behind our backs. Do you think abort is a bad way to describe this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds right, but it means that we are ending a master term in a not "graceful" way with the current implementation of PRS. Are we ok with that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PRS should have already called DemoteMaster to gracefully step down before we get to this code path. We will still technically execute this code path, but once we make these idempotent, there will be no net effect from this code path in the case of a successful PRS.

@sougou sougou merged commit d5eee3d into vitessio:reparent-refactor Oct 11, 2019
@enisoc enisoc deleted the fix-reparent-refactor branch October 11, 2019 03:43
@sougou
Copy link
Contributor

sougou commented Oct 11, 2019

Oops! Next time I'll make sure to check that it's a master PR before merging :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants