Respect -disable_active_reparents in backup/restore #7576
…t. Also, Begin and End Maintenance on orchestrator Signed-off-by: deepthi <deepthi@planetscale.com>
…. Also, Begin and End Maintenance on orchestrator Signed-off-by: deepthi <deepthi@planetscale.com>
LGTM!
	return vterrors.Wrap(err, "MysqlDaemon.SetMaster failed")
}

// If active reparents are disabled, we don't restart replication. So it makes no sense to wait for an update on the replica.
// Return immediately.
if !*mysqlctl.DisableActiveReparents {
Shouldn't this condition be the opposite?
Thank you for pointing this out. It would have been a regression. I have opened #7703 to fix this.
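The direction of the guard discussed above can be sketched as follows. This is a minimal illustration, assuming the quoted if statement guards an early return; the helper name shouldWaitForReplicaUpdate and its boolean parameter (standing in for *mysqlctl.DisableActiveReparents) are hypothetical, and this is not the code from #7703.

```go
package main

import "fmt"

// shouldWaitForReplicaUpdate is a hypothetical helper, not Vitess code.
// When active reparents are disabled, replication is not restarted, so
// there is no replica update to wait for and the caller can return early.
func shouldWaitForReplicaUpdate(disableActiveReparents bool) bool {
	return !disableActiveReparents
}

func main() {
	for _, disabled := range []bool{false, true} {
		wait := shouldWaitForReplicaUpdate(disabled)
		fmt.Printf("disable_active_reparents=%v wait=%v\n", disabled, wait)
	}
}
```

Under this reading, the early return belongs on the branch where the flag is set, which is the inverse of the condition quoted in the diff.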
if tm.orc == nil {
	return
}
if err := tm.orc.BeginMaintenance(tm.Tablet(), "vttablet has been told to Restore"); err != nil {
In our experiments, tm.Tablet() seems to return an outdated state, and we get a "cannot find mysql port" error. We ended up using tm.tmState.Tablet() instead. Let me know if you experience this as well.
tm.Tablet() simply returns tm.tmState.Tablet(), so they should be equivalent. I do see that it is possible for a race condition to occur between checkMysql and handleRestore.
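To illustrate why two equivalent accessors can still yield a record without a MySQL port, here is a minimal sketch. All types and names in it (tablet, tabletState, manager, setMysqlPort) are hypothetical stand-ins, not Vitess code, and the assumption that a periodic check fills in the port later is mine.

```go
package main

import (
	"fmt"
	"sync"
)

// tablet and tabletState are hypothetical stand-ins, not Vitess types.
type tablet struct {
	Alias     string
	MysqlPort int32
}

type tabletState struct {
	mu     sync.Mutex
	tablet tablet
}

// Tablet returns a copy of the current record while holding the lock.
func (s *tabletState) Tablet() tablet {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.tablet
}

// setMysqlPort models a periodic check that fills in the port later (assumption).
func (s *tabletState) setMysqlPort(port int32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tablet.MysqlPort = port
}

type manager struct{ state *tabletState }

// Tablet delegates to the state, so both accessors observe the same data.
func (m *manager) Tablet() tablet { return m.state.Tablet() }

func main() {
	m := &manager{state: &tabletState{tablet: tablet{Alias: "example-tablet"}}}

	early := m.Tablet() // snapshot taken before the port is known
	m.state.setMysqlPort(3306)
	late := m.state.Tablet()

	fmt.Println("early port:", early.MysqlPort)
	fmt.Println("late port:", late.MysqlPort)
	fmt.Println("accessors agree:", m.Tablet() == late)
}
```

In this model the choice of accessor makes no difference; what matters is whether the snapshot is taken before or after the port is recorded, which is the kind of ordering a race between two code paths would affect.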
err := tm.restoreDataLocked(ctx, logger, waitForBackupInterval, deleteBeforeRestore)
// Tell Orchestrator we're no longer stopped on purpose.
// Do this in the background, as it's best-effort.
go func() {
We only call EndMaintenance if restoreDataLocked returns no error.
That seems reasonable.
@@ -124,6 +134,16 @@ func (tm *TabletManager) Backup(ctx context.Context, concurrency int, logger log
}
returnErr = err
}
// Tell Orchestrator we're no longer stopped on purpose.
@Zhannan what about here? Should this also be called only if err == nil?
makes sense!
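The suggestion in this thread, ending maintenance only when the operation succeeds, could look roughly like the sketch below. Only the method names BeginMaintenance and EndMaintenance come from the discussion; the interface, its signatures, the recordingClient fake, and runWithMaintenance are hypothetical. Two simplifications are assumptions of the sketch: a failed BeginMaintenance is only logged, and EndMaintenance runs synchronously rather than in a background goroutine so the example stays deterministic.

```go
package main

import (
	"errors"
	"fmt"
)

// maintenanceClient is a hypothetical interface, not the Vitess orchestrator client.
type maintenanceClient interface {
	BeginMaintenance(reason string) error
	EndMaintenance() error
}

// recordingClient is a fake that records which calls were made.
type recordingClient struct{ calls []string }

func (c *recordingClient) BeginMaintenance(reason string) error {
	c.calls = append(c.calls, "begin")
	return nil
}

func (c *recordingClient) EndMaintenance() error {
	c.calls = append(c.calls, "end")
	return nil
}

var errRestoreFailed = errors.New("restore failed")

// runWithMaintenance enters maintenance mode, runs op, and leaves
// maintenance mode only if op returned no error.
func runWithMaintenance(c maintenanceClient, reason string, op func() error) error {
	if c != nil {
		if err := c.BeginMaintenance(reason); err != nil {
			fmt.Println("warning: BeginMaintenance failed:", err) // best effort (assumption)
		}
	}
	if err := op(); err != nil {
		return err // stay in maintenance: the tablet is still stopped on purpose
	}
	if c != nil {
		if err := c.EndMaintenance(); err != nil {
			fmt.Println("warning: EndMaintenance failed:", err) // best effort
		}
	}
	return nil
}

func main() {
	ok := &recordingClient{}
	err := runWithMaintenance(ok, "restore", func() error { return nil })
	fmt.Println(err, ok.calls)

	failed := &recordingClient{}
	err = runWithMaintenance(failed, "restore", func() error { return errRestoreFailed })
	fmt.Println(err, failed.calls)
}
```

One consequence of this design is that a failed backup or restore leaves the tablet in maintenance until something else ends it, so the operator has to clear it or wait for it to expire.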
Description
When the vttablet flag -disable_active_reparents is set, Vitess should not manage MySQL-level replication. However, it turns out that backup/restore did not respect this.
This PR fixes that by adding checks of the flag before restarting replication after a backup/restore. In terms of backup, this is only applicable to offline backups.
In addition, we put the vttablet into orchestrator maintenance mode during backup/restore. We require replication to be off and we do not want orchestrator noticing that it is off and enabling it again while a backup or restore is in progress.
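The flag check described above can be sketched for the offline backup case. This is a simplified model under stated assumptions, not the PR's implementation: fakeMysql and offlineBackup are hypothetical, the boolean parameter stands in for -disable_active_reparents, and the sketch assumes replication is stopped for the backup regardless of the flag.

```go
package main

import "fmt"

// fakeMysql is a hypothetical stand-in that only tracks replication state.
type fakeMysql struct{ replicating bool }

func (m *fakeMysql) StopReplication()  { m.replicating = false }
func (m *fakeMysql) StartReplication() { m.replicating = true }

// offlineBackup stops replication, runs the backup, and restarts
// replication only when Vitess is allowed to manage it.
func offlineBackup(db *fakeMysql, disableActiveReparents bool, backup func() error) error {
	db.StopReplication() // the backup requires replication to be off
	err := backup()
	if !disableActiveReparents {
		db.StartReplication()
	}
	return err
}

func main() {
	for _, disabled := range []bool{false, true} {
		db := &fakeMysql{replicating: true}
		err := offlineBackup(db, disabled, func() error { return nil })
		fmt.Printf("disable_active_reparents=%v err=%v replicating=%v\n", disabled, err, db.replicating)
	}
}
```

With the flag set, replication stays stopped after the backup and restarting it is left to whatever external system manages replication.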
Related Issue(s)
Fixes #7657
Checklist
Impacted Areas in Vitess
Components that this PR will affect: