Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(restore): multiple restore requests should be rejected and proposals should not be submitted #7118

Merged
merged 5 commits into from
Dec 15, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 29 additions & 5 deletions worker/online_restore_ee.go
Original file line number Diff line number Diff line change
Expand Up @@ -62,10 +62,38 @@ func ProcessRestoreRequest(ctx context.Context, req *pb.RestoreRequest) (int, er
if err := VerifyBackup(req, &creds, currentGroups); err != nil {
return 0, errors.Wrapf(err, "failed to verify backup")
}

if err := FillRestoreCredentials(req.Location, req); err != nil {
return 0, errors.Wrapf(err, "cannot fill restore proposal with the right credentials")
}
// Restore Tracker keeps tracks of ongoing restore operation initiated on the node.
// Restore Tracker are node specific. They manage restore operation initiated on the node.
// If already initiated on this node, rt.Add() will return already running operation error.
restoreId, err := rt.Add()
if err != nil {
return 0, errors.Wrapf(err, "cannot assign ID to restore operation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cannot assign ID to restore operation, because of?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrapf will take care of that. rt.Add() returns multiple kind of errors. wrapf will combine message to make it more meaningfull. This is the interface it implements. func (w *withMessage) Error() string { return w.msg + ": " + w.cause.Error() } Adding something in errors will mess up with the api response. But yes we can add some logging if needed. But seems redundant here.

}

// This check if any restore operation running on the node.
// Operation initiated on other nodes doesn't have record in the record tracker.
// This keeps track if there is an already running restore operation return the error.
// IMP: This introduces few corner cases.
// Like two concurrent restore operation on different nodes.
// Considering Restore as admin operation, solving all those complexities has low gains
// than to sacrifice the simplicity.
isRestoreRunning := func() bool {
tasks := GetOngoingTasks()
for _, t := range tasks {
if t == opRestore.String() {
return true
}
}
return false
}
if isRestoreRunning() {
return 0, errors.Errorf("another restore operation is already running. " +
"Please retry later.")
}

req.RestoreTs = State.GetTimestamp(false)

// TODO: prevent partial restores when proposeRestoreOrSend only sends the restore
Expand All @@ -80,10 +108,6 @@ func ProcessRestoreRequest(ctx context.Context, req *pb.RestoreRequest) (int, er
}()
}

restoreId, err := rt.Add()
if err != nil {
return 0, errors.Wrapf(err, "cannot assign ID to restore operation")
}
go func(restoreId int) {
errs := make([]error, 0)
for range currentGroups {
Expand Down