Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(restore): multiple restore requests should be rejected and proposals should not be submitted #7118

Merged
merged 5 commits into from
Dec 15, 2020

Conversation

aman-bansal
Copy link
Contributor

@aman-bansal aman-bansal commented Dec 11, 2020

Issue: multiple restore requests resulted in multiple restore operation even though requests erred out.

Reason: We have a check to manage number of restore operations running. But the check is happening after submitted the proposals hence failure message but multiple restore operations trigger.


This change is Reviewable

@netlify
Copy link

netlify bot commented Dec 11, 2020

✔️ Deploy preview for dgraph-docs ready!

🔨 Explore the source changes: 240a23e

🔍 Inspect the deploy logs: https://app.netlify.com/sites/dgraph-docs/deploys/5fd5ea78b537f60008753af9

😎 Browse the preview: https://deploy-preview-7118--dgraph-docs.netlify.app

Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aman-bansal, @manishrjain, @MichaelJCompton, and @vvbalaji-dgraph)


wiki/content/enterprise-features/binary-backups.md, line 431 at r1 (raw file):

Only restore once. If you run same restore on a different node at the same time,
then that request will also be accepted.

When running a live restore, you only need to send the request to a single Alpha. If you run the same restore on a different Alpha at the same time, then another restore operation will be started.

Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the doc and then :lgtm:

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aman-bansal, @manishrjain, @MichaelJCompton, and @vvbalaji-dgraph)

if err := FillRestoreCredentials(req.Location, req); err != nil {
return 0, errors.Wrapf(err, "cannot fill restore proposal with the right credentials")
}
restoreId, err := rt.Add()
if err != nil {
return 0, errors.Wrapf(err, "cannot assign ID to restore operation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cannot assign ID to restore operation, because of?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrapf will take care of that. rt.Add() returns multiple kind of errors. wrapf will combine message to make it more meaningfull. This is the interface it implements. func (w *withMessage) Error() string { return w.msg + ": " + w.cause.Error() } Adding something in errors will mess up with the api response. But yes we can add some logging if needed. But seems redundant here.

@manishrjain
Copy link
Contributor

Approved. Just one comment.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @aman-bansal , were you going to remove the restroreTracker too? Or, is that a different PR?

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @aman-bansal, @danielmai, @MichaelJCompton, and @vvbalaji-dgraph)


worker/draft.go, line 226 at r3 (raw file):

}

func isTaskRunning(operation op) bool {

GetOngoingTasks gives you the same information. Use that.

Copy link
Contributor Author

@aman-bansal aman-bansal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be a separate PR. It will make one endpoint operation restoreStatus redundant. I will cherrypick this to 20.11 and restore tracker removal being breaking change, will be released later.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @danielmai, @manishrjain, @MichaelJCompton, and @vvbalaji-dgraph)


wiki/content/enterprise-features/binary-backups.md, line 431 at r1 (raw file):

Previously, danielmai (Daniel Mai) wrote…
Only restore once. If you run same restore on a different node at the same time,
then that request will also be accepted.

When running a live restore, you only need to send the request to a single Alpha. If you run the same restore on a different Alpha at the same time, then another restore operation will be started.

Done.


worker/draft.go, line 226 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

GetOngoingTasks gives you the same information. Use that.

I had to write this logic over loop to check particular op is running or not. I thought being here could be used later. One more thing because of which I was reluctant is GetOngoingTasks return []string. Let me push another thing that uses GetOngoingTasks.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r4.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai, @manishrjain, @MichaelJCompton, and @vvbalaji-dgraph)

@aman-bansal aman-bansal merged commit 6688364 into master Dec 15, 2020
@aman-bansal aman-bansal deleted the aman/restore_multi_fix branch December 16, 2020 07:31
aman-bansal added a commit that referenced this pull request Jan 12, 2021
…als should not be submitted (#7118)

* fix(restore): multiple restore requests should be rejected and proposals should not be submitted

* adding doc defining restore behaviour

* changing message to be more concrete

* making restore return error if operation already running

* making use of GetOngoingTasks
aman-bansal added a commit that referenced this pull request Jan 12, 2021
…als should not be submitted (#7118) (#7276)

* fix(restore): multiple restore requests should be rejected and proposals should not be submitted

* adding doc defining restore behaviour

* changing message to be more concrete

* making restore return error if operation already running

* making use of GetOngoingTasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants