Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add field to backup requests to force a full backup. #3387

Merged
merged 5 commits into from
May 20, 2019

Conversation

martinmr
Copy link
Contributor

@martinmr martinmr commented May 7, 2019

This change adds a new field to the backup requests to force a full
backup. Only the code that forces the full backup is included in this
PR to keep it small.

Manually tested using a minio instance locally. No existing tests will
be broken by this change. Automated tests will be included later since
the existing backup test needs some work before it can verify this
feature.


This change is Reviewable

This change adds a new field to the backup requests to force a full
backup. Only the code that forces the full backup is included in this
PR to keep it small.

Manually tested using a minio instance locally. No existing tests will
be broken by this change. Automated tests will be included later since
the existing backup test needs some work before it can verify this
feature.
@martinmr martinmr requested review from manishrjain and a team as code owners May 7, 2019 23:33
Copy link
Contributor

@codexnull codexnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 5 files at r1.
Reviewable status: 4 of 5 files reviewed, all discussions resolved (waiting on @manishrjain)

gitlw
gitlw previously requested changes May 8, 2019
Copy link

@gitlw gitlw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a doc to the version field. Please check if it makes sense and do a git pull, :)

Reviewable status: 4 of 6 files reviewed, 1 unresolved discussion (waiting on @manishrjain and @martinmr)


ee/backup/file_handler.go, line 76 at r1 (raw file):

			// No new changes since last backup.
			if !req.Backup.ForceFull && m.Version >= req.Backup.ReadTs {

I think Gus used SnapshotTs instead of ReadTs because even if there are no data changes, the ReadTs can still be incremented, e.g. by a transaction which has the ReadOnly set to false, and does not have any mutations.

In such cases, using the SnapshotTs can avoid generating snapshots while there is no data change. Do you want to change away from that behavior?

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 6 files reviewed, 1 unresolved discussion (waiting on @gitlw and @manishrjain)


ee/backup/file_handler.go, line 76 at r1 (raw file):

Previously, gitlw (Lucas Wang) wrote…

I think Gus used SnapshotTs instead of ReadTs because even if there are no data changes, the ReadTs can still be incremented, e.g. by a transaction which has the ReadOnly set to false, and does not have any mutations.

In such cases, using the SnapshotTs can avoid generating snapshots while there is no data change. Do you want to change away from that behavior?

I am going to try to come up with some other mechanism to detect if there are any changes in some other PR. The problem with the current approach is that the backup requests might return success without actually doing anything. There are some changes in master that will make snapshots happen less frequently, so it could lead to situations where data is lost because users thought it was backed up when it wasn't.

@martinmr martinmr dismissed gitlw’s stale review May 9, 2019 19:14

addressed review

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @gitlw and @martinmr)


ee/backup/file_handler.go, line 76 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I am going to try to come up with some other mechanism to detect if there are any changes in some other PR. The problem with the current approach is that the backup requests might return success without actually doing anything. There are some changes in master that will make snapshots happen less frequently, so it could lead to situations where data is lost because users thought it was backed up when it wasn't.

We don't need any mechanism to detect changes or not in Dgraph. All that'd avoid is some CPU cycles looping over keys in the tree. If we need such a mechanism, it should go into Badger.

Please remove any such mechanism if present. We should be using the ReadTs, not SnapshotTs.


ee/backup/file_handler.go, line 91 at r2 (raw file):

	// the contents from the database.
	if req.Backup.ForceFull {
		req.Version = 0

s/Version/Since? Considering this is the timestamp that we want to backup since.


protos/pb.proto, line 494 at r2 (raw file):

  // created. If false, the backup will be full or incremental depending on
  // the existing backups.
  bool force_full = 11;

Do we need this var? Can't since be just set to zero?


worker/backup_ee.go, line 121 at r2 (raw file):

		SessionToken: sessionToken,
		Anonymous:    anonymous,
		ForceFull:    forceFull,

Instead, set Since to zero?

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 6 files reviewed, 4 unresolved discussions (waiting on @codexnull, @gitlw, and @manishrjain)


ee/backup/file_handler.go, line 76 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

We don't need any mechanism to detect changes or not in Dgraph. All that'd avoid is some CPU cycles looping over keys in the tree. If we need such a mechanism, it should go into Badger.

Please remove any such mechanism if present. We should be using the ReadTs, not SnapshotTs.

I have removed this check and the error. I have already removed the logic that used SnapshotTs instead of ReadTs.


ee/backup/file_handler.go, line 91 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

s/Version/Since? Considering this is the timestamp that we want to backup since.

Since this affects the current behavior, I'll do this after the PR to fix the tests is merged. For now I've added a todo.


protos/pb.proto, line 494 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Do we need this var? Can't since be just set to zero?

Since is being used in a weird unintuitive way (it's not being set at the time a backup request is sent but after the request is received based on the response !!??) so I don't think it's safe to use it to force a full backup.

I plan to go through the code and try to simplify it as much as possible once the tests are submitted. For now I have added a TODO to check if this field can be removed.


worker/backup_ee.go, line 121 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Instead, set Since to zero?

See the other comment

worker/backup_ee.go Outdated Show resolved Hide resolved
@martinmr martinmr requested a review from manishrjain May 16, 2019 20:55
@martinmr martinmr dismissed manishrjain’s stale review May 17, 2019 01:00

addressed comments

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 4 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @gitlw, @manishrjain, and @martinmr)

@martinmr martinmr merged commit eaf5a90 into master May 20, 2019
@martinmr martinmr deleted the martinmr/force-full-backup branch May 20, 2019 22:29
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
This change adds a new field to the backup requests to force a full
backup. Only the code that forces the full backup is included in this
PR to keep it small.

Manually tested using a minio instance locally. No existing tests will
be broken by this change. Automated tests will be included later since
the existing backup test needs some work before it can verify this
feature.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants