-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add field to backup requests to force a full backup. #3387
Conversation
This change adds a new field to the backup requests to force a full backup. Only the code that forces the full backup is included in this PR to keep it small. Manually tested using a minio instance locally. No existing tests will be broken by this change. Automated tests will be included later since the existing backup test needs some work before it can verify this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 5 files at r1.
Reviewable status: 4 of 5 files reviewed, all discussions resolved (waiting on @manishrjain)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a doc to the version field. Please check if it makes sense and do a git pull, :)
Reviewable status: 4 of 6 files reviewed, 1 unresolved discussion (waiting on @manishrjain and @martinmr)
ee/backup/file_handler.go, line 76 at r1 (raw file):
// No new changes since last backup. if !req.Backup.ForceFull && m.Version >= req.Backup.ReadTs {
I think Gus used SnapshotTs instead of ReadTs because even if there are no data changes, the ReadTs can still be incremented, e.g. by a transaction which has the ReadOnly set to false, and does not have any mutations.
In such cases, using the SnapshotTs can avoid generating snapshots while there is no data change. Do you want to change away from that behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 6 files reviewed, 1 unresolved discussion (waiting on @gitlw and @manishrjain)
ee/backup/file_handler.go, line 76 at r1 (raw file):
Previously, gitlw (Lucas Wang) wrote…
I think Gus used SnapshotTs instead of ReadTs because even if there are no data changes, the ReadTs can still be incremented, e.g. by a transaction which has the ReadOnly set to false, and does not have any mutations.
In such cases, using the SnapshotTs can avoid generating snapshots while there is no data change. Do you want to change away from that behavior?
I am going to try to come up with some other mechanism to detect if there are any changes in some other PR. The problem with the current approach is that the backup requests might return success without actually doing anything. There are some changes in master that will make snapshots happen less frequently, so it could lead to situations where data is lost because users thought it was backed up when it wasn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @gitlw and @martinmr)
ee/backup/file_handler.go, line 76 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
I am going to try to come up with some other mechanism to detect if there are any changes in some other PR. The problem with the current approach is that the backup requests might return success without actually doing anything. There are some changes in master that will make snapshots happen less frequently, so it could lead to situations where data is lost because users thought it was backed up when it wasn't.
We don't need any mechanism to detect changes or not in Dgraph. All that'd avoid is some CPU cycles looping over keys in the tree. If we need such a mechanism, it should go into Badger.
Please remove any such mechanism if present. We should be using the ReadTs, not SnapshotTs.
ee/backup/file_handler.go, line 91 at r2 (raw file):
// the contents from the database. if req.Backup.ForceFull { req.Version = 0
s/Version/Since? Considering this is the timestamp that we want to backup since.
protos/pb.proto, line 494 at r2 (raw file):
// created. If false, the backup will be full or incremental depending on // the existing backups. bool force_full = 11;
Do we need this var? Can't since be just set to zero?
worker/backup_ee.go, line 121 at r2 (raw file):
SessionToken: sessionToken, Anonymous: anonymous, ForceFull: forceFull,
Instead, set Since to zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 6 files reviewed, 4 unresolved discussions (waiting on @codexnull, @gitlw, and @manishrjain)
ee/backup/file_handler.go, line 76 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
We don't need any mechanism to detect changes or not in Dgraph. All that'd avoid is some CPU cycles looping over keys in the tree. If we need such a mechanism, it should go into Badger.
Please remove any such mechanism if present. We should be using the ReadTs, not SnapshotTs.
I have removed this check and the error. I have already removed the logic that used SnapshotTs instead of ReadTs.
ee/backup/file_handler.go, line 91 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
s/Version/Since? Considering this is the timestamp that we want to backup since.
Since this affects the current behavior, I'll do this after the PR to fix the tests is merged. For now I've added a todo.
protos/pb.proto, line 494 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Do we need this var? Can't since be just set to zero?
Since is being used in a weird unintuitive way (it's not being set at the time a backup request is sent but after the request is received based on the response !!??) so I don't think it's safe to use it to force a full backup.
I plan to go through the code and try to simplify it as much as possible once the tests are submitted. For now I have added a TODO to check if this field can be removed.
worker/backup_ee.go, line 121 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Instead, set Since to zero?
See the other comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @gitlw, @manishrjain, and @martinmr)
This change adds a new field to the backup requests to force a full backup. Only the code that forces the full backup is included in this PR to keep it small. Manually tested using a minio instance locally. No existing tests will be broken by this change. Automated tests will be included later since the existing backup test needs some work before it can verify this feature.
This change adds a new field to the backup requests to force a full
backup. Only the code that forces the full backup is included in this
PR to keep it small.
Manually tested using a minio instance locally. No existing tests will
be broken by this change. Automated tests will be included later since
the existing backup test needs some work before it can verify this
feature.
This change is