
Incremental backups / partial restore #2963

Merged: 111 commits, merged into master on Mar 22, 2019

Conversation

@srfrog (Contributor) commented on Feb 1, 2019

This PR adds incremental backups by reading manifest files that store the version of previous backups. Only a URI location is needed for the system to deduce what needs to happen.

  • First backup is a full backup
  • Any existing backup without a `manifest.json` file is ignored
  • The version of the most recent backup determines the starting version of the current backup.
  • The process is path-driven, so it's simple to start with fresh backups.


@martinmr (Contributor) left a comment
Reviewed 10 of 10 files at r1.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @srfrog)


dgraph/cmd/alpha/admin_backup.go, line 54 at r1 (raw file):

		since, err = strconv.ParseUint(at, 10, 64)
		if err != nil {
			x.SetStatus(w,

Does this actually check that since is greater than zero? 0 is still a valid uint, so it will be converted just fine.


ee/backup/file_handler.go, line 98 at r1 (raw file):

		if err != nil {
			if glog.V(2) {
				fmt.Printf("--- Skip: invalid backup name format: %q\n", file)

Why the "---"? Do any existing log lines have them? Otherwise it might just be better to remove them.


ee/backup/handler.go, line 33 at r1 (raw file):

	Create(*url.URL, *Request) error
	// Load will scan location URI for backup files, then load them with loadFunc.
	Load(*url.URL, uint64, loadFn) error

is it possible to give names to the arguments in these functions? I think it would be a bit more readable.


worker/backup_ee.go, line 95 at r1 (raw file):

// BackupOverNetwork handles a request coming from an HTTP client.
func BackupOverNetwork(ctx context.Context, destination string, since uint64) error {
	ctx, cancel := context.WithCancel(ctx)

is it fine to overwrite ctx here?

@srfrog (Contributor, Author) left a comment

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @martinmr)


dgraph/cmd/alpha/admin_backup.go, line 54 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Does this actually check that since is greater than zero? 0 is still a valid uint so it will be converted just fine

If we get an error here, it means the value was not a proper uint (e.g., at=horse or at=-10). If at= is 0, then none of the logic is triggered, so it won't matter. And I don't check values attached to errors, ever.


ee/backup/file_handler.go, line 98 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Why the "---"? Do any existing log lines have them? Otherwise it might just be better to remove them.

Logs can get noisy; the dashes mean "while doing what I said above just now, this happened...". Since it's a printf to stdout, there are otherwise no visual artifacts to follow the trail. Imagine 100 files, 1,000, etc.


ee/backup/handler.go, line 33 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

is it possible to give names to the arguments in these functions? I think it would be a bit more readable.

This is an interface, not a function definition. The idiomatic Go way is not to name the parameters of abstract types. The actual function definition (concrete type) will have the argument names and show how it uses them. But I will expand the comments, which is really what explains how these might work.


worker/backup_ee.go, line 95 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

is it fine to overwrite ctx here?

Yes, a new context is created using the incoming one as its parent. And the parent is a background context, which is a static reference.

@danielmai (Contributor) left a comment (previously approved these changes Feb 1, 2019)

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @martinmr)


ee/backup/file_handler.go, line 98 at r1 (raw file):

Previously, srfrog (Gus) wrote…

Logs can get noisy, the dashes mean "while doing what I said above just now, this happened...". Since it's a printf to stdout, there are no visual artifacts to follow the trail. Imagine 100 files, 1000, etc.

Why not use glog here? It's only printed when glog.V(2), but it doesn't log with glog?

@srfrog (Contributor, Author) left a comment

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @martinmr)


ee/backup/file_handler.go, line 98 at r1 (raw file):

Previously, danielmai (Daniel Mai) wrote…

Why not use glog here? It's only printed when glog.V(2), but it doesn't log with glog?

Restore only works in the CLI. glog looks out of place because it adds a timestamp when the rest of the output doesn't.

@codexnull (Contributor) left a comment
Reviewed 4 of 9 files at r7, 2 of 7 files at r8, 5 of 5 files at r11.
Reviewable status: 14 of 15 files reviewed, 17 unresolved discussions (waiting on @gitlw, @manishrjain, @martinmr, and @srfrog)


ee/backup/backup_test.go, line 52 at r11 (raw file):

			conn, err := grpc.Dial("localhost:9180", grpc.WithInsecure())
			x.Check(err)
			dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

FYI, z.DgraphClientNoDropAll() does exactly this.


ee/backup/backup_test.go, line 86 at r11 (raw file):

	}
	for i := range dirs {
		x.Check(os.RemoveAll(dirs[i]))

Can't you just remove the whole data directory instead of the subdirs individually? Are there additional files there?

@srfrog (Contributor, Author) left a comment
Reviewable status: 14 of 15 files reviewed, 17 unresolved discussions (waiting on @codexnull, @gitlw, @manishrjain, and @martinmr)


ee/backup/backup_test.go, line 86 at r11 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

You can't just remove the whole data directory instead of subdirs individually? Are there additional files there?

I thought so too, but Docker Compose had issues starting without the directory being there. Maybe I'm missing something. So I left data/ with a .gitkeep.

@codexnull (Contributor) left a comment
Reviewed 1 of 9 files at r7, 10 of 10 files at r12.
Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @gitlw, @manishrjain, @martinmr, and @srfrog)


ee/backup/run.go, line 119 at r12 (raw file):

			grpc.WithBlock(),
			grpc.WithInsecure(),
			grpc.WithTimeout(10*time.Second))

grpc.WithTimeout() is deprecated according to the docs, so it's probably best not to use it in new code. Can you use context.WithTimeout() here like you do a few lines below?


ee/backup/s3_handler.go, line 222 at r12 (raw file):

	}

	const N = 100

How about a better name for this variable? BATCH_SIZE, MAX_PENDING, ...


ee/backup/s3_handler.go, line 235 at r12 (raw file):

			}(i, o)
			if i%N == 0 {
				wg.Wait()

Won't this wait after the first object (i == 0, and 0 % anything == 0)? I think you want (i+1)%N instead.


ee/backup/s3_handler.go, line 237 at r12 (raw file):

				wg.Wait()
			}
		}

Does this loop have a potential deadlock or do I not understand how channels work?

Suppose there are 101 objects. The loop will start all goroutines (i = 0 through 100) and then wait for them to finish. But the rc channel has a buffer length of 1, so 100 of them will block until the rc channel is read, which won't happen because the parent routine is in wg.Wait().

Should the make(chan) have a buffer length of N?


ee/backup/s3_handler.go, line 258 at r12 (raw file):

	defer close(doneCh)

	suffix := "/" + backupManifest

Can you use filepath.Join() here as well?

@srfrog (Contributor, Author) left a comment

Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @codexnull, @gitlw, @manishrjain, and @martinmr)


ee/backup/run.go, line 119 at r12 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

grpc.WithTimeout() deprecated according to the docs, so it's probably best to not use it in new code. Can you use context.WithTimeout() here like you do a few lines below?

Done.


ee/backup/s3_handler.go, line 123 at r8 (raw file):

Previously, srfrog (Gus) wrote…

actually, I dont know why we are using strings.Compare() anyway. it's gone.

Done.


ee/backup/s3_handler.go, line 222 at r12 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

How about a better name for this variable? BATCH_SIZE, MAX_PENDING, ...

Done.


ee/backup/s3_handler.go, line 235 at r12 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

Won't this wait after the first object (i == 0, and 0 % anything == 0)? I think you want (i+1)%N instead.

Nice catch, fixed.


ee/backup/s3_handler.go, line 237 at r12 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

Does this loop have a potential deadlock or do I not understand how channels work?

Suppose there are 101 objects. The loop will start all goroutines (i = 0 through 100) and then wait for them to finish. But the rc channel has a buffer length of 1, so 100 of them will block until the rc channel is read, which won't happen because the parent routine is in wg.Wait().

Should the make(chan) have a buffer length of N?

Yes, you are correct. I had moved wg.Wait() outside; that needed to be a time.Sleep() and a larger buffer, like you suggested.


ee/backup/s3_handler.go, line 258 at r12 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

Can you use filepath.Join() here as well?

I could, but this is a URI path, which only uses /. I use filepath.Join() in the file handler in case it is an NTFS mount or Dgraph is used on Windows.

@manishrjain (Contributor) left a comment

:lgtm: All good for now. I'll work a bit on backups once you merge.

Reviewable status: 15 of 16 files reviewed, 15 unresolved discussions (waiting on @codexnull, @gitlw, @manishrjain, and @martinmr)


ee/backup/run.go, line 119 at r12 (raw file):

grpc.WithTimeout() is deprecated according to the docs, so it's probably best not to use it in new code. Can you use context.WithTimeout() here like you do a few lines below?

This comment hasn't been addressed, but marked as done.

@manishrjain manishrjain merged commit f95b962 into master Mar 22, 2019
@manishrjain manishrjain deleted the srfrog/issue-2949_incremental_backups branch March 22, 2019 00:24
@srfrog srfrog mentioned this pull request Apr 1, 2019
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019