Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Bulk Loader + Live Loader): Supporting Loading files via s3/minio #7359

Merged
merged 11 commits into from
Jan 25, 2021

Conversation

gja
Copy link
Contributor

@gja gja commented Jan 23, 2021


This change is Reviewable

@github-actions github-actions bot added area/bulk-loader Issues related to bulk loading. area/graphql Issues related to GraphQL support on Dgraph. labels Jan 23, 2021
@gja gja changed the title feat(Bulk Loader): Supporting Bulk Loading files via s3/minio feat(Bulk Loader + Live Loader): Supporting Loading files via s3/minio Jan 23, 2021
Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments, but otherwise :lgtm:

Reviewed 24 of 24 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @gja, @manishrjain, @pawanrawal, and @vvbalaji-dgraph)


chunker/chunk.go, line 370 at r1 (raw file):

file string

Maybe call this filename so it matches the name mentioned in the comment.

This signature is also getting kinda long (past 100 characters).


filestore/filestore.go, line 23 at r1 (raw file):

	url, err := url.Parse(path)
	x.Check(err)

Can you test if this doesn't return an err (and then panic from x.Check) with local filesystem paths? I'm wondering if there's an edge case here if, say, the local path has spaces or something that's different from URLs.

Copy link
Contributor Author

@gja gja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai, @manishrjain, @pawanrawal, and @vvbalaji-dgraph)


chunker/chunk.go, line 370 at r1 (raw file):

Previously, danielmai (Daniel Mai) wrote…
file string

Maybe call this filename so it matches the name mentioned in the comment.

This signature is also getting kinda long (past 100 characters).

DONE


filestore/filestore.go, line 23 at r1 (raw file):

Previously, danielmai (Daniel Mai) wrote…
	url, err := url.Parse(path)
	x.Check(err)

Can you test if this doesn't return an err (and then panic from x.Check) with local filesystem paths? I'm wondering if there's an edge case here if, say, the local path has spaces or something that's different from URLs.

Tested. Even url.Parse(C:\foobar\file with spaces and a % works.rdf.gz)

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Some comments.

Reviewed 23 of 24 files at r1, 2 of 2 files at r2.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @danielmai, @gja, @pawanrawal, and @vvbalaji-dgraph)


chunker/chunk.go, line 370 at r2 (raw file):

// StreamReader returns a bufio given a ReadCloser. The file is passed just to check for .gz files
func StreamReader(file string, key x.SensitiveByteSlice, f io.ReadCloser) (rd *bufio.Reader, cleanup func()) {

100 chars.


dgraph/cmd/live/run.go, line 407 at r2 (raw file):

// processFile forwards a file to the RDF or JSON processor as appropriate
func (l *loader) processFile(ctx context.Context, fs filestore.FileStore, filename string, key x.SensitiveByteSlice) error {

100 chars.


filestore/filestore.go, line 1 at r2 (raw file):

package filestore

license.


filestore/local_files.go, line 1 at r2 (raw file):

package filestore

license.


filestore/remote_files.go, line 1 at r2 (raw file):

package filestore

license


filestore/remote_files.go, line 21 at r2 (raw file):

func (rf *remoteFiles) Open(path string) (io.ReadCloser, error) {
	url, err := url.Parse(path)
	x.Check(err)

return error, don't check fail.


filestore/remote_files.go, line 51 at r2 (raw file):

		bucket, prefix := rf.mc.ParseBucketAndPrefix(url.Path)
		for obj := range rf.mc.ListObjectsV2(bucket, prefix, true, context.TODO().Done()) {

What happens if you pass a nil channel?


systest/bulk_live/common/bulk_live_fixture.go, line 69 at r2 (raw file):

	if opts.remote {
		schemaPath = "minio://" + testutil.ContainerAddr("minio1", 9001) + "/data/" + schemaPath + "?secure=false"

100 chars.


testutil/exec.go, line 130 at r2 (raw file):

func DgraphBinaryPath() string {
	// Useful for OSX, as GOPATH/bin/dgraph is likely set to the linux compiled version of dgraph for docker

100 chars.

@gja gja merged commit 764803c into master Jan 25, 2021
@gja gja deleted the bulk-load-s3 branch January 25, 2021 19:26
gja added a commit that referenced this pull request Jan 27, 2021
#7359)

It is now possible to pass an s3 / minio path to bulk and live loader. -f s3:///<bucket>/path-to-rdf
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/bulk-loader Issues related to bulk loading. area/graphql Issues related to GraphQL support on Dgraph.
Development

Successfully merging this pull request may close these issues.

3 participants