Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for xidmap in bulkloader #5090

Merged
merged 6 commits into from
Apr 8, 2020

Conversation

all-seeing-code
Copy link
Contributor

@all-seeing-code all-seeing-code commented Apr 3, 2020

Dear contributor, what does this PR do?

This PR adds --xidmap option to bulk-loader which stores explicit map between UIDs (Assigned by dgraph) and External IDs (passed by user). Documentation for --xidmap flag can be found here

Motivation

This is a community-asked feature. Related discussion can be found in Issue:4917

Components affected by this PR

Bulk loader, documentation for this is being updated by PR:5093

Does this PR break backwards compatibility?

No

Testing

Verified that xid-uid map stored using the flag is identical to the xid-uid mapping which happens in dgraph and can be checked via querying on Ratel. For this a small rdf file was bulk uploaded and queried to verify.

Fixes

Issue:4917

This change is Reviewable

@all-seeing-code all-seeing-code requested review from manishrjain and a team as code owners April 3, 2020 09:48
@all-seeing-code all-seeing-code linked an issue Apr 3, 2020 that may be closed by this pull request
Copy link
Contributor

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write down the testing you did for this part in the PR. Also make sure documentation is updated after this task is done.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @anurags92 and @manishrjain)


dgraph/cmd/bulk/loader.go, line 163 at r1 (raw file):

}

func (ld *loader) mapStage(opt *options) {

This can be avoided as options can accessed from loader.opt.


dgraph/cmd/bulk/loader.go, line 174 at r1 (raw file):

		ld.xids = xidmap.New(ld.zero, db)
	} else {
		ld.xids = xidmap.New(ld.zero, nil)

else part can be avoided by declaring db at top of if block. It will have nil value by default.

@MichelDiz
Copy link
Contributor

@ashish-goswami I gonna do the documentation #5049

MichelDiz added a commit that referenced this pull request Apr 3, 2020
@all-seeing-code
Copy link
Contributor Author

all-seeing-code commented Apr 6, 2020

Please write down the testing you did for this part in the PR. Also make sure documentation is updated after this task is done.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @anurags92 and @manishrjain)

dgraph/cmd/bulk/loader.go, line 163 at r1 (raw file):

}

func (ld *loader) mapStage(opt *options) {

This can be avoided as options can accessed from loader.opt.

dgraph/cmd/bulk/loader.go, line 174 at r1 (raw file):

Addressed the above comment.

		ld.xids = xidmap.New(ld.zero, db)
	} else {
		ld.xids = xidmap.New(ld.zero, nil)

else part can be avoided by declaring db at top of if block. It will have nil value by default.

Wouldn't we be allocating the DB twice in-case the xidmap argument is used?

@all-seeing-code
Copy link
Contributor Author

Please write down the testing you did for this part in the PR. Also make sure documentation is updated after this task is done.

Testing:
Manually tested that the feature works locally and in docker containers.
@ashish-goswami, @manishrjain @MichelDiz Do you guys suggest any specific test which can be written for this feature?

Documentation is being updated by a different PR (PR:5093)

Copy link
Contributor

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was asking about writing manual testing steps. You can write systest if you want. You can insert some records and later read xidmap badger directory to check if uids exists for those xids or not. :lgtm:

Reviewed 2 of 2 files at r2.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @anurags92 and @manishrjain)


dgraph/cmd/bulk/loader.go, line 174 at r1 (raw file):

Quoted 9 lines of code…
	var db *badger.DB
	if len(ld.opt.ClientDir) > 0 {
		x.Check(os.MkdirAll(ld.opt.ClientDir, 0700))

		var err error
		db, err = badger.Open(badger.DefaultOptions(ld.opt.ClientDir))
		x.Checkf(err, "Error while creating badger KV for xidmap")
	} 
	ld.xids = xidmap.New(ld.zero, db)

I was suggesting as above. We are only populating db if `len(ld.opt.ClientDir)>0, otherwise db is nil.
Also modify the error message to be more specific like above.

Copy link
Contributor Author

@all-seeing-code all-seeing-code left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 2 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami and @manishrjain)


dgraph/cmd/bulk/loader.go, line 163 at r1 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

This can be avoided as options can accessed from loader.opt.

Ok, done.


dgraph/cmd/bulk/loader.go, line 174 at r1 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…
	var db *badger.DB
	if len(ld.opt.ClientDir) > 0 {
		x.Check(os.MkdirAll(ld.opt.ClientDir, 0700))

		var err error
		db, err = badger.Open(badger.DefaultOptions(ld.opt.ClientDir))
		x.Checkf(err, "Error while creating badger KV for xidmap")
	} 
	ld.xids = xidmap.New(ld.zero, db)

I was suggesting as above. We are only populating db if `len(ld.opt.ClientDir)>0, otherwise db is nil.
Also modify the error message to be more specific like above.

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Do remember to close the DB.

Reviewed 1 of 2 files at r2, 1 of 1 files at r3.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @anurags92 and @ashish-goswami)


dgraph/cmd/bulk/loader.go, line 173 at r3 (raw file):

		x.Checkf(err, "Error while creating badger KV posting store")
	}
	ld.xids = xidmap.New(ld.zero, db)

You need to close the db.

Copy link
Contributor

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 1 of 2 files reviewed, 1 unresolved discussion (waiting on @anurags92 and @manishrjain)

Copy link
Contributor

@MichelDiz MichelDiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Tested locally.

Reviewed 1 of 2 files at r2, 1 of 1 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @anurags92)

@all-seeing-code all-seeing-code merged commit 73825c4 into master Apr 8, 2020
@all-seeing-code all-seeing-code deleted the anurags92/xidmap_Bulkloader branch April 8, 2020 05:58
@all-seeing-code all-seeing-code restored the anurags92/xidmap_Bulkloader branch April 9, 2020 05:04
MichelDiz added a commit that referenced this pull request Apr 16, 2020
* Add xidmap and store_xids to documentation

Reference: #5090

* Remove -x flag in bulk

Also have added -x to Live and make xidmap default in comments.
danielmai pushed a commit that referenced this pull request May 1, 2020
* Add xidmap and store_xids to documentation

Reference: #5090

* Remove -x flag in bulk

Also have added -x to Live and make xidmap default in comments.
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 18, 2020
Add support for xidmap in bulkloader
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 18, 2020
* Add xidmap and store_xids to documentation

Reference: dgraph-io#5090

* Remove -x flag in bulk

Also have added -x to Live and make xidmap default in comments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Support the --xidmap option in Bulkload*
4 participants