Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(multi-tenancy): add support for multi-tenancy in bulk loader #7399

Merged
merged 7 commits into from
Feb 8, 2021

Conversation

NamanJain8
Copy link
Contributor

@NamanJain8 NamanJain8 commented Feb 4, 2021

This PR adds bulk loader support in multi-tenancy.

Adds a flag --force-namespace in bulk loader which makes it load the data into a specified namespace. If this flag is not set, then it preserves the namespace from the schema file.


This change is Reviewable

if err != nil {
return rnq, err
}
rnq.Namespace = ns
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a separate field in NQuad just for the namespace? We can use the label field?
The label is string and Namespace is uint64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implementation, the user cannot pass non-namespace strings in the label.
We can allow it in the following way:
The user tells that his RDFs contain a label and also tell us to load the data into a specific namespace. Then while parsing the RDFs we will not treat the label as namespace.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Whole bunch of TODOs, so let's resolve them before merging. Leave it to @jarifibrahim for final approval.

Reviewed 16 of 17 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @NamanJain8, @pawanrawal, and @vvbalaji-dgraph)


chunker/json_parser.go, line 464 at r2 (raw file):

		// this int64 case is needed for FastParseJSON, which doesn't use json.Number
		case int64:
			namespace = uint64(ns)

Don't make it a number in JSON. Use hex strings. Add a string part. Do strip spaces and use strconv.ParseUint.


chunker/rdf_parser.go, line 231 at r2 (raw file):

	// TODO(Naman): Ensure the label contains valid namespace.
	// Append the namespace to the predicate before returning NQuad.
	if rnq.Label != "" {

why not rename label to namespace?


chunker/rdf_parser.go, line 232 at r2 (raw file):

	// Append the namespace to the predicate before returning NQuad.
	if rnq.Label != "" {
		ns, err := strconv.ParseUint(rnq.Label, 0, 64)

do strip spaces.


dgraph/cmd/bulk/merge_shards.go, line 63 at r2 (raw file):

	// Put the first map shard in the first reduce shard since it contains all the reserved
	// predicates.
	// TODO(Naman): Why do we handle first shard differently?

This is because we want all reserved stuff to be in group 1. You can add this as a comment.


dgraph/cmd/bulk/schema.go, line 92 at r2 (raw file):

// checkAndSetInitialSchema initializes the schema for namespace if it does not already exist.
func (s *schemaStore) checkAndSetInitialSchema(namespace uint64) {
	if _, ok := s.namespaces.Load(namespace); !ok {

if ok; return

lock

if ok return

do stuff.


dgraph/cmd/debug/run.go, line 574 at r2 (raw file):

		}
		ns, attr := x.ParseNamespaceAttr(pk.Attr)
		x.Check2(buf.WriteString(fmt.Sprintf(" ns: 0x%x ", ns)))

%#x would give you the 0x.

@jarifibrahim jarifibrahim self-requested a review February 8, 2021 06:48
Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 7 of 15 files reviewed, 11 unresolved discussions (waiting on @manishrjain, @NamanJain8, @pawanrawal, and @vvbalaji-dgraph)


go.mod, line 5 at r3 (raw file):

go 1.12

replace github.com/dgraph-io/dgo/v200 => /home/algod/go/src/github.com/dgraph-io/dgo

Remove this, please.


dgraph/cmd/bulk/loader.go, line 173 at r3 (raw file):

func (ld *loader) leaseNamespaces() {
	var maxNs uint64
	ld.namespaces.Range(func(key, value interface{}) bool {

Add a comment.


dgraph/cmd/bulk/loader.go, line 290 at r3 (raw file):

	// Send the graphql triples
	// TODO(Naman): Handle this.

Put more details here.


worker/export_test.go, line 76 at r3 (raw file):

func populateGraphExport(t *testing.T) {
	rdfEdges := []string{
		`<1> <friend> <5> <author0> .`,

Add namespace over here and then validate it in a test.

Copy link
Contributor Author

@NamanJain8 NamanJain8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 26 files reviewed, 11 unresolved discussions (waiting on @jarifibrahim, @manishrjain, @martinmr, @NamanJain8, @pawanrawal, and @vvbalaji-dgraph)


go.mod, line 5 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Remove this, please.

Done.


chunker/json_parser.go, line 464 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Don't make it a number in JSON. Use hex strings. Add a string part. Do strip spaces and use strconv.ParseUint.

Done.


chunker/rdf_parser.go, line 231 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

why not rename label to namespace?

We have added a new field Namespace of type uint64 in the Nquad. If we had used string type, we will have to do strconv.ParseUint(ns...) at a bunch of places where we need namespace for the same Nquad.
Moreover, the Label is not used by dgraph and just adds up an extra field in pb.Posting.


chunker/rdf_parser.go, line 232 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

do strip spaces.

Done.


dgraph/cmd/bulk/loader.go, line 173 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Add a comment.

Done.


dgraph/cmd/bulk/loader.go, line 290 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Put more details here.

Done.


dgraph/cmd/bulk/merge_shards.go, line 63 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This is because we want all reserved stuff to be in group 1. You can add this as a comment.

Done.


dgraph/cmd/bulk/schema.go, line 92 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

if ok; return

lock

if ok return

do stuff.

Done.


dgraph/cmd/debug/run.go, line 574 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

%#x would give you the 0x.

Done.

@NamanJain8 NamanJain8 merged commit b9a00c4 into ibrahim/multitenancy Feb 8, 2021
@NamanJain8 NamanJain8 deleted the naman/multitenancy/bulk-loader branch February 8, 2021 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants