-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(live): added upsert in live loader (high mem usage) #6051
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @harshil-goel, @manishrjain, and @vvbalaji-dgraph)
dgraph/cmd/live/run.go, line 293 at r1 (raw file):
query := "" for i, _ := range ids { query += fmt.Sprintf(`%s as var(func: eq(%s, "%s"))`, i, opt.upsert, i)
opt.upsert will be xid from the CLI -U xid.
Should we just hard-code it? Do we allow upsert on any another predicate?
dgraph/cmd/live/run.go, line 300 at r1 (raw file):
ObjectValue: &api.Value{Val: &api.Value_StrVal{StrVal: i}}, }) }
example of the formatted query will be helpful.
dgraph/cmd/live/run.go, line 357 at r1 (raw file):
} if opt.upsert == "" {
When upsert is turned on, our UIDs will be hashes and hence they will really be random across the UID space. Would this not result in huge UID space wastage since Zero will assign UID higher than the highest UID seen? Correct me if I am wrong.
dgraph/cmd/live/run.go, line 492 at r1 (raw file):
l.conflictGenerator = parseUid } else { l.conflictGenerator = fingerPrintUid
I dont get why we need fingerprinted conflictGenerator for Upserts? Is to so that subsequent runs of the Live-loader for the same xid gives same UID?
dgraph/cmd/live/load-uids/load_test.go, line 82 at r1 (raw file):
"q": [ { "xid": "m.1234",
To confirm, without this PR, the query would result in 2 UIDs because we dont do Upserts in live, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @manishrjain, @parasssh, and @vvbalaji-dgraph)
dgraph/cmd/live/run.go, line 293 at r1 (raw file):
Previously, parasssh wrote…
opt.upsert will be xid from the CLI -U xid.
Should we just hard-code it? Do we allow upsert on any another predicate?
No, we can allow on any predicate. It doesn't really affect our performance and gives user flexibility.
dgraph/cmd/live/run.go, line 300 at r1 (raw file):
Previously, parasssh wrote…
example of the formatted query will be helpful.
Done, added comments.
dgraph/cmd/live/run.go, line 357 at r1 (raw file):
Previously, parasssh wrote…
When upsert is turned on, our UIDs will be hashes and hence they will really be random across the UID space. Would this not result in huge UID space wastage since Zero will assign UID higher than the highest UID seen? Correct me if I am wrong.
Without upserts turned on, we ask zero to assign the UIDs here itself, to save time. But now the user's requirement is that they want to reuse old UIDs. Hence we query them. If the old UID doesn't exist, then zero would lease one normally. We calculate the hash for internal live loader purposes only.
dgraph/cmd/live/run.go, line 492 at r1 (raw file):
Previously, parasssh wrote…
I dont get why we need fingerprinted conflictGenerator for Upserts? Is to so that subsequent runs of the Live-loader for the same xid gives same UID?
This is the conflict generator we have in the live loader. This helps us figure out if two mutations would abort or not so that we can plan the mutations. Earlier we had integer uids(like 0x1234), but now we have string xids (for example m.1234). That's why instead of parsing ints, I hashed the xids.
dgraph/cmd/live/load-uids/load_test.go, line 82 at r1 (raw file):
Previously, parasssh wrote…
To confirm, without this PR, the query would result in 2 UIDs because we dont do Upserts in live, right?
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 6 files at r1, 1 of 1 files at r3.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @harshil-goel, @manishrjain, @parasssh, and @vvbalaji-dgraph)
dgraph/cmd/live/batch.go, line 93 at r2 (raw file):
schema *schema conflictGenerator func(string) (uint64, error)
Good to have comment about this.
dgraph/cmd/live/run.go, line 71 at r2 (raw file):
bufferSize int ludicrousMode bool upsert string
we should call this upsertPredicate
may be.
dgraph/cmd/live/run.go, line 156 at r2 (raw file):
flag.Bool("ludicrous_mode", false, "Run live loader in ludicrous mode (Should "+ "only be done when alpha is under ludicrous mode)") flag.StringP("upsert", "U", "", "run in upsert mode")
Explain it in more detail.
dgraph/cmd/live/run.go, line 320 at r3 (raw file):
query := "" for i, idx := range ids {
can we call it as xid?
Fixes DGRAPH-4829
This adds an option to do upsert in the live loader. Now, by giving -U "xid" would do upserts. Blank nodes would now act as a xid in the predicate provided.
This change is