-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sort): Make sort consistent for indexed and without indexed predicates #7241
Conversation
@danielmai : this change is expected to produce consistent results while retaining performance of using sorted index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 3 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, @vmrajas, and @vvbalaji-dgraph)
worker/sort.go, line 185 at r5 (raw file):
span.Annotate(nil, "sortWithIndex") maxCount := 0
Can you add a comment on what maxCount is going to store.
worker/sort.go, line 287 at r5 (raw file):
token := k.Term if !order.Desc { maxCount = int(ts.Count)
Can you add a comment on why this is done. I know that this would become a rather large comment. But, it would help in understanding the code later.
worker/sort.go, line 606 at r5 (raw file):
// intersectBucket intersects every UID list in the UID matrix with the // indexed bucket.
Can you also add a comment on the significance of count parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, and @vvbalaji-dgraph)
query/query1_test.go, line 1963 at r6 (raw file):
} func TestSortNull2(t *testing.T) {
All these could just have been one test with Table driven tests because you are testing the same thing that is null behavior with various values of first and offset
query/query1_test.go, line 2095 at r6 (raw file):
query := `{ me(func: uid(61, 62, 63, 64, 65, 66, 67, 68, 69, 70), orderdesc: pred, first: 2) {
How about a test with offset:5 and first:5
and another one with offset:9 and first:5
query/query2_test.go, line 977 at r6 (raw file):
"data": { "q": [{ "name_lang_index@de": "öffnen",
Are these the null value ones?
worker/sort.go, line 195 at r6 (raw file):
var emptySkippedList pb.List out[i].ulist = &emptyList out[i].skippedUids = &emptySkippedList
Just define it inline like
out[i].skippedUids = &pb.List{}
worker/sort.go, line 325 at r6 (raw file):
for _, uid := range ul.Uids { if _, ok := present[uid]; !ok { nullPreds = append(nullPreds, uid)
So this nullPreds
is common across all the lists? Shouldn't it be reset for each uid list?
worker/sort.go, line 329 at r6 (raw file):
} requiredCount := int(ts.Count) - len(r.UidMatrix[i].Uids)
remainingCount
would be a better name.
worker/sort.go, line 769 at r6 (raw file):
val.Value = nil nullsList = append(nullsList, uid) nullVals = append(nullVals, []types.Val{val})
What does val
hold? Is it just a null? Would be the null
be returned? I don't see null
being returned in the query tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, and @vvbalaji-dgraph)
query/query1_test.go, line 1963 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
All these could just have been one test with Table driven tests because you are testing the same thing that is null behavior with various values of first and offset
Done.
query/query1_test.go, line 2095 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
How about a test with
offset:5 and first:5
and another one withoffset:9 and first:5
Added these cases.
query/query2_test.go, line 977 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
Are these the null value ones?
Yes.
worker/sort.go, line 195 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
Just define it inline like
out[i].skippedUids = &pb.List{}
Done.
worker/sort.go, line 325 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
So this
nullPreds
is common across all the lists? Shouldn't it be reset for each uid list?
Thanks. Fixed.
worker/sort.go, line 329 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
remainingCount
would be a better name.
Done.
worker/sort.go, line 769 at r6 (raw file):
Previously, pawanrawal (Pawan Rawal) wrote…
What does
val
hold? Is it just a null? Would be thenull
be returned? I don't seenull
being returned in the query tests.
Yes. It is just the null values. It is being used for multisort, I wanted to keep the behavior of multisort unchanged, so appended the null values corresponding to the uid list, for which the predicates contains null or unsortable values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do a bit more testing around different offsets and first combinations for different kind of datasets as discussed.
Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, and @vvbalaji-dgraph)
query/query1_test.go, line 2008 at r7 (raw file):
{"predname":"nameJ"}]}}`, }, {11, 9, 5, false, `{"data": {"me":[
How about 12 and 5, does that return an empty result?
worker/sort.go, line 307 at r7 (raw file):
for i, ul := range ts.UidMatrix { var nullPreds []uint64
Add a comment
These are uids for which there is value for the predicate.
worker/sort.go, line 328 at r7 (raw file):
// We didn't get anything in the intersected result, it is possible that complete offset // is yet not applied. We need to apply the remainng offset on the null values.
remaining
worker/sort.go, line 331 at r7 (raw file):
if len(r.UidMatrix[i].Uids) == 0 { start := int(ts.Offset) - len(out[i].skippedUids.Uids) x.AssertTrue(start >= 0)
return an error here instead of crashing the server and think of a case where this doesn't hold
worker/sort.go, line 335 at r7 (raw file):
nullPreds = nullPreds[start:] } else { nullPreds = []uint64{}
nullsPreds = nullPreds[:0]
worker/sort.go, line 338 at r7 (raw file):
} } remainingCount := int(ts.Count) - len(r.UidMatrix[i].Uids)
Try removing the index on pred
and your tests should still pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r3, 1 of 3 files at r6, 3 of 3 files at r8.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, and @vvbalaji-dgraph)
query/query1_test.go, line 2020 at r8 (raw file):
makeQuery := func(offset, first int32, desc, index bool) string { pred := "pred"
You can have two queries with the same alias
…cates (#7241) Make the result of sort consistent for predicate with and without index. After this change, the predicates with null values will appear at the last of the sort result for both ascending and descending sort, for both index/no-index case. Before this change the result for predicate with index didn't contain the null predicates and the one without index treated nulls as infinite value - they appeared at the beginning for descending while at the end for the ascending sort case. NOTE: This is a breaking change for the response of sort. Co-authored-by: Rajas Vanjape <rajas@dgraph.io> (cherry picked from commit 85278f8)
…cates (#7241) (#7301) Make the result of sort consistent for predicate with and without index. After this change, the predicates with null values will appear at the last of the sort result for both ascending and descending sort, for both index/no-index case. Before this change the result for predicate with index didn't contain the null predicates and the one without index treated nulls as infinite value - they appeared at the beginning for descending while at the end for the ascending sort case. NOTE: This is a breaking change for the response of sort. Co-authored-by: Rajas Vanjape <rajas@dgraph.io> (cherry picked from commit 85278f8)
…cates (#7241) Make the result of sort consistent for predicate with and without index. After this change, the predicates with null values will appear at the last of the sort result for both ascending and descending sort, for both index/no-index case. Before this change the result for predicate with index didn't contain the null predicates and the one without index treated nulls as infinite value - they appeared at the beginning for descending while at the end for the ascending sort case. NOTE: This is a breaking change for the response of sort. Co-authored-by: Rajas Vanjape <rajas@dgraph.io> (cherry picked from commit 85278f8)
…cates (#7241) (#7323) Make the result of sort consistent for predicate with and without index. After this change, the predicates with null values will appear at the last of the sort result for both ascending and descending sort, for both index/no-index case. Before this change the result for predicate with index didn't contain the null predicates and the one without index treated nulls as infinite value - they appeared at the beginning for descending while at the end for the ascending sort case. NOTE: This is a breaking change for the response of sort. Co-authored-by: Rajas Vanjape <rajas@dgraph.io> (cherry picked from commit 85278f8)
This PR makes the result of sort consistent for predicate with and without index. There was an issue that for predicate with index, we used to drop the nodes which doesn't contain the predicate used for sorting, while we retained them for that of without index.
The nulls were dropped in thee case of
sortWithIndex
because in this case to calculate the result, we used to take intersection of the given list for sorting with the index of the predicate used for sorting, hence other predicates were dropped. In this change, we keep track of nodes which have null sort predicates and append them at the end of the result as required by pagination.Some benchmarks:
The dataset contain 1million UIDs, with half of them containing the predicate used for sorting. The time taken are in nano seconds.
The results of the same queries on Master:
This change is