Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sort): Make sort consistent for indexed and without indexed predicates #7241

Merged
merged 20 commits into from
Jan 13, 2021

Conversation

ahsanbarkati
Copy link
Contributor

@ahsanbarkati ahsanbarkati commented Jan 4, 2021

This PR makes the result of sort consistent for predicate with and without index. There was an issue that for predicate with index, we used to drop the nodes which doesn't contain the predicate used for sorting, while we retained them for that of without index.

The nulls were dropped in thee case of sortWithIndex because in this case to calculate the result, we used to take intersection of the given list for sorting with the index of the predicate used for sorting, hence other predicates were dropped. In this change, we keep track of nodes which have null sort predicates and append them at the end of the result as required by pagination.

Some benchmarks:
The dataset contain 1million UIDs, with half of them containing the predicate used for sorting. The time taken are in nano seconds.

No Index:
Query                                   Time
orderasc, first:100 ->                  7092526995
orderasc, offset:100000, first:100 ->   6809554104
orderdesc, first:100 ->                 6981851622
orderdesc, offset:100000, first:100 ->  6862242806

Index:
orderasc, first:100 ->                  1421648793
orderasc, offset:100000, first:100 ->   2044206032
orderdesc, first:100 ->                 1577750157
orderdesc, offset:100000, first:100 ->  1956177772

The results of the same queries on Master:

No Index:
orderasc, first:100 ->                  6881212989
orderasc, offset:100000, first:100 ->   7543954929
orderdesc, first:100 ->                 7327315514
orderdesc, offset:100000, first:100 ->  7262242806

Index:
orderasc, first:100 ->                  1410905825
orderasc, offset:100000, first:100 ->   1890659562
orderdesc, first:100 ->                 1404256406
orderdesc, offset:100000, first:100 ->  2106243131

This change is Reviewable

@ahsanbarkati ahsanbarkati changed the title fix(sort): Make sort consistent for indexed and without indexed fix(sort): Make sort consistent for indexed and without indexed predicates Jan 4, 2021
@vvbalaji-dgraph
Copy link

@danielmai : this change is expected to produce consistent results while retaining performance of using sorted index.

@ahsanbarkati ahsanbarkati marked this pull request as ready for review January 7, 2021 13:03
Copy link
Contributor

@vmrajas vmrajas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 3 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, @vmrajas, and @vvbalaji-dgraph)


worker/sort.go, line 185 at r5 (raw file):

	span.Annotate(nil, "sortWithIndex")

	maxCount := 0

Can you add a comment on what maxCount is going to store.


worker/sort.go, line 287 at r5 (raw file):

			token := k.Term
			if !order.Desc {
				maxCount = int(ts.Count)

Can you add a comment on why this is done. I know that this would become a rather large comment. But, it would help in understanding the code later.


worker/sort.go, line 606 at r5 (raw file):

// intersectBucket intersects every UID list in the UID matrix with the
// indexed bucket.

Can you also add a comment on the significance of count parameter.

Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, and @vvbalaji-dgraph)


query/query1_test.go, line 1963 at r6 (raw file):

}

func TestSortNull2(t *testing.T) {

All these could just have been one test with Table driven tests because you are testing the same thing that is null behavior with various values of first and offset


query/query1_test.go, line 2095 at r6 (raw file):

	query := `{
me(func: uid(61, 62, 63, 64, 65, 66, 67, 68, 69, 70), orderdesc: pred, first: 2) {

How about a test with offset:5 and first:5 and another one with offset:9 and first:5


query/query2_test.go, line 977 at r6 (raw file):

			"data": {
				"q": [{
					"name_lang_index@de": "öffnen",

Are these the null value ones?


worker/sort.go, line 195 at r6 (raw file):

		var emptySkippedList pb.List
		out[i].ulist = &emptyList
		out[i].skippedUids = &emptySkippedList

Just define it inline like

out[i].skippedUids = &pb.List{}

worker/sort.go, line 325 at r6 (raw file):

		for _, uid := range ul.Uids {
			if _, ok := present[uid]; !ok {
				nullPreds = append(nullPreds, uid)

So this nullPreds is common across all the lists? Shouldn't it be reset for each uid list?


worker/sort.go, line 329 at r6 (raw file):

		}

		requiredCount := int(ts.Count) - len(r.UidMatrix[i].Uids)

remainingCount would be a better name.


worker/sort.go, line 769 at r6 (raw file):

				val.Value = nil
				nullsList = append(nullsList, uid)
				nullVals = append(nullVals, []types.Val{val})

What does val hold? Is it just a null? Would be the null be returned? I don't see null being returned in the query tests.

Copy link
Contributor Author

@ahsanbarkati ahsanbarkati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, and @vvbalaji-dgraph)


query/query1_test.go, line 1963 at r6 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

All these could just have been one test with Table driven tests because you are testing the same thing that is null behavior with various values of first and offset

Done.


query/query1_test.go, line 2095 at r6 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

How about a test with offset:5 and first:5 and another one with offset:9 and first:5

Added these cases.


query/query2_test.go, line 977 at r6 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Are these the null value ones?

Yes.


worker/sort.go, line 195 at r6 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Just define it inline like

out[i].skippedUids = &pb.List{}

Done.


worker/sort.go, line 325 at r6 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

So this nullPreds is common across all the lists? Shouldn't it be reset for each uid list?

Thanks. Fixed.


worker/sort.go, line 329 at r6 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

remainingCount would be a better name.

Done.


worker/sort.go, line 769 at r6 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

What does val hold? Is it just a null? Would be the null be returned? I don't see null being returned in the query tests.

Yes. It is just the null values. It is being used for multisort, I wanted to keep the behavior of multisort unchanged, so appended the null values corresponding to the uid list, for which the predicates contains null or unsortable values.

Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do a bit more testing around different offsets and first combinations for different kind of datasets as discussed.

Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, and @vvbalaji-dgraph)


query/query1_test.go, line 2008 at r7 (raw file):

			{"predname":"nameJ"}]}}`,
		},
		{11, 9, 5, false, `{"data": {"me":[

How about 12 and 5, does that return an empty result?


worker/sort.go, line 307 at r7 (raw file):

	for i, ul := range ts.UidMatrix {
		var nullPreds []uint64

Add a comment
These are uids for which there is value for the predicate.


worker/sort.go, line 328 at r7 (raw file):

		// We didn't get anything in the intersected result, it is possible that complete offset
		// is yet not applied. We need to apply the remainng offset on the null values.

remaining


worker/sort.go, line 331 at r7 (raw file):

		if len(r.UidMatrix[i].Uids) == 0 {
			start := int(ts.Offset) - len(out[i].skippedUids.Uids)
			x.AssertTrue(start >= 0)

return an error here instead of crashing the server and think of a case where this doesn't hold


worker/sort.go, line 335 at r7 (raw file):

				nullPreds = nullPreds[start:]
			} else {
				nullPreds = []uint64{}

nullsPreds = nullPreds[:0]


worker/sort.go, line 338 at r7 (raw file):

			}
		}
		remainingCount := int(ts.Count) - len(r.UidMatrix[i].Uids)

Try removing the index on pred and your tests should still pass.

Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r3, 1 of 3 files at r6, 3 of 3 files at r8.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @pawanrawal, and @vvbalaji-dgraph)


query/query1_test.go, line 2020 at r8 (raw file):

	makeQuery := func(offset, first int32, desc, index bool) string {
		pred := "pred"

You can have two queries with the same alias

@ahsanbarkati ahsanbarkati merged commit 85278f8 into master Jan 13, 2021
@ahsanbarkati ahsanbarkati deleted the ahsan/sort-bug branch January 13, 2021 19:33
ahsanbarkati added a commit that referenced this pull request Jan 15, 2021
…cates (#7241)

Make the result of sort consistent for predicate with and without index.
After this change, the predicates with null values will appear at the
last of the sort result for both ascending and descending sort, for both
index/no-index case. Before this change the result for predicate with index
didn't contain the null predicates and the one without index treated nulls as
infinite value - they appeared at the beginning for descending while at the
end for the ascending sort case.

NOTE: This is a breaking change for the response of sort.

Co-authored-by: Rajas Vanjape <rajas@dgraph.io>
(cherry picked from commit 85278f8)
ahsanbarkati added a commit that referenced this pull request Jan 15, 2021
…cates (#7241) (#7301)

Make the result of sort consistent for predicate with and without index.
After this change, the predicates with null values will appear at the
last of the sort result for both ascending and descending sort, for both
index/no-index case. Before this change the result for predicate with index
didn't contain the null predicates and the one without index treated nulls as
infinite value - they appeared at the beginning for descending while at the
end for the ascending sort case.

NOTE: This is a breaking change for the response of sort.

Co-authored-by: Rajas Vanjape <rajas@dgraph.io>
(cherry picked from commit 85278f8)
ahsanbarkati added a commit that referenced this pull request Jan 15, 2021
ahsanbarkati added a commit that referenced this pull request Jan 15, 2021
…ed predicates (#7241) (#7301)" (#7304)

This reverts commit ea435fc.

We are reverting this change because it is a breaking change.
ahsanbarkati added a commit that referenced this pull request Jan 16, 2021
…cates (#7241)

Make the result of sort consistent for predicate with and without index.
After this change, the predicates with null values will appear at the
last of the sort result for both ascending and descending sort, for both
index/no-index case. Before this change the result for predicate with index
didn't contain the null predicates and the one without index treated nulls as
infinite value - they appeared at the beginning for descending while at the
end for the ascending sort case.

NOTE: This is a breaking change for the response of sort.

Co-authored-by: Rajas Vanjape <rajas@dgraph.io>
(cherry picked from commit 85278f8)
ahsanbarkati added a commit that referenced this pull request Jan 16, 2021
…cates (#7241) (#7323)

Make the result of sort consistent for predicate with and without index.
After this change, the predicates with null values will appear at the
last of the sort result for both ascending and descending sort, for both
index/no-index case. Before this change the result for predicate with index
didn't contain the null predicates and the one without index treated nulls as
infinite value - they appeared at the beginning for descending while at the
end for the ascending sort case.

NOTE: This is a breaking change for the response of sort.

Co-authored-by: Rajas Vanjape <rajas@dgraph.io>
(cherry picked from commit 85278f8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants