Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Algorithms to handle UidPack #4321

Merged
merged 23 commits into from
Jan 10, 2020
Merged

Algorithms to handle UidPack #4321

merged 23 commits into from
Jan 10, 2020

Conversation

martinmr
Copy link
Contributor

@martinmr martinmr commented Nov 26, 2019

This PR includes versions of most of the algorithms in uidlist.go that work with UidPack
objects. It also includes a few util methods to make the handling of UidPack objects easier.

There are still some algorithms that need to be done.

  • Intersection algorithm with jump.
  • Support an afterUid parameter in the intersection algorithm to provide the
    equivalent of the IntersectCompressedWith algorithm.
  • Possibly, benchmarks for the UidListIterator type.

This change is Reviewable

@martinmr martinmr requested review from manishrjain and a team as code owners November 26, 2019 00:41
@martinmr martinmr changed the title Algorithms to handle UidPac Algorithms to handle UidPack Nov 26, 2019
algo/packed.go Outdated Show resolved Hide resolved
Copy link
Contributor

@poonai poonai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @manishrjain and @martinmr)

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @manishrjain and @martinmr)


algo/packed.go, line 159 at r2 (raw file):

	for i < n && k < m {
		uid := uIt.Get()

Doing uid by uid iteration is going to be slow, as we had discussed. It would be better to get a list of Uids from the iterator and iterate upon them locally, without calling more functions.


codec/util.go, line 25 at r2 (raw file):

// UidPackIterator is a Wrapper around Decoder to allow simplified iteration over
// a UidPack.
type UidPackIterator struct {

This should not exist. Calling a function to access a UID would be too slow. Unless you think I'm wrong -- and in that case, you should create a benchmark to prove that.


codec/util.go, line 72 at r2 (raw file):

// CopyUidPack creates a copy of the given UidPack.
func CopyUidPack(pack *pb.UidPack) *pb.UidPack {

You don't need to copy it like this. You can just copy the structs directly -- instead of decoding, then encoding.

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @golangcibot and @manishrjain)


algo/packed.go, line 295 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

unreachable: unreachable code (from govet)

Done.


algo/packed.go, line 159 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Doing uid by uid iteration is going to be slow, as we had discussed. It would be better to get a list of Uids from the iterator and iterate upon them locally, without calling more functions.

See other comment.


codec/util.go, line 25 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This should not exist. Calling a function to access a UID would be too slow. Unless you think I'm wrong -- and in that case, you should create a benchmark to prove that.

I checked the compiler optimizations and both Get and Valid are being inlined. Next is not but most of the time it will only increase a counter.

I ran a benchmark and while the iteration without this struct is indeed faster I don't think the difference is that big. I think it's justifiable given that it simplifies the code.

[Decoder]: Using assembly version of decoder
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/dgraph/codec
BenchmarkIterationNormal-8      1000000000               0.117 ns/op
BenchmarkIterationWithUtil-8    1000000000               0.124 ns/op
PASS

codec/util.go, line 72 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

You don't need to copy it like this. You can just copy the structs directly -- instead of decoding, then encoding.

Done.

codec/util.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 5 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)


codec/util.go, line 25 at r2 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I checked the compiler optimizations and both Get and Valid are being inlined. Next is not but most of the time it will only increase a counter.

I ran a benchmark and while the iteration without this struct is indeed faster I don't think the difference is that big. I think it's justifiable given that it simplifies the code.

[Decoder]: Using assembly version of decoder
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/dgraph/codec
BenchmarkIterationNormal-8      1000000000               0.117 ns/op
BenchmarkIterationWithUtil-8    1000000000               0.124 ns/op
PASS

In the latest version this has been removed.

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 5 unresolved discussions (waiting on @golangcibot and @manishrjain)


codec/util.go, line 83 at r3 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

File is not gofmt-ed with -s (from gofmt)

		packCopy.Blocks[i].Base = block.Base

Done.

@martinmr martinmr dismissed manishrjain’s stale review January 3, 2020 21:24

Addressed comments

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 10 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)


algo/packed.go, line 56 at r5 (raw file):

	vUids := vDec.Uids()
	uIdx, vIdx := 0, 0
	encoder := codec.Encoder{BlockSize: int(u.BlockSize)}

result


algo/packed.go, line 60 at r5 (raw file):

	for {
		// Load the next block of the encoded lists if necessary.
		if len(uUids) == 0 || uIdx == len(uUids) {

if len(uuids) == 0; break
if len(vuids) == 0; break

if uidx == len(uuids) { advance }
if vids == len(vuids) { advance }


codec/util_test.go, line 64 at r3 (raw file):

func BenchmarkIterationNormal(b *testing.B) {
	b.StopTimer()

remove this.


codec/util_test.go, line 73 at r3 (raw file):

	decoder := Decoder{Pack: packedUids}
	decoder.Seek(0, SeekStart)
	unpackedUids := make([]uint64, 0)

make([]uint64, 0, benchmarkSize)


codec/util_test.go, line 75 at r3 (raw file):

	unpackedUids := make([]uint64, 0)

	b.StartTimer()

b.ResetTimer

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 10 unresolved discussions (waiting on @golangcibot and @manishrjain)


algo/packed.go, line 56 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

result

Done.


algo/packed.go, line 60 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

if len(uuids) == 0; break
if len(vuids) == 0; break

if uidx == len(uuids) { advance }
if vids == len(vuids) { advance }

Done.


codec/util_test.go, line 64 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

remove this.

Done.


codec/util_test.go, line 73 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

make([]uint64, 0, benchmarkSize)

Done.


codec/util_test.go, line 75 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

b.ResetTimer

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 15 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)


algo/packed.go, line 106 at r6 (raw file):

}

type listInfoPacked struct {

Add a comment: What does this struct do?


algo/packed.go, line 113 at r6 (raw file):

// IntersectSortedPacked calculates the intersection of multiple lists and performs
// the intersections from the smallest to the largest list.
func IntersectSortedPacked(lists []*pb.UidPack) *pb.UidPack {

Do we ever intersect more than 2 lists? Even if we do need more than 2, we should do 2 first, then for the result, we should decide which algo to use to intersect with the 3rd list.


algo/packed.go, line 135 at r6 (raw file):

	}

	out := IntersectWithLinPacked(ls[0].l, ls[1].l)

Add a TODO here.


algo/packed.go, line 138 at r6 (raw file):

	// Intersect from smallest to largest.
	for i := 2; i < len(ls); i++ {
		out := IntersectWithLinPacked(out, ls[i].l)

I doubt this is the right algo to use by default for all intersections.


algo/packed.go, line 170 at r6 (raw file):

	for {
		// Load the next block of the encoded lists if necessary.
		if len(uUids) == 0 || uIdx == len(uUids) {

Follow the commends/code from above.


algo/packed.go, line 244 at r6 (raw file):

			continue
		}

Remove vert spaces.


algo/packed.go, line 275 at r6 (raw file):

	idx := make([]int, len(lists))
	// uidIdx is the element we are looking for within the smaller decoded array.
	uidIdx := make([]int, len(lists))

Don't understand the diff between idx and uidIdx.


algo/packed.go, line 311 at r6 (raw file):

	}
	decoder := codec.Decoder{Pack: u}
	decoder.Seek(0, codec.SeekStart)

Seek should be to the UID you want. And then, based on that, you know the block. And you know the number of uids from blocks before. In fact, decoder can just give you the index -- based on previous blocks' num_uids and the pos in current block's uids. But, in this one you need to return -1, if uid is not found.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 6 files at r5, 1 of 1 files at r6.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 4 files reviewed, 15 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)


algo/packed.go, line 106 at r6 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add a comment: What does this struct do?

Done.


algo/packed.go, line 113 at r6 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Do we ever intersect more than 2 lists? Even if we do need more than 2, we should do 2 first, then for the result, we should decide which algo to use to intersect with the 3rd list.

Yes. It's being used in the query package to list more than two lists.


algo/packed.go, line 135 at r6 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add a TODO here.

Done.


algo/packed.go, line 138 at r6 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

I doubt this is the right algo to use by default for all intersections.

Yeah, I added a TODO to implement the rest of them.


algo/packed.go, line 170 at r6 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Follow the commends/code from above.

Done.


algo/packed.go, line 244 at r6 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove vert spaces.

Done.


algo/packed.go, line 311 at r6 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Seek should be to the UID you want. And then, based on that, you know the block. And you know the number of uids from blocks before. In fact, decoder can just give you the index -- based on previous blocks' num_uids and the pos in current block's uids. But, in this one you need to return -1, if uid is not found.

Done.

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 15 unresolved discussions (waiting on @golangcibot and @manishrjain)


algo/packed.go, line 275 at r6 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Don't understand the diff between idx and uidIdx.

Done. I rewrote this algo to be easier to be understood.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Do address the comments, remove extra vars. And then merge.

Reviewed 3 of 3 files at r7.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)


algo/heap.go, line 29 at r7 (raw file):

	// The following fields are only used when merging pb.UidPack objects.
	decoder  *codec.Decoder // pointer to the decoder for this pb.UidPack object.
	packLen  int            // The exact length of the pb.UidPack object.

Doubt you need it.


algo/heap.go, line 30 at r7 (raw file):

	decoder  *codec.Decoder // pointer to the decoder for this pb.UidPack object.
	packLen  int            // The exact length of the pb.UidPack object.
	packIdx  int            // The current position in the entire pb.UidPack object.

I doubt you need it.


algo/heap.go, line 32 at r7 (raw file):

	packIdx  int            // The current position in the entire pb.UidPack object.
	blockIdx int            // The current position in the current pb.UidBlock object.
	block    []uint64       // The current block.

blockUids


algo/packed.go, line 289 at r7 (raw file):

		// Reached the end of this list. Remove it from the heap.
		if me.packIdx >= me.packLen-1 {

if me.blockIdx > me.blockLen && !decoder.Valid()

Remove the packIdx and packLen.


algo/packed.go, line 327 at r7 (raw file):

	for i := 0; i < decoder.BlockIdx(); i++ {
		index += int(u.Blocks[i].GetNumUids())

Move this closer to usage. Do the uids() and search first. Then figure out the index.

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @golangcibot and @manishrjain)


algo/heap.go, line 29 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Doubt you need it.

Done.


algo/heap.go, line 30 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

I doubt you need it.

Done.


algo/heap.go, line 32 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

blockUids

Done.


algo/packed.go, line 289 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

if me.blockIdx > me.blockLen && !decoder.Valid()

Remove the packIdx and packLen.

Done.


algo/packed.go, line 327 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Move this closer to usage. Do the uids() and search first. Then figure out the index.

Done.

@martinmr martinmr merged commit f98fe35 into master Jan 10, 2020
@martinmr martinmr deleted the martinmr/uid-pack-algos branch January 10, 2020 23:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants