-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Algorithms to handle UidPack #4321
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @manishrjain and @martinmr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @manishrjain and @martinmr)
algo/packed.go, line 159 at r2 (raw file):
for i < n && k < m { uid := uIt.Get()
Doing uid by uid iteration is going to be slow, as we had discussed. It would be better to get a list of Uids from the iterator and iterate upon them locally, without calling more functions.
codec/util.go, line 25 at r2 (raw file):
// UidPackIterator is a Wrapper around Decoder to allow simplified iteration over // a UidPack. type UidPackIterator struct {
This should not exist. Calling a function to access a UID would be too slow. Unless you think I'm wrong -- and in that case, you should create a benchmark to prove that.
codec/util.go, line 72 at r2 (raw file):
// CopyUidPack creates a copy of the given UidPack. func CopyUidPack(pack *pb.UidPack) *pb.UidPack {
You don't need to copy it like this. You can just copy the structs directly -- instead of decoding, then encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @golangcibot and @manishrjain)
algo/packed.go, line 295 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
unreachable: unreachable code (from
govet
)
Done.
algo/packed.go, line 159 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Doing uid by uid iteration is going to be slow, as we had discussed. It would be better to get a list of Uids from the iterator and iterate upon them locally, without calling more functions.
See other comment.
codec/util.go, line 25 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
This should not exist. Calling a function to access a UID would be too slow. Unless you think I'm wrong -- and in that case, you should create a benchmark to prove that.
I checked the compiler optimizations and both Get
and Valid
are being inlined. Next
is not but most of the time it will only increase a counter.
I ran a benchmark and while the iteration without this struct is indeed faster I don't think the difference is that big. I think it's justifiable given that it simplifies the code.
[Decoder]: Using assembly version of decoder
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/dgraph/codec
BenchmarkIterationNormal-8 1000000000 0.117 ns/op
BenchmarkIterationWithUtil-8 1000000000 0.124 ns/op
PASS
codec/util.go, line 72 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
You don't need to copy it like this. You can just copy the structs directly -- instead of decoding, then encoding.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 5 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)
codec/util.go, line 25 at r2 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
I checked the compiler optimizations and both
Get
andValid
are being inlined.Next
is not but most of the time it will only increase a counter.I ran a benchmark and while the iteration without this struct is indeed faster I don't think the difference is that big. I think it's justifiable given that it simplifies the code.
[Decoder]: Using assembly version of decoder goos: linux goarch: amd64 pkg: github.com/dgraph-io/dgraph/codec BenchmarkIterationNormal-8 1000000000 0.117 ns/op BenchmarkIterationWithUtil-8 1000000000 0.124 ns/op PASS
In the latest version this has been removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 5 unresolved discussions (waiting on @golangcibot and @manishrjain)
codec/util.go, line 83 at r3 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
File is not
gofmt
-ed with-s
(fromgofmt
)packCopy.Blocks[i].Base = block.Base
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 10 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)
algo/packed.go, line 56 at r5 (raw file):
vUids := vDec.Uids() uIdx, vIdx := 0, 0 encoder := codec.Encoder{BlockSize: int(u.BlockSize)}
result
algo/packed.go, line 60 at r5 (raw file):
for { // Load the next block of the encoded lists if necessary. if len(uUids) == 0 || uIdx == len(uUids) {
if len(uuids) == 0; break
if len(vuids) == 0; break
if uidx == len(uuids) { advance }
if vids == len(vuids) { advance }
codec/util_test.go, line 64 at r3 (raw file):
func BenchmarkIterationNormal(b *testing.B) { b.StopTimer()
remove this.
codec/util_test.go, line 73 at r3 (raw file):
decoder := Decoder{Pack: packedUids} decoder.Seek(0, SeekStart) unpackedUids := make([]uint64, 0)
make([]uint64, 0, benchmarkSize)
codec/util_test.go, line 75 at r3 (raw file):
unpackedUids := make([]uint64, 0) b.StartTimer()
b.ResetTimer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 10 unresolved discussions (waiting on @golangcibot and @manishrjain)
algo/packed.go, line 56 at r5 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
result
Done.
algo/packed.go, line 60 at r5 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
if len(uuids) == 0; break
if len(vuids) == 0; breakif uidx == len(uuids) { advance }
if vids == len(vuids) { advance }
Done.
codec/util_test.go, line 64 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
remove this.
Done.
codec/util_test.go, line 73 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
make([]uint64, 0, benchmarkSize)
Done.
codec/util_test.go, line 75 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
b.ResetTimer
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 15 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)
algo/packed.go, line 106 at r6 (raw file):
} type listInfoPacked struct {
Add a comment: What does this struct do?
algo/packed.go, line 113 at r6 (raw file):
// IntersectSortedPacked calculates the intersection of multiple lists and performs // the intersections from the smallest to the largest list. func IntersectSortedPacked(lists []*pb.UidPack) *pb.UidPack {
Do we ever intersect more than 2 lists? Even if we do need more than 2, we should do 2 first, then for the result, we should decide which algo to use to intersect with the 3rd list.
algo/packed.go, line 135 at r6 (raw file):
} out := IntersectWithLinPacked(ls[0].l, ls[1].l)
Add a TODO here.
algo/packed.go, line 138 at r6 (raw file):
// Intersect from smallest to largest. for i := 2; i < len(ls); i++ { out := IntersectWithLinPacked(out, ls[i].l)
I doubt this is the right algo to use by default for all intersections.
algo/packed.go, line 170 at r6 (raw file):
for { // Load the next block of the encoded lists if necessary. if len(uUids) == 0 || uIdx == len(uUids) {
Follow the commends/code from above.
algo/packed.go, line 244 at r6 (raw file):
continue }
Remove vert spaces.
algo/packed.go, line 275 at r6 (raw file):
idx := make([]int, len(lists)) // uidIdx is the element we are looking for within the smaller decoded array. uidIdx := make([]int, len(lists))
Don't understand the diff between idx and uidIdx.
algo/packed.go, line 311 at r6 (raw file):
} decoder := codec.Decoder{Pack: u} decoder.Seek(0, codec.SeekStart)
Seek should be to the UID you want. And then, based on that, you know the block. And you know the number of uids from blocks before. In fact, decoder can just give you the index -- based on previous blocks' num_uids and the pos in current block's uids. But, in this one you need to return -1, if uid is not found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 6 files at r5, 1 of 1 files at r6.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 4 files reviewed, 15 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)
algo/packed.go, line 106 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Add a comment: What does this struct do?
Done.
algo/packed.go, line 113 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Do we ever intersect more than 2 lists? Even if we do need more than 2, we should do 2 first, then for the result, we should decide which algo to use to intersect with the 3rd list.
Yes. It's being used in the query package to list more than two lists.
algo/packed.go, line 135 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Add a TODO here.
Done.
algo/packed.go, line 138 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
I doubt this is the right algo to use by default for all intersections.
Yeah, I added a TODO to implement the rest of them.
algo/packed.go, line 170 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Follow the commends/code from above.
Done.
algo/packed.go, line 244 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Remove vert spaces.
Done.
algo/packed.go, line 311 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Seek should be to the UID you want. And then, based on that, you know the block. And you know the number of uids from blocks before. In fact, decoder can just give you the index -- based on previous blocks' num_uids and the pos in current block's uids. But, in this one you need to return -1, if uid is not found.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 5 files reviewed, 15 unresolved discussions (waiting on @golangcibot and @manishrjain)
algo/packed.go, line 275 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Don't understand the diff between idx and uidIdx.
Done. I rewrote this algo to be easier to be understood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do address the comments, remove extra vars. And then merge.
Reviewed 3 of 3 files at r7.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)
algo/heap.go, line 29 at r7 (raw file):
// The following fields are only used when merging pb.UidPack objects. decoder *codec.Decoder // pointer to the decoder for this pb.UidPack object. packLen int // The exact length of the pb.UidPack object.
Doubt you need it.
algo/heap.go, line 30 at r7 (raw file):
decoder *codec.Decoder // pointer to the decoder for this pb.UidPack object. packLen int // The exact length of the pb.UidPack object. packIdx int // The current position in the entire pb.UidPack object.
I doubt you need it.
algo/heap.go, line 32 at r7 (raw file):
packIdx int // The current position in the entire pb.UidPack object. blockIdx int // The current position in the current pb.UidBlock object. block []uint64 // The current block.
blockUids
algo/packed.go, line 289 at r7 (raw file):
// Reached the end of this list. Remove it from the heap. if me.packIdx >= me.packLen-1 {
if me.blockIdx > me.blockLen && !decoder.Valid()
Remove the packIdx and packLen.
algo/packed.go, line 327 at r7 (raw file):
for i := 0; i < decoder.BlockIdx(); i++ { index += int(u.Blocks[i].GetNumUids())
Move this closer to usage. Do the uids() and search first. Then figure out the index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @golangcibot and @manishrjain)
algo/heap.go, line 29 at r7 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Doubt you need it.
Done.
algo/heap.go, line 30 at r7 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
I doubt you need it.
Done.
algo/heap.go, line 32 at r7 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
blockUids
Done.
algo/packed.go, line 289 at r7 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
if me.blockIdx > me.blockLen && !decoder.Valid()
Remove the packIdx and packLen.
Done.
algo/packed.go, line 327 at r7 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Move this closer to usage. Do the uids() and search first. Then figure out the index.
Done.
This PR includes versions of most of the algorithms in uidlist.go that work with UidPack
objects. It also includes a few util methods to make the handling of UidPack objects easier.
There are still some algorithms that need to be done.
afterUid
parameter in the intersection algorithm to provide theequivalent of the IntersectCompressedWith algorithm.
This change is