Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: reuse the slice to optimize #5889

Closed
wants to merge 1 commit into from

Conversation

YaoZengzeng
Copy link
Contributor

@YaoZengzeng YaoZengzeng commented Aug 14, 2019

@YaoZengzeng
Copy link
Contributor Author

YaoZengzeng commented Aug 15, 2019

@krasi-georgiev Also optimize seriesHashmap.del() and baseChunkSeries.Next() with same way.

@YaoZengzeng YaoZengzeng changed the title refactor: reuse the slice in MemPostings.Delete() refactor: reuse the slice to optimize Aug 15, 2019
@YaoZengzeng
Copy link
Contributor Author

@bwplotka @krasi-georgiev

After reading the code more comprehensively, I think the slice of MemPostings.Delete() should not be reused.

Because MemPostings.Get(name, value) will return the Postings for querying whose underlying slice share same memory with MemPostings.m[name][value].

If we reuse the slice in deleting, the Postings may return duplicate series ref.

seriesHashmap.del() and baseChunkSeries.Next() could reuse the slice safely.

break
}
}
if !found {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the short circuit? The short circuit doesn't allocate anymore, but still avoids extra append calls and assignment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csmarchbanks Actually the short circuit still need to iterate the p.m[n][l] one loop and if found==true, it need to allocate and iterate another loop to fullfill repl. But if we reuse the p.m[n][l], only one loop is needed and no allocation at all under any conditions.

However as I commented above, reuse the slice will result a sneaky bug. So I'll rollback this code change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is inside of a Lock, so reusing the slice should not cause a problem.

I will rephrase my concern, the previous code did a loop with only gets in order to quickly see if anything changed. The new code only does one loop, but that loop is more expensive since it also does an append for every element that was not deleted. I am wondering if you have any benchmarks to show that doing the extra operations is ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock only protect MemPostings.m, but not the memory underlying it.

For example, if MemPosting.m["foo"]["bar"] is {1, 2, 3, 4, 5} and through MemPosting.Get("foo", "bar") we will get a ListPostings whose list field share the same memory with MemPosting.m["foo"]["bar"].

Then MemPosting.Delete() delete some ref, 2, 4 for example. If we reuse the slice, the underlying memory will be {1, 3, 5, 4, 5}. Finally, the iterating of ListPostings will not be correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 That make sense. Also, that is an insidious potential bug that does not fail any tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'll delete this part of slice reuse.

And I'd like to add a test for this insidious case if necessary :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'd like to add a test for this insidious case if necessary :)

Please do, or else I will in a separate PR. :) It doesn't look like MemPostings.Delete has any package level testing right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another PR will be fine.

Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>
@YaoZengzeng
Copy link
Contributor Author

@krasi-georgiev @bwplotka @csmarchbanks

It's proved that reuse the slice for memPostings.Delete() is not correct and the code change has been rollbacked.

But seriesHashmap.del() and baseChunkSeries.Next() could reuse the slice safely.

PTAL.

@krasi-georgiev
Copy link
Contributor

@YaoZengzeng thanks,
I just run the BenchmarkQueries and it didn't show any difference in the allocs and total bytes.
Could you check if we have another func that shows the improvment or write one to see what are the real gains by this change.

@YaoZengzeng
Copy link
Contributor Author

@krasi-georgiev Maybe the referenced functions are not called frequently, so the optimization is not obvious. But reusing the slice that way is appropriate and more efficient in theory, right? :)

@krasi-georgiev
Copy link
Contributor

yep 100% correct, but sometime the theory is not what we see in practice.

It should be simple enough to write some benchmark that shows the difference and it should be useful for future tests.
Would you have the time to write it?
If yes please use only public API interfaces as these are less likely to change.

@YaoZengzeng
Copy link
Contributor Author

Yes, I'd like to research it in more depth :)

@krasi-georgiev
Copy link
Contributor

Thanks, Appreciated. waiting for the results.

@krasi-georgiev
Copy link
Contributor

@YaoZengzeng any progress on this?

@krasi-georgiev
Copy link
Contributor

ping @YaoZengzeng

@YaoZengzeng
Copy link
Contributor Author

@krasi-georgiev Sorry for the delay, but really busy these days. 😂

I'll move on once I have time :)

@krasi-georgiev
Copy link
Contributor

@YaoZengzeng are you still busy? any chance to look into this anytime soon?

@stale stale bot added the stale label Feb 19, 2020
@krasi-georgiev krasi-georgiev removed their assignment Mar 6, 2020
@stale stale bot removed the stale label Mar 6, 2020
@stale stale bot added the stale label May 5, 2020
@beorn7
Copy link
Member

beorn7 commented Aug 17, 2020

In the bug scrub, we decided to close this for lack of activity. If somebody wants to pick this up, we'll appreciate it. Next step, as discussed above, would be to verify the improvement with a benchmark.

@beorn7 beorn7 closed this Aug 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants