-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Make queries utilise secondary indexes #1925
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: The purpose of indexes is performance gains on read, at the cost of performance on write (and storage space). I really think performance tests should be included in this PR.
I would like to see performance tests for non-indexed vs indexed queries, on both read and write. Without these we are not really testing that this code change is doing anything useful, and could be seen as an explain-feature or a simple refactor.
praise: What I have seen so far of the production code looks good, but I have left a few comments dotted about. Will finish my review a bit later once some of them have been addressed.
Do you mean adding tests to |
Adding to At the moment there is nothing that really tests the behaviour change that the users care about (query speed). |
8b682c0
to
189f586
Compare
Codecov ReportAttention:
@@ Coverage Diff @@
## develop #1925 +/- ##
===========================================
+ Coverage 74.67% 74.98% +0.30%
===========================================
Files 234 241 +7
Lines 23044 23616 +572
===========================================
+ Hits 17208 17707 +499
- Misses 4661 4709 +48
- Partials 1175 1200 +25
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 7 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
4ad4751
to
4323b69
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Submitting a few comments now, as they are pilling up a bit and I need a breather.
Overall it looks good, but the tests in particular need so work I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice work Islam. Just one minor thought on the extra param ion the bench tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice work Islam. Just one minor thought on the extra param ion the bench tests.
Note: Make sure all conversations with Andy are resolved before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, please resolve the remaining conversations before merge though.
I have a request for future PRs too - when resolving a conversation, can you please comment in the conversation/thread what you have done to resolve it. It makes it easier to review, as I don't have to then go and re-find the location (which can sometimes be tricky due to github/other changes), and then figure out what has been done to resolve it (also sometimes tricky and occasionally requiring guesswork).
commit c8bde64 Author: Islam Aliev <aliev.islam@gmail.com> Date: Sat Oct 14 00:26:22 2023 +0200 feat: Make queries utilise secondary indexes (sourcenetwork#1925) ## Relevant issue(s) Resolves sourcenetwork#1555 ## Description With this change the the secondary indexes are utilised during querying data. A dedicated `Indexer` fetcher is implemented to perform fetching of values of indexed fields. Now there is a separate `filter` package that houses a lot of methods for working with filters. A new metric `indexesFetched` is introduced into `@explain` to provide information about how many indexes has been fetched. It also includes an update to the testing framework to allow adding custom asserters. The new ExplainResultsAsserter is used with this new feature.
## Relevant issue(s) Resolves sourcenetwork#1555 ## Description With this change the the secondary indexes are utilised during querying data. A dedicated `Indexer` fetcher is implemented to perform fetching of values of indexed fields. Now there is a separate `filter` package that houses a lot of methods for working with filters. A new metric `indexesFetched` is introduced into `@explain` to provide information about how many indexes has been fetched. It also includes an update to the testing framework to allow adding custom asserters. The new ExplainResultsAsserter is used with this new feature.
Relevant issue(s)
Resolves #1555
Description
With this change the the secondary indexes are utilised during querying data.
A dedicated
Indexer
fetcher is implemented to perform fetching of values of indexed fields.Now there is a separate
filter
package that houses a lot of methods for working with filters.A new metric
indexesFetched
is introduced into@explain
to provide information about how many indexes has been fetched.It also includes an update to the testing framework to allow adding custom asserters.
The new ExplainResultsAsserter is used with this new feature.
Benchmark
I ran some query benchmarks with regular fetching and indexed fetching on 10k docs and here is the summary:
Time per Operation (ns/op)
Indexed: Took approximately 2,196,481 nanoseconds (~2.2 ms) per operation
Original: Took approximately 74,037,628 nanoseconds (~74 ms) per operation
Boost Factor: Indexed is approximately 33.7 times faster than the original.
Memory Allocation (B/op)
Indexed: Allocated approximately 1,053,403 bytes (~1 MB) per operation
Original: Allocated approximately 45,218,327 bytes (~45 MB) per operation
Boost Factor: Indexed uses approximately 42.9 times less memory than the original.
Number of Allocations (allocs/op)
Indexed: Made 13,331 memory allocations per operation
Original: Made 494,494 memory allocations per operation
Boost Factor: Indexed has approximately 37.1 times fewer memory allocations than the original.
Summary
The indexed approach is 33.7 times faster, uses 42.9 times less memory, and has 37.1 times fewer memory allocations compared to the original approach.
Tasks
How has this been tested?
Integration tests
Specify the platform(s) on which this was tested: