Support Filtering on Large List encoded by Bitmap #14774
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
LGTM!
terms query delegate to bitmap query
7/19: I have a working draft of this bitmap filtering feature using terms lookup. I will continue after 7/28.
Codecov Report
Attention: Patch coverage is …
@@             Coverage Diff              @@
##               main   #14774      +/-   ##
============================================
+ Coverage     71.82%   71.86%   +0.04%
- Complexity    63046    63104      +58
============================================
  Files          5207     5208       +1
  Lines        295581   295712     +131
  Branches      42690    42723      +33
============================================
+ Hits         212295   212525     +230
+ Misses        65875    65682     -193
- Partials      17411    17505      +94
I went through the code pretty carefully and left three comments.
In my opinion, none of them are blockers for this PR.
server/src/main/java/org/opensearch/search/query/BitmapDocValuesQuery.java
Signed-off-by: Michael Froh <froh@amazon.com>
I don't seem to be able to retrigger the Mend Security Check. It's not configured to block merges, though, so I believe it's safe to merge anyway.
Backport (cherry picked from commit 52ecbe9):
* Support Filtering on Large List encoded by Bitmap (#14774)
* Update version checks to look for 2.17.0
Co-authored-by: Michael Froh <froh@amazon.com>
Problem
To retrieve the documents that match at least one item from a given list, we can use the terms query. We can even save the filter list in a document and use terms lookup to fetch it and feed it into the terms query.
However, as the filter grows, the memory and network transmission overhead increases. This overhead disproportionately affects latency and throughput (TPS) when the filter becomes huge (10k+ items), making the approach unusable.
Proposal
We can use RoaringBitmap to encode the filter. It uses less memory and bandwidth, and it supports fast, deterministic in-memory random access (lookup).
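The encode-and-look-up idea can be sketched in Java. This is a minimal sketch, not the PR's code. It uses the JDK's java.util.BitSet as a stand-in for RoaringBitmap so that it needs no third-party dependency. The class and method names (BitmapFilterCodec, encode, decode) are hypothetical. The RoaringBitmap Java library exposes analogous operations such as bitmapOf, serialize and contains.

```java
import java.util.Base64;
import java.util.BitSet;

/**
 * Hypothetical helper for illustration only. java.util.BitSet stands in for
 * RoaringBitmap so that the sketch has no third-party dependency.
 */
final class BitmapFilterCodec {

    private BitmapFilterCodec() {}

    /** Sets one bit per id, serializes the bitmap to bytes, then base64-encodes the bytes. */
    static String encode(int... ids) {
        BitSet bitmap = new BitSet();
        for (int id : ids) {
            bitmap.set(id); // BitSet rejects negative ids with IndexOutOfBoundsException
        }
        byte[] bytes = bitmap.toByteArray(); // little-endian bit layout
        return Base64.getEncoder().encodeToString(bytes);
    }

    /** Reverses encode(): base64 string to bytes to bitmap. */
    static BitSet decode(String base64) {
        return BitSet.valueOf(Base64.getDecoder().decode(base64));
    }

    public static void main(String[] args) {
        String encoded = encode(3, 17, 1024);
        BitSet filter = decode(encoded);
        System.out.println(filter.get(17));       // true: 17 is in the filter
        System.out.println(filter.get(18));       // false
        System.out.println(filter.cardinality()); // 3
        System.out.println(encoded.length());     // 172: 129 bytes, because the largest id is 1024
    }
}
```

A plain BitSet spends one bit on every possible id up to the largest one. That is why three ids already cost 129 bytes (172 base64 characters) here. RoaringBitmap instead splits the 32-bit id space into chunks of 65,536 values. It stores each chunk as a sorted array, a fixed-size bitmap, or run-length encoded runs, depending on the chunk's density, which keeps sparse filters compact.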
User Story
Users want to filter/join a main index with a bitmap filter on a numeric field.
For example, the index contains product IDs and other product-related data. Each filter represents the products owned by one customer and is a list of numeric product IDs.
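The join in this user story can be modeled as a per-document membership test: a product document matches when its product ID is set in the customer's bitmap. The sketch below makes several assumptions. It assumes one product ID per document, held in an in-memory map from document ID to product ID. It uses java.util.BitSet in place of RoaringBitmap. All names (BitmapJoinSketch, matchingDocs) are hypothetical. The sketch illustrates the semantics only. It does not reproduce BitmapDocValuesQuery, which, as its name suggests, works against the index's doc values.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of "filter products by a customer's bitmap". */
final class BitmapJoinSketch {

    private BitmapJoinSketch() {}

    /** Returns the ids of documents whose product id is set in the customer's filter. */
    static List<String> matchingDocs(Map<String, Integer> productIdByDoc, BitSet customerFilter) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : productIdByDoc.entrySet()) {
            int productId = entry.getValue();
            // Negative ids can never be in a BitSet, and BitSet.get would throw on them.
            if (productId >= 0 && customerFilter.get(productId)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Integer> products = new LinkedHashMap<>(); // keeps insertion order
        products.put("doc-1", 101);
        products.put("doc-2", 202);
        products.put("doc-3", 303);

        BitSet ownedByCustomer = new BitSet();
        ownedByCustomer.set(101);
        ownedByCustomer.set(303);

        System.out.println(matchingDocs(products, ownedByCustomer)); // [doc-1, doc-3]
    }
}
```

In this BitSet version, each check is a constant-time bit lookup. The per-document cost therefore does not grow with the size of the filter.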
- An example experience using terms lookup:
  1. Users create a RoaringBitmap for a filter on the client side, serialize the bitmap to a byte array, and encode the byte array using base64.
  2. Users index (store) the bitmap filter in a binary field (customer_filter) of an OpenSearch index (customers). The ID of the document is the identifier of the customer associated with this filter.
  3. Users run a terms lookup query on the products index (products), with the lookup targeting the document of a given customer ID in customers.
- Users run a normal terms query and pass in the bitmap.
- Users run a boolean query with boolean operations between multiple filters.
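Combining several bitmap filters with boolean operations boils down to set algebra:
- Requiring all filters keeps their intersection.
- Accepting any of them keeps their union.
- Excluding one keeps the difference.

The sketch below shows that correspondence with java.util.BitSet standing in for RoaringBitmap, which offers comparable and, or and andNot operations. The class BitmapBooleanSketch and its methods are hypothetical. This is not a description of how OpenSearch evaluates a bool query internally.

```java
import java.util.BitSet;

/** Hypothetical illustration: boolean combinations of filters as set operations. */
final class BitmapBooleanSketch {

    private BitmapBooleanSketch() {}

    /** Builds a bitmap containing the given non-negative ids. */
    static BitSet of(int... ids) {
        BitSet bitmap = new BitSet();
        for (int id : ids) {
            bitmap.set(id);
        }
        return bitmap;
    }

    /** Both filters required (AND): intersection. */
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone(); // BitSet ops mutate the receiver, so copy first
        result.and(b);
        return result;
    }

    /** Either filter accepted (OR): union. */
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    /** First filter, excluding the second (AND NOT): difference. */
    static BitSet andNot(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.andNot(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet customerA = of(1, 2, 3, 4);
        BitSet customerB = of(3, 4, 5);
        System.out.println(and(customerA, customerB));    // {3, 4}
        System.out.println(or(customerA, customerB));     // {1, 2, 3, 4, 5}
        System.out.println(andNot(customerA, customerB)); // {1, 2}
    }
}
```

The inputs are cloned before each operation because BitSet's and, or and andNot modify the receiver in place. Without the copies, combining one customer's filter with another would corrupt the original.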
Implementation
Functional Requirements
- … _source of a document using a get-by-id request. Add support for fetching from the stored field instead. This is needed for the binary field because its _source would be the base64-encoded string.
- … terms query
Non-functional Requirements
Related Issues
Resolves #12341
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.