-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve search performance for numeric sort queries #10867
Comments
I was brainstorming with @harshavamsi on this one briefly last week. I think there might be some trickery that we can do especially for the special case where a segment has no deletes. Specifically, I'm wondering if we can inspect the BKD tree to find the leftmost/rightmost (depending on sort order) smallest range with at least N hits, where N is the I don't know if it would ultimately help, or if it's essentially what happens in the the PointValues estimate/intersect methods anyway. |
@msfroh thanks for the inputs. Tagging @rishabhmaurya here as well. @rishabhmaurya had the idea of essentially trying to help match_all queries that use a descending sort on a numeric field. Rather than going through the entire BKD tree like you mentioned, we could essentially look through the min/max value that makes the most sense for us and then attach a range filter on that node assuming other attributes like the number of hits and the number of docs to be returned are all taken care of first. I don't think I did a great job of explaining, but I will put up an RFC with my thought process and how we could prune the tree. |
Thanks @harshavamsi for working on it. I have working version of it in lucene and details of optimization are mentioned here - apache/lucene#12534 and PR rishabhmaurya/lucene#2. I had a discussion around it with @msfroh and we agreed upon its utility. I started making changes in opensearch as well because lucene community may not accept it as it works for cases with MatchAllQuery with desc sort and no deletions on numeric field. You can find the opensearch changes here, its still work in progress - rishabhmaurya@f261cb3 |
Should we pull this in rishabhmaurya@f261cb3 and run a benchmark along with profile to identify the early improvements |
rishabhmaurya@f261cb3 is still work in progress so can't be used directly. |
Current steps on this:
|
Preliminary benchmarking results: Without optimization
With optimization
|
thanks @harshavamsi for running the benchmark. Could you provide more details on the workload and queries you ran? |
I ran this workload and this task: Workload: geonames I used an This is the query:
|
Re-running the benchmark on the optimized cluster:
|
can you also post the segment stats here and overall index size. Given the latency is already pretty low, this maybe not be the right workload to test against. |
@rishabhmaurya here's the segment stats:
Index size:
|
The segment sizes are too small to see any noticeable difference, I can work with you on it next week. |
@rishabhmaurya The POC you tried would only work for |
+1 on @gashutos point. @rishabhmaurya - Do you have a specific use case where this will be useful? |
@gashutos thanks for looking. Yes, I have mentioned in the poc that it is supposed to work only for MatchAllQuery with no doc deletions.
Can you point me to your poc/issue and also why do you think its a rare case. Thank you |
@rishabhmaurya
The reason we think it is rare scenario because generally in production, we dont see just sort on single field without any filtering clause wrapping it. Again this is observation based on my seen user usecases. |
Posting some more number here, same workload and instance but this time with force merging into 1 large segment to see if it could have any impact as well as running on a single primary shard: Non optimized cluster:
Optimized cluster:
Will dive into lucene code path to understand where we're spending time when running this workload. |
Hi @harshavamsi - will documentation be required for this feature in 2.12? |
This is purely an internal optimization task. It should not require any documentation. |
Hi, are we on track for this to be released in 2.12 ? |
Pushing this out to v2.13, since this optimization is still in the investigation stage. Although the benchmarks numbers looks promising, it requires further deep dive into the lucene code path to understand where we're spending time and coming up with the improvement opportunities. |
Moved it to 2.14.0 as per the discussion with @harshavamsi |
We should try benchmarking numeric sort queries with apache/lucene#13149. Based on the explanation at https://blunders.io/posts/es-benchmark-4-inlining, we may see significant improvement to numeric sorting.. |
Tagging @opensearch-project/benchmark-core team |
Numeric sorting is one of the key query mechanisms used across multiple OpenSearch clusters. It is critical to understand performance characteristics of numeric sorting queries and identify mechanisms to reduce query latency and reduce performance overhead.
To understand query characteristics of numeric sorting, a simple performance testing was performed with below settings:
Benchmark tool: opensearch-benchmark
Workload: geonames
Task: desc_sort_population
Nodes: 1 node
JVM size: 4 GB
Benchmark result:
CPU profile:
As expected most CPU cycles for numeric sorting is spent in Long comparator to do perform sorting operation. CPU cycles are equally distributed between PointValues operations
estimatePointCount
andintersect
Opening this issue to brainstorm and identify potential improvements to numeric sorting.
The text was updated successfully, but these errors were encountered: