Support sub agg in filter rewrite optimization #17447
Conversation
{"run-benchmark-test": "id_5"}

{"run-benchmark-test": "id_11"}
Force-pushed from 360b293 to a031b5a (compare)
The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/2442/ . Final results will be published once the job is completed.

The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/2443/ . Final results will be published once the job is completed.

❌ Gradle check result for a031b5a: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
dbwiddis
left a comment
Initial pass.
Review comments (resolved) on:
...r/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java
...rg/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteOptimizationContext.java
{"run-benchmark-test": "id_4"}

The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/2469/ . Final results will be published once the job is completed.
Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/2469/

Benchmark Baseline Comparison Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/39/

Some really good improvements in range histogram queries and a few keyword terms queries as well. @bowenlan-amzn
Force-pushed from a031b5a to 5a9082d (compare)

❌ Gradle check result for 5a9082d: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Force-pushed from f95a687 to 86e1821 (compare)
jainankitk
left a comment
@bowenlan-amzn - Thanks for updating description with the latest memory numbers. Glad to see we are pretty close to the unoptimized version in terms of memory utilization. Can you also update the latency benchmark for big5 and pmc workload with the latest changes?
Review comments on:
server/src/main/java/org/opensearch/common/settings/ClusterSettings.java
...r/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java
...opensearch/search/aggregations/bucket/filterrewrite/rangecollector/SubAggRangeCollector.java
...in/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java
bucket sub agg big5

pmc

pmc with the segment threshold set to 0, so all segments use the optimization

Seems a little better than before, but it does not remove the need for the segment-level threshold.
❌ Gradle check result for e751779: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❕ Gradle check result for 90b481b: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
The backport to 2.x failed. To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-17447-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5fb4e6951b6c80db5c9c45398ed3704ac4092ba3
# Push it to GitHub
git push --set-upstream origin backport/backport-17447-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-17447-to-2.x.
Squashed commit messages:
* Support sub agg in filter rewrite optimization
* Clean unused code
* remove singleton DV related change
* let aggregator decide whether to support sub agg
* refactor range collector
* prevent NPE
* handle tryPrecomputeAggregationForLeaf interface
* clean up for review
* add changelog
* add segment level check: this is for the regression we see in pmc; docs per segment doesn't work well, we should also consider number of ranges. From experiments I chose 1000.
* improvements: throw exception for unreachable path; several places to only run when hasSub; range agg only works for match all
* experiment annotation
* Update server/src/main/java/org/opensearch/search/SearchService.java
* Collect sub agg after each bucket
* try fixed bit set
* address comments

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Description
This change adds support for sub aggregations when performing the filter rewrite optimization on range, date histogram, and auto date histogram aggregations.
The filter rewrite optimization previously supported only a single top-level aggregation (range, date histogram, or auto date histogram) combined with a match-all query or a range query on the same aggregation field. Now a sub aggregation can be placed under that top-level aggregation and still benefit from the optimization.
The sub aggregation still executes the default way: iterating through the matching documents and collecting via DocValues. The benefit comes from how the top level aggregates: instead of using DocValues, we iterate over the index (BKD tree) of the aggregated field.
We see improvements in the 2 applicable operations in the big5 workload.
This change doesn't support composite aggregation, as it uses a different and more complicated way to support sub aggregations.
Since the optimization only applies to a match-all query or a top-level query on the same numeric field being aggregated, we don't need to worry about sub aggregations that require scoring.
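The collect-then-replay flow described above can be sketched in plain Java. This is an illustrative model only, not the actual OpenSearch implementation: a `java.util.BitSet` stands in for Lucene's `FixedBitSet`/`DocIdSetBuilder`, and all class and method names here are hypothetical.

```java
import java.util.BitSet;
import java.util.function.IntConsumer;

// Illustrative sketch: the top-level range buckets are counted while
// walking the field's index, and the doc IDs seen for each range are
// recorded so the sub-aggregation can be replayed over exactly those docs.
public class CollectThenReplay {
    // Buckets values into ranges defined by upper bounds; after each
    // range, replays its matching docs to the sub-aggregation collector.
    public static long[] aggregate(long[] values, long[] rangeUpperBounds, IntConsumer subCollector) {
        long[] counts = new long[rangeUpperBounds.length];
        BitSet matchingDocs = new BitSet(values.length); // reused per range
        for (int range = 0; range < rangeUpperBounds.length; range++) {
            matchingDocs.clear();
            long lower = range == 0 ? Long.MIN_VALUE : rangeUpperBounds[range - 1];
            for (int doc = 0; doc < values.length; doc++) {
                if (values[doc] >= lower && values[doc] < rangeUpperBounds[range]) {
                    counts[range]++;
                    matchingDocs.set(doc);   // remember doc for sub-agg replay
                }
            }
            // Replay: hand each matching doc to the sub-aggregation collector.
            for (int doc = matchingDocs.nextSetBit(0); doc >= 0; doc = matchingDocs.nextSetBit(doc + 1)) {
                subCollector.accept(doc);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] values = {1, 5, 12, 7, 20};
        long[] bounds = {10, 30};        // two ranges: (-inf, 10) and [10, 30)
        long[] sum = new long[1];        // toy "sum" sub-aggregation
        long[] counts = aggregate(values, bounds, doc -> sum[0] += values[doc]);
        System.out.println(counts[0] + "," + counts[1] + "," + sum[0]); // 3,2,45
    }
}
```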
Theoretical Analysis
Previous #13171
Default Aggregation Method
Time Cost: T_default = C_default × N_Doc
Where:
N_Doc: total number of matching documents
C_default: average per-document cost when iterating DocValues
Primary operations that contribute to the cost: DocValues loading, value rounding, bucket selection
Optimized Method
We traverse the BKD tree once to collect contiguous ranges without gaps or overlaps.
Time Cost: T_optim = C_optim × (N_Leaf_Visit / N_Leaf) × N_Doc
Where:
N_Leaf_Visit: number of leaf nodes requiring visitDocValues
N_Leaf: total number of leaf nodes with matching documents
C_optim: average per-document cost when traversing the BKD tree
Primary operation that contributes to the cost: visitDocValues
Compare Default and Optimized Method
Previously, when we only supported a single top-level aggregation, we only needed to do counting. Since an inner node contains the count of its children leaves, N_Leaf_Visit is bounded by the number of ranges + 1, because only a leaf crossing a range boundary requires going down to visitDocValues.
Now that we support sub aggregations, we also need to build an iterator for each range. This means we must visit every leaf to collect docIds and build the matching-docs iterator for the range; later the sub aggregation collects from these iterators.
So in the end, we need to compare C_default and C_optim. Below is the experiment to get data for it.
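Plugging hypothetical numbers into the two cost formulas makes the trade-off concrete. All constants below are made up for illustration; the point is that the count-only case visits only a small fraction of leaves, while the sub-agg case visits all of them and so only wins if C_optim < C_default.

```java
// Toy evaluation of the cost model above:
//   T_default = C_default * N_Doc
//   T_optim   = C_optim * (N_Leaf_Visit / N_Leaf) * N_Doc
public class CostModel {
    static double tDefault(double cDefault, long nDoc) {
        return cDefault * nDoc;
    }

    static double tOptim(double cOptim, long nLeafVisit, long nLeaf, long nDoc) {
        return cOptim * ((double) nLeafVisit / nLeaf) * nDoc;
    }

    public static void main(String[] args) {
        long nDoc = 1_000_000;
        // Count-only case: N_Leaf_Visit bounded by ranges + 1 (here 10 + 1),
        // so only a tiny fraction of docs is actually visited.
        System.out.println(tOptim(1.0, 11, 2000, nDoc) < tDefault(1.0, nDoc));   // true
        // Sub-agg case: every leaf is visited (N_Leaf_Visit == N_Leaf), so
        // with equal per-doc costs the optimization no longer wins.
        System.out.println(tOptim(1.0, 2000, 2000, nDoc) < tDefault(1.0, nDoc)); // false
    }
}
```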
Experiment
We ran experiments on 2 types of sub aggregation, metric and bucket, and on 2 workloads, big5 and pmc.
The big5 workload is much denser than pmc, meaning it contains many more documents in every minute interval:
pmc has 574,199 documents across 123 months
big5 has 116,000,000 documents across 290 hours
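A quick back-of-envelope check of those densities, using only the document and interval counts quoted above:

```java
// Density comparison of the two workloads: pmc is roughly 4.7k docs per
// month-sized interval, while big5 packs 400k docs into each hour.
public class WorkloadDensity {
    public static void main(String[] args) {
        long pmcDocsPerMonth = 574_199 / 123;        // ~4,668 docs per month
        long big5DocsPerHour = 116_000_000L / 290;   // 400,000 docs per hour
        System.out.println(pmcDocsPerMonth + " " + big5DocsPerHour);
    }
}
```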
metric sub agg
{
  "size": 0,
  "aggs": {
    "agg1": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "$INTERVAL" },
      "aggs": {
        "tmin": { "min": { "field": "metrics.tmin" } },
        "tavg": { "avg": { "field": "metrics.size" } },
        "tmax": { "max": { "field": "metrics.size" } }
      }
    }
  }
}
big5
pmc
bucket sub agg
{
  "size": 0,
  "aggs": {
    "agg1": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "$INTERVAL" },
      "aggs": {
        "agg2": {
          "auto_date_histogram": { "field": "@timestamp", "buckets": 3 },
          "aggs": {
            "tmin": { "min": { "field": "metrics.tmin" } },
            "tavg": { "avg": { "field": "metrics.size" } },
            "tmax": { "max": { "field": "metrics.size" } }
          }
        }
      }
    }
  }
}
big5
pmc
Regression in PMC
Again we see some regression in the pmc workload, so we tried adding a check at the segment level. We first checked whether a documents-per-segment threshold could help, but after trying several different values, it didn't work well.
Then we noticed the regression became smaller and smaller as docs per range grew: when docs per range is 20k–80k, the regression is within 5 ms, which can happen normally even when running the same query twice. So we implemented a rough estimate of segment-level docs per range. After checking some values, we chose 1000; results are below.
I think this threshold works by disabling the optimization for most of the segments when the number of ranges is high.
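The segment-level guard described above can be sketched as follows. The class, method, and parameter names are illustrative, not the actual OpenSearch code; only the experimentally chosen threshold of 1000 comes from this PR.

```java
// Hypothetical sketch of a segment-level gate: estimate docs-per-range for
// the segment and skip the filter rewrite optimization below a threshold.
public class SegmentGuard {
    static final int DOCS_PER_RANGE_THRESHOLD = 1000; // chosen experimentally in the PR

    static boolean shouldOptimize(int segmentDocCount, int rangeCount) {
        if (rangeCount == 0) return false;
        return segmentDocCount / rangeCount >= DOCS_PER_RANGE_THRESHOLD;
    }

    public static void main(String[] args) {
        // Dense segment, few ranges: optimization stays on.
        System.out.println(shouldOptimize(5_000_000, 290));   // true
        // Sparse segment, many ranges (the pmc-like case): skipped.
        System.out.println(shouldOptimize(500_000, 1230));    // false
    }
}
```

This matches the observation above: with many ranges and few docs, docs-per-range falls under the threshold and most segments fall back to the default path.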
metric sub agg
bucket sub agg
Memory Implication
This way of doing sub aggregation does allocate more memory, mainly for keeping a DocIdSetBuilder in memory to later replay for the sub aggregation.
First Iteration
Here are the memory allocation profiles for several scenarios:
Do sub aggregation after each range is finalized
Do sub aggregation after all ranges are finalized (more details in #17447 (comment))
Default aggregation method
Second Iteration
We changed to using one FixedBitSet per segment, reusing the same one across all ranges traversed.
Theoretically a segment can contain at most ~2 billion docs, so the maximum memory for this bitset would be ~256 MB.
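A quick sanity check of that worst case: one bit per document, up to `Integer.MAX_VALUE` (~2.1 billion) documents per Lucene segment.

```java
// Worst-case size of a per-segment FixedBitSet: one bit per doc,
// up to Integer.MAX_VALUE docs in a single Lucene segment.
public class BitSetMemory {
    public static void main(String[] args) {
        long maxDocs = Integer.MAX_VALUE;          // max docs per segment
        long bytes = (maxDocs + 7) / 8;            // one bit per doc, rounded up
        System.out.println(bytes / (1024 * 1024)); // prints 256 (MiB)
    }
}
```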
Using FixedBitSet
Default aggregation
Script and Query
Related Issues
Resolves #12602
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.