Improve performance of rounding dates in date_histogram aggregation #9727
Codecov Report
```diff
@@             Coverage Diff              @@
##               main    #9727      +/-   ##
============================================
- Coverage     71.16%   71.09%   -0.08%
  Complexity    58115    58115
============================================
  Files          4831     4831
  Lines        273999   274082      +83
  Branches      39920    39930      +10
============================================
- Hits         195005   194847     -158
- Misses        62604    62920     +316
+ Partials      16390    16315      -75
```

... and 460 files with indirect coverage changes
Bad Gateway errors from Jenkins again. :( The GitHub Action failed, but the check is still running in the background. Update: at least the check completed successfully now.
Jenkins is down again. This is disappointing.
The backport to 2.x failed.
To backport manually, run these commands in your terminal:

```bash
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-9727-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7abfc17a6d1334a885e16d07aa3f6d1c875279c8
# Push it to GitHub
git push --set-upstream origin backport/backport-9727-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
```

Then, create a pull request where the
…pensearch-project#9727) * Improve performance of rounding dates in date_histogram aggregation Signed-off-by: Ketan Verma <ketan9495@gmail.com> * Minor refactoring changes Signed-off-by: Ketan Verma <ketan9495@gmail.com> --------- Signed-off-by: Ketan Verma <ketan9495@gmail.com>
Description
In the `date_histogram` aggregation, the timestamp value from each document must be rounded down to the nearest interval (year, quarter, month, week, day, etc.) as defined in the search request. This rounded-down timestamp serves as the bucket key for aggregating results. Rather than computing the rounded value from scratch for each hit, the `ArrayRounding` class pre-computes the possible values and performs a binary search to find the round-down point. This works well for sufficiently large arrays, but for small arrays a simple linear search is far superior: it avoids the penalty of branch mispredictions and pipeline stalls, and it accesses memory sequentially.
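To make the contrast concrete, here is a minimal Java sketch of the lookup described above, assuming UTC month buckets and an ascending array of pre-computed bucket start times. The class `ArrayRoundingSketch`, its methods, and the `LINEAR_SEARCH_MAX_SIZE` cut-over of 64 are hypothetical illustrations, not the PR's code; a real threshold would have to be chosen by benchmarking.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Arrays;

/**
 * Hypothetical sketch: round a timestamp down to its bucket key by searching an
 * array of pre-computed, ascending bucket start times (UTC month starts here).
 */
public final class ArrayRoundingSketch {

    /** Hypothetical cut-over size; the best value is hardware-dependent. */
    static final int LINEAR_SEARCH_MAX_SIZE = 64;

    private ArrayRoundingSketch() {}

    /** Pre-computes the epoch-millis start of `months` consecutive UTC months. */
    static long[] monthStarts(LocalDate from, int months) {
        long[] keys = new long[months];
        LocalDate month = from.withDayOfMonth(1);
        for (int i = 0; i < months; i++) {
            keys[i] = month.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
            month = month.plusMonths(1);
        }
        return keys;
    }

    /** Binary search: about log2(size) probes, each a hard-to-predict branch. */
    static long binaryFloor(long[] keys, int size, long utcMillis) {
        checkInRange(keys, size, utcMillis);
        int idx = Arrays.binarySearch(keys, 0, size, utcMillis);
        if (idx < 0) {
            // Not an exact match: idx == -(insertionPoint) - 1, and the floor
            // is the element just before the insertion point.
            idx = -idx - 2;
        }
        return keys[idx];
    }

    /** Linear search: sequential reads with a branch that is easy to predict. */
    static long linearFloor(long[] keys, int size, long utcMillis) {
        checkInRange(keys, size, utcMillis);
        int i = 0;
        while (i + 1 < size && keys[i + 1] <= utcMillis) {
            i++;
        }
        return keys[i];
    }

    /** Hypothetical hybrid: linear search for small arrays, binary search otherwise. */
    static long floor(long[] keys, int size, long utcMillis) {
        return size <= LINEAR_SEARCH_MAX_SIZE
            ? linearFloor(keys, size, utcMillis)
            : binaryFloor(keys, size, utcMillis);
    }

    /** Both searches assume the timestamp is not before the first bucket. */
    private static void checkInRange(long[] keys, int size, long utcMillis) {
        if (size <= 0 || utcMillis < keys[0]) {
            throw new IllegalArgumentException("timestamp precedes the first bucket");
        }
    }

    public static void main(String[] args) {
        long[] keys = monthStarts(LocalDate.of(2023, 1, 1), 12);
        long ts = LocalDate.of(2023, 3, 15).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        // Both strategies should map 2023-03-15 to the 2023-03-01 bucket.
        System.out.println(binaryFloor(keys, keys.length, ts) == keys[2]);
        System.out.println(linearFloor(keys, keys.length, ts) == keys[2]);
    }
}
```

Both searches return the same bucket key and differ only in how they walk the array: `binaryFloor` touches about log2(n) scattered slots behind data-dependent branches, while `linearFloor` reads slots in order, which the CPU can prefetch and predict well when n is small.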
Since rounding takes up the majority of the CPU time, making it faster can lead to significant improvements. Hot methods:
Macro benchmarks
I reused the `noaa` OpenSearch Benchmark (OSB) workload and executed a `date_histogram` search request with varying intervals.
The improvement for quarter and month intervals is substantial. The improvement for year, week, and day intervals is small because the number of pre-computed values is small, so the search does not make large jumps through memory. Results vary from workload to workload.
Micro benchmarks
Results are from a c6i.2xlarge EC2 instance (Intel Xeon 8375C CPU @ 2.90GHz). Full results.
Uniform distribution
Each value is equally likely to be picked.
Skewed distribution (towards the edge)
Values follow a normal distribution centered at p90 (± 5% standard deviation).
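The lookups for these distributions could be drawn as in the following Java sketch. It assumes "p90 ± 5% stddev" means a normal distribution whose mean sits at the 90th percentile of the index range and whose standard deviation is 5% of the array size; the class `KeyDistributions` and its methods are hypothetical, not the benchmark's actual code. Passing `0.50` instead of `0.90` gives the centered distribution described next.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical helpers for drawing benchmark lookups from a pre-computed,
 * sorted key array under uniform or skewed (normal) distributions.
 */
public final class KeyDistributions {

    private KeyDistributions() {}

    /** Uniform distribution: every index in [0, size) is equally likely. */
    static int uniformIndex(Random rnd, int size) {
        return rnd.nextInt(size);
    }

    /**
     * Skewed distribution: normal, with the mean at the given percentile of the
     * index range (0.90 for p90, 0.50 for p50) and a standard deviation of 5%
     * of the array size. Out-of-range draws are clamped to the first/last index.
     */
    static int skewedIndex(Random rnd, int size, double centerPercentile) {
        double mean = centerPercentile * (size - 1);
        double stddev = 0.05 * size;
        long idx = Math.round(mean + rnd.nextGaussian() * stddev);
        return (int) Math.max(0L, Math.min(size - 1L, idx));
    }

    /** Builds a batch of lookup keys by mapping skewed indexes onto the key array. */
    static long[] sampleKeys(long[] keys, int count, Random rnd, double centerPercentile) {
        long[] out = new long[count];
        for (int i = 0; i < count; i++) {
            out[i] = keys[skewedIndex(rnd, keys.length, centerPercentile)];
        }
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] counts = new int[10];
        for (int i = 0; i < 10_000; i++) {
            counts[skewedIndex(rnd, 100, 0.90) / 10]++; // tally draws by decile
        }
        System.out.println(Arrays.toString(counts));
    }
}
```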
Skewed distribution (centered)
Values follow a normal distribution centered at p50 (± 5% standard deviation). This is the worst case for the "meet in the middle" linear search.
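A minimal Java sketch of a "meet in the middle" linear search, assuming it scans one element from each end per iteration and stops as soon as either side locates the floor; `BidirectionalLinearSearch` and `floor` are hypothetical names, not taken from the PR. It also shows why a centered distribution is the worst case: the two scans only stop early when the floor lies near one of the ends.

```java
/**
 * Hypothetical "meet in the middle" linear search over an ascending array:
 * each iteration checks one element from the front and one from the back.
 */
public final class BidirectionalLinearSearch {

    private BidirectionalLinearSearch() {}

    /** Returns the largest element of {@code values} that is <= key. */
    static long floor(long[] values, long key) {
        if (values.length == 0 || key < values[0]) {
            throw new IllegalArgumentException("key must be >= values[0]");
        }
        int lo = 0;
        int hi = values.length - 1;
        // Invariant: values[0..lo) are all <= key and values(hi..end] are all > key.
        while (lo <= hi) {
            if (values[hi] <= key) {
                return values[hi]; // first hit scanning down from the top is the floor
            }
            if (values[lo] > key) {
                return values[lo - 1]; // first miss scanning up: its predecessor is the floor
            }
            lo++;
            hi--;
        }
        // The scans crossed without returning: values[lo - 1] is the last element <= key.
        return values[lo - 1];
    }

    public static void main(String[] args) {
        long[] values = {10, 20, 30, 40, 50};
        System.out.println(floor(values, 29)); // prints 20
        System.out.println(floor(values, 45)); // prints 40
    }
}
```

A key whose floor is at either end of the array is resolved within the first iteration or two, whereas a key near p50 needs about n/2 iterations before the two scans meet.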
Alternatives
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.