Skip to content

Conversation

@asimmahmood1
Copy link
Contributor

@asimmahmood1 asimmahmood1 commented Dec 5, 2025

Description

This is a follow up to #20057

Compared to the date and auto date histogram, this version handles multiple owning bucket ord correctly but keep track of upToInclusive per bucket. We can go back and apply the same logic to date and auto date.

Same logic can be applied to max agg, currently max is written as complete copy of min, will try to see if it can be refactored.

Currently skiplist only keeps track of long values, so for e.g. double type, we need to convert it to sortable long. Given currently logic, it'll only handle basic numeric types but not half-float.

This also relies on skiplist mapping to be explicitly enabled, unlike @timestamp field. So that is something can be part of 3.5 to enable skiplist by default.

TODO:

  1. add test for date
  2. skip logic for unsupported numeric types like half float
  3. validate with bigger datasets

Related Issues

Part of #18882
Part of #19384

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added skiplist optimization for auto_date_histogram aggregation to improve date histogram query performance and reduce processing time significantly.
    • Added skiplist optimization for min aggregation to enhance computation efficiency for minimum value aggregations across large datasets.
  • Tests

    • Expanded test coverage for skiplist collector behavior during aggregation rounding changes, sub-aggregation scenarios, and bucket merging operations.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
…by filter rewrite

* TODO:
  * add more test cases, currently does not assert that skiplist was used
  * add auto date benchmark to http_logs
  * verify existing benchmarks have no issue

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

# Conflicts:
#	server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
@asimmahmood1 asimmahmood1 self-assigned this Dec 5, 2025
@coderabbitai
Copy link

coderabbitai bot commented Dec 5, 2025

Walkthrough

This PR adds skiplist optimization to date histogram and min aggregations. Changes include dynamic rounding support in HistogramSkiplistLeafCollector via supplier-based accessors, integration of DocValuesSkipper into AutoDateHistogramAggregator and MinAggregator leaf collectors, refactored skiplist eligibility checks, profiling updates, and comprehensive test coverage for rounding change scenarios.

Changes

Cohort / File(s) Summary
Core Skiplist Leaf Collector
server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java
Introduces dynamic rounding via supplier-based accessors (preparedRoundingSupplier, bucketOrdsSupplier); adds runtime rounding change detection with cache invalidation; new public interface IncreaseRoundingIfNeeded for rounding callbacks; new public static canUseSkiplist() method and constructor overload for supplier-based behavior.
Date Histogram Aggregators
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java, server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java
AutoDateHistogramAggregator: wires DocValuesSkipper into leaf collection paths, updates getLeafCollector signature to accept skipper, conditionally uses HistogramSkiplistLeafCollector, adds per-collector counters, and implements rounding advancement logic. DateHistogramAggregator: delegates skiplist eligibility check to HistogramSkiplistLeafCollector.canUseSkiplist().
Min Aggregation Skiplist
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java
Adds DocValuesSkipper-based optimization with new fields (fieldName, fieldIsFloat, skipUpTo, counters), internal MinSkiplistLeafCollector class for skip range advancement, helper methods (growMins, growSkipUpTo), debug info collection, and resource cleanup.
Profiling
server/src/main/java/org/opensearch/search/profile/aggregation/ProfilingLeafBucketCollector.java
Adds timing instrumentation for collect(DocIdStream stream, long owningBucketOrd) and collectRange(int min, int max) methods.
Tests
server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregatorTests.java, server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java, server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java
AutoDateHistogramAggregatorTests: updates DATE_FIELD constant, adds skiplist-enabled test helpers and three new tests for rounding changes. MinAggregatorTests: adds two new property-style tests for skiplist result equivalence and terms aggregation. FilterRewriteSubAggTests: adds descriptive documentation and explicit assertion messages.
Documentation
CHANGELOG.md
Updates Unreleased 3.x "Added" section to document skiplist optimization for auto_date_histogram aggregation.

Sequence Diagram(s)

sequenceDiagram
    participant Agg as AutoDateHistogramAggregator
    participant HC as getLeafCollector
    participant SKL as HistogramSkiplistLeafCollector
    participant Skipper as DocValuesSkipper
    participant Rounding as Rounding.Prepared
    
    Agg->>HC: getLeafCollector(values, skipper, sub)
    HC->>HC: Check if skiplist can be used
    alt Skiplist Available
        HC->>SKL: new HistogramSkiplistLeafCollector<br/>(with suppliers)
        SKL->>Skipper: advance(doc)
        Skipper-->>SKL: skip range
        SKL->>Rounding: get current rounding<br/>(via supplier)
        alt Rounding Changed
            Rounding-->>SKL: new rounding level
            SKL->>SKL: invalidate cache
            SKL->>HC: IncreaseRoundingIfNeeded<br/>.increaseRounding()
            HC->>HC: handle rounding update
        end
        SKL->>SKL: bucket document
        SKL-->>HC: return
    else Fallback
        HC->>HC: use standard collector
    end
    HC-->>Agg: LeafBucketCollector
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

  • HistogramSkiplistLeafCollector.java: Dense supplier-based logic with dynamic rounding detection and cache invalidation; requires careful review of state transitions
  • AutoDateHistogramAggregator.java: Significant method signature changes (getLeafCollector now includes skipper parameter) affecting both FromSingle and FromMany; complex rounding advancement logic and conditional skiplist collector instantiation
  • MinAggregator.java: New inner class MinSkiplistLeafCollector with skip range advancement; multiple new fields and resource management paths (growMins, growSkipUpTo, cleanup in doClose/doReset)
  • Test coverage: Three new AutoDateHistogramAggregatorTests and two MinAggregatorTests with complex rounding change scenarios and sub-aggregation interactions
  • Potential edge cases: Rounding changes during collection, bucket merging after rounding increase, interaction with sub-aggregations

Possibly related PRs

  • PR #19573: Modifies skiplist-based histogram collection paths with HistogramSkiplistLeafCollector and canUseSkiplist logic, directly related to this PR's core skiplist infrastructure.

Suggested labels

enhancement, Search

Suggested reviewers

  • jainankitk
  • msfroh
  • mch2
  • andrross
  • reta
  • saratvemulapalli
  • kotwanikunal

Poem

🐰 Through histograms we skip with glee,
Auto-dating by rounding spree,
With suppliers and caches bright,
Skiplist paths take flight!
Min values merge without a fight,
OpenSearch optimized just right! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.14% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding skiplist optimization to the min aggregation with support for multiple owning bucket ordinals.
Description check ✅ Passed The description comprehensively covers the changes including context from the related PR, implementation details, limitations, and known TODOs. It follows the template structure with Description, Related Issues, and Check List sections.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@asimmahmood1 asimmahmood1 added the v3.5.0 Issues and PRs related to version 3.4.0 label Dec 5, 2025
@asimmahmood1 asimmahmood1 added Search:Aggregations Performance This is for any performance related enhancements or bugs Search:Performance labels Dec 5, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Dec 5, 2025

Hello!
We have added a performance benchmark workflow that runs by adding a comment on the PR.
Please refer https://github.com/opensearch-project/OpenSearch/blob/main/PERFORMANCE_BENCHMARKS.md on how to run benchmarks on pull requests.

1 similar comment
@github-actions
Copy link
Contributor

github-actions bot commented Dec 5, 2025

Hello!
We have added a performance benchmark workflow that runs by adding a comment on the PR.
Please refer https://github.com/opensearch-project/OpenSearch/blob/main/PERFORMANCE_BENCHMARKS.md on how to run benchmarks on pull requests.

@asimmahmood1
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Dec 5, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (7)
server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (2)

235-246: Consider splitting conditions for readability.

The early return on line 236 combines two null checks on one line. For consistency with the style on lines 238 and 240, consider splitting:

-    if (skipper == null || singleton == null) return false;
+    if (skipper == null || singleton == null) {
+        return false;
+    }

222-229: Consider adding @FunctionalInterface annotation.

The IncreaseRoundingIfNeeded interface is designed as a functional interface. Adding the annotation would provide compile-time verification and document intent:

+    @FunctionalInterface
     public interface IncreaseRoundingIfNeeded {
         void accept(long owningBucket, long rounded);
     }
server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (3)

35-35: Unused import: Lucene90DocValuesFormat

This import is not used anywhere in the file and should be removed.

-import org.apache.lucene.codecs.lucene90.Lucene90DocValuesFormat;

1017-1029: Inconsistent indentation in category "c" loop.

The loop for category "c" has extra indentation compared to categories "a" and "b".

             // Category "c": values 100, 200 -> min should be 100
-                for (int i = 0; i < 4096 * 2; i++) {
-                    Document doc7 = new Document();
-                    doc7.add(new StringField(categoryField, "c", Field.Store.NO));
-                    doc7.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
-                    doc7.add(SortedNumericDocValuesField.indexedField("value", 100));
-                    iw.addDocument(doc7);
-
-                    Document doc8 = new Document();
-                    doc8.add(new StringField(categoryField, "c", Field.Store.NO));
-                    doc8.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
-                    doc8.add(SortedNumericDocValuesField.indexedField("value", 200));
-                    iw.addDocument(doc8);
-                }
+            for (int i = 0; i < 4096 * 2; i++) {
+                Document doc7 = new Document();
+                doc7.add(new StringField(categoryField, "c", Field.Store.NO));
+                doc7.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
+                doc7.add(SortedNumericDocValuesField.indexedField("value", 100));
+                iw.addDocument(doc7);
+
+                Document doc8 = new Document();
+                doc8.add(new StringField(categoryField, "c", Field.Store.NO));
+                doc8.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
+                doc8.add(SortedNumericDocValuesField.indexedField("value", 200));
+                iw.addDocument(doc8);
+            }

976-1029: Consider reducing dataset size or extracting helper for document creation.

The test creates 8192 documents per category (24,576 total). While this ensures skiplist structures are populated, consider:

  1. Adding a comment explaining why this size is needed (e.g., to trigger skiplist block creation)
  2. Extracting document creation into a helper to reduce code duplication
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (2)

336-337: Float conversion may be incorrect for integer types.

When isFloat is false, the code treats the value as a raw long. However, for integer types (INT, SHORT, BYTE), the skiplist stores sortable longs, which for these types should just be the value itself. The logic appears correct, but:

  1. For DOUBLE/FLOAT: NumericUtils.sortableLongToDouble is correct
  2. For LONG: Raw long value is correct
  3. For INT/SHORT/BYTE: Should also work since they're stored as longs

However, the variable name sortableLongToDouble suggests float conversion, but you're assigning to a double even for integer path. This works due to implicit conversion but could be clearer.

Consider renaming for clarity:

-                double skipperMin = isFloat ? NumericUtils.sortableLongToDouble(sortableLong) : sortableLong;
+                double skipperMin = isFloat ? NumericUtils.sortableLongToDouble(sortableLong) : (double) sortableLong;

256-294: MinSkiplistLeafCollector stores references that can become stale.

The collector stores references to mins and skipUpTo arrays, but these are reassigned via minAgg.growMins() and minAgg.growSkipUpTo() during collection. The code correctly re-fetches these references in advanceSkipper and collect, but the instance fields may hold stale references between calls.

Consider either:

  1. Not storing these as instance fields and always fetching from minAgg
  2. Adding a comment explaining the refresh pattern

The current implementation works correctly due to the refresh logic, but the pattern is subtle.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0449ce8 and e4d5427.

📒 Files selected for processing (9)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (7 hunks)
  • server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java (16 hunks)
  • server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java (1 hunks)
  • server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (8 hunks)
  • server/src/main/java/org/opensearch/search/profile/aggregation/ProfilingLeafBucketCollector.java (2 hunks)
  • server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java (3 hunks)
  • server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregatorTests.java (4 hunks)
  • server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (1)
server/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java (1)
  • NumberFieldMapper (102-2185)
server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (2)
server/src/main/java/org/opensearch/search/aggregations/AggregatorBase.java (1)
  • AggregatorBase (60-365)
server/src/main/java/org/opensearch/search/aggregations/LeafBucketCollector.java (1)
  • LeafBucketCollector (51-179)
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (1)
server/src/main/java/org/opensearch/search/aggregations/LeafBucketCollector.java (1)
  • LeafBucketCollector (51-179)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: gradle-check
🔇 Additional comments (24)
server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java (2)

284-298: Excellent test documentation!

The detailed documentation block clearly explains the test's purpose, the specific code paths being exercised, and the test setup requirements. This significantly improves maintainability and helps future developers understand the skiplist verification logic.


341-341: Great improvement to assertion clarity!

Adding descriptive failure messages to assertNotNull assertions is a testing best practice. These messages clearly indicate what should be present and link the assertion to the verification of SubAggRangeCollector.collect() execution, making test failures much easier to diagnose.

Also applies to: 349-349, 358-358

server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (2)

41-48: Supplier-based rounding design enables dynamic rounding changes.

The introduction of LongFunction<Rounding.Prepared> and Supplier<LongKeyedBucketOrds> allows AutoDateHistogramAggregator to dynamically change rounding during collection. This design correctly decouples the skiplist collector from the specific rounding instance.


150-181: Rounding change detection relies on reference equality.

The detection at line 156 using currentRounding != lastPreparedRounding depends on AutoDateHistogramAggregator creating new Rounding.Prepared instances when rounding changes. This is documented but creates a subtle contract between the two classes.

Consider adding a unit test that specifically validates this behavior to prevent future regressions if the rounding instance reuse pattern changes.

server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java (6)

166-168: fieldName extraction handles null valuesSource correctly.

The ternary check on valuesSource type correctly handles the case where valuesSource is not FieldData, setting fieldName to null. This null is then checked before calling getDocValuesSkipper.


278-299: Skipper retrieval centralized in base getLeafCollector.

Good refactoring to handle DocValuesSkipper retrieval once in the base class method and pass it to subclass implementations. The delegation pattern through iteratingCollector properly routes all collection methods.


425-442: FromSingle skiplist collector correctly integrated.

The skiplist collector integration is well-structured. Note that line 440 ignores owningBucket since FromSingle always operates on bucket 0, which is consistent with the assertion at line 448.


706-737: FromMany skiplist collector handles per-bucket rounding correctly.

The implementation properly uses roundingIndexFor(owningBucketOrd) to retrieve the correct prepared rounding for each owning bucket. The explanatory comment (lines 712-720) clearly documents the current limitation and the special case where skiplist can be used.


491-545: increaseRoundingIfNeeded method well-documented.

The extensive comments (lines 507-527) clearly explain the two heuristics for determining when to increase rounding granularity. This documentation helps maintainability of the algorithm.


787-838: FromMany increaseRoundingIfNeeded handles per-bucket state.

The method correctly manages per-bucket min/max tracking by growing the arrays and initializing new slots appropriately (Long.MAX_VALUE for mins, Long.MIN_VALUE for maxes at lines 799 and 804).

server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java (1)

235-238: Skiplist eligibility check centralized to HistogramSkiplistLeafCollector.

Good refactoring to delegate the skiplist eligibility decision to HistogramSkiplistLeafCollector.canUseSkiplist. This centralizes the logic and ensures consistent behavior across DateHistogramAggregator and AutoDateHistogramAggregator.

server/src/main/java/org/opensearch/search/profile/aggregation/ProfilingLeafBucketCollector.java (2)

68-86: Profiling support added for new collection methods.

The new overrides correctly follow the existing timing pattern with try-finally to ensure the timer stops even if an exception occurs. This ensures skiplist-enabled collection paths are properly profiled.


35-37: Imports support the new method overrides.

The DocIdStream import is required for the new collect(DocIdStream, long) method signature. The ExperimentalApi annotation is inherited from the base class method.

server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (1)

849-961: Property test looks comprehensive for skiplist equivalence validation.

The test properly validates that skiplist and standard collectors produce identical results across randomized scenarios including negative values and missing values. The 100 iterations provide good coverage.

A few observations:

  • The actualMin calculation on line 879 may not match either result when all filter values are "b" (no documents match filter "a"), but the assertion compares standardResult vs skiplistResult which is correct.
  • Consider reducing iterations in CI if test runtime becomes a concern.
server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregatorTests.java (6)

98-98: Renaming DATE_FIELD to @timestamp is intentional for skiplist enablement.

The change aligns the test field name with the skiplist-enabled path in the histogram implementation. This is a reasonable choice for testing the optimization.


984-1029: Clean overload pattern for skiplist-enabled testing.

The new overloads properly propagate the enableSkiplist flag while maintaining backward compatibility with existing tests. The default behavior (skiplist disabled) is preserved.


1035-1052: LGTM - Skiplist-enabled indexing logic is correct.

The conditional use of SortedNumericDocValuesField.indexedField() vs standard SortedNumericDocValuesField correctly enables/disables skiplist-based doc values.


1083-1118: Test correctly validates skiplist collector behavior during rounding changes.

The test creates a dataset that spans hourly to monthly data, which triggers rounding increases. The assertions properly verify:

  • Aggregation completes successfully
  • Bucket count respects numBuckets limit
  • Total doc count is preserved (24 + 30 = 54)
  • Buckets are in ascending order

1127-1167: Test correctly validates sub-aggregation behavior during rounding changes.

The test verifies that stats sub-aggregations remain valid after rounding changes. The doc count assertion (12 * 3 = 36) is correct.


1176-1215: Test correctly validates bucket merging after rounding changes.

The expected doc counts are correct:

  • 5 days × 24 hours = 120 hourly docs
  • 5 years × 4 quarters = 20 quarterly docs
  • Total = 140 docs

The expected bucket distribution (124, 4, 4, 4, 4) correctly accounts for all 120 hourly docs plus 4 quarterly docs falling into the 2020 bucket.

server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (4)

307-323: Corner case handling when doc is between skip intervals may be incorrect.

The logic at lines 315-322 handles the case where skipper.minDocID(0) > doc, but the encoding -(skipper.minDocID(0) - 1) - 1 seems unnecessarily complex. When minDocID(0) > doc, we're before the skipper's range, so we should simply not skip.

Also, line 308 checks doc > skipper.maxDocID(0) but advanceSkipper is called when doc >= upToInclusive (line 365). This asymmetry might cause issues at boundaries.

Please verify the boundary conditions are correct by testing edge cases where:

  1. doc equals maxDocID(0) exactly
  2. doc is just before minDocID(0)
  3. doc is between skip intervals

369-375: Sub-collector is invoked before skip check - verify this is intentional.

The sub-collector is called (line 370) before checking if we can skip (line 373). This means sub-aggregations process every document regardless of whether the min aggregator can skip it.

If this is intentional (sub-aggs need all docs even when min can skip), add a comment. If not, the sub-collector call should be moved after the skip check, or the skip should only apply to the min value update.

Is it correct that sub-aggregations see all documents even when the min aggregator can skip them? This may be the intended behavior for correctness, but please confirm.


230-247: Helper methods for array growth are well implemented.

The growMins and growSkipUpTo methods correctly handle lazy growth of the arrays with proper initialization of new elements. The pattern matches the existing codebase style.


416-419: Map.of() requires Java 9+; verify this aligns with the target OpenSearch version.

OpenSearch 2.x requires Java 11 minimum (compatible with 11, 17, and 21+), making Map.of() safe. However, if this change targets OpenSearch 1.x branches which support Java 8, this syntax is incompatible and should be replaced with Collections.singletonMap() or alternative approaches.

- Bump opensearch-protobufs dependency to 0.24.0 and update transport-grpc module compatibility ([#20059](https://github.com/opensearch-project/OpenSearch/pull/20059))

- Refactor the ShardStats, WarmerStats and IndexingPressureStats class to use the Builder pattern instead of constructors ([#19966](https://github.com/opensearch-project/OpenSearch/pull/19966))
- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Changelog entry references wrong PR and missing min aggregation entry.

The changelog entry references PR #20057, but this is PR #20175. Additionally, based on the PR title "Add skiplist optimization to min aggregation", there should be a separate changelog entry for the min aggregation skiplist optimization.

-- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))
+- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))
+- Add skiplist optimization to min aggregation ([#20175](https://github.com/opensearch-project/OpenSearch/pull/20175))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))
- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))
- Add skiplist optimization to min aggregation ([#20175](https://github.com/opensearch-project/OpenSearch/pull/20175))
🤖 Prompt for AI Agents
In CHANGELOG.md around line 74, the entry incorrectly references PR #20057 and
is missing the separate min aggregation entry; update the PR number to #20175
for the existing line and add a new changelog line noting "Add skiplist
optimization to min aggregation (PR #20175)" or similar, ensuring both
auto_date_histogram and min aggregation changes are represented with correct PR
reference and consistent formatting to match surrounding entries.

Comment on lines +183 to +186
this.skipUpTo = bigArrays.newLongArray(1, false);
this.skipUpTo.fill(0, this.skipUpTo.size(), -1);
return new MinSkiplistLeafCollector(values, skipper, mins, fieldIsFloat, MinAggregator.this, sub);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Memory leak: skipUpTo array is reallocated on every leaf without releasing previous allocation.

The skipUpTo array is created fresh for each leaf collector but never released until doClose(). If processing multiple leaf segments, each call to getLeafCollector() allocates a new LongArray, leaking the previous one.

Consider either:

  1. Allocating once in the constructor and resetting per-leaf
  2. Releasing the old array before allocating a new one
         // Use skiplist collector if conditions are met
         if (skipper != null) {
             skipListCollectorsUsed++;
+            if (this.skipUpTo != null) {
+                Releasables.close(this.skipUpTo);
+            }
             this.skipUpTo = bigArrays.newLongArray(1, false);
             this.skipUpTo.fill(0, this.skipUpTo.size(), -1);
             return new MinSkiplistLeafCollector(values, skipper, mins, fieldIsFloat, MinAggregator.this, sub);
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
this.skipUpTo = bigArrays.newLongArray(1, false);
this.skipUpTo.fill(0, this.skipUpTo.size(), -1);
return new MinSkiplistLeafCollector(values, skipper, mins, fieldIsFloat, MinAggregator.this, sub);
}
if (skipper != null) {
skipListCollectorsUsed++;
if (this.skipUpTo != null) {
Releasables.close(this.skipUpTo);
}
this.skipUpTo = bigArrays.newLongArray(1, false);
this.skipUpTo.fill(0, this.skipUpTo.size(), -1);
return new MinSkiplistLeafCollector(values, skipper, mins, fieldIsFloat, MinAggregator.this, sub);
}
🤖 Prompt for AI Agents
In
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java
around lines 183 to 186, skipUpTo is reallocated on every call to
getLeafCollector() causing a memory leak because previous LongArray instances
are not released; fix by either allocating skipUpTo once in the MinAggregator
constructor and reusing/resetting its values before returning each leaf
collector (call fill or set sentinel values) or, if per-leaf allocation is
required, release the previous LongArray by calling its close/release method (or
bigArrays.releaseRef/appropriate deallocation API) before assigning a new one,
and ensure doClose() still closes the final array.

Comment on lines +364 to +375
// TODO: check if it should be doc >= or doc >
if (doc >= upToInclusive) {
advanceSkipper(doc, owningBucketOrd);
skipUpToValue = skipUpTo.get(owningBucketOrd);
}
if (!isSubNoOp) {
sub.collect(doc, owningBucketOrd);
}
// If skipUpTo >= 0, we can skip this document
if (skipUpToValue >= 0) {
return;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

TODO comment and potential off-by-one error needs resolution.

The TODO at line 364 indicates uncertainty about whether the comparison should be >= or >. This needs to be resolved before merging.

Based on the logic:

  • upToInclusive represents the last doc ID that can be skipped (inclusive)
  • When doc >= upToInclusive, we need to re-evaluate the skip range

The current >= seems incorrect. If upToInclusive is the max doc that CAN be skipped, then:

  • doc == upToInclusive means doc is still in the skippable range
  • Only doc > upToInclusive should trigger re-evaluation
-            // TODO: check if it should be doc >= or doc >
-            if (doc >= upToInclusive) {
+            if (doc > upToInclusive) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// TODO: check if it should be doc >= or doc >
if (doc >= upToInclusive) {
advanceSkipper(doc, owningBucketOrd);
skipUpToValue = skipUpTo.get(owningBucketOrd);
}
if (!isSubNoOp) {
sub.collect(doc, owningBucketOrd);
}
// If skipUpTo >= 0, we can skip this document
if (skipUpToValue >= 0) {
return;
}
if (doc > upToInclusive) {
advanceSkipper(doc, owningBucketOrd);
skipUpToValue = skipUpTo.get(owningBucketOrd);
}
if (!isSubNoOp) {
sub.collect(doc, owningBucketOrd);
}
// If skipUpTo >= 0, we can skip this document
if (skipUpToValue >= 0) {
return;
}
🤖 Prompt for AI Agents
In
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java
around lines 364 to 375, the comparison using `doc >= upToInclusive` is
incorrect given `upToInclusive` is the last doc ID that may be skipped; change
the condition to `if (doc > upToInclusive) { advanceSkipper(doc,
owningBucketOrd); skipUpToValue = skipUpTo.get(owningBucketOrd); }`, remove the
TODO, and keep the rest of the logic (sub.collect and the subsequent
skipUpToValue check) unchanged so that documents equal to upToInclusive are
still considered skippable and only docs greater than it trigger re-evaluation.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 5, 2025

❌ Gradle check result for e4d5427: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@asimmahmood1
Copy link
Contributor Author

big5 validation

 curl -XGET 'http://localhost:9200/big5/_search' -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": {
     "bool": {
      "filter": [
        {
          "term": {
            "process.name": "systemd"
          }
        }
      ]
    }
  },
  "aggs": {
    "tmax": {
      "range": {
        "field": "metrics.size",
        "ranges": [
          {
            "to": -10
          },
          {
            "from": -10,
            "to": 10
          },
          {
            "from": 10,
            "to": 100
          },
          {
            "from": 100,
            "to": 1000
          },
          {
            "from": 1000,
            "to": 2000
          },
          {
            "from": 2000
          }
        ]
      },
      "aggs": {
        "by_hour": {
          "date_histogram": {
            "field": "@timestamp",
            "calendar_interval": "day"
          },
          "aggs": {
            "tmin": {
              "min": {
                "field": "metrics.tmin"
              }
            }
          }
        }
      }
    }
  },"profile":"false"
}'
diff 18_range_date_histo_with_min_and_filter_baseline.sh 18_range_date_histo_with_min_and_filter_candidate.sh
2c2
<   "took": 378,
---
>   "took": 333,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance This is for any performance related enhancements or bugs Search:Aggregations Search:Performance v3.5.0 Issues and PRs related to version 3.4.0

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

1 participant