Add skiplist optimization to min aggregation, with multiple owningBucketOrd #20175

asimmahmood1 · 2025-12-05T18:48:58Z

Description

This is a follow up to #20057

Compared to the date and auto date histogram, this version handles multiple owning bucket ord correctly but keep track of upToInclusive per bucket. We can go back and apply the same logic to date and auto date.

Same logic can be applied to max agg, currently max is written as complete copy of min, will try to see if it can be refactored.

Currently skiplist only keeps track of long values, so for e.g. double type, we need to convert it to sortable long. Given currently logic, it'll only handle basic numeric types but not half-float.

This also relies on skiplist mapping to be explicitly enabled, unlike @timestamp field. So that is something can be part of 3.5 to enable skiplist by default.

TODO:

add test for date
skip logic for unsupported numeric types like half float
validate with bigger datasets

Related Issues

Part of #18882
Part of #19384

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

Release Notes

New Features
- Added skiplist optimization for auto_date_histogram aggregation to improve date histogram query performance and reduce processing time significantly.
- Added skiplist optimization for min aggregation to enhance computation efficiency for minimum value aggregations across large datasets.
Tests
- Expanded test coverage for skiplist collector behavior during aggregation rounding changes, sub-aggregation scenarios, and bucket merging operations.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

…by filter rewrite * TODO: * add more test cases, currently does not assert that skiplist was used * add auto date benchmark to http_logs * verify existing benchmarks have no issue Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

Signed-off-by: Asim Mahmood <asim.seng@gmail.com> # Conflicts: # server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

coderabbitai · 2025-12-05T18:49:05Z

Walkthrough

This PR adds skiplist optimization to date histogram and min aggregations. Changes include dynamic rounding support in HistogramSkiplistLeafCollector via supplier-based accessors, integration of DocValuesSkipper into AutoDateHistogramAggregator and MinAggregator leaf collectors, refactored skiplist eligibility checks, profiling updates, and comprehensive test coverage for rounding change scenarios.

Changes

Cohort / File(s)	Summary
Core Skiplist Leaf Collector `server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java`	Introduces dynamic rounding via supplier-based accessors (`preparedRoundingSupplier`, `bucketOrdsSupplier`); adds runtime rounding change detection with cache invalidation; new public interface `IncreaseRoundingIfNeeded` for rounding callbacks; new public static `canUseSkiplist()` method and constructor overload for supplier-based behavior.
Date Histogram Aggregators `server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java`, `server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java`	AutoDateHistogramAggregator: wires DocValuesSkipper into leaf collection paths, updates `getLeafCollector` signature to accept skipper, conditionally uses HistogramSkiplistLeafCollector, adds per-collector counters, and implements rounding advancement logic. DateHistogramAggregator: delegates skiplist eligibility check to `HistogramSkiplistLeafCollector.canUseSkiplist()`.
Min Aggregation Skiplist `server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java`	Adds DocValuesSkipper-based optimization with new fields (`fieldName`, `fieldIsFloat`, `skipUpTo`, counters), internal `MinSkiplistLeafCollector` class for skip range advancement, helper methods (`growMins`, `growSkipUpTo`), debug info collection, and resource cleanup.
Profiling `server/src/main/java/org/opensearch/search/profile/aggregation/ProfilingLeafBucketCollector.java`	Adds timing instrumentation for `collect(DocIdStream stream, long owningBucketOrd)` and `collectRange(int min, int max)` methods.
Tests `server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregatorTests.java`, `server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java`, `server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java`	AutoDateHistogramAggregatorTests: updates DATE_FIELD constant, adds skiplist-enabled test helpers and three new tests for rounding changes. MinAggregatorTests: adds two new property-style tests for skiplist result equivalence and terms aggregation. FilterRewriteSubAggTests: adds descriptive documentation and explicit assertion messages.
Documentation `CHANGELOG.md`	Updates Unreleased 3.x "Added" section to document skiplist optimization for auto_date_histogram aggregation.

Sequence Diagram(s)

sequenceDiagram
    participant Agg as AutoDateHistogramAggregator
    participant HC as getLeafCollector
    participant SKL as HistogramSkiplistLeafCollector
    participant Skipper as DocValuesSkipper
    participant Rounding as Rounding.Prepared
    
    Agg->>HC: getLeafCollector(values, skipper, sub)
    HC->>HC: Check if skiplist can be used
    alt Skiplist Available
        HC->>SKL: new HistogramSkiplistLeafCollector<br/>(with suppliers)
        SKL->>Skipper: advance(doc)
        Skipper-->>SKL: skip range
        SKL->>Rounding: get current rounding<br/>(via supplier)
        alt Rounding Changed
            Rounding-->>SKL: new rounding level
            SKL->>SKL: invalidate cache
            SKL->>HC: IncreaseRoundingIfNeeded<br/>.increaseRounding()
            HC->>HC: handle rounding update
        end
        SKL->>SKL: bucket document
        SKL-->>HC: return
    else Fallback
        HC->>HC: use standard collector
    end
    HC-->>Agg: LeafBucketCollector

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

HistogramSkiplistLeafCollector.java: Dense supplier-based logic with dynamic rounding detection and cache invalidation; requires careful review of state transitions
AutoDateHistogramAggregator.java: Significant method signature changes (getLeafCollector now includes skipper parameter) affecting both FromSingle and FromMany; complex rounding advancement logic and conditional skiplist collector instantiation
MinAggregator.java: New inner class MinSkiplistLeafCollector with skip range advancement; multiple new fields and resource management paths (growMins, growSkipUpTo, cleanup in doClose/doReset)
Test coverage: Three new AutoDateHistogramAggregatorTests and two MinAggregatorTests with complex rounding change scenarios and sub-aggregation interactions
Potential edge cases: Rounding changes during collection, bucket merging after rounding increase, interaction with sub-aggregations

Possibly related PRs

PR #19573: Modifies skiplist-based histogram collection paths with HistogramSkiplistLeafCollector and canUseSkiplist logic, directly related to this PR's core skiplist infrastructure.

Suggested labels

enhancement, Search

Suggested reviewers

jainankitk
msfroh
mch2
andrross
reta
saratvemulapalli
kotwanikunal

Poem

🐰 Through histograms we skip with glee,
Auto-dating by rounding spree,
With suppliers and caches bright,
Skiplist paths take flight!
Min values merge without a fight,
OpenSearch optimized just right! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.14% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: adding skiplist optimization to the min aggregation with support for multiple owning bucket ordinals.
Description check	✅ Passed	The description comprehensively covers the changes including context from the related PR, implementation details, limitations, and known TODOs. It follows the template structure with Description, Related Issues, and Check List sections.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-12-05T18:50:14Z

Hello!
We have added a performance benchmark workflow that runs by adding a comment on the PR.
Please refer https://github.com/opensearch-project/OpenSearch/blob/main/PERFORMANCE_BENCHMARKS.md on how to run benchmarks on pull requests.

github-actions · 2025-12-05T18:50:15Z

Hello!
We have added a performance benchmark workflow that runs by adding a comment on the PR.
Please refer https://github.com/opensearch-project/OpenSearch/blob/main/PERFORMANCE_BENCHMARKS.md on how to run benchmarks on pull requests.

asimmahmood1 · 2025-12-05T18:51:49Z

@coderabbitai review

coderabbitai · 2025-12-05T18:51:55Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (7)

server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (2)
235-246: Consider splitting conditions for readability.

The early return on line 236 combines two null checks on one line. For consistency with the style on lines 238 and 240, consider splitting:
-    if (skipper == null || singleton == null) return false;
+    if (skipper == null || singleton == null) {
+        return false;
+    }
222-229: Consider adding @FunctionalInterface annotation.

The IncreaseRoundingIfNeeded interface is designed as a functional interface. Adding the annotation would provide compile-time verification and document intent:
+    @FunctionalInterface
     public interface IncreaseRoundingIfNeeded {
         void accept(long owningBucket, long rounded);
     }
server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (3)
35-35: Unused import: Lucene90DocValuesFormat

This import is not used anywhere in the file and should be removed.
-import org.apache.lucene.codecs.lucene90.Lucene90DocValuesFormat;
1017-1029: Inconsistent indentation in category "c" loop.

The loop for category "c" has extra indentation compared to categories "a" and "b".
             // Category "c": values 100, 200 -> min should be 100
-                for (int i = 0; i < 4096 * 2; i++) {
-                    Document doc7 = new Document();
-                    doc7.add(new StringField(categoryField, "c", Field.Store.NO));
-                    doc7.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
-                    doc7.add(SortedNumericDocValuesField.indexedField("value", 100));
-                    iw.addDocument(doc7);
-
-                    Document doc8 = new Document();
-                    doc8.add(new StringField(categoryField, "c", Field.Store.NO));
-                    doc8.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
-                    doc8.add(SortedNumericDocValuesField.indexedField("value", 200));
-                    iw.addDocument(doc8);
-                }
+            for (int i = 0; i < 4096 * 2; i++) {
+                Document doc7 = new Document();
+                doc7.add(new StringField(categoryField, "c", Field.Store.NO));
+                doc7.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
+                doc7.add(SortedNumericDocValuesField.indexedField("value", 100));
+                iw.addDocument(doc7);
+
+                Document doc8 = new Document();
+                doc8.add(new StringField(categoryField, "c", Field.Store.NO));
+                doc8.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
+                doc8.add(SortedNumericDocValuesField.indexedField("value", 200));
+                iw.addDocument(doc8);
+            }
976-1029: Consider reducing dataset size or extracting helper for document creation.

The test creates 8192 documents per category (24,576 total). While this ensures skiplist structures are populated, consider:

Adding a comment explaining why this size is needed (e.g., to trigger skiplist block creation)

Extracting document creation into a helper to reduce code duplication
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (2)
336-337: Float conversion may be incorrect for integer types.

When isFloat is false, the code treats the value as a raw long. However, for integer types (INT, SHORT, BYTE), the skiplist stores sortable longs, which for these types should just be the value itself. The logic appears correct, but:

For DOUBLE/FLOAT: NumericUtils.sortableLongToDouble is correct

For LONG: Raw long value is correct

For INT/SHORT/BYTE: Should also work since they're stored as longs

However, the variable name sortableLongToDouble suggests float conversion, but you're assigning to a double even for integer path. This works due to implicit conversion but could be clearer.

Consider renaming for clarity:
-                double skipperMin = isFloat ? NumericUtils.sortableLongToDouble(sortableLong) : sortableLong;
+                double skipperMin = isFloat ? NumericUtils.sortableLongToDouble(sortableLong) : (double) sortableLong;
256-294: MinSkiplistLeafCollector stores references that can become stale.

The collector stores references to mins and skipUpTo arrays, but these are reassigned via minAgg.growMins() and minAgg.growSkipUpTo() during collection. The code correctly re-fetches these references in advanceSkipper and collect, but the instance fields may hold stale references between calls.

Consider either:

Not storing these as instance fields and always fetching from minAgg

Adding a comment explaining the refresh pattern

The current implementation works correctly due to the refresh logic, but the pattern is subtle.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0449ce8 and e4d5427.

📒 Files selected for processing (9)

CHANGELOG.md (1 hunks)
server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (7 hunks)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java (16 hunks)
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java (1 hunks)
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (8 hunks)
server/src/main/java/org/opensearch/search/profile/aggregation/ProfilingLeafBucketCollector.java (2 hunks)
server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java (3 hunks)
server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregatorTests.java (4 hunks)
server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (1)

server/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java (1)

NumberFieldMapper (102-2185)

server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (2)

server/src/main/java/org/opensearch/search/aggregations/AggregatorBase.java (1)

AggregatorBase (60-365)

server/src/main/java/org/opensearch/search/aggregations/LeafBucketCollector.java (1)

LeafBucketCollector (51-179)

server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (1)

server/src/main/java/org/opensearch/search/aggregations/LeafBucketCollector.java (1)

LeafBucketCollector (51-179)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: gradle-check

🔇 Additional comments (24)

server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java (2)

284-298: Excellent test documentation!

The detailed documentation block clearly explains the test's purpose, the specific code paths being exercised, and the test setup requirements. This significantly improves maintainability and helps future developers understand the skiplist verification logic.

341-341: Great improvement to assertion clarity!

Adding descriptive failure messages to assertNotNull assertions is a testing best practice. These messages clearly indicate what should be present and link the assertion to the verification of SubAggRangeCollector.collect() execution, making test failures much easier to diagnose.

Also applies to: 349-349, 358-358

server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (2)

41-48: Supplier-based rounding design enables dynamic rounding changes.

The introduction of LongFunction<Rounding.Prepared> and Supplier<LongKeyedBucketOrds> allows AutoDateHistogramAggregator to dynamically change rounding during collection. This design correctly decouples the skiplist collector from the specific rounding instance.

150-181: Rounding change detection relies on reference equality.

The detection at line 156 using currentRounding != lastPreparedRounding depends on AutoDateHistogramAggregator creating new Rounding.Prepared instances when rounding changes. This is documented but creates a subtle contract between the two classes.

Consider adding a unit test that specifically validates this behavior to prevent future regressions if the rounding instance reuse pattern changes.

server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java (6)

166-168: fieldName extraction handles null valuesSource correctly.

The ternary check on valuesSource type correctly handles the case where valuesSource is not FieldData, setting fieldName to null. This null is then checked before calling getDocValuesSkipper.

278-299: Skipper retrieval centralized in base getLeafCollector.

Good refactoring to handle DocValuesSkipper retrieval once in the base class method and pass it to subclass implementations. The delegation pattern through iteratingCollector properly routes all collection methods.

425-442: FromSingle skiplist collector correctly integrated.

The skiplist collector integration is well-structured. Note that line 440 ignores owningBucket since FromSingle always operates on bucket 0, which is consistent with the assertion at line 448.

706-737: FromMany skiplist collector handles per-bucket rounding correctly.

The implementation properly uses roundingIndexFor(owningBucketOrd) to retrieve the correct prepared rounding for each owning bucket. The explanatory comment (lines 712-720) clearly documents the current limitation and the special case where skiplist can be used.

491-545: increaseRoundingIfNeeded method well-documented.

The extensive comments (lines 507-527) clearly explain the two heuristics for determining when to increase rounding granularity. This documentation helps maintainability of the algorithm.

787-838: FromMany increaseRoundingIfNeeded handles per-bucket state.

The method correctly manages per-bucket min/max tracking by growing the arrays and initializing new slots appropriately (Long.MAX_VALUE for mins, Long.MIN_VALUE for maxes at lines 799 and 804).

server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java (1)

235-238: Skiplist eligibility check centralized to HistogramSkiplistLeafCollector.

Good refactoring to delegate the skiplist eligibility decision to HistogramSkiplistLeafCollector.canUseSkiplist. This centralizes the logic and ensures consistent behavior across DateHistogramAggregator and AutoDateHistogramAggregator.

server/src/main/java/org/opensearch/search/profile/aggregation/ProfilingLeafBucketCollector.java (2)

68-86: Profiling support added for new collection methods.

The new overrides correctly follow the existing timing pattern with try-finally to ensure the timer stops even if an exception occurs. This ensures skiplist-enabled collection paths are properly profiled.

35-37: Imports support the new method overrides.

The DocIdStream import is required for the new collect(DocIdStream, long) method signature. The ExperimentalApi annotation is inherited from the base class method.

server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (1)

849-961: Property test looks comprehensive for skiplist equivalence validation.

The test properly validates that skiplist and standard collectors produce identical results across randomized scenarios including negative values and missing values. The 100 iterations provide good coverage.

A few observations:

The actualMin calculation on line 879 may not match either result when all filter values are "b" (no documents match filter "a"), but the assertion compares standardResult vs skiplistResult which is correct.

Consider reducing iterations in CI if test runtime becomes a concern.

server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregatorTests.java (6)

98-98: Renaming DATE_FIELD to @timestamp is intentional for skiplist enablement.

The change aligns the test field name with the skiplist-enabled path in the histogram implementation. This is a reasonable choice for testing the optimization.

984-1029: Clean overload pattern for skiplist-enabled testing.

The new overloads properly propagate the enableSkiplist flag while maintaining backward compatibility with existing tests. The default behavior (skiplist disabled) is preserved.

1035-1052: LGTM - Skiplist-enabled indexing logic is correct.

The conditional use of SortedNumericDocValuesField.indexedField() vs standard SortedNumericDocValuesField correctly enables/disables skiplist-based doc values.

1083-1118: Test correctly validates skiplist collector behavior during rounding changes.

The test creates a dataset that spans hourly to monthly data, which triggers rounding increases. The assertions properly verify:

Aggregation completes successfully

Bucket count respects numBuckets limit

Total doc count is preserved (24 + 30 = 54)

Buckets are in ascending order

1127-1167: Test correctly validates sub-aggregation behavior during rounding changes.

The test verifies that stats sub-aggregations remain valid after rounding changes. The doc count assertion (12 * 3 = 36) is correct.

1176-1215: Test correctly validates bucket merging after rounding changes.

The expected doc counts are correct:

5 days × 24 hours = 120 hourly docs

5 years × 4 quarters = 20 quarterly docs

Total = 140 docs

The expected bucket distribution (124, 4, 4, 4, 4) correctly accounts for all 120 hourly docs plus 4 quarterly docs falling into the 2020 bucket.

server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (4)

307-323: Corner case handling when doc is between skip intervals may be incorrect.

The logic at lines 315-322 handles the case where skipper.minDocID(0) > doc, but the encoding -(skipper.minDocID(0) - 1) - 1 seems unnecessarily complex. When minDocID(0) > doc, we're before the skipper's range, so we should simply not skip.

Also, line 308 checks doc > skipper.maxDocID(0) but advanceSkipper is called when doc >= upToInclusive (line 365). This asymmetry might cause issues at boundaries.

Please verify the boundary conditions are correct by testing edge cases where:

doc equals maxDocID(0) exactly

doc is just before minDocID(0)

doc is between skip intervals

369-375: Sub-collector is invoked before skip check - verify this is intentional.

The sub-collector is called (line 370) before checking if we can skip (line 373). This means sub-aggregations process every document regardless of whether the min aggregator can skip it.

If this is intentional (sub-aggs need all docs even when min can skip), add a comment. If not, the sub-collector call should be moved after the skip check, or the skip should only apply to the min value update.

Is it correct that sub-aggregations see all documents even when the min aggregator can skip them? This may be the intended behavior for correctness, but please confirm.

230-247: Helper methods for array growth are well implemented.

The growMins and growSkipUpTo methods correctly handle lazy growth of the arrays with proper initialization of new elements. The pattern matches the existing codebase style.

416-419: Map.of() requires Java 9+; verify this aligns with the target OpenSearch version.

OpenSearch 2.x requires Java 11 minimum (compatible with 11, 17, and 21+), making Map.of() safe. However, if this change targets OpenSearch 1.x branches which support Java 8, this syntax is incompatible and should be replaced with Collections.singletonMap() or alternative approaches.

coderabbitai · 2025-12-05T18:56:40Z

CHANGELOG.md

 - Bump opensearch-protobufs dependency to 0.24.0 and update transport-grpc module compatibility ([#20059](https://github.com/opensearch-project/OpenSearch/pull/20059))
-
 - Refactor the ShardStats, WarmerStats and IndexingPressureStats class to use the Builder pattern instead of constructors ([#19966](https://github.com/opensearch-project/OpenSearch/pull/19966))
+- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))


⚠️ Potential issue | 🟡 Minor

Changelog entry references wrong PR and missing min aggregation entry.

The changelog entry references PR #20057, but this is PR #20175. Additionally, based on the PR title "Add skiplist optimization to min aggregation", there should be a separate changelog entry for the min aggregation skiplist optimization.

-- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057)) +- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057)) +- Add skiplist optimization to min aggregation ([#20175](https://github.com/opensearch-project/OpenSearch/pull/20175))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))

- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))

- Add skiplist optimization to min aggregation ([#20175](https://github.com/opensearch-project/OpenSearch/pull/20175))

🤖 Prompt for AI Agents

In CHANGELOG.md around line 74, the entry incorrectly references PR #20057 and is missing the separate min aggregation entry; update the PR number to #20175 for the existing line and add a new changelog line noting "Add skiplist optimization to min aggregation (PR #20175)" or similar, ensuring both auto_date_histogram and min aggregation changes are represented with correct PR reference and consistent formatting to match surrounding entries.

coderabbitai · 2025-12-05T18:56:40Z

server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java

+            this.skipUpTo = bigArrays.newLongArray(1, false);
+            this.skipUpTo.fill(0, this.skipUpTo.size(), -1);
+            return new MinSkiplistLeafCollector(values, skipper, mins, fieldIsFloat, MinAggregator.this, sub);
+        }


⚠️ Potential issue | 🟠 Major

Memory leak: skipUpTo array is reallocated on every leaf without releasing previous allocation.

The skipUpTo array is created fresh for each leaf collector but never released until doClose(). If processing multiple leaf segments, each call to getLeafCollector() allocates a new LongArray, leaking the previous one.

Consider either:

Allocating once in the constructor and resetting per-leaf

Releasing the old array before allocating a new one

// Use skiplist collector if conditions are met if (skipper != null) { skipListCollectorsUsed++; + if (this.skipUpTo != null) { + Releasables.close(this.skipUpTo); + } this.skipUpTo = bigArrays.newLongArray(1, false); this.skipUpTo.fill(0, this.skipUpTo.size(), -1); return new MinSkiplistLeafCollector(values, skipper, mins, fieldIsFloat, MinAggregator.this, sub); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

this.skipUpTo = bigArrays.newLongArray(1, false);

this.skipUpTo.fill(0, this.skipUpTo.size(), -1);

return new MinSkiplistLeafCollector(values, skipper, mins, fieldIsFloat, MinAggregator.this, sub);

}

if (skipper != null) {

skipListCollectorsUsed++;

if (this.skipUpTo != null) {

Releasables.close(this.skipUpTo);

}

this.skipUpTo = bigArrays.newLongArray(1, false);

this.skipUpTo.fill(0, this.skipUpTo.size(), -1);

return new MinSkiplistLeafCollector(values, skipper, mins, fieldIsFloat, MinAggregator.this, sub);

}

🤖 Prompt for AI Agents

In server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java around lines 183 to 186, skipUpTo is reallocated on every call to getLeafCollector() causing a memory leak because previous LongArray instances are not released; fix by either allocating skipUpTo once in the MinAggregator constructor and reusing/resetting its values before returning each leaf collector (call fill or set sentinel values) or, if per-leaf allocation is required, release the previous LongArray by calling its close/release method (or bigArrays.releaseRef/appropriate deallocation API) before assigning a new one, and ensure doClose() still closes the final array.

coderabbitai · 2025-12-05T18:56:40Z

server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java

+            // TODO: check if it should be doc >= or doc >
+            if (doc >= upToInclusive) {
+                advanceSkipper(doc, owningBucketOrd);
+                skipUpToValue = skipUpTo.get(owningBucketOrd);
+            }
+            if (!isSubNoOp) {
+                sub.collect(doc, owningBucketOrd);
+            }
+            // If skipUpTo >= 0, we can skip this document
+            if (skipUpToValue >= 0) {
+                return;
+            }


⚠️ Potential issue | 🟠 Major

TODO comment and potential off-by-one error needs resolution.

The TODO at line 364 indicates uncertainty about whether the comparison should be >= or >. This needs to be resolved before merging.

Based on the logic:

upToInclusive represents the last doc ID that can be skipped (inclusive)

When doc >= upToInclusive, we need to re-evaluate the skip range

The current >= seems incorrect. If upToInclusive is the max doc that CAN be skipped, then:

doc == upToInclusive means doc is still in the skippable range

Only doc > upToInclusive should trigger re-evaluation

- // TODO: check if it should be doc >= or doc > - if (doc >= upToInclusive) { + if (doc > upToInclusive) {

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// TODO: check if it should be doc >= or doc >

if (doc >= upToInclusive) {

advanceSkipper(doc, owningBucketOrd);

skipUpToValue = skipUpTo.get(owningBucketOrd);

}

if (!isSubNoOp) {

sub.collect(doc, owningBucketOrd);

}

// If skipUpTo >= 0, we can skip this document

if (skipUpToValue >= 0) {

return;

}

if (doc > upToInclusive) {

advanceSkipper(doc, owningBucketOrd);

skipUpToValue = skipUpTo.get(owningBucketOrd);

}

if (!isSubNoOp) {

sub.collect(doc, owningBucketOrd);

}

// If skipUpTo >= 0, we can skip this document

if (skipUpToValue >= 0) {

return;

}

🤖 Prompt for AI Agents

In server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java around lines 364 to 375, the comparison using `doc >= upToInclusive` is incorrect given `upToInclusive` is the last doc ID that may be skipped; change the condition to `if (doc > upToInclusive) { advanceSkipper(doc, owningBucketOrd); skipUpToValue = skipUpTo.get(owningBucketOrd); }`, remove the TODO, and keep the rest of the logic (sub.collect and the subsequent skipUpToValue check) unchanged so that documents equal to upToInclusive are still considered skippable and only docs greater than it trigger re-evaluation.

github-actions · 2025-12-05T19:05:29Z

❌ Gradle check result for e4d5427: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

asimmahmood1 · 2025-12-05T19:13:46Z

big5 validation

 curl -XGET 'http://localhost:9200/big5/_search' -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": {
     "bool": {
      "filter": [
        {
          "term": {
            "process.name": "systemd"
          }
        }
      ]
    }
  },
  "aggs": {
    "tmax": {
      "range": {
        "field": "metrics.size",
        "ranges": [
          {
            "to": -10
          },
          {
            "from": -10,
            "to": 10
          },
          {
            "from": 10,
            "to": 100
          },
          {
            "from": 100,
            "to": 1000
          },
          {
            "from": 1000,
            "to": 2000
          },
          {
            "from": 2000
          }
        ]
      },
      "aggs": {
        "by_hour": {
          "date_histogram": {
            "field": "@timestamp",
            "calendar_interval": "day"
          },
          "aggs": {
            "tmin": {
              "min": {
                "field": "metrics.tmin"
              }
            }
          }
        }
      }
    }
  },"profile":"false"
}'

diff 18_range_date_histo_with_min_and_filter_baseline.sh 18_range_date_histo_with_min_and_filter_candidate.sh
2c2
<   "took": 378,
---
>   "took": 333,

asimmahmood1 added 10 commits December 1, 2025 23:13

Initial work to get auto date histogram to use skiplist

3079202

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

Javadoc

ac85f8c

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

Merge main

5f99c9d

Signed-off-by: Asim Mahmood <asim.seng@gmail.com> # Conflicts: # server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java

Change log

f6b756c

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

Merge remote-tracking branch 'upstream/main' into auto-date-skiplist

a7acf06

Minor changes

44bbcb3

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

Initial attempt to use skiplist optimization in min agg

8e0939a

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

Fixed the leaking big arrays

1041f8c

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

Simpler test case for min as sub agg

e4d5427

Signed-off-by: Asim Mahmood <asim.seng@gmail.com>

asimmahmood1 self-assigned this Dec 5, 2025

asimmahmood1 added the v3.5.0 Issues and PRs related to version 3.4.0 label Dec 5, 2025

asimmahmood1 added this to Performance Roadmap Dec 5, 2025

github-project-automation bot moved this to Todo in Performance Roadmap Dec 5, 2025

asimmahmood1 added Search:Aggregations Performance This is for any performance related enhancements or bugs Search:Performance labels Dec 5, 2025

asimmahmood1 mentioned this pull request Dec 5, 2025

Add skiplist optimization to auto_date_histogram aggregation #20057

Open

1 task

coderabbitai bot reviewed Dec 5, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add skiplist optimization to min aggregation, with multiple owningBucketOrd #20175

Add skiplist optimization to min aggregation, with multiple owningBucketOrd #20175

asimmahmood1 commented Dec 5, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 5, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Dec 5, 2025

Uh oh!

github-actions bot commented Dec 5, 2025

Uh oh!

asimmahmood1 commented Dec 5, 2025

Uh oh!

coderabbitai bot commented Dec 5, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 5, 2025

Uh oh!

coderabbitai bot Dec 5, 2025

Uh oh!

coderabbitai bot Dec 5, 2025

Uh oh!

github-actions bot commented Dec 5, 2025

Uh oh!

asimmahmood1 commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))
	- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))
	- Add skiplist optimization to min aggregation ([#20175](https://github.com/opensearch-project/OpenSearch/pull/20175))

Add skiplist optimization to min aggregation, with multiple owningBucketOrd #20175

Are you sure you want to change the base?

Add skiplist optimization to min aggregation, with multiple owningBucketOrd #20175

Conversation

asimmahmood1 commented Dec 5, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Check List

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Dec 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

github-actions bot commented Dec 5, 2025

Uh oh!

github-actions bot commented Dec 5, 2025

Uh oh!

asimmahmood1 commented Dec 5, 2025

Uh oh!

coderabbitai bot commented Dec 5, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Dec 5, 2025

Uh oh!

asimmahmood1 commented Dec 5, 2025

big5 validation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

asimmahmood1 commented Dec 5, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 5, 2025 •

edited

Loading