Add skiplist optimization to min aggregation, with multiple owningBucketOrd #20175
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
…by filter rewrite
* TODO:
* add more test cases, currently does not assert that skiplist was used
* add auto date benchmark to http_logs
* verify existing benchmarks have no issue
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
Signed-off-by: Asim Mahmood <asim.seng@gmail.com>
# Conflicts:
# server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java
Walkthrough
This PR adds skiplist optimization to date histogram and min aggregations. Changes include dynamic rounding support in HistogramSkiplistLeafCollector via supplier-based accessors, integration of DocValuesSkipper into AutoDateHistogramAggregator and MinAggregator leaf collectors, refactored skiplist eligibility checks, profiling updates, and comprehensive test coverage for rounding change scenarios.
Sequence Diagram(s)
sequenceDiagram
participant Agg as AutoDateHistogramAggregator
participant HC as getLeafCollector
participant SKL as HistogramSkiplistLeafCollector
participant Skipper as DocValuesSkipper
participant Rounding as Rounding.Prepared
Agg->>HC: getLeafCollector(values, skipper, sub)
HC->>HC: Check if skiplist can be used
alt Skiplist Available
HC->>SKL: new HistogramSkiplistLeafCollector<br/>(with suppliers)
SKL->>Skipper: advance(doc)
Skipper-->>SKL: skip range
SKL->>Rounding: get current rounding<br/>(via supplier)
alt Rounding Changed
Rounding-->>SKL: new rounding level
SKL->>SKL: invalidate cache
SKL->>HC: IncreaseRoundingIfNeeded<br/>.increaseRounding()
HC->>HC: handle rounding update
end
SKL->>SKL: bucket document
SKL-->>HC: return
else Fallback
HC->>HC: use standard collector
end
HC-->>Agg: LeafBucketCollector
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~70 minutes
@coderabbitai review
✅ Actions performed
Review triggered.
Actionable comments posted: 3
🧹 Nitpick comments (7)
server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (2)
235-246: Consider splitting conditions for readability.
The early return on line 236 combines two null checks on one line. For consistency with the style on lines 238 and 240, consider splitting:
- if (skipper == null || singleton == null) return false;
+ if (skipper == null || singleton == null) {
+     return false;
+ }
222-229: Consider adding @FunctionalInterface annotation.
The IncreaseRoundingIfNeeded interface is designed as a functional interface. Adding the annotation would provide compile-time verification and document intent:
+ @FunctionalInterface
  public interface IncreaseRoundingIfNeeded {
      void accept(long owningBucket, long rounded);
  }
server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (3)
35-35: Unused import: Lucene90DocValuesFormat
This import is not used anywhere in the file and should be removed.
-import org.apache.lucene.codecs.lucene90.Lucene90DocValuesFormat;
1017-1029: Inconsistent indentation in category "c" loop.
The loop for category "c" has extra indentation compared to categories "a" and "b".
  // Category "c": values 100, 200 -> min should be 100
-     for (int i = 0; i < 4096 * 2; i++) {
-         Document doc7 = new Document();
-         doc7.add(new StringField(categoryField, "c", Field.Store.NO));
-         doc7.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
-         doc7.add(SortedNumericDocValuesField.indexedField("value", 100));
-         iw.addDocument(doc7);
-
-         Document doc8 = new Document();
-         doc8.add(new StringField(categoryField, "c", Field.Store.NO));
-         doc8.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
-         doc8.add(SortedNumericDocValuesField.indexedField("value", 200));
-         iw.addDocument(doc8);
-     }
+ for (int i = 0; i < 4096 * 2; i++) {
+     Document doc7 = new Document();
+     doc7.add(new StringField(categoryField, "c", Field.Store.NO));
+     doc7.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
+     doc7.add(SortedNumericDocValuesField.indexedField("value", 100));
+     iw.addDocument(doc7);
+
+     Document doc8 = new Document();
+     doc8.add(new StringField(categoryField, "c", Field.Store.NO));
+     doc8.add(new SortedSetDocValuesField(categoryField, new BytesRef("c")));
+     doc8.add(SortedNumericDocValuesField.indexedField("value", 200));
+     iw.addDocument(doc8);
+ }
976-1029: Consider reducing dataset size or extracting helper for document creation.
The test creates 8192 documents per category (24,576 total). While this ensures skiplist structures are populated, consider:
- Adding a comment explaining why this size is needed (e.g., to trigger skiplist block creation)
- Extracting document creation into a helper to reduce code duplication
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (2)
336-337: Float conversion may be incorrect for integer types.
When isFloat is false, the code treats the value as a raw long. However, for integer types (INT, SHORT, BYTE), the skiplist stores sortable longs, which for these types should just be the value itself. The logic appears correct, but:
- For DOUBLE/FLOAT: NumericUtils.sortableLongToDouble is correct
- For LONG: Raw long value is correct
- For INT/SHORT/BYTE: Should also work since they're stored as longs
However, the variable name sortableLongToDouble suggests float conversion, but you're assigning to a double even for the integer path. This works due to implicit conversion but could be clearer.
Consider renaming for clarity:
- double skipperMin = isFloat ? NumericUtils.sortableLongToDouble(sortableLong) : sortableLong;
+ double skipperMin = isFloat ? NumericUtils.sortableLongToDouble(sortableLong) : (double) sortableLong;
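To make the conversion in this comment concrete, here is a minimal, self-contained sketch for a double-valued field. It does not call Lucene; instead it re-implements an order-preserving double-to-long encoding locally, on the assumption that this mirrors what NumericUtils.doubleToSortableLong and NumericUtils.sortableLongToDouble do. The class name SortableLongSketch and the helper decodeMin are hypothetical.

```java
// Sketch only: a local re-implementation of an order-preserving
// double <-> long encoding, assumed to mirror Lucene's NumericUtils.
final class SortableLongSketch {
    // Flip the non-sign bits of negative values so that signed long order
    // matches numeric double order. Applying it twice restores the input.
    static long sortableBits(long bits) {
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    static long doubleToSortableLong(double value) {
        return sortableBits(Double.doubleToLongBits(value));
    }

    static double sortableLongToDouble(long encoded) {
        return Double.longBitsToDouble(sortableBits(encoded));
    }

    // Hypothetical helper: decode a skipper minimum the way the review
    // suggests. Floating-point fields are decoded; integer fields are
    // widened with an explicit cast to make the intent visible.
    static double decodeMin(long sortableLong, boolean isFloat) {
        return isFloat ? sortableLongToDouble(sortableLong) : (double) sortableLong;
    }
}
```

The explicit cast changes nothing at runtime; it only documents that the integer path is a plain widening rather than a decode.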
256-294: MinSkiplistLeafCollector stores references that can become stale.
The collector stores references to the mins and skipUpTo arrays, but these are reassigned via minAgg.growMins() and minAgg.growSkipUpTo() during collection. The code correctly re-fetches these references in advanceSkipper and collect, but the instance fields may hold stale references between calls.
Consider either:
- Not storing these as instance fields and always fetching from minAgg
- Adding a comment explaining the refresh pattern
The current implementation works correctly due to the refresh logic, but the pattern is subtle.
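The refresh pattern described in this comment can be sketched in isolation. The types below are hypothetical stand-ins (a plain long[] instead of a BigArrays-backed array, and Owner/Collector instead of the aggregator and its leaf collector); the point is only that growing replaces the backing array, so a cached reference must be re-fetched before each write.

```java
import java.util.Arrays;

// Hypothetical stand-ins for the aggregator and its leaf collector;
// not the OpenSearch API.
final class StaleReferenceSketch {
    static final class Owner {
        long[] mins = new long[] { Long.MAX_VALUE };

        // Growing replaces the backing array, so earlier references go stale.
        long[] growMins(int size) {
            if (size > mins.length) {
                int oldLength = mins.length;
                mins = Arrays.copyOf(mins, size);
                Arrays.fill(mins, oldLength, size, Long.MAX_VALUE);
            }
            return mins;
        }
    }

    static final class Collector {
        private final Owner owner;
        private long[] mins; // cached reference; refreshed on every collect

        Collector(Owner owner) {
            this.owner = owner;
            this.mins = owner.mins;
        }

        void collect(int bucket, long value) {
            // Re-fetch so the write never lands in an old, discarded array.
            mins = owner.growMins(bucket + 1);
            mins[bucket] = Math.min(mins[bucket], value);
        }
    }
}
```

Dropping the cached field and always reading from the owner would remove the subtlety at the cost of one extra indirection per call.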
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- CHANGELOG.md (1 hunks)
- server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (7 hunks)
- server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java (16 hunks)
- server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java (1 hunks)
- server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (8 hunks)
- server/src/main/java/org/opensearch/search/profile/aggregation/ProfilingLeafBucketCollector.java (2 hunks)
- server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java (3 hunks)
- server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregatorTests.java (4 hunks)
- server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (1)
server/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java (1)
NumberFieldMapper(102-2185)
server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (2)
server/src/main/java/org/opensearch/search/aggregations/AggregatorBase.java (1)
AggregatorBase(60-365)
server/src/main/java/org/opensearch/search/aggregations/LeafBucketCollector.java (1)
LeafBucketCollector(51-179)
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (1)
server/src/main/java/org/opensearch/search/aggregations/LeafBucketCollector.java (1)
LeafBucketCollector(51-179)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (24)
server/src/test/java/org/opensearch/search/aggregations/bucket/filterrewrite/FilterRewriteSubAggTests.java (2)
284-298: Excellent test documentation!
The detailed documentation block clearly explains the test's purpose, the specific code paths being exercised, and the test setup requirements. This significantly improves maintainability and helps future developers understand the skiplist verification logic.
341-341: Great improvement to assertion clarity!
Adding descriptive failure messages to assertNotNull assertions is a testing best practice. These messages clearly indicate what should be present and link the assertion to the verification of SubAggRangeCollector.collect() execution, making test failures much easier to diagnose.
Also applies to: 349-349, 358-358
server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (2)
41-48: Supplier-based rounding design enables dynamic rounding changes.
The introduction of LongFunction<Rounding.Prepared> and Supplier<LongKeyedBucketOrds> allows AutoDateHistogramAggregator to dynamically change rounding during collection. This design correctly decouples the skiplist collector from the specific rounding instance.
150-181: Rounding change detection relies on reference equality.
The detection at line 156 using currentRounding != lastPreparedRounding depends on AutoDateHistogramAggregator creating new Rounding.Prepared instances when rounding changes. This is documented but creates a subtle contract between the two classes.
Consider adding a unit test that specifically validates this behavior to prevent future regressions if the rounding instance reuse pattern changes.
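As an illustration of the contract this comment describes, the sketch below detects a rounding change by identity rather than by equals(). Everything here is hypothetical (the Prepared class, its unitMillis field, and the invalidations counter); it is not the Rounding.Prepared API, only a small model of a collector that reads the current rounding through a supplier.

```java
import java.util.function.Supplier;

// Hypothetical model of supplier-based rounding with identity-based
// change detection; not the OpenSearch Rounding API.
final class RoundingChangeSketch {
    // A new instance is assumed to be created whenever the rounding changes.
    static final class Prepared {
        final long unitMillis;

        Prepared(long unitMillis) {
            this.unitMillis = unitMillis;
        }

        long round(long value) {
            return Math.floorDiv(value, unitMillis) * unitMillis;
        }
    }

    static final class Collector {
        private final Supplier<Prepared> rounding;
        private Prepared last;
        int invalidations; // how often cached state would have been dropped

        Collector(Supplier<Prepared> rounding) {
            this.rounding = rounding;
        }

        long bucketKey(long value) {
            Prepared current = rounding.get();
            if (current != last) { // identity check, not equals()
                last = current;
                invalidations++;
            }
            return current.round(value);
        }
    }
}
```

Under this model a producer that mutated and reused one Prepared instance would never trigger an invalidation, which is the kind of regression a dedicated unit test could guard against.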
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java (6)
166-168: fieldName extraction handles null valuesSource correctly.
The ternary check on valuesSource type correctly handles the case where valuesSource is not FieldData, setting fieldName to null. This null is then checked before calling getDocValuesSkipper.
278-299: Skipper retrieval centralized in base getLeafCollector.
Good refactoring to handle DocValuesSkipper retrieval once in the base class method and pass it to subclass implementations. The delegation pattern through iteratingCollector properly routes all collection methods.
425-442: FromSingle skiplist collector correctly integrated.
The skiplist collector integration is well-structured. Note that line 440 ignores owningBucket since FromSingle always operates on bucket 0, which is consistent with the assertion at line 448.
706-737: FromMany skiplist collector handles per-bucket rounding correctly.
The implementation properly uses roundingIndexFor(owningBucketOrd) to retrieve the correct prepared rounding for each owning bucket. The explanatory comment (lines 712-720) clearly documents the current limitation and the special case where skiplist can be used.
491-545: increaseRoundingIfNeeded method well-documented.
The extensive comments (lines 507-527) clearly explain the two heuristics for determining when to increase rounding granularity. This documentation helps maintainability of the algorithm.
787-838: FromMany increaseRoundingIfNeeded handles per-bucket state.
The method correctly manages per-bucket min/max tracking by growing the arrays and initializing new slots appropriately (Long.MAX_VALUE for mins, Long.MIN_VALUE for maxes at lines 799 and 804).
server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java (1)
235-238: Skiplist eligibility check centralized to HistogramSkiplistLeafCollector.
Good refactoring to delegate the skiplist eligibility decision to HistogramSkiplistLeafCollector.canUseSkiplist. This centralizes the logic and ensures consistent behavior across DateHistogramAggregator and AutoDateHistogramAggregator.
server/src/main/java/org/opensearch/search/profile/aggregation/ProfilingLeafBucketCollector.java (2)
68-86: Profiling support added for new collection methods.
The new overrides correctly follow the existing timing pattern with try-finally to ensure the timer stops even if an exception occurs. This ensures skiplist-enabled collection paths are properly profiled.
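For readers unfamiliar with the timing pattern mentioned here, a minimal sketch follows. The Timer and Collector types are hypothetical and much simpler than the OpenSearch profiling classes; the sketch only shows why try-finally keeps start and stop calls balanced when the delegate throws.

```java
// Hypothetical timer and collector types for illustration; these are not
// the OpenSearch profiling classes.
final class ProfilingSketch {
    static final class Timer {
        int starts;
        int stops;

        void start() {
            starts++;
        }

        void stop() {
            stops++;
        }
    }

    interface Collector {
        void collect(int doc, long bucket);
    }

    // Wraps a delegate so that every call is timed, including calls that throw.
    static Collector profiled(Collector delegate, Timer timer) {
        return (doc, bucket) -> {
            timer.start();
            try {
                delegate.collect(doc, bucket);
            } finally {
                timer.stop(); // runs on both normal and exceptional exit
            }
        };
    }
}
```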
35-37: Imports support the new method overrides.
The DocIdStream import is required for the new collect(DocIdStream, long) method signature. The ExperimentalApi annotation is inherited from the base class method.
server/src/test/java/org/opensearch/search/aggregations/metrics/MinAggregatorTests.java (1)
849-961: Property test looks comprehensive for skiplist equivalence validation.
The test properly validates that skiplist and standard collectors produce identical results across randomized scenarios including negative values and missing values. The 100 iterations provide good coverage.
A few observations:
- The actualMin calculation on line 879 may not match either result when all filter values are "b" (no documents match filter "a"), but the assertion compares standardResult vs skiplistResult which is correct.
- Consider reducing iterations in CI if test runtime becomes a concern.
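The shape of such an equivalence check can be sketched without Lucene. The code below is a toy model, not the PR's collector: per-block minimums stand in for skip metadata, a boolean array stands in for the query filter, and a block is skipped whenever its minimum cannot lower the running minimum. All names and the block size of 16 are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Toy model of min-with-skipping; not the Lucene DocValuesSkipper API.
final class MinEquivalenceSketch {
    // Per-block minimums, standing in for index-time skip metadata.
    static long[] blockMins(long[] values, int blockSize) {
        long[] mins = new long[(values.length + blockSize - 1) / blockSize];
        Arrays.fill(mins, Long.MAX_VALUE);
        for (int i = 0; i < values.length; i++) {
            mins[i / blockSize] = Math.min(mins[i / blockSize], values[i]);
        }
        return mins;
    }

    static long plainMin(long[] values, boolean[] matches) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < values.length; i++) {
            if (matches[i]) min = Math.min(min, values[i]);
        }
        return min;
    }

    static long skippingMin(long[] values, boolean[] matches, int blockSize) {
        long[] mins = blockMins(values, blockSize);
        long min = Long.MAX_VALUE;
        for (int b = 0; b < mins.length; b++) {
            if (mins[b] >= min) continue; // block cannot lower the running min
            int end = Math.min((b + 1) * blockSize, values.length);
            for (int i = b * blockSize; i < end; i++) {
                if (matches[i]) min = Math.min(min, values[i]);
            }
        }
        return min;
    }

    // Randomized comparison, including negative values and empty matches.
    static boolean agreesOnRandomData(long seed, int iterations) {
        Random random = new Random(seed);
        for (int iter = 0; iter < iterations; iter++) {
            int n = random.nextInt(200);
            long[] values = new long[n];
            boolean[] matches = new boolean[n];
            for (int i = 0; i < n; i++) {
                values[i] = random.nextInt(2001) - 1000;
                matches[i] = random.nextBoolean();
            }
            if (plainMin(values, matches) != skippingMin(values, matches, 16)) {
                return false;
            }
        }
        return true;
    }
}
```

Comparing the two implementations against each other, rather than against a separately computed expected value, avoids the empty-match ambiguity noted in the first observation.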
server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregatorTests.java (6)
98-98: Renaming DATE_FIELD to @timestamp is intentional for skiplist enablement.
The change aligns the test field name with the skiplist-enabled path in the histogram implementation. This is a reasonable choice for testing the optimization.
984-1029: Clean overload pattern for skiplist-enabled testing.
The new overloads properly propagate the enableSkiplist flag while maintaining backward compatibility with existing tests. The default behavior (skiplist disabled) is preserved.
1035-1052: LGTM - Skiplist-enabled indexing logic is correct.
The conditional use of SortedNumericDocValuesField.indexedField() vs standard SortedNumericDocValuesField correctly enables/disables skiplist-based doc values.
1083-1118: Test correctly validates skiplist collector behavior during rounding changes.
The test creates a dataset that spans hourly to monthly data, which triggers rounding increases. The assertions properly verify:
- Aggregation completes successfully
- Bucket count respects numBuckets limit
- Total doc count is preserved (24 + 30 = 54)
- Buckets are in ascending order
1127-1167: Test correctly validates sub-aggregation behavior during rounding changes.
The test verifies that stats sub-aggregations remain valid after rounding changes. The doc count assertion (12 * 3 = 36) is correct.
1176-1215: Test correctly validates bucket merging after rounding changes.
The expected doc counts are correct:
- 5 days × 24 hours = 120 hourly docs
- 5 years × 4 quarters = 20 quarterly docs
- Total = 140 docs
The expected bucket distribution (124, 4, 4, 4, 4) correctly accounts for all 120 hourly docs plus 4 quarterly docs falling into the 2020 bucket.
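The arithmetic behind the expected distribution can be checked with a few lines. This sketch assumes yearly buckets and that every hourly document lands in the first year, as the comment states; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class BucketCountSketch {
    // Expected doc count per yearly bucket, assuming all hourly docs fall
    // into the first year and each year holds one doc per quarter.
    static int[] expectedYearlyCounts(int days, int hoursPerDay, int years, int quartersPerYear) {
        int[] counts = new int[years];
        Arrays.fill(counts, quartersPerYear);
        counts[0] += days * hoursPerDay;
        return counts;
    }

    static int total(int[] counts) {
        return Arrays.stream(counts).sum();
    }
}
```

With 5 days of 24 hourly docs and 5 years of 4 quarterly docs, the first bucket holds 120 + 4 = 124 documents and the remaining four hold 4 each, for a total of 140.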
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (4)
307-323: Corner case handling when doc is between skip intervals may be incorrect.
The logic at lines 315-322 handles the case where skipper.minDocID(0) > doc, but the encoding -(skipper.minDocID(0) - 1) - 1 seems unnecessarily complex. When minDocID(0) > doc, we're before the skipper's range, so we should simply not skip.
Also, line 308 checks doc > skipper.maxDocID(0) but advanceSkipper is called when doc >= upToInclusive (line 365). This asymmetry might cause issues at boundaries.
Please verify the boundary conditions are correct by testing edge cases where:
- doc equals maxDocID(0) exactly
- doc is just before minDocID(0)
- doc is between skip intervals
369-375: Sub-collector is invoked before skip check - verify this is intentional.
The sub-collector is called (line 370) before checking if we can skip (line 373). This means sub-aggregations process every document regardless of whether the min aggregator can skip it.
If this is intentional (sub-aggs need all docs even when min can skip), add a comment. If not, the sub-collector call should be moved after the skip check, or the skip should only apply to the min value update.
Is it correct that sub-aggregations see all documents even when the min aggregator can skip them? This may be the intended behavior for correctness, but please confirm.
230-247: Helper methods for array growth are well implemented.
The growMins and growSkipUpTo methods correctly handle lazy growth of the arrays with proper initialization of new elements. The pattern matches the existing codebase style.
416-419: Map.of() requires Java 9+; verify this aligns with the target OpenSearch version.
OpenSearch 2.x requires Java 11 minimum (compatible with 11, 17, and 21+), making Map.of() safe. However, if this change targets OpenSearch 1.x branches which support Java 8, this syntax is incompatible and should be replaced with Collections.singletonMap() or alternative approaches.
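For reference, the two alternatives named in this comment behave the same for a single non-null entry. The sketch below uses only java.util; the wrapper class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;

final class SingleEntryMapSketch {
    // Java 9+ factory: unmodifiable, rejects null keys and values.
    static Map<String, Object> withMapOf(String key, Object value) {
        return Map.of(key, value);
    }

    // Java 8 compatible alternative: also unmodifiable, but tolerates nulls.
    static Map<String, Object> withSingletonMap(String key, Object value) {
        return Collections.singletonMap(key, value);
    }
}
```

Both results compare equal through Map.equals and both throw UnsupportedOperationException on put, so swapping one for the other should not change callers that only read the map.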
- Bump opensearch-protobufs dependency to 0.24.0 and update transport-grpc module compatibility ([#20059](https://github.com/opensearch-project/OpenSearch/pull/20059))

- Refactor the ShardStats, WarmerStats and IndexingPressureStats class to use the Builder pattern instead of constructors ([#19966](https://github.com/opensearch-project/OpenSearch/pull/19966))
- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))
Changelog entry references wrong PR and missing min aggregation entry.
The changelog entry references PR #20057, but this is PR #20175. Additionally, based on the PR title "Add skiplist optimization to min aggregation", there should be a separate changelog entry for the min aggregation skiplist optimization.
-- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))
+- Add skiplist optimization to auto_date_histogram aggregation ([#20057](https://github.com/opensearch-project/OpenSearch/pull/20057))
+- Add skiplist optimization to min aggregation ([#20175](https://github.com/opensearch-project/OpenSearch/pull/20175))
🤖 Prompt for AI Agents
In CHANGELOG.md around line 74, the entry incorrectly references PR #20057 and
is missing the separate min aggregation entry; update the PR number to #20175
for the existing line and add a new changelog line noting "Add skiplist
optimization to min aggregation (PR #20175)" or similar, ensuring both
auto_date_histogram and min aggregation changes are represented with correct PR
reference and consistent formatting to match surrounding entries.
this.skipUpTo = bigArrays.newLongArray(1, false);
this.skipUpTo.fill(0, this.skipUpTo.size(), -1);
return new MinSkiplistLeafCollector(values, skipper, mins, fieldIsFloat, MinAggregator.this, sub);
}
Memory leak: skipUpTo array is reallocated on every leaf without releasing previous allocation.
The skipUpTo array is created fresh for each leaf collector but never released until doClose(). If processing multiple leaf segments, each call to getLeafCollector() allocates a new LongArray, leaking the previous one.
Consider either:
- Allocating once in the constructor and resetting per-leaf
- Releasing the old array before allocating a new one
// Use skiplist collector if conditions are met
if (skipper != null) {
skipListCollectorsUsed++;
+ if (this.skipUpTo != null) {
+ Releasables.close(this.skipUpTo);
+ }
this.skipUpTo = bigArrays.newLongArray(1, false);
this.skipUpTo.fill(0, this.skipUpTo.size(), -1);
return new MinSkiplistLeafCollector(values, skipper, mins, fieldIsFloat, MinAggregator.this, sub);
}
🤖 Prompt for AI Agents
In
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java
around lines 183 to 186, skipUpTo is reallocated on every call to
getLeafCollector() causing a memory leak because previous LongArray instances
are not released; fix by either allocating skipUpTo once in the MinAggregator
constructor and reusing/resetting its values before returning each leaf
collector (call fill or set sentinel values) or, if per-leaf allocation is
required, release the previous LongArray by calling its close/release method (or
bigArrays.releaseRef/appropriate deallocation API) before assigning a new one,
and ensure doClose() still closes the final array.
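The "release before reallocating" option can be modelled with a small, self-contained sketch. The Allocator and TrackedArray types are hypothetical stand-ins for BigArrays and LongArray (they only count live allocations), and startLeaf stands in for the per-segment call to getLeafCollector().

```java
import java.util.Arrays;

// Hypothetical allocator and array types that count live allocations;
// stand-ins for BigArrays/LongArray, not the OpenSearch API.
final class ReleaseBeforeReallocSketch {
    static final class Allocator {
        int live; // arrays allocated but not yet closed

        TrackedArray newArray(int size) {
            live++;
            return new TrackedArray(this, size);
        }
    }

    static final class TrackedArray implements AutoCloseable {
        private final Allocator allocator;
        final long[] data;
        private boolean closed;

        TrackedArray(Allocator allocator, int size) {
            this.allocator = allocator;
            this.data = new long[size];
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                allocator.live--;
            }
        }
    }

    static final class Aggregator implements AutoCloseable {
        private final Allocator allocator;
        private TrackedArray skipUpTo;

        Aggregator(Allocator allocator) {
            this.allocator = allocator;
        }

        // Called once per segment: release the previous array first.
        void startLeaf() {
            if (skipUpTo != null) {
                skipUpTo.close();
            }
            skipUpTo = allocator.newArray(1);
            Arrays.fill(skipUpTo.data, -1L);
        }

        @Override
        public void close() {
            if (skipUpTo != null) {
                skipUpTo.close();
                skipUpTo = null;
            }
        }
    }
}
```

In this model at most one array is live at any time regardless of the number of segments, and closing the aggregator brings the count back to zero.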
// TODO: check if it should be doc >= or doc >
if (doc >= upToInclusive) {
advanceSkipper(doc, owningBucketOrd);
skipUpToValue = skipUpTo.get(owningBucketOrd);
}
if (!isSubNoOp) {
sub.collect(doc, owningBucketOrd);
}
// If skipUpTo >= 0, we can skip this document
if (skipUpToValue >= 0) {
return;
}
TODO comment and potential off-by-one error needs resolution.
The TODO at line 364 indicates uncertainty about whether the comparison should be >= or >. This needs to be resolved before merging.
Based on the logic:
- upToInclusive represents the last doc ID that can be skipped (inclusive)
- When doc >= upToInclusive, we need to re-evaluate the skip range
The current >= seems incorrect. If upToInclusive is the max doc that CAN be skipped, then:
- doc == upToInclusive means doc is still in the skippable range
- Only doc > upToInclusive should trigger re-evaluation
- // TODO: check if it should be doc >= or doc >
- if (doc >= upToInclusive) {
+ if (doc > upToInclusive) {
🤖 Prompt for AI Agents
In
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java
around lines 364 to 375, the comparison using `doc >= upToInclusive` is
incorrect given `upToInclusive` is the last doc ID that may be skipped; change
the condition to `if (doc > upToInclusive) { advanceSkipper(doc,
owningBucketOrd); skipUpToValue = skipUpTo.get(owningBucketOrd); }`, remove the
TODO, and keep the rest of the logic (sub.collect and the subsequent
skipUpToValue check) unchanged so that documents equal to upToInclusive are
still considered skippable and only docs greater than it trigger re-evaluation.
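The effect of the two comparisons can be seen in a small simulation. It assumes, as the comment does, that upToInclusive is the last doc ID covered by the current interval, and it uses fixed-size intervals purely for illustration; the real skipper intervals and the behavior of advanceSkipper are not modelled.

```java
final class SkipBoundarySketch {
    // Counts how often the skip range is re-evaluated while visiting docs
    // 0..maxDoc, with fixed-size intervals of blockSize docs.
    // strict == true models `doc > upToInclusive`, false models `doc >=`.
    static int countAdvances(int maxDoc, int blockSize, boolean strict) {
        int upToInclusive = -1; // nothing evaluated yet
        int advances = 0;
        for (int doc = 0; doc <= maxDoc; doc++) {
            boolean reevaluate = strict ? doc > upToInclusive : doc >= upToInclusive;
            if (reevaluate) {
                advances++;
                // Last doc of the interval containing `doc`.
                upToInclusive = (doc / blockSize) * blockSize + blockSize - 1;
            }
        }
        return advances;
    }
}
```

Under these assumptions the strict comparison re-evaluates once per interval, while >= re-evaluates a second time on the last doc of every interval. Whether that extra call is merely redundant or changes results depends on what advanceSkipper does at the boundary, which is why the edge-case tests requested earlier still matter.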
❌ Gradle check result for e4d5427: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
big5 validation
Description
This is a follow-up to #20057.
Compared to the date and auto date histogram, this version handles multiple owning bucket ords correctly by keeping track of upToInclusive per bucket. We can go back and apply the same logic to date and auto date.
The same logic can be applied to the max aggregation. Currently max is written as a complete copy of min, so I will try to see if it can be refactored.
Currently the skiplist only keeps track of long values, so for e.g. the double type we need to convert it to a sortable long. Given the current logic, it will only handle basic numeric types but not half-float.
This also relies on the skiplist mapping being explicitly enabled, unlike the @timestamp field. Enabling skiplist by default is something that can be part of 3.5.
TODO:
Related Issues
Part of #18882
Part of #19384
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.