Histogram field type support for ValueCount and Avg aggregations #55933
Merged

Commits (7):
- bb75533 Moved histogram agg tests to correct test package (csoulios)
- dbdb7f8 Cleaned up histo sum agg test (csoulios)
- 2671d5c Added histogram backed average aggregation (csoulios)
- 89e6d94 Merge branch 'master' into histo-value-count-avg (csoulios)
- efeab0f Enabled testNoDocs() tests (csoulios)
- d18029e Added documentation (csoulios)
- 4868f53 Small doc changes (csoulios)
...java/org/elasticsearch/xpack/analytics/aggregations/metrics/HistoBackedAvgAggregator.java (124 additions, 0 deletions)
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License;
 * you may not use this file except in compliance with the Elastic License.
 */
package org.elasticsearch.xpack.analytics.aggregations.metrics;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.ScoreMode;
import org.elasticsearch.common.lease.Releasables;
import org.elasticsearch.common.util.BigArrays;
import org.elasticsearch.common.util.DoubleArray;
import org.elasticsearch.common.util.LongArray;
import org.elasticsearch.index.fielddata.HistogramValue;
import org.elasticsearch.index.fielddata.HistogramValues;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.aggregations.Aggregator;
import org.elasticsearch.search.aggregations.InternalAggregation;
import org.elasticsearch.search.aggregations.LeafBucketCollector;
import org.elasticsearch.search.aggregations.LeafBucketCollectorBase;
import org.elasticsearch.search.aggregations.metrics.CompensatedSum;
import org.elasticsearch.search.aggregations.metrics.InternalAvg;
import org.elasticsearch.search.aggregations.metrics.NumericMetricsAggregator;
import org.elasticsearch.search.internal.SearchContext;
import org.elasticsearch.xpack.analytics.aggregations.support.HistogramValuesSource;

import java.io.IOException;
import java.util.Map;

/**
 * Average aggregator operating over histogram datatypes {@link HistogramValuesSource}
 * The aggregation computes a weighted average by taking counts into consideration for each value
 */
class HistoBackedAvgAggregator extends NumericMetricsAggregator.SingleValue {

    private final HistogramValuesSource.Histogram valuesSource;

    LongArray counts;
    DoubleArray sums;
    DoubleArray compensations;
    DocValueFormat format;

    HistoBackedAvgAggregator(String name, HistogramValuesSource.Histogram valuesSource, DocValueFormat formatter, SearchContext context,
                             Aggregator parent, Map<String, Object> metadata) throws IOException {
        super(name, context, parent, metadata);
        this.valuesSource = valuesSource;
        this.format = formatter;
        if (valuesSource != null) {
            final BigArrays bigArrays = context.bigArrays();
            counts = bigArrays.newLongArray(1, true);
            sums = bigArrays.newDoubleArray(1, true);
            compensations = bigArrays.newDoubleArray(1, true);
        }
    }

    @Override
    public ScoreMode scoreMode() {
        return valuesSource != null && valuesSource.needsScores() ? ScoreMode.COMPLETE : ScoreMode.COMPLETE_NO_SCORES;
    }

    @Override
    public LeafBucketCollector getLeafCollector(LeafReaderContext ctx,
                                                final LeafBucketCollector sub) throws IOException {
        if (valuesSource == null) {
            return LeafBucketCollector.NO_OP_COLLECTOR;
        }
        final BigArrays bigArrays = context.bigArrays();
        final HistogramValues values = valuesSource.getHistogramValues(ctx);
        final CompensatedSum kahanSummation = new CompensatedSum(0, 0);

        return new LeafBucketCollectorBase(sub, values) {
            @Override
            public void collect(int doc, long bucket) throws IOException {
                counts = bigArrays.grow(counts, bucket + 1);
                sums = bigArrays.grow(sums, bucket + 1);
                compensations = bigArrays.grow(compensations, bucket + 1);

                if (values.advanceExact(doc)) {
                    final HistogramValue sketch = values.histogram();

                    // Compute the sum of double values with Kahan summation algorithm which is more accurate than naive summation
                    final double sum = sums.get(bucket);
                    final double compensation = compensations.get(bucket);
                    kahanSummation.reset(sum, compensation);
                    while (sketch.next()) {
                        double d = sketch.value() * sketch.count();
                        kahanSummation.add(d);
                        counts.increment(bucket, sketch.count());
                    }

                    sums.set(bucket, kahanSummation.value());
                    compensations.set(bucket, kahanSummation.delta());
                }
            }
        };
    }

    @Override
    public double metric(long owningBucketOrd) {
        if (valuesSource == null || owningBucketOrd >= sums.size()) {
            return Double.NaN;
        }
        return sums.get(owningBucketOrd) / counts.get(owningBucketOrd);
    }

    @Override
    public InternalAggregation buildAggregation(long bucket) {
        if (valuesSource == null || bucket >= sums.size()) {
            return buildEmptyAggregation();
        }
        return new InternalAvg(name, sums.get(bucket), counts.get(bucket), format, metadata());
    }

    @Override
    public InternalAggregation buildEmptyAggregation() {
        return new InternalAvg(name, 0.0, 0L, format, metadata());
    }

    @Override
    public void doClose() {
        Releasables.close(counts, sums, compensations);
    }

}
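For intuition about the math in collect() above: each histogram entry contributes value() * count() to a per-bucket sum accumulated with Kahan (compensated) summation, the counts array accumulates the total count, and the final average is sum divided by count (that division happens in metric() and inside InternalAvg). The standalone sketch below is not part of this PR; KahanSum is a hypothetical stand-in for CompensatedSum and plain arrays replace BigArrays, just to show the arithmetic in isolation.

// Standalone sketch (not PR code): weighted average over histogram (value, count)
// pairs using Kahan summation, mirroring the collect() loop above.
public class WeightedAvgSketch {

    /** Minimal stand-in for CompensatedSum: running value plus rounding-error term. */
    static final class KahanSum {
        private double value;
        private double delta;

        KahanSum(double value, double delta) {
            this.value = value;
            this.delta = delta;
        }

        void add(double v) {
            // Classic Kahan step: fold the previous rounding error back in.
            double corrected = v - delta;
            double newValue = value + corrected;
            delta = (newValue - value) - corrected;
            value = newValue;
        }

        double value() {
            return value;
        }
    }

    public static void main(String[] args) {
        // A toy pre-aggregated histogram: centroid values and their counts.
        double[] values = {0.1, 0.25, 0.5, 1.0};
        long[] counts = {3, 7, 23, 12};

        KahanSum sum = new KahanSum(0, 0);
        long totalCount = 0;
        for (int i = 0; i < values.length; i++) {
            sum.add(values[i] * counts[i]);   // like kahanSummation.add(sketch.value() * sketch.count())
            totalCount += counts[i];          // like counts.increment(bucket, sketch.count())
        }

        // InternalAvg reports sum / count; the same division happens in metric().
        System.out.println("avg = " + sum.value() / totalCount);
    }
}

For this toy data the result is roughly 0.568, i.e. the count-weighted mean of the centroid values.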
@not-napoleon and I have been favoring changing the ctor so you can just do `(MetricAggregatorSupplier) HistoBackedSumAggregator::new`. It is less "big".
`(MetricAggregatorSupplier) HistoBackedSumAggregator::new` is definitely better looking and I would prefer it as well. The problem is that `MetricAggregatorSupplier` passes a `ValuesSource` argument, while the `HistoBackedSumAggregator` constructor expects a `HistogramValuesSource.Histogram`. So I had 3 options:

1. Accept a `ValuesSource` parameter in the `HistoBackedSumAggregator` constructor and do a type cast later, when I would use the `HistogramValuesSource.Histogram` object.
2. Create a new `HistoBackedMetricAggregatorSupplier` that would match the arguments of the `HistoBackedSumAggregator`.
3. Cast `ValuesSource` to `HistogramValuesSource.Histogram` in the registrar method, changing the admittedly prettier `(MetricAggregatorSupplier) HistoBackedSumAggregator::new` to the longer lambda expression.

I chose the third option because it seemed cleaner to do the type cast at the registrar method, where it is obvious that the VS is a histogram. On the other hand, I didn't want to create yet another `AggregatorSupplier` sub-class.
I've been favoring having a generic `ValuesSource` in the supplier and doing the cast in the aggregator constructor instead. Really, the suppliers shouldn't care about the `ValuesSource` subclass, since the whole idea is to be able to add new ones in, and having to subclass a supplier to do that is a lot of boilerplate. I know I haven't been perfectly consistent about that so far, but I'm working on cleaning it up as I can.
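To make the trade-off in this thread concrete, here is a toy sketch. It is illustrative only: MetricSupplier, ValuesSource, HistogramVS, and the aggregator classes below are simplified stand-ins, not the real MetricAggregatorSupplier or registry types (whose signatures are not shown in this diff). It contrasts option 3 above (cast inside a lambda at the registrar) with the approach favored in the last comment (constructor takes the generic ValuesSource and casts internally, so a bare method reference works).

// Toy model (not Elasticsearch code): why HistoBackedAgg::new does not fit a
// generic supplier directly, and the two places the cast can live.
public class SupplierCastSketch {

    // Simplified stand-ins for the real types discussed above.
    static class ValuesSource {}
    static class HistogramVS extends ValuesSource {}

    // Deliberately generic over ValuesSource, analogous to MetricAggregatorSupplier.
    interface MetricSupplier {
        Object build(String name, ValuesSource vs);
    }

    // Constructor wants the concrete subtype, so HistoBackedAgg::new alone
    // cannot implement MetricSupplier.
    static class HistoBackedAgg {
        HistoBackedAgg(String name, HistogramVS vs) { }
    }

    // Constructor accepts the base type and casts inside, so a plain method
    // reference works (the approach favored in the last comment).
    static class HistoBackedAggLoose {
        private final HistogramVS histo;

        HistoBackedAggLoose(String name, ValuesSource vs) {
            this.histo = (HistogramVS) vs; // cast moved into the ctor
        }
    }

    public static void main(String[] args) {
        // Option 3 from the thread: cast at the registrar via a lambda,
        // keeping the aggregator constructor strongly typed.
        MetricSupplier registrarCast = (name, vs) -> new HistoBackedAgg(name, (HistogramVS) vs);

        // Alternative: keep the prettier method reference, cast in the ctor.
        MetricSupplier ctorCast = HistoBackedAggLoose::new;

        System.out.println(registrarCast.build("avg", new HistogramVS()).getClass().getSimpleName());
        System.out.println(ctorCast.build("avg", new HistogramVS()).getClass().getSimpleName());
    }
}

Either way the cast happens exactly once; the choice is only about whether the supplier interface or the aggregator constructor carries the concrete histogram type.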