Histogram field type support for Sum aggregation (#55681)
Implements Sum aggregation over Histogram fields by summing the value of each bucket multiplied by its count, as requested in #53285.
csoulios authored Apr 29, 2020
1 parent 0666be5 commit cefc6af
Showing 13 changed files with 440 additions and 44 deletions.
62 changes: 58 additions & 4 deletions docs/reference/aggregations/metrics/sum-aggregation.asciidoc
@@ -1,7 +1,9 @@
[[search-aggregations-metrics-sum-aggregation]]
=== Sum Aggregation

A `single-value` metrics aggregation that sums up numeric values that are extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.
A `single-value` metrics aggregation that sums up numeric values that are extracted from the aggregated documents.
These values can be extracted either from specific numeric or <<histogram,histogram>> fields in the documents,
or be generated by a provided script.

Assuming the data consists of documents representing sales records we can sum
the sale price of all hats with:
@@ -30,9 +32,9 @@ Resulting in:
--------------------------------------------------
{
...
"aggregations": {
"hat_prices": {
"value": 450.0
"aggregations" : {
"hat_prices" : {
"value" : 450.0
}
}
}
@@ -157,3 +159,55 @@ POST /sales/_search?size=0
}
--------------------------------------------------
// TEST[setup:sales]

[[search-aggregations-metrics-sum-aggregation-histogram-fields]]
==== Histogram fields

When the sum is computed on <<histogram,histogram fields>>, the result of the aggregation is the sum of all elements in the `values`
array, each multiplied by the number in the same position in the `counts` array.

For example, if we have the following index that stores pre-aggregated histograms with latency metrics for different networks:

[source,console]
--------------------------------------------------
PUT metrics_index/_doc/1
{
"network.name" : "net-1",
"latency_histo" : {
"values" : [0.1, 0.2, 0.3, 0.4, 0.5], <1>
"counts" : [3, 7, 23, 12, 6] <2>
}
}
PUT metrics_index/_doc/2
{
"network.name" : "net-2",
"latency_histo" : {
"values" : [0.1, 0.2, 0.3, 0.4, 0.5], <1>
"counts" : [8, 17, 8, 7, 6] <2>
}
}
POST /metrics_index/_search?size=0
{
"aggs" : {
"total_latency" : { "sum" : { "field" : "latency_histo" } }
}
}
--------------------------------------------------

For each histogram field, the sum aggregation multiplies each number in the `values` array <1> by its associated count
in the `counts` array <2>. It then adds up the products across all histograms and returns the following result:

[source,console-result]
--------------------------------------------------
{
...
"aggregations" : {
"total_latency" : {
"value" : 28.8
}
}
}
--------------------------------------------------
// TESTRESPONSE[skip:test not setup]
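
The documented result can be checked by hand. The following is a minimal Java sketch of that arithmetic, assuming plain arrays for `values` and `counts`. The names `HistogramSumExample` and `weightedSum` are illustrative only and are not part of this commit or of Elasticsearch.

```java
import java.util.Locale;

// Illustrative only: mirrors the documented arithmetic, not the Elasticsearch implementation.
final class HistogramSumExample {

    /** Returns the sum of values[i] * counts[i] for one pre-aggregated histogram. */
    static double weightedSum(double[] values, long[] counts) {
        if (values.length != counts.length) {
            throw new IllegalArgumentException("values and counts must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] * counts[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values = {0.1, 0.2, 0.3, 0.4, 0.5};
        // Documents 1 and 2 from the example above.
        double net1 = weightedSum(values, new long[] {3, 7, 23, 12, 6});
        double net2 = weightedSum(values, new long[] {8, 17, 8, 7, 6});
        // Prints "total_latency = 28.8"
        System.out.println(String.format(Locale.ROOT, "total_latency = %.1f", net1 + net2));
    }
}
```

The two documents contribute 16.4 and 12.4 respectively (up to floating-point rounding), which adds up to the 28.8 shown in the response.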
10 changes: 4 additions & 6 deletions docs/reference/mapping/types/histogram.asciidoc
@@ -35,6 +35,7 @@ binary <<doc-values,doc values>> and not indexed. Its size in bytes is at most
Because the data is not indexed, you only can use `histogram` fields for the
following aggregations and queries:

* <<search-aggregations-metrics-sum-aggregation-histogram-fields,sum>> aggregation
* <<search-aggregations-metrics-percentile-aggregation,percentiles>> aggregation
* <<search-aggregations-metrics-percentile-rank-aggregation,percentile ranks>> aggregation
* <<search-aggregations-metrics-boxplot-aggregation,boxplot>> aggregation
@@ -73,9 +74,9 @@ The following <<indices-create-index, create index>> API request creates a new i
--------------------------------------------------
PUT my_index
{
"mappings": {
"properties": {
"my_histogram": {
"mappings" : {
"properties" : {
"my_histogram" : {
"type" : "histogram"
},
"my_text" : {
@@ -114,6 +115,3 @@ increasing order. For <<search-aggregations-metrics-percentile-aggregation-appro
histograms this value represents the mean value. In case of HDR histograms this represents the value iterated to.
<2> Count for each bucket. Values in the arrays are treated as integers and must be positive or zero.
Negative values will be rejected. The relation between a bucket and a count is given by the position in the array.
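
The constraints described in these callouts can be sketched as a small validation routine. This is a hypothetical illustration in Java, not the mapper's actual parsing code: the class name `HistogramValidationSketch` is invented here, and treating equal adjacent values as acceptable is an assumption, since the text only says "increasing order".

```java
// Hypothetical validator for the documented histogram constraints.
// Not the Elasticsearch field mapper; names and error messages are invented.
final class HistogramValidationSketch {

    static void validate(double[] values, long[] counts) {
        // A bucket and its count are related by array position, so lengths must match.
        if (values.length != counts.length) {
            throw new IllegalArgumentException("values and counts must have the same length");
        }
        for (int i = 0; i < values.length; i++) {
            // Counts must be positive or zero.
            if (counts[i] < 0) {
                throw new IllegalArgumentException("negative count at position " + i);
            }
            // Values must be in increasing order (assumption: ties are allowed).
            if (i > 0 && values[i] < values[i - 1]) {
                throw new IllegalArgumentException("values out of order at position " + i);
            }
        }
    }

    public static void main(String[] args) {
        validate(new double[] {0.1, 0.2, 0.3}, new long[] {3, 7, 23}); // accepted
        try {
            validate(new double[] {0.1, 0.2}, new long[] {3, -1});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```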



@@ -32,7 +32,7 @@
public class InternalSum extends InternalNumericMetricsAggregation.SingleValue implements Sum {
private final double sum;

InternalSum(String name, double sum, DocValueFormat formatter, Map<String, Object> metadata) {
public InternalSum(String name, double sum, DocValueFormat formatter, Map<String, Object> metadata) {
super(name, metadata);
this.sum = sum;
this.format = formatter;
@@ -35,7 +35,7 @@
import java.io.IOException;
import java.util.Map;

class SumAggregator extends NumericMetricsAggregator.SingleValue {
public class SumAggregator extends NumericMetricsAggregator.SingleValue {

private final ValuesSource.Numeric valuesSource;
private final DocValueFormat format;
@@ -31,7 +31,7 @@
import org.elasticsearch.xpack.analytics.action.AnalyticsInfoTransportAction;
import org.elasticsearch.xpack.analytics.action.AnalyticsUsageTransportAction;
import org.elasticsearch.xpack.analytics.action.TransportAnalyticsStatsAction;
import org.elasticsearch.xpack.analytics.aggregations.metrics.AnalyticsPercentilesAggregatorFactory;
import org.elasticsearch.xpack.analytics.aggregations.metrics.AnalyticsAggregatorFactory;
import org.elasticsearch.xpack.analytics.boxplot.BoxplotAggregationBuilder;
import org.elasticsearch.xpack.analytics.boxplot.InternalBoxplot;
import org.elasticsearch.xpack.analytics.cumulativecardinality.CumulativeCardinalityPipelineAggregationBuilder;
@@ -128,8 +128,11 @@ public Map<String, Mapper.TypeParser> getMappers() {

@Override
public List<Consumer<ValuesSourceRegistry.Builder>> getAggregationExtentions() {
return List.of(AnalyticsPercentilesAggregatorFactory::registerPercentilesAggregator,
AnalyticsPercentilesAggregatorFactory::registerPercentileRanksAggregator);
return List.of(
AnalyticsAggregatorFactory::registerPercentilesAggregator,
AnalyticsAggregatorFactory::registerPercentileRanksAggregator,
AnalyticsAggregatorFactory::registerHistoBackedSumAggregator
);
}

@Override
@@ -6,15 +6,18 @@

package org.elasticsearch.xpack.analytics.aggregations.metrics;

import org.elasticsearch.search.aggregations.metrics.MetricAggregatorSupplier;
import org.elasticsearch.search.aggregations.metrics.PercentileRanksAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.PercentilesAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.PercentilesAggregatorSupplier;
import org.elasticsearch.search.aggregations.metrics.PercentilesConfig;
import org.elasticsearch.search.aggregations.metrics.PercentilesMethod;
import org.elasticsearch.search.aggregations.metrics.SumAggregationBuilder;
import org.elasticsearch.search.aggregations.support.ValuesSourceRegistry;
import org.elasticsearch.xpack.analytics.aggregations.support.AnalyticsValuesSourceType;

public class AnalyticsPercentilesAggregatorFactory {
public class AnalyticsAggregatorFactory {

public static void registerPercentilesAggregator(ValuesSourceRegistry.Builder builder) {
builder.register(PercentilesAggregationBuilder.NAME,
AnalyticsValuesSourceType.HISTOGRAM,
@@ -58,4 +61,10 @@ public static void registerPercentileRanksAggregator(ValuesSourceRegistry.Builde
"is not compatible with Histogram field");
});
}

public static void registerHistoBackedSumAggregator(ValuesSourceRegistry.Builder builder) {
builder.register(SumAggregationBuilder.NAME,
AnalyticsValuesSourceType.HISTOGRAM,
(MetricAggregatorSupplier) HistoBackedSumAggregator::new);
}
}
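
The registration above associates an aggregation name and a values source type with an aggregator supplier. As a rough mental model only, a toy registry with the same lookup shape might look like the Java sketch below. Every name in it (`ToyAggregatorRegistry`, `register`, `resolve`) is hypothetical, strings stand in for the real types, and it does not reflect how `ValuesSourceRegistry` is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy illustration of a registry keyed by (aggregation name, values source type).
// All names are hypothetical; this is not ValuesSourceRegistry.
final class ToyAggregatorRegistry {

    private final Map<String, Map<String, Supplier<String>>> suppliers = new HashMap<>();

    /** Associates a supplier with an aggregation name and a values source type. */
    void register(String aggregationName, String valuesSourceType, Supplier<String> supplier) {
        suppliers.computeIfAbsent(aggregationName, k -> new HashMap<>()).put(valuesSourceType, supplier);
    }

    /** Looks up the supplier for the pair and invokes it, or fails if none was registered. */
    String resolve(String aggregationName, String valuesSourceType) {
        Map<String, Supplier<String>> byType = suppliers.get(aggregationName);
        Supplier<String> supplier = byType == null ? null : byType.get(valuesSourceType);
        if (supplier == null) {
            throw new IllegalArgumentException(
                "no aggregator registered for [" + aggregationName + "] on [" + valuesSourceType + "]");
        }
        return supplier.get();
    }

    public static void main(String[] args) {
        ToyAggregatorRegistry registry = new ToyAggregatorRegistry();
        registry.register("sum", "numeric", () -> "numeric sum aggregator");
        registry.register("sum", "histogram", () -> "histogram-backed sum aggregator");
        System.out.println(registry.resolve("sum", "histogram"));
    }
}
```

The point of the pattern is that the same aggregation name can resolve to different implementations depending on the field type, which is what lets a plugin add histogram support to an existing aggregation.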
@@ -0,0 +1,115 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/
package org.elasticsearch.xpack.analytics.aggregations.metrics;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.ScoreMode;
import org.elasticsearch.common.lease.Releasables;
import org.elasticsearch.common.util.BigArrays;
import org.elasticsearch.common.util.DoubleArray;
import org.elasticsearch.index.fielddata.HistogramValue;
import org.elasticsearch.index.fielddata.HistogramValues;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.aggregations.Aggregator;
import org.elasticsearch.search.aggregations.InternalAggregation;
import org.elasticsearch.search.aggregations.LeafBucketCollector;
import org.elasticsearch.search.aggregations.LeafBucketCollectorBase;
import org.elasticsearch.search.aggregations.metrics.CompensatedSum;
import org.elasticsearch.search.aggregations.metrics.InternalSum;
import org.elasticsearch.search.aggregations.metrics.NumericMetricsAggregator;
import org.elasticsearch.search.aggregations.support.ValuesSource;
import org.elasticsearch.search.internal.SearchContext;
import org.elasticsearch.xpack.analytics.aggregations.support.HistogramValuesSource;

import java.io.IOException;
import java.util.Map;

/**
* Sum aggregator operating over histogram datatypes {@link HistogramValuesSource}
*/
class HistoBackedSumAggregator extends NumericMetricsAggregator.SingleValue {

private final ValuesSource valuesSource;
private final DocValueFormat format;

private DoubleArray sums;
private DoubleArray compensations;

HistoBackedSumAggregator(String name, ValuesSource valuesSource, DocValueFormat formatter, SearchContext context,
Aggregator parent, Map<String, Object> metadata) throws IOException {
super(name, context, parent, metadata);
this.valuesSource = valuesSource;
this.format = formatter;
if (valuesSource != null) {
sums = context.bigArrays().newDoubleArray(1, true);
compensations = context.bigArrays().newDoubleArray(1, true);
}
}

@Override
public ScoreMode scoreMode() {
return valuesSource != null && valuesSource.needsScores() ? ScoreMode.COMPLETE : ScoreMode.COMPLETE_NO_SCORES;
}

@Override
public LeafBucketCollector getLeafCollector(LeafReaderContext ctx,
final LeafBucketCollector sub) throws IOException {
if (valuesSource == null) {
return LeafBucketCollector.NO_OP_COLLECTOR;
}
final BigArrays bigArrays = context.bigArrays();
final HistogramValues values = ((HistogramValuesSource.Histogram) valuesSource).getHistogramValues(ctx);

final CompensatedSum kahanSummation = new CompensatedSum(0, 0);
return new LeafBucketCollectorBase(sub, values) {
@Override
public void collect(int doc, long bucket) throws IOException {
sums = bigArrays.grow(sums, bucket + 1);
compensations = bigArrays.grow(compensations, bucket + 1);

if (values.advanceExact(doc)) {
final HistogramValue sketch = values.histogram();
final double sum = sums.get(bucket);
final double compensation = compensations.get(bucket);
kahanSummation.reset(sum, compensation);
while (sketch.next()) {
double d = sketch.value() * sketch.count();
kahanSummation.add(d);
}

compensations.set(bucket, kahanSummation.delta());
sums.set(bucket, kahanSummation.value());
}
}
};
}

@Override
public double metric(long owningBucketOrd) {
if (valuesSource == null || owningBucketOrd >= sums.size()) {
return 0.0;
}
return sums.get(owningBucketOrd);
}

@Override
public InternalAggregation buildAggregation(long bucket) {
if (valuesSource == null || bucket >= sums.size()) {
return buildEmptyAggregation();
}
return new InternalSum(name, sums.get(bucket), format, metadata());
}

@Override
public InternalAggregation buildEmptyAggregation() {
return new InternalSum(name, 0.0, format, metadata());
}

@Override
public void doClose() {
Releasables.close(sums, compensations);
}
}
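
The aggregator keeps a running sum and a compensation term per bucket and feeds each `value * count` product through `CompensatedSum`. To show the idea behind that, here is a standalone Java sketch of classic Kahan (compensated) summation. It is not Elasticsearch's `CompensatedSum` class: `KahanSketch` and its helper methods are written for illustration, and the sign convention of the compensation term may differ from the real implementation.

```java
import java.util.Arrays;

// Standalone illustration of Kahan (compensated) summation.
// NOT Elasticsearch's CompensatedSum; a sketch of the same general technique.
final class KahanSketch {
    private double value; // running sum
    private double delta; // low-order bits lost so far (classic Kahan convention)

    /** Restores previously stored state, e.g. the per-bucket sum and compensation. */
    void reset(double value, double delta) {
        this.value = value;
        this.delta = delta;
    }

    /** Adds v while tracking the rounding error of the addition. */
    void add(double v) {
        double corrected = v - delta;          // apply the error carried from earlier steps
        double updated = value + corrected;    // low-order bits of corrected may be lost here
        delta = (updated - value) - corrected; // recover what was lost
        value = updated;
    }

    double value() { return value; }

    double delta() { return delta; }

    static double kahanSum(double[] xs) {
        KahanSketch k = new KahanSketch();
        for (double x : xs) {
            k.add(x);
        }
        return k.value();
    }

    static double naiveSum(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] xs = new double[1_000_000];
        Arrays.fill(xs, 0.1);
        System.out.println("naive: " + naiveSum(xs));
        System.out.println("kahan: " + kahanSum(xs));
    }
}
```

Storing both the sum and the compensation per bucket, as the aggregator does with its `sums` and `compensations` arrays, lets the summation resume with its error term intact each time another document lands in the same bucket.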