Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Histogram field type support for ValueCount and Avg aggregations #55933

Merged
merged 7 commits into from
May 4, 2020

Conversation

csoulios
Copy link
Contributor

Implements value_count and avg aggregations over Histogram fields as discussed in #53285

  • value_count returns the sum of all counts array of the histograms
  • avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 29, 2020
@nik9000 nik9000 self-requested a review April 29, 2020 15:04
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some small stuff but it mostly looks good to me. Its neat that this is comparatively easy to do now!

@@ -65,6 +68,24 @@ public static void registerPercentileRanksAggregator(ValuesSourceRegistry.Builde
public static void registerHistoBackedSumAggregator(ValuesSourceRegistry.Builder builder) {
builder.register(SumAggregationBuilder.NAME,
AnalyticsValuesSourceType.HISTOGRAM,
(MetricAggregatorSupplier) HistoBackedSumAggregator::new);
(MetricAggregatorSupplier) (name, valuesSource, format, context, parent, metadata) ->
new HistoBackedSumAggregator(name, (HistogramValuesSource.Histogram) valuesSource, format, context, parent, metadata)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@not-napoleon and I've been favoring changing the ctor so you can just to (MetricAggregatorSupplier) HistoBackedSumAggregator::new. It is less "big".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(MetricAggregatorSupplier) HistoBackedSumAggregator::new is definitely better looking and I would prefer it as well.

The problem is that MetricAggregatorSupplier passes a ValuesSource argument

Aggregator build(String name,
                     ValuesSource valuesSource,
                     DocValueFormat format,
                     SearchContext context,
                     Aggregator parent,
                     Map<String, Object> metadata) throws IOException;

while HistoBackedSumAggregator constructor expects a HistogramValuesSource.Histogram

HistoBackedSumAggregator(String name, HistogramValuesSource.Histogram valuesSource, DocValueFormat formatter, SearchContext context,
            Aggregator parent, Map<String, Object> metadata) throws IOException 

So I had 3 options:

  1. Keep the ValuesSource parameter in the HistoBackedSumAggregator constructor and do a type cast later when I would use the HistogramValuesSource.Histogram object.
  2. Create a separate HistoBackedMetricAggregatorSupplier that would match the arguments of the HistoBackedSumAggregator
  3. Do the cast of ValuesSource to HistogramValuesSource.Histogram in the registrar method changing the admittedly prettier (MetricAggregatorSupplier) HistoBackedSumAggregator::new to the longer lambda expression.

I chose the third option because it seemed cleaner to do the type cast at the registrar method where it is obvious that the VS is a histogram. On the other hand, I didn't want to create yet another AggregatorSupplier sub-class

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been favoring having generic ValuesSource in the supplier and doing the cast in the aggregator constructor instead. Really the suppliers shouldn't care about the ValuesSource subclass, since the whole idea is to be able to add new ones in, and having to subclass a supplier to do that is a lot of boiler plate. I know I haven't been perfectly consistent about that so far, but I'm working on cleaning it up as I can.

@csoulios csoulios merged commit caf6c5a into elastic:master May 4, 2020
@csoulios csoulios deleted the histo-value-count-avg branch May 4, 2020 07:24
csoulios added a commit that referenced this pull request May 4, 2020
#56099)

Backports #55933 to 7.x

Implements value_count and avg aggregations over Histogram fields as discussed in #53285

- value_count returns the sum of all counts array of the histograms
- avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >feature release highlight Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.8.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants