Decouple pipeline reductions from final agg reduction #45796

polyfractal · 2019-08-21T14:40:15Z

Historically only two extra activities happened in the final reduction: empty buckets were filled, and pipeline aggs were reduced (since it was the final reduction, this was safe). However, usage of the final reduction is growing: auto-date-histo might need to perform many reductions on final-reduce to merge down buckets, CCS may need to side-step the final reduction if sending to a different cluster, etc.

Having pipelines generate their output in the final reduce was convenient, but is becoming increasingly difficult to manage as the rest of the agg framework advances.

This commit decouples pipeline aggs from the final reduction:

1. Introduces a new "top level" reduce, which should be called at the beginning of the reduce cycle (e.g. from the SearchPhaseController)
2. Adds a `materializePipeline()` method to InternalAggs and InternalMultiBucket. This is essentially the final reduce for pipelines
3. Makes reductions on pipelines a no-op

By separating pipeline reduction into its own set of methods, aggregations are free to use the final reduction for whatever purpose, without worrying about generating pipeline results which are non-reducible.
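To illustrate why pipeline output is non-reducible, here is a minimal sketch of the idea. It is not the Elasticsearch code: `PipelineReduceSketch`, `buckets`, `maxBucket`, and `topLevelReduce` are hypothetical names, bucket counts stand in for a reducible agg, and a "largest bucket" calculation stands in for a pipeline agg.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model for illustration; not the Elasticsearch classes.
final class PipelineReduceSketch {

    /** Builds a two-bucket partial result (bucket key -> doc count) for one shard. */
    static Map<String, Long> buckets(long a, long b) {
        Map<String, Long> result = new LinkedHashMap<>();
        result.put("a", a);
        result.put("b", b);
        return result;
    }

    /** Ordinary agg reduction: merges partial results and can safely be repeated. */
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> merged = new LinkedHashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> merged.merge(key, count, Long::sum));
        }
        return merged;
    }

    /** Stand-in for a pipeline agg: the largest bucket count. Its output is not reducible. */
    static long maxBucket(Map<String, Long> buckets) {
        return buckets.values().stream().mapToLong(Long::longValue).max().orElse(0L);
    }

    /** "Top level" reduce: reduce the whole agg tree first, then run the pipeline once. */
    static long topLevelReduce(List<Map<String, Long>> shardResults) {
        return maxBucket(reduce(shardResults));
    }

    public static void main(String[] args) {
        List<Map<String, Long>> shards = Arrays.asList(buckets(3, 1), buckets(1, 4));
        // Merged buckets are a=4, b=5, so the pipeline sees 5.
        System.out.println(topLevelReduce(shards));
        // Per-shard pipeline outputs (3 and 4) cannot be combined into 5.
        System.out.println(maxBucket(shards.get(0)) + " " + maxBucket(shards.get(1)));
    }
}
```

In this toy model the per-shard pipeline values carry too little information to recover the correct answer, which is why the pipeline step only makes sense after the bucket counts have been fully merged.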

Closes #44914, predecessor PR was #45359

elasticmachine · 2019-08-21T14:40:17Z

Pinging @elastic/es-analytics-geo

polyfractal · 2019-08-21T14:41:10Z

Note: the first commit contains the main changes, the second commit ("Remove unnecessary doReduce()") just does a bulk rename since doReduce() is no longer needed. So reviewing the first commit first would be less noisy.

polyfractal · 2019-08-21T15:38:00Z

/cc @javanna since you've looked at the final reduce stuff for CCS
/cc @markharwood since I had to make some changes to SigTerms (e.g. the buckets get updated with scores, and I had to adjust some ctors to allow buckets to be recreated with the same score)

not-napoleon

Someone who knows Significant Terms aggregation better should probably comment on that part, I'm just taking it on faith. Otherwise this looks good to me. I left two nits about the use of the Streams API, but they're both a matter of opinion, so don't feel like you have to fix them.

server/src/main/java/org/elasticsearch/search/aggregations/InternalAggregations.java

server/src/main/java/org/elasticsearch/search/aggregations/InternalMultiBucketAggregation.java

jimczi

+1 to separate the aggs reduction and the pipeline reduction. However, I wonder if all the renamings are needed. Could we just change InternalAggregations to separate reduce and reducePipeline?

server/src/main/java/org/elasticsearch/search/aggregations/InternalAggregation.java

polyfractal · 2019-09-09T19:56:48Z

Review comments addressed.

I renamed the "materialize" stuff back to "reduce", although there's a small hiccup. Because InternalAggregations's reduce() is essentially now reducePipelines(), we are left over with the abstract doReduce().

I first renamed that to reduce() which makes sense, although it modifies some 70+ files :)

In the second commit I changed it to doReduce() which has a much more minimal impact (because all the aggs implemented doReduce() abstract method before), although a bit strange conceptually as there isn't a reduce() anymore.

I don't have a strong opinion either way, although I'd lean towards renaming to reduce() and just accepting that it touches every single aggregation :)
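For readers following the naming question, a rough sketch of the shape being discussed. This is an assumption-laden illustration, not the Elasticsearch source: `SketchAgg`, `SketchSum`, and the `UnaryOperator`-based pipelines are hypothetical stand-ins. The only point is that the abstract merge step (named `reduce()` here, `doReduce()` in the alternative) is separate from the step that applies pipelines.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical shapes for illustration; not the Elasticsearch source.
abstract class SketchAgg {
    private final List<UnaryOperator<SketchAgg>> pipelines;

    SketchAgg(List<UnaryOperator<SketchAgg>> pipelines) {
        this.pipelines = pipelines;
    }

    List<UnaryOperator<SketchAgg>> pipelines() {
        return pipelines;
    }

    /** The abstract merge step every agg implements; the naming question is reduce() vs doReduce(). */
    abstract SketchAgg reduce(List<SketchAgg> siblings);

    /** Applies this agg's pipelines to an already-reduced result; a separate entry point. */
    final SketchAgg reducePipelines(SketchAgg reduced) {
        SketchAgg current = reduced;
        for (UnaryOperator<SketchAgg> pipeline : pipelines) {
            current = pipeline.apply(current);
        }
        return current;
    }
}

final class SketchSum extends SketchAgg {
    final long value;

    SketchSum(long value, List<UnaryOperator<SketchAgg>> pipelines) {
        super(pipelines);
        this.value = value;
    }

    @Override
    SketchAgg reduce(List<SketchAgg> siblings) {
        long total = 0;
        for (SketchAgg sibling : siblings) {
            total += ((SketchSum) sibling).value;
        }
        return new SketchSum(total, pipelines());
    }

    public static void main(String[] args) {
        // A toy "pipeline" that negates the reduced sum.
        UnaryOperator<SketchAgg> negate =
            agg -> new SketchSum(-((SketchSum) agg).value, Collections.emptyList());
        List<UnaryOperator<SketchAgg>> pipelines = Collections.singletonList(negate);
        SketchSum first = new SketchSum(2, pipelines);
        SketchAgg reduced = first.reduce(Arrays.<SketchAgg>asList(first, new SketchSum(3, pipelines)));
        System.out.println(((SketchSum) first.reducePipelines(reduced)).value);
    }
}
```

Whichever name the abstract method ends up with, each concrete agg overrides exactly one merge method, which is why renaming it touches every implementation.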

jimczi

The change looks good to me.

> I don't have a strong opinion either way, although I'd lean towards renaming to reduce() and just accepting that it touches every single aggregation :)

+1 to move back to reduce and no need for another round of review since the change should be straightforward ;).

polyfractal · 2019-09-23T15:23:17Z

@elasticmachine update branch

polyfractal · 2019-11-25T20:06:37Z

Oh my goodness, I thought I merged this. Fixing up conflicts so that it can go in.

Oops!

polyfractal · 2019-11-27T17:14:55Z

@elasticmachine run elasticsearch-ci/packaging-sample-matrix

Historically only two things happened in the final reduction: empty buckets were filled, and pipeline aggs were reduced (since it was the final reduction, this was safe). However, usage of the final reduction is growing: auto-date-histo might need to perform many reductions on final-reduce to merge down buckets, CCS may need to side-step the final reduction if sending to a different cluster, etc.

Having pipelines generate their output in the final reduce was convenient, but is becoming increasingly difficult to manage as the rest of the agg framework advances.

This commit decouples pipeline aggs from the final reduction by introducing a new "top level" reduce, which should be called at the beginning of the reduce cycle (e.g. from the SearchPhaseController). This will only reduce pipeline aggs on the final reduce, after the non-pipeline agg tree has been fully reduced.

By separating pipeline reduction into its own set of methods, aggregations are free to use the final reduction for whatever purpose, without worrying about generating pipeline results which are non-reducible.
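A minimal sketch of that ordering, assuming a toy model in which partial results can be merged in batches and a boolean flag marks the final reduce. `FinalReduceSketch`, `Result`, `shard`, and the `finalReduce` parameter are hypothetical names for illustration, not the Elasticsearch API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model for illustration; not the Elasticsearch classes.
final class FinalReduceSketch {

    /** Reducible buckets plus a pipeline output that stays null until the final reduce. */
    static final class Result {
        final Map<String, Long> buckets;
        final Long maxBucket;

        Result(Map<String, Long> buckets, Long maxBucket) {
            this.buckets = buckets;
            this.maxBucket = maxBucket;
        }
    }

    /** Wraps one shard's bucket counts as a partial result with no pipeline output. */
    static Result shard(long a, long b) {
        Map<String, Long> buckets = new TreeMap<>();
        buckets.put("a", a);
        buckets.put("b", b);
        return new Result(buckets, null);
    }

    /** Merges bucket counts every time; computes the pipeline output only when finalReduce is true. */
    static Result reduce(List<Result> partials, boolean finalReduce) {
        Map<String, Long> merged = new TreeMap<>();
        for (Result partial : partials) {
            partial.buckets.forEach((key, count) -> merged.merge(key, count, Long::sum));
        }
        Long maxBucket = null;
        if (finalReduce) {
            maxBucket = merged.values().stream().mapToLong(Long::longValue).max().orElse(0L);
        }
        return new Result(merged, maxBucket);
    }

    public static void main(String[] args) {
        // A non-final (partial) reduce leaves the pipeline output unset.
        Result batch = reduce(Arrays.asList(shard(3, 1), shard(1, 4)), false);
        // The final reduce sees fully merged buckets (a=6, b=5) and only then runs the pipeline.
        Result done = reduce(Arrays.asList(batch, shard(2, 0)), true);
        System.out.println(batch.maxBucket + " " + done.maxBucket);
    }
}
```

Because the partial result never carries a pipeline value, it stays safe to merge again, whether in a later batch or, under the same assumption, on another cluster.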

polyfractal added 3 commits August 20, 2019 14:28

Remove unnecessary doReduce()

1d9def7

Add assertions, javadoc

e36dfd4

polyfractal added >bug :Analytics/Aggregations Aggregations >refactoring v8.0.0 v7.4.0 labels Aug 21, 2019

Merge remote-tracking branch 'origin/master' into materialize_pipeline

fb969e4

colings86 added v7.5.0 and removed v7.4.0 labels Aug 30, 2019

not-napoleon approved these changes Sep 4, 2019

server/src/main/java/org/elasticsearch/search/aggregations/InternalAggregations.java

server/src/main/java/org/elasticsearch/search/aggregations/InternalMultiBucketAggregation.java

jimczi reviewed Sep 4, 2019

server/src/main/java/org/elasticsearch/search/aggregations/InternalAggregation.java

polyfractal added 3 commits September 9, 2019 12:19

Review comments: move back to "reduce" naming

68e035b

Review comments: Stream tweaks

8322905

More renaming

ba627f6

jimczi approved these changes Sep 10, 2019

Revert "More renaming"

46444ae

This reverts commit ba627f6.

jimczi added v7.6.0 and removed v7.5.0 labels Nov 12, 2019

Merge remote-tracking branch 'origin/master' into materialize_pipeline

abbd05c

polyfractal force-pushed the materialize_pipeline branch from 20ad569 to abbd05c November 25, 2019 20:21

Merge conflicts

db22367

Merge remote-tracking branch 'origin/master' into materialize_pipeline

2bcfbf9

polyfractal merged commit 9c34ff9 into elastic:master Dec 5, 2019

polyfractal mentioned this pull request Dec 5, 2019

auto_date_histogram fails where date_histogram does not #44914

Closed

This was referenced Dec 11, 2019

[Monitoring] [UI] Multiple logstash pipelines aggregations are broken #50054

Closed

SingleBucket aggs need to reduce their bucket's pipelines first #50103

Merged

nik9000 mentioned this pull request Apr 1, 2020

Mute InternalAutoDateHistogramTests.testReduceRandom #54542

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
