Simplify ordering support on terms aggregations #17588

jpountz · 2016-04-07T08:47:24Z

We try to be as flexible as possible when it comes to sorting terms aggregations. However, sorting by anything other than _term or descending _count makes it very hard to return the correct top buckets and counts, which is disappointing to users. Instead, I suggest that we only allow order options that produce reasonably accurate results:

remove the ability to sort by ascending count
remove ordering by sub aggregations entirely, or only allow it when the leaf of the path is a min or max aggregation: in that case counts will not be accurate, but I believe that the top buckets will be correct.
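
For readers unfamiliar with the order options under discussion, here is a rough sketch of the request bodies involved, written as Python dicts. The field names (genre, play_count) and the aggregation names (genres, max_plays, avg_plays) are made up for illustration; only the shape of the "order" parameter matters.

```python
import json


def terms_agg(order, sub_aggs=None):
    """Build a search body with a terms aggregation on a hypothetical 'genre' field."""
    body = {"terms": {"field": "genre", "order": order}}
    if sub_aggs:
        body["aggs"] = sub_aggs
    return {"size": 0, "aggs": {"genres": body}}


# Orderings the proposal would keep.
by_term = terms_agg({"_term": "asc"})
by_count_desc = terms_agg({"_count": "desc"})

# Ordering the proposal would remove: ascending count.
by_count_asc = terms_agg({"_count": "asc"})

# Ordering by a sub aggregation: possibly kept for min/max leaves only.
by_max = terms_agg(
    {"max_plays": "desc"},
    {"max_plays": {"max": {"field": "play_count"}}},
)

# Ordering by an avg sub aggregation: would be removed under the proposal.
by_avg = terms_agg(
    {"avg_plays": "desc"},
    {"avg_plays": {"avg": {"field": "play_count"}}},
)

print(json.dumps(by_max, indent=2))
```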


clintongormley · 2016-04-07T10:22:41Z

Is this something we can possibly support with two-phase aggregations? #12316

jpountz · 2016-04-07T12:48:47Z

With multiple phases, we could get accurate results when sorting by min/max aggs (the first round could compute the top buckets and the second could ask all shards only for these values). This would also help refine counts when sorting by descending count, since we can assume that shards have a similar number of occurrences of each term. But we cannot make such assumptions for sub aggregations. For instance, if you sort a terms aggregation by a sub avg aggregation, you could still get very different top terms for each shard, e.g. if the field that you compute the avg on has outliers.
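
The outlier problem can be reproduced with a toy simulation. This is a simplified model, not Elasticsearch's actual implementation: each "shard" is a list of (term, value) pairs, it returns only its local top shard_size terms by average, and a coordinator merges the partial sums and counts. The function names and data are invented for the example.

```python
from collections import defaultdict


def shard_top_by_avg(docs, shard_size):
    """Return the shard-local top terms by avg as {term: [sum, count]}."""
    stats = defaultdict(lambda: [0.0, 0])
    for term, value in docs:
        stats[term][0] += value
        stats[term][1] += 1
    ranked = sorted(
        stats.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True
    )
    return dict(ranked[:shard_size])


def merge_top_by_avg(shard_results, size):
    """Merge partial sums/counts from shards and return the top terms by avg."""
    merged = defaultdict(lambda: [0.0, 0])
    for result in shard_results:
        for term, (total, count) in result.items():
            merged[term][0] += total
            merged[term][1] += count
    ranked = sorted(
        ((term, total / count) for term, (total, count) in merged.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:size]


# Term "a" has a single outlier value (100) on shard 1.
shard_1 = [("a", 100), ("b", 20), ("b", 20)]
shard_2 = [("a", 0), ("a", 0), ("a", 0), ("b", 20), ("c", 30)]

# Each shard only returns its local top-1 term.
approx = merge_top_by_avg(
    [shard_top_by_avg(shard, 1) for shard in (shard_1, shard_2)], 1
)

# Exact answer: treat all documents as one shard and keep every term.
exact = merge_top_by_avg([shard_top_by_avg(shard_1 + shard_2, 10)], 1)

print(approx)  # [('a', 100.0)]
print(exact)   # [('c', 30.0)]
```

In this model the merged result reports "a" with an average of 100, while its true average is 25 and the true top term is "c". Shard 2 never reported its values for "a" because "a" did not rank in that shard's local top terms.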

rashidkpc · 2016-04-07T21:07:19Z

Please open an issue on the kibana repo if and when you start moving on this as it will be a breaking change for many of our users.

colings86 · 2016-04-08T09:45:18Z

Discussed in FixItFriday and we agreed to split out removing the ascending count option from this issue so the conversation can be separated from removing support for sorting by sub aggregations (see #17614).

CristianWeiland · 2017-02-13T12:35:01Z

I had this problem (described in #23108), and increasing the size of the terms aggregation gives a more accurate result. Increasing size to a value bigger than the number of documents seems to give the correct result.
Why does this happen?

clintongormley · 2017-02-13T12:40:19Z

@CristianWeiland you can read about it in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-approximate-counts
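
As a rough illustration of why a larger size helps, here is a toy model (not the real implementation) of a terms aggregation sorted by descending count. Each shard returns only its top shard_size terms, and the coordinator sums what it receives. The terms_agg function and the data are invented for the example.

```python
from collections import Counter


def terms_agg(shards, size, shard_size):
    """Merge each shard's local top `shard_size` terms, then keep the top `size`."""
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# True global counts: x=6, y=8, z=7.
shards = [
    ["x"] * 5 + ["y"] * 4 + ["z"] * 1,
    ["z"] * 6 + ["y"] * 4 + ["x"] * 1,
]

small = terms_agg(shards, size=1, shard_size=1)
large = terms_agg(shards, size=1, shard_size=3)

print(small)  # [('z', 6)]
print(large)  # [('y', 8)]
```

With shard_size=1 the term "y" is second on both shards, so it is never sent to the coordinator, and the count reported for "z" misses one document. Once every shard returns all of its terms, the merge is exact, which matches the observation that a very large size gives the correct result.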

markharwood · 2018-03-16T15:35:01Z

As noted on #17614 (comment), one possible policy is to error if numTermsReturned < size requested and point out the use of partitioning as a way to get accurate-within-partition results.
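
For context, partitioning in the terms aggregation is expressed through the include parameter with partition and num_partitions. A minimal sketch of building one request per partition follows; the helper name, the aggregation name by_term, and the field name are hypothetical, and this does not implement the error policy itself.

```python
def partitioned_terms_requests(field, num_partitions, size):
    """Build one search body per partition of the terms of `field`."""
    return [
        {
            "size": 0,
            "aggs": {
                "by_term": {
                    "terms": {
                        "field": field,
                        "size": size,
                        # Each request only considers terms hashed into partition p.
                        "include": {
                            "partition": p,
                            "num_partitions": num_partitions,
                        },
                    }
                }
            },
        }
        for p in range(num_partitions)
    ]


requests = partitioned_terms_requests("account_id", num_partitions=4, size=1000)
print(len(requests))  # 4
```

Each request covers a disjoint subset of terms, so if size is large enough to hold every term in a partition, the results within that partition are accurate.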

cc @elastic/es-search-aggs

nik9000 · 2020-07-29T18:59:24Z

Folks have long since come to rely on ordering by sub-agg so we think we're probably better off adding any optimizations that we can for it and doing what we can to make us more likely to produce correct answers.

The trouble with these orderings is that they make it very hard to prune results and still be sure that we have the right results, both on the data nodes and on the coordinating nodes. We try to work around this with shard_size so the data nodes return all of the important buckets. We work around this on the coordinating node by never pruning results at all. Neither of these is great. The shard_size might not contain the important buckets so we could end up being wrong. Not pruning results from the shards could use a ton of memory. Both bad! But maybe we can do something about it? We're not really sure how yet though.

nik9000 · 2021-06-23T14:29:24Z

Looking at this again, I think my comment from last time is still accurate. Folks rely on these, flawed or not. I don't think we can drop this. And we don't have time in the short term to do these accurately. I think, at least for a while, we're not going to do anything with this.

jpountz added >breaking discuss :Analytics/Aggregations Aggregations labels Apr 7, 2016

colings86 mentioned this issue Apr 8, 2016

Remove support for sorting terms aggregation by ascending count #17614

Closed

rashidkpc mentioned this issue Apr 28, 2016

Plans for adding terms agg support? elastic/timelion#15

Closed

rashidkpc mentioned this issue Jun 29, 2016

split elastic/timelion#133

Closed

clintongormley added discuss stalled and removed discuss labels Jul 8, 2016

colings86 mentioned this issue Aug 19, 2016

Ordering term aggregation based on scripted metric. #15718

Closed

jpountz mentioned this issue Oct 7, 2016

Add support for "missing" parameter to the terms aggregation "order" #20237

Closed

colings86 mentioned this issue Dec 7, 2016

Order by scripted_metric sub aggregation #8486

Closed

jpountz mentioned this issue Feb 10, 2017

Computing average shows wrong value #23108

Closed

colings86 mentioned this issue Mar 13, 2018

Aggregations: Make order more flexible for terms #6917

Closed

tomcallahan added team-discuss and removed discuss labels Jun 1, 2018

polyfractal mentioned this issue Aug 2, 2018

Possibility to add new ordering scheme for Aggregation results. #26570

Closed

colings86 removed the team-discuss label Aug 7, 2018

rjernst added the Team:Analytics label May 4, 2020

polyfractal mentioned this issue Sep 9, 2020

Remove ordering support on histogram aggregations #17587

Closed

wchaparro assigned nik9000 Jun 16, 2021

nik9000 closed this as completed Jun 23, 2021
