Simplify ordering support on terms aggregations #17588
Comments
Is this something we can possibly support with two-phase aggregations? #12316
With multiple phases, we could get accurate results when sorting by min/max aggs (the first round could compute the top buckets and the second could ask all shards only for these values). This would also help refine counts when sorting by descending count, since we can assume that shards have a similar number of occurrences of each term. But we cannot make such assumptions for sub aggregations. For instance, if you sort a terms aggregation by a sub avg aggregation, you could still get very different top terms for each shard, e.g. if the field that you compute the avg on has outliers.
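To make the min/max versus avg distinction concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions: the two in-memory "shards", the helper names (`shard_top`, `single_phase`, `two_phase`, `exact`) and the data are all hypothetical, and none of this is Elasticsearch code. Each shard keeps only its local top `k` terms, as a stand-in for per-shard pruning.

```python
from collections import defaultdict

# Hypothetical data: each shard maps term -> metric values (one value per doc).
SHARDS = [
    {"a": [10], "b": [9], "c": [0]},
    {"a": [0], "b": [9], "c": [12]},
]


def avg(values):
    return sum(values) / len(values)


def shard_top(shard, metric, k):
    """The k best terms on one shard by `metric` (descending), with their values."""
    ranked = sorted(shard, key=lambda t: metric(shard[t]), reverse=True)
    return {t: shard[t] for t in ranked[:k]}


def rank(merged, metric, k):
    """Top k (term, metric value, doc count) tuples from merged per-term values."""
    ranked = sorted(merged, key=lambda t: metric(merged[t]), reverse=True)
    return [(t, metric(merged[t]), len(merged[t])) for t in ranked[:k]]


def single_phase(shards, metric, k):
    """Each shard returns only its local top k; the coordinator merges those."""
    merged = defaultdict(list)
    for shard in shards:
        for term, values in shard_top(shard, metric, k).items():
            merged[term].extend(values)
    return rank(merged, metric, k)


def two_phase(shards, metric, k):
    """Round 1 collects candidate terms; round 2 asks every shard for them."""
    candidates = set()
    for shard in shards:
        candidates.update(shard_top(shard, metric, k))
    merged = defaultdict(list)
    for shard in shards:
        for term in candidates:
            merged[term].extend(shard.get(term, []))
    return rank(merged, metric, k)


def exact(shards, metric, k):
    """Ground truth: merge everything, no pruning."""
    merged = defaultdict(list)
    for shard in shards:
        for term, values in shard.items():
            merged[term].extend(values)
    return rank(merged, metric, k)


for name, metric in (("max", max), ("avg", avg)):
    print(name, single_phase(SHARDS, metric, 1),
          two_phase(SHARDS, metric, 1), exact(SHARDS, metric, 1))
```

With this data, ordering by `max` finds the right top term in a single phase, but with a partial doc count, and the second round repairs the count. Ordering by `avg` misses the true top term `b` in both variants, because `b` is never the local winner on any shard.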
Please open an issue on the Kibana repo if and when you start moving on this, as it will be a breaking change for many of our users.
Discussed in FixItFriday and we agreed to split out removing the ascending count option from this issue, so that conversation can be separated from removing support for sorting by sub aggregations (see #17614).
I had this problem (described in #23108), and increasing the size of the terms aggregation gives a more accurate result. Increasing size to a value bigger than the number of documents seems to give the correct result.
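For reference, a request using this workaround might be built as in the sketch below. The field names (`category`, `price`), the aggregation names (`by_term`, `avg_metric`), the helper function and the size value are all hypothetical; only the overall shape of the terms aggregation body (`field`, `size`, `order`, nested `aggs`) follows the Elasticsearch query DSL.

```python
import json


def terms_ordered_by_avg(term_field, metric_field, size):
    """Build a search body: terms agg ordered by a sub avg agg (hypothetical helper)."""
    return {
        "size": 0,  # no hits needed, only aggregation results
        "aggs": {
            "by_term": {
                "terms": {
                    "field": term_field,
                    # A larger size makes each shard return more buckets,
                    # which leaves less room for a top term to be pruned.
                    "size": size,
                    "order": {"avg_metric": "desc"},
                },
                "aggs": {
                    "avg_metric": {"avg": {"field": metric_field}},
                },
            }
        },
    }


body = terms_ordered_by_avg("category", "price", size=10_000)
print(json.dumps(body, indent=2))
```

The trade-off is cost: every shard has to build and ship more buckets, so this only stays practical while the number of distinct terms is modest.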
As noted on #17614 (comment), one possible policy is to error if numTermsReturned < cc @elastic/es-search-aggs
Folks have long since come to rely on ordering by sub-agg, so we think we're probably better off adding any optimizations that we can for it and doing what we can to make us more likely to produce correct answers. The trouble with these orderings is that they make it very hard to prune results and still be sure that we have the right results, both on the data nodes and on the coordinating nodes. We try to work around this with
Looking at this again, I think my comment from last time is still accurate. Folks rely on these, flawed or not. I don't think we can drop this. And we don't have time in the short term to do these accurately. I think, at least for a while, we're not going to do anything with this.
We try to be as flexible as possible when it comes to sorting terms aggregations. However, sorting by anything other than `_term` or descending `_count` makes it very hard to return the correct top buckets and counts, which is disappointing to users. Instead, I suggest that we only allow `order` options that result in reasonably accurate results, including ordering by a `min` or `max` aggregation: in that case counts will not be accurate, but I believe that the top buckets will be correct.
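The proposed restriction could be expressed as a small validation rule. The Python sketch below is purely illustrative: `is_allowed_order` and its argument shapes are hypothetical, and it is not how Elasticsearch validates requests. It accepts `_term` in either direction, `_count` only when descending, and a sub aggregation only when its type is `min` or `max`.

```python
ACCURATE_SUB_AGG_TYPES = ("min", "max")


def is_allowed_order(order, sub_aggs):
    """Check one order clause against the proposed policy (hypothetical helper).

    order:    single-entry dict such as {"_count": "desc"}
    sub_aggs: dict mapping sub aggregation name -> aggregation type
    """
    if len(order) != 1:
        raise ValueError("expected exactly one order key")
    (key, direction), = order.items()
    if direction not in ("asc", "desc"):
        raise ValueError(f"unknown direction: {direction!r}")
    if key == "_term":
        return True  # term order can be resolved exactly on every shard
    if key == "_count":
        return direction == "desc"  # ascending count is the problematic case
    # Otherwise the key must name a min or max sub aggregation.
    return sub_aggs.get(key) in ACCURATE_SUB_AGG_TYPES


print(is_allowed_order({"_count": "asc"}, {}))
print(is_allowed_order({"max_price": "desc"}, {"max_price": "max"}))
print(is_allowed_order({"avg_price": "desc"}, {"avg_price": "avg"}))
```

Under this rule, ordering by a sub `avg` aggregation is rejected, which is exactly the breaking change the comments in this thread are concerned about.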