-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filtering on results of an aggregation #4404
Comments
We also need a way to filter a multi valued aggregate down to a single value so we don't have to get so much data back. We retrieve values like biggest month, smallest month, busiest month, slowest month, etc, and I was hoping to at least be able to do something like below, but there is no "size" field on the histograms, so even though the results are ordered the way I want, I have to get back thousands of results just so I can print a single number on a website. {
"query": { "match_all": {} },
"aggs": {
"biggest_month": {
"date_histogram": {
"field": "PO.IssuedDate",
"interval": "month",
"order": { "sum_of_po_totals": "desc" },
"format": "yyyy-MM",
"size": 1 // no size???
},
"aggs": {
"sum_of_po_totals": {
"sum": { "field": "PO.IssuedAmount" }
}
}
},
"smallest_month": {
"date_histogram": {
"field": "PO.IssuedDate",
"interval": "month",
"order": { "sum_of_po_totals": "asc" },
"format": "yyyy-MM",
"size": 1 // no size???
},
"aggs": {
"sum_of_po_totals": {
"sum": { "field": "PO.IssuedAmount" }
}
}
},
"busiest_month": {
"date_histogram": {
"field": "PO.IssuedDate",
"interval": "month",
"order": { "_count": "desc" },
"format": "yyyy-MM",
"size": 1 // no size???
}
},
"slowest_month": {
"date_histogram": {
"field": "PO.IssuedDate",
"interval": "month",
"order": { "_count": "asc" },
"format": "yyyy-MM",
"size": 1 // no size???
}
}
},
"size": 0
} Better yet, I would love to see the single value aggregations (or something like them) work on multi value aggregations as a filter (which is currently not being used for anything), so something like this: {
"query": { "match_all": {} },
"aggs": {
"biggest_month": {
"max": { "field": "by_month.sum_of_po_totals" }
},
"smallest_month": {
"min": { "field": "by_month.sum_of_po_totals" }
},
"busiest_month": {
"max": { "field": "by_month.count_of_pos" }
},
"slowest_month": {
"min": { "field": "by_month.count_of_pos" }
},
"aggs": {
"by_month": {
"date_histogram": {
"field": "PO.IssuedDate",
"interval": "month",
"format": "yyyy-MM"
},
"aggs": {
"sum_of_po_totals": {
"sum": { "field": "PO.IssuedAmount" }
},
"count_of_pos": {
"value_count": { "field": "PO.ID" }
}
}
}
}
},
"size": 0
} Of course, the "having" modifier mentioned originally would also be fantastic for doing range queries, etc on these aggregate results as well, and would also solve our issue if we could use a "max" aggregate in a "having" clause (even though I personally find it a little less intuitive to have it at the bottom of the hierarchy, but I'm guessing it would probably be easier to implement that way since it is still a single valued aggregate operating under a multi-valued bucket aggregate). No matter what the solution, this is still a pretty massive hole for our needs in an otherwise awesome aggregations framework, which is unfortunate. |
If I well understand this issue, you would like to have the equivalent of the |
The I think what we are looking for is a way to further filter the result of the aggregation after the fact (i.e. the MAX of the SUM). |
Also, would it be possible to apply aggregation in the query filter? We need to limit search result based on the aggregation result. For Example, return all the employees in a department "dep1" whose salary is less than the average salary of a department "dep1". |
A filter similar to "having" would be useful, I have been trying to find groups of documents with more than a certain number of matches overall. (i.e. SELECT id ... GROUP BY id HAVING COUNT(id) > 4) |
Are there any plans to implement such a feature? We really need this for one of our "big data" projects. Are there any approaches to implement this via a plugin? I would be very thankful for any hints. |
We are willing to provide financial support for this feature if that helps. Just let me know how to get in contact. |
+1. This feature is very much needed. Happy to provide resources to help if necessary. |
Mark. We also want to migrate our database to ES but stopped by this issue. |
+1 |
7 similar comments
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
+1 as well. |
+1 |
+1 this functionality would be very useful |
+1 |
1 similar comment
+1 |
+1 |
2 similar comments
👍 |
+1 |
Closing in favour of #8110 |
+1 |
5 similar comments
+1 |
+1 |
+1 |
+1 |
+1 |
1+ |
+1 |
1 similar comment
+1 |
Important to note that this was closed in favor of #8110 - which was closed as too complicated. It seems like this might be solved in 2.0 with |
@yehosef have a look at the bucket selector agg |
@clintongormley - great, thanks for the reference. |
The case here can best be described by referring to the HAVING clause in SQL. Basically we need to be able do an aggregation but limit the results of the query based on the value of the aggregation. For example if I wanted to look at what age bands have an average height greater that 5 feet, my query would look something like this:
The text was updated successfully, but these errors were encountered: