Reducers - Post processing of aggregation results #8110
+1
Hi @colings86, will it allow for expressions against aggregated values? Say I need to calculate a metric: would a transformation that merges buckets from one agg into another (adding new buckets and replacing existing ones) be possible? My use cases for it are:
Thanks
Also see #4983 for other use cases.
+1 :) |
@roytmana currently the implementations of the reducers are still being discussed, but the design does allow for creating new buckets (which could be a merge of existing buckets) but not for modifying existing buckets.
+1 |
Thank you @colings86. Is replacement of an existing bucket considered to be modification? I would not want to modify an existing bucket but rather replace it altogether. I think combining two aggregation results is a hugely useful use case (I outlined just a couple of cases in my previous post which I, and others who build generic interactive solutions on ES, would be interested in).
I am trying to implement the Union Reducer and have hit a dilemma which I want some opinions on. Basically, reducers need to be able to access and use multiple aggregations from anywhere in the aggregation tree. Thinking for the moment only about top-level reducers, I think it would be good if the logic to extract the aggregations was done in the Reducer class so that the implementations don't have to worry about doing it. That would mean the doReduce method would have a list of aggregations passed to it rather than a single aggregation. The problem is that this makes reducers which only care about one aggregation a bit clunky, since they each have to check that the list contains only one aggregation and then get it. It's not a lot of code, but it will get repeated in a lot of places. I think the options are:
Interested to hear people's thoughts on this.
I am actually not sure if we need to care about reducers that only take one aggregation. Instead we might just implement all the "single" reducers so that they can also operate on an arbitrary number of aggregations and then return the result for any of the curves. For example, a reducer running on the following aggregation could return one or several results depending on whether there are one or more terms in the terms aggregation:
+1 |
+1. Would love the ability to do a simple metric aggregation on bucket fields. For example, give me the average doc_count, etc.
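This is essentially what later shipped as the `avg_bucket` pipeline aggregation (Elasticsearch 2.0+). A sketch averaging the `doc_count` of a date histogram's buckets; index and field names are made up, `_count` in `buckets_path` refers to each bucket's doc count, and older versions use `interval` instead of `calendar_interval`:

```json
{
  "size": 0,
  "aggs": {
    "per_month": {
      "date_histogram": { "field": "timestamp", "calendar_interval": "month" }
    },
    "avg_monthly_docs": {
      "avg_bucket": { "buckets_path": "per_month>_count" }
    }
  }
}
```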
yes. yes yes yes. |
Having started to implement bucket reducers using the style of API above, we have found the API to be overly complex and very verbose for performing even straightforward tasks (such as a simple derivative of a metric). A lot of this is to do with a user having to carefully build their aggregations and then having to define multiple reducers and link them to the aggregations. We found that the only way to build reducer requests with any confidence is to first build the aggregations and then inspect the response to work out what the reducers request should look like. This is not what we want.

We would like an API where reducers can be defined at the same time as the aggregations, with a structure which is intuitive from the aggregation request structure (i.e. you shouldn't need to see the aggregations response to be able to write your reducers request). So, we have rethought the API, going back to a design more native to the existing aggregations framework, so that reducers look and feel like just another aggregation.

Now there is a new type of aggregation which takes the output of an aggregation tree and computes a new aggregation tree based on these results. The original sub-aggregation tree is destroyed in the process. This can be used to create functionality such as derivatives and moving averages where the original data is not needed in the final output. With this new aggregation functionality we can add even more aggregation implementations. The first implementation of this will be the derivative (#9293).
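For reference, the derivative from #9293 ultimately looks like just another aggregation, nested alongside the metric it differentiates. A sketch with made-up index and field names (older versions use `interval` instead of `calendar_interval`):

```json
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": { "field": "date", "calendar_interval": "month" },
      "aggs": {
        "sales": { "sum": { "field": "price" } },
        "sales_deriv": { "derivative": { "buckets_path": "sales" } }
      }
    }
  }
}
```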
👍 |
So there is no option to directly get the division between two aggregations? I have this search query, and I wonder if it's possible to obtain the division between the balanceIncrement sums after the nested aggregations are done. Ideally I want to calculate the payout: sum of balanceIncrement where type = "prizes" divided by sum of balanceIncrement where type in ("play","ext"), grouped by room and then grouped by day. What I have now is that the summation is done, but I can't find a way to then divide between them and get the payout. This is my query: /fb/balance_trasnfer/_search?pretty {
"query" : {
"filtered": {
"filter": {
"range" : {
"transferDate" : {
"from" : "${BAL_TRANSFER_DATE_INIT}",
"to" : "${BAL_TRANSFER_DATE_PLUS_7_DAYS}"
}
}
}
}
},
"size": 0,
"aggs" : {
"transfers_over_day" : {
"date_histogram" : {
"field" : "transferDate",
"interval" : "day"
},
"aggs": {
"group_by_room": {
"terms": {
"field": "roomName"
},
"aggs" : {
"wins-and-bets" : {
"filters" : {
"filters" : {
"win" : { "term" : { "type" : "prize"}},
"bet" : { "term" : { "type" : ["play","ext"]}}
}
},
"aggs" : {
"sum_balance": {
"sum" : {"field":"balanceIncrement"}
}
}
}
}
}
}
}
}
}'
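This wasn't possible in the Elasticsearch versions current at the time, but with pipeline aggregations (2.0+) the payout ratio can be computed server side with `bucket_script`. A sketch of the inner part of the query above, splitting the `filters` agg into two separate filter aggs and dividing their sums; the `params.` script syntax assumes a recent (Painless-scripted) version:

```json
"aggs": {
  "group_by_room": {
    "terms": { "field": "roomName" },
    "aggs": {
      "wins": {
        "filter": { "term": { "type": "prize" } },
        "aggs": { "sum_balance": { "sum": { "field": "balanceIncrement" } } }
      },
      "bets": {
        "filter": { "terms": { "type": ["play", "ext"] } },
        "aggs": { "sum_balance": { "sum": { "field": "balanceIncrement" } } }
      },
      "payout": {
        "bucket_script": {
          "buckets_path": { "w": "wins>sum_balance", "b": "bets>sum_balance" },
          "script": "params.w / params.b"
        }
      }
    }
  }
}
```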
👍 |
+1
Would be great to be able to filter out top buckets having the "SUCCESS" in the sub-aggregation's success field. |
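This kind of bucket filtering later became possible with the `bucket_selector` pipeline aggregation (Elasticsearch 2.0+). A sketch assuming hypothetical `request_id` and `status` fields: count the SUCCESS documents per bucket with a `filter` sub-aggregation, then keep only buckets where that count is non-zero:

```json
{
  "size": 0,
  "aggs": {
    "by_request": {
      "terms": { "field": "request_id" },
      "aggs": {
        "successes": { "filter": { "term": { "status": "SUCCESS" } } },
        "keep_successful": {
          "bucket_selector": {
            "buckets_path": { "ok": "successes._count" },
            "script": "params.ok > 0"
          }
        }
      }
    }
  }
}
```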
Why has this issue been closed?
As described in the comment above (#8110 (comment)), the approach and API proposed in this issue turned out to be too complex and not user friendly. Work is still continuing to provide this kind of functionality, but with a better, more intuitive API. The start of this is the implementation of derivatives in #9293.
Thank You, managed to miss that comment. |
👍 |
Hi all, I have a requirement to aggregate on 3 fields. I need to perform a range operation on the doc_count value of the last aggregation. Is there a way I can accomplish this? Let us say my query looks like the below - "aggregations" : { I want to explore the possibilities of filtering the results based on the doc_count value of the last aggregation (g3). Below is the result - Thanks.
While I see why reducers can create an overly complex API, I don't think the specialized route is any better. There are many actions I may want to perform on the resulting buckets from an aggregation besides the derivative. The derivative approach here (#9293) is simple and easy, but very specific. How do you plan to cover all the potential use cases for this? For instance, I may want the entire filter API to filter the buckets returned from an aggregation. In my particular case, I want to only find the buckets with "doc_count" equal to 1. But I can see many more complex use cases for these "meta-filters". Edit: I just found this #9876, which might answer how you're planning for other use cases. But I don't see an answer for my particular use case. Those seem to be aggregating aggregation results, not filtering them. Any idea how I can resolve my use case? |
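For the record, this particular use case was later covered by the `bucket_selector` pipeline aggregation (Elasticsearch 2.0+), which drops buckets that fail a script condition. A sketch keeping only buckets with exactly one document; the field name is made up, and `"_count"` is the special `buckets_path` referring to a bucket's doc count:

```json
{
  "size": 0,
  "aggs": {
    "by_fingerprint": {
      "terms": { "field": "fingerprint" },
      "aggs": {
        "singletons_only": {
          "bucket_selector": {
            "buckets_path": { "count": "_count" },
            "script": "params.count == 1"
          }
        }
      }
    }
  }
}
```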
We plan to add a
Anything that these post-processing aggs can do, can also be done on the client. So the major benefit of adding these is convenience for the user. We'll keep adding aggs that are commonly useful, but if something is really specific, then it can still be done client side. |
Actually no, not really. So, no, I don't want to transfer tens of megabytes of aggregated data from Elasticsearch into my application just to be able to return to my consumer a few figures obtainable by further aggregating the aggregations. No, I would very much like the aggregations of aggregations to take place on the Elasticsearch coordinating node, as the application delivers high-level statistics.
@acarstoiu thank you for your essay. If, instead of venting your spleen, you'd followed some of the links above, you would have discovered the wonder that is pipeline aggregations: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline.html Please do remember that there are real humans here, with real feelings, writing software which you are using. |
I'm aware of #9876, I just wanted to point out that your recommendation is evolutionary wrong. I hope this time was short enough for you (and your feelings). Last, but not least, thank you for the link, you're a good writer too! |
Hi! Why does an Elasticsearch sum agg over a timestamp field show a different result via the API than in the Kibana interface?
Please ask questions like this on the https://discuss.elastic.co/ forums, this issues list is used for tracking bugs and feature requests. Also, so people can help you with your problem you will need to include a lot more detail about the problem you are seeing. Please read the following on how best to ask questions on the forum to get the best help: https://www.elastic.co/help |
I am using elasticsearch-1.5.1 and I want to apply a filter over derived filters. I tried using reducers, but I guess 1.5.1 doesn't support reducers. Please suggest a solution.
@nishasingla1224 The "bucket reducer" functionality was added to Elasticsearch 2.0 under the name "Pipeline Aggregations". You can read the docs here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html You will need to upgrade to 2.0+ to use them, however. |
@acarstoiu Now, is there a way to just return the size of the aggregation bucket, without the bucket content?
Hi everyone, +1 for @shuangshui's question.
I am also interested in the answer to @shuangshui's question. |
@shuangshui @jrj-d @sourcec0de You can potentially use response filtering via the `filter_path` request parameter. Still, it can help to cut down on very large responses, especially when you're just interested in some kind of metric calculated over a bunch of histograms, for example.
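The response filtering being referred to is the `filter_path` request parameter. For example (index and agg names are hypothetical), a request like this returns only the selected parts of the aggregation tree and drops everything else from the response body:

```
GET /sales/_search?size=0&filter_path=aggregations.per_month.buckets.key_as_string,aggregations.per_month.buckets.doc_count
```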
Overview
This feature would introduce a new 'reducers' module, allowing for the processing and transformation of aggregation results. The current aggregation module is very powerful and can compute varied and complex analytics but is limited when calculating analytics which depend on numerous independently calculated aggregated metrics. Currently, the only way to calculate these types of complex metrics is to write client side logic. The reducer module will add a new phase in the search request where the coordinating node can pass the reduced aggregations for further processing.
Key Concepts & Terminology
reducer - takes a `bucket aggregation` and uses its set of buckets to perform calculations and generate a new `aggregation` which represents either a single answer (a new `metric aggregation`) or a new set of buckets (a new `bucket aggregation`). Reducers run after aggregations and act on the result tree returned by aggregations or the result tree of previous reducers. There are 2 types of `reducer`:
bucket reducer - operates on a set of buckets (known as `selections`). A `bucket reducer` may create new buckets to add to the `selections` but is not able to modify existing buckets.
metric reducer - produces a single answer (a new `metric aggregation`) from its input.
Reducer Structure
The following snippet captures the basic structure of reducers:
Some rules for reducer requests:
_root.path.to.value
Use Cases
Some possible implementations of reducers are detailed below:
Bucket Reducers
Metric Reducers
Example
As an example, imagine that you have sales data in an index and you would like to calculate the derivative of the price field over time. This is useful for determining trends in the data, such as whether sales are increasing linearly over time.
You might start with the following aggregation:
The results of these aggregations might look something like the following:
The sliding window reducer could then be used to request time windows of two months. A nested delta reducer inside the sliding window reducer could be used to calculate the rate of change of the price for each two-month selection.
Which would produce the following response (derivative value is shown in price per month):