Can't use Date or String values in `bucket_selector` and `bucket_script` pipeline aggregations #23874
Bleh. I think this is a thing that'd be fixed pretty well with the script contexts we keep talking about. In that case we'd compile the script against one of a couple of interfaces (returning a double, returning a date, returning a string) and then adapt them to something useful for aggs. Or something like that. Without them (because they aren't coming quickly) we could add a couple more instanceof checks....

I don't really want to add the instanceof checks directly to that method since, as I mentioned, the rest of the pipeline aggregations rely on the value being a number.
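For illustration only, here is a rough sketch of the kind of `instanceof` adaptation discussed above. This is hypothetical code, not the actual `BucketHelpers` implementation: the helper name is made up, and exposing dates as epoch millis is an assumption about what "something useful for aggs" could mean.

```java
import java.time.ZonedDateTime;

// Hypothetical helper, not actual Elasticsearch code: adapts a resolved
// bucket property so bucket_script/bucket_selector could consume non-numeric
// keys, while numbers keep flowing through the existing double-based path.
final class BucketValueAdapter {

    static Object adapt(Object propertyValue) {
        if (propertyValue instanceof Number) {
            return ((Number) propertyValue).doubleValue(); // existing numeric route
        }
        if (propertyValue instanceof ZonedDateTime) {
            // date_histogram / date_range keys: expose as epoch millis (assumption)
            return (double) ((ZonedDateTime) propertyValue).toInstant().toEpochMilli();
        }
        if (propertyValue instanceof String) {
            return propertyValue; // terms keys: pass the string through unchanged
        }
        throw new IllegalArgumentException(
            "unsupported buckets_path value type: [" + propertyValue.getClass().getName() + "]");
    }
}
```

Keeping the adaptation outside `resolveBucketValue()` would preserve the numeric guarantee that the other pipeline aggregations rely on.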
@elastic/es-search-aggs
Ran into this and had a look at the

Uhm... and I would replace the "high" with a "low" in that label.

Also related: recent discussions about exposing the string key of geo tiles: #39957 (comment)
Hi! I was just curious whether this is still being worked on, or whether it already works in ES 7.3?

I'm curious too
Just ran into this issue while trying to calculate durations between events over time, and was wondering if this is a workaround or if I'm misunderstanding the result. Given a set of documents with a `@timestamp` field, this query fails:

```json
{
  "size": 0,
  "aggs": {
    "permin": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "minute",
        "min_doc_count": 1,
        "keyed": false
      },
      "aggs": {
        "diff": {
          "serial_diff": {
            "buckets_path": "_key",
            "lag": 1
          }
        }
      }
    }
  }
}
```

with the error:

```json
{
  "type" : "aggregation_execution_exception",
  "reason" : "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [ZonedDateTime] at aggregation [_key]"
}
```

while this one, with a `min` sub-aggregation added and `buckets_path` pointing at it, works:

```json
{
  "size": 0,
  "aggs": {
    "permin": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "minute",
        "min_doc_count": 1,
        "keyed": false
      },
      "aggs": {
        "minpermin": {
          "min": {
            "field": "@timestamp"
          }
        },
        "diff": {
          "serial_diff": {
            "buckets_path": "minpermin",
            "lag": 1
          }
        }
      }
    }
  }
}
```
Related to #54110
Previously we were not able to execute queries like the one below, where we filter on the groupings of the `Aggregate`s. The reason is a limitation of the `bucket_selector`'s `bucket_path` [0], which cannot filter on non-numerical keys. The implemented optimization transforms the following query:

```sql
SELECT i, a FROM (
  SELECT int as i, AVG(float) as a FROM test GROUP BY int
) WHERE i = 1
```

into one equivalent to this:

```sql
SELECT i, a FROM (
  SELECT int as i, AVG(float) as a FROM test WHERE int = 1 GROUP BY int
)
```

The latter query can be translated to Query DSL. The change also makes it possible to use a query with HAVING in a subselect, with filters in the outer query. We can have multiple levels of queries on a single GROUP BY, with filters on multiple levels. In the end the filters are collapsed: the conditions on the aggregates are transformed into an equivalent of HAVING, while the conditions on the groupings are pushed below the GROUP BY query.

Note: because of the limitation [0] mentioned above, (sub-)conditions that use expressions on both the groupings and the aggregates of the GROUP BY and that are unsplittable (the groupings and the aggregates are not in separate children of a conjunction) cannot be handled. For example, the following query fails with a VerifierException:

```sql
SELECT * FROM (
  SELECT int, count(*) as cnt FROM test GROUP BY int
) WHERE int > 2 OR cnt > 1
```

because `int > 2 OR cnt > 1` filters on both the groupings (`int > 2`) and the aggregates (`cnt > 1`, which translates to `count(*) > 1`).

Note: queries like

```sql
SELECT MAX(int) FROM test WHERE MAX(int) > 10
```

are still not allowed (HAVING should be used instead).

Also fixes elastic#69758 by making the queries work via the filter push-down, but note that there are changes in the `EXPLAIN (PLAN ANALYZED)` outputs (a `SubqueryAlias` node appears).

[0] elastic#23874
Pinging @elastic/es-analytics-geo (Team:Analytics)
I had a look at this, trying to implement it using something very similar to the second bullet point, but I see the following. The logic that resolves the value pointed to by the

As a result, I believe the actual issue is in having
Pinging @elastic/es-analytical-engine (Team:Analytics)
Reported via Discuss forum: https://discuss.elastic.co/t/bucket-selector-aggregation-on-date-histogram--key/80986
The `bucket_script` and `bucket_selector` aggregations use the `BucketHelpers.resolveBucketValue()` method to get the `bucket_path` values from the buckets. That method requires the return value to be a `Double`, so currently all `bucket_path`s must point at numeric values. This presents a problem when trying to use the `_key` of a bucket, since the key might be a DateTime (in the case of the `date_histogram` or `date_range` aggs) or a String (in the case of the `terms` agg).

One thing to note here is that all current pipeline aggs except `bucket_script` and `bucket_selector` require the value from the `bucket_path` to be a double, so whatever the solution to this bug, we should maintain a route that guarantees a double is returned.

I have a few thoughts on how to solve this, but I don't know yet which (if any) of them is a good idea:

- Change `BucketHelpers.resolveBucketValue()` to have a generic return type and somehow check that the type is compatible before returning (not sure if this is possible, since generics are not available at runtime)
- Add a `public Object resolveBucketObject()` method in `BucketHelpers` that just returns the value it gets from the bucket, as long as it's not an instance of `InternalAggregation` (so that it's an actual value rather than an aggregation); a sketch of this option follows the list
- Have the `bucket_script` and `bucket_selector` aggs get the bucket path values directly using `Bucket.getProperty()`
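To make the second option concrete, here is a minimal, self-contained sketch. All names are hypothetical stand-ins: the `Map` plays the role of a bucket's resolved properties, and `AggregationResult` plays the role of `InternalAggregation`; the real method would live in `BucketHelpers`.

```java
import java.util.Map;

// Hypothetical sketch of the proposed resolveBucketObject() idea, not the
// actual BucketHelpers code. The Map stands in for a bucket's resolved
// properties and AggregationResult stands in for InternalAggregation.
final class ResolveBucketObjectSketch {

    /** Marker interface standing in for Elasticsearch's InternalAggregation. */
    interface AggregationResult {}

    /**
     * Returns the raw value the buckets_path points at (a Double, ZonedDateTime,
     * String, ...), rejecting anything that is still an aggregation object.
     */
    static Object resolveBucketObject(Map<String, Object> bucketProperties, String bucketsPath) {
        Object property = bucketProperties.get(bucketsPath);
        if (property instanceof AggregationResult) {
            throw new IllegalArgumentException(
                "buckets_path [" + bucketsPath + "] must reference a value, not an aggregation");
        }
        return property; // may be null when the path does not resolve
    }
}
```

Under this option, the existing `resolveBucketValue()` would stay untouched, so the other pipeline aggregations keep their guaranteed-double route.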