-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark shard failures caused by unsupported aggregations or queries against rolled up data so Kibana can identify them #89252
Mark shard failures caused by unsupported aggregations or queries against rolled up data so Kibana can identify them #89252
Conversation
To start with we decided that we would like to support only date histogram aggregations using 'fixed_interval' and throw an error if 'calendar_interval' is used on a time series index.
Pinging @elastic/es-analytics-geo (Team:Analytics) |
server/src/main/java/org/elasticsearch/ElasticsearchException.java
Outdated
Show resolved
Hide resolved
.../rollup/qa/rest/src/yamlRestTest/resources/rest-api-spec/test/rollup/20_unsupported_aggs.yml
Outdated
Show resolved
Hide resolved
.../org/elasticsearch/search/aggregations/bucket/histogram/DateHistogramAggregationBuilder.java
Outdated
Show resolved
Hide resolved
Date histogram aggregations must fail if 'calendar_interval' or a non-utc time zone is used, only if the index the aggregation is running on is a rollup index. For time series indices, date histogram aggregations should work as expected without throwing any error.
@@ -127,6 +128,15 @@ public class IndexMetadata implements Diffable<IndexMetadata>, ToXContentFragmen | |||
EnumSet.of(ClusterBlockLevel.WRITE) | |||
); | |||
|
|||
public boolean isRollupIndex() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the feeling this is not the best I can do here. I am working on a better way to do this at the moment but I wanted to push this change because it might be good enough. If that feeling is shared but this is considered "good enough" we could proceed merging this to unblock Kibana folks experimenting with catching exceptions. Then I can create another issue to refactor this. N.B. I am on vacation next week and Christos is on vacation too and I would like to avoid Kibana folks being stuck waiting for this.
As a result, if required, I can work on this on another PR whose purpose would be to refactor this "isRollupIndex" logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's cool as a temporary thing. Maybe a TODO
in the code so we know we thought it was temporary when we added it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good to me, left a nit about another test we could add
- match: { error.root_cause.0.reason: "Field [total_memory_used] of type [aggregate_metric_double] is not supported for aggregation [percentiles]" } | ||
|
||
--- | ||
"Top-metrics aggregation on aggregate_metric_double field": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether out of scope but what about a test for the mixed case (as this will be extremely common in practice)? Querying both indices at the same time, resulting in one of them succeeding and the other producing a shard failure so there's partial data in the response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that is a good test I think...I am not sure what the result is in this case. I guess we hava a shard failure for the rollup index and partial results coming from the time series index...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is not necessary for the errors we throw if the aggregation is not supported (aggregation on field of type agrgegate_metric_double
). In that case we don't check if the index is a time series index or not...just the field type. Also when doing a rollup on a field, lat's say a histogram aggregation on a gauge field called test_field. In that case we create the rollup index and a new field that is called test_field_histogram. As a result it would not be possible to run a query on both indices on the same field because the field name in the rollup index would be different 'by construction' (original_field_name + aggregation_name).
Anyway, it is a good test for the date histogram case...in that case indeed the field is the timestamp field which is the same in both the source index and the rollup index.
Hitting both indices produces partial results coming from the source time series index and a shard failure coming from the rollup index.
calendar_interval: hour | ||
min_doc_count: 1 | ||
|
||
- match: { _shards.total: 2 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: here we rely on the fact that the rollup index is created with the same number of shards of the original index, which is set to 1 in the setup block.
@@ -0,0 +1,250 @@ | |||
setup: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to change this test adding the logic to create the rollup index from the source index because some settings and metadata are applied by the rollup operation and it is not possible to set them from the yaml test. The two settings I use are the source index name and the rollup status which I check to verify if the index is a rollup index or not.
@@ -127,6 +128,15 @@ public class IndexMetadata implements Diffable<IndexMetadata>, ToXContentFragmen | |||
EnumSet.of(ClusterBlockLevel.WRITE) | |||
); | |||
|
|||
public boolean isRollupIndex() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's cool as a temporary thing. Maybe a TODO
in the code so we know we thought it was temporary when we added it.
String valuesSourceDescription, | ||
String aggregationName | ||
) { | ||
if (indexSettings.getIndexMetadata().isRollupIndex() && DateIntervalWrapper.IntervalTypeEnum.CALENDAR.equals(intervalType)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want this test for all time rolled up indices. Not that there are any non-time series ones, but I think the addition of isRollupIndex
means we don't need these new methods in IndexMode
any more.
final boolean rollupSuccess = IndexMetadata.RollupTaskStatus.SUCCESS.name() | ||
.toLowerCase(Locale.ROOT) | ||
.equals(indexRollupStatus != null ? indexRollupStatus.toLowerCase(Locale.ROOT) : IndexMetadata.RollupTaskStatus.UNKNOWN); | ||
return Strings.isNullOrEmpty(sourceIndex) == false && rollupSuccess; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a duplicate.
* Downsampling uses specific types while aggregating some fields (like 'aggregate_metric_double'). | ||
* Such field types do not support some aggregations. | ||
*/ | ||
public class UnsupportedAggregationOnDownsampledField extends AggregationExecutionException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably worth leaving a note that the name of this class is part of a contract with Kibana so we shouldn't change it.
from: 401.0 | ||
|
||
- match: { status: 400 } | ||
- match: { error.root_cause.0.type: unsupported_aggregation_on_downsampled_field } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably worth leaving a comment here that this type
name is part of a contract with kibana. Just extra paranoia so folks don't change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a very quick look at the code. I only left a small comment about naming the exception, because this is something we cannot change in the future.
* Downsampling uses specific types while aggregating some fields (like 'aggregate_metric_double'). | ||
* Such field types do not support some aggregations. | ||
*/ | ||
public class UnsupportedAggregationOnDownsampledField extends AggregationExecutionException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we name the action "rollup" everywhere in the code, what about naming this exception UnsupportedAggregationOnRollupField
?
Also, since this is more generic than a rollup field we can name it UnsupportedAggregationOnRollupIndex
I've merged this so @flash1293 should be able to grab it sooner rather than later. |
FYI @tsullivan @ppisljar - you should be able to wire this up in your warnings handling pr |
When an unsupported aggregation is executed on a
aggregate_double_metric
field an error with a specifictype
field is generated. This allows clients like Kibanato identify these errors and handle them properly.
Also we would like to fail on queries using a date histogram
aggregation with
calendar_interval
on rollup indices.Date histogram aggregations are executed on rollup indices
only if using
fixed_interval
and aUTC
timezone.