Mark shard failures caused by unsupported aggregations or queries against rolled up data so Kibana can identify them #89252

salvatore-campagna · 2022-08-10T16:13:39Z

When an unsupported aggregation is executed on an
aggregate_metric_double field, an error with a specific
type field is generated. This allows clients like Kibana
to identify these errors and handle them properly.

We would also like to fail on queries using a date histogram
aggregation with calendar_interval on rollup indices.
Date histogram aggregations are executed on rollup indices
only if they use fixed_interval and a UTC time zone.

To start with we decided that we would like to support only date histogram aggregations using 'fixed_interval' and throw an error if 'calendar_interval' is used on a time series index.

elasticsearchmachine · 2022-08-10T16:16:28Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

server/src/main/java/org/elasticsearch/ElasticsearchException.java

.../rollup/qa/rest/src/yamlRestTest/resources/rest-api-spec/test/rollup/20_unsupported_aggs.yml

.../org/elasticsearch/search/aggregations/bucket/histogram/DateHistogramAggregationBuilder.java

Date histogram aggregations must fail when 'calendar_interval' or a non-UTC time zone is used, but only if the index the aggregation runs on is a rollup index. For time series indices, date histogram aggregations should work as expected without throwing any error.
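The rule above can be sketched as a small standalone check. This is only an illustration under stated assumptions, not the code in this PR: the class `RollupDateHistogramCheck`, the enum `IntervalType`, the `validate` method and the error messages are all hypothetical names made up for the example. A null time zone is treated as "not set" and is allowed.

```java
import java.time.ZoneId;
import java.time.ZoneOffset;

public final class RollupDateHistogramCheck {

    /** Hypothetical stand-in for the interval kind of a date histogram request. */
    enum IntervalType { FIXED, CALENDAR }

    /**
     * Returns an illustrative error message when the request should fail,
     * or null when it is allowed. A null time zone means "not set".
     */
    static String validate(boolean isRollupIndex, IntervalType intervalType, ZoneId timeZone) {
        if (isRollupIndex == false) {
            return null; // non-rollup indices (including time series indices) are not restricted
        }
        if (intervalType == IntervalType.CALENDAR) {
            return "calendar_interval is not supported on a rollup index";
        }
        if (timeZone != null && isUtc(timeZone) == false) {
            return "time zone [" + timeZone + "] is not supported on a rollup index";
        }
        return null;
    }

    /** True for UTC and equivalent fixed zero-offset zones such as "Z". */
    static boolean isUtc(ZoneId zone) {
        return ZoneOffset.UTC.equals(zone.normalized());
    }

    public static void main(String[] args) {
        System.out.println(validate(true, IntervalType.CALENDAR, null));
        System.out.println(validate(true, IntervalType.FIXED, ZoneId.of("Europe/Rome")));
        System.out.println(validate(false, IntervalType.CALENDAR, ZoneId.of("Europe/Rome")));
    }
}
```

The check is keyed on the rollup flag first, so the same request is accepted on a time series index and rejected on a rollup index.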

salvatore-campagna · 2022-08-18T10:37:25Z

server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java

@@ -127,6 +128,15 @@ public class IndexMetadata implements Diffable<IndexMetadata>, ToXContentFragmen
        EnumSet.of(ClusterBlockLevel.WRITE)
    );

+    public boolean isRollupIndex() {


I have the feeling this is not the best I can do here. I am working on a better way to do this at the moment, but I wanted to push this change because it might be good enough. If that feeling is shared but this is considered "good enough", we could proceed with merging this to unblock the Kibana folks experimenting with catching exceptions. Then I can create another issue to refactor this. N.B. I am on vacation next week and Christos is on vacation too, and I would like to avoid the Kibana folks being stuck waiting for this.

As a result, if required, I can work on this in another PR whose purpose would be to refactor this "isRollupIndex" logic.


flash1293

Looks mostly good to me, left a nit about another test we could add

flash1293 · 2022-08-18T10:37:25Z

.../rollup/qa/rest/src/yamlRestTest/resources/rest-api-spec/test/rollup/20_unsupported_aggs.yml

+  - match: { error.root_cause.0.reason: "Field [total_memory_used] of type [aggregate_metric_double] is not supported for aggregation [percentiles]" }
+
+---
+"Top-metrics aggregation on aggregate_metric_double field":


Not sure whether out of scope but what about a test for the mixed case (as this will be extremely common in practice)? Querying both indices at the same time, resulting in one of them succeeding and the other producing a shard failure so there's partial data in the response

Oh, that is a good test I think... I am not sure what the result is in this case. I guess we have a shard failure for the rollup index and partial results coming from the time series index...?

This test is not necessary for the errors we throw when the aggregation is not supported (an aggregation on a field of type aggregate_metric_double). In that case we don't check whether the index is a time series index or not, just the field type. Also, consider a rollup on a field, let's say a histogram aggregation on a gauge field called test_field. In that case we create the rollup index and a new field called test_field_histogram. As a result it would not be possible to run a query against the same field on both indices, because the field name in the rollup index is different 'by construction' (original_field_name + aggregation_name).

Anyway, it is a good test for the date histogram case...in that case indeed the field is the timestamp field which is the same in both the source index and the rollup index.

Hitting both indices produces partial results coming from the source time series index and a shard failure coming from the rollup index.
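A client could handle this mixed case roughly as sketched below. This is a hedged illustration, not Kibana's or Elasticsearch's code: `ShardFailureScan`, `ShardFailure`, `isPartial` and `indicesFailingWith` are hypothetical names, and the index names are made up. The marker type string is the one shown in the 20_unsupported_aggs.yml excerpt; this thread does not state whether the date histogram failure carries the same type, so the marker is passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class ShardFailureScan {

    /** Hypothetical, minimal view of one shard failure entry in a search response. */
    static final class ShardFailure {
        final String index;
        final String type;

        ShardFailure(String index, String type) {
            this.index = index;
            this.type = type;
        }
    }

    /** A response is partial when some, but not all, shards failed. */
    static boolean isPartial(int totalShards, int failedShards) {
        return failedShards > 0 && failedShards < totalShards;
    }

    /** Collects the distinct indices whose shard failures carry the given error type. */
    static List<String> indicesFailingWith(String markerType, List<ShardFailure> failures) {
        List<String> indices = new ArrayList<>();
        for (ShardFailure failure : failures) {
            if (markerType.equals(failure.type) && indices.contains(failure.index) == false) {
                indices.add(failure.index);
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        // Assumed marker: the type value from the YAML test excerpt in this PR.
        String marker = "unsupported_aggregation_on_downsampled_field";
        List<ShardFailure> failures = Arrays.asList(new ShardFailure("rollup-index", marker));
        System.out.println("partial: " + isPartial(2, 1));
        System.out.println("affected: " + indicesFailingWith(marker, failures));
    }
}
```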

salvatore-campagna · 2022-08-18T11:10:19Z

...in/rollup/qa/rest/src/yamlRestTest/resources/rest-api-spec/test/rollup/30_date_histogram.yml

+                calendar_interval: hour
+                min_doc_count: 1
+
+  - match: { _shards.total: 2 }


NOTE: here we rely on the fact that the rollup index is created with the same number of shards as the original index, which is set to 1 in the setup block.

salvatore-campagna · 2022-08-18T11:49:12Z

...in/rollup/qa/rest/src/yamlRestTest/resources/rest-api-spec/test/rollup/30_date_histogram.yml

@@ -0,0 +1,250 @@
+setup:


I had to change this test by adding the logic to create the rollup index from the source index, because some settings and metadata are applied by the rollup operation and cannot be set from the YAML test. The two settings I use are the source index name and the rollup status, which I check to verify whether the index is a rollup index.
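The check described here (a non-empty source index name plus a successful rollup status) could look roughly like the sketch below. It is a simplified standalone version and not the actual `IndexMetadata` code: the class `RollupIndexCheck` and the two-value enum are stand-ins defined only for this example, and the two settings are passed in as plain strings.

```java
import java.util.Locale;

public final class RollupIndexCheck {

    /** Stand-in enum; only the two values visible in the PR excerpt are modeled. */
    enum RollupTaskStatus { UNKNOWN, SUCCESS }

    /**
     * An index counts as a rollup index when it records a source index name
     * and its rollup status is "success" (compared case-insensitively).
     */
    static boolean isRollupIndex(String sourceIndexName, String rollupStatus) {
        if (sourceIndexName == null || sourceIndexName.isEmpty()) {
            return false; // no source index recorded, so not produced by a rollup
        }
        if (rollupStatus == null) {
            return false; // a missing status is treated like UNKNOWN
        }
        String expected = RollupTaskStatus.SUCCESS.name().toLowerCase(Locale.ROOT);
        return expected.equals(rollupStatus.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isRollupIndex("source-index", "success"));
        System.out.println(isRollupIndex("source-index", null));
        System.out.println(isRollupIndex("", "success"));
    }
}
```

Handling the null status up front avoids comparing a string against an enum constant, which can never be equal.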

nik9000 · 2022-08-18T15:42:18Z

server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java

@@ -127,6 +128,15 @@ public class IndexMetadata implements Diffable<IndexMetadata>, ToXContentFragmen
        EnumSet.of(ClusterBlockLevel.WRITE)
    );

+    public boolean isRollupIndex() {


I think it's cool as a temporary thing. Maybe a TODO in the code so we know we thought it was temporary when we added it.

nik9000 · 2022-08-18T15:43:52Z

server/src/main/java/org/elasticsearch/index/IndexMode.java

+            String valuesSourceDescription,
+            String aggregationName
+        ) {
+            if (indexSettings.getIndexMetadata().isRollupIndex() && DateIntervalWrapper.IntervalTypeEnum.CALENDAR.equals(intervalType)) {


I think we want this test for all rolled up indices. Not that there are any non-time series ones, but I think the addition of isRollupIndex means we don't need these new methods in IndexMode any more.

nik9000 · 2022-08-18T15:44:08Z

server/src/main/java/org/elasticsearch/index/IndexMode.java

+        final boolean rollupSuccess = IndexMetadata.RollupTaskStatus.SUCCESS.name()
+            .toLowerCase(Locale.ROOT)
+            .equals(indexRollupStatus != null ? indexRollupStatus.toLowerCase(Locale.ROOT) : IndexMetadata.RollupTaskStatus.UNKNOWN);
+        return Strings.isNullOrEmpty(sourceIndex) == false && rollupSuccess;


I think this is a duplicate.

nik9000 · 2022-08-18T15:44:42Z

...ain/java/org/elasticsearch/search/aggregations/UnsupportedAggregationOnDownsampledField.java

+ * Downsampling uses specific types while aggregating some fields (like 'aggregate_metric_double').
+ * Such field types do not support some aggregations.
+ */
+public class UnsupportedAggregationOnDownsampledField extends AggregationExecutionException {


It's probably worth leaving a note that the name of this class is part of a contract with Kibana so we shouldn't change it.
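To illustrate why the class name matters: the `type` value asserted in the YAML test (`unsupported_aggregation_on_downsampled_field`) is the snake_case form of the class name `UnsupportedAggregationOnDownsampledField`. Assuming the error type is derived from the class name in this way, a rename would silently change what clients see. The conversion below is a simplified re-implementation for illustration only; `ExceptionTypeName` and `toUnderscoreCase` are names chosen for this sketch, not Elasticsearch's actual code.

```java
public final class ExceptionTypeName {

    /** Converts a CamelCase class name to snake_case, one underscore per upper-case letter. */
    static String toUnderscoreCase(String camelCase) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < camelCase.length(); i++) {
            char c = camelCase.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    sb.append('_'); // word boundary before every capital except the first
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnderscoreCase("UnsupportedAggregationOnDownsampledField"));
    }
}
```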

nik9000 · 2022-08-18T15:47:12Z

.../rollup/qa/rest/src/yamlRestTest/resources/rest-api-spec/test/rollup/20_unsupported_aggs.yml

+                    from: 401.0
+
+  - match: { status: 400 }
+  - match: { error.root_cause.0.type: unsupported_aggregation_on_downsampled_field }


It's probably worth leaving a comment here that this type name is part of a contract with Kibana. Just extra paranoia so folks don't change it.

csoulios

I had a very quick look at the code. I only left a small comment about naming the exception, because this is something we cannot change in the future.

csoulios · 2022-08-19T05:58:52Z

...ain/java/org/elasticsearch/search/aggregations/UnsupportedAggregationOnDownsampledField.java

+ * Downsampling uses specific types while aggregating some fields (like 'aggregate_metric_double').
+ * Such field types do not support some aggregations.
+ */
+public class UnsupportedAggregationOnDownsampledField extends AggregationExecutionException {


Since we name the action "rollup" everywhere in the code, what about naming this exception UnsupportedAggregationOnRollupField?

Also, since this is more generic than a rollup field we can name it UnsupportedAggregationOnRollupIndex

nik9000 · 2022-08-22T18:22:35Z

I've merged this so @flash1293 should be able to grab it sooner rather than later.

flash1293 · 2022-08-22T18:28:33Z

FYI @tsullivan @ppisljar - you should be able to wire this up in your warnings handling PR

salvatore-campagna added 13 commits August 10, 2022 18:09

Handle failures for aggregations on aggregate_double_metric fields

763a9ac

fix: register new elasticsearch exception

e3e93ae

fix: register new elasticsearch exception

c11e303

fix: add the new exception to the ids

37c0e98

fix: prevent possible null pointer exception

00e08a0

fix: correct mistake in null pointer condition check

bbf1e08

fix: fail on date histogram aggregations with calendar_interval

9e8dc17

To start with we decided that we would like to support only date histogram aggregations using 'fixed_interval' and throw an error if 'calendar_interval' is used on a time series index.

fix: use the actual mode for comparison

b0e751a

refactor: improve code readability

dfe0bda

fi: fail on date histogram aggregations with non utc timezone

5ed41b2

fix: fail on date histogram aggregations with non utc timezone

47f9780

fix: remove yaml file added by mistake

d31904d

fix: remove useless min_doc_count

967aee8

salvatore-campagna requested review from csoulios and nik9000 August 10, 2022 16:13

elasticsearchmachine added the needs:triage and v8.5.0 labels Aug 10, 2022

fix: remove empty line added by mistake

9e3e6c5

salvatore-campagna added the >non-issue and :StorageEngine/Rollup labels and removed the needs:triage label Aug 10, 2022

elasticsearchmachine added the Team:Analytics label Aug 10, 2022

Merge branch 'main' into feature/unsupported-aggs-downsampling-errors

9263914

salvatore-campagna commented Aug 11, 2022


salvatore-campagna added 2 commits August 11, 2022 09:28

fix: update version to 8.5.0

2457bf9

fix: add missing skip clause up to version 8.5.0

9550d46

salvatore-campagna requested a review from romseygeek August 11, 2022 07:37

salvatore-campagna added 2 commits August 11, 2022 10:07

fix: skip error on null timezone

b140c43

fix: add missing min_doc_count

619aa14

tsullivan mentioned this pull request Aug 17, 2022

[search/public] expose showWarnings(inspector) method on search service elastic/kibana#138342

Merged

3 tasks

salvatore-campagna added 3 commits August 18, 2022 12:23

Merge branch 'main' into feature/unsupported-aggs-downsampling-errors

cdc54c9

fix: code format violations

f381b75

salvatore-campagna requested review from nik9000, flash1293 and not-napoleon August 18, 2022 10:33

salvatore-campagna commented Aug 18, 2022


flash1293 reviewed Aug 18, 2022


test: include a test hitting both indices

1715522

Hitting both indices produces partial results coming from the source time series index and a shard failure coming from the rollup index.

salvatore-campagna commented Aug 18, 2022


docs: add a note clarifying a test

bd7db95

salvatore-campagna commented Aug 18, 2022


nik9000 requested changes Aug 18, 2022


nik9000 reviewed Aug 18, 2022


salvatore-campagna added 5 commits August 18, 2022 20:32

todo: refactor method isRollupIndex after adding more rollup metadata

0f2600b

fix: remove unused method

3f16441

fix: todo

22d2347

note: include a note clarifuing the elasticsearch/kibana contract

1d31f70

refactor: remove date histogram validation methods

9226e84

nik9000 approved these changes Aug 18, 2022


csoulios reviewed Aug 19, 2022


salvatore-campagna added 3 commits August 20, 2022 16:21

refactor: rename exception

f7d737a

Merge branch 'main' into feature/unsupported-aggs-downsampling-errors

ada8087

Merge branch 'main' into feature/unsupported-aggs-downsampling-errors

1e38114

nik9000 approved these changes Aug 22, 2022


nik9000 merged commit 4b92e1d into elastic:main Aug 22, 2022

dej611 mentioned this pull request May 2, 2023

[Lens] Cannot use different date field than default index one if downsampled elastic/kibana#156377

Open
