Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark shard failures caused by unsupported aggregations or queries against rolled up data so Kibana can identify them #89252

Merged
Show file tree
Hide file tree
Changes from 43 commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
763a9ac
Handle failures for aggregations on aggregate_double_metric fields
salvatore-campagna Aug 9, 2022
e3e93ae
fix: register new elasticsearch exception
salvatore-campagna Aug 9, 2022
c11e303
fix: register new elasticsearch exception
salvatore-campagna Aug 9, 2022
37c0e98
fix: add the new exception to the ids
salvatore-campagna Aug 9, 2022
00e08a0
fix: prevent possible null pointer exception
salvatore-campagna Aug 9, 2022
bbf1e08
fix: correct mistake in null pointer condition check
salvatore-campagna Aug 9, 2022
9e8dc17
fix: fail on date histogram aggregations with calendar_interval
salvatore-campagna Aug 10, 2022
b0e751a
fix: use the actual mode for comparison
salvatore-campagna Aug 10, 2022
dfe0bda
refactor: improve code readability
salvatore-campagna Aug 10, 2022
5ed41b2
fi: fail on date histogram aggregations with non utc timezone
salvatore-campagna Aug 10, 2022
47f9780
fix: fail on date histogram aggregations with non utc timezone
salvatore-campagna Aug 10, 2022
d31904d
fix: remove yaml file added by mistake
salvatore-campagna Aug 10, 2022
967aee8
fix: remove useless min_doc_count
salvatore-campagna Aug 10, 2022
9e3e6c5
fix: remove empty line added by mistake
salvatore-campagna Aug 10, 2022
9263914
Merge branch 'main' into feature/unsupported-aggs-downsampling-errors
salvatore-campagna Aug 11, 2022
2457bf9
fix: update version to 8.5.0
salvatore-campagna Aug 11, 2022
9550d46
fix: add missing skip clause up to version 8.5.0
salvatore-campagna Aug 11, 2022
b140c43
fix: skip error on null timezone
salvatore-campagna Aug 11, 2022
619aa14
fix: add missing min_doc_count
salvatore-campagna Aug 11, 2022
21f0c07
doc: include javadoc for the new exception
salvatore-campagna Aug 16, 2022
b2d5bb9
fix: use the index mode setting directly
salvatore-campagna Aug 16, 2022
d08de71
refactor: extract a validation method for the calendar interval type
salvatore-campagna Aug 16, 2022
b8f4a5f
fix: fail validation on 'calendar_interval'
salvatore-campagna Aug 16, 2022
1434aec
fix: incorrect validation return value
salvatore-campagna Aug 16, 2022
0993e09
fix: incorrect validation return value
salvatore-campagna Aug 16, 2022
d96fbd0
fix: extract logic to throw exceptions from the registry
salvatore-campagna Aug 16, 2022
fe79b0c
doc: include javadoc for the new ValuesSourceType interface method
salvatore-campagna Aug 16, 2022
2936b6d
fix: remove file added by mistake
salvatore-campagna Aug 16, 2022
a9c484b
fix: refactor logic to throw exceptions
salvatore-campagna Aug 16, 2022
1dfba79
fix: possible null pointer exception
salvatore-campagna Aug 16, 2022
742c628
fix: handle possible null exception providers
salvatore-campagna Aug 16, 2022
9e9987e
fix: void methods do not return any value
salvatore-campagna Aug 16, 2022
3de5555
fix: assert not null exception
salvatore-campagna Aug 17, 2022
77ab1b0
test: assertion error for null values source type exception handler
salvatore-campagna Aug 17, 2022
39cbfa7
fix: typo
salvatore-campagna Aug 17, 2022
37a1eb1
fix: get rid of unused variable
salvatore-campagna Aug 17, 2022
3297479
fix: use illegal_argument_exception for date histogram errors
salvatore-campagna Aug 17, 2022
c0e54ef
test: remove test on assertion
salvatore-campagna Aug 17, 2022
c5f2878
refactor: build exception message inside validation methods
salvatore-campagna Aug 17, 2022
076d589
fix: fail date histogram aggregations for rollup indices
salvatore-campagna Aug 18, 2022
cdc54c9
Merge branch 'main' into feature/unsupported-aggs-downsampling-errors
salvatore-campagna Aug 18, 2022
f381b75
fix: code format violations
salvatore-campagna Aug 18, 2022
1715522
test: include a test hitting both indices
salvatore-campagna Aug 18, 2022
bd7db95
docs: add a note clarifying a test
salvatore-campagna Aug 18, 2022
0f2600b
todo: refactor method isRollupIndex after adding more rollup metadata
salvatore-campagna Aug 18, 2022
3f16441
fix: remove unused method
salvatore-campagna Aug 18, 2022
22d2347
fix: todo
salvatore-campagna Aug 18, 2022
1d31f70
note: include a note clarifuing the elasticsearch/kibana contract
salvatore-campagna Aug 18, 2022
9226e84
refactor: remove date histogram validation methods
salvatore-campagna Aug 18, 2022
f7d737a
refactor: rename exception
salvatore-campagna Aug 20, 2022
ada8087
Merge branch 'main' into feature/unsupported-aggs-downsampling-errors
salvatore-campagna Aug 21, 2022
1e38114
Merge branch 'main' into feature/unsupported-aggs-downsampling-errors
salvatore-campagna Aug 22, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -1578,6 +1578,12 @@ private enum ElasticsearchExceptionHandle {
HealthNodeNotDiscoveredException::new,
166,
Version.V_8_5_0
),
UNSUPPORTED_AGGREGATION_ON_DOWNSAMPLED_FIELD_EXCEPTION(
org.elasticsearch.search.aggregations.UnsupportedAggregationOnDownsampledField.class,
org.elasticsearch.search.aggregations.UnsupportedAggregationOnDownsampledField::new,
167,
Version.V_8_5_0
);

final Class<? extends ElasticsearchException> exceptionClass;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
import org.elasticsearch.cluster.routing.allocation.IndexMetadataUpdater;
import org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDecider;
import org.elasticsearch.cluster.routing.allocation.decider.ShardsLimitAllocationDecider;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.collect.ImmutableOpenMap;
import org.elasticsearch.common.collect.MapBuilder;
import org.elasticsearch.common.compress.CompressedXContent;
Expand Down Expand Up @@ -127,6 +128,15 @@ public class IndexMetadata implements Diffable<IndexMetadata>, ToXContentFragmen
EnumSet.of(ClusterBlockLevel.WRITE)
);

public boolean isRollupIndex() {
Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling this is not the best I can do here. I am working on a better way to do this at the moment but I wanted to push this change because it might be good enough. If that feeling is shared but this is considered "good enough" we could proceed merging this to unblock Kibana folks experimenting with catching exceptions. Then I can create another issue to refactor this. N.B. I am on vacation next week and Christos is on vacation too and I would like to avoid Kibana folks being stuck waiting for this.

As a result, if required, I can work on this on another PR whose purpose would be to refactor this "isRollupIndex" logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's cool as a temporary thing. Maybe a TODO in the code so we know we thought it was temporary when we added it.

final String sourceIndex = settings.get(IndexMetadata.INDEX_ROLLUP_SOURCE_NAME_KEY);
final String indexRollupStatus = settings.get(IndexMetadata.INDEX_ROLLUP_STATUS_KEY);
final boolean rollupSuccess = IndexMetadata.RollupTaskStatus.SUCCESS.name()
.toLowerCase(Locale.ROOT)
.equals(indexRollupStatus != null ? indexRollupStatus.toLowerCase(Locale.ROOT) : IndexMetadata.RollupTaskStatus.UNKNOWN);
return Strings.isNullOrEmpty(sourceIndex) == false && rollupSuccess;
}

public enum State implements Writeable {
OPEN((byte) 0),
CLOSE((byte) 1);
Expand Down
81 changes: 81 additions & 0 deletions server/src/main/java/org/elasticsearch/index/IndexMode.java
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
import org.elasticsearch.cluster.metadata.IndexMetadata;
import org.elasticsearch.cluster.metadata.MetadataCreateDataStreamService;
import org.elasticsearch.cluster.routing.IndexRouting;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.compress.CompressedXContent;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;
Expand All @@ -28,10 +29,13 @@
import org.elasticsearch.index.mapper.RoutingFieldMapper;
import org.elasticsearch.index.mapper.TimeSeriesIdFieldMapper;
import org.elasticsearch.index.mapper.TsidExtractingIdFieldMapper;
import org.elasticsearch.search.aggregations.bucket.histogram.DateIntervalWrapper;

import java.io.IOException;
import java.time.ZoneId;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.function.BooleanSupplier;
Expand Down Expand Up @@ -110,6 +114,22 @@ public DocumentDimensions buildDocumentDimensions(IndexSettings settings) {
public boolean shouldValidateTimestamp() {
return false;
}

@Override
public void validateCalendarIntervalType(
IndexSettings indexSettings,
DateIntervalWrapper.IntervalTypeEnum intervalType,
String valuesSourceDescription,
String aggregationName
) {}

@Override
public void validateCalendarTimeZone(
IndexSettings indexSettings,
ZoneId tz,
String valuesSourceDescription,
String aggregationName
) {}
},
TIME_SERIES("time_series") {
@Override
Expand Down Expand Up @@ -196,8 +216,55 @@ public DocumentDimensions buildDocumentDimensions(IndexSettings settings) {
public boolean shouldValidateTimestamp() {
return true;
}

/**
* For a time series index we just support 'fixed_interval' aggregations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might have missed something. I thought we only wanted to limit to fixed interval for the rolled up indices. Time series indices are supposed to support all the things. I think. I certainly could be forgetting things, but that was my memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, this was also my understanding

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get what is wrong here...do you mean this should fail only if the date histogram aggregation runs on a rolled-up index? And not on every time series index? If that is what I missed then I have no idea how do we know if an index is a result of a rollup operation or not. Maybe we have some meta-data? I will check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the UTC/non-UTC check?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for UTC/non-UTC by my understanding. Not sure whether it's right or not, but in Kibana we currently check this via the meta.fixed_interval property on the field: https://github.com/elastic/kibana/blob/9799dbba27c5baf594357eae0bbfc79b4e7da77c/src/plugins/data_views/server/fetcher/lib/field_capabilities/field_caps_response.ts#L145

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get what is wrong here...do you mean this should fail only if the date histogram aggregation runs on a rolled-up index? And not on every time series index? If that is what I missed then I have no idea how do we know if an index is a result of a rollup operation or not. Maybe we have some meta-data? I will check.

It should fail only if the date_histogram agg is executed on a rolled up index. "Raw" time-series indices support any date_histogram interval.
Rollup indices contain the IndexMetadata.INDEX_ROLLUP_SOURCE_NAME setting in their index settings. And the @timestamp field should contain the fixed_interval property in the field mapping meta. See

private static void addTimestampField(final RollupActionConfig config, final XContentBuilder builder) throws IOException {
final String timestampField = config.getTimestampField();
final String dateIntervalType = config.getIntervalType();
final String dateInterval = config.getInterval().toString();
final String timezone = config.getTimeZone();
builder.startObject(timestampField)
.field("type", DateFieldMapper.CONTENT_TYPE)
.startObject("meta")
.field(dateIntervalType, dateInterval)
.field(RollupActionConfig.TIME_ZONE, timezone)
.endObject()
.endObject();
}

*
* @param intervalType the type of interval used by the date histogram
*/
@Override
public void validateCalendarIntervalType(
IndexSettings indexSettings,
DateIntervalWrapper.IntervalTypeEnum intervalType,
String valuesSourceDescription,
String aggregationName
) {
if (indexSettings.getIndexMetadata().isRollupIndex() && DateIntervalWrapper.IntervalTypeEnum.CALENDAR.equals(intervalType)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want this test for all time rolled up indices. Not that there are any non-time series ones, but I think the addition of isRollupIndex means we don't need these new methods in IndexMode any more.

throw new IllegalArgumentException(
valuesSourceDescription
+ " is not supported for aggregation ["
+ aggregationName
+ "] with interval type ["
+ intervalType.getPreferredName()
+ "]"
);
}
}

@Override
public void validateCalendarTimeZone(
IndexSettings indexSettings,
ZoneId tz,
String valuesSourceDescription,
String aggregationName
) {
if (indexSettings.getIndexMetadata().isRollupIndex() && tz != null && ZoneId.of("UTC").equals(tz) == false) {
throw new IllegalArgumentException(
valuesSourceDescription + " is not supported for aggregation [" + aggregationName + "] with timezone [" + tz + "]"
);
}
}
};

private static boolean isIndexRollup(final IndexSettings indexSettings) {
final String sourceIndex = indexSettings.getIndexMetadata().getSettings().get(IndexMetadata.INDEX_ROLLUP_SOURCE_NAME_KEY);
final String indexRollupStatus = indexSettings.getIndexMetadata().getSettings().get(IndexMetadata.INDEX_ROLLUP_STATUS_KEY);
final boolean rollupSuccess = IndexMetadata.RollupTaskStatus.SUCCESS.name()
.toLowerCase(Locale.ROOT)
.equals(indexRollupStatus != null ? indexRollupStatus.toLowerCase(Locale.ROOT) : IndexMetadata.RollupTaskStatus.UNKNOWN);
return Strings.isNullOrEmpty(sourceIndex) == false && rollupSuccess;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a duplicate.

}

protected static String tsdbMode() {
return "[" + IndexSettings.MODE.getKey() + "=time_series]";
}
Expand Down Expand Up @@ -331,4 +398,18 @@ public static IndexMode fromString(String value) {
public String toString() {
return getName();
}

public abstract void validateCalendarIntervalType(
IndexSettings indexSettings,
DateIntervalWrapper.IntervalTypeEnum intervalType,
String valuesSourceDescription,
String aggregationName
);

public abstract void validateCalendarTimeZone(
IndexSettings indexSettings,
ZoneId tz,
String valueSourceDescription,
String aggregationName
);
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/

package org.elasticsearch.search.aggregations;

import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.rest.RestStatus;

import java.io.IOException;

/**
* Thrown when executing an aggregation on a time series index field whose type is not supported.
* Downsampling uses specific types while aggregating some fields (like 'aggregate_metric_double').
* Such field types do not support some aggregations.
*/
public class UnsupportedAggregationOnDownsampledField extends AggregationExecutionException {
salvatore-campagna marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth leaving a note that the name of this class is part of a contract with Kibana so we shouldn't change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we name the action "rollup" everywhere in the code, what about naming this exception UnsupportedAggregationOnRollupField?

Also, since this is more generic than a rollup field we can name it UnsupportedAggregationOnRollupIndex


public UnsupportedAggregationOnDownsampledField(final String msg) {
super(msg);
}

public UnsupportedAggregationOnDownsampledField(final StreamInput in) throws IOException {
super(in);
}

@Override
public RestStatus status() {
return RestStatus.BAD_REQUEST;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -417,9 +417,16 @@ protected ValuesSourceAggregatorFactory innerBuild(
AggregatorFactory parent,
AggregatorFactories.Builder subFactoriesBuilder
) throws IOException {
DateHistogramAggregationSupplier aggregatorSupplier = context.getValuesSourceRegistry().getAggregator(REGISTRY_KEY, config);
final DateIntervalWrapper.IntervalTypeEnum dateHistogramIntervalType = dateHistogramInterval.getIntervalType();

context.getIndexSettings()
.getMode()
.validateCalendarIntervalType(context.getIndexSettings(), dateHistogramIntervalType, config.getDescription(), getName());

final ZoneId tz = timeZone();
context.getIndexSettings().getMode().validateCalendarTimeZone(context.getIndexSettings(), tz, config.getDescription(), getName());

DateHistogramAggregationSupplier aggregatorSupplier = context.getValuesSourceRegistry().getAggregator(REGISTRY_KEY, config);
final Rounding rounding = dateHistogramInterval.createRounding(tz, offset);

LongBounds roundedBounds = null;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -169,13 +169,23 @@ public <T> T getAggregator(RegistryKey<T> registryKey, ValuesSourceConfig values
@SuppressWarnings("unchecked")
T supplier = (T) aggregatorRegistry.get(registryKey).get(valuesSourceConfig.valueSourceType());
if (supplier == null) {
throw new IllegalArgumentException(
valuesSourceConfig.getDescription() + " is not supported for aggregation [" + registryKey.getName() + "]"
);
final RuntimeException unmappedException = valuesSourceConfig.valueSourceType()
.getUnregisteredException(
valuesSourceConfig.getDescription() + " is not supported for aggregation [" + registryKey.getName() + "]"
);
assert unmappedException != null
: "Value source type ["
+ valuesSourceConfig.valueSourceType()
+ "] did not return a valid exception for aggregation ["
+ registryKey.getName()
+ "]";
throw unmappedException;
}
return supplier;
}
throw new AggregationExecutionException("Unregistered Aggregation [" + registryKey.getName() + "]");
throw new AggregationExecutionException(
"Unregistered Aggregation [" + (registryKey != null ? registryKey.getName() : "unknown aggregation") + "]"
);
}

public AggregationUsageService getUsageService() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -92,4 +92,15 @@ default DocValueFormat getFormatter(String format, ZoneId tz) {
* @return the name of the Values Source Type
*/
String typeName();

/**
* Returns the exception to throw in case the registry (type, aggregator) entry
* is not registered.
*
* @param message the message for the exception
* @return the exception to throw
*/
default RuntimeException getUnregisteredException(String message) {
return new IllegalArgumentException(message);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,7 @@
import org.elasticsearch.search.SearchParseException;
import org.elasticsearch.search.SearchShardTarget;
import org.elasticsearch.search.aggregations.MultiBucketConsumerService;
import org.elasticsearch.search.aggregations.UnsupportedAggregationOnDownsampledField;
import org.elasticsearch.search.internal.ShardSearchContextId;
import org.elasticsearch.snapshots.Snapshot;
import org.elasticsearch.snapshots.SnapshotException;
Expand Down Expand Up @@ -831,6 +832,7 @@ public void testIds() {
ids.put(164, VersionConflictException.class);
ids.put(165, SnapshotNameAlreadyInUseException.class);
ids.put(166, HealthNodeNotDiscoveredException.class);
ids.put(167, UnsupportedAggregationOnDownsampledField.class);

Map<Class<? extends ElasticsearchException>, Integer> reverse = new HashMap<>();
for (Map.Entry<Integer, Class<? extends ElasticsearchException>> entry : ids.entrySet()) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
import org.elasticsearch.script.AggregationScript;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.aggregations.AggregationExecutionException;
import org.elasticsearch.search.aggregations.UnsupportedAggregationOnDownsampledField;
import org.elasticsearch.search.aggregations.support.AggregationContext;
import org.elasticsearch.search.aggregations.support.FieldContext;
import org.elasticsearch.search.aggregations.support.ValueType;
Expand All @@ -22,6 +23,12 @@
public enum AggregateMetricsValuesSourceType implements ValuesSourceType {

AGGREGATE_METRIC() {

@Override
public RuntimeException getUnregisteredException(String message) {
return new UnsupportedAggregationOnDownsampledField(message);
}

@Override
public ValuesSource getEmpty() {
throw new IllegalArgumentException("Can't deal with unmapped AggregateMetricsValuesSource type " + this.value());
Expand Down
Loading