Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail shards early when we can detect a type missmatch #79869

Merged
Original file line number Diff line number Diff line change
Expand Up @@ -1368,3 +1368,67 @@ huge size:
- match: { aggregations.str_terms.buckets.1.doc_count: 2 }
- match: { aggregations.str_terms.buckets.2.key: c }
- match: { aggregations.str_terms.buckets.2.doc_count: 3 }

---
Value type missmatch fails shard:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: mismatch (not missmatch)

- skip:
version: " - 8.0.99"
reason: "Fixed in 8.1"

- do:
indices.create:
index: valuetype_test_1
body:
settings:
number_of_shards: 1
number_of_replicas: 0
mappings:
properties:
ip:
type: keyword
- do:
indices.create:
index: valuetype_test_2
body:
settings:
number_of_shards: 1
number_of_replicas: 0
mappings:
properties:
ip:
type: ip

- do:
bulk:
index: valuetype_test_2
refresh: true
body: |
{ "index": {} }
{ "ip": "127.0.0.1" }
{ "index": {} }
{ "ip": "192.168.0.1" }
{ "index": {} }
{ "ip": "192.168.0.2" }
{ "index": {} }
{ "ip": "192.168.0.3" }
- do:
search:
index: valuetype_test_1,valuetype_test_2
body:
size: 0
aggs:
str_terms:
terms:
field: ip
value_type: ip

- match: { _shards.failed: 1 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a special case of the problem, because valuetype_test_1 contains no documents. Wouldn't it be ok to silently skip its shards instead of failing them?

Also, would it make sense to add a test where the index may contain documents?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, there are two problems with silently skipping shards that don't have docs that I can think of. First, we don't know that until later in the process, and by that time we don't have an obvious way to check if we have a conflict. So we'd need to carry some "has type conflict" flag and fail if that's true and we have docs. Seems clunky. It also adds some leniency and unpredictability. If the query changes to include docs on those shards, now the aggregation starts failing. Seems confusing.

Furthermore, if there are no matching docs on the shard, failing it doesn't change the results. It just lets the user know there's a potential problem. More information is better.

And, finally, yes, it makes sense to add a test with docs in both indices. Will push one up shortly. Thanks!

- length: { aggregations.str_terms.buckets: 4 }
- match: { aggregations.str_terms.buckets.0.key: "127.0.0.1" }
- match: { aggregations.str_terms.buckets.0.doc_count: 1 }
- match: { aggregations.str_terms.buckets.1.key: "192.168.0.1" }
- match: { aggregations.str_terms.buckets.1.doc_count: 1 }
- match: { aggregations.str_terms.buckets.2.key: "192.168.0.2" }
- match: { aggregations.str_terms.buckets.2.doc_count: 1 }
- match: { aggregations.str_terms.buckets.3.key: "192.168.0.3" }
- match: { aggregations.str_terms.buckets.3.doc_count: 1 }
Original file line number Diff line number Diff line change
Expand Up @@ -47,4 +47,8 @@ public MappedFieldType fieldType() {
return fieldType;
}

public String getTypeName() {
return fieldType.typeName();
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,26 @@ private static ValuesSourceConfig internalResolve(
if (valuesSourceType == null) {
// We have a field, and the user didn't specify a type, so get the type from the field
valuesSourceType = fieldResolver.getValuesSourceType(fieldContext, userValueTypeHint, defaultValueSourceType);
}
} else if (valuesSourceType != fieldResolver.getValuesSourceType(fieldContext, userValueTypeHint, defaultValueSourceType)
&& script == null) {
/*
* This is the case where the user has specified the type they expect, but we found a field of a different type.
* Usually this happens because of a mapping error, e.g. an older index mapped an IP address as a keyword. If
* the aggregation proceeds, it will usually break during reduction and return no results. So instead, we fail the
* shard with the conflict at this point, allowing the correctly mapped shards to return results with a partial
* failure.
*
* Note that if a script is specified, the assumption is that the script adapts the field into the specified type,
* and we allow the aggregation to continue.
*/
throw new IllegalArgumentException(
"Field type ["
+ fieldContext.getTypeName()
+ "] is incompatible with specified value_type ["
+ userValueTypeHint
+ "]"
);
}
}
}
if (valuesSourceType == null) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,7 @@ public void testBadUserValueTypeHint() throws IOException {
ValueType.NUMERIC // numeric type hint
)
);
assertThat(e.getMessage(), equalTo("Expected numeric type on field [binary], but got [binary]"));
assertThat(e.getMessage(), equalTo("Field type [binary] is incompatible with specified value_type [numeric]"));
}

private void testSearchCase(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,209 @@
import org.elasticsearch.index.fielddata.SortedBinaryDocValues;
import org.elasticsearch.index.mapper.MapperService;
import org.elasticsearch.index.mapper.MapperServiceTestCase;
import org.elasticsearch.script.AggregationScript;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptContext;
import org.mockito.Mockito;

import java.util.List;

// TODO: This whole set of tests needs to be rethought.
public class ValuesSourceConfigTests extends MapperServiceTestCase {

@Override
@SuppressWarnings("unchecked")
protected <T> T compileScript(Script script, ScriptContext<T> context) {
AggregationScript.Factory mockFactory = Mockito.mock(AggregationScript.Factory.class);
Mockito.when(mockFactory.newFactory(Mockito.any(), Mockito.any())).thenReturn(Mockito.mock(AggregationScript.LeafFactory.class));
return (T) mockFactory;
}

/**
* Attempting to resolve a config with neither a field nor a script specified throws an error
*/
public void testNoFieldNoScript() {
expectThrows(
IllegalStateException.class,
() -> ValuesSourceConfig.resolve(null, null, null, null, null, null, null, CoreValuesSourceType.KEYWORD)
);
}

/**
* When there's an unmapped field with no script, we should use the user value type hint if available, and fall back to the default
* value source type if it's not available.
*/
public void testUnmappedFieldNoScript() throws Exception {
MapperService mapperService = createMapperService(fieldMapping(b -> b.field("type", "long")));
// No value type hint
withAggregationContext(mapperService, List.of(source(b -> b.field("field", 42))), context -> {
ValuesSourceConfig config;
config = ValuesSourceConfig.resolve(context, null, "UnmappedField", null, null, null, null, CoreValuesSourceType.KEYWORD);
assertEquals(CoreValuesSourceType.KEYWORD, config.valueSourceType());
});

// With value type hint
withAggregationContext(mapperService, List.of(source(b -> b.field("field", 42))), context -> {
ValuesSourceConfig config;
config = ValuesSourceConfig.resolve(
context,
ValueType.IP,
"UnmappedField",
null,
null,
null,
null,
CoreValuesSourceType.KEYWORD
);
assertEquals(CoreValuesSourceType.IP, config.valueSourceType());
});
}

/**
* When the field is mapped and there's no script and no hint, use the field type
*/
public void testMappedFieldNoScriptNoHint() throws Exception {
MapperService mapperService = createMapperService(fieldMapping(b -> b.field("type", "long")));
withAggregationContext(mapperService, List.of(source(b -> b.field("field", 42))), context -> {
ValuesSourceConfig config;
config = ValuesSourceConfig.resolve(context, null, "field", null, null, null, null, CoreValuesSourceType.KEYWORD);
assertEquals(CoreValuesSourceType.NUMERIC, config.valueSourceType());
});
}

/**
* When we have a mapped field and a hint, but no script, we should throw if the hint doesn't match the field,
* and use the type of both if they do match. Note that when there is a script, we just use the user value type
* regardless of the field type. This is to allow for scripts that adapt types, even though runtime fields are
* a better solution for that in every way.
*/
public void testMappedFieldNoScriptWithHint() throws Exception {
MapperService mapperService = createMapperService(fieldMapping(b -> b.field("type", "long")));
// not matching case
expectThrows(
IllegalArgumentException.class,
() -> withAggregationContext(
mapperService,
List.of(source(b -> b.field("field", 42))),
context -> ValuesSourceConfig.resolve(context, ValueType.IP, "field", null, null, null, null, CoreValuesSourceType.KEYWORD)
)
);

// matching case
withAggregationContext(mapperService, List.of(source(b -> b.field("field", 42))), context -> {
ValuesSourceConfig config;
config = ValuesSourceConfig.resolve(context, ValueType.NUMBER, "field", null, null, null, null, CoreValuesSourceType.KEYWORD);
assertEquals(CoreValuesSourceType.NUMERIC, config.valueSourceType());
});
}

/**
* When there's a script and the user tells us what type it produces, always use that type, regardless of if there's also a field
*/
public void testScriptWithHint() throws Exception {
MapperService mapperService = createMapperService(fieldMapping(b -> b.field("type", "long")));
// With field
withAggregationContext(mapperService, List.of(source(b -> b.field("field", 42))), context -> {
ValuesSourceConfig config;
config = ValuesSourceConfig.resolve(
context,
ValueType.IP,
"field",
mockScript("mockscript"),
null,
null,
null,
CoreValuesSourceType.KEYWORD
);
assertEquals(CoreValuesSourceType.IP, config.valueSourceType());
}, () -> null);

// With unmapped field
withAggregationContext(mapperService, List.of(source(b -> b.field("field", 42))), context -> {
ValuesSourceConfig config;
config = ValuesSourceConfig.resolve(
context,
ValueType.IP,
"unmappedField",
mockScript("mockscript"),
null,
null,
null,
CoreValuesSourceType.KEYWORD
);
assertEquals(CoreValuesSourceType.IP, config.valueSourceType());
}, () -> null);

// Without field
withAggregationContext(mapperService, List.of(source(b -> b.field("field", 42))), context -> {
ValuesSourceConfig config;
config = ValuesSourceConfig.resolve(
context,
ValueType.IP,
null,
mockScript("mockscript"),
null,
null,
null,
CoreValuesSourceType.KEYWORD
);
assertEquals(CoreValuesSourceType.IP, config.valueSourceType());
}, () -> null);
}

/**
* If there's a script and the user didn't tell us what type it produces, use the field if possible, otherwise the default
*/
public void testScriptNoHint() throws Exception {
MapperService mapperService = createMapperService(fieldMapping(b -> b.field("type", "long")));
// With field
withAggregationContext(mapperService, List.of(source(b -> b.field("field", 42))), context -> {
ValuesSourceConfig config;
config = ValuesSourceConfig.resolve(
context,
null,
"field",
mockScript("mockscript"),
null,
null,
null,
CoreValuesSourceType.KEYWORD
);
assertEquals(CoreValuesSourceType.NUMERIC, config.valueSourceType());
}, () -> null);

// With unmapped field
withAggregationContext(mapperService, List.of(source(b -> b.field("field", 42))), context -> {
ValuesSourceConfig config;
config = ValuesSourceConfig.resolve(
context,
null,
"unmappedField",
mockScript("mockscript"),
null,
null,
null,
CoreValuesSourceType.KEYWORD
);
assertEquals(CoreValuesSourceType.KEYWORD, config.valueSourceType());
}, () -> null);

// Without field
withAggregationContext(mapperService, List.of(source(b -> b.field("field", 42))), context -> {
ValuesSourceConfig config;
config = ValuesSourceConfig.resolve(
context,
null,
null,
mockScript("mockscript"),
null,
null,
null,
CoreValuesSourceType.KEYWORD
);
assertEquals(CoreValuesSourceType.KEYWORD, config.valueSourceType());
}, () -> null);
}

public void testKeyword() throws Exception {
MapperService mapperService = createMapperService(fieldMapping(b -> b.field("type", "keyword")));
withAggregationContext(mapperService, List.of(source(b -> b.field("field", "abc"))), context -> {
Expand Down
Loading