-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Synthetic _source: support field
in many cases
#89950
Synthetic _source: support field
in many cases
#89950
Conversation
This adds support for the `field` scripting API in many but not all cases. Before this change numbers, dates, and IPs supported the `field` API when running with _source in synthetic mode because they always have doc values. This change adds support for `match_only_text`, `store`d `keyword` fields, and `store`d `text` fields. Two remaining field configurations work with synthetic _source and do not work with `field`: * A `text` field with a sub-`keyword` field that has `doc_values` * A `text` field with a sub-`keyword` field that is `store`d
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Hi @nik9000, I've created a changelog YAML for you. |
public boolean alwaysEmpty() { | ||
return sourceProvider.alwaysEmpty(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this via the FieldDataContext somehow? Putting it on the SourceLookup feels a bit weird to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move it, but, like, it's the source lookup that is empty. I think maybe moving and renaming it is a good idea. I just don't know the right spot.
throw new IllegalArgumentException(CONTENT_TYPE + " fields do not support sorting and aggregations"); | ||
} | ||
SourceLookup sourceLookup = fieldDataContext.lookupSupplier().get().source(); | ||
if (sourceLookup.alwaysEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using alwaysEmpty
to signal that you should try synthetic source things. But maybe it'd be better to have it actually have a syntheticSource
method instead. Always empty is a fairly broad concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could have source providers have a syntheticSource
method instead - the only non-test place I believe we explicitly use the "null" source provider is from MappingLookup which checks if we have synthetic source.
Pinging @elastic/es-search (Team:Search) |
@romseygeek could you have another look? It's much bigger now.... |
… synthetic_source_fields_3
server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romseygeek this is ready for you again.
run |
run elasticsearch-ci/part-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this change 👍 . I didn't do a line by line review, but how it works looks good to me.
So I know that I asked for the information about synthetic source to be moved to FieldDataContext but I have, annoyingly, changed my mind about that again. FDC holds information at the context in which field data is being asked for, and synthetic source is entirely orthogonal to that. I've opened #91400 to move this to MapperBuilderContext, which I think will fit better here as well - field types can get a constructor parameter telling them if they need to load things from a secret stored field or just use source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nearly there, I left a couple of questions.
if (hasDocValues()) { | ||
return fieldDataFromDocValues(); | ||
} | ||
if (isSyntheticSource && isStored()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if synthetic source is enabled and the data isn't stored? As I read it we fall through to using normal source, which will fail a bit unpredictably I think? Is it worth throwing an explicit Exception here in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't possible to configure synthetic _source without doc values or stored fields. I'll add a hard test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add an assertion as well so its more obvious when reading the code?
if (operation != FielddataOperation.SCRIPT) { | ||
throw new IllegalStateException("unknown field data operation [" + operation.name() + "]"); | ||
} | ||
if (isSyntheticSource && isStored()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one has a different answer! This will be possible in a follow up but isn't yet. I'll throw a helpful exception here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one has a different answer! This will be possible in a follow up but isn't yet. I'll throw a helpful exception here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one has a different answer! This will be possible in a follow up but isn't yet. I'll throw a helpful exception here too.
|
||
public void testDocValues() throws IOException { | ||
MapperService mapper = createMapperService(fieldMapping(b -> b.field("type", "text"))); | ||
assertScriptDocValues(mapper, "foo", equalTo(List.of("foo"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth testing that an input that would otherwise be tokenized is returned whole here. Basically test that we get "foo bar" back, rather than a list of "foo" and "bar".
assertScriptDocValues(mapper, "foo", equalTo(List.of("foo"))); | ||
} | ||
|
||
@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/86603") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This links to the master synthetic source issue, is there a more specific one that this relates to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll un-awaitsfix it and have it catch the fancy new exception I'm throwing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for iterating.
This adds support for the
field
scripting API in many but not all cases. Before this change numbers, dates, and IPs supported thefield
API when running with _source in synthetic mode because they always have doc values. This change adds support formatch_only_text
,store
dkeyword
fields, andstore
dtext
fields. Two remaining field configurations work with synthetic _source and do not work withfield
:text
field with a sub-keyword
field that hasdoc_values
text
field with a sub-keyword
field that isstore
d