
Synthetic _source: support field in many cases #89950

Merged
17 commits merged from synthetic_source_fields_3 into elastic:main on Nov 10, 2022


@nik9000 (Member) commented Sep 8, 2022:

This adds support for the `field` scripting API in many, but not all, cases. Before this change, numbers, dates, and IPs supported the `field` API when running with _source in synthetic mode because they always have doc values. This change adds support for `match_only_text`, stored `keyword` fields, and stored `text` fields. Two remaining field configurations work with synthetic _source but do not work with `field`:

  • A `text` field with a sub-`keyword` field that has `doc_values`
  • A `text` field with a sub-`keyword` field that is stored
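The support matrix described above can be summarized as a small decision function. This is a hypothetical sketch, not Elasticsearch code: `FieldApiSupport`, `FieldConfig`, `Support`, and the list of numeric type names are illustrative assumptions. It encodes only the cases the description lists and reports everything else as `NOT_COVERED` rather than guessing.

```java
import java.util.Set;

class FieldApiSupport {
    // Hypothetical model of a field mapping; not an Elasticsearch class.
    record FieldConfig(String type, boolean docValues, boolean stored, FieldConfig keywordSubField) {}

    enum Support { SUPPORTED, UNSUPPORTED, NOT_COVERED }

    // Illustrative subset of field types that always have doc values.
    private static final Set<String> ALWAYS_DOC_VALUES =
        Set.of("long", "integer", "double", "float", "date", "ip");

    // Encodes only the cases listed in the PR description; anything else is NOT_COVERED.
    static Support fieldApiWithSyntheticSource(FieldConfig f) {
        if (ALWAYS_DOC_VALUES.contains(f.type())) {
            return Support.SUPPORTED; // worked before this change, via doc values
        }
        if (f.type().equals("match_only_text")) {
            return Support.SUPPORTED; // added by this change
        }
        if ((f.type().equals("keyword") || f.type().equals("text")) && f.stored()) {
            return Support.SUPPORTED; // stored keyword and stored text, added by this change
        }
        FieldConfig sub = f.keywordSubField();
        if (f.type().equals("text") && sub != null && (sub.docValues() || sub.stored())) {
            return Support.UNSUPPORTED; // the two remaining configurations
        }
        return Support.NOT_COVERED;
    }

    public static void main(String[] args) {
        FieldConfig keywordSub = new FieldConfig("keyword", true, false, null);
        System.out.println(fieldApiWithSyntheticSource(new FieldConfig("text", false, false, keywordSub)));
        // prints UNSUPPORTED
        System.out.println(fieldApiWithSyntheticSource(new FieldConfig("text", false, true, null)));
        // prints SUPPORTED
    }
}
```

Checking a stored `text` field before looking at its sub-field matters: a stored `text` field is supported regardless of any `keyword` sub-field.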


@nik9000 mentioned this pull request Sep 8, 2022
@elasticsearchmachine added the Team:Analytics label Sep 9, 2022
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine (Collaborator):

Hi @nik9000, I've created a changelog YAML for you.

```java
public boolean alwaysEmpty() {
    return sourceProvider.alwaysEmpty();
}
```

Contributor:

Can we do this via the FieldDataContext somehow? Putting it on the SourceLookup feels a bit weird to me.

Member Author:

I can move it, but it's the source lookup that is empty. I think moving and renaming it may be a good idea; I just don't know the right spot.

```java
    throw new IllegalArgumentException(CONTENT_TYPE + " fields do not support sorting and aggregations");
}
SourceLookup sourceLookup = fieldDataContext.lookupSupplier().get().source();
if (sourceLookup.alwaysEmpty()) {
```
Member Author:

I've been using alwaysEmpty to signal that you should try the synthetic source path. But maybe it'd be better to add an explicit syntheticSource method instead. "Always empty" is a fairly broad concept.

Contributor:

I think we could give source providers a syntheticSource method instead. I believe the only non-test place where we explicitly use the "null" source provider is MappingLookup, which checks whether we have synthetic source.
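The suggestion can be sketched as an explicit `syntheticSource()` flag on the provider, so callers no longer infer "use synthetic loading" from an always-empty source. This is a minimal sketch under stated assumptions: `SourceProviderSketch`, `NullSourceProvider`, `MapSourceProvider`, and `valueOrigin` are hypothetical names, not the interface Elasticsearch shipped.

```java
import java.util.Map;

class SourceProviders {
    // Hypothetical interface; not Elasticsearch's SourceProvider.
    interface SourceProviderSketch {
        // The document's _source as a map; empty when _source is synthetic.
        Map<String, Object> source(int docId);

        // Explicit signal that field values must come from doc values or stored fields.
        boolean syntheticSource();
    }

    // Plays the role of the "null" provider used when synthetic _source is enabled.
    static final class NullSourceProvider implements SourceProviderSketch {
        @Override
        public Map<String, Object> source(int docId) {
            return Map.of();
        }

        @Override
        public boolean syntheticSource() {
            return true;
        }
    }

    // Serves _source from an in-memory map, standing in for stored _source.
    static final class MapSourceProvider implements SourceProviderSketch {
        private final Map<Integer, Map<String, Object>> docs;

        MapSourceProvider(Map<Integer, Map<String, Object>> docs) {
            this.docs = docs;
        }

        @Override
        public Map<String, Object> source(int docId) {
            return docs.getOrDefault(docId, Map.of());
        }

        @Override
        public boolean syntheticSource() {
            return false;
        }
    }

    // Hypothetical caller: picks where to read values from without guessing from emptiness.
    static String valueOrigin(SourceProviderSketch provider) {
        return provider.syntheticSource() ? "doc values or stored fields" : "_source";
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("foo", "bar");
        System.out.println(valueOrigin(new NullSourceProvider()));              // doc values or stored fields
        System.out.println(valueOrigin(new MapSourceProvider(Map.of(0, doc)))); // _source
    }
}
```

A stored-source provider can legitimately return an empty map for an empty document, which is one reason an explicit flag is less ambiguous than "always empty".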

@nik9000 added the :Search Foundations/Mapping label Sep 13, 2022
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine added the Team:Search label Sep 13, 2022
@nik9000 requested a review from romseygeek September 14, 2022 19:16
@nik9000 (Member Author) commented Sep 14, 2022:

@romseygeek could you have another look? It's much bigger now.

@nik9000 requested a review from martijnvg September 21, 2022 15:15
@csoulios removed the v8.5.0 label Sep 21, 2022
@nik9000 (Member Author) left a comment:

@romseygeek this is ready for you again.

@nik9000 (Member Author) commented Nov 7, 2022:

run elasticsearch-ci/part-1

@nik9000 requested a review from romseygeek November 7, 2022 15:35
@martijnvg (Member) left a comment:

I'm happy with this change 👍. I didn't do a line-by-line review, but the overall approach looks good to me.

@romseygeek (Contributor):

I know that I asked for the information about synthetic source to be moved to FieldDataContext, but I have, annoyingly, changed my mind about that again. FDC holds information about the context in which field data is being requested, and synthetic source is entirely orthogonal to that. I've opened #91400 to move this to MapperBuilderContext, which I think will fit better here as well: field types can get a constructor parameter telling them whether they need to load values from a secret stored field or can just use source.
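One way to read the #91400 proposal: the synthetic-source decision is made once, when the mapper builds its field type, and passed in through the constructor, so field-data code never has to ask the source lookup at query time. The sketch below assumes hypothetical `BuilderContextSketch` and `TextFieldTypeSketch` types; the real `MapperBuilderContext` API may differ.

```java
class FieldTypeConstruction {
    // Hypothetical stand-in for MapperBuilderContext: knows the index-wide _source mode.
    record BuilderContextSketch(boolean isSourceSynthetic) {}

    // Hypothetical field type that records, at construction time, where to load values from.
    static final class TextFieldTypeSketch {
        private final String name;
        private final boolean isSyntheticSource;

        TextFieldTypeSketch(String name, boolean isSyntheticSource) {
            this.name = name;
            this.isSyntheticSource = isSyntheticSource;
        }

        String valueSource() {
            return isSyntheticSource ? "stored field [" + name + "]" : "_source";
        }
    }

    // Hypothetical builder: reads the flag from the context and passes it to the constructor.
    static TextFieldTypeSketch build(String name, BuilderContextSketch context) {
        return new TextFieldTypeSketch(name, context.isSourceSynthetic());
    }

    public static void main(String[] args) {
        System.out.println(build("message", new BuilderContextSketch(true)).valueSource());
        // prints: stored field [message]
        System.out.println(build("message", new BuilderContextSketch(false)).valueSource());
        // prints: _source
    }
}
```

The design choice is that the flag becomes immutable state of the field type, which keeps it out of `FieldDataContext`, where, per the comment above, it does not belong.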

@romseygeek (Contributor) left a comment:

Nearly there; I left a couple of questions.

```java
if (hasDocValues()) {
    return fieldDataFromDocValues();
}
if (isSyntheticSource && isStored()) {
```
Contributor:

What happens if synthetic source is enabled and the data isn't stored? As I read it, we fall through to using normal source, which I think will fail somewhat unpredictably. Is it worth throwing an explicit exception here in that case?

Member Author:

It isn't possible to configure synthetic _source without doc values or stored fields. I'll add a test that checks this.

Contributor:

Maybe add an assertion as well, so it's more obvious when reading the code?
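The control flow discussed in this thread can be sketched as: prefer doc values, then stored fields when _source is synthetic, and only fall back to _source when it is not synthetic. The `assert` documents the invariant stated in the reply (synthetic _source cannot be configured without doc values or stored fields) and only runs with `-ea`. `FielddataSelection`, `Loader`, and `chooseLoader` are hypothetical names, not the code from this PR.

```java
class FielddataSelection {
    enum Loader { DOC_VALUES, STORED_FIELDS, SOURCE }

    static Loader chooseLoader(boolean hasDocValues, boolean isStored, boolean isSyntheticSource) {
        if (hasDocValues) {
            return Loader.DOC_VALUES;
        }
        if (isSyntheticSource && isStored) {
            return Loader.STORED_FIELDS;
        }
        // Mapping validation should make this unreachable with synthetic _source;
        // the assertion makes that invariant visible to readers (checked only with -ea).
        assert isSyntheticSource == false : "synthetic _source requires doc values or stored fields";
        return Loader.SOURCE;
    }

    public static void main(String[] args) {
        System.out.println(chooseLoader(true, false, true));   // DOC_VALUES
        System.out.println(chooseLoader(false, true, true));   // STORED_FIELDS
        System.out.println(chooseLoader(false, false, false)); // SOURCE
    }
}
```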

```java
if (operation != FielddataOperation.SCRIPT) {
    throw new IllegalStateException("unknown field data operation [" + operation.name() + "]");
}
if (isSyntheticSource && isStored()) {
```
Contributor:

Same question here.

Member Author:

This one has a different answer! This will be possible in a follow-up but isn't yet. I'll throw a helpful exception here too.
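For the configuration that is not supported yet, the reply proposes failing fast with a descriptive error instead of silently reading an empty synthetic _source. A hypothetical sketch: the method name and the message wording are assumptions, not the exception text Elasticsearch uses.

```java
class ScriptFielddataCheck {
    // Hypothetical: describes where script values come from, or throws for unsupported configs.
    static String scriptFielddata(String fieldName, boolean isStored, boolean isSyntheticSource) {
        if (isSyntheticSource && isStored) {
            return "stored fields";
        }
        if (isSyntheticSource) {
            // Synthetic _source cannot supply this field's original values, so fail loudly.
            throw new IllegalArgumentException(
                "field [" + fieldName + "] cannot be loaded for scripts: _source is synthetic and the field is not stored");
        }
        return "_source";
    }

    public static void main(String[] args) {
        System.out.println(scriptFielddata("message", true, true)); // stored fields
        try {
            scriptFielddata("message", false, true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A test for this path, like the one promised later in the thread, would call the method and expect `IllegalArgumentException` rather than being marked `@AwaitsFix`.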


```java
public void testDocValues() throws IOException {
    MapperService mapper = createMapperService(fieldMapping(b -> b.field("type", "text")));
    assertScriptDocValues(mapper, "foo", equalTo(List.of("foo")));
```
Contributor:

I think it's worth testing that an input that would otherwise be tokenized is returned whole here. Basically test that we get "foo bar" back, rather than a list of "foo" and "bar".
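To see why "foo bar" is a more discriminating input than "foo": values read back from indexed terms are analyzed tokens, while a stored field keeps the original string. With "foo" both views agree, so the existing assertion cannot tell them apart. The sketch uses a hypothetical lowercase-and-whitespace analyzer as a stand-in for real Lucene analysis; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

class StoredVsTokenized {
    // Hypothetical analyzer: lowercases and splits on whitespace (real analyzers do more).
    static List<String> analyze(String value) {
        return List.of(value.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Values as they would look if read back from indexed terms: unique and sorted.
    static List<String> fromTerms(String value) {
        return new ArrayList<>(new TreeSet<>(analyze(value)));
    }

    // Values as they would look if read back from a stored field: the original string.
    static List<String> fromStoredField(String value) {
        return List.of(value);
    }

    public static void main(String[] args) {
        System.out.println(fromTerms("foo bar"));       // [bar, foo]
        System.out.println(fromStoredField("foo bar")); // [foo bar]
        System.out.println(fromTerms("foo").equals(fromStoredField("foo"))); // true
    }
}
```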

```java
    assertScriptDocValues(mapper, "foo", equalTo(List.of("foo")));
}

@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/86603")
```
Contributor:

This links to the umbrella synthetic source issue; is there a more specific one that this relates to?

Member Author:

I'll remove the @AwaitsFix and have the test expect the fancy new exception I'm throwing.

@nik9000 requested a review from romseygeek November 10, 2022 14:51
@romseygeek (Contributor) left a comment:

LGTM! Thanks for iterating.

@nik9000 added the auto-merge-without-approval label Nov 10, 2022
@elasticsearchmachine merged commit 74d0d19 into elastic:main Nov 10, 2022
@nik9000 deleted the synthetic_source_fields_3 branch November 10, 2022 15:44
Labels
auto-merge-without-approval, >enhancement, :Search Foundations/Mapping, :StorageEngine/TSDB, Team:Analytics, Team:Search, v8.6.0

6 participants