Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to _search to force synthetic source #87068

Merged
merged 16 commits into from
Jun 7, 2022

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented May 23, 2022

This adds ?force_synthetic_source to, well, force running the fetch
phase with synthetic source. If the mapping is incompatible with
synthetic source it'll throw a 400 error. This should be useful for
folks looking to evaluate synthetic source.

Folks can use it to check if their current mapping supports synthetic
source. Given how picky synthetic source is, their mapping probably
doesn't support it now, but this gives us a very quick way to test
it. Which should help shorten the testing cycle.

If their mapping does support synthetic source it'll fetch the _source
as thought synthetic source were enabled so folks can have a look at
the result and see if it's done a good job. Synthetic source makes a
lot of assumptions about the original layout of the _source and some
of those assumptions are destructive to, well, the layout. If you
need the layout you can try to use the results this produces.

Finally this gives you an upper bound on the performance hit of synthetic
source. This feature will still load and decompress the original
_source bytes, so you won't get any of the fetch-time performance
advantages to synthetic source - but it throws out the _source and
rebuilds it from doc values - do you get all of the fetch time
performance disadvanteages. It the fethc performance is still
acceptable then you know synthetic source is ok for you. If it is
marginal then you'd need a full scale test, reindexing everything
with synthetic source. If the fetch performance is nowhere near ok
then you know synthetic source isn't good for you, at least not yet.

@nik9000 nik9000 requested a review from romseygeek May 23, 2022 21:08
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 23, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@nik9000 nik9000 mentioned this pull request May 23, 2022
50 tasks
nik9000 added 4 commits May 23, 2022 17:18
This adds `?force_synthetic_source` to, well, force running the fetch
phase with synthetic source. If the mapping is incompatible with
synthetic source it'll throw a 400 error.
@nik9000
Copy link
Member Author

nik9000 commented May 30, 2022

This actually won't "turn off" the _source - it'll still be decompressed. So this will show a "worst case" for latency - it'll include the synthetic source. It'll just load the _source from stored fields and throw it away.

@javanna
Copy link
Member

javanna commented May 31, 2022

Heya @nik9000 can you expand on what the usecase is for this feature?

@javanna javanna added the :Search Foundations/Mapping Index mappings, including merging and defining field types label May 31, 2022
@elasticmachine elasticmachine added the Team:Search Meta label for search team label May 31, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@nik9000
Copy link
Member Author

nik9000 commented May 31, 2022

Heya @nik9000 can you expand on what the usecase is for this feature?

I've edited the description to add it - the short version is that it's a way to evaluate synthetic source without reindexing. I don't think it's a "forever" thing, but I've talked to quite a few folks that would find it useful.

@sethmlarson sethmlarson added the Team:Clients Meta label for clients team label May 31, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/clients-team (Team:Clients)

@sethmlarson
Copy link
Contributor

I don't think it's a "forever" thing

Is this option going to be deprecated+removed in the future? If so might be good to document it this way. Also is this option for human debugging only or are we planning on users using it in applications?

@nik9000
Copy link
Member Author

nik9000 commented May 31, 2022

Is this option going to be deprecated+removed in the future?

I think so.

Also is this option for human debugging only or are we planning on users using it in applications?

Debugging. Though maybe not just humans.

@nik9000
Copy link
Member Author

nik9000 commented Jun 3, 2022

I think so.

How can I document this option in this way? Can I just not document it? There is a very real chance it'll live behind the feature flag for it's entire life and be removed before the feature flag.

@sethmlarson
Copy link
Contributor

@nik9000 Gotcha! In that case can we document that this parameter is experimental and will be removed in a future version? I think that's enough warning for a parameter that's behind a feature flag and undocumented elsewhere.

@nik9000 nik9000 requested a review from romseygeek June 6, 2022 18:47
@romseygeek
Copy link
Contributor

SearchExecutionContext can't see the root object mapper that's required. I don't think we want to make that public for other reasons.

It can via mappingLookup.getMapping().getRoot()

@nik9000
Copy link
Member Author

nik9000 commented Jun 6, 2022

It can via mappingLookup.getMapping().getRoot()

"'getRoot()' is not public in 'org.elasticsearch.index.mapper.Mapping'. Cannot be accessed from outside package"

@romseygeek
Copy link
Contributor

Fair point...

But! SourceFieldMapper is in the same package, and we can change the signature of SourceFieldMapper.newSourceLoader() to take the Mapping instead of the root object; and indeed, everywhere that newSourceLoader() is called, it's called with a call to getMapping().getRoot(). So if we rearrange things a bit then we can keep package-privacy and avoid adding that boolean to the mapping APIs.

@nik9000
Copy link
Member Author

nik9000 commented Jun 6, 2022

But! SourceFieldMapper is in the same package, and we can change the signature of SourceFieldMapper.newSourceLoader() to take the Mapping instead of the root object; and indeed, everywhere that newSourceLoader() is called, it's called with a call to getMapping().getRoot(). So if we rearrange things a bit then we can keep package-privacy and avoid adding that boolean to the mapping APIs.

I'll have a look

@nik9000
Copy link
Member Author

nik9000 commented Jun 6, 2022

I'll have a look

Yeah, that works. @romseygeek, have another look when you get a chance.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @nik9000!

@nik9000
Copy link
Member Author

nik9000 commented Jun 7, 2022

@sethmlarson I've added a flag to the spec to mark this parameter as behind the feature flag. Is that OK? Am I going to break a everyone?

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to the rest-api-spec schema.json LGTM.

@nik9000 nik9000 merged commit d0b50b5 into elastic:master Jun 7, 2022
@@ -308,6 +321,9 @@ public void writeTo(StreamOutput out) throws IOException {
+ "] or greater."
);
}
if (out.getVersion().onOrAfter(Version.V_8_4_0)) {
out.writeBoolean(forceSyntheticSource);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we have an else block here and throw? This ends up failing silently otherwise when communication with older nodes, and breaks the assumption that the ccs compatibility flag is based on (that an exception is thrown whenever a new option is used that a node in the previous minor does not support)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes. Sorry, yeah. I should have had it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened a fix for both of these in #87481.

out.writeBoolean(forceSyntheticSource);
} else {
if (forceSyntheticSource) {
throw new IllegalArgumentException("force_synthetic_source is not supported before 8.3.0");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mention 8.3 in the error message, but 8.4 in the conditional above. I suspect you did that on purpose but I am not following: is there a way to force synthetic source on 8.3 which does not support the flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A mistake!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unbelievable, I could not believe my eyes hence I convinced myself you must have done it on purpose :)

elasticsearchmachine pushed a commit that referenced this pull request Jun 7, 2022
This fixes a missing check for unsupported version logic that'd cause
`force_synthetic_source` to be silently dropped when sent to nodes
before 8.4. It also fixes an incorrect version number in an error
message.

Relates to #87068
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 8, 2022
This adds the option to force synthetic source to the GET API. See
 elastic#87068 for more discussion on why you'd want to do that - the short
version is to get an upper bound on the performance cost of using
synthetic source in GET.
nik9000 added a commit that referenced this pull request Jun 9, 2022
This adds the option to force synthetic source to the GET API. See
 #87068 for more discussion on why you'd want to do that - the short
version is to get an upper bound on the performance cost of using
synthetic source in GET.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 9, 2022
This adds the option to force synthetic source to the MGET API. See
 elastic#87068 for more discussion on why you'd want to do that - the short
version is to get an upper bound on the performance cost of using
synthetic source in MGET.
nik9000 added a commit that referenced this pull request Jun 22, 2022
This adds the option to force synthetic source to the MGET API. See
 #87068 for more discussion on why you'd want to do that - the short
version is to get an upper bound on the performance cost of using
synthetic source in MGET.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Clients Meta label for clients team Team:Search Meta label for search team v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants