Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify how source is passed to fetch subphases. #65292

Merged
merged 4 commits into from
Nov 20, 2020

Conversation

jtibshirani
Copy link
Contributor

@jtibshirani jtibshirani commented Nov 20, 2020

This PR simplifies how the document source is passed to each fetch subphase. A summary of the strategy:

  • For each document, we try to eagerly load the source and store it on HitContext. Most subphases that access source, like source filtering and highlighting, use HitContext. For nested hits, we filter the parent source and also store this source on HitContext.
  • Only for non-nested documents, we also store the loaded source on QueryShardContext#lookup. This allows subphases that access source through SearchLookup to use the pre-loaded source when possible. This is now a common occurrence, since runtime fields are supported in the 'fields' option and may soon be supported in highlighting.

There is no longer a special SearchLookup just for the fetch phase. This was not necessary and was mostly caused by a misunderstanding of how QueryShardContext should be used.

Addresses #62511.

@@ -546,3 +546,46 @@
- match: {hits.hits.0.highlight.company.0: "<em>ABC</em> company"}
- match: {hits.hits.1.highlight.company.0: "<em>ABC</em> <em>ABCD</em> company"}
- match: {hits.hits.2.highlight.company.0: "<em>ABCD</em> company"}

---
# This test guards against a subtle edge case in the fetch phase related to source-loading and
Copy link
Contributor Author

@jtibshirani jtibshirani Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This protects against reintroducing the bug in #31000. I should have added this test when fixing the bug originally, but better late than never!

@jtibshirani
Copy link
Contributor Author

A question that might come up: why not go further and provide source access in only one place? Maybe we could remove HitContext#sourceLookup and just have QueryShardContext#searchLookup? Here are the downsides I see:

  • A subphase could easily reposition the lookup on QueryShardContext and invalidate the document source. This can cause subtle bugs, as in _source not returned in "explain": true with certain queries #31000. (We don't yet guard against this for HitContext#lookup, but we could certainly add a check that the document ID never changes.)
  • It adds complexity when handling nested documents. When processing a nested document, we would have to store the filtered source on QueryShardContext, invalidating the source for the parent document. This is maybe okay if inner_hits is always run last, but it's fragile and introduces an ordering requirement on phases.

My mental model is that HitContext provides reliable access to the preloaded source for a particular document. QueryShardContext gives more flexible lookup ability, but it's just a 'best effort' as to whether the preloaded source is available.

@jtibshirani jtibshirani added :Search/Search Search-related issues that do not fall into other categories >refactoring v7.11.0 v8.0.0 labels Nov 20, 2020
@jtibshirani jtibshirani marked this pull request as ready for review November 20, 2020 02:52
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Nov 20, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@romseygeek
Copy link
Contributor

This clashes slightly with #65219 but I think the two can be reconciled. I particularly like the simplification to MappedFieldType#valueFetcher - I wonder if that can be done as a separate change?

@jtibshirani
Copy link
Contributor Author

This clashes slightly with #65219 but I think the two can be reconciled

My apologies, I had meant to submit this comment earlier but it didn't go through! Would you be okay reviewing this one first, as I think it will clarify the requirements (and it's actively causing difficulties with other PRs, as in #63572)?

I particularly like the simplification to MappedFieldType#valueFetcher - I wonder if that can be done as a separate change?

I think it needs to be done as part of this change. Before, the fetch phase used its own SearchLookup separate from the one on QueryShardContext.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @jtibshirani

@jtibshirani jtibshirani merged commit f4a462d into elastic:master Nov 20, 2020
@jtibshirani jtibshirani deleted the share-search-lookup branch November 20, 2020 22:09
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Nov 20, 2020
This PR simplifies how the document source is passed to each fetch subphase. A summary of the strategy:
* For each document, we try to eagerly load the source and store it on `HitContext`. Most subphases that access source, like source filtering and highlighting, use `HitContext`. For nested hits, we filter the parent source and also store this source on `HitContext`.
* Only for non-nested documents, we also store the loaded source on `QueryShardContext#lookup`. This allows subphases that access source through `SearchLookup` to use the pre-loaded source when possible. This is now a common occurrence, since runtime fields are supported in the 'fields' option and may soon be supported in highlighting.

There is no longer a special `SearchLookup` just for the fetch phase. This was not necessary and was mostly caused by a misunderstanding of how `QueryShardContext` should be used.

Addresses elastic#62511.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>refactoring :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v7.11.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants