Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixup highlighting with synthetic source #87667

Merged
merged 5 commits into from
Jun 15, 2022

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Jun 14, 2022

Synthetic source has a habit of reordering text fields. This frustrates
highlighting because it often wants to use index structures to find
the offsets to values in the field. This disables the FVH highlighter
for multi-valued text fields when synthetic source is enabled and runs
the unified highlighter in "analyze" mode when synthetic source is
enabled. That's enough to stop them from spitting out wrong answers.

We might be leaving some performance on the table when the unified
highlighter works on a single valued text field that is indexed with
offsets or term vectors. We don't really expect that to be common at all
though because generally folks will enable synthetic source to save
space and adding offsets or term vectors is quite space inefficient. If
it comes up, we might be able to improve here.

@nik9000 nik9000 added >non-issue :Search Relevance/Highlighting How a query matched a document :StorageEngine/TSDB You know, for Metrics v8.4.0 labels Jun 14, 2022
@nik9000 nik9000 requested a review from romseygeek June 14, 2022 20:10
@elasticmachine elasticmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team labels Jun 14, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

nik9000 added 2 commits June 14, 2022 16:21
Synthetic source has a habit of reordering text fields. This frustrates
highlighting because it *often* wants to use index structures to find
the offsets to values in the field. This disables the FVH highlighter
for multi-valued text fields when synthetic source is enabled and runs
the unified highlighter in "analyze" mode when synthetic source is
enabled. That's *enough* to stop them from spitting out wrong answers.

We might be leaving some performance on the table when the unified
highlighter works on a single valued text field that is indexed with
offsets or term vectors. We don't really expect that to be common at all
though because *generally* folks will enable synthetic source to save
space and adding offsets or term vectors is quite space inefficient. If
it comes up, we might be able to improve here.
Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Nik. I think we need one more test to ensure that we cover all paths in FVH but other than that this is good to merge.

- match: { hits.hits.0.highlight.foo\.vectors.0: <em>the quick brown fox jumped over the lazy dog</em> }

---
text multi fvh:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that this also occurs when you ask for fragments in order, so that we exercise both FragmentsBuilder implementations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that this also occurs when you ask for fragments in order, so that we exercise both FragmentsBuilder implementations?

👍

@nik9000 nik9000 mentioned this pull request Jun 15, 2022
50 tasks
@sethmlarson sethmlarson added the Team:Clients Meta label for clients team label Jun 15, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/clients-team (Team:Clients)

@nik9000 nik9000 merged commit 8ebf39b into elastic:master Jun 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search Relevance/Highlighting How a query matched a document :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Clients Meta label for clients team Team:Search Meta label for search team v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants