Fixup highlighting with synthetic source #87667
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-analytics-geo (Team:Analytics)
Synthetic source has a habit of reordering the values of text fields. This frustrates highlighting because it *often* wants to use index structures to find the offsets of values in the field. This PR disables the FVH highlighter for multi-valued text fields when synthetic source is enabled, and runs the unified highlighter in "analyze" mode when synthetic source is enabled. That's *enough* to stop them from spitting out wrong answers. We might be leaving some performance on the table when the unified highlighter works on a single-valued text field that is indexed with offsets or term vectors. We don't really expect that to be common, though, because *generally* folks will enable synthetic source to save space, and adding offsets or term vectors is quite space-inefficient. If it comes up, we might be able to improve here.
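To make the failure mode concrete, here is a minimal, self-contained sketch in Python. It is not Elasticsearch code: the function names are hypothetical, the "index" is just a list of `(value_index, start, end)` tuples, and sorting the values stands in for whatever reordering synthetic source actually performs. It only illustrates why offsets recorded at index time stop lining up once the values come back in a different order, and why re-analyzing the returned text avoids the problem.

```python
import re


def index_offsets(values, term):
    """Record (value_index, start, end) for each whole-word match of term.

    Hypothetical stand-in for offsets stored in postings or term vectors.
    """
    hits = []
    for i, value in enumerate(values):
        for m in re.finditer(rf"\b{re.escape(term)}\b", value):
            hits.append((i, m.start(), m.end()))
    return hits


def apply_offsets(values, hits):
    """Wrap each recorded span in <em> tags, trusting the offsets as given."""
    out = list(values)
    # Work right to left so earlier offsets in the same value stay valid.
    for i, start, end in sorted(hits, reverse=True):
        out[i] = out[i][:start] + "<em>" + out[i][start:end] + "</em>" + out[i][end:]
    return out


def highlight_by_reanalysis(values, term):
    """Recompute offsets from the text actually being highlighted."""
    return apply_offsets(values, index_offsets(values, term))


# Values in the order they were indexed.
original = ["the fox ran", "a big dog sat"]
stored = index_offsets(original, "fox")

# Assumption for illustration: the source comes back with values sorted.
reordered = sorted(original)

stale = apply_offsets(reordered, stored)          # offsets point into the wrong value
fresh = highlight_by_reanalysis(reordered, "fox") # offsets recomputed from the text

print(stale)
print(fresh)
```

In this sketch the stale offsets mark a span inside the wrong value, while re-analysis finds the term again in whichever position it ended up. That is the trade described above: re-analysis costs more work at highlight time but does not depend on the order the values were indexed in.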
LGTM, thanks Nik. I think we need one more test to ensure that we cover all paths in FVH but other than that this is good to merge.
- match: { hits.hits.0.highlight.foo\.vectors.0: <em>the quick brown fox jumped over the lazy dog</em> }

---
text multi fvh:
Can you add a test that this also occurs when you ask for fragments in order, so that we exercise both FragmentsBuilder implementations?
👍
Pinging @elastic/clients-team (Team:Clients)