-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve highlighting by using highlight_query with all_fields enabled #9671
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Highlighting seems to work really well now! I tried out all the scenarios from the previous ticket and everything was great. I left a few thoughts about implementation inline.
Could you also update the docs to include the new advanced setting in this PR?
const boolQuery = { bool: { must: [ | ||
{ query_string: { query: 'foo' } }, | ||
{ range: { '@timestamp': { gte: 0, lte: 0 } } } | ||
] } }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you format these so they're more readable?
if (_.has(clone, 'query_string')) { | ||
clone.query_string.all_fields = true; | ||
} else if (_.has(clone, 'bool.must')) { | ||
clone.bool.must = clone.bool.must.map(getHighlightQuery); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little worried about this being too brittle. I know we talked about only worrying about the query_string queries kibana creates, but maybe that's not such a great idea. If we change the structure of the query discover is generating (or ES changes the API) we might break highlighting without knowing about it. I also noticed another bug related to this assumption:
The query entered into the query bar is valid, but the code here assumes bool.must
is always an array.
What if we just recursively find any query_strings in the body and add all_fields
to them, instead of making assumptions about the structure of the query body?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's kinda tricky, because what if the user has a field named query_string
and is using that field inside a query? Then we might be adding all_fields
in a place where it's not even allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If only we had some kind of intermediate representation to avoid such issues... :)
It's a good point, I'll try to think on it over the weekend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a way to reliably do this without transforming the query into a normal form first either (which would be equivalent to an "intermediate representation").
fields: { | ||
'*': {} | ||
}, | ||
fragment_size: FRAGMENT_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If doc_table:highlight:all_fields
is false, should we set require_field_match
to false here to maintain the old behavior? Or do you think having a default_field is the only reason why someone might turn all_fields highlighting off, in which case they probably do want to require field matches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa, I never saw this comment for some reason. You hit the nail on the head... I wouldn't expect someone to turn this off unless they had a default field set. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I also can't think of a reason to turn it off unless someone wants to search a specific field by default and in that case they probably only want highlighting on that specific field. I was probably worrying too much about preserving the old behavior :)
@@ -389,6 +391,8 @@ export default function SourceAbstractFactory(Private, Promise, PromiseEmitter) | |||
recurse(agg.aggs || agg.aggregations); | |||
}); | |||
}(flatState.body.aggs || flatState.body.aggregations)); | |||
|
|||
highlightBody(flatState.body); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that moving highlighting to _flatten ensures that it works across discover and dashboard. But it also means that even Vis requests ask for highlighting. I wonder if we should try to avoid this? Since the highlighting logic is nicely modularized now maybe we could just reuse it in both discover and dashboard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also just realized this would overwrite any highlighting defined via the SearchSource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would definitely vote for not adding more magic to the data source. This piece of code has cost me so much time when writing the log context view before giving up on it. The more we add the more we restrict its reuse.
Awesome progress! Functionally, highlighting works as I expect now, based on examples here #9091 (comment) The only weirdness I noticed was a refresh issue with whether the expanded document view for one of the documents in the preview was open during a search or not. If it was, there doesn't seem to be a refresh in terms of highlighting. |
@tbragin Does that happen in master as well, or is it limited to this PR? |
@lukasolson This issue seems to exists in master. Should I open a separate issue for it? |
Went ahead and filed a separate issue #9729 |
Okay, this is ready for review again. A couple of notes:
Could you take another look, @Bargs and @weltenwort? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely satisfied with the current implementation in that highlightRequest makes a call to _flatten. This means that any time we want to use this, _flatten will be called multiple times (once for highlighting and once to flatten for the actual request that gets sent). I'm open to suggestions here.
I tend to agree with you here. It's also a bit fragile since a change to the query without a subsequent call to searchSource.highlightRequest
will leave the highlighting in a stale state. What if instead we had a flag called highlightAll
or something, which a consumer of searchSource could toggle on, indicating that they want to opt in to all fields highlighting? Then at _flatten time we check that flag and if it's set, generate the highlighting config, similar to how you had it before?
Also, have you looked into updating Dashboard to use this functionality in addition to Discover? At the moment highlighting doesn't work on terms entered into the Dashboard's query bar:
And one last thing: we'll still need to update the documentation with the new advanced setting option.
I haven't gone through the highlight module changes with a fine-tooth comb yet, but I wanted to leave these few high level comments before digging too deep.
|
||
export default function getHighlightHtml(fieldValue, highlights) { | ||
let highlightHtml = (typeof fieldValue === 'object') | ||
? angular.toJson(fieldValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this just be JSON.stringify
? If we remove this one dependency on angular, the tests can be run standalone without Karma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember exactly why this was put in here, but if I remember correctly, fieldValue
could be an Angular object (with $$
-prefixed values) and that's why we used angular.toJson
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, not a big deal
@Bargs This is ready for another review. I didn't explicitly handle the dashboard case, but it looks to me like it's working there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saw one minor doc typo, otherwise LGTM!
@@ -35,6 +35,8 @@ document. | |||
`discover:sampleSize`:: The number of rows to show in the Discover table. | |||
`doc_table:highlight`:: Highlight results in Discover and Saved Searches Dashboard. Highlighting makes request slow when | |||
working on big documents. Set this property to `false` to disable highlighting. | |||
`doc_table:highlight:all_fields`:: Improves highlighting by using a separate `highlight_query` that uses `all_fields` mode on | |||
`query_string` queries. Set to `false` if you are using a `default_field" in your index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the double quote after default_field needs to be a backtick?
Should this only apply to 5.x, seeing as how in 6.x the default will be to disable _all? |
I think old indices will still have _all though right? I don't think _all will be completely removed until 7.0. |
@@ -178,6 +181,11 @@ export default function SearchSourceFactory(Promise, Private, config) { | |||
}); | |||
}; | |||
|
|||
SearchSource.prototype.highlightRequest = function highlightRequest() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing something here, but where is this ever called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, good point, that should have been removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the left-overs it LGTM as far as the code goes. But I don't have full confidence in my understanding of the way this changes highlighting for the user. I would defer judgement to someone else in that regard.
|
||
export default function SearchSourceFactory(Promise, Private, config) { | ||
const SourceAbstract = Private(AbstractDataSourceProvider); | ||
const SearchRequest = Private(SearchRequestProvider); | ||
const SegmentedRequest = Private(SegmentedRequestProvider); | ||
const searchStrategy = Private(SearchStrategyProvider); | ||
const normalizeSortRequest = Private(NormalizeSortRequestProvider); | ||
const getHighlightRequest = getHighlightRequestProvider(config); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might also be a left-over that can be removed.
@@ -58,13 +58,15 @@ import AbstractDataSourceProvider from './_abstract'; | |||
import SearchRequestProvider from '../fetch/request/search'; | |||
import SegmentedRequestProvider from '../fetch/request/segmented'; | |||
import SearchStrategyProvider from '../fetch/strategy/search'; | |||
import { getHighlightRequestProvider } from '../../highlight'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might also be a left-over that can be removed.
🎆 🎉 🎈 ! |
…elastic#9671) * Add all_fields to highlight query to improve highlighting * Refactor highlighting and move out of _flatten * Make changes as per @Bargs' requests * Add documentation about highlightAll setting * Fix docs typo * Remove unused function * Remove unused code
Backported to 5.x (5.3) in 18451e5. |
Fixes #2358.
Replaces #9091.
After some discussion, we decided that the way to have the most reliable and predictable highlighting experience would be to utilize the
highlight_query
option, and use the same query as the request body query, with the exception thatall_fields
is enabled for anyquery_string
queries.This also adds another advanced setting,
doc_table:highlight:all_fields
, which defaults totrue
. If set tofalse
, then the highlighting won't use a separatehighlight_query
.This should resolve all of the cases that we discussed in #9091.