Ignore date ranges containing 'now' when pre-processing a percolator query #35160
…query Today, when a percolator query contains a date range, the query analyzer extracts that range so that at search time the `percolate` query can efficiently exclude percolator queries that are never going to match. The problem is that if 'now' is used, it is evaluated at index time. So the idea is to exclude date ranges with 'now', so that the query analyzer can't extract them and the `percolate` query is then able to evaluate 'now' at query time.
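To make the failure mode concrete, here is a minimal Java sketch of a `now-1h` to `now` range resolved once at index time versus again at query time. All names (`NowRangeDemo`, `resolve`, `matches`) are hypothetical and do not come from the Elasticsearch code base; the sketch only illustrates why a bound frozen at index time can wrongly exclude a query.

```java
import java.util.function.LongSupplier;

// Hypothetical illustration only; not Elasticsearch code.
final class NowRangeDemo {
    static final long HOUR = 3_600_000L;

    // Resolves the range [now-1h, now] against the supplied clock.
    static long[] resolve(LongSupplier clock) {
        long now = clock.getAsLong();
        return new long[] { now - HOUR, now };
    }

    // True if the document timestamp falls inside the resolved range.
    static boolean matches(long docTimestamp, long[] range) {
        return docTimestamp >= range[0] && docTimestamp <= range[1];
    }

    public static void main(String[] args) {
        long indexTime = 10 * HOUR;   // when the percolator query was indexed
        long searchTime = 20 * HOUR;  // when a document is percolated
        long docTimestamp = searchTime - HOUR / 2; // 30 minutes old at search time

        long[] frozen = resolve(() -> indexTime);  // 'now' evaluated at index time
        long[] fresh = resolve(() -> searchTime);  // 'now' evaluated at query time

        System.out.println("frozen range matches: " + matches(docTimestamp, frozen));
        System.out.println("fresh range matches:  " + matches(docTimestamp, fresh));
    }
}
```

With the frozen range the document falls outside the extracted bounds, so a pre-filter built from it would discard a query that should match once 'now' is resolved at query time.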
Pinging @elastic/es-search-aggs
    .get();
    assertHitCount(response, 3);

    Thread.sleep(5000);
is there another way to do this instead of using a 5 second sleep? Maybe you can play with System.currentTimeMillis()?
Yes, I think so. I would never push this test as is to master. I created this to quickly validate whether this fix was working.
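One common way to avoid a real sleep in a test like this is to resolve 'now' through an injected clock that the test can advance. The following is a minimal Java sketch under that assumption; `FakeClockDemo` and `NowResolver` are hypothetical names, and this is not necessarily how the final test in this PR was written.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical illustration only; not Elasticsearch code.
final class FakeClockDemo {

    // A component that reads 'now' from an injected clock instead of the system clock.
    static final class NowResolver {
        private final LongSupplier clock;

        NowResolver(LongSupplier clock) {
            this.clock = clock;
        }

        // True if docTimestamp lies within [now - windowMillis, now].
        boolean isWithinLast(long docTimestamp, long windowMillis) {
            long now = clock.getAsLong();
            return docTimestamp <= now && docTimestamp >= now - windowMillis;
        }
    }

    public static void main(String[] args) {
        AtomicLong fakeNow = new AtomicLong(1_000_000L);
        NowResolver resolver = new NowResolver(fakeNow::get);
        long doc = fakeNow.get();

        System.out.println(resolver.isWithinLast(doc, 2_000L)); // document is current
        fakeNow.addAndGet(5_000L); // advance five seconds without sleeping
        System.out.println(resolver.isWithinLast(doc, 2_000L)); // document is now too old
    }
}
```

The test stays deterministic and fast because time only moves when the test moves it.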
jimczi
left a comment
I left a comment regarding a different approach
    createQueryBuilderField(indexVersion, queryBuilderField, queryBuilder, context);
    Query query = toQuery(queryShardContext, isMapUnmappedFieldAsText(), queryBuilder);
    processQuery(query, context);
    QueryBuilder queryBuilderForProcessing = rewrite(queryBuilder);
I wonder if we could use QueryShardContext#isCachable to determine whether the query should be processed at index time or not. We can build the Lucene Query every time and check afterwards whether the shard context is cacheable. This way we don't need to visit the query in the rewrite function?
Yes, I think that would work. However, the downside is that we would then not extract terms / ranges for the entire query. In the current approach only the clause with a date range containing now gets ignored, so that we can still benefit from index time terms / range extraction. I think that most of the time these clauses with now date ranges co-exist with other clauses for which we can extract terms / ranges.
The approach in this PR does make query processing more complex, whereas the QueryShardContext#isCachable check is simple. Maybe we should do that, and if we get feedback that percolator queries with now date ranges are seen as slow, we can always bring back the approach in this PR?
Right I missed the recursion ;). What about Simon's idea: #35160 (comment) ? It is similar but the other clauses would still be extracted so it should be as fast as before.
    }
    }

    static QueryBuilder rewrite(QueryBuilder queryBuilder) {
I don't like this very much, to be honest. I wonder if we can introduce a special rewrite context that we can check for in the RangeQuery and then rewrite to MatchAllDocs. This would make the change more contained.
I do wonder if we can introduce a special rewrite context that we can check for in the RangeQuery and then rewrite to MatchAllDocs
👍 That sounds like a good solution.
…trols whether a context supports `now` in date ranges. This defaults to `true`. In the case of the percolator, it passes down a special rewrite context that returns `false`.
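The shape of this design can be sketched in a few lines of Java. The classes below (`RewriteSketch`, `RewriteContext`, `PercolatorRewriteContext`, `DateRange`, `Term`, `Bool`, `MatchAll`) are simplified, hypothetical stand-ins rather than the real Elasticsearch types; only the method name `supportsNowInRangeQueries` is taken from the diff under review. The sketch assumes non-null string bounds and requires a Java version with records.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-ins; not the real Elasticsearch classes.
final class RewriteSketch {

    interface Query {
        Query rewrite(RewriteContext ctx);
    }

    // Default context: 'now' in date ranges is supported and left untouched.
    static class RewriteContext {
        boolean supportsNowInRangeQueries() {
            return true;
        }
    }

    // Context the percolator would pass down at index time.
    static final class PercolatorRewriteContext extends RewriteContext {
        @Override
        boolean supportsNowInRangeQueries() {
            return false;
        }
    }

    record MatchAll() implements Query {
        public Query rewrite(RewriteContext ctx) {
            return this;
        }
    }

    record Term(String field, String value) implements Query {
        public Query rewrite(RewriteContext ctx) {
            return this;
        }
    }

    record DateRange(String field, String from, String to) implements Query {
        boolean usesNow() {
            return from.contains("now") || to.contains("now");
        }

        // A range using 'now' becomes match-all when the context does not support it.
        public Query rewrite(RewriteContext ctx) {
            return usesNow() && !ctx.supportsNowInRangeQueries() ? new MatchAll() : this;
        }
    }

    record Bool(List<Query> must) implements Query {
        // Rewrites each clause independently, so only 'now' ranges are replaced.
        public Query rewrite(RewriteContext ctx) {
            return new Bool(must.stream().map(q -> q.rewrite(ctx)).collect(Collectors.toList()));
        }
    }

    public static void main(String[] args) {
        Query query = new Bool(List.of(
            new Term("tag", "es"),
            new DateRange("timestamp", "now-1h", "now")));
        System.out.println(query.rewrite(new RewriteContext()));
        System.out.println(query.rewrite(new PercolatorRewriteContext()));
    }
}
```

Because the decision is made per clause, the term clause is still available for index-time extraction, while the 'now' range is left to be evaluated at query time.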
Looks OK to me. What do you think, @jimczi? @martijnvg, can we add some docs on the builder explaining why we do this?
jimczi
left a comment
I like it better too. +1 to add an explanation in the builder
    /**
     * @return whether the query rewrite context supports 'now' (current time) in range queries with date ranges.
     */
    public boolean supportsNowInRangeQueries() {
Maybe ignoresNowInRangeQueries, since we only rewrite the range to a match_all?
convertNowRangeToMatchAll() ?
s1monw
left a comment
LGTM
…query (#35160) Today when a percolator query contains a date range then the query analyzer extracts that range, so that at search time the `percolate` query can exclude percolator queries efficiently that are never going to match. The problem is that if 'now' is used it is evaluated at index time. So the idea is to rewrite date ranges with 'now' to a match all query, so that the query analyzer can't extract it and the `percolate` query is then able to evaluate 'now' at query time.
* master: (24 commits)
  Replicate index settings to followers (elastic#35089)
  Rename RealmConfig.globalSettings() to settings() (elastic#35330)
  [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329)
  Scripting: Add back lookup vars in score script (elastic#34833)
  watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271)
  Remove ALL shard check in CheckShrinkReadyStep (elastic#35346)
  Use soft-deleted docs to resolve strategy for engine operation (elastic#35230)
  [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316)
  Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160)
  Add a frozen engine implementation (elastic#34357)
  Put a fake allocation id on allocate stale primary command (elastic#34140)
  [CCR] Enforce auto follow pattern name restrictions (elastic#35197)
  [ILM] rolling upgrade tests (elastic#35328)
  [ML] Add Missing data checking class (elastic#35310)
  Apply `ignore_throttled` also to concrete indices (elastic#35335)
  Make version field names more meaningful (elastic#35334)
  [CCR] Added HLRC support for pause follow API (elastic#35216)
  [Docs] Improve Convert Processor description (elastic#35280)
  [Painless] Removes extraneous compile method (elastic#35323)
  [CCR] Fail with a better error if leader index is red (elastic#35298)
  ...
* elastic/master: (25 commits)
  Fixes fast vector highlighter docs per issue 24318. (elastic#34190)
  [ML] Prevent notifications on deletion of a non existent job (elastic#35337)
  [CCR] Auto follow Coordinator fetch cluster state in system context (elastic#35120)
  Mute test for elastic#35361
  Preserve `date_histogram` format when aggregating on unmapped fields (elastic#35254)
  Test: Mute failing SSL test
  Allow unmapped fields in composite aggregations (elastic#35331)
  [RCI] Add IndexShardOperationPermits.asyncBlockOperations(ActionListener<Releasable>) (elastic#34902)
  HLRC: reindex API with wait_for_completion false (elastic#35202)
  Add docs on JNA temp directory not being noexec (elastic#35355)
  [CCR] Adjust list of dynamic index settings that should be replicated (elastic#35195)
  Replicate index settings to followers (elastic#35089)
  Rename RealmConfig.globalSettings() to settings() (elastic#35330)
  [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329)
  Scripting: Add back lookup vars in score script (elastic#34833)
  watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271)
  Remove ALL shard check in CheckShrinkReadyStep (elastic#35346)
  Use soft-deleted docs to resolve strategy for engine operation (elastic#35230)
  [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316)
  Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160)
  ...
…query (elastic#35160) Today when a percolator query contains a date range then the query analyzer extracts that range, so that at search time the `percolate` query can exclude percolator queries efficiently that are never going to match. The problem is that if 'now' is used it is evaluated at index time. So the idea is to rewrite date ranges with 'now' to a match all query, so that the query analyzer can't extract it and the `percolate` query is then able to evaluate 'now' at query time.
Today, when a percolator query contains a date range, the query
analyzer extracts that range so that at search time the
`percolate` query can efficiently exclude percolator queries that are never going to match.
The problem is that if 'now' is used, it is evaluated at index time.
So the idea is to exclude date ranges with 'now', so that the
query analyzer can't extract them and the
`percolate` query is then able to evaluate 'now' at query time.
This is WIP to see whether this approach is acceptable and it is lacking tests.