@martijnvg
Member

@martijnvg martijnvg commented Nov 1, 2018

Today, when a percolator query contains a date range, the query
analyzer extracts that range so that at search time the percolate query
can efficiently exclude percolator queries that are never going to match.

The problem is that if 'now' is used, it is evaluated at index time.
So the idea is to exclude date ranges with 'now', so that the
query analyzer can't extract them and the percolate query is then able
to evaluate 'now' at query time.

This is WIP to see whether this approach is acceptable and it is lacking tests.
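
One way to picture the problem described above is the following plain-Java sketch. It does not use any Elasticsearch classes; `NowAtIndexTime` and `matchesUpTo` are hypothetical names, and the timestamps are made up. It assumes a range of the form "timestamp <= now" whose upper bound is either frozen when the percolator query is indexed or resolved when a document is percolated.

```java
import java.time.Instant;

// Hypothetical illustration only; none of these names come from Elasticsearch.
final class NowAtIndexTime {

    // A document satisfies "timestamp <= bound" if it is not after the bound.
    static boolean matchesUpTo(Instant docTimestamp, Instant bound) {
        return !docTimestamp.isAfter(bound);
    }

    public static void main(String[] args) {
        Instant indexTime = Instant.parse("2018-11-01T12:00:00Z");  // query was indexed here
        Instant searchTime = Instant.parse("2018-11-01T18:00:00Z"); // document is percolated here
        Instant doc = Instant.parse("2018-11-01T17:30:00Z");

        // 'now' frozen at index time: the document looks too new for the range.
        System.out.println(matchesUpTo(doc, indexTime));  // false
        // 'now' resolved at search time: the document is inside the range.
        System.out.println(matchesUpTo(doc, searchTime)); // true
    }
}
```

With the frozen bound, a range extracted at index time would rule this query out even though it matches once 'now' is resolved at search time.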

@martijnvg martijnvg added WIP :Search Relevance/Percolator Reverse search: find queries that match a document labels Nov 1, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@martijnvg martijnvg added the >bug label Nov 1, 2018
.get();
assertHitCount(response, 3);

Thread.sleep(5000);
Contributor

Is there another way to do this instead of using a 5-second sleep? Maybe you can play with System.currentTimeMillis()?

Member Author

Yes, I think so. I would never push this test as is to master. I created this to quickly validate whether this fix was working.
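
For illustration, a sleep-free variant of such a check could inject the clock instead of waiting for real time to pass. The sketch below is plain Java and makes no claim about how the final test in this PR was written; `NowRange` and `NowRangeDemo` are hypothetical names, and the fake clock is just a mutable array cell.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of the PR: resolves 'now' through an injectable clock.
final class NowRange {
    private final LongSupplier nowInMillis;
    private final long windowMillis;

    NowRange(LongSupplier nowInMillis, long windowMillis) {
        this.nowInMillis = nowInMillis;
        this.windowMillis = windowMillis;
    }

    // True if the timestamp falls within [now - window, now], with 'now' read on every call.
    boolean matches(long timestampMillis) {
        long now = nowInMillis.getAsLong();
        return timestampMillis >= now - windowMillis && timestampMillis <= now;
    }
}

final class NowRangeDemo {
    public static void main(String[] args) {
        long[] clock = {1_000_000L};                 // mutable fake clock
        NowRange range = new NowRange(() -> clock[0], 2_000L);

        System.out.println(range.matches(999_000L)); // true: inside the 2s window
        clock[0] += 5_000L;                          // advance 5 seconds without sleeping
        System.out.println(range.matches(999_000L)); // false: the window has moved on
    }
}
```

Passing `System::currentTimeMillis` as the supplier would give the production behaviour, while tests advance the fake clock directly.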

@colings86 colings86 requested a review from jimczi November 5, 2018 16:14
Contributor

@jimczi jimczi left a comment

I left a comment regarding a different approach

createQueryBuilderField(indexVersion, queryBuilderField, queryBuilder, context);
Query query = toQuery(queryShardContext, isMapUnmappedFieldAsText(), queryBuilder);
processQuery(query, context);
QueryBuilder queryBuilderForProcessing = rewrite(queryBuilder);
Contributor

I wonder if we could use QueryShardContext#isCachable to determine if the query should be processed at index time or not. We can build the Lucene Query all the time and check if the shard context is cacheable afterward. This way we don't need to visit the query in the rewrite function?

Member Author

Yes, I think that would work. However, the downside is that we would then not extract terms / ranges for the entire query. In the current approach, only the clause with a date range with now gets ignored, so we can still benefit from index-time terms / range extraction. I think that most of the time these clauses with now date ranges co-exist with other clauses for which we can extract terms/ranges.

This approach does make query processing more complex, whereas the QueryShardContext#isCachable check is simple. Maybe we should do that, and if we get feedback that these percolator queries with now date ranges are slow, then we can always bring back the approach in this PR?
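
The trade-off discussed here can be sketched in plain Java. This is a toy model under stated assumptions: `TrackingContext` and `CacheabilityCheck` are hypothetical names, and the idea that reading 'now' flips a cacheability flag is an assumption made for the sketch, not something shown in this thread.

```java
import java.util.function.Function;

// Hypothetical context: remembers whether 'now' was read while a query was being built.
final class TrackingContext {
    private final long now;
    private boolean cacheable = true;

    TrackingContext(long now) {
        this.now = now;
    }

    // Assumption for this sketch: reading 'now' makes the result non-cacheable.
    long nowInMillis() {
        cacheable = false;
        return now;
    }

    boolean isCacheable() {
        return cacheable;
    }
}

final class CacheabilityCheck {
    // Builds the query first, then decides for the query as a whole.
    static boolean canExtractAtIndexTime(Function<TrackingContext, long[]> rangeBuilder, long now) {
        TrackingContext context = new TrackingContext(now);
        rangeBuilder.apply(context); // building may or may not touch 'now'
        return context.isCacheable();
    }

    public static void main(String[] args) {
        // Fixed bounds: safe to analyze at index time.
        System.out.println(canExtractAtIndexTime(ctx -> new long[] {0L, 100L}, 1_000L));
        // Bounds derived from 'now': skip index-time analysis.
        System.out.println(canExtractAtIndexTime(
            ctx -> new long[] {ctx.nowInMillis() - 60_000L, ctx.nowInMillis()}, 1_000L));
    }
}
```

The flag is a single boolean per build, which is why this variant decides for the entire query rather than for the one clause that uses 'now'.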

Contributor

Right, I missed the recursion ;). What about Simon's idea: #35160 (comment)? It is similar but the other clauses would still be extracted so it should be as fast as before.

}
}

static QueryBuilder rewrite(QueryBuilder queryBuilder) {
Contributor

I don't like this very much, to be honest. I do wonder if we can introduce a special rewrite context that we can check for in the RangeQuery and then rewrite to MatchAllDocs? This would make this change more contained.

Member Author

> I do wonder if we can introduce a special rewrite context that we can check for in the RangeQuery and then rewrite to MatchAllDocs

👍 That sounds like a good solution.

Member Author

@s1monw @jimczi I've pushed this: 21b7b86

…trols

whether a context supports `now` in date ranges. This defaults to `true`.

In the case of the percolator, it passes down a special rewrite context that
returns `false`.
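
As a rough illustration of the idea in this commit message, the toy query tree below rewrites only 'now'-based date ranges to match-all when the context reports that 'now' is not supported. All class names (`ToyQuery`, `ToyBool`, and so on) and the string-based detection of 'now' are assumptions made for the sketch; they are not the Elasticsearch classes touched by this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy query tree; illustrative only.
interface ToyQuery {
    // supportsNow == false asks 'now'-based date ranges to become match-all.
    ToyQuery rewrite(boolean supportsNow);
}

final class ToyMatchAll implements ToyQuery {
    public ToyQuery rewrite(boolean supportsNow) { return this; }
    @Override public String toString() { return "match_all"; }
}

final class ToyTerm implements ToyQuery {
    private final String field;
    private final String value;
    ToyTerm(String field, String value) { this.field = field; this.value = value; }
    public ToyQuery rewrite(boolean supportsNow) { return this; }
    @Override public String toString() { return field + ":" + value; }
}

final class ToyDateRange implements ToyQuery {
    private final String field;
    private final String from;
    private final String to;
    ToyDateRange(String field, String from, String to) {
        this.field = field; this.from = from; this.to = to;
    }
    public ToyQuery rewrite(boolean supportsNow) {
        boolean usesNow = from.contains("now") || to.contains("now");
        // Only the offending clause is replaced; everything else is left alone.
        return (usesNow && !supportsNow) ? new ToyMatchAll() : this;
    }
    @Override public String toString() { return field + ":[" + from + " TO " + to + "]"; }
}

final class ToyBool implements ToyQuery {
    private final List<ToyQuery> must;
    ToyBool(List<ToyQuery> must) { this.must = must; }
    public ToyQuery rewrite(boolean supportsNow) {
        List<ToyQuery> rewritten = new ArrayList<>();
        for (ToyQuery clause : must) {
            rewritten.add(clause.rewrite(supportsNow)); // recurse into every clause
        }
        return new ToyBool(rewritten);
    }
    @Override public String toString() { return "bool" + must; }
}

final class ToyRewriteDemo {
    public static void main(String[] args) {
        ToyQuery query = new ToyBool(Arrays.<ToyQuery>asList(
            new ToyTerm("status", "open"),
            new ToyDateRange("created", "now-1h", "now")));
        System.out.println(query.rewrite(true));  // bool[status:open, created:[now-1h TO now]]
        System.out.println(query.rewrite(false)); // bool[status:open, match_all]
    }
}
```

In the second output the term clause survives the rewrite, so a term extractor would still have something to work with.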
@s1monw
Contributor

s1monw commented Nov 6, 2018

Looks OK to me. What do you think, @jimczi? @martijnvg, can we add some docs on the builder explaining why we do this?

Contributor

@jimczi jimczi left a comment

I like it better too. +1 to add an explanation in the builder

/**
* @return whether the query rewrite context supports 'now' (current time) in range queries with date ranges.
*/
public boolean supportsNowInRangeQueries() {
Contributor

Maybe ignoresNowInRangeQueries, since we only rewrite the range to a match_all?

Contributor

convertNowRangeToMatchAll() ?

@martijnvg
Member Author

@s1monw @jimczi Thanks for checking this out. I will add some jdocs, rename the method and add tests for this.

@martijnvg
Member Author

@s1monw @jimczi I've updated the PR.

Contributor

@s1monw s1monw left a comment

LGTM

@martijnvg martijnvg removed the WIP label Nov 7, 2018
@martijnvg martijnvg merged commit 8de3c6e into elastic:master Nov 7, 2018
martijnvg added a commit that referenced this pull request Nov 7, 2018
…query (#35160)

Today when a percolator query contains a date range then the query
analyzer extracts that range, so that at search time the `percolate` query
can exclude percolator queries efficiently that are never going to match.

The problem is that if 'now' is used it is evaluated at index time.
So the idea is to rewrite date ranges with 'now' to a match all query,
so that the query analyzer can't extract it and the `percolate` query
is then able to evaluate 'now' at query time.
jasontedor added a commit to martijnvg/elasticsearch that referenced this pull request Nov 8, 2018
* master: (24 commits)
  Replicate index settings to followers (elastic#35089)
  Rename RealmConfig.globalSettings() to settings() (elastic#35330)
  [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329)
  Scripting: Add back lookup vars in score script (elastic#34833)
  watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271)
  Remove ALL shard check in CheckShrinkReadyStep (elastic#35346)
  Use soft-deleted docs to resolve strategy for engine operation (elastic#35230)
  [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316)
  Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160)
  Add a frozen engine implementation (elastic#34357)
  Put a fake allocation id on allocate stale primary command (elastic#34140)
  [CCR] Enforce auto follow pattern name restrictions (elastic#35197)
  [ILM] rolling upgrade tests (elastic#35328)
  [ML] Add Missing data checking class (elastic#35310)
  Apply `ignore_throttled` also to concrete indices (elastic#35335)
  Make version field names more meaningful  (elastic#35334)
  [CCR] Added HLRC support for pause follow API (elastic#35216)
  [Docs] Improve Convert Processor description (elastic#35280)
  [Painless] Removes extraneous compile method (elastic#35323)
  [CCR] Fail with a better error if leader index is red (elastic#35298)
  ...
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Nov 8, 2018
* elastic/master: (25 commits)
  Fixes fast vector highlighter docs per issue 24318. (elastic#34190)
  [ML] Prevent notifications on deletion of a non existent job (elastic#35337)
  [CCR] Auto follow Coordinator fetch cluster state in system context (elastic#35120)
  Mute test for elastic#35361
  Preserve `date_histogram` format when aggregating on unmapped fields (elastic#35254)
  Test: Mute failing SSL test
  Allow unmapped fields in composite aggregations (elastic#35331)
  [RCI] Add IndexShardOperationPermits.asyncBlockOperations(ActionListener<Releasable>) (elastic#34902)
  HLRC: reindex API with wait_for_completion false (elastic#35202)
  Add docs on JNA temp directory not being noexec (elastic#35355)
  [CCR] Adjust list of dynamic index settings that should be replicated (elastic#35195)
  Replicate index settings to followers (elastic#35089)
  Rename RealmConfig.globalSettings() to settings() (elastic#35330)
  [TEST] Cleanup FileUserPasswdStoreTests (elastic#35329)
  Scripting: Add back lookup vars in score script (elastic#34833)
  watcher: Fix integration tests to ensure correct start/stop of Watcher (elastic#35271)
  Remove ALL shard check in CheckShrinkReadyStep (elastic#35346)
  Use soft-deleted docs to resolve strategy for engine operation (elastic#35230)
  [ILM] Check shard and relocation status in AllocationRoutedStep (elastic#35316)
  Ignore date ranges containing 'now' when pre-processing a percolator query (elastic#35160)
  ...
pgomulka pushed a commit to pgomulka/elasticsearch that referenced this pull request Nov 13, 2018
…query (elastic#35160)

Labels

>bug :Search Relevance/Percolator Reverse search: find queries that match a document v6.5.0 v7.0.0-beta1

5 participants