-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix query analyzer logic for mixed conjunctions of terms and ranges #49803
Fix query analyzer logic for mixed conjunctions of terms and ranges #49803
Conversation
Pinging @elastic/es-search (:Search/Percolator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is tricky. :)
.build(); | ||
|
||
Result r = analyze(disj, Version.CURRENT); | ||
assertThat(r.minimumShouldMatch, equalTo(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you count the number of extractions and check verified
for all queries?
.build(); | ||
|
||
result = analyze(q2, Version.CURRENT); | ||
assertThat(result.minimumShouldMatch, equalTo(2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's maybe also have a test for the case that there are multiple range queries on the same field, or multiple range queries on different fields?
Thanks for the review @jpountz . I added more tests, and discovered that the logic was still not quite correct - tricky, as you say... We now only check that range fields from a particular set of extractions have not been seen in other results. Multiple identical range fields within a single Result extraction set will have already been dealt with, so don't require any adjustment to the msm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It's hard for me to reason about all possible corner cases, but the tests give me some confidence that it's at least more correct than the previous logic.
@@ -286,6 +284,9 @@ private static Result handleConjunction(List<Result> conjunctionsWithUnknowns) { | |||
verified = false; | |||
} | |||
} | |||
// add range fields from this Result to the seenRangeFields set so that minimumShouldMatch is correctly | |||
// calculated for subsequent Results | |||
result.extractions.stream().filter(e -> e.range != null).forEach(e -> seenRangeFields.add(e.range.fieldName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I generally have a preference for using method refs when possible and using new lines when chaining, e.g.
result.extractions.stream().filter(e -> e.range != null).forEach(e -> seenRangeFields.add(e.range.fieldName)); | |
result.extractions.stream() | |
.filter(Objects::nonNull) | |
.map(e -> e.range) | |
.forEach(seenRangeFields::add); |
…49803) When the query analyzer examines a conjunction containing both terms and ranges, it should only include ranges in the minimum_should_match calculation if there are no other range queries on that same field within the conjunction. This is because we cannot build a selection query over disjoint ranges on the same field, and it is not easy to check if two range queries have an overlap. The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent on whether or not the current range is over a field that has already been seen. However, this can be incorrect in the case that there are terms in the same match group which adjust the minimum_should_match downwards. Instead, the logic should be changed to match the terms extraction, whereby we adjust minimum_should_match downwards if we have already seen a range field. Fixes #49684
…lastic#49803) When the query analyzer examines a conjunction containing both terms and ranges, it should only include ranges in the minimum_should_match calculation if there are no other range queries on that same field within the conjunction. This is because we cannot build a selection query over disjoint ranges on the same field, and it is not easy to check if two range queries have an overlap. The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent on whether or not the current range is over a field that has already been seen. However, this can be incorrect in the case that there are terms in the same match group which adjust the minimum_should_match downwards. Instead, the logic should be changed to match the terms extraction, whereby we adjust minimum_should_match downwards if we have already seen a range field. Fixes elastic#49684
When the query analyzer examines a conjunction containing both terms and ranges,
it should only include ranges in the
minimum_should_match
calculation if there are noother range queries on that same field within the conjunction. This is because we cannot
build a selection query over disjoint ranges on the same field, and it is not easy to check
if two range queries have an overlap.
The current logic to calculate this just sets
minimum_should_match
to1
or0
, dependenton whether or not the current range is over a field that has already been seen. However, this
can be incorrect in the case that there are terms in the same match group which adjust the
minimum_should_match
downwards. Instead, the logic should be changed to match theterms extraction, whereby we adjust
minimum_should_match
downwards if we have alreadyseen a range field.
Fixes #49684