Do not pass negative scores into function_score or script_score queries #13627
Conversation
❌ Gradle check result for 0f6c9b1: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Force-pushed from 0f6c9b1 to 1081bb9.
❕ Gradle check result for 1081bb9: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

@@             Coverage Diff              @@
##               main   #13627      +/-   ##
============================================
+ Coverage     71.42%   71.49%    +0.07%
- Complexity    59978    61132     +1154
============================================
  Files          4985     5059       +74
  Lines        282275   287523     +5248
  Branches      40946    41646      +700
============================================
+ Hits         201603   205560     +3957
- Misses        63999    64970      +971
- Partials      16673    16993      +320

☔ View full report in Codecov by Sentry.
Force-pushed from 1081bb9 to 31c61c7.
❌ Gradle check result for 31c61c7: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Review comment on the YAML REST test:

}]
explain: true
- match: { hits.total.value: 3 }
- match: { hits.hits.2._score: 0.0 }
Nice, so the function score on a document will be at least zero instead of negative. This fixes the score at the per-document level.
Wondering if this also fixes cases where the per-field score is negative? Can we run the Profile API on the YAML test and check whether the perFieldSimilarity score is negative or zero?
The logic focuses on the input score passed to a function or script, regardless of where that score came from.
Both before and after this change, we have checks that the score returned by a function or script is not negative (and throw an exception if it is). Unfortunately, this meant that a function/script that just passes a score through would throw an exception if the input score is negative.
So, for any case where a function or script gets an illegal negative value as input, this change defensively treats it as zero.
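A minimal sketch of the behavior change described above (hypothetical names, not the actual OpenSearch code), assuming a pass-through script that simply returns its input score:

```java
public class ScoreClampSketch {
    // A pass-through script: score = _score
    static float scriptResult(float inputScore) {
        return inputScore;
    }

    // Old behavior: reject any negative result, even when the script
    // merely echoed a negative input it did not compute.
    static float oldBehavior(float inputScore) {
        float result = scriptResult(inputScore);
        if (result < 0) {
            throw new IllegalArgumentException("script returned an invalid (negative) score: " + result);
        }
        return result;
    }

    // New behavior: clamp the *input* before the script sees it, so the
    // script is no longer blamed for a negative value it passed through.
    static float newBehavior(float inputScore) {
        return scriptResult(Math.max(0.0f, inputScore));
    }

    public static void main(String[] args) {
        System.out.println(newBehavior(-0.22314355f)); // clamped to 0.0
        System.out.println(newBehavior(1.5f));         // positive scores unchanged
    }
}
```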
awesome! thank you
Gradle check failed due to #13600. Retrying...
❌ Gradle check result for 31c61c7: Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
public void score(float subScore) {
    // We check to make sure the script score function never makes a score negative, but we need to make
    // sure the script score function does not receive negative input.
    this.score = Math.max(0.0f, subScore);
@msfroh aren't we trying to hide the problem instead of fixing the source of the negative scoring? As you mentioned, we do catch that in some cases (and throw an exception), but it slips through in others. Why can't we fix that?
There was some discussion about this on #7860.

Essentially, the scoring logic in `cross_fields` match is pretty complicated. Actually, the Elasticsearch documentation for `cross_fields` even contains the following warning:

> The cross_fields type blends field statistics in a complex way that can be hard to interpret. The score combination can even be incorrect, in particular when some documents contain some of the search fields, but not all of them.

As @ylwu-amzn called out on the issue, any change we make to the `cross_fields` scoring formula could impact people's "regular" search results. We'd need to be extremely careful.

Meanwhile, this change just turns a query that was throwing an exception (which is what people have complained about) into one that doesn't.

We could possibly wrap the query returned from `MatchQueryBuilder` in a custom scorer that clamps negative values to zero. Functionally, that's what happens if `tie_breaker` is 0. I would still include the changes in this PR as an extra layer of protection.
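To see why `tie_breaker` matters here, a disjunction-max query combines per-field scores as the maximum plus `tieBreaker` times the remaining scores. A hypothetical helper (not Lucene's actual implementation) shows that with `tie_breaker = 0` a negative losing sub-score is ignored, while a non-zero `tie_breaker` lets it drag the combined score down:

```java
public class DisMaxSketch {
    // Disjunction-max combination: max(scores) + tieBreaker * (sum of the other scores)
    static float disMax(float[] scores, float tieBreaker) {
        float max = Float.NEGATIVE_INFINITY;
        float sum = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        float[] perField = { 1.2f, -0.3f }; // one field scored negative
        System.out.println(disMax(perField, 0.0f)); // only the max survives; negative score ignored
        System.out.println(disMax(perField, 0.5f)); // negative score leaks into the total
    }
}
```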
@reta -- I spent some more time today trying to prevent the negative score from bubbling up. So far as I can tell, this is the query that's returning the negative scores:
return new DisjunctionMaxQuery(queries, tieBreakerMultiplier);
Unfortunately, that just returns a Lucene `DisjunctionMaxQuery`, which normally does the correct thing. The problem arises because of the `adjustDF` and/or the `adjustTTF` methods. Let's pretend that Lucene term stats are something other than what they really are... yay!

I believe it's specifically the `adjustTTF` method (called from the `blend` method) that breaks us, because it lowers the `sumTotalTermFreq` to the minimum of `sumTotalTermFreq` across all of the fields, which may be lower than an individual field's term's max `termFreq` (from another field). In theory, the shared `sumTotalTermFreq` should never be lower than the maximum of `termFreq` for any doc across all fields, but we don't have a good way of getting that value without iterating through all documents matched by the terms. I'd love to add a line just before here saying `minSumTTF = Math.max(minSumTTF, maxTermFreq)`.
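As a numeric sketch of the guard suggested above (simplified, hypothetical names; the real blending logic is more involved): blending takes the minimum `sumTotalTermFreq` across fields, and the proposed line would keep it from dropping below the largest per-document `termFreq`:

```java
public class BlendSketch {
    // Blended total term frequency: minimum across fields, but never below the
    // largest termFreq any single document has, which BM25 implicitly assumes.
    static long blendedSumTtf(long[] sumTtfPerField, long maxTermFreq) {
        long minSumTTF = Long.MAX_VALUE;
        for (long ttf : sumTtfPerField) {
            minSumTTF = Math.min(minSumTTF, ttf);
        }
        return Math.max(minSumTTF, maxTermFreq); // the proposed guard
    }

    public static void main(String[] args) {
        // e.g. fields with sumTtf 3 and 1, while some doc has termFreq 2 in one field:
        // without the guard the blended value would be 1, below that doc's termFreq
        System.out.println(blendedSumTtf(new long[]{3, 1}, 2));
    }
}
```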
tl;dr: Preventing the negative scores from getting into function/script scoring functions is hard. This `cross_field` scoring is a complicated mess. While it is the root cause, the immediate pain point is not the negative scores, but that the negative scores make perfectly valid function/script scoring throw an exception.

I'd like to commit this fix as-is to address the immediate problem. We can address the complicated `cross_field` logic down the road.
When we enter the `blend()` method for the integ test I wrote, we start off with `color:red` with `df=3, sumTtf=3`, while `shape:red` has `df=1, sumTtf=1`. Also, `maxDoc` is 3.

- We exit the first loop with `minSumTTF=2` and `max = 3`. We set `maxDoc` to `2` (since it's bigger than `minSumTTF`).
- Later, we set `actualDf` to 2.
The relevant piece of the explain output is:
{
"value" : -0.22314355,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 1,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
That `n, number of documents containing term` came from `actualDf`.
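Plugging those numbers into the BM25 idf formula from the explain output confirms how `n > N` drives the score negative (a standalone check, not OpenSearch code):

```java
public class IdfCheck {
    // BM25 idf as shown in the explain output: log(1 + (N - n + 0.5) / (n + 0.5))
    static double idf(double n, double bigN) {
        return Math.log(1 + (bigN - n + 0.5) / (n + 0.5));
    }

    public static void main(String[] args) {
        // Blended stats from the test: n = 2 documents "containing the term",
        // but only N = 1 document actually has the field
        System.out.println(idf(2, 1)); // log(0.8) ≈ -0.22314355, matching the explain output
        // With a sane n <= N, the idf stays positive
        System.out.println(idf(1, 1)); // log(4/3) ≈ 0.28768
    }
}
```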
I think I can actually fix it by guaranteeing that `actualDf` is never greater than `N`, which I can get from `reader.getDocCount(terms[i].field())`.
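The clamp described above could look something like this sketch (hypothetical names, not the actual patch):

```java
public class DfClampSketch {
    // Ensure the blended document frequency n never exceeds N, the number of
    // documents that actually contain the field, so the BM25 idf stays positive.
    static long clampDf(long actualDf, long docCountForField) {
        return Math.min(actualDf, docCountForField);
    }

    public static void main(String[] args) {
        System.out.println(clampDf(2, 1)); // clamped to 1: idf(n=1, N=1) is positive again
        System.out.println(clampDf(2, 3)); // unchanged when the value is already legal
    }
}
```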
So, after saying that fixing the underlying problem is too hard, I think I can fix the underlying problem instead. 😁
Here we go: #13829
It may still be worth merging this PR as an extra layer of protection, but I'd need to update / remove the YAML REST test, because we no longer get a negative score that we force to zero. Alternatively, we could just close this PR, since the underlying problem is (probably) fixed.
Thanks a lot for looking into this @msfroh (I learned quite a few things from your clear explanation).
Force-pushed from 31c61c7 to 528bbd5.
❕ Gradle check result for 528bbd5: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Commit message:

In theory, Lucene scores should never go negative. To stop users from writing `function_score` and `script_score` queries that return negative values, we explicitly check their outputs and throw an exception when negative. Unfortunately, due to a subtle, more complicated bug in multi_match queries, sometimes those might (incorrectly) return negative scores. While that problem is also worth solving, we should protect function and script scoring from throwing an exception just for passing through a negative value that they had no hand in computing.

Signed-off-by: Michael Froh <froh@amazon.com>
Force-pushed from 528bbd5 to 983e43c.
❌ Gradle check result for 983e43c: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
In theory, Lucene scores should never go negative. To stop users from writing `function_score` and `script_score` queries that return negative values, we explicitly check their outputs and throw an exception when negative.

Unfortunately, due to a subtle, more complicated bug in multi_match queries, sometimes those might (incorrectly) return negative scores. While that problem is also worth solving, we should protect function and script scoring from throwing an exception just for passing through a negative value that they had no hand in computing.
Description
[Describe what this change achieves]
Related Issues
Resolves #7860
Check List
- New functionality has been documented.
- New functionality has javadoc added.
- API changes companion pull request created.
- Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.