-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added rescorer in hybrid query #917
Added rescorer in hybrid query #917
Conversation
b380d20
to
e44bd6a
Compare
@@ -161,28 +164,84 @@ private List<ReduceableSearchResult> getSearchResults(final List<HybridSearchCol | |||
List<ReduceableSearchResult> results = new ArrayList<>(); | |||
DocValueFormat[] docValueFormats = getSortValueFormats(sortAndFormats); | |||
for (HybridSearchCollector collector : hybridSearchCollectors) { | |||
TopDocsAndMaxScore topDocsAndMaxScore = getTopDocsAndAndMaxScore(collector, docValueFormats); | |||
boolean isSortEnabled = docValueFormats != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is small refactoring, not directly related to rescore change
e44bd6a
to
1fac0aa
Compare
if (isSortEnabled) { | ||
return getSortedTopDocsAndMaxScore(topDocs, hybridSearchCollector); | ||
} | ||
return getRescoredTopDocsAndMaxScore(topDocs, hybridSearchCollector); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is one is more a refactoring, not directly related to rescore logic
fcf7f51
to
16a70b5
Compare
Does bool query throws any error in this case? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall code looks good to me. I see that we have used similar code for rescore processor from Opensearch core. Would love to see if we can create a GH issue to track how we can reuse the code from OS core here.
src/main/java/org/opensearch/neuralsearch/search/query/HybridCollectorManager.java
Outdated
Show resolved
Hide resolved
yes, bool will return 400 response with same error message. Following is example of response from
|
sure, that makes sense. I'll create an issue for core and put the link in this PR |
src/main/java/org/opensearch/neuralsearch/search/query/HybridCollectorManager.java
Outdated
Show resolved
Hide resolved
d379a83
to
6125e56
Compare
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
6125e56
to
2afe82e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thanks Martin. Great job👏
Created new GH issue in core in regards to adding more flexibility to Rescore processor opensearch-project/OpenSearch#16183 |
9f4a49a
into
opensearch-project:main
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-917-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 9f4a49a7e45211821d96181ce2a6842af18ce7ea
# Push it to GitHub
git push --set-upstream origin backport/backport-917-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x Then, create a pull request where the |
* Initial version for rescorer Signed-off-by: Martin Gaievski <gaievski@amazon.com> (cherry picked from commit 9f4a49a)
* Initial version for rescorer Signed-off-by: Martin Gaievski <gaievski@amazon.com> (cherry picked from commit 9f4a49a) Signed-off-by: Martin Gaievski <gaievski@amazon.com>
* Initial version for rescorer Signed-off-by: Martin Gaievski <gaievski@amazon.com> (cherry picked from commit 9f4a49a) Signed-off-by: Martin Gaievski <gaievski@amazon.com>
* Initial version for rescorer Signed-off-by: Martin Gaievski <gaievski@amazon.com> (cherry picked from commit 9f4a49a) Signed-off-by: Martin Gaievski <gaievski@amazon.com>
* Initial version for rescorer Signed-off-by: Martin Gaievski <gaievski@amazon.com> (cherry picked from commit 9f4a49a) Signed-off-by: Martin Gaievski <gaievski@amazon.com>
* Initial version for rescorer Signed-off-by: Martin Gaievski <gaievski@amazon.com> (cherry picked from commit 9f4a49a) Signed-off-by: Martin Gaievski <gaievski@amazon.com>
* Initial version for rescorer Signed-off-by: Martin Gaievski <gaievski@amazon.com> (cherry picked from commit 9f4a49a) Signed-off-by: Martin Gaievski <gaievski@amazon.com>
opensearch-project#924) * Added rescorer in hybrid query (opensearch-project#917) * Initial version for rescorer Signed-off-by: Martin Gaievski <gaievski@amazon.com> (cherry picked from commit 9f4a49a) Signed-off-by: Martin Gaievski <gaievski@amazon.com> (cherry picked from commit fe50537)
* Initial version for rescorer Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Description
Enable rescore functionality in hybrid query. Currently
rescore
clause in ignored and plain results of hybrid query are returned. Following is example of query with rescore clause.Logic will be similar to one for traditional queries: rescore query will be applied to search hits returned by sub-queries, based on match/no match scores and final order of documents will be adjusted.
To implementing this we're using custom method to call the rescore. In QueryPhaseSearch we always return
rescore == false
, then we do actual check (rescore/not rescore) in the hybrid collector manager. We do the check for rescore clause at the shard level, before individual sub-query results are merged into a single object. That allows to minimize number of changes in existing logic for score normalization.Following diagram shows flow that we introduced with the PR:
For comparison following diagram shows flow for today's implementation
Example of rescore usage:
Query with rescore
Response with implemented rescore from this PR, where documents with
field
in range of [1..20] are boostedsame query before this PR or without rescore gives following response. Note the document with "field1" == 50, it has the highest score, which is the raw score after normalization.
Few important notes regarding new logic:
Scores will be scaled as per rules of normalization/combination techniques, typically in range [0, 1.0]. This is because normalization process will still be executed after rescore logic is applied.
Rescore doesn't work with sorting, this is in parity with same behavior for traditional query.
Related Issues
#914
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.