Pass empty QueryCollectorContext in case of hybrid query to improve latencies by 20% #731
martin-gaievski merged 2 commits into opensearch-project:main from martin-gaievski:pass_empty_query_collector_context_for_hybrid_query on May 6, 2024
martin-gaievski added the backport 2.x, v2.15.0, hybrid search, and hybrid query performance optimization labels on May 3, 2024
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
martin-gaievski force-pushed the pass_empty_query_collector_context_for_hybrid_query branch from d2c96c7 to 9659df1 on May 3, 2024 00:35
martin-gaievski changed the title from "Pass empty QueryCollectorContext in case of hybrid query" to "Pass empty QueryCollectorContext in case of hybrid query to improve latencies by 20%" on May 3, 2024
BWC rolling upgrade tests will fail for 2.14 while the release is ongoing and core has already switched to 2.15 on the 2.x branch; the restart BWC tests are fine.
martin-gaievski requested review from heemin32, navneet1v, VijayanB, vamshin, jmazanec15, naveentatikonda, junqiu-lei, sean-zheng-amazon, model-collapse, zane-neo, ylwu-amzn, jngz-es, vibrantvarun and zhichao-aws as code owners on May 3, 2024 17:05
navneet1v
reviewed
May 3, 2024
...earch/neuralsearch/search/query/DefaultQueryPhaseSearcherWithEmptyQueryCollectorContext.java
Outdated
navneet1v
reviewed
May 3, 2024
...earch/neuralsearch/search/query/DefaultQueryPhaseSearcherWithEmptyQueryCollectorContext.java
Outdated
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
martin-gaievski force-pushed the pass_empty_query_collector_context_for_hybrid_query branch from efba181 to 2257721 on May 4, 2024 00:17
navneet1v
approved these changes
May 4, 2024
chishui
reviewed
May 6, 2024
src/main/java/org/opensearch/neuralsearch/search/query/HybridQueryPhaseSearcher.java
src/main/java/org/opensearch/neuralsearch/search/query/HybridQueryPhaseSearcher.java
VijayanB
approved these changes
May 6, 2024
LGTM.
martin-gaievski merged commit 2c556d2 into opensearch-project:main on May 6, 2024
67 of 73 checks passed
Description
This change improves hybrid query latency by avoiding redundant document score collection.
According to the flamegraphs posted in #729, between 40% and 80% of the CPU time spent processing a hybrid query is taken by TopDocsCollector in core. Those scores are not needed, so that collector should be removed from the execution flow when the query is a hybrid query.
At a high level, the idea is to set an empty doc collector instead of the top docs collector when the incoming query is of type hybrid.
The ability to pass an empty doc collector was added to core in an earlier PR. This PR uses that feature in the neural plugin by passing a new empty query collector context to the query phase searcher. This is done by overriding the searchWith method of both the default and concurrent query phase searchers. The only collector and collector manager executed for a hybrid query are the custom plugin implementations: HybridTopScoreDocCollector and HybridCollectorManager.
Below are the hybrid query latency metrics collected after this change. They are based on the 2.x (2.14) version and the noaa OSB workload; all times are in ms:
The following are the baseline results, measured before this change:
A flamegraph taken after this change shows that there is no call to TopDocsCollector. A deeper dive is needed to find the next focus area for optimization.
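The collector swap described in this section can be illustrated with a minimal, self-contained sketch. It does not use the OpenSearch or neural-search classes: every type below (CollectorContextSketch, TopDocsContextSketch, EmptyContextSketch, HybridCollectorSketch, DefaultSearcherSketch, HybridSearcherSketch, CollectorSwapDemo) is a hypothetical stand-in, and the method signatures are simplified assumptions rather than the real searchWith signature. It only shows the shape of the idea: a subclass overrides searchWith so that an empty context replaces the top docs context, leaving the plugin's own collector as the only one that does work.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a query collector context; not the OpenSearch type. */
interface CollectorContextSketch {
    void collect(int docId, float score);
}

/** Stand-in for the top docs collection that the default flow always performs. */
final class TopDocsContextSketch implements CollectorContextSketch {
    int collected = 0;

    @Override
    public void collect(int docId, float score) {
        collected++; // represents the per-document work the change wants to avoid
    }
}

/** Stand-in for an empty context: accepts documents and does nothing. */
final class EmptyContextSketch implements CollectorContextSketch {
    static final EmptyContextSketch INSTANCE = new EmptyContextSketch();

    @Override
    public void collect(int docId, float score) {
        // intentionally empty
    }
}

/** Stand-in for the plugin's own hybrid collector. */
final class HybridCollectorSketch implements CollectorContextSketch {
    final List<Integer> docIds = new ArrayList<>();

    @Override
    public void collect(int docId, float score) {
        docIds.add(docId);
    }
}

/** Default flow: runs the top docs context in front of any plugin collectors. */
class DefaultSearcherSketch {
    final TopDocsContextSketch topDocs = new TopDocsContextSketch();

    void searchWith(float[] scores, List<CollectorContextSketch> pluginCollectors) {
        searchWithCollector(scores, pluginCollectors, topDocs);
    }

    protected final void searchWithCollector(
        float[] scores,
        List<CollectorContextSketch> pluginCollectors,
        CollectorContextSketch first
    ) {
        for (int docId = 0; docId < scores.length; docId++) {
            first.collect(docId, scores[docId]);
            for (CollectorContextSketch collector : pluginCollectors) {
                collector.collect(docId, scores[docId]);
            }
        }
    }
}

/** Hybrid flow: overrides searchWith to pass the empty context instead. */
class HybridSearcherSketch extends DefaultSearcherSketch {
    @Override
    void searchWith(float[] scores, List<CollectorContextSketch> pluginCollectors) {
        searchWithCollector(scores, pluginCollectors, EmptyContextSketch.INSTANCE);
    }
}

final class CollectorSwapDemo {
    /** Picks the searcher by query type, mirroring the "in case of hybrid query" condition. */
    static DefaultSearcherSketch searcherFor(boolean isHybridQuery) {
        return isHybridQuery ? new HybridSearcherSketch() : new DefaultSearcherSketch();
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 0.9f, 0.1f};
        for (boolean isHybrid : new boolean[] {false, true}) {
            HybridCollectorSketch hybrid = new HybridCollectorSketch();
            DefaultSearcherSketch searcher = searcherFor(isHybrid);
            searcher.searchWith(scores, List.of(hybrid));
            System.out.println("hybrid=" + isHybrid
                + " topDocsCalls=" + searcher.topDocs.collected
                + " hybridDocs=" + hybrid.docIds.size());
        }
    }
}
```

In this sketch the hybrid searcher never touches its top docs counter, while the plugin collector still sees every document. The real change relies on the empty collector context that core exposes; the exact class and constant names in core are not shown here.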
Issues Resolved
#729
#704
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.