[PROPOSAL] Advanced Optimization Techniques for Hybrid query #783
Labels
Enhancements
hybrid query performance optimization
hybrid search
What/Why
What are you proposing?
With the 2.15 release there has been a lot of progress in improving search latency for hybrid query (changes done under the umbrella of #704). The team needs to keep looking for ways to reduce latency further.
The following is a flame graph captured from a system after the 2.15 release.
At a high level, here are some areas where the team can focus its efforts:
document iterator
in today's implementation we use a DisiWrapper to iterate over the results of each sub-query and a DisiPriorityQueue to collect the scores for a single doc id. The approach rests on a few foundational ideas: iterate one doc id at a time, and advance the iterators of every sub-query so they all point to the same doc id before calling Scorer.score(). This brings some limitations; for example, we cannot do bulk/block iteration over a set of documents.
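The doc-at-a-time scheme described above can be sketched in plain Java, without Lucene, to show where the per-doc bottleneck sits. This is a minimal, hypothetical illustration, not the plugin's code: `SubQueryCursor` stands in for a DisiWrapper, a `java.util.PriorityQueue` ordered by current doc id stands in for the DisiPriorityQueue, and each sub-query's hits are assumed to be pre-computed sorted doc ids with parallel scores.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch of doc-at-a-time iteration over several sub-query result lists.
 * Not Lucene's DisiWrapper/DisiPriorityQueue; it only illustrates the same idea.
 */
public class DocAtATimeSketch {

    /** Hypothetical per-sub-query cursor, playing the role a DisiWrapper plays in Lucene. */
    static final class SubQueryCursor {
        final int subQueryIndex;
        final int[] docs;     // matching doc ids, sorted ascending
        final float[] scores; // scores[i] belongs to docs[i]
        int pos = 0;

        SubQueryCursor(int subQueryIndex, int[] docs, float[] scores) {
            this.subQueryIndex = subQueryIndex;
            this.docs = docs;
            this.scores = scores;
        }

        int docId() { return docs[pos]; }
        boolean exhausted() { return pos >= docs.length; }
    }

    /**
     * Returns, for every matching doc id in ascending order, one score per sub-query
     * (0.0f where that sub-query did not match the doc).
     */
    static Map<Integer, float[]> collect(int[][] docsPerSubQuery, float[][] scoresPerSubQuery) {
        int n = docsPerSubQuery.length;
        // Min-heap ordered by each cursor's current doc id.
        PriorityQueue<SubQueryCursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.docId(), b.docId()));
        for (int i = 0; i < n; i++) {
            SubQueryCursor c = new SubQueryCursor(i, docsPerSubQuery[i], scoresPerSubQuery[i]);
            if (!c.exhausted()) heap.add(c);
        }
        Map<Integer, float[]> result = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().docId();
            float[] perSubQuery = new float[n];
            // Drain every cursor positioned on the current doc id. This one-doc-at-a-time
            // step is what prevents bulk/block processing of a set of documents.
            while (!heap.isEmpty() && heap.peek().docId() == doc) {
                SubQueryCursor c = heap.poll();
                perSubQuery[c.subQueryIndex] = c.scores[c.pos];
                c.pos++;
                if (!c.exhausted()) heap.add(c);
            }
            result.put(doc, perSubQuery);
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] docs = { {1, 4, 7}, {4, 5} };
        float[][] scores = { {0.9f, 0.5f, 0.2f}, {0.8f, 0.3f} };
        collect(docs, scores).forEach((doc, s) ->
            System.out.println(doc + " -> " + Arrays.toString(s)));
    }
}
```

Running `main` prints one line per matching doc, for example `4 -> [0.5, 0.8]`. Every doc id costs a heap poll/re-insert per matching sub-query, which is the overhead a block-oriented iterator would try to amortize.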
optimizations in special cases
we can optimize some special cases; for example, if two or more sub-queries are rewritten to the same Lucene-level query, we can execute that query only once and reuse its scores for the others
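A minimal sketch of this special case, assuming each rewritten sub-query can be compared by a key. Here the key is modeled as a `String`; with Lucene one would presumably rely on `Query#equals`/`hashCode` of the rewritten queries instead. The class name and the `execute` function are hypothetical stand-ins for real query execution.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch: execute each distinct rewritten sub-query once and reuse its scores
 * for every other sub-query that rewrites to the same thing.
 */
public class DedupSubQueriesSketch {

    static float[][] scoreAll(String[] rewrittenSubQueries, Function<String, float[]> execute) {
        Map<String, float[]> byQuery = new HashMap<>();
        float[][] results = new float[rewrittenSubQueries.length][];
        for (int i = 0; i < rewrittenSubQueries.length; i++) {
            // computeIfAbsent invokes the executor only the first time a rewritten query is seen.
            // Each sub-query gets its own copy so later per-sub-query processing
            // (e.g. score normalization) cannot affect the others.
            results[i] = byQuery.computeIfAbsent(rewrittenSubQueries[i], execute).clone();
        }
        return results;
    }

    public static void main(String[] args) {
        AtomicInteger executions = new AtomicInteger();
        // Hypothetical executor: pretends every query scores the same two docs.
        Function<String, float[]> execute = q -> {
            executions.incrementAndGet();
            return new float[] { 1.0f, 0.5f };
        };
        String[] subQueries = { "title:shoes", "title:shoes", "body:boots" };
        float[][] scores = scoreAll(subQueries, execute);
        System.out.println("executions = " + executions.get());
        System.out.println(Arrays.deepToString(scores));
    }
}
```

With three sub-queries of which two are identical after rewrite, the executor runs twice instead of three times. The trade-off is the cost of comparing rewritten queries, which should be negligible next to executing a duplicate query.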
caching strategies: implement smarter caching mechanisms to reduce redundant computation.
algorithmic improvements: optimize existing algorithms or introduce new ones that handle hybrid queries more efficiently.