[PROPOSAL] Advanced Optimization Techniques for Hybrid query #783

martin-gaievski · 2024-06-10T23:46:16Z

What/Why

What are you proposing?

The 2.15 release brought a lot of progress in improving search latency for Hybrid query (changes done under the umbrella of #704). The team needs to keep looking for ways to optimize latency.

The following flame graph was taken from a system after the 2.15 release.

At a high level, here are some areas where the team can apply effort:

Document iterator
In today's implementation we use DisiWrapper to iterate over the results of one sub-query and DisiPriorityQueue to collect scores for one doc id. This approach rests on a few foundational ideas: iterate one doc id at a time, and advance the iterators of every sub-query so they all point to the same doc id when we call Scorer.score(). This brings some limitations, e.g. we cannot do bulk/block iteration over a set of documents.
Optimizations in special cases
We can optimize for some special cases. For example, if two or more sub-queries can be rewritten to the same Lucene-level query, we can execute only one of them and reuse its scores for the others.
Caching strategies: implement smarter caching mechanisms to reduce redundant computations.
Algorithmic improvements: optimize existing algorithms or introduce new ones that can handle hybrid queries more efficiently.
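
To make the first two ideas concrete, here is a minimal, language-neutral sketch in Python. It is not the plugin's code and does not use the Lucene API: `Postings`, `iterate_doc_at_a_time`, and `run_with_shared_results` are hypothetical names, a plain min-heap stands in for the role DisiPriorityQueue plays, and the string keys stand in for whatever canonical form of a rewritten sub-query would be compared in practice.

```python
import heapq
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical stand-in for one sub-query's results: (doc_id, score) pairs
# sorted by ascending doc_id, with each doc_id appearing at most once.
Postings = List[Tuple[int, float]]


def iterate_doc_at_a_time(
    sub_queries: List[Postings],
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (doc_id, per-sub-query scores) in ascending doc_id order.

    A min-heap keyed by each iterator's current doc id keeps the sub-query
    iterators aligned; 0.0 marks a sub-query that did not match the doc.
    """
    iterators = [iter(p) for p in sub_queries]
    heap: List[Tuple[int, int, float]] = []  # (doc_id, sub_query_index, score)
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first[0], idx, first[1]))

    while heap:
        doc_id = heap[0][0]
        scores = [0.0] * len(sub_queries)
        # Pop every iterator positioned on doc_id, record its score, advance it.
        while heap and heap[0][0] == doc_id:
            _, idx, score = heapq.heappop(heap)
            scores[idx] = score
            nxt = next(iterators[idx], None)
            if nxt is not None:
                heapq.heappush(heap, (nxt[0], idx, nxt[1]))
        yield doc_id, scores


def run_with_shared_results(
    sub_query_keys: List[str],
    execute: Callable[[str], Postings],
) -> List[Postings]:
    """Execute each distinct sub-query once and reuse its results.

    sub_query_keys are assumed to be canonical forms of the rewritten
    sub-queries, so equal keys mean "same Lucene-level query".
    """
    cache: Dict[str, Postings] = {}
    results: List[Postings] = []
    for key in sub_query_keys:
        if key not in cache:
            cache[key] = execute(key)  # only the first occurrence is executed
        results.append(cache[key])
    return results
```

Because `iterate_doc_at_a_time` emits exactly one doc id per step, the sketch also shows the limitation noted above: there is no point at which a block of documents could be scored together. Passing the output of `run_with_shared_results` into it gives every duplicate sub-query its own score slot while the underlying query runs only once.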

dblock · 2024-07-01T16:21:37Z

[Catch All Triage - Attendees 1, 2, 3, 4, 5]

martin-gaievski · 2024-08-14T21:52:13Z

One more idea: we can set up a dedicated coordinator node and profile it in isolation to catch hot spots specific to the processor code. Previously we took a one-man-orchestra approach to cluster configuration, so processor hot spots may have been overshadowed or distorted by searcher work at the data node level.

github-actions bot added the untriaged label Jun 10, 2024

dblock added enhancement and removed untriaged enhancement labels Jul 1, 2024

naveentatikonda added this to Vector Search RoadMap Sep 18, 2024

github-project-automation bot moved this to Backlog in Vector Search RoadMap Sep 18, 2024

minalsha assigned martin-gaievski Jan 9, 2025

minalsha added hybrid search hybrid query performance optimization labels Jan 9, 2025
