[Feature Request] Enable injecting collector context from the plugin in the QueryPhase directly #18278

@vibrantvarun

Description

Is your feature request related to a problem? Please describe

The search workflow in OpenSearch is divided into two main phases: the Query Phase (QP) and the Fetch Phase (FP). At a high level, the QP first loads the aggregation processor and runs its pre-processing. It then loads the collectors and initializes the collector manager to execute the search on the shard. Finally, the aggregation processor post-processes the results.

For traditional queries such as bool, match, and term, a top docs collector context is created. This collector context internally initializes the TopDocsCollectorManager and loads the TopDocsCollector into it. The initialization and use of TopDocsCollector are hardcoded in the search process. As a result, plugins cannot inject a custom collector context at the natural point in the search, namely where the TopDocsCollector is instantiated.
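The ordering described above can be pictured with a toy model. This is a minimal sketch, not OpenSearch code: `ToyQueryPhase` and its step labels are hypothetical stand-ins chosen for illustration. Only the order of the steps and the hardcoded choice of the top docs collector context reflect the description in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the query phase ordering described in this issue.
// Everything here is a hypothetical stand-in, not an OpenSearch class.
public class ToyQueryPhase {

    // Hardcoded: there is no hook for a plugin to supply a different context.
    public static String createQueryCollectorContext() {
        return "TopDocsCollectorContext";
    }

    // Returns the steps in the order they run so the flow can be inspected.
    public static List<String> execute() {
        List<String> steps = new ArrayList<>();
        steps.add("aggregation pre-process");
        steps.add("create " + createQueryCollectorContext());
        steps.add("search shard with collector manager");
        steps.add("aggregation post-process");
        return steps;
    }

    public static void main(String[] args) {
        execute().forEach(System.out::println);
    }
}
```

Because the context is chosen inside `createQueryCollectorContext` with no extension point, a plugin in this model can only influence the steps around it, which mirrors the limitation described above.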

A classic example of a query with custom logic is the hybrid query (HQ), which resides in the neural search plugin. Because of the limitation described above, the neural search plugin has to inject the HybridCollectorManager and HybridTopScoreDocCollector during the aggregation pre-process phase. It also has to provide a custom aggregation processor, HybridQueryAggregationProcessor, which is essentially a wrapper around DefaultAggregationProcessor. Moreover, to skip TopDocsCollectorContext initialization during HQ execution, the plugin carries a copy of the searchWithCollector method that injects an empty collector context into the search.

Recently, the team completed a proof of concept (POC) of moving hybrid search into OpenSearch core. The POC was carried out in multiple phases, with benchmarking performed at each phase. The baseline for these benchmarks is the current hybrid query implementation in the neural search plugin.

Dataset: noaa-semantic-search

Phase 1: Move the hybrid query logic and all of its related classes into OpenSearch core, including the normalization processor. The assumption was that doing so would reduce network calls.

Hybrid query with 3 subqueries: Term, Range and Date

| Latency | 3.0-beta (Phase 1) (Min distribution) | 3.0-beta (GA) |
| --- | --- | --- |
| p50 | 306.41 | 280.16 |
| p90 | 348.01 | 299.51 |
| p99 | 405.09 | 326.92 |
| p100 | 434.87 | 334.57 |

No improvement was observed. The plugin and core run in the same JVM, so moving the code into core does not provide a performance boost. The higher latency in the min distribution is attributed to it being less stable than the GA version. Also, since the code in the min distribution is an MVP, we were only looking for any small improvement.

However, the experiment above shows that whether a custom query lives in OpenSearch core or in a plugin has no impact on network calls.

Phase 2: Create HybridQueryCollectorContext and inject it in the same way that TopDocsCollectorContext is injected. Also remove the EmptyCollectorContext initialization and HybridQueryAggregationProcessor, and switch to DefaultAggregationProcessor.

| Latency | 3.0-beta (Phase 2) (Min distribution) | 3.0-beta (GA) | Improvement |
| --- | --- | --- | --- |
| p50 | 259.81 | 280.16 | 7.26% |
| p90 | 278.41 | 299.51 | 7.04% |
| p99 | 298.77 | 326.92 | 8.61% |
| p100 | 313.3 | 334.57 | 6.35% |
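The issue does not state how the Improvement column is computed. A minimal sketch, assuming it is the relative latency reduction against the GA baseline, (GA - POC) / GA * 100, with the Phase 2 values copied from the table above. The class and method names are illustrative only.

```java
// Assumed formula for the "Improvement" column: relative latency reduction vs. the GA baseline.
public class LatencyImprovement {

    // Percentage by which `candidate` latency is lower than `baseline` latency.
    public static double percentImprovement(double baseline, double candidate) {
        return (baseline - candidate) / baseline * 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {"p50", "p90", "p99", "p100"};
        // Each row is {GA latency, Phase 2 min distribution latency}.
        double[][] rows = {
            {280.16, 259.81},
            {299.51, 278.41},
            {326.92, 298.77},
            {334.57, 313.3}
        };
        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%s %.2f%%%n", labels[i], percentImprovement(rows[i][0], rows[i][1]));
        }
    }
}
```

Under this assumption the computed values agree with the reported ones to within 0.01 percentage points; the last digit can differ depending on whether the figure is rounded or truncated.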

The improvement is clear. To further validate it, we benchmarked the complete distribution tarball against the GA one.

| Latency | 3.0-beta (POC Complete distribution) | 3.0-beta (GA) | Improvement |
| --- | --- | --- | --- |
| p50 | 246.99 | 280.16 | 11.83% |
| p90 | 250.9 | 299.51 | 16.22% |
| p99 | 289.33 | 326.92 | 11.49% |
| p100 | 324.19 | 334.57 | 3.10% |

Therefore, if we make the process of injecting collector context extensible through plugins, it will help custom query types improve their performance.

Essentially, in the searchWithCollector method, the plugin can inject the QueryCollectorContext in the same way we injected it in the POC for the hybrid query.

Describe the solution you'd like

We can make the createQueryCollectorContext method extensible so that plugins can provide their own implementation.
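One possible shape for such an extension point is sketched below. This is a self-contained illustration under stated assumptions, not a proposed OpenSearch API: the `CollectorContextProvider` interface, the `register` method, and the use of plain strings for query types and contexts are all hypothetical. The idea is that core asks registered providers first and falls back to the default top docs context when none applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of an extensible createQueryCollectorContext.
// None of these types are OpenSearch classes; they only illustrate the idea.
public class ExtensibleCollectorContext {

    // Hypothetical extension point that a plugin would implement.
    public interface CollectorContextProvider {
        // Return a context name if this provider handles the query type, else empty.
        Optional<String> contextFor(String queryType);
    }

    private final List<CollectorContextProvider> providers = new ArrayList<>();

    // Called once per plugin to register its provider.
    public void register(CollectorContextProvider provider) {
        providers.add(provider);
    }

    // Ask plugin providers first; fall back to the default top docs context.
    public String createQueryCollectorContext(String queryType) {
        for (CollectorContextProvider provider : providers) {
            Optional<String> context = provider.contextFor(queryType);
            if (context.isPresent()) {
                return context.get();
            }
        }
        return "TopDocsCollectorContext";
    }

    public static void main(String[] args) {
        ExtensibleCollectorContext core = new ExtensibleCollectorContext();
        core.register(q -> q.equals("hybrid")
            ? Optional.of("HybridQueryCollectorContext")
            : Optional.empty());
        System.out.println(core.createQueryCollectorContext("hybrid"));
        System.out.println(core.createQueryCollectorContext("match"));
    }
}
```

With a fallback of this kind, traditional queries keep their existing behavior, and only query types claimed by a plugin take the custom path.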

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

Projects

Status

✅ Done
