Description
Is your feature request related to a problem? Please describe
The search workflow in OpenSearch is divided into two main phases: the Query Phase (QP) and the Fetch Phase (FP). At a high level, the QP first loads the aggregation processor and runs its pre-processing. It then loads the collectors and initializes the collector manager to execute the search on the shard. Finally, the aggregation processor post-processes the results.
For traditional queries such as bool, match, and term, a top docs collector context is created. The collector context internally initializes the TopDocsCollectorManager and loads the TopDocsCollector into it. The initialization and usage of TopDocsCollector are hardcoded in the search process. As a result, plugins cannot inject a custom collector context at the natural place for it, which is the point where the TopDocsCollector is instantiated.
A classic example of a query with custom logic is the hybrid query, which resides in the neural search plugin. Because of the limitation described above, the neural search plugin has to inject the HybridCollectorManager and HybridTopScoreDocCollector during the aggregation pre-process phase. It also has to provide a custom aggregation processor called HybridQueryAggregationProcessor, which is essentially a wrapper around DefaultAggregationProcessor. Moreover, to skip TopDocsCollectorContext initialization during hybrid query execution, the plugin carries its own variant of the searchWithCollector method, which injects an empty collector context into the search.
Recently, the team completed a proof of concept (POC) of moving Hybrid Search to OpenSearch core. The POC was done in multiple phases, and benchmarks were run at each phase. The baseline for these benchmarks is the current hybrid query implementation in the neural search plugin.
Dataset: noaa-semantic-search
Phase 1: Move the Hybrid Query logic and all of its related classes to OpenSearch core, including the Normalization processor. The assumption here was that doing so would reduce network calls.
Hybrid query with 3 subqueries: Term, Range and Date
| Latency | 3.0-beta (Phase 1) (Min distribution) | 3.0-beta (GA) |
|---|---|---|
| p50 | 306.41 | 280.16 |
| p90 | 348.01 | 299.51 |
| p99 | 405.09 | 326.92 |
| p100 | 434.87 | 334.57 |
No improvement was observed. The plugin and core run in the same JVM, so moving the code into core does not remove any network calls and gives no performance boost. The latency degradation seen with the min distribution is because it is not as stable as the GA version. Also, since the code in the min distribution is an MVP, we were only looking for any small improvement.
Still, the experiment shows that whether a custom query lives in OpenSearch core or in a plugin has no impact on network calls.
Phase 2: Create a HybridQueryCollectorContext and inject it in the same way TopDocsCollectorContext is injected. Also remove the EmptyCollectorContext initialization and the HybridQueryAggregationProcessor, and switch to DefaultAggregationProcessor.
| Latency | 3.0-beta (Phase 2) (Min distribution) | 3.0-beta (GA) | Improvement |
|---|---|---|---|
| p50 | 259.81 | 280.16 | 7.26% |
| p90 | 278.41 | 299.51 | 7.04% |
| p99 | 298.77 | 326.92 | 8.61% |
| p100 | 313.3 | 334.57 | 6.35% |
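The Improvement column appears to be the relative latency reduction against the GA baseline. That formula is an assumption, since the issue does not state it, but it reproduces the reported figures. For p50:

$$
\text{improvement} = \frac{L_{\text{GA}} - L_{\text{POC}}}{L_{\text{GA}}} \times 100\%
= \frac{280.16 - 259.81}{280.16} \times 100\% \approx 7.26\%
$$

Here $L_{\text{GA}}$ and $L_{\text{POC}}$ are just labels introduced for the two latency columns.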
We clearly see an improvement. To confirm it, we also benchmarked the complete distribution tarball built from the POC against the GA one.
| Latency | 3.0-beta (POC Complete distribution) | 3.0-beta (GA) | Improvement |
|---|---|---|---|
| p50 | 246.99 | 280.16 | 11.83% |
| p90 | 250.9 | 299.51 | 16.22% |
| p99 | 289.33 | 326.92 | 11.49% |
| p100 | 324.19 | 334.57 | 3.10% |
Therefore, making the injection of the collector context extensible through plugins would help custom query types improve their performance.
Essentially, in the searchWithCollector method, the QueryCollectorContext could be supplied by the plugin, in the same way we injected it in the POC for the hybrid query.
Describe the solution you'd like
We can make the createQueryCollectorContext method extensible so that plugins can provide their own implementation.
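To make the idea concrete, here is a minimal, self-contained sketch of what such an extension point could look like. Everything in it is hypothetical: `QueryCollectorContextProvider`, `NamedContext`, `register`, and the string-based query type are stand-ins invented for illustration, and the `QueryCollectorContext` interface below is a simplified placeholder rather than the real OpenSearch class. It only shows the dispatch: ask registered plugins first, and fall back to the default top docs context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these types are the real OpenSearch classes.
public class CollectorContextSketch {

    /** Simplified placeholder for a query collector context. */
    public interface QueryCollectorContext {
        String name();
    }

    /** Hypothetical extension point that a plugin would implement. */
    public interface QueryCollectorContextProvider {
        /** Returns a custom context for the query type, or empty to decline. */
        Optional<QueryCollectorContext> create(String queryType);
    }

    /** Trivial context implementation that only carries a name. */
    public static final class NamedContext implements QueryCollectorContext {
        private final String name;

        public NamedContext(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }
    }

    private final List<QueryCollectorContextProvider> providers = new ArrayList<>();

    /** Called once per plugin at startup in this sketch. */
    public void register(QueryCollectorContextProvider provider) {
        providers.add(provider);
    }

    /** The first provider that returns a context wins; otherwise use the default. */
    public QueryCollectorContext createQueryCollectorContext(String queryType) {
        for (QueryCollectorContextProvider provider : providers) {
            Optional<QueryCollectorContext> custom = provider.create(queryType);
            if (custom.isPresent()) {
                return custom.get();
            }
        }
        return new NamedContext("TopDocsCollectorContext");
    }

    /** Builds a sketch instance with one plugin-style provider for hybrid queries. */
    public static CollectorContextSketch withHybridProvider() {
        CollectorContextSketch core = new CollectorContextSketch();
        core.register(queryType -> {
            if ("hybrid".equals(queryType)) {
                return Optional.<QueryCollectorContext>of(
                    new NamedContext("HybridQueryCollectorContext"));
            }
            return Optional.empty();
        });
        return core;
    }

    public static void main(String[] args) {
        CollectorContextSketch core = withHybridProvider();
        System.out.println(core.createQueryCollectorContext("hybrid").name());
        System.out.println(core.createQueryCollectorContext("match").name());
    }
}
```

With a hook of this shape, a plugin would no longer need the empty collector context or a wrapper aggregation processor to get its collector into the search. How providers are discovered and ordered is left open here.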
Related component
Search:Performance
Describe alternatives you've considered
No response
Additional context
No response