Description
Overview
This meta issue tracks the following complementary performance optimization strategies:
- Skip list based optimizations (Lucene 10.0) for efficient aggregation operations - [META] Skip List Based Optimization for Aggregations in OpenSearch #19384
- Lucene bulk collection API integration (Lucene 10.3/10.4) for vectorized execution and specific aggregation optimizations
- Reducing JVM overhead through strategic call site optimizations and inlining
- Using more recent algorithms/libraries, especially for stats aggregation computations
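As one illustration of the "more recent algorithms for stats aggregation" point above, a numerically stable streaming routine such as Welford's online algorithm computes mean and variance in a single pass without the catastrophic cancellation of the naive sum-of-squares approach. This is only a sketch of the kind of algorithmic upgrade meant here; the class and method names are hypothetical, not OpenSearch code:

```java
// Illustrative only: Welford's online algorithm for numerically stable
// mean/variance. Class and method names are hypothetical stand-ins.
class RunningStats {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    public void add(double value) {
        count++;
        double delta = value - mean;
        mean += delta / count;
        m2 += delta * (value - mean); // uses the updated mean
    }

    public double mean() {
        return mean;
    }

    public double variance() { // population variance
        return count > 0 ? m2 / count : Double.NaN;
    }
}
```

The single-pass update also composes well with batched doc value loading, since `add` can be called in a tight loop over a primitive buffer.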
Key Ideas
Some of the key ideas powering these optimizations, based on observations from recent OpenSearch benchmarks and interactions with the Lucene community, are listed below:
Skip List Optimizations
While most of the skip list based optimizations are covered in the dedicated issue, summarized points are included here for a more holistic picture:
- Optimize multi-filter numeric bucket aggregations using skip lists
- Use the recent `collect(DocIdStream, long)` API to improve sub-aggregation bucket collection by combining the filter rewrite and skip list approaches above
- Enhance deleted document handling for filter rewrite based sub-aggregations (done as part of Handle deleted documents for filter rewrite subaggregation optimization #19643) and top level aggregations (using SparseFixedBitSet based deleted documents in Lucene 10.4)
- Use the recent `collectRange` API to implement per-aggregator logic and efficiently skip over large ranges of documents by leveraging the preaggregated data in the skip list
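The skip-over idea above can be sketched with a toy model: Lucene 10's `DocValuesSkipper` exposes per-block doc id and min/max value ranges, so an aggregator can rule out whole blocks whose value range cannot intersect a bucket's range. The types below are hypothetical stand-ins, not the real Lucene or OpenSearch classes:

```java
// Toy sketch of skip-list pruning: per-block min/max metadata lets us
// skip entire blocks of documents without any per-document work.
// All types here are illustrative, not OpenSearch code.
class SkipBlock {
    final int minDoc, maxDoc;      // doc id range covered by the block
    final long minValue, maxValue; // pre-aggregated value range

    SkipBlock(int minDoc, int maxDoc, long minValue, long maxValue) {
        this.minDoc = minDoc;
        this.maxDoc = maxDoc;
        this.minValue = minValue;
        this.maxValue = maxValue;
    }
}

class SkipListPruner {
    /** Counts docs in blocks that could contain values in [lo, hi],
     *  skipping blocks that provably cannot match. */
    static int candidateDocs(SkipBlock[] blocks, long lo, long hi) {
        int candidates = 0;
        for (SkipBlock b : blocks) {
            if (b.maxValue < lo || b.minValue > hi) {
                continue; // whole block skipped: no per-doc iteration
            }
            candidates += b.maxDoc - b.minDoc + 1; // block must be visited
        }
        return candidates;
    }
}
```

When bucket boundaries align with block boundaries, the same metadata can also feed preaggregated counts directly instead of merely narrowing the candidate set.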
While some of these optimizations are covered in the bulk collection issue, they are expanded on below:
Bulk Collection API Integration
While the skip list can be leveraged for the `collectRange` API at the top level, there is an opportunity to adapt specific aggregation implementations to the bulk APIs, which should provide a considerable speedup. Some of the key ideas are below:
- Add a couple more bulk collection APIs in `LeafBucketCollector`:
  - `collect(int[] docs, long bucket)` - allows collecting and processing multiple documents into the same bucket. We should be able to leverage this API for the top level collector or in filter rewrite based sub-aggregation scenarios
  - `collect(int[] docs, long[] buckets)` - allows collecting and processing multiple documents across different buckets. While we lose some of the same-bucket benefits, processing this way should still be much faster because the doc values can be loaded and processed in batches before being collected into the different buckets. For example, `DateHistogram` spends significant time rounding date values, which can incur virtual call overhead due to the different implementations of `DateTimeRounding`. Using the bulk API, all the datetime doc values can be loaded into a buffer via a bulk `NumericDocValues` API and the rounded values processed as a batch
- Aggregation processing using the above APIs should be significantly faster, as primitive arrays can be vectorized and will take better advantage of Panama enhancements going forward
- Our custom DocValues implementations in OpenSearch for double/float fields should also support a bulk API for loading values for multiple documents at once, to benefit from bulk collection patterns
- Enabling batch processing capabilities should reduce per-document processing overhead, which is currently one of the biggest bottlenecks in the aggregation processing framework
- We should consider moving to a model where multi-valued fields are treated as separate types, so that bulk collection optimizations can be leveraged by default
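The batched `DateHistogram` pattern described above can be sketched as follows: load timestamps into a primitive buffer, then round them in one tight loop before bucket assignment, amortizing the virtual `round` dispatch across the batch. None of these types are the real OpenSearch or Lucene classes; they are hypothetical stand-ins for illustration:

```java
// Hypothetical sketch of the proposed bulk collection pattern: round a
// whole batch of doc values in one pass. Names are illustrative only.
interface Rounding {
    long round(long utcMillis);
}

class BulkDateCollector {
    /** Rounds a batch of timestamps into histogram bucket keys. */
    static long[] bucketKeys(long[] timestamps, Rounding rounding) {
        long[] keys = new long[timestamps.length];
        for (int i = 0; i < timestamps.length; i++) {
            // Tight loop over a primitive array: JIT-friendly, and a
            // candidate for auto-vectorization when rounding is simple.
            keys[i] = rounding.round(timestamps[i]);
        }
        return keys;
    }
}
```

For example, an hourly rounding could be expressed as `t -> t - (t % 3_600_000L)`; the returned keys would then be mapped to bucket ordinals in a second batched step.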
JVM-level Optimizations
Making the switch to bulk collection APIs should help:
- Reduce virtual call site overhead in critical paths
- Implement strategic inlining for vectorized execution
- Optimize call site polymorphism for better JIT compilation
While these are good optimizations, we currently lack good benchmarking strategies to quantify the improvements clearly. Enhancing the benchmarking methodology ([FEATURE] Add support for polluting call sites before running benchmarks opensearch-benchmark-workloads#709) would not only help measure these improvements more effectively, but also better reflect customer production workloads
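The call site pollution idea can be illustrated with a minimal sketch: before timing, the benchmark drives several distinct implementations through the same virtual call site, so the JIT sees it as megamorphic (as in mixed production workloads) rather than devirtualizing a monomorphic micro-benchmark. All names below are hypothetical, not the OSB implementation:

```java
// Illustrative sketch of call site pollution. Each lambda below is a
// distinct class, so the virtual call inside driveCallSite becomes
// megamorphic once all of them have been observed at that site.
interface ValueProcessor {
    long process(long v);
}

class CallSitePollution {
    static long driveCallSite(ValueProcessor p, long[] values) {
        long acc = 0;
        for (long v : values) {
            acc += p.process(v); // the call site being polluted
        }
        return acc;
    }

    static void pollute(long[] warmup) {
        // Exercise the same call site with distinct implementations so
        // the JIT cannot inline a single target during measurement.
        driveCallSite(v -> v + 1, warmup);
        driveCallSite(v -> v * 2, warmup);
        driveCallSite(v -> v ^ 0xFF, warmup);
    }
}
```

A benchmark harness would call `pollute` during warmup and then time `driveCallSite` with the implementation under test, yielding numbers closer to what polymorphic production call sites actually see.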
Supporting References
Sub META issues:
- Bulk collection API optimization: [META] Use Lucene bulk collection API to speed up aggregation #19324
- Skip list optimization: [META] Skip List Based Optimization for Aggregations in OpenSearch #19384
Issues
Skip list optimization code changes
- Multi filter support Adding logic for histogram aggregation using skiplist #19130
- Combine filter rewrite and skip list: Combining filter rewrite and skip list to optimize sub aggregation #19573
- Handling deleted documents: Handle deleted documents for filter rewrite subaggregation optimization #19643
Percentile aggregation improvement
Virtual call site improvement changes
- Adding bulk collect API for filter rewrite sub aggregation collection #19933
- Allow collectors take advantage of preaggregated data using collectRange API #20009
Next Steps
- Enhance OSB benchmark with call site pollution
- Create proof-of-concept implementations for each optimization
- Measure impact on real-world query patterns
- Establish best practices for maintaining these optimizations
Related component
Search:Performance