[Profiling deep dive] Default aggregation vs. optimization code path #14438

bowenlan-amzn · 2024-06-18T19:26:13Z

The optimization already gives a good performance gain on range-type aggregations.
In short, it computes aggregation results directly from the index structure, instead of taking the default path, which iterates over every document value and collects it into the results.
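To make the contrast concrete, here is a minimal sketch of the two paths. It is not the OpenSearch implementation: a sorted Python list stands in for the index structure, and the function names `default_range_agg` and `index_range_agg` are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left
from collections import Counter


def default_range_agg(doc_values, ranges):
    """Default path: visit every document value and collect it into its bucket."""
    counts = Counter()
    for value in doc_values:
        for lo, hi in ranges:
            if lo <= value < hi:
                counts[(lo, hi)] += 1
    return {r: counts[r] for r in ranges}


def index_range_agg(sorted_values, ranges):
    """Optimized path (sketch): answer each bucket from a sorted structure.

    Two binary searches per bucket replace the per-document loop.
    The sorted list is only a stand-in for a real index structure.
    """
    return {
        (lo, hi): bisect_left(sorted_values, hi) - bisect_left(sorted_values, lo)
        for lo, hi in ranges
    }


doc_values = [5, 12, 17, 23, 23, 31, 48]
ranges = [(0, 10), (10, 20), (20, 50)]  # half-open [lo, hi) buckets
sorted_values = sorted(doc_values)

print(default_range_agg(doc_values, ranges))
print(index_range_agg(sorted_values, ranges))
```

Both functions return the same bucket counts. The difference is the amount of work: the default path scales with the number of documents, while the sketch of the optimized path scales with the number of buckets.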

Currently, the optimization applies only to a single aggregation. Our next step is to also apply it to aggregations with sub-aggregations. To support sub-aggregations, we need to get not only the aggregation results from the index but also the docID sets, so that the sub-aggregation knows which documents to collect in the second pass (#12602).
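A rough sketch of that two-pass idea, under the same simplifying assumption that a sorted list of `(value, doc_id)` pairs stands in for the index. The helper names `bucket_doc_ids` and `collect_sub_agg`, and the `bytes_sent` field, are hypothetical.

```python
from bisect import bisect_left


def bucket_doc_ids(sorted_pairs, ranges):
    """First pass: return the set of doc IDs per bucket, not just a count.

    sorted_pairs is a list of (value, doc_id) tuples sorted by value.
    """
    values = [v for v, _ in sorted_pairs]
    result = {}
    for lo, hi in ranges:
        start, end = bisect_left(values, lo), bisect_left(values, hi)
        result[(lo, hi)] = {doc_id for _, doc_id in sorted_pairs[start:end]}
    return result


def collect_sub_agg(bucket_ids, sub_field):
    """Second pass: run a sum sub-aggregation over only each bucket's docs."""
    return {r: sum(sub_field[d] for d in ids) for r, ids in bucket_ids.items()}


timestamps = {0: 5, 1: 12, 2: 17, 3: 23}        # doc_id -> aggregated field
bytes_sent = {0: 100, 1: 200, 2: 300, 3: 400}   # doc_id -> sub-agg field
pairs = sorted((v, d) for d, v in timestamps.items())
ranges = [(0, 10), (10, 20), (20, 50)]

ids = bucket_doc_ids(pairs, ranges)
sums = collect_sub_agg(ids, bytes_sent)
print(ids)
print(sums)
```

The point of the sketch is only the data flow: the first pass must hand doc ID sets to the second pass, which is extra work compared with returning counts alone.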

However, even after sub-aggregations are supported, the supported use cases may still be limited in real-world scenarios, because we don't support a user adding a top-level query alongside the aggregation. Currently the only supported query is a range query on the same field as the aggregation; otherwise it has to be match-all, although we do check this at the segment level.

I haven't experimented with this yet. But to support more flexible query execution within the optimization, the query itself would become a conjunction of two groups of queries: the top-level query and the ones built from the range aggregation. In theory this conjunction query could still be faster than the default aggregation, but as the complexity grows we should also understand the low-level query operations more deeply: which parts of the code take the most CPU cycles, which allocate the most memory, and how these compare with the default way of doing aggregation.
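As a sketch of what the conjunction means, assume the top-level query and each range bucket have already been resolved to doc ID sets. Real query execution would more likely intersect iterators lazily than materialize sets, so this only illustrates the logic, and `conjunction_agg` is a hypothetical name.

```python
def conjunction_agg(top_level_ids, bucket_ids):
    """Count, per bucket, the docs that match both the bucket's range
    and the top-level query (set intersection as the conjunction)."""
    return {r: len(ids & top_level_ids) for r, ids in bucket_ids.items()}


# Doc IDs per range bucket, as a range-derived query would produce them.
bucket_ids = {(0, 10): {0}, (10, 20): {1, 2}, (20, 50): {3}}
# Doc IDs matched by a hypothetical top-level query.
top_level_ids = {1, 3, 7}

counts = conjunction_agg(top_level_ids, bucket_ids)
print(counts)
```

Every bucket now pays for an intersection in addition to the range lookup, which is one reason profiling both code paths matters before assuming the conjunction stays faster than the default aggregation.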

Previously, we created a follow-up task, #13549, to decide on a threshold for applying the optimization, because we sometimes see the optimized path perform worse than the default method, for example on the pmc workload when the dataset is small and the date histogram interval is also small, such as a minute or a second.
We can merge that task into this one, as the research directions are the same.

Some previous work: #13171

bowenlan-amzn added this to Performance Roadmap May 28, 2024

bowenlan-amzn self-assigned this Jun 18, 2024

bowenlan-amzn converted this from a draft issue Jun 18, 2024

github-actions bot added the untriaged label Jun 18, 2024

bowenlan-amzn mentioned this issue Jun 18, 2024

[Date Histogram] Investigate the safe number of buckets for which filter rewrite optimization can be applied #13549

Closed

bowenlan-amzn added the Search:Aggregations and Performance labels Jun 18, 2024

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking and Search Project Board Jun 18, 2024

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking Jun 18, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Jun 18, 2024

bowenlan-amzn removed the untriaged label Jun 18, 2024

bowenlan-amzn changed the title ~~[Profiling deep dive] Default aggregation vs. Rewrite optimization code path~~ [Profiling deep dive] Default aggregation vs. optimization code path Jun 18, 2024

bowenlan-amzn moved this from Now (This Quarter) to In Progress in Performance Roadmap Jun 21, 2024

bowenlan-amzn moved this from In Progress to Now (This Quarter) in Performance Roadmap Jul 4, 2024

getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board Aug 15, 2024
