The optimization for range-type aggregations delivers a good performance gain.
In short, the optimization reads aggregation results from the index structure instead of taking the default path, which iterates over every document's values and collects them into the result buckets.
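To make the difference concrete, here is a minimal toy sketch of the two code paths. It is not the actual implementation: a sorted Python list stands in for the index structure, and the function names (`count_by_iteration`, `count_from_sorted_index`) and sample data are made up for illustration.

```python
from bisect import bisect_left

def count_by_iteration(doc_values, ranges):
    """Default path (toy version): visit every document value and collect it into a bucket."""
    counts = [0] * len(ranges)
    for value in doc_values:
        for i, (lo, hi) in enumerate(ranges):
            if lo <= value < hi:
                counts[i] += 1
    return counts

def count_from_sorted_index(sorted_values, ranges):
    """Optimized path (toy version): answer each bucket with two binary searches.

    The sorted list is only a stand-in for the real index structure.
    """
    return [
        bisect_left(sorted_values, hi) - bisect_left(sorted_values, lo)
        for lo, hi in ranges
    ]

# Hypothetical data: one numeric value per document, half-open ranges [lo, hi).
doc_values = [5, 17, 3, 42, 17, 28, 9, 31]
sorted_index = sorted(doc_values)
ranges = [(0, 10), (10, 30), (30, 50)]

default_counts = count_by_iteration(doc_values, ranges)
optimized_counts = count_from_sorted_index(sorted_index, ranges)
```

Both paths return the same bucket counts. The toy optimized path does work proportional to the number of buckets rather than the number of documents, which is the intuition behind the gain.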
Currently, the optimization applies only to a single aggregation; our next step is to extend it to aggregations with sub-aggregations. To support sub-aggs, we need to get not only the agg results from the index but also the docID sets, so the sub-agg knows which docs to collect in the second pass (#12602).
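A rough sketch of the two-pass idea, under the same toy assumptions as before: the "index" is a sorted list of `(value, doc_id)` pairs, and the sub-agg is a simple sum over a second field. The names `bucket_doc_ids`, `sub_agg_sum`, `price`, and `quantity` are hypothetical and only serve the illustration.

```python
from bisect import bisect_left

def bucket_doc_ids(sorted_pairs, ranges):
    """First pass (toy version): return the doc ID set of each bucket from the index."""
    keys = [value for value, _ in sorted_pairs]
    doc_sets = []
    for lo, hi in ranges:
        start, end = bisect_left(keys, lo), bisect_left(keys, hi)
        doc_sets.append(sorted(doc_id for _, doc_id in sorted_pairs[start:end]))
    return doc_sets

def sub_agg_sum(doc_sets, other_field):
    """Second pass (toy version): the sub-agg collects only the docs of each bucket."""
    return [sum(other_field[doc_id] for doc_id in ids) for ids in doc_sets]

# Hypothetical data: the list position is the doc ID.
price = [5, 17, 3, 42, 17, 28, 9, 31]
quantity = [1, 2, 3, 4, 5, 6, 7, 8]
index = sorted((value, doc_id) for doc_id, value in enumerate(price))
ranges = [(0, 10), (10, 30), (30, 50)]

doc_sets = bucket_doc_ids(index, ranges)
quantity_sums = sub_agg_sum(doc_sets, quantity)
```

The point is that the first pass has to hand over doc IDs, not just counts, before any sub-agg can run.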
However, even with sub-agg support, the covered use cases may still be limited in real-world scenarios, because we don't support a user-supplied top-level query alongside the aggregation. Currently the only supported query is a range query on the same field as the aggregation; otherwise it has to be match all, although we do check this at the segment level.
We haven't experimented with this yet. To support more flexible query execution within the optimization, the query itself would become a conjunction of two groups of queries: the top-level ones and the ones built from the range aggregation. Theoretically this conjunction query could still be faster than default aggregation, but as the complexity grows we should also build a deeper understanding of the low-level query operations: which parts of the code take the most CPU cycles, which allocate the most memory, and how these compare with the default way of doing aggregation.
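As a purely conceptual sketch of that conjunction (nothing here has been tried in the real code path), each bucket's doc set can be intersected with the docs matched by the top-level query. The function `conjunction_counts` and all of the data below are hypothetical.

```python
def conjunction_counts(top_level_matches, bucket_doc_sets):
    """Count, per bucket, the docs that match both the top-level query and the bucket range."""
    return [len(top_level_matches & docs) for docs in bucket_doc_sets]

# Hypothetical doc ID sets: one per range bucket, plus the top-level query matches.
bucket_doc_sets = [{0, 2, 6}, {1, 4, 5}, {3, 7}]
top_level_matches = {0, 1, 2, 3, 4}

filtered_counts = conjunction_counts(top_level_matches, bucket_doc_sets)
```

Whether this beats default aggregation depends on how cheaply both sides of the intersection can be produced, which is what the profiling is meant to answer.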
Previously, we created a follow-up task, #13549, to decide on a threshold for applying the optimization, because we sometimes see the optimized path perform worse than the default method, for example on the pmc workload when the dataset is small and the date histogram interval is also small, such as a minute or a second.
We can merge that task into this one, as the research directions are the same.
bowenlan-amzn changed the title from "[Profiling deep dive] Default aggregation vs. Rewrite optimization code path" to "[Profiling deep dive] Default aggregation vs. optimization code path" on Jun 18, 2024
Some previous work: #13171