[Concurrent Segment Search] Performance regression for multi-term aggs on high cardinality data #11584
Solutions

1. Enforce a heuristic "slice_size" at the slice level (RECOMMENDED)

We can use the same
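Presumably this refers to the over-fetch heuristic already used to derive the default `shard_size`; the following is a minimal sketch under that assumption (the class and method names other than `BucketUtils.suggestShardSideQueueSize` are illustrative, not the actual implementation):

```java
import org.opensearch.search.aggregations.bucket.BucketUtils;

/**
 * Illustrative sketch of the idea, not the actual fix: bound the slice-level
 * priority queue with the same over-fetch heuristic used for the default
 * shard_size, instead of letting it grow to the full candidate bucket count.
 */
public class SliceSizeHeuristic {
    /**
     * @param requestedSize  the "size" requested in the aggregation
     * @param bucketsInSlice number of candidate buckets collected by this slice
     * @return a bounded queue size to use during the slice-level build_aggregation
     */
    static long boundedSliceQueueSize(int requestedSize, long bucketsInSlice) {
        // suggestShardSideQueueSize applies roughly requestedSize * 1.5 + 10,
        // trading a little merge-time accuracy for a bounded queue.
        int sliceSize = BucketUtils.suggestShardSideQueueSize(requestedSize);
        return Math.min(sliceSize, bucketsInSlice);
    }
}
```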
2. Merge aggregators before
```java
@Override
public ReduceableSearchResult reduce(Collection<Collector> collectors) throws IOException {
    final List<Aggregator> aggregators = context.bucketCollectorProcessor().toAggregators(collectors);
    final List<InternalAggregation> internals = new ArrayList<>(aggregators.size());
    context.aggregations().resetBucketMultiConsumer();
    for (Aggregator aggregator : aggregators) {
        try {
            // post collection is called in ContextIndexSearcher after search on leaves are completed
            internals.add(aggregator.buildTopLevel());
        } catch (IOException e) {
            throw new AggregationExecutionException("Failed to build aggregation [" + aggregator.name() + "]", e);
        }
    }
    final InternalAggregations internalAggregations = InternalAggregations.from(internals);
    return buildAggregationResult(internalAggregations);
}
```
This is pretty challenging for 2 reasons.

- `bucketOrds` is implemented by each `Aggregator` object rather than in any parent class, so each `Aggregator` itself must implement its own merging logic (a rough sketch of what that might look like follows this list). Moreover, depending on how the subAggs are supported for each aggregation type, it would not be sufficient to have this merge operation be a no-op for any unaffected `Aggregator`s. This means we are more or less rewriting the entire slice-level `reduce` process (I think this is actually why we are re-using reduce in the first place). At a high level, today we do `buildAggregation` for the entire collector tree and then `reduce` for the entire collector tree, and we would need to change this to do `reduce` then `buildAggregation` for each level of the collector tree.
- There are both DFS and BFS collection modes for aggregators. For DFS mode there shouldn't be any problem with the merge, since all collection will be done before we reach `buildAggregation`. However, for BFS mode, depending on how the subAggregators implement `collect`, we could be in a situation where documents have not been collected yet by the time we are doing `buildAggregation`.
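For illustration only, here is a hypothetical shape of the merge hook every aggregator would need to provide (none of these names exist in OpenSearch today; the sketch is only meant to show why the change touches every `Aggregator` implementation):

```java
import java.io.IOException;
import org.opensearch.search.aggregations.Aggregator;

/**
 * Hypothetical interface sketching what solution 2 would require: because
 * bucketOrds lives in each concrete Aggregator, every implementation would
 * have to supply its own slice-merge logic before buildTopLevel() runs.
 */
interface SliceMergeableAggregator {
    /**
     * Merge the per-slice collection state (e.g. bucketOrds and any
     * sub-aggregator state) of {@code other} into this aggregator.
     */
    void mergeSliceState(Aggregator other) throws IOException;
}
```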
For these reasons I think solution 1 is a much better approach and I will follow up with a CR with that implementation.
Thanks @jed326, I think this is a reasonable fix to try.
When using concurrent segment search we can see a significant latency increase on some aggregation types. This is related to the changes made as a part of #9085.
This can be reproduced with the OpenSearch Benchmark `http-logs` workload when comparing latency with concurrent segment search enabled and disabled on this query.

By default the OpenSearch Benchmark workload will force merge the index down to 1 segment before running the search benchmarks. In this setup we would expect no difference between the concurrent and non-concurrent search cases; however, that is not the case.
Some basic numbers gathered from the query Profiler:
We see almost a 200% increase in query time taken between the non-concurrent and concurrent cases and a 10x increase in `build_aggregation` time, even though we expect these to be basically the same operation.

Looking at the `async-profiler` output, we see that our query is actually spending 33% of the time on a `PriorityQueue.pop` call, which is happening here:
OpenSearch/server/src/main/java/org/opensearch/search/aggregations/bucket/terms/MultiTermsAggregator.java
Lines 161 to 164 in 2798114
Using Arthas, we can see that the problem is that the priority queue `ordered` is being created with size `503910`, so we are then calling `pop` on the PriorityQueue that many times when creating the `InternalAggregation` object.

arthas-watch-concurrent.txt
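To put that queue size in perspective, here is a small back-of-the-envelope illustration (only the 503,910 figure comes from the Arthas capture above; the bounded cap of 25 is a hypothetical shard_size-style value):

```java
/**
 * Rough illustration of the cost difference, not OpenSearch code: with a
 * bounded slice-level queue we pop a handful of entries per slice, whereas
 * an unbounded queue forces one pop per candidate bucket.
 */
public class PopCountIllustration {
    public static void main(String[] args) {
        long bucketsInSlice = 503_910L; // observed via Arthas in this issue
        int boundedQueueSize = 25;      // hypothetical shard_size-style cap (size * 1.5 + 10 for size = 10)

        long popsBounded = Math.min(boundedQueueSize, bucketsInSlice); // 25 pops
        long popsUnbounded = bucketsInSlice;                           // ~504k pops

        System.out.printf("bounded: %d pops, unbounded: %d pops (%.0fx more)%n",
            popsBounded, popsUnbounded, (double) popsUnbounded / popsBounded);
    }
}
```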
This ultimately relates back to #9085, where we made a change to not enforce the `shard_min_doc_count` and `shard_size` parameters at the slice level, so now during `build_aggregation` we are creating a `PriorityQueue` of basically unbounded size.

I have 2 possible solutions for this, which I will add more details on below.