
[Concurrent Segment Search] Perform buildAggregation in parallel #11673

Closed
jed326 opened this issue Dec 26, 2023 · 2 comments · Fixed by #12697
Assignees: jed326
Labels: enhancement, Search:Performance, v2.14.0, v3.0.0

Comments

@jed326
Collaborator

jed326 commented Dec 26, 2023

Is your feature request related to a problem? Please describe

In the current concurrent search paradigm we create slice_count times the number of collectors compared to non-concurrent search, and collect is called concurrently for these collectors. Afterwards, reduce is called sequentially on the search threadpool for all of these collectors.
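To make the shape of this flow concrete, here is a minimal self-contained sketch; the names (indexSearcherPool, collectSlice, buildAggregation) are illustrative stand-ins, not the actual OpenSearch classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simplified model of the current flow: collect() fans out per slice,
// but buildAggregation/reduce for every slice collector still runs
// one-by-one on the single search thread.
public class SequentialBuildAggregationSketch {

    public static void main(String[] args) throws Exception {
        int sliceCount = 4;
        ExecutorService indexSearcherPool = Executors.newFixedThreadPool(sliceCount);

        // Phase 1: collect runs concurrently, one task per slice.
        List<Future<long[]>> collected = new ArrayList<>();
        for (int slice = 0; slice < sliceCount; slice++) {
            final int s = slice;
            collected.add(indexSearcherPool.submit(() -> collectSlice(s)));
        }

        // Phase 2: buildAggregation runs sequentially on the search thread,
        // once per slice collector -- the bottleneck described above.
        List<Long> perSliceBuckets = new ArrayList<>();
        for (Future<long[]> f : collected) {
            perSliceBuckets.add(buildAggregation(f.get()));
        }
        System.out.println("built " + perSliceBuckets.size() + " slice aggregations sequentially");
        indexSearcherPool.shutdown();
    }

    // Placeholder for per-slice document collection.
    static long[] collectSlice(int slice) {
        return new long[] { slice };
    }

    // Placeholder for the (potentially expensive) bucket-building work.
    static long buildAggregation(long[] collectedDocs) {
        return collectedDocs.length;
    }
}
```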

There are 2 main scenarios where this sequential behavior can be problematic: [1] when the aggregators are nested and buildAggregation needs to BFS/DFS through the collector tree, and [2] when buildAggregation itself is an expensive operation for the given collector.

One such example is the keyword-terms-numeric-terms query, which is a nested terms aggregation. Even in the non-concurrent search case, nested terms aggregations suffer from a combinatorial explosion of buckets for each additional nested layer, and for concurrent search this problem is essentially multiplied by slice_count because the bucket creation during buildAggregation is done sequentially. The combinatorial explosion was partially addressed by #11585; however, the sequential work is still a bottleneck.
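For reference, the slice_size=1150 in the profile below follows from applying the 1.5 * x + 10 formula twice, assuming the default terms-aggregation shard_size heuristic of 1.5 * size + 10:

```
size       = 500
shard_size = 1.5 * size       + 10 = 1.5 * 500 + 10 = 760
slice_size = 1.5 * shard_size + 10 = 1.5 * 760 + 10 = 1150
```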

Here is a basic query profiler breakdown to further illustrate this point:

(Cells containing two values show time_in_nanos followed by the millisecond equivalent in parentheses.)

| Metric | weather-data-2016 - disabled - size=500 (in ms) | weather-data-2016 - slice=2 - slice_size=1150 (1.5 * shard_size + 10, size=500) (in ms) |
| --- | --- | --- |
| Took | 10918 | 17567 |
| ConstantScoreQuery - time_in_nanos | 3021298801 (3021.2988) | 8023 |
| MatchAllDocsQuery - time_in_nanos | 1087697018 (1087.69702) | 2338 |
| rewrite_time | 27406 | 10733 |
| collector - time_in_nanos | 4758597960 (4758.59796) | 3919422108 (3919.42211) |
| GlobalOrdinalsStringTermsAggregator - time_in_nanos | 7655394436 (7655.39444) | 9238307858 (9238.30786) |
| collect | 2670211530 (2670.21153) | 2349533379 (2349.53338) |
| avg_collect | | 2223026375 (2223.02638) |
| collect_count | 33659481 | 33659481 |
| build_aggregation | 4976192164 (4976.19216) | 12500064156 (12500.06416) |
| avg_build_aggregation | | 6250026755 (6250.02676) |
| build_aggregation_count | 1 | 2 |
| NumericTermsAggregator - time_in_nanos | 4167152226 (4167.15223) | 5953806596 (5953.8066) |
| collect | 281349935 (281.34994) | 7061608537 (7061.60854) |
| avg_collect | | 271087497 (271.0875) |
| collect_count | 832960 | 1329446 |
| build_aggregation | 3885328997 (3885.329) | 11216763404 (11216.7634) |
| avg_build_aggregation | | 5220390427 (5220.39043) |
| build_aggregation_count | 1 | 2 |
| MaxAggregator - time_in_nanos | 540335 (0.54034) | 2229363 (2.22936) |
| collect | 458491 (0.45849) | 5622515353 (5622.51535) |
| avg_collect | | 1576283 (1.57628) |
| collect_count | 8360 | 59800 |
| build_aggregation | 69834 (0.06983) | 5617813957 (5617.81396) |
| avg_build_aggregation | | 251549 (0.25155) |
| build_aggregation_count | 1 | 2 |

We can see that for the NumericTermsAggregator, build_aggregation takes 3-4x as long and runs sequentially, and the combinatorial explosion shows up in the collect_count.

Describe the solution you'd like

The additional combinatorial explosion is due to slice_size > shard_size, which is something we can revisit (i.e. does slice_size really need to be 1.5 * shard_size + 10, or can slice_size == shard_size?). However, the long pole is actually build_aggregation as a whole taking a long time and happening sequentially. I propose that we move the buildAggregation steps to processPostCollection so that they can happen in parallel on the index_searcher threads.
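As a rough sketch of the proposed shape (reusing the hypothetical helpers from the sketch earlier in this issue, not the actual processPostCollection code), buildAggregation folds into the per-slice task so it executes on the index_searcher pool, leaving only the final combine on the search thread:

```java
// Proposed shape: buildAggregation moves into the per-slice task and runs
// on the index_searcher pool during post-collection.
List<Future<Long>> perSlice = new ArrayList<>();
for (int slice = 0; slice < sliceCount; slice++) {
    final int s = slice;
    perSlice.add(indexSearcherPool.submit(() -> {
        long[] docs = collectSlice(s);   // collect, as before
        return buildAggregation(docs);   // now built on the index_searcher thread
    }));
}

// The search thread only gathers and combines the already-built results.
List<Long> results = new ArrayList<>();
for (Future<Long> f : perSlice) {
    results.add(f.get());
}
```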

I will follow up with a PR for this change as well as some benchmarking data for further discussion.

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

@jed326 jed326 added enhancement Enhancement or improvement to existing feature or request untriaged labels Dec 26, 2023
@jed326 jed326 self-assigned this Dec 26, 2023
@jed326 jed326 removed the untriaged label Dec 27, 2023
@jed326
Collaborator Author

jed326 commented Dec 27, 2023

I added a small POC for this here: jed326@42ba232

However, there is an AssertingLeafReader problem: the global ordinal doc values are created on the search thread but we are now trying to access them on the index_searcher thread, so Lucene throws an exception for all aggs that use global ordinals.

As of now I don't see a way forward to parallelize build_aggregation as-is for any agg that uses global ordinals. We will also need to evaluate whether the lookupOrd() method is thread-safe.
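For illustration, here is a minimal sketch of the hand-off that trips those assertions; the leafReader parameter, the "category" field, and the single-thread executor are assumed for the example and are not taken from the POC:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.util.BytesRef;

class GlobalOrdinalsThreadHandOff {
    // Illustrates the problematic hand-off: doc values created on one thread,
    // looked up on another. leafReader is assumed to come from an open index.
    static BytesRef lookupOnAnotherThread(LeafReader leafReader) throws Exception {
        // Doc values iterator created on the calling (search) thread.
        SortedSetDocValues ords = DocValues.getSortedSet(leafReader, "category");

        ExecutorService indexSearcherPool = Executors.newSingleThreadExecutor();
        try {
            // lookupOrd now runs on a different (index_searcher) thread.
            // AssertingLeafReader in the Lucene test framework asserts that
            // doc values are used on the thread that created them, so tests
            // fail here; plain readers give no thread-safety guarantee either.
            return indexSearcherPool.submit(() -> ords.lookupOrd(0)).get();
        } finally {
            indexSearcherPool.shutdown();
        }
    }
}
```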

@reta
Collaborator

reta commented Dec 27, 2023

> However, the long pole is actually build_aggregation as a whole taking a long time and happening sequentially. I propose that we move the buildAggregation steps to processPostCollection so that they can happen in parallel on the index_searcher threads.

Besides the AssertingLeafReader problem, we would be keeping the index_searcher threads busy for a longer time, potentially impacting other searches and aggregations (since the index searcher pool is shared between all indices on the node).
