Aggregation precomputation (rebased) #18927

ajleong623 · 2025-08-05T17:14:31Z

Description

This change is the same as #18106, but rebased onto a more descriptively named branch.

This change builds on the techniques from @sandeshkr419's pull request #11643 to precompute aggregations for match-all or match-none queries. We can read from the termsEnum to precompute the aggregation when the field is indexed and the segment has no deletions. We can check that no documents are deleted by comparing the count from the weight against the reader's maxDoc.
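As a rough illustration of the idea above, the following self-contained Java sketch models one segment's terms dictionary as a sorted map from term to document frequency. It is not Lucene or OpenSearch code: `TermsPrecomputeSketch`, `tryPrecompute`, and the map standing in for the termsEnum are hypothetical names chosen for this example, and it assumes the match count covers live documents only while `maxDoc` includes deleted ones.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical model of one segment; not a Lucene or OpenSearch class. */
final class TermsPrecomputeSketch {

    /**
     * Returns per-term bucket counts without visiting documents, or empty
     * when the shortcut is not safe and normal collection must run.
     *
     * @param termDocFreq stand-in for the terms dictionary: term -> docFreq
     * @param matchCount  live documents the query matches in this segment
     * @param maxDoc      documents in the segment, including deleted ones
     */
    static Optional<Map<String, Integer>> tryPrecompute(
            TreeMap<String, Integer> termDocFreq, int matchCount, int maxDoc) {
        if (matchCount == 0) {
            // Match none: there is nothing to collect for this segment.
            Map<String, Integer> none = Collections.emptyMap();
            return Optional.of(none);
        }
        if (matchCount != maxDoc) {
            // The query is selective or the segment has deletions, so
            // docFreq could overcount. Fall back to per-document collection.
            return Optional.empty();
        }
        // Match all with no deletions: docFreq is exactly the bucket count.
        Map<String, Integer> counts = new TreeMap<>(termDocFreq);
        return Optional.of(counts);
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> terms = new TreeMap<>();
        terms.put("GET", 7);
        terms.put("POST", 3);
        System.out.println(tryPrecompute(terms, 10, 10)); // shortcut taken
        System.out.println(tryPrecompute(terms, 8, 10));  // falls back
        System.out.println(tryPrecompute(terms, 0, 10));  // nothing to collect
    }
}
```

The fallback branch is the important part of the sketch: whenever the match count and `maxDoc` disagree, the term frequencies can no longer be trusted as bucket counts.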

Unfortunately, I was not able to use the same technique for numeric aggregators like LongRareTermsAggregator. This is because numeric points are not indexed by term frequency but through KD-trees, which are optimized for different types of operations (see https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/PointValues.java).

Please let me know if there are any comments, concerns or suggestions.

Related Issues

Resolves #13123
#13122
#10954

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

… completed action items Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

github-actions · 2025-08-05T18:43:56Z

✅ Gradle check result for 86a23cb: SUCCESS

ajleong623 · 2025-08-06T17:37:53Z

{"run-benchmark-test": "id_13"}

github-actions · 2025-08-06T22:46:07Z

The Jenkins job url is https://build.ci.opensearch.org/job/benchmark-pull-request/4009/ . Final results will be published once the job is completed.

ajleong623 · 2025-08-06T23:53:41Z

The command above was used to test the benchmarking bot. I just wanted to see how it worked because I saw it on other PRs. Sorry for the confusion.

opensearch-ci-bot · 2025-08-07T00:06:03Z

Benchmark Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/4009/

Metric	Task	Value	Unit
Cumulative indexing time of primary shards		0	min
Min cumulative indexing time across primary shards		0	min
Median cumulative indexing time across primary shards		0	min
Max cumulative indexing time across primary shards		0	min
Cumulative indexing throttle time of primary shards		0	min
Min cumulative indexing throttle time across primary shards		0	min
Median cumulative indexing throttle time across primary shards		0	min
Max cumulative indexing throttle time across primary shards		0	min
Cumulative merge time of primary shards		0	min
Cumulative merge count of primary shards		0
Min cumulative merge time across primary shards		0	min
Median cumulative merge time across primary shards		0	min
Max cumulative merge time across primary shards		0	min
Cumulative merge throttle time of primary shards		0	min
Min cumulative merge throttle time across primary shards		0	min
Median cumulative merge throttle time across primary shards		0	min
Max cumulative merge throttle time across primary shards		0	min
Cumulative refresh time of primary shards		0	min
Cumulative refresh count of primary shards		32
Min cumulative refresh time across primary shards		0	min
Median cumulative refresh time across primary shards		0	min
Max cumulative refresh time across primary shards		0	min
Cumulative flush time of primary shards		0	min
Cumulative flush count of primary shards		8
Min cumulative flush time across primary shards		0	min
Median cumulative flush time across primary shards		0	min
Max cumulative flush time across primary shards		0	min
Total Young Gen GC time		3.357	s
Total Young Gen GC count		76
Total Old Gen GC time		0	s
Total Old Gen GC count		0
Store size		15.3241	GB
Translog size		4.09782e-07	GB
Heap used for segments		0	MB
Heap used for doc values		0	MB
Heap used for terms		0	MB
Heap used for norms		0	MB
Heap used for points		0	MB
Heap used for stored fields		0	MB
Segment count		73
Min Throughput	wait-for-snapshot-recovery	4.17782e+07	byte/s
Mean Throughput	wait-for-snapshot-recovery	4.17782e+07	byte/s
Median Throughput	wait-for-snapshot-recovery	4.17782e+07	byte/s
Max Throughput	wait-for-snapshot-recovery	4.17782e+07	byte/s
100th percentile latency	wait-for-snapshot-recovery	388813	ms
100th percentile service time	wait-for-snapshot-recovery	388813	ms
error rate	wait-for-snapshot-recovery	0	%
Min Throughput	default	8	ops/s
Mean Throughput	default	8	ops/s
Median Throughput	default	8	ops/s
Max Throughput	default	8	ops/s
50th percentile latency	default	4.65251	ms
90th percentile latency	default	5.09939	ms
99th percentile latency	default	6.54413	ms
100th percentile latency	default	7.40902	ms
50th percentile service time	default	3.69542	ms
90th percentile service time	default	3.96586	ms
99th percentile service time	default	5.7257	ms
100th percentile service time	default	6.72767	ms
error rate	default	0	%
Min Throughput	term	49.88	ops/s
Mean Throughput	term	49.88	ops/s
Median Throughput	term	49.88	ops/s
Max Throughput	term	49.89	ops/s
50th percentile latency	term	3.65427	ms
90th percentile latency	term	4.07313	ms
99th percentile latency	term	4.29415	ms
100th percentile latency	term	4.30016	ms
50th percentile service time	term	2.95709	ms
90th percentile service time	term	3.16175	ms
99th percentile service time	term	3.42547	ms
100th percentile service time	term	3.52769	ms
error rate	term	0	%
Min Throughput	range	1	ops/s
Mean Throughput	range	1.01	ops/s
Median Throughput	range	1.01	ops/s
Max Throughput	range	1.01	ops/s
50th percentile latency	range	5.22718	ms
90th percentile latency	range	5.79722	ms
99th percentile latency	range	6.19016	ms
100th percentile latency	range	6.21025	ms
50th percentile service time	range	3.5307	ms
90th percentile service time	range	3.82036	ms
99th percentile service time	range	4.0155	ms
100th percentile service time	range	4.01873	ms
error rate	range	0	%
Min Throughput	200s-in-range	32.94	ops/s
Mean Throughput	200s-in-range	32.94	ops/s
Median Throughput	200s-in-range	32.94	ops/s
Max Throughput	200s-in-range	32.95	ops/s
50th percentile latency	200s-in-range	4.91016	ms
90th percentile latency	200s-in-range	5.74067	ms
99th percentile latency	200s-in-range	6.15512	ms
100th percentile latency	200s-in-range	6.2467	ms
50th percentile service time	200s-in-range	3.67083	ms
90th percentile service time	200s-in-range	3.86305	ms
99th percentile service time	200s-in-range	4.2846	ms
100th percentile service time	200s-in-range	4.30811	ms
error rate	200s-in-range	0	%
Min Throughput	400s-in-range	49.99	ops/s
Mean Throughput	400s-in-range	49.99	ops/s
Median Throughput	400s-in-range	49.99	ops/s
Max Throughput	400s-in-range	50	ops/s
50th percentile latency	400s-in-range	3.17861	ms
90th percentile latency	400s-in-range	4.53666	ms
99th percentile latency	400s-in-range	4.76311	ms
100th percentile latency	400s-in-range	4.7909	ms
50th percentile service time	400s-in-range	2.44349	ms
90th percentile service time	400s-in-range	2.57398	ms
99th percentile service time	400s-in-range	2.84571	ms
100th percentile service time	400s-in-range	2.85115	ms
error rate	400s-in-range	0	%
Min Throughput	hourly_agg	1.01	ops/s
Mean Throughput	hourly_agg	1.01	ops/s
Median Throughput	hourly_agg	1.01	ops/s
Max Throughput	hourly_agg	1.02	ops/s
50th percentile latency	hourly_agg	14.1345	ms
90th percentile latency	hourly_agg	15.3058	ms
99th percentile latency	hourly_agg	16.6416	ms
100th percentile latency	hourly_agg	16.7986	ms
50th percentile service time	hourly_agg	12.2596	ms
90th percentile service time	hourly_agg	13.5647	ms
99th percentile service time	hourly_agg	14.991	ms
100th percentile service time	hourly_agg	15.0558	ms
error rate	hourly_agg	0	%
Min Throughput	multi_term_agg	0.16	ops/s
Mean Throughput	multi_term_agg	0.16	ops/s
Median Throughput	multi_term_agg	0.16	ops/s
Max Throughput	multi_term_agg	0.16	ops/s
50th percentile latency	multi_term_agg	531470	ms
90th percentile latency	multi_term_agg	741772	ms
99th percentile latency	multi_term_agg	789802	ms
100th percentile latency	multi_term_agg	792390	ms
50th percentile service time	multi_term_agg	6237.77	ms
90th percentile service time	multi_term_agg	6358.79	ms
99th percentile service time	multi_term_agg	6860.93	ms
100th percentile service time	multi_term_agg	6886.68	ms
error rate	multi_term_agg	0	%
Min Throughput	scroll	25.05	pages/s
Mean Throughput	scroll	25.08	pages/s
Median Throughput	scroll	25.07	pages/s
Max Throughput	scroll	25.14	pages/s
50th percentile latency	scroll	222.125	ms
90th percentile latency	scroll	228.121	ms
99th percentile latency	scroll	295.836	ms
100th percentile latency	scroll	325.955	ms
50th percentile service time	scroll	220.325	ms
90th percentile service time	scroll	226.246	ms
99th percentile service time	scroll	294.054	ms
100th percentile service time	scroll	324.559	ms
error rate	scroll	0	%
Min Throughput	desc_sort_size	1	ops/s
Mean Throughput	desc_sort_size	1	ops/s
Median Throughput	desc_sort_size	1	ops/s
Max Throughput	desc_sort_size	1	ops/s
50th percentile latency	desc_sort_size	7.19984	ms
90th percentile latency	desc_sort_size	7.61741	ms
99th percentile latency	desc_sort_size	8.36554	ms
100th percentile latency	desc_sort_size	8.37801	ms
50th percentile service time	desc_sort_size	5.43027	ms
90th percentile service time	desc_sort_size	5.85338	ms
99th percentile service time	desc_sort_size	6.65381	ms
100th percentile service time	desc_sort_size	6.94602	ms
error rate	desc_sort_size	0	%
Min Throughput	asc_sort_size	1	ops/s
Mean Throughput	asc_sort_size	1	ops/s
Median Throughput	asc_sort_size	1	ops/s
Max Throughput	asc_sort_size	1	ops/s
50th percentile latency	asc_sort_size	5.34599	ms
90th percentile latency	asc_sort_size	5.7248	ms
99th percentile latency	asc_sort_size	6.04441	ms
100th percentile latency	asc_sort_size	6.17469	ms
50th percentile service time	asc_sort_size	3.54009	ms
90th percentile service time	asc_sort_size	3.61729	ms
99th percentile service time	asc_sort_size	4.0622	ms
100th percentile service time	asc_sort_size	4.29801	ms
error rate	asc_sort_size	0	%
Min Throughput	desc_sort_timestamp	1	ops/s
Mean Throughput	desc_sort_timestamp	1	ops/s
Median Throughput	desc_sort_timestamp	1	ops/s
Max Throughput	desc_sort_timestamp	1	ops/s
50th percentile latency	desc_sort_timestamp	13.2371	ms
90th percentile latency	desc_sort_timestamp	13.8565	ms
99th percentile latency	desc_sort_timestamp	16.1814	ms
100th percentile latency	desc_sort_timestamp	16.4257	ms
50th percentile service time	desc_sort_timestamp	11.5609	ms
90th percentile service time	desc_sort_timestamp	11.8494	ms
99th percentile service time	desc_sort_timestamp	14.5042	ms
100th percentile service time	desc_sort_timestamp	15.1234	ms
error rate	desc_sort_timestamp	0	%
Min Throughput	asc_sort_timestamp	1	ops/s
Mean Throughput	asc_sort_timestamp	1	ops/s
Median Throughput	asc_sort_timestamp	1	ops/s
Max Throughput	asc_sort_timestamp	1	ops/s
50th percentile latency	asc_sort_timestamp	7.562	ms
90th percentile latency	asc_sort_timestamp	8.23865	ms
99th percentile latency	asc_sort_timestamp	8.71115	ms
100th percentile latency	asc_sort_timestamp	8.83942	ms
50th percentile service time	asc_sort_timestamp	5.57361	ms
90th percentile service time	asc_sort_timestamp	6.25307	ms
99th percentile service time	asc_sort_timestamp	6.60499	ms
100th percentile service time	asc_sort_timestamp	6.80438	ms
error rate	asc_sort_timestamp	0	%
Min Throughput	desc_sort_with_after_timestamp	1	ops/s
Mean Throughput	desc_sort_with_after_timestamp	1.01	ops/s
Median Throughput	desc_sort_with_after_timestamp	1.01	ops/s
Max Throughput	desc_sort_with_after_timestamp	1.05	ops/s
50th percentile latency	desc_sort_with_after_timestamp	383.291	ms
90th percentile latency	desc_sort_with_after_timestamp	402.621	ms
99th percentile latency	desc_sort_with_after_timestamp	454.033	ms
100th percentile latency	desc_sort_with_after_timestamp	461.3	ms
50th percentile service time	desc_sort_with_after_timestamp	382.139	ms
90th percentile service time	desc_sort_with_after_timestamp	401.02	ms
99th percentile service time	desc_sort_with_after_timestamp	452.339	ms
100th percentile service time	desc_sort_with_after_timestamp	459.741	ms
error rate	desc_sort_with_after_timestamp	0	%
Min Throughput	asc_sort_with_after_timestamp	1.01	ops/s
Mean Throughput	asc_sort_with_after_timestamp	1.02	ops/s
Median Throughput	asc_sort_with_after_timestamp	1.02	ops/s
Max Throughput	asc_sort_with_after_timestamp	1.1	ops/s
50th percentile latency	asc_sort_with_after_timestamp	5.63412	ms
90th percentile latency	asc_sort_with_after_timestamp	6.07583	ms
99th percentile latency	asc_sort_with_after_timestamp	6.21885	ms
100th percentile latency	asc_sort_with_after_timestamp	6.21945	ms
50th percentile service time	asc_sort_with_after_timestamp	3.81671	ms
90th percentile service time	asc_sort_with_after_timestamp	3.9811	ms
99th percentile service time	asc_sort_with_after_timestamp	4.29301	ms
100th percentile service time	asc_sort_with_after_timestamp	4.31695	ms
error rate	asc_sort_with_after_timestamp	0	%
Min Throughput	range_size	2.01	ops/s
Mean Throughput	range_size	2.01	ops/s
Median Throughput	range_size	2.01	ops/s
Max Throughput	range_size	2.02	ops/s
50th percentile latency	range_size	47.8843	ms
90th percentile latency	range_size	49.408	ms
99th percentile latency	range_size	51.1211	ms
100th percentile latency	range_size	51.1227	ms
50th percentile service time	range_size	46.6997	ms
90th percentile service time	range_size	48.2103	ms
99th percentile service time	range_size	49.5278	ms
100th percentile service time	range_size	49.5725	ms
error rate	range_size	0	%
Min Throughput	range_with_asc_sort	1.98	ops/s
Mean Throughput	range_with_asc_sort	1.99	ops/s
Median Throughput	range_with_asc_sort	1.99	ops/s
Max Throughput	range_with_asc_sort	1.99	ops/s
50th percentile latency	range_with_asc_sort	284.364	ms
90th percentile latency	range_with_asc_sort	291.668	ms
99th percentile latency	range_with_asc_sort	299.241	ms
100th percentile latency	range_with_asc_sort	301.437	ms
50th percentile service time	range_with_asc_sort	283.167	ms
90th percentile service time	range_with_asc_sort	290.695	ms
99th percentile service time	range_with_asc_sort	298.078	ms
100th percentile service time	range_with_asc_sort	300.303	ms
error rate	range_with_asc_sort	0	%
Min Throughput	range_with_desc_sort	2	ops/s
Mean Throughput	range_with_desc_sort	2	ops/s
Median Throughput	range_with_desc_sort	2	ops/s
Max Throughput	range_with_desc_sort	2	ops/s
50th percentile latency	range_with_desc_sort	381.446	ms
90th percentile latency	range_with_desc_sort	387.192	ms
99th percentile latency	range_with_desc_sort	392.591	ms
100th percentile latency	range_with_desc_sort	394.479	ms
50th percentile service time	range_with_desc_sort	380.359	ms
90th percentile service time	range_with_desc_sort	386.271	ms
99th percentile service time	range_with_desc_sort	391.099	ms
100th percentile service time	range_with_desc_sort	393.477	ms
error rate	range_with_desc_sort	0	%

opensearch-ci-bot · 2025-08-07T00:07:25Z

Benchmark Baseline Comparison Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/145/

Metric	Task	Baseline	Contender	Diff	Unit
Cumulative indexing time of primary shards		0	0	0	min
Min cumulative indexing time across primary shard		0	0	0	min
Median cumulative indexing time across primary shard		0	0	0	min
Max cumulative indexing time across primary shard		0	0	0	min
Cumulative indexing throttle time of primary shards		0	0	0	min
Min cumulative indexing throttle time across primary shard		0	0	0	min
Median cumulative indexing throttle time across primary shard		0	0	0	min
Max cumulative indexing throttle time across primary shard		0	0	0	min
Cumulative merge time of primary shards		0	0	0	min
Cumulative merge count of primary shards		0	0	0
Min cumulative merge time across primary shard		0	0	0	min
Median cumulative merge time across primary shard		0	0	0	min
Max cumulative merge time across primary shard		0	0	0	min
Cumulative merge throttle time of primary shards		0	0	0	min
Min cumulative merge throttle time across primary shard		0	0	0	min
Median cumulative merge throttle time across primary shard		0	0	0	min
Max cumulative merge throttle time across primary shard		0	0	0	min
Cumulative refresh time of primary shards		0	0	0	min
Cumulative refresh count of primary shards		32	32	0
Min cumulative refresh time across primary shard		0	0	0	min
Median cumulative refresh time across primary shard		0	0	0	min
Max cumulative refresh time across primary shard		0	0	0	min
Cumulative flush time of primary shards		0	0	0	min
Cumulative flush count of primary shards		8	8	0
Min cumulative flush time across primary shard		0	0	0	min
Median cumulative flush time across primary shard		0	0	0	min
Max cumulative flush time across primary shard		0	0	0	min
Total Young Gen GC time		1.747	3.357	1.61	s
Total Young Gen GC count		71	76	5
Total Old Gen GC time		0	0	0	s
Total Old Gen GC count		0	0	0
Store size		15.3241	15.3241	0	GB
Translog size		4.09782e-07	4.09782e-07	0	GB
Heap used for segments		0	0	0	MB
Heap used for doc values		0	0	0	MB
Heap used for terms		0	0	0	MB
Heap used for norms		0	0	0	MB
Heap used for points		0	0	0	MB
Heap used for stored fields		0	0	0	MB
Segment count		73	73	0
Min Throughput	wait-for-snapshot-recovery	4.17625e+07	4.17782e+07	15692	byte/s
Mean Throughput	wait-for-snapshot-recovery	4.17625e+07	4.17782e+07	15692	byte/s
Median Throughput	wait-for-snapshot-recovery	4.17625e+07	4.17782e+07	15692	byte/s
Max Throughput	wait-for-snapshot-recovery	4.17625e+07	4.17782e+07	15692	byte/s
100th percentile latency	wait-for-snapshot-recovery	388889	388813	-76.0625	ms
100th percentile service time	wait-for-snapshot-recovery	388889	388813	-76.0625	ms
error rate	wait-for-snapshot-recovery	0	0	0	%
Min Throughput	default	7.99933	8.00126	0.00192	ops/s
Mean Throughput	default	7.99942	8.00136	0.00195	ops/s
Median Throughput	default	7.99941	8.00137	0.00196	ops/s
Max Throughput	default	7.9995	8.0015	0.002	ops/s
50th percentile latency	default	4.68816	4.65251	-0.03565	ms
90th percentile latency	default	5.15459	5.09939	-0.0552	ms
99th percentile latency	default	5.78512	6.54413	0.759	ms
100th percentile latency	default	6.06009	7.40902	1.34893	ms
50th percentile service time	default	3.84731	3.69542	-0.15189	ms
90th percentile service time	default	4.13533	3.96586	-0.16947	ms
99th percentile service time	default	5.08928	5.7257	0.63642	ms
100th percentile service time	default	5.10171	6.72767	1.62596	ms
error rate	default	0	0	0	%
Min Throughput	term	49.8551	49.8768	0.02177	ops/s
Mean Throughput	term	49.8604	49.8836	0.02319	ops/s
Median Throughput	term	49.8604	49.8836	0.02319	ops/s
Max Throughput	term	49.8658	49.8904	0.02461	ops/s
50th percentile latency	term	3.28416	3.65427	0.37011	ms
90th percentile latency	term	3.81008	4.07313	0.26305	ms
99th percentile latency	term	5.0204	4.29415	-0.72625	ms
100th percentile latency	term	5.11558	4.30016	-0.81543	ms
50th percentile service time	term	2.56419	2.95709	0.3929	ms
90th percentile service time	term	2.75235	3.16175	0.4094	ms
99th percentile service time	term	2.97421	3.42547	0.45126	ms
100th percentile service time	term	3.0297	3.52769	0.49799	ms
error rate	term	0	0	0	%
Min Throughput	range	1.00476	1.00477	1e-05	ops/s
Mean Throughput	range	1.00659	1.0066	1e-05	ops/s
Median Throughput	range	1.00633	1.00635	1e-05	ops/s
Max Throughput	range	1.00947	1.00949	2e-05	ops/s
50th percentile latency	range	6.12711	5.22718	-0.89993	ms
90th percentile latency	range	6.54722	5.79722	-0.75	ms
99th percentile latency	range	6.84516	6.19016	-0.65501	ms
100th percentile latency	range	6.95889	6.21025	-0.74863	ms
50th percentile service time	range	4.34116	3.5307	-0.81047	ms
90th percentile service time	range	4.58379	3.82036	-0.76343	ms
99th percentile service time	range	4.94365	4.0155	-0.92815	ms
100th percentile service time	range	4.96498	4.01873	-0.94625	ms
error rate	range	0	0	0	%
Min Throughput	200s-in-range	32.6498	32.9363	0.28654	ops/s
Mean Throughput	200s-in-range	32.6691	32.9406	0.27152	ops/s
Median Throughput	200s-in-range	32.6699	32.9403	0.27038	ops/s
Max Throughput	200s-in-range	32.6876	32.9453	0.25764	ops/s
50th percentile latency	200s-in-range	21.7245	4.91016	-16.8144	ms
90th percentile latency	200s-in-range	22.6576	5.74067	-16.9169	ms
99th percentile latency	200s-in-range	23.6271	6.15512	-17.472	ms
100th percentile latency	200s-in-range	23.7671	6.2467	-17.5204	ms
50th percentile service time	200s-in-range	20.5409	3.67083	-16.8701	ms
90th percentile service time	200s-in-range	20.982	3.86305	-17.1189	ms
99th percentile service time	200s-in-range	22.1484	4.2846	-17.8638	ms
100th percentile service time	200s-in-range	22.27	4.30811	-17.9619	ms
error rate	200s-in-range	0	0	0	%
Min Throughput	400s-in-range	49.9987	49.9922	-0.00647	ops/s
Mean Throughput	400s-in-range	49.9996	49.9947	-0.00494	ops/s
Median Throughput	400s-in-range	49.9995	49.9947	-0.00482	ops/s
Max Throughput	400s-in-range	50.0006	49.9971	-0.00355	ops/s
50th percentile latency	400s-in-range	3.09343	3.17861	0.08517	ms
90th percentile latency	400s-in-range	4.4055	4.53666	0.13116	ms
99th percentile latency	400s-in-range	4.55427	4.76311	0.20883	ms
100th percentile latency	400s-in-range	4.5677	4.7909	0.2232	ms
50th percentile service time	400s-in-range	2.2531	2.44349	0.19039	ms
90th percentile service time	400s-in-range	2.47463	2.57398	0.09935	ms
99th percentile service time	400s-in-range	2.88321	2.84571	-0.0375	ms
100th percentile service time	400s-in-range	3.10932	2.85115	-0.25817	ms
error rate	400s-in-range	0	0	0	%
Min Throughput	hourly_agg	1.00587	1.00558	-0.00029	ops/s
Mean Throughput	hourly_agg	1.00965	1.00917	-0.00048	ops/s
Median Throughput	hourly_agg	1.00878	1.00834	-0.00045	ops/s
Max Throughput	hourly_agg	1.01744	1.01658	-0.00086	ops/s
50th percentile latency	hourly_agg	13.8794	14.1345	0.25503	ms
90th percentile latency	hourly_agg	15.1649	15.3058	0.14088	ms
99th percentile latency	hourly_agg	47.4276	16.6416	-30.786	ms
100th percentile latency	hourly_agg	77.8631	16.7986	-61.0645	ms
50th percentile service time	hourly_agg	12.1968	12.2596	0.06271	ms
90th percentile service time	hourly_agg	13.4278	13.5647	0.13686	ms
99th percentile service time	hourly_agg	45.4969	14.991	-30.5059	ms
100th percentile service time	hourly_agg	76.2076	15.0558	-61.1518	ms
error rate	hourly_agg	0	0	0	%
Min Throughput	multi_term_agg	0.223996	0.158562	-0.06543	ops/s
Mean Throughput	multi_term_agg	0.224524	0.159027	-0.0655	ops/s
Median Throughput	multi_term_agg	0.224525	0.159065	-0.06546	ops/s
Max Throughput	multi_term_agg	0.224991	0.159373	-0.06562	ops/s
50th percentile latency	multi_term_agg	346799	531470	184672	ms
90th percentile latency	multi_term_agg	485369	741772	256404	ms
99th percentile latency	multi_term_agg	516303	789802	273499	ms
100th percentile latency	multi_term_agg	518067	792390	274323	ms
50th percentile service time	multi_term_agg	4510.7	6237.77	1727.07	ms
90th percentile service time	multi_term_agg	4635.5	6358.79	1723.29	ms
99th percentile service time	multi_term_agg	4672.7	6860.93	2188.23	ms
100th percentile service time	multi_term_agg	4674.99	6886.68	2211.69	ms
error rate	multi_term_agg	0	0	0	%
Min Throughput	scroll	25.0498	25.0486	-0.00122	pages/s
Mean Throughput	scroll	25.082	25.08	-0.00199	pages/s
Median Throughput	scroll	25.0746	25.0727	-0.00184	pages/s
Max Throughput	scroll	25.1485	25.1448	-0.00371	pages/s
50th percentile latency	scroll	215.797	222.125	6.32722	ms
90th percentile latency	scroll	220.968	228.121	7.15327	ms
99th percentile latency	scroll	277.971	295.836	17.8649	ms
100th percentile latency	scroll	290.683	325.955	35.2718	ms
50th percentile service time	scroll	213.83	220.325	6.49471	ms
90th percentile service time	scroll	219.203	226.246	7.04276	ms
99th percentile service time	scroll	276.214	294.054	17.8392	ms
100th percentile service time	scroll	288.463	324.559	36.0956	ms
error rate	scroll	0	0	0	%
Min Throughput	desc_sort_size	1.00315	1.00308	-7e-05	ops/s
Mean Throughput	desc_sort_size	1.00383	1.00374	-9e-05	ops/s
Median Throughput	desc_sort_size	1.00377	1.00369	-9e-05	ops/s
Max Throughput	desc_sort_size	1.00471	1.0046	-0.00011	ops/s
50th percentile latency	desc_sort_size	7.37066	7.19984	-0.17082	ms
90th percentile latency	desc_sort_size	8.00538	7.61741	-0.38796	ms
99th percentile latency	desc_sort_size	8.64336	8.36554	-0.27783	ms
100th percentile latency	desc_sort_size	8.73796	8.37801	-0.35995	ms
50th percentile service time	desc_sort_size	5.6701	5.43027	-0.23984	ms
90th percentile service time	desc_sort_size	6.01817	5.85338	-0.16479	ms
99th percentile service time	desc_sort_size	6.6386	6.65381	0.0152	ms
100th percentile service time	desc_sort_size	6.73909	6.94602	0.20693	ms
error rate	desc_sort_size	0	0	0	%
Min Throughput	asc_sort_size	1.00321	1.0033	9e-05	ops/s
Mean Throughput	asc_sort_size	1.0039	1.00401	0.00011	ops/s
Median Throughput	asc_sort_size	1.00385	1.00396	0.00011	ops/s
Max Throughput	asc_sort_size	1.0048	1.00493	0.00014	ops/s
50th percentile latency	asc_sort_size	8.27556	5.34599	-2.92957	ms
90th percentile latency	asc_sort_size	8.99986	5.7248	-3.27506	ms
99th percentile latency	asc_sort_size	9.41275	6.04441	-3.36834	ms
100th percentile latency	asc_sort_size	9.46272	6.17469	-3.28803	ms
50th percentile service time	asc_sort_size	6.50851	3.54009	-2.96842	ms
90th percentile service time	asc_sort_size	7.38443	3.61729	-3.76714	ms
99th percentile service time	asc_sort_size	7.6682	4.0622	-3.606	ms
100th percentile service time	asc_sort_size	7.71648	4.29801	-3.41847	ms
error rate	asc_sort_size	0	0	0	%
Min Throughput	desc_sort_timestamp	1.00308	1.00304	-5e-05	ops/s
Mean Throughput	desc_sort_timestamp	1.00375	1.00369	-6e-05	ops/s
Median Throughput	desc_sort_timestamp	1.00369	1.00364	-5e-05	ops/s
Max Throughput	desc_sort_timestamp	1.00461	1.00454	-7e-05	ops/s
50th percentile latency	desc_sort_timestamp	13.4889	13.2371	-0.2518	ms
90th percentile latency	desc_sort_timestamp	14.1442	13.8565	-0.28761	ms
99th percentile latency	desc_sort_timestamp	16.5827	16.1814	-0.40129	ms
100th percentile latency	desc_sort_timestamp	17.4342	16.4257	-1.00856	ms
50th percentile service time	desc_sort_timestamp	11.7741	11.5609	-0.21319	ms
90th percentile service time	desc_sort_timestamp	12.4102	11.8494	-0.56075	ms
99th percentile service time	desc_sort_timestamp	14.6062	14.5042	-0.10201	ms
100th percentile service time	desc_sort_timestamp	15.2545	15.1234	-0.13111	ms
error rate	desc_sort_timestamp	0	0	0	%
Min Throughput	asc_sort_timestamp	1.00328	1.00327	-0	ops/s
Mean Throughput	asc_sort_timestamp	1.00398	1.00398	-1e-05	ops/s
Median Throughput	asc_sort_timestamp	1.00393	1.00392	-1e-05	ops/s
Max Throughput	asc_sort_timestamp	1.0049	1.00489	-1e-05	ops/s
50th percentile latency	asc_sort_timestamp	7.34149	7.562	0.22051	ms
90th percentile latency	asc_sort_timestamp	8.11677	8.23865	0.12188	ms
99th percentile latency	asc_sort_timestamp	15.4642	8.71115	-6.75305	ms
100th percentile latency	asc_sort_timestamp	22.045	8.83942	-13.2056	ms
50th percentile service time	asc_sort_timestamp	5.66768	5.57361	-0.09407	ms
90th percentile service time	asc_sort_timestamp	6.13552	6.25307	0.11756	ms
99th percentile service time	asc_sort_timestamp	13.4713	6.60499	-6.86635	ms
100th percentile service time	asc_sort_timestamp	20.1068	6.80438	-13.3024	ms
error rate	asc_sort_timestamp	0	0	0	%
Min Throughput	desc_sort_with_after_timestamp	1.00521	1.00497	-0.00025	ops/s
Mean Throughput	desc_sort_with_after_timestamp	1.01372	1.01305	-0.00067	ops/s
Median Throughput	desc_sort_with_after_timestamp	1.00952	1.00907	-0.00046	ops/s
Max Throughput	desc_sort_with_after_timestamp	1.0547	1.05198	-0.00273	ops/s
50th percentile latency	desc_sort_with_after_timestamp	357.569	383.291	25.7229	ms
90th percentile latency	desc_sort_with_after_timestamp	383.748	402.621	18.8729	ms
99th percentile latency	desc_sort_with_after_timestamp	401.948	454.033	52.0844	ms
100th percentile latency	desc_sort_with_after_timestamp	406.508	461.3	54.7921	ms
50th percentile service time	desc_sort_with_after_timestamp	355.949	382.139	26.1905	ms
90th percentile service time	desc_sort_with_after_timestamp	382.361	401.02	18.6588	ms
99th percentile service time	desc_sort_with_after_timestamp	400.675	452.339	51.6636	ms
100th percentile service time	desc_sort_with_after_timestamp	405.007	459.741	54.7332	ms
error rate	desc_sort_with_after_timestamp	0	0	0	%
Min Throughput	asc_sort_with_after_timestamp	1.00907	1.00906	-1e-05	ops/s
Mean Throughput	asc_sort_with_after_timestamp	1.02415	1.02414	-1e-05	ops/s
Median Throughput	asc_sort_with_after_timestamp	1.01661	1.0166	-1e-05	ops/s
Max Throughput	asc_sort_with_after_timestamp	1.0987	1.09871	1e-05	ops/s
50th percentile latency	asc_sort_with_after_timestamp	5.59569	5.63412	0.03843	ms
90th percentile latency	asc_sort_with_after_timestamp	5.99826	6.07583	0.07757	ms
99th percentile latency	asc_sort_with_after_timestamp	6.20105	6.21885	0.0178	ms
100th percentile latency	asc_sort_with_after_timestamp	6.20518	6.21945	0.01427	ms
50th percentile service time	asc_sort_with_after_timestamp	3.86922	3.81671	-0.05251	ms
90th percentile service time	asc_sort_with_after_timestamp	4.00563	3.9811	-0.02453	ms
99th percentile service time	asc_sort_with_after_timestamp	4.14889	4.29301	0.14412	ms
100th percentile service time	asc_sort_with_after_timestamp	4.16631	4.31695	0.15064	ms
error rate	asc_sort_with_after_timestamp	0	0	0	%
Min Throughput	range_size	2.00958	2.00772	-0.00186	ops/s
Mean Throughput	range_size	2.01324	2.01068	-0.00256	ops/s
Median Throughput	range_size	2.01273	2.01027	-0.00247	ops/s
Max Throughput	range_size	2.01894	2.01531	-0.00363	ops/s
50th percentile latency	range_size	8.25356	47.8843	39.6308	ms
90th percentile latency	range_size	8.69098	49.408	40.717	ms
99th percentile latency	range_size	9.5448	51.1211	41.5763	ms
100th percentile latency	range_size	9.7614	51.1227	41.3613	ms
50th percentile service time	range_size	7.0621	46.6997	39.6376	ms
90th percentile service time	range_size	7.31833	48.2103	40.892	ms
99th percentile service time	range_size	8.49447	49.5278	41.0333	ms
100th percentile service time	range_size	8.8889	49.5725	40.6836	ms
error rate	range_size	0	0	0	%
Min Throughput	range_with_asc_sort	2.00576	1.98438	-0.02138	ops/s
Mean Throughput	range_with_asc_sort	2.00798	1.98902	-0.01896	ops/s
Median Throughput	range_with_asc_sort	2.00768	1.98942	-0.01826	ops/s
Max Throughput	range_with_asc_sort	2.01142	1.992	-0.01942	ops/s
50th percentile latency	range_with_asc_sort	20.4224	284.364	263.941	ms
90th percentile latency	range_with_asc_sort	23.3159	291.668	268.352	ms
99th percentile latency	range_with_asc_sort	24.6598	299.241	274.581	ms
100th percentile latency	range_with_asc_sort	24.8602	301.437	276.577	ms
50th percentile service time	range_with_asc_sort	18.7127	283.167	264.454	ms
90th percentile service time	range_with_asc_sort	21.6576	290.695	269.037	ms
99th percentile service time	range_with_asc_sort	23.3539	298.078	274.724	ms
100th percentile service time	range_with_asc_sort	23.3713	300.303	276.932	ms
error rate	range_with_asc_sort	0	0	0	%
Min Throughput	range_with_desc_sort	2.00874	2.00158	-0.00716	ops/s
Mean Throughput	range_with_desc_sort	2.01207	2.00219	-0.00988	ops/s
Median Throughput	range_with_desc_sort	2.01161	2.00209	-0.00952	ops/s
Max Throughput	range_with_desc_sort	2.01729	2.00314	-0.01414	ops/s
50th percentile latency	range_with_desc_sort	23.3313	381.446	358.115	ms
90th percentile latency	range_with_desc_sort	27.4215	387.192	359.77	ms
99th percentile latency	range_with_desc_sort	29.026	392.591	363.565	ms
100th percentile latency	range_with_desc_sort	29.6699	394.479	364.81	ms
50th percentile service time	range_with_desc_sort	21.3613	380.359	358.998	ms
90th percentile service time	range_with_desc_sort	25.3829	386.271	360.888	ms
99th percentile service time	range_with_desc_sort	26.9768	391.099	364.122	ms
100th percentile service time	range_with_desc_sort	27.3377	393.477	366.139	ms
error rate	range_with_desc_sort	0	0	0	%

sandeshkr419

Thanks @ajleong623 for rebasing the changes. I am trying to review this, but covering 3 different aggregations in a single PR makes it difficult to review.

Let's break this into multiple independent PRs: missing values, rare terms, significant/string terms.

I do have some comments on missing terms section which you can address when you break it.

(You can put this in draft mode and later close this once you finish merging up all changes independently).

sandeshkr419 · 2025-08-06T21:16:52Z

server/src/main/java/org/opensearch/search/aggregations/bucket/missing/MissingAggregator.java

+        // The optimization does not work when there are subaggregations.
+        if (subAggregators.length > 0) {
+            return false;
+        }
+
+        // When fieldname does not exist, we cannot collect through the precomputation.
+        if (fieldName == null || weight == null) {
+            return false;
+        }
+
+        // we do not collect any documents through the missing aggregation when the missing parameter
+        // is up.
+        if (valuesSourceConfig != null && valuesSourceConfig.missing() != null) {
+            return true;
+        }
+
+        // The optimization could only be used if there are no deleted documents and the top-level
+        // query matches all documents in the segment.
+        if (weight.count(ctx) == 0) {
+            return true;
+        } else if (weight.count(ctx) != ctx.reader().maxDoc()) {
+            return false;


Looking into the ordering of the different short-circuits, I'm thinking the true cases need to be prioritized first, because that will save a lot of time by cutting the flow short.

For example, let's say you have sub-aggs > 0 but weight.count == 0: in this case, you will still report that pre-compute returned false and go on to create a leaf collector, even though the number of matching documents (weight.count) is 0.

So, I guess prioritizing weight.count == 0 in the ordering makes sense.

Something like:

weight == null : return false

weight.count == 0 : return true

valuesSourceConfig != null && valuesSourceConfig.missing() != null or subAggregators.length > 0

The remaining checks can stay in the same order after the checks above.

Between subAggregators.length > 0 and valuesSourceConfig != null && valuesSourceConfig.missing() != null, I'm wondering whether the subAggregators matter at all once valuesSourceConfig != null && valuesSourceConfig.missing() != null is satisfied. If the valuesSourceConfig check also holds when there are subAggregators, then the subAggregators check can be moved after the valuesSourceConfig check.

You can probably start with:

    if (weight == null) {
        // Weight not assigned - cannot use this optimization
        return false;
    } else {
        if (weight.count(ctx) == 0) {
            // No documents matches top level query on this segment, we can skip the segment entirely
            return true;
        } else if (weight.count(ctx) != ctx.reader().maxDoc()) {
            // weight.count(ctx) == ctx.reader().maxDoc() implies there are no deleted documents and
            // top-level query matches all docs in the segment
            return false;
        }
    }
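
To make the proposed ordering easier to reason about, here is a minimal, self-contained sketch of the decision flow. It is not the PR's code: the class `PrecomputeOrderingSketch`, the `Decision` enum, and the `decide` method are hypothetical names, and plain ints and booleans stand in for Lucene's `Weight` and `LeafReaderContext`. The relative order of the sub-aggregator check and the `missing` parameter check is left open in the review, so the sketch keeps the order from the original diff as an assumption.

```java
// Hypothetical sketch: plain values stand in for weight.count(ctx), ctx.reader().maxDoc(), etc.
final class PrecomputeOrderingSketch {

    /** Outcome of the short-circuit checks for a single segment. */
    enum Decision {
        FALLBACK,       // "return false": build the regular leaf collector
        SKIP_SEGMENT,   // "return true": nothing to collect on this segment
        TRY_PRECOMPUTE  // every guard passed: continue to the precomputation itself
    }

    static Decision decide(boolean weightAssigned, int matchCount, int maxDoc,
                           int subAggregatorCount, boolean fieldNamePresent, boolean missingParamSet) {
        // 1. Without a weight there is no cheap way to count matches.
        if (!weightAssigned) {
            return Decision.FALLBACK;
        }
        // 2. No document matches the top-level query on this segment, so skip it
        //    before any check that would otherwise force a leaf collector.
        if (matchCount == 0) {
            return Decision.SKIP_SEGMENT;
        }
        // 3. Remaining guards, in the order of the original diff (an assumption here).
        if (subAggregatorCount > 0 || !fieldNamePresent) {
            return Decision.FALLBACK;
        }
        if (missingParamSet) {
            return Decision.SKIP_SEGMENT;
        }
        // 4. Anything other than "count equals maxDoc" means deletions or a partial match.
        //    This also covers a negative count, i.e. a count that was not cheaply available.
        if (matchCount != maxDoc) {
            return Decision.FALLBACK;
        }
        return Decision.TRY_PRECOMPUTE;
    }

    public static void main(String[] args) {
        // Sub-aggregations present but zero matches: the segment is skipped.
        System.out.println(decide(true, 0, 10, 2, true, false));
        // Match-all on a segment without deletions: precomputation can proceed.
        System.out.println(decide(true, 10, 10, 0, true, false));
    }
}
```

With the original ordering, the first call in `main` would have fallen back to a leaf collector because the sub-aggregator check ran before the zero-match check.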

sandeshkr419 · 2025-08-06T21:18:15Z

server/src/main/java/org/opensearch/search/aggregations/bucket/missing/MissingAggregator.java

+        if (this.valuesSource != null) {
+            this.fieldName = valuesSource.getIndexFieldName();
+        } else {
+            this.fieldName = null;
+        }


nit: just making this more concise and readable

this.fieldName = this.valuesSource != null ? valuesSource.getIndexFieldName() : null;

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

github-actions · 2025-08-26T15:57:54Z

❌ Gradle check result for 5f69668: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

opensearch-trigger-bot · 2025-10-19T15:22:00Z

This PR is stalled because it has been open for 30 days with no activity.

ajleong623 added 12 commits May 31, 2025 20:12

made changes to current state

11ecaf3

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

small cleanup

65e20b8

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

merged with main

d51c2a0

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

spotless

ab13378

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

added edits to missing terms agg tests

b5e08d8

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

spotless

ebca7e1

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

test spotless check

9d73b57

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

added new expected counts tests for string rare aggregation tests and…

66171ca

… completed action items Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

spotless

b4a4128

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

mode tests more deterministic and improved coverage

b60c221

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

checked comments, removed more nondeterminism, and reformatted

0375104

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

tests pass, I think

86a23cb

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

ajleong623 requested review from Bukhtawar, CEHENKLE, Rishikesh1159, anasalkouz, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19 and reta as code owners August 5, 2025 17:14

opensearch-infra bot added the lucene label Aug 5, 2025

ajleong623 requested review from sachinpkale and saratvemulapalli as code owners August 5, 2025 17:14

ajleong623 requested review from a team, VachaShah, shwetathareja and sohami as code owners August 5, 2025 17:14

github-actions bot added Search:Aggregations Search:Performance labels Aug 5, 2025

github-actions bot mentioned this pull request Aug 6, 2025

Request to approve/deny benchmark run for PR #18927 #18945

Closed

sandeshkr419 added the v3.3.0 label Aug 6, 2025

sandeshkr419 requested changes Aug 7, 2025

ajleong623 marked this pull request as draft August 7, 2025 01:56

ajleong623 mentioned this pull request Aug 7, 2025

Rare terms aggregation precomputation #18978

Merged

3 tasks

missing terms accounter for global ordinals

5f69668

Signed-off-by: Anthony Leong <aj.leong623@gmail.com>

sandeshkr419 removed v3.3.0 lucene labels Sep 17, 2025

ajleong623 mentioned this pull request Oct 14, 2025

Aggregation precomputation missing terms #19627

Open

3 tasks

opensearch-trigger-bot bot added the stalled label Oct 19, 2025
