
Add resizable search queue to OpenSearch (picking up #826) #3207

Merged
merged 6 commits into from
May 16, 2022

Conversation

reta
Collaborator

@reta reta commented May 5, 2022

Description

Create a new "RESIZABLE" thread pool type to dynamically adjust the search queue size at runtime. The current thread pools can only be updated via opensearch.yml. Picking up the work from #826.
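
The listing below is presumably the output of the cat thread pool API (the exact request here is my assumption, not part of this PR), e.g.:

GET _cat/thread_pool?v&h=id,name,active,rejected,completed,size,type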

id                     name                active rejected completed size type
Zwz7ZFYIToOULW9DgFBdfw analyze                  0        0         0    1 fixed
Zwz7ZFYIToOULW9DgFBdfw fetch_shard_started      0        0         0      scaling
Zwz7ZFYIToOULW9DgFBdfw fetch_shard_store        0        0         0      scaling
Zwz7ZFYIToOULW9DgFBdfw flush                    0        0         0      scaling
Zwz7ZFYIToOULW9DgFBdfw force_merge              0        0         0    1 fixed
Zwz7ZFYIToOULW9DgFBdfw generic                  0        0       173      scaling
Zwz7ZFYIToOULW9DgFBdfw get                      0        0         0   12 fixed
Zwz7ZFYIToOULW9DgFBdfw listener                 0        0         0    6 fixed
Zwz7ZFYIToOULW9DgFBdfw management               1        0        21      scaling
Zwz7ZFYIToOULW9DgFBdfw refresh                  0        0         0      scaling
Zwz7ZFYIToOULW9DgFBdfw search                   0        0         0   19 resizable
Zwz7ZFYIToOULW9DgFBdfw search_throttled         0        0         0    1 resizable
Zwz7ZFYIToOULW9DgFBdfw snapshot                 0        0         0      scaling
Zwz7ZFYIToOULW9DgFBdfw system_read              0        0         0    5 fixed
Zwz7ZFYIToOULW9DgFBdfw system_write             0        0         0    5 fixed
Zwz7ZFYIToOULW9DgFBdfw warmer                   0        0         0      scaling
Zwz7ZFYIToOULW9DgFBdfw write                    0        0         0   12 fixed

This PR goes side by side with #2595: we are replacing the SEARCH_XXX pools with ones whose queue size can be adjusted at runtime. Right now this is not exposed to the outside world through an API, but plugins can make such adjustments.
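
As a rough illustration of the plugin-side usage (a minimal sketch: QueueResizableOpenSearchThreadPoolExecutor and resize(int) come from this PR, while the package path, class name, and surrounding wiring are my assumptions):

import java.util.concurrent.ExecutorService;

import org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutor;
import org.opensearch.threadpool.ThreadPool;

public final class SearchQueueResizer {
    /** Adjusts the search queue capacity at runtime, assuming the plugin has a ThreadPool instance injected. */
    public static void resizeSearchQueue(ThreadPool threadPool, int newCapacity) {
        ExecutorService executor = threadPool.executor(ThreadPool.Names.SEARCH);
        if (executor instanceof QueueResizableOpenSearchThreadPoolExecutor) {
            // resize(int) adjusts the work queue capacity of the pool at runtime
            ((QueueResizableOpenSearchThreadPoolExecutor) executor).resize(newCapacity);
        }
    }
}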

Issues Resolved

Closes #476

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

rguo-aws added 2 commits May 6, 2022 04:11
Signed-off-by: Ruizhen <ruizhen@amazon.com>
Signed-off-by: Ruizhen <ruizhen@amazon.com>
@reta reta changed the title Add resizable write/search queue to OpenSearch (picking up #826) Add resizable search queue to OpenSearch (picking up #826) May 5, 2022
@reta reta added the v3.0.0 Issues and PRs related to version 3.0.0 label May 5, 2022
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 5a2545794f668c9c2fe95b24f8586cac2c81f620
Log 5050

Reports 5050

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 15ff5f07270ce4fa52e2d295e983758233f05b4f
Log 5052

Reports 5052

@peterzhuamazon
Member

start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 15ff5f07270ce4fa52e2d295e983758233f05b4f
Log 5066

Reports 5066

@dreamer-89
Member

start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 15ff5f07270ce4fa52e2d295e983758233f05b4f
Log 5071

Reports 5071

@opensearch-ci-bot
Collaborator

✅   Gradle Check success 5f64ad05fb82d3cfd9deeedc2cc37a8195b88f69
Log 5081

Reports 5081

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure edff7a9965c997939a9860b59fd0b465200bab9a
Log 5086

Reports 5086

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 67df4c756ecc17b6144a8ee07b4156b46e4254f6
Log 5087

Reports 5087

@reta reta marked this pull request as ready for review May 6, 2022 16:49
@reta reta requested a review from a team as a code owner May 6, 2022 16:49
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 2a15da8b7bda031a6f47f7d6fb156b7b3bdb650b
Log 5092

Reports 5092

@opensearch-ci-bot
Collaborator

✅   Gradle Check success 44c01c0
Log 5094

Reports 5094

@saratvemulapalli saratvemulapalli self-requested a review May 6, 2022 19:26
@saratvemulapalli saratvemulapalli dismissed their stale review May 6, 2022 19:26

Accident :)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
@reta
Collaborator Author

reta commented May 6, 2022

I have some minor comments.

Let's spell out the side effect of this change for the user in the PR description? How does one use this?

Thanks @dblock!

Let's spell out the side effect of this change for the user in the PR description?

This PR goes side by side with #2595: we are replacing the SEARCH_XXX pools with ones whose queue size can be adjusted at runtime. Right now this is not exposed to the outside world through an API, but plugins can make such adjustments.

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure cd0f9e4
Log 5114

Reports 5114

@reta
Collaborator Author

reta commented May 7, 2022

start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure cd0f9e4
Log 5122

Reports 5122

@reta
Collaborator Author

reta commented May 7, 2022

start gradle check

@opensearch-ci-bot
Collaborator

✅   Gradle Check success cd0f9e4
Log 5126

Reports 5126

@dblock dblock requested a review from Bukhtawar May 10, 2022 15:28
Comment on lines 15 to 17
// This is a random starting point alpha. TODO: revisit this with actual testing and/or make it configurable
double EWMA_ALPHA = 0.3;

Collaborator

Should this be configurable, maybe more of an expert setting?

Collaborator Author

This is the default, but I will add the constructors to allow configuration

Collaborator Author

Done
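
For illustration, the "default plus overriding constructor" pattern being described could look roughly like this (a self-contained sketch with made-up names, not the merged OpenSearch code):

public class EwmaTrackingExample {
    // Default alpha, matching the "random starting point" discussed above
    public static final double DEFAULT_EWMA_ALPHA = 0.3;

    private final double ewmaAlpha;

    public EwmaTrackingExample() {
        this(DEFAULT_EWMA_ALPHA); // previous behaviour stays the default
    }

    public EwmaTrackingExample(double ewmaAlpha) {
        this.ewmaAlpha = ewmaAlpha; // expert settings may supply their own value
    }

    public double ewmaAlpha() {
        return ewmaAlpha;
    }
}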

) {

if (queueCapacity <= 0) {
throw new IllegalArgumentException("queue capacity for [" + name + "] executor must be positive, got: " + queueCapacity);
Collaborator

Maybe consider mentioning 0 in the exception

Collaborator Author

Sorry, didn't get this one: positive means > 0, and 0 is not an acceptable value. Makes sense?

Comment on lines +23 to +26
public final class QueueResizableOpenSearchThreadPoolExecutor extends OpenSearchThreadPoolExecutor
implements
EWMATrackingThreadPoolExecutor {

Collaborator

nit: formatting

Collaborator Author

Not me - spotless

* Resizes the work queue capacity of the pool
* @param capacity the new capacity
*/
public synchronized int resize(int capacity) {
Collaborator

For my understanding, who calls resize?

Collaborator Author

AFAIK that could be done from the plugin(s), as per the attached issue

Collaborator

Should the plugin have the capability to override the resize logic? Do you think we could expose a contract?

Collaborator Author

@reta reta May 12, 2022

I believe this is the whole purpose of the issue and the change (I am finalizing #826 since that pull request was abandoned). From my own perspective, it could be useful in certain cases since thread pools are otherwise not adjustable at runtime.
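
As a purely illustrative sketch of the underlying idea (a toy, self-contained class, not the queue implementation merged here), a queue whose capacity bound can change at runtime could look like this:

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class ResizableCapacityQueue<E> {
    private final ConcurrentLinkedQueue<E> delegate = new ConcurrentLinkedQueue<>();
    private final AtomicInteger capacity;
    private final AtomicInteger size = new AtomicInteger();

    ResizableCapacityQueue(int initialCapacity) {
        this.capacity = new AtomicInteger(initialCapacity);
    }

    boolean offer(E element) {
        // Reject new work once the current capacity bound is reached
        if (size.incrementAndGet() > capacity.get()) {
            size.decrementAndGet();
            return false;
        }
        return delegate.offer(element);
    }

    E poll() {
        E element = delegate.poll();
        if (element != null) {
            size.decrementAndGet();
        }
        return element;
    }

    int resize(int newCapacity) {
        // Already-queued elements stay even if the bound shrinks below the current size
        capacity.set(newCapacity);
        return newCapacity;
    }
}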

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 3dfe946
Log 5259

Reports 5259

@reta
Collaborator Author

reta commented May 12, 2022

x Gradle Check failure 3dfe946 Log 5259

Reports 5259

#1703

@reta
Collaborator Author

reta commented May 12, 2022

start gradle check

@opensearch-ci-bot
Collaborator

✅   Gradle Check success 3dfe946
Log 5260

Reports 5260

@reta
Collaborator Author

reta commented May 13, 2022

@saratvemulapalli anything left on your side? thank you!

@saratvemulapalli
Member

@saratvemulapalli anything left on your side? thank you!

Thanks @reta, no I don't have anything else.
I was just trying to read the PR and understand the context :)

@dblock dblock merged commit 38fb1d9 into opensearch-project:main May 16, 2022
@dblock
Member

dblock commented May 16, 2022

@reta Any reason not to backport to 2.x? No breaking changes here AFAIK.

@reta
Collaborator Author

reta commented May 16, 2022

@reta Any reason not to backport to 2.x? No breaking changes here AFAIK.

It changes the pool behind SEARCH and SEARCH_THROTTLED; @andrross and I considered that a breaking change in #2595. I am about to run the benchmarks to evaluate the impact (if any). We could target 2.1.0; for 2.0.0 it would be great to backport after the benchmarking, wdyt?

@dblock
Member

dblock commented May 16, 2022

If it's a net performance improvement without any visible backwards incompatible changes, then I don't see why not. Breaking changes are only about user experience, APIs, interfaces.

@reta
Collaborator Author

reta commented May 16, 2022

If it's a net performance improvement without any visible backwards incompatible changes, then I don't see why not. Breaking changes are only about user experience, APIs, interfaces.

Got it, thanks @dblock, I will run the benchmarks shortly and update the issue on backport plans, thanks!

@reta
Collaborator Author

reta commented May 25, 2022

If it's a net performance improvement without any visible backwards incompatible changes, then I don't see why not. Breaking changes are only about user experience, APIs, interfaces.

Got it, thanks @dblock, I will run the benchmarks shortly and update the issue on backport plans, thanks!

@dblock sorry for the delay, I finally ran the tests I wanted. Regarding

... any visible backwards incompatible changes,

There is only one: removal of the deprecated properties for the SEARCH / SEARCH_THROTTLED pools [1].

thread_pool:
    search:
        size: 30
        queue_size: 500
        min_queue_size: 10  --> removed
        max_queue_size: 1000  --> removed
        auto_queue_frame_size: 2000  --> removed
        target_response_time: 1s  --> removed
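
For illustration, a configuration that remains valid after this change would presumably keep only the non-deprecated settings, e.g.:

thread_pool:
    search:
        size: 30
        queue_size: 500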

If you think this is a backward compatibility concern, we could still bring the change to 2.1.0 but without changing SEARCH / SEARCH_THROTTLED pool types. Would it make sense?

Regarding performance, no regressions were shown by the benchmarks:

  • nyc_taxis (search and aggregation only)
Metric Task 2.0.0 3.0.0 Diff Unit
... ... ... ... ... ...
Min Throughput default 3.01603 3.01586 -0.00017 ops/s
Mean Throughput default 3.02609 3.02584 -0.00025 ops/s
Median Throughput default 3.02374 3.02358 -0.00015 ops/s
Max Throughput default 3.04603 3.04554 -0.00048 ops/s
50th percentile latency default 10.512 10.9957 0.48375 ms
90th percentile latency default 12.1891 12.6665 0.47739 ms
99th percentile latency default 12.8956 13.5526 0.657 ms
100th percentile latency default 12.9211 13.565 0.64393 ms
50th percentile service time default 8.76475 9.15046 0.38571 ms
90th percentile service time default 10.3444 10.6486 0.30422 ms
99th percentile service time default 10.8272 11.174 0.34685 ms
100th percentile service time default 10.8565 11.3419 0.48544 ms
error rate default 0 0 0 %
Min Throughput range 0.703045 0.703475 0.00043 ops/s
Mean Throughput range 0.705008 0.70571 0.0007 ops/s
Median Throughput range 0.70456 0.705193 0.00063 ops/s
Max Throughput range 0.709045 0.710306 0.00126 ops/s
50th percentile latency range 418.294 190.774 -227.52 ms
90th percentile latency range 454.41 213.684 -240.726 ms
99th percentile latency range 563.515 225.804 -337.711 ms
100th percentile latency range 565.653 230.017 -335.636 ms
50th percentile service time range 416.174 188.381 -227.793 ms
90th percentile service time range 451.55 210.951 -240.598 ms
99th percentile service time range 561.716 223.472 -338.245 ms
100th percentile service time range 564.861 227.001 -337.86 ms
error rate range 0 0 0 %
Min Throughput distance_amount_agg 2.0118 2.01191 0.00012 ops/s
Mean Throughput distance_amount_agg 2.01942 2.0196 0.00018 ops/s
Median Throughput distance_amount_agg 2.01764 2.01781 0.00016 ops/s
Max Throughput distance_amount_agg 2.03494 2.03518 0.00025 ops/s
50th percentile latency distance_amount_agg 8.43376 7.63267 -0.80109 ms
90th percentile latency distance_amount_agg 9.05376 8.67633 -0.37743 ms
99th percentile latency distance_amount_agg 9.87221 9.10477 -0.76745 ms
100th percentile latency distance_amount_agg 9.96799 9.44583 -0.52216 ms
50th percentile service time distance_amount_agg 6.33497 6.05554 -0.27943 ms
90th percentile service time distance_amount_agg 6.68171 6.60702 -0.07468 ms
99th percentile service time distance_amount_agg 7.09549 7.39716 0.30166 ms
100th percentile service time distance_amount_agg 7.21325 7.46483 0.25157 ms
error rate distance_amount_agg 0 0 0 %
Min Throughput autohisto_agg 1.49243 1.49562 0.00319 ops/s
Mean Throughput autohisto_agg 1.49578 1.49757 0.0018 ops/s
Median Throughput autohisto_agg 1.49616 1.4978 0.00164 ops/s
Max Throughput autohisto_agg 1.49742 1.49851 0.00109 ops/s
50th percentile latency autohisto_agg 466.632 485.789 19.1568 ms
90th percentile latency autohisto_agg 486.674 539.443 52.7692 ms
99th percentile latency autohisto_agg 523.54 553.441 29.901 ms
100th percentile latency autohisto_agg 539.967 555.939 15.9721 ms
50th percentile service time autohisto_agg 465.333 484.183 18.8497 ms
90th percentile service time autohisto_agg 485.29 537.177 51.8877 ms
99th percentile service time autohisto_agg 521.557 552.085 30.5282 ms
100th percentile service time autohisto_agg 537.714 553.973 16.2587 ms
error rate autohisto_agg 0 0 0 %
Min Throughput date_histogram_agg 1.50285 1.50154 -0.00131 ops/s
Mean Throughput date_histogram_agg 1.50462 1.5025 -0.00212 ops/s
Median Throughput date_histogram_agg 1.50422 1.50228 -0.00194 ops/s
Max Throughput date_histogram_agg 1.50816 1.50442 -0.00375 ops/s
50th percentile latency date_histogram_agg 479.104 490.881 11.7763 ms
90th percentile latency date_histogram_agg 512.327 551.553 39.2264 ms
99th percentile latency date_histogram_agg 555.329 574.225 18.8962 ms
100th percentile latency date_histogram_agg 558.072 581.807 23.7346 ms
50th percentile service time date_histogram_agg 477.227 489.498 12.2716 ms
90th percentile service time date_histogram_agg 511.139 550.112 38.9724 ms
99th percentile service time date_histogram_agg 553.898 572.41 18.5116 ms
100th percentile service time date_histogram_agg 556.489 580.048 23.5587 ms
error rate date_histogram_agg 0 0 0 %
  • http_logs (search and aggregation only)
Metric Task 2.0.0 3.0.0 Diff Unit
... ... ... ... ... ...
Min Throughput default 8.00221 8.00439 0.00218 ops/s
Mean Throughput default 8.00248 8.00482 0.00234 ops/s
Median Throughput default 8.00249 8.00481 0.00232 ops/s
Max Throughput default 8.00274 8.00525 0.0025 ops/s
50th percentile latency default 7.68398 8.37626 0.69229 ms
90th percentile latency default 9.17801 8.9582 -0.21981 ms
99th percentile latency default 9.99378 9.47531 -0.51848 ms
100th percentile latency default 10.0425 9.88431 -0.15818 ms
50th percentile service time default 6.21694 6.78643 0.56949 ms
90th percentile service time default 6.96319 7.13032 0.16714 ms
99th percentile service time default 7.52822 7.43142 -0.09681 ms
100th percentile service time default 8.12879 7.62242 -0.50637 ms
error rate default 0 0 0 %
Min Throughput term 49.7882 49.8224 0.0342 ops/s
Mean Throughput term 49.7966 49.8298 0.03319 ops/s
Median Throughput term 49.7966 49.8298 0.03319 ops/s
Max Throughput term 49.8051 49.8373 0.03217 ops/s
50th percentile latency term 6.54982 12.4655 5.9157 ms
90th percentile latency term 12.9127 13.3714 0.45871 ms
99th percentile latency term 14.8837 15.3168 0.43316 ms
100th percentile latency term 27.8102 16.4798 -11.3304 ms
50th percentile service time term 5.35667 11.0563 5.69963 ms
90th percentile service time term 11.377 11.8741 0.49705 ms
99th percentile service time term 12.3291 13.7963 1.46724 ms
100th percentile service time term 26.5797 14.8792 -11.7005 ms
error rate term 0 0 0 %
Min Throughput range 1.00463 1.00476 0.00014 ops/s
Mean Throughput range 1.00641 1.00659 0.00018 ops/s
Median Throughput range 1.00616 1.00634 0.00018 ops/s
Max Throughput range 1.00921 1.00947 0.00027 ops/s
50th percentile latency range 13.1732 13.2035 0.03027 ms
90th percentile latency range 16.8576 17.1525 0.29494 ms
99th percentile latency range 18.4478 18.036 -0.41177 ms
100th percentile latency range 20.6701 21.8114 1.14129 ms
50th percentile service time range 10.7207 10.8275 0.10689 ms
90th percentile service time range 14.7411 14.6425 -0.09859 ms
99th percentile service time range 15.5539 15.6802 0.12634 ms
100th percentile service time range 18.6388 19.019 0.38021 ms
error rate range 0 0 0 %
Min Throughput 200s-in-range 32.8873 32.9331 0.04576 ops/s
Mean Throughput 200s-in-range 32.8941 32.9369 0.04281 ops/s
Median Throughput 200s-in-range 32.894 32.9371 0.0431 ops/s
Max Throughput 200s-in-range 32.9008 32.9404 0.03958 ops/s
50th percentile latency 200s-in-range 12.4071 14.643 2.23589 ms
90th percentile latency 200s-in-range 16.3007 15.7014 -0.59928 ms
99th percentile latency 200s-in-range 18.1677 16.3358 -1.83186 ms
100th percentile latency 200s-in-range 19.0266 16.4531 -2.5735 ms
50th percentile service time 200s-in-range 10.602 12.9783 2.37633 ms
90th percentile service time 200s-in-range 14.5145 14.0507 -0.46381 ms
99th percentile service time 200s-in-range 16.1185 14.4248 -1.69374 ms
100th percentile service time 200s-in-range 17.0706 14.901 -2.1696 ms
error rate 200s-in-range 0 0 0 %
Min Throughput 400s-in-range 49.9987 49.903 -0.09565 ops/s
Mean Throughput 400s-in-range 49.999 49.9065 -0.0925 ops/s
Median Throughput 400s-in-range 49.9989 49.9065 -0.09239 ops/s
Max Throughput 400s-in-range 49.9995 49.91 -0.08946 ops/s
50th percentile latency 400s-in-range 8.18514 8.94803 0.76289 ms
90th percentile latency 400s-in-range 9.54274 9.5797 0.03696 ms
99th percentile latency 400s-in-range 10.306 9.96981 -0.33623 ms
100th percentile latency 400s-in-range 10.4425 15.2209 4.77837 ms
50th percentile service time 400s-in-range 6.77338 7.42973 0.65635 ms
90th percentile service time 400s-in-range 7.83353 7.81598 -0.01754 ms
99th percentile service time 400s-in-range 8.34765 8.37274 0.02509 ms
100th percentile service time 400s-in-range 8.42711 13.0866 4.65947 ms
error rate 400s-in-range 0 0 0 %
Min Throughput hourly_agg 0.200447 0.200423 -2e-05 ops/s
Mean Throughput hourly_agg 0.200618 0.200585 -3e-05 ops/s
Median Throughput hourly_agg 0.200594 0.200562 -3e-05 ops/s
Max Throughput hourly_agg 0.200886 0.200839 -5e-05 ops/s
50th percentile latency hourly_agg 2546.73 2657.02 110.296 ms
90th percentile latency hourly_agg 2592.32 2703.39 111.076 ms
99th percentile latency hourly_agg 2681.31 2740.59 59.2801 ms
100th percentile latency hourly_agg 2689.59 2749.19 59.6026 ms
50th percentile service time hourly_agg 2543.37 2655.01 111.637 ms
90th percentile service time hourly_agg 2588.75 2701.26 112.506 ms
99th percentile service time hourly_agg 2678.43 2736.52 58.0876 ms
100th percentile service time hourly_agg 2687 2746.34 59.3386 ms
error rate hourly_agg 0 0 0 %
Min Throughput scroll 25.0507 25.0527 0.00201 pages/s
Mean Throughput scroll 25.0834 25.0868 0.00339 pages/s
Median Throughput scroll 25.076 25.079 0.00309 pages/s
Max Throughput scroll 25.1513 25.1574 0.00617 pages/s
50th percentile latency scroll 234.563 245.951 11.388 ms
90th percentile latency scroll 265.414 287.358 21.9441 ms
99th percentile latency scroll 311.755 326.597 14.8421 ms
100th percentile latency scroll 330.247 351.546 21.2992 ms
50th percentile service time scroll 231.239 242.784 11.5456 ms
90th percentile service time scroll 262.547 284.44 21.8924 ms
99th percentile service time scroll 308.578 324.156 15.5784 ms
100th percentile service time scroll 327.591 348.721 21.1297 ms
error rate scroll 0 0 0 %
Min Throughput desc_sort_timestamp 0.501015 0.501025 1e-05 ops/s
Mean Throughput desc_sort_timestamp 0.501232 0.501245 1e-05 ops/s
Median Throughput desc_sort_timestamp 0.501215 0.501228 1e-05 ops/s
Max Throughput desc_sort_timestamp 0.501515 0.501531 2e-05 ops/s
50th percentile latency desc_sort_timestamp 652.535 683.276 30.7416 ms
90th percentile latency desc_sort_timestamp 675.574 703.809 28.2351 ms
99th percentile latency desc_sort_timestamp 714.458 764.219 49.7612 ms
100th percentile latency desc_sort_timestamp 716.489 772.286 55.7966 ms
50th percentile service time desc_sort_timestamp 649.752 680.292 30.5407 ms
90th percentile service time desc_sort_timestamp 673.166 701.23 28.0636 ms
99th percentile service time desc_sort_timestamp 711.741 761.242 49.5004 ms
100th percentile service time desc_sort_timestamp 714.197 769.173 54.9769 ms
error rate desc_sort_timestamp 0 0 0 %
Min Throughput asc_sort_timestamp 0.50164 0.501641 0 ops/s
Mean Throughput asc_sort_timestamp 0.501992 0.501993 0 ops/s
Median Throughput asc_sort_timestamp 0.501965 0.501965 0 ops/s
Max Throughput asc_sort_timestamp 0.502453 0.502455 0 ops/s
50th percentile latency asc_sort_timestamp 30.0054 35.0532 5.04773 ms
90th percentile latency asc_sort_timestamp 33.7782 52.3356 18.5575 ms
99th percentile latency asc_sort_timestamp 57.08 58.5949 1.51486 ms
100th percentile latency asc_sort_timestamp 57.1303 60.0232 2.8929 ms
50th percentile service time asc_sort_timestamp 27.0778 31.8819 4.80416 ms
90th percentile service time asc_sort_timestamp 30.375 48.8441 18.4692 ms
99th percentile service time asc_sort_timestamp 53.881 54.9683 1.08728 ms
100th percentile service time asc_sort_timestamp 54.0996 56.4284 2.32881 ms
error rate asc_sort_timestamp 0 0 0 %
Min Throughput desc_sort_with_after_timestamp 0.502476 0.502214 -0.00026 ops/s
Mean Throughput desc_sort_with_after_timestamp 0.506505 0.505812 -0.00069 ops/s
Median Throughput desc_sort_with_after_timestamp 0.504518 0.504044 -0.00047 ops/s
Max Throughput desc_sort_with_after_timestamp 0.525909 0.523097 -0.00281 ops/s
50th percentile latency desc_sort_with_after_timestamp 866.508 952.86 86.3517 ms
90th percentile latency desc_sort_with_after_timestamp 907.291 987.509 80.2176 ms
99th percentile latency desc_sort_with_after_timestamp 1018.87 1046.59 27.7263 ms
100th percentile latency desc_sort_with_after_timestamp 1028.18 1093.72 65.5354 ms
50th percentile service time desc_sort_with_after_timestamp 863.539 951.523 87.9846 ms
90th percentile service time desc_sort_with_after_timestamp 905.088 986.212 81.1242 ms
99th percentile service time desc_sort_with_after_timestamp 1016.47 1044.26 27.794 ms
100th percentile service time desc_sort_with_after_timestamp 1025.64 1090.68 65.0482 ms
error rate desc_sort_with_after_timestamp 0 0 0 %
Min Throughput asc_sort_with_after_timestamp 0.502376 0.502259 -0.00012 ops/s
Mean Throughput asc_sort_with_after_timestamp 0.506246 0.505929 -0.00032 ops/s
Median Throughput asc_sort_with_after_timestamp 0.504337 0.504122 -0.00022 ops/s
Max Throughput asc_sort_with_after_timestamp 0.524844 0.523545 -0.0013 ops/s
50th percentile latency asc_sort_with_after_timestamp 928.733 1009.39 80.6548 ms
90th percentile latency asc_sort_with_after_timestamp 958.256 1035.28 77.0265 ms
99th percentile latency asc_sort_with_after_timestamp 993.756 1071.2 77.4397 ms
100th percentile latency asc_sort_with_after_timestamp 1091.93 1080.84 -11.0903 ms
50th percentile service time asc_sort_with_after_timestamp 926.65 1007.06 80.407 ms
90th percentile service time asc_sort_with_after_timestamp 955.901 1033.54 77.6397 ms
99th percentile service time asc_sort_with_after_timestamp 992.325 1069.3 76.9757 ms
100th percentile service time asc_sort_with_after_timestamp 1090.52 1079.36 -11.1592 ms
error rate asc_sort_with_after_timestamp 0 0 0 %
Min Throughput wait-until-merges-1-seg-finish 71.1768 45.1326 -26.0442 ops/s
Mean Throughput wait-until-merges-1-seg-finish 71.1768 45.1326 -26.0442 ops/s
Median Throughput wait-until-merges-1-seg-finish 71.1768 45.1326 -26.0442 ops/s
Max Throughput wait-until-merges-1-seg-finish 71.1768 45.1326 -26.0442 ops/s
100th percentile latency wait-until-merges-1-seg-finish 13.4659 21.3753 7.9094 ms
100th percentile service time wait-until-merges-1-seg-finish 13.4659 21.3753 7.9094 ms
error rate wait-until-merges-1-seg-finish 0 0 0 %
Min Throughput desc-sort-timestamp-after-force-merge-1-seg 1.52894 1.52204 -0.0069 ops/s
Mean Throughput desc-sort-timestamp-after-force-merge-1-seg 1.53478 1.52494 -0.00984 ops/s
Median Throughput desc-sort-timestamp-after-force-merge-1-seg 1.53532 1.52425 -0.01107 ops/s
Max Throughput desc-sort-timestamp-after-force-merge-1-seg 1.53998 1.52979 -0.01019 ops/s
50th percentile latency desc-sort-timestamp-after-force-merge-1-seg 38346.4 39658.1 1311.75 ms
90th percentile latency desc-sort-timestamp-after-force-merge-1-seg 44843.8 45993.4 1149.59 ms
99th percentile latency desc-sort-timestamp-after-force-merge-1-seg 46533.7 47367.6 833.887 ms
100th percentile latency desc-sort-timestamp-after-force-merge-1-seg 46686.4 47541.8 855.399 ms
50th percentile service time desc-sort-timestamp-after-force-merge-1-seg 656.122 655.751 -0.37095 ms
90th percentile service time desc-sort-timestamp-after-force-merge-1-seg 701.464 697.192 -4.27179 ms
99th percentile service time desc-sort-timestamp-after-force-merge-1-seg 784.255 744.138 -40.1172 ms
100th percentile service time desc-sort-timestamp-after-force-merge-1-seg 787.343 758.515 -28.828 ms
error rate desc-sort-timestamp-after-force-merge-1-seg 0 0 0 %
Min Throughput asc-sort-timestamp-after-force-merge-1-seg 2.00635 2.00642 7e-05 ops/s
Mean Throughput asc-sort-timestamp-after-force-merge-1-seg 2.00772 2.0078 7e-05 ops/s
Median Throughput asc-sort-timestamp-after-force-merge-1-seg 2.00762 2.00769 7e-05 ops/s
Max Throughput asc-sort-timestamp-after-force-merge-1-seg 2.00948 2.00958 0.0001 ops/s
50th percentile latency asc-sort-timestamp-after-force-merge-1-seg 29.1459 33.7639 4.618 ms
90th percentile latency asc-sort-timestamp-after-force-merge-1-seg 49.2804 56.4928 7.2124 ms
99th percentile latency asc-sort-timestamp-after-force-merge-1-seg 54.3399 57.8095 3.46968 ms
100th percentile latency asc-sort-timestamp-after-force-merge-1-seg 56.3336 57.8883 1.55473 ms
50th percentile service time asc-sort-timestamp-after-force-merge-1-seg 26.8871 31.8396 4.95252 ms
90th percentile service time asc-sort-timestamp-after-force-merge-1-seg 46.8394 54.319 7.47961 ms
99th percentile service time asc-sort-timestamp-after-force-merge-1-seg 52.5184 56.0072 3.48884 ms
100th percentile service time asc-sort-timestamp-after-force-merge-1-seg 53.0127 56.2684 3.25575 ms
error rate asc-sort-timestamp-after-force-merge-1-seg 0 0 0 %
Min Throughput desc-sort-with-after-timestamp-after-force-merge-1-seg 0.502488 0.502397 -9e-05 ops/s
Mean Throughput desc-sort-with-after-timestamp-after-force-merge-1-seg 0.50654 0.506296 -0.00024 ops/s
Median Throughput desc-sort-with-after-timestamp-after-force-merge-1-seg 0.504543 0.504376 -0.00017 ops/s
Max Throughput desc-sort-with-after-timestamp-after-force-merge-1-seg 0.526029 0.525035 -0.00099 ops/s
50th percentile latency desc-sort-with-after-timestamp-after-force-merge-1-seg 928.58 977.832 49.2514 ms
90th percentile latency desc-sort-with-after-timestamp-after-force-merge-1-seg 985.915 1055.96 70.0458 ms
99th percentile latency desc-sort-with-after-timestamp-after-force-merge-1-seg 1059.71 1102.05 42.342 ms
100th percentile latency desc-sort-with-after-timestamp-after-force-merge-1-seg 1073.87 1107.76 33.8872 ms
50th percentile service time desc-sort-with-after-timestamp-after-force-merge-1-seg 926.498 975.437 48.9392 ms
90th percentile service time desc-sort-with-after-timestamp-after-force-merge-1-seg 983.137 1053.57 70.4342 ms
99th percentile service time desc-sort-with-after-timestamp-after-force-merge-1-seg 1057.37 1099.72 42.3518 ms
100th percentile service time desc-sort-with-after-timestamp-after-force-merge-1-seg 1072.36 1104.98 32.6262 ms
error rate desc-sort-with-after-timestamp-after-force-merge-1-seg 0 0 0 %
Min Throughput asc-sort-with-after-timestamp-after-force-merge-1-seg 0.502249 0.502145 -0.0001 ops/s
Mean Throughput asc-sort-with-after-timestamp-after-force-merge-1-seg 0.505902 0.505626 -0.00028 ops/s
Median Throughput asc-sort-with-after-timestamp-after-force-merge-1-seg 0.504106 0.503913 -0.00019 ops/s
Max Throughput asc-sort-with-after-timestamp-after-force-merge-1-seg 0.523439 0.522306 -0.00113 ops/s
50th percentile latency asc-sort-with-after-timestamp-after-force-merge-1-seg 998.301 1058.08 59.7813 ms
90th percentile latency asc-sort-with-after-timestamp-after-force-merge-1-seg 1054.35 1098.29 43.9415 ms
99th percentile latency asc-sort-with-after-timestamp-after-force-merge-1-seg 1141.02 1120.78 -20.236 ms
100th percentile latency asc-sort-with-after-timestamp-after-force-merge-1-seg 1155.65 1123.34 -32.3125 ms
50th percentile service time asc-sort-with-after-timestamp-after-force-merge-1-seg 996.547 1055.03 58.4851 ms
90th percentile service time asc-sort-with-after-timestamp-after-force-merge-1-seg 1052.17 1096 43.8278 ms
99th percentile service time asc-sort-with-after-timestamp-after-force-merge-1-seg 1137.57 1118.51 -19.0669 ms
100th percentile service time asc-sort-with-after-timestamp-after-force-merge-1-seg 1153.22 1120.02 -33.1953 ms
error rate asc-sort-with-after-timestamp-after-force-merge-1-seg 0 0 0 %

[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-threadpool.html#fixed-auto-queue-size

@dblock
Member

dblock commented May 26, 2022

... any visible backwards incompatible changes,

There is only one: removal of the deprecated properties for the SEARCH / SEARCH_THROTTLED pools [1].

thread_pool:
    search:
        size: 30
        queue_size: 500
        min_queue_size: 10  --> removed
        max_queue_size: 1000  --> removed
        auto_queue_frame_size: 2000  --> removed
        target_response_time: 1s  --> removed

If you think this is a backward compatibility concern, we could still bring the change to 2.1.0 but without changing SEARCH / SEARCH_THROTTLED pool types. Would it make sense?

So, if a user has this in a config, will it break? Or just not use these? It's okay to deprecate settings with warnings (e.g. this setting is no longer used), but users' existing configuration should continue loading.

@reta
Collaborator Author

reta commented May 26, 2022

So, if a user has this in a config, will it break?

Yes

It's okay to deprecate settings with warnings (e.g. this setting is no longer used), but users' existing configuration should continue loading.

They are already deprecated (and technically should have been gone in 2.0).
