[Concurrent Segment Search][META] Performance benchmark plan #9049
Comments
@reta @anasalkouz @andrross I would like to get your feedback on this.
@sohami Is this correct? The `nyc_taxis` workload does perform a force merge, but the …
@sohami Digging a bit into this, AWS recommends 10-30 GB for search workloads and 30-50 GB for log workloads. Similarly, Opster gives general guidance of 30-50 GB. Given this, I would suggest using the …
Thanks, reworded. I didn't mean to say that the benchmark performs a force merge to 1 segment by default.
Yes, that is the idea. I was planning to use …
@sohami Yes, of course you're right. My main concern with …
@sohami Very curious whether concurrent search will help the `http_logs` workload, especially …
Thanks @sohami
I 100% agree with you here - the single-shard scenario is highly unrealistic, and I would suggest excluding it from the benchmarking (it could be useful for troubleshooting, for example, but this workload over such a configuration is not representative).
Do you mean …
I think …
The main goal is to exercise different query types (like range/term/aggregations) and use the workloads available in OSB. These operations should be common across search/log analytics use cases. Since …
@reta What I mean by this is that with a single-shard setup we will get the best possible improvement from concurrent execution, so it is important to understand that and see the behavior in the benchmark. It is also not entirely unrealistic, since for some search use cases users assign 1 shard per node in their setup. To understand the behavior with multiple shards being searched on a node, we will have multiple clients sending search requests to the single shard, say with CPU utilization at 50%, and then run the same workload with concurrent search enabled and compare the behavior. Having said that, we will also run scenarios with a single client and multiple shards per node, searching across all the shards. The expectation is that with more than 5 shards per node we should see the latency improvement multiply, since in that case each search request will make multiple round trips.
No, I meant the search pool. Currently the search threadpool is set to 1.5x the processor count, and if all the threads are busy it ends up consuming all the available cores and reaching ~100% CPU utilization. I want to see how the system behaves, relative to the default setup, if we vary the search pool size with concurrent search enabled.
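A minimal sketch of the kind of knobs being discussed here, assuming the `search.concurrent_segment_search.enabled` cluster setting and the standard `thread_pool.search.*` keys; exact setting names and defaults may differ across OpenSearch versions, and concurrent search may sit behind a feature flag in some builds.

```bash
# Hedged sketch: toggle concurrent segment search dynamically (assumed setting name).
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "search.concurrent_segment_search.enabled": true
  }
}'

# The search thread pool size is static and set in opensearch.yml, for example:
#   thread_pool.search.size: 12          # default is roughly 1.5x allocated processors + 1
#   thread_pool.search.queue_size: 1000
```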
Will take a look at it.
Thanks @sohami
👍 I think we should definitely include the index searcher pool sizing as well, since this is the one the index searcher will use.
We enabled this on our SIEM cluster while trying to improve performance (see also …). We saw throughput rise by roughly 45% (400 -> 580 MB/s) and haven't encountered any adverse effects as far as I can tell. Response times in Dashboards are notably faster (no formal numbers, just a user's experience, and I'm sensitive to lag :) ). I don't have end-to-end performance traces, but from experience I suspect I could extract more performance from a single search node if I could increase the amount of parallelism against our S3 solution.
This issue captures the different performance benchmarks we plan to run as part of evaluating concurrent segment search, which is tracked on the project board here. We will gather feedback from the community to see if we are missing anything that needs to be considered.
Objective:
Benchmark Plan:
Overview
For concurrent segment search, concurrency is introduced for the shard-level request in the query phase. To get the baseline improvement we can use a setup with a single-shard index (of varying shard sizes). By varying the number of search clients sending requests to this single shard we can achieve both of the goals below, since each request is independent of the others.
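As a rough illustration of that setup, the sketch below sweeps the number of search clients against an existing single-shard index using OpenSearch Benchmark. The `execute-test` flags are standard OSB options, but the `search_clients` workload parameter is an assumption and may be named differently (or not exist) in a given workload.

```bash
# Hedged sketch: sweep the number of search clients against a single-shard index.
# --pipeline=benchmark-only targets an already-running cluster.
for clients in 1 2 4 8 16; do
  opensearch-benchmark execute-test \
    --workload=nyc_taxis \
    --pipeline=benchmark-only \
    --target-hosts=localhost:9200 \
    --workload-params="search_clients:${clients}"   # assumed parameter name
done
```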
Note: The improvement at the per-shard level may not be the latency improvement actually perceived by the user. In the real world, each top-level search request usually touches multiple shards. The wider the query, the better the overall improvement can be, since there can be multiple round trips at the request level (based on the default limit of 5 concurrent shard requests per node). So E2E latency will show better results than the single-shard, per-shard-level result. This can be measured by benchmarking with multiple single-shard indices on a node (e.g. 5/10/15 shards) or by extrapolating the baseline numbers obtained from the single-shard case.
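A small sketch of that multi-shard variant, using placeholder index names: each index is created with a single primary shard so that a wildcard search fans out across 5 shards on the node.

```bash
# Hedged sketch: create 5 single-shard indices (placeholder names) on one node,
# then search across all of them with a wildcard index pattern.
for i in $(seq 1 5); do
  curl -X PUT "localhost:9200/nyc_taxis_${i}" -H 'Content-Type: application/json' -d'
  {
    "settings": { "index": { "number_of_shards": 1, "number_of_replicas": 0 } }
  }'
done

curl -X GET "localhost:9200/nyc_taxis_*/_search?q=*"
```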
Things to consider during benchmark setup:
For the `nyc_taxis` workload, for example, we can force merge to 20 segments across the different setups so that segment counts stay comparable (a hedged sketch of the force merge call follows).
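A minimal sketch of that force merge step, assuming the index is named `nyc_taxis`; the target segment count is whatever the benchmark setup calls for.

```bash
# Hedged sketch: force merge the nyc_taxis index down to a fixed number of segments
# so segment geometry stays comparable across the different setups.
curl -X POST "localhost:9200/nyc_taxis/_forcemerge?max_num_segments=20"
```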
Test Dimensions
Performance Test Scenarios:
Run `search_after` and `sort` queries with concurrent search enabled.
Run queries with a `terminate_after` clause and compare them with the concurrent-search-disabled scenario. We expect `terminate_after` to perform more work with concurrent search, so we should see if there is any regression for such cases.
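For reference, a hedged sketch of the two query shapes above. The `nyc_taxis` field name (`total_amount`) is an assumption taken from the workload mapping and may differ; the request body parameters (`sort`, `search_after`, `terminate_after`, `track_total_hits`) are standard search options.

```bash
# Hedged sketch: sort + search_after pagination (in practice a tiebreaker sort field
# would be added to avoid skipping documents with equal sort values).
curl -X GET "localhost:9200/nyc_taxis/_search" -H 'Content-Type: application/json' -d'
{
  "size": 100,
  "sort": [ { "total_amount": "asc" } ],
  "search_after": [ 12.5 ],
  "query": { "match_all": {} }
}'

# Hedged sketch: a query with a terminate_after clause, to compare against the same
# query with concurrent search disabled.
curl -X GET "localhost:9200/nyc_taxis/_search" -H 'Content-Type: application/json' -d'
{
  "terminate_after": 100000,
  "track_total_hits": false,
  "query": { "range": { "total_amount": { "gte": 5, "lte": 15 } } }
}'
```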