Add benchmarks for OpenSearch language clients #381

Open · dblock opened this issue Oct 3, 2023 · 7 comments
Labels: enhancement (New feature or request)

Comments

dblock (Member) commented Oct 3, 2023

Is your feature request related to a problem? Please describe.

Coming from opensearch-project/opensearch-py#446, we are seeing that at least the Python client may be a bottleneck for data ingestion. We can't fix what we can't measure, so the ask is to add and publish daily benchmarks for the OpenSearch clients.

Describe the solution you'd like

I would like to understand how the different language clients compare to each other in terms of performance, and then have the ability to vary client configurations (e.g. Python has sync and async modes, Ruby has 3.x vs. JRuby, etc.) to compare throughput, CPU, memory, etc. I also want to find the breaking points at which the network becomes the bottleneck (i.e. the maximum amount of data a client can push through).
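
A minimal sketch of the kind of per-client throughput probe this could standardize, assuming opensearch-py and a locally running cluster; the host, index name, document shape, and chunk size are all illustrative, not part of any existing benchmark suite:

```python
# Illustrative throughput probe using opensearch-py; host, index name, and
# document shape are assumptions for the sketch.
import time
from opensearchpy import OpenSearch, helpers

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Generate synthetic bulk actions lazily so the generator isn't the bottleneck.
docs = ({"_index": "client-bench", "_source": {"value": i}} for i in range(100_000))

start = time.perf_counter()
success, errors = helpers.bulk(client, docs, chunk_size=5_000)
elapsed = time.perf_counter() - start
print(f"indexed {success} docs in {elapsed:.2f}s ({success / elapsed:,.0f} docs/s)")
```

Running the same probe against each client (sync, async, other languages) with identical data would give the apples-to-apples comparison described above.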

Describe alternatives you've considered

We have been isolating and fixing individual problems ad hoc, either without benchmarks or with anecdotal, homegrown, local ones.

dblock added the enhancement label Oct 3, 2023
dblock changed the title from "Benchmark OpenSearch clients" to "Add benchmarks for OpenSearch language clients" Oct 3, 2023
IanHoang removed the untriaged label Oct 4, 2023
IanHoang (Collaborator) commented Oct 4, 2023

@dblock Thanks for bringing this up! OpenSearch-Benchmark relies heavily on the opensearch-py client, and it would be useful to know whether there are bottlenecks in the client itself. Based on the solution you described, we could add a parameter that specifies the type of client and the client configuration to use (similar to how users can specify --provision-config-instance to describe which OpenSearch configuration the test should use). These tests would run like any other, but their results would include additional metrics on the client's throughput and latency. opensearch-py would be the quickest to implement since it's already incorporated into OSB.
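
For illustration, a hedged sketch of what such a client-selection parameter could look like; the flag name, factory registry, and option shapes are hypothetical, not an existing OSB interface:

```python
# Hypothetical client-selection flag; all names here are illustrative only.
import argparse

def make_sync_client(opts):
    from opensearchpy import OpenSearch
    return OpenSearch(**opts)

def make_async_client(opts):
    from opensearchpy import AsyncOpenSearch  # requires the async extras
    return AsyncOpenSearch(**opts)

CLIENT_FACTORIES = {
    "opensearch-py": make_sync_client,
    "opensearch-py-async": make_async_client,
}

parser = argparse.ArgumentParser()
parser.add_argument("--client-config", choices=sorted(CLIENT_FACTORIES),
                    default="opensearch-py",
                    help="which client and configuration to benchmark")
args = parser.parse_args()
client = CLIENT_FACTORIES[args.client_config]({"hosts": ["localhost:9200"]})
```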

To get a better idea of how we should design this solution, we can work with @navneet1v to thoroughly understand how he constructed his tests in opensearch-project/opensearch-py#446.

navneet1v commented:

@IanHoang As I understand the OpenSearch-Benchmark repo, we use multiprocessing to spin up the clients, but I am not sure what happens after that. It would be interesting to see how load is generated from each of those clients.

navneet1v commented:

> To get a better idea of how we should design this solution, we can work with @navneet1v to thoroughly understand how he constructed his tests in opensearch-project/opensearch-py#446.

Sure let me know if you have any questions.

gkamat (Collaborator) commented Oct 6, 2023

The appropriate solution here is to refactor OSB to behave similarly to YCSB, the de facto benchmarking tool for key-value stores. YCSB has a core performance-measurement component and plugins for the various database implementations; OSB should interface with clients in the same manner. At a higher level, the core of OSB should not be tied to OpenSearch either; it should be capable of operating on alternatives like Datadog, Grafana, Splunk, etc.

To address the need expressed in this issue, instrumentation could be added at the point where OSB calls into the Python HTTP requests library to get some understanding of the overhead, but separating network latency from server overhead will need to utilize the "took" time the server reports. This needs to be looked into further. Another aspect of the ask is to enable OSB to be used as a load-testing tool; there are other issues related to this as well.
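
As a rough illustration of the "took"-based split, a sketch that compares wall-clock round-trip time against the server-reported "took" field in an OpenSearch search response (index name and query are placeholders):

```python
# Sketch: the difference between wall-clock time and the server-reported
# "took" (milliseconds) approximates client + network overhead.
import time
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

start = time.perf_counter()
resp = client.search(index="client-bench", body={"query": {"match_all": {}}})
wall_ms = (time.perf_counter() - start) * 1000

server_ms = resp["took"]  # server-side processing time from the response body
print(f"wall={wall_ms:.1f}ms server={server_ms}ms "
      f"client+network~{wall_ms - server_ms:.1f}ms")
```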

@navneet1v, OSB uses the Thespian library for multiprocessing.
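
For readers unfamiliar with Thespian, a minimal actor sketch; this only illustrates the actor model, not OSB's actual (more involved) load-generation actors:

```python
# Minimal Thespian actor example; not OSB's own worker implementation.
from thespian.actors import Actor, ActorSystem

class EchoWorker(Actor):
    def receiveMessage(self, message, sender):
        # Actors run in separate processes (with a multiprocess system base)
        # and communicate purely by message passing.
        self.send(sender, f"processed: {message}")

if __name__ == "__main__":
    system = ActorSystem("multiprocTCPBase")
    worker = system.createActor(EchoWorker)
    print(system.ask(worker, "bulk-batch-1", timeout=5))
    system.shutdown()
```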

prudhvigodithi (Member) commented Oct 20, 2023

Just a thought: if we consider using opensearch-benchmark for benchmarking clients, then the logic in benchmark should be changed to query the OpenSearch API directly rather than going through the opensearch-py client, so that queries and responses go directly to and from OpenSearch. For the clients themselves, the benchmark should use a universal framework that supports all clients, with calls to OpenSearch routed through the client under test via this framework; that would be the right way to measure and benchmark the clients.

[Diagram: opensearch-benchmark calling OpenSearch directly (top arrow), with the language clients driven through a separate Client Benchmark Framework]
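
To make the "universal framework" idea above concrete, a hedged sketch of a client-adapter interface such a framework might define; every name here is hypothetical and none of this exists in OSB today:

```python
# Hypothetical adapter interface for a client benchmark framework.
import time
from abc import ABC, abstractmethod
from typing import Any, Iterable, Mapping

class ClientAdapter(ABC):
    """Uniform surface the benchmark drives, regardless of language client."""

    @abstractmethod
    def bulk_index(self, index: str, docs: Iterable[Mapping[str, Any]]) -> float:
        """Index docs and return client-side elapsed seconds."""

    @abstractmethod
    def search(self, index: str, query: Mapping[str, Any]) -> float:
        """Run a query and return client-side elapsed seconds."""

class OpenSearchPyAdapter(ClientAdapter):
    """Adapter for the Python client; other clients would be driven identically."""

    def __init__(self, hosts):
        from opensearchpy import OpenSearch, helpers
        self._client = OpenSearch(hosts=hosts)
        self._helpers = helpers

    def bulk_index(self, index, docs):
        start = time.perf_counter()
        self._helpers.bulk(self._client,
                           ({"_index": index, "_source": d} for d in docs))
        return time.perf_counter() - start

    def search(self, index, query):
        start = time.perf_counter()
        self._client.search(index=index, body=query)
        return time.perf_counter() - start
```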

wbeckler commented Mar 4, 2024

In the above diagram, would the current implementation be the top line and arrow between opensearch-benchmark and opensearch? And would all non-Python clients go into the Client Benchmark Framework?

saimedhi (Contributor) commented Mar 5, 2024

Could I kindly request your thoughts on this proposal, @bbarani?
