Add benchmarks for OpenSearch language clients #381

Open · dblock opened this issue Oct 3, 2023 · 7 comments
Labels: enhancement (New feature or request)

Comments

dblock (Member) commented Oct 3, 2023

Is your feature request related to a problem? Please describe.

Coming from opensearch-project/opensearch-py#446, we are seeing that at least the Python client may be a bottleneck for data ingestion. We can't fix what we can't measure, so the ask is to add and publish daily benchmarks for the OpenSearch clients.

Describe the solution you'd like

I would like to understand how the different language clients compare to each other in terms of performance, and then have the ability to vary client configurations (e.g. Python has sync and async modes, Ruby has 3.x vs. JRuby, etc.) to compare throughput, CPU, memory, etc. I also want to find the breaking points at which the network becomes the bottleneck (i.e. the maximum amount of data a client can push through).
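
A minimal sketch of the kind of per-client throughput probe this could standardize, assuming opensearch-py and a locally running cluster; the host, index name, document shape, and chunk size are all illustrative, not part of any existing benchmark suite:

```python
# Illustrative throughput probe using opensearch-py; host, index name, and
# document shape are assumptions for the sketch.
import time
from opensearchpy import OpenSearch, helpers

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Generate synthetic bulk actions lazily so the generator isn't the bottleneck.
docs = ({"_index": "client-bench", "_source": {"value": i}} for i in range(100_000))

start = time.perf_counter()
success, errors = helpers.bulk(client, docs, chunk_size=5_000)
elapsed = time.perf_counter() - start
print(f"indexed {success} docs in {elapsed:.2f}s ({success / elapsed:,.0f} docs/s)")
```

Running the same probe against each client (sync, async, other languages) with identical data would give the apples-to-apples comparison described above.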

Describe alternatives you've considered

We have been isolating and fixing individual problems ad hoc, either without benchmarks or with anecdotal, homegrown, local ones.

dblock added the enhancement label Oct 3, 2023
dblock changed the title from "Benchmark OpenSearch clients" to "Add benchmarks for OpenSearch language clients" Oct 3, 2023
IanHoang removed the untriaged label Oct 4, 2023
IanHoang (Collaborator) commented Oct 4, 2023

@dblock Thanks for bringing this up! OpenSearch-Benchmark relies heavily on the opensearch-py client, and it would be useful to know whether there are bottlenecks in the client itself. Based on the solution you described, we could add a parameter that specifies the type of client and the client configuration to use (similar to how users can specify --provision-config-instance to describe which OpenSearch configuration the test should use). These tests would run like any other, but their results would include additional metrics on the client's throughput and latency. opensearch-py would be the quickest to implement since it's already incorporated into OSB.
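
For illustration, a hedged sketch of what such a client-selection parameter could look like; the flag name, factory registry, and option shapes are hypothetical, not an existing OSB interface:

```python
# Hypothetical client-selection flag; all names here are illustrative only.
import argparse

def make_sync_client(opts):
    from opensearchpy import OpenSearch
    return OpenSearch(**opts)

def make_async_client(opts):
    from opensearchpy import AsyncOpenSearch  # requires the async extras
    return AsyncOpenSearch(**opts)

CLIENT_FACTORIES = {
    "opensearch-py": make_sync_client,
    "opensearch-py-async": make_async_client,
}

parser = argparse.ArgumentParser()
parser.add_argument("--client-config", choices=sorted(CLIENT_FACTORIES),
                    default="opensearch-py",
                    help="which client and configuration to benchmark")
args = parser.parse_args()
client = CLIENT_FACTORIES[args.client_config]({"hosts": ["localhost:9200"]})
```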

To get a better idea of how we should design this solution, we can work with @navneet1v to thoroughly understand how he constructed his tests in opensearch-project/opensearch-py#446.

navneet1v commented:

@IanHoang As I understand the OpenSearch-Benchmark repo, we use multiprocessing to spin up the clients, but I am not sure what happens after that. It would be interesting to see how load is generated from each of those clients.

navneet1v commented:

> To get a better idea of how we should design this solution, we can work with @navneet1v to thoroughly understand how he constructed his tests in opensearch-project/opensearch-py#446.

Sure let me know if you have any questions.

gkamat (Collaborator) commented Oct 6, 2023

The appropriate solution here is to refactor OSB to behave similarly to YCSB, the de facto benchmarking tool for key-value stores. YCSB has a core performance-measurement component and plugins for the various database implementations; OSB should interface with clients in the same manner. At a higher level, the core of OSB should not be tied to OpenSearch either; it should be capable of operating on alternatives like Datadog, Grafana, Splunk, etc.

To address the need expressed in this issue, instrumentation could be added at the point where OSB calls into the Python HTTP requests library to get some understanding of the overhead, but separating network latency from server overhead will need to utilize the "took" time the server reports. This needs to be looked into further. Another aspect of the ask is to enable OSB to be used as a load-testing tool; there are other issues related to this as well.
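
As a rough illustration of the "took"-based split, a sketch that compares wall-clock round-trip time against the server-reported "took" field in an OpenSearch search response (index name and query are placeholders):

```python
# Sketch: the difference between wall-clock time and the server-reported
# "took" (milliseconds) approximates client + network overhead.
import time
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

start = time.perf_counter()
resp = client.search(index="client-bench", body={"query": {"match_all": {}}})
wall_ms = (time.perf_counter() - start) * 1000

server_ms = resp["took"]  # server-side processing time from the response body
print(f"wall={wall_ms:.1f}ms server={server_ms}ms "
      f"client+network~{wall_ms - server_ms:.1f}ms")
```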

@navneet1v, OSB uses the Thespian library for multiprocessing.
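
For readers unfamiliar with Thespian, a minimal actor sketch; this only illustrates the actor model, not OSB's actual (more involved) load-generation actors:

```python
# Minimal Thespian actor example; not OSB's own worker implementation.
from thespian.actors import Actor, ActorSystem

class EchoWorker(Actor):
    def receiveMessage(self, message, sender):
        # Actors run in separate processes (with a multiprocess system base)
        # and communicate purely by message passing.
        self.send(sender, f"processed: {message}")

if __name__ == "__main__":
    system = ActorSystem("multiprocTCPBase")
    worker = system.createActor(EchoWorker)
    print(system.ask(worker, "bulk-batch-1", timeout=5))
    system.shutdown()
```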

prudhvigodithi (Member) commented Oct 20, 2023

Just a thought: if we consider using opensearch-benchmark for benchmarking clients, then the logic in benchmark should be changed to query the OpenSearch API directly rather than going through the opensearch-py client, so that queries and responses go directly to and from OpenSearch. For the clients themselves, the benchmark should use a universal framework that supports all clients, with calls to OpenSearch routed through the client under test via this framework; that would be the right way to measure and benchmark the clients.

[Diagram: opensearch-benchmark calling OpenSearch directly (top arrow), with the language clients driven through a separate Client Benchmark Framework]
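
To make the "universal framework" idea above concrete, a hedged sketch of a client-adapter interface such a framework might define; every name here is hypothetical and none of this exists in OSB today:

```python
# Hypothetical adapter interface for a client benchmark framework.
import time
from abc import ABC, abstractmethod
from typing import Any, Iterable, Mapping

class ClientAdapter(ABC):
    """Uniform surface the benchmark drives, regardless of language client."""

    @abstractmethod
    def bulk_index(self, index: str, docs: Iterable[Mapping[str, Any]]) -> float:
        """Index docs and return client-side elapsed seconds."""

    @abstractmethod
    def search(self, index: str, query: Mapping[str, Any]) -> float:
        """Run a query and return client-side elapsed seconds."""

class OpenSearchPyAdapter(ClientAdapter):
    """Adapter for the Python client; other clients would be driven identically."""

    def __init__(self, hosts):
        from opensearchpy import OpenSearch, helpers
        self._client = OpenSearch(hosts=hosts)
        self._helpers = helpers

    def bulk_index(self, index, docs):
        start = time.perf_counter()
        self._helpers.bulk(self._client,
                           ({"_index": index, "_source": d} for d in docs))
        return time.perf_counter() - start

    def search(self, index, query):
        start = time.perf_counter()
        self._client.search(index=index, body=query)
        return time.perf_counter() - start
```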

wbeckler commented Mar 4, 2024

In the above diagram, would the current implementation be the top line and arrow between opensearch-benchmark and opensearch? And would all non-Python clients go into the Client Benchmark Framework?

saimedhi (Contributor) commented Mar 5, 2024

Could I kindly request your thoughts on this proposal, @bbarani?
