Add benchmarks for OpenSearch language clients #381
Comments
@dblock Thanks for bringing this up! OpenSearch-Benchmark is heavily reliant on the opensearch-py client, and it'd be useful to know whether there are bottlenecks in the client itself. Based on the solution you described, we could implement an additional parameter that specifies the type of client and the client configuration to use (similar to how users can specify …). To get a better idea of how we should design this solution, we can work with @navneet1v to thoroughly understand how he constructed his tests in opensearch-project/opensearch-py#446.
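For illustration only, such a parameter might be wired up as a small factory. This is a sketch, not existing OSB code: the `client_type` value and the `create_client` helper are hypothetical names, not current OSB options.

```python
# Hypothetical sketch: selecting a client implementation from a
# benchmark-level configuration value. Neither "client_type" nor
# create_client exists in OpenSearch-Benchmark today.
# AsyncOpenSearch requires the async extras: pip install opensearch-py[async]
from opensearchpy import OpenSearch, AsyncOpenSearch

def create_client(client_type, hosts, **client_options):
    """Return a client instance for the configured client type."""
    client_classes = {
        "sync": OpenSearch,        # default synchronous client
        "async": AsyncOpenSearch,  # asyncio-based client
    }
    try:
        client_class = client_classes[client_type]
    except KeyError:
        raise ValueError(f"Unknown client type: {client_type!r}")
    return client_class(hosts=hosts, **client_options)

# Example: client = create_client("sync", ["https://localhost:9200"])
```

A design along these lines would keep the OSB core agnostic about which client implementation actually carries the load.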
@IanHoang As per my understanding of the OpenSearch-Benchmark repo, we use multiprocessing to spin up the clients, but I am not sure what happens after that. It would be interesting to see how the load is generated from each of those clients.
Sure, let me know if you have any questions.
The appropriate solution is to refactor OSB to behave similarly to YCSB, the de facto benchmarking tool for key-value stores. YCSB has a core performance-measurement component and plugins for various database implementations, and OSB should interface with clients in the same manner. At a higher level, the core of OSB should not be tied to OpenSearch either; it should be capable of operating on alternatives like Datadog, Grafana, Splunk, etc.

To address the need expressed in this issue, instrumentation can be added at the point where OSB calls into the Python HTTP requests library to get some understanding of the client overhead, but separating network latency from server overhead will need to use the "took" time the server reports. This needs to be looked into further. Another aspect of the ask is to enable OSB to be used as a load-testing tool; there are other issues related to this as well.

@navneet1v, OSB uses the Thespian library for multiprocessing.
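To make the "took"-based breakdown concrete, here is a minimal sketch (not OSB code; it assumes a cluster reachable at localhost:9200, and the index name and document shape are illustrative) that compares wall-clock time for a bulk request against the server-reported time:

```python
# Minimal sketch: estimate client + network overhead for a bulk request
# by comparing wall-clock time against the server-reported "took" time.
# Assumes a reachable cluster at localhost:9200.
import time
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["http://localhost:9200"])

# Build a small bulk payload: action line followed by document line.
actions = []
for i in range(1000):
    actions.append({"index": {"_index": "bench-test", "_id": str(i)}})
    actions.append({"value": i})

start = time.perf_counter()
response = client.bulk(body=actions)
wall_ms = (time.perf_counter() - start) * 1000

server_ms = response["took"]        # time spent inside OpenSearch, in ms
overhead_ms = wall_ms - server_ms   # client serialization + network + deserialization

print(f"wall={wall_ms:.1f}ms server={server_ms}ms client+network~{overhead_ms:.1f}ms")
```

Note that "took" only covers server-side execution, so the residual still mixes client overhead with network latency; separating those two would need further instrumentation, as mentioned above.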
In the above diagram, would the current implementation be the top line and arrow between opensearch-benchmark and opensearch? And would all non-Python clients go into the Client Benchmark Framework?
Could I kindly request your thoughts on this proposal, @bbarani?
Is your feature request related to a problem? Please describe.
Coming from opensearch-project/opensearch-py#446, we are seeing that at least the Python client may be a bottleneck for data ingestion. We can't fix what we can't measure; the ask is to add and publish daily benchmarks for OpenSearch clients.
Describe the solution you'd like
I would like to understand how different language clients compare to each other in terms of performance, and then have the ability to vary client configurations (e.g. Python has sync and async modes, Ruby has 3.x vs. JRuby, etc.) to compare throughput, CPU, memory, etc. I also want to find the breaking points at which the network becomes the bottleneck (i.e. the maximum amount of data a client can push through). A sketch of one such comparison follows.
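As an illustration of one client-configuration comparison (sync vs. async opensearch-py), the sketch below measures indexing throughput for each mode. The index names, document shape, and concurrency cap are assumptions, and the async path requires `opensearch-py[async]`:

```python
# Illustrative sketch: compare sync vs. async opensearch-py indexing
# throughput against a local cluster. Index names, document shape, and
# the concurrency cap are arbitrary choices for demonstration.
import asyncio
import time
from opensearchpy import OpenSearch, AsyncOpenSearch

DOCS = [{"value": i} for i in range(2000)]

def sync_throughput(host):
    client = OpenSearch(hosts=[host])
    start = time.perf_counter()
    for i, doc in enumerate(DOCS):
        client.index(index="bench-sync", id=str(i), body=doc)
    return len(DOCS) / (time.perf_counter() - start)

async def async_throughput(host, concurrency=16):
    client = AsyncOpenSearch(hosts=[host])
    semaphore = asyncio.Semaphore(concurrency)  # cap in-flight requests

    async def index_one(i, doc):
        async with semaphore:
            await client.index(index="bench-async", id=str(i), body=doc)

    start = time.perf_counter()
    await asyncio.gather(*(index_one(i, d) for i, d in enumerate(DOCS)))
    elapsed = time.perf_counter() - start
    await client.close()
    return len(DOCS) / elapsed

if __name__ == "__main__":
    host = "http://localhost:9200"
    print(f"sync:  {sync_throughput(host):.0f} docs/s")
    print(f"async: {asyncio.run(async_throughput(host)):.0f} docs/s")
```

A published benchmark suite would run variations like this daily, per client and per configuration, and track throughput, CPU, and memory over time.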
Describe alternatives you've considered
We have been isolating and fixing individual problems ad hoc, either without benchmarks or with anecdotal, homegrown, local benchmarks.