[FEATURE] A tool to help decide the optimal batch size for ingestion with neural search processors #655

Closed
chishui opened this issue Mar 28, 2024 · 2 comments


@chishui
Contributor

chishui commented Mar 28, 2024

Is your feature request related to a problem?

In the batch ingestion RFC, we proposed a batch ingestion feature that could accelerate ingestion with neural search processors. It introduces an additional parameter, "batch size", so that texts from different documents can be combined and sent to the ML server in one request. Because users have different data sets and different ML servers with different resources, they would need to experiment with different batch size values to find the one that performs best. To take this burden off users, we'd like to have an automation tool that finds the optimal batch size automatically.

What solution would you like?

The automation tool would run bulk indexing with different batch sizes to see which one leads to the best performance (high throughput, low latency, and no errors). The OpenSearch-benchmark tool already provides rich benchmarking features that we could use for this automation. We can call the benchmark with different parameters, collect and evaluate the results, and then provide a recommendation.
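To make the "run, collect, evaluate, recommend" loop concrete, here is a rough sketch of what the selection step might look like. Everything in it is an assumption for illustration: `RunResult`, `recommend_batch_size`, and the `run_benchmark` callable are hypothetical names, the metrics shape is not OpenSearch-benchmark's actual output, and the ranking rule (highest throughput among error-free runs, with lower latency as the tie-breaker) is just one possible reading of "optimal".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class RunResult:
    """Metrics from one benchmark run (hypothetical shape, for illustration only)."""
    batch_size: int
    throughput_docs_per_s: float
    p90_latency_ms: float
    error_rate: float


def recommend_batch_size(
    run_benchmark: Callable[[int], RunResult],
    candidates: Iterable[int],
    max_error_rate: float = 0.0,
) -> Optional[RunResult]:
    """Run the benchmark once per candidate batch size and pick the best run.

    `run_benchmark` is a placeholder for whatever actually drives the
    benchmark tool; it is injected so the selection logic stays testable.
    """
    results = [run_benchmark(batch_size) for batch_size in candidates]
    # Discard runs that produced more errors than we are willing to accept.
    healthy = [r for r in results if r.error_rate <= max_error_rate]
    if not healthy:
        return None
    # Prefer higher throughput; break ties with lower latency.
    return max(healthy, key=lambda r: (r.throughput_docs_per_s, -r.p90_latency_ms))


def fake_benchmark(batch_size: int) -> RunResult:
    """Stand-in runner with made-up numbers, used only to exercise the logic."""
    table = {
        1: (100.0, 50.0, 0.0),
        2: (180.0, 60.0, 0.0),
        4: (300.0, 80.0, 0.0),
        8: (300.0, 120.0, 0.0),
        16: (350.0, 200.0, 0.05),  # fastest, but the ML server starts failing
    }
    throughput, latency, errors = table[batch_size]
    return RunResult(batch_size, throughput, latency, errors)


if __name__ == "__main__":
    best = recommend_batch_size(fake_benchmark, [1, 2, 4, 8, 16])
    print(best)
```

With the made-up numbers above, batch size 16 is excluded because of its errors, and 4 wins over 8 on latency since their throughput is equal. A real tool would replace `fake_benchmark` with a function that invokes the benchmark and parses its report.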

The tool could also help select the bulk size and the number of clients, which could be supported in a future phase.

What alternatives have you considered?

No alternatives.

Do you have any additional context?

No

@navneet1v
Collaborator

@chishui this is an interesting feature, and +1 on building such a tool. I would love to see more details about this tool added to the issue description (something like an RFC).

@chishui
Contributor Author

chishui commented Apr 2, 2024

Closing this feature request in favor of the one created in the OpenSearch repo, where it can get more attention.

opensearch-project/OpenSearch#13009

@chishui chishui closed this as completed Apr 2, 2024