Support Benchmarking K-NN Plugin and Vectorsearch Workload #103
Comments
@travisbenedict @achitojha I'm working on adding a training component from k-NN to a custom runner (i.e. train-model). I would like a user to be able to specify the body for the train API in a file and parametrize it from the workload-params: (train-body.json)
I know that this works when defining indices, but is there a way for this to work for arbitrary custom JSON file parameters? Docs: https://opensearch.org/docs/latest/search-plugins/knn/api/#train-model
I'm not 100% familiar with your use case, but I think you should be able to define this operation in the operations file for your workload and parameterize that. You can see an example of this with the nyc_taxis workload.
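To make the idea of a parameterized train body concrete, here is a minimal sketch in Python. It is not how OpenSearch Benchmark renders workloads (as far as I understand, OSB uses Jinja2 templates); `string.Template` from the standard library is used here only as a stand-in so the example is self-contained. The template contents, the parameter names (`train_index`, `train_field`, `dimension`, `nlist`), and the helper `render_train_body` are all hypothetical and chosen for illustration.

```python
import json
from string import Template

# Hypothetical contents of a train-body.json template. The $placeholders
# stand in for values that would come from the workload params.
TRAIN_BODY_TEMPLATE = """
{
  "training_index": "$train_index",
  "training_field": "$train_field",
  "dimension": $dimension,
  "method": {
    "name": "ivf",
    "engine": "faiss",
    "space_type": "l2",
    "parameters": {"nlist": $nlist}
  }
}
"""


def render_train_body(template_text: str, params: dict) -> dict:
    """Substitute params into the template and parse the result as JSON.

    substitute() raises KeyError if a placeholder has no matching param,
    which surfaces a missing workload param early.
    """
    rendered = Template(template_text).substitute(params)
    return json.loads(rendered)


# Assumed example values, not taken from any real workload.
workload_params = {
    "train_index": "train-index",
    "train_field": "train-field",
    "dimension": 128,
    "nlist": 16,
}

train_body = render_train_body(TRAIN_BODY_TEMPLATE, workload_params)
print(json.dumps(train_body, indent=2))
```

A custom runner could then send `train_body` as the request body of the train API call.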
Thanks @travisbenedict, that makes sense.
I submitted a PR to add index load tests for k-NN into the repo: opensearch-project/k-NN#364. For now, I think it makes sense to keep them in that repo as they will continue to evolve. Please take a look and let me know what you think.
@travisbenedict I added another PR to add querying functionality to the k-NN runners and param source: opensearch-project/k-NN#409. I am having an issue with getting the recall metric to show up in the results. Would you be able to take a look?
Closing this as OSB now supports the vectorsearch workload.
Is your feature request related to a problem? Please describe.
The k-NN plugin adds support for a new field type, `knn_vector`, which can be thought of as an array of floating point numbers. The plugin then also adds support for running approximate k nearest neighbor search on these fields.
For benchmarking the plugin, we are interested in several metrics including:
Currently, we have our own custom code to get these metrics: https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool. The reason we decided to build our own tool was that we needed some functionality that was not easily available in Rally/OpenSearch Benchmark: ability to compute the recall from a set of queries, integration of our own custom APIs and metrics, using datasets in alternative forms, etc.
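For readers unfamiliar with the recall metric mentioned above, the following is a minimal sketch of one common definition (recall@k averaged over a set of queries). It is not taken from the perf-tool or from OpenSearch Benchmark; the function name `recall_at_k` and the sample IDs are assumptions made for illustration.

```python
from typing import Hashable, Sequence


def recall_at_k(
    retrieved: Sequence[Sequence[Hashable]],
    ground_truth: Sequence[Sequence[Hashable]],
    k: int,
) -> float:
    """Fraction of true top-k neighbors that the search actually returned.

    retrieved[i] holds the document IDs returned for query i (best first);
    ground_truth[i] holds the exact nearest neighbors for the same query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(retrieved) != len(ground_truth):
        raise ValueError("retrieved and ground_truth must have the same length")

    hits = 0
    total = 0
    for got, truth in zip(retrieved, ground_truth):
        truth_set = set(truth[:k])
        # Count how many of the true neighbors appear in the top-k results.
        hits += len(truth_set.intersection(got[:k]))
        total += len(truth_set)
    return hits / total if total else 0.0


# Made-up IDs: query 0 misses one of its three true neighbors,
# query 1 finds all three.
retrieved_ids = [[1, 2, 3], [4, 5, 6]]
true_ids = [[1, 2, 9], [6, 5, 4]]
recall = recall_at_k(retrieved_ids, true_ids, k=3)
print(f"recall@3 = {recall:.3f}")
```

Order within the top k does not matter under this definition; only set membership is compared.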
Describe the solution you'd like
We would prefer to use OpenSearch Benchmark to collect the metrics above, so that we don't have to maintain our own tool and customers don't have to adopt a second tool alongside OpenSearch Benchmark. I saw #98 was created, and I imagine we may need it in order to reach our goal. I would be interested in helping contribute this feature.
Describe alternatives you've considered
The alternative is to continue to use our own benchmarking tool. However, this has the drawbacks mentioned above.