Support Benchmarking K-NN Plugin and Vectorsearch Workload #103

jmazanec15 · 2022-01-12T23:25:07Z

Is your feature request related to a problem? Please describe.

The k-NN plugin adds support for a new field type, knn_vector, which can be thought of as an array of floating point numbers. The plugin also adds support for running approximate k-nearest-neighbor search on these fields.

For benchmarking the plugin, we are interested in several metrics including:

Query latency
Indexing throughput
Refresh time
Recall (the fraction of neighbors returned by an approximate search that are actually among the ground-truth nearest neighbors)
Training latency
Native memory footprint
Disk utilization
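
As an illustration of the recall metric in the list above, here is a minimal sketch in Python. The function names `recall_at_k` and `mean_recall` are hypothetical and are not taken from the k-NN perf-tool or OpenSearch Benchmark. It assumes each query yields a ranked list of returned document IDs and that a ranked list of true neighbor IDs is available from the dataset.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbors that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    # Set intersection so that duplicate IDs in the results are not double counted.
    hits = len(set(retrieved[:k]) & truth)
    return hits / len(truth)


def mean_recall(
    all_retrieved: Sequence[Sequence[int]],
    all_ground_truth: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Average recall@k over a set of queries (0.0 if there are no queries)."""
    scores = [recall_at_k(r, g, k) for r, g in zip(all_retrieved, all_ground_truth)]
    return sum(scores) / len(scores) if scores else 0.0
```

Dividing by the size of the ground-truth set is one common convention. When both lists contain exactly k IDs, it gives the same value as dividing by the number of returned neighbors.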

Currently, we have our own custom code to get these metrics: https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool. The reason we decided to build our own tool was that we needed some functionality that was not easily available in Rally/OpenSearch Benchmark: ability to compute the recall from a set of queries, integration of our own custom APIs and metrics, using datasets in alternative forms, etc.

Describe the solution you'd like

We would prefer to use OpenSearch Benchmark to collect the metrics above, so that we don't have to maintain our own tool and customers don't have to adopt a separate one. I saw that #98 was created, and I imagine we may need it to reach our goal. I would be interested in helping contribute this feature.

Describe alternatives you've considered

The alternative is to continue to use our own benchmarking tool. However, this has the drawbacks mentioned above.


jmazanec15 · 2022-03-30T22:01:17Z

@travisbenedict @achitojha I'm working on adding the training component from k-NN to a custom runner (i.e. train-model).

I would like a user to be able to specify the body for the train API in a file and parametrize it from the workload params:

(train-body.json)

{
    "training_index": "{{ training_index }}",
    "training_field": "{{ training_field }}",
    "dimension": {{ dimension }},
    "max_training_vector_count": {{ max_training_vector_count | default(8) }},
    "search_size": {{ search_size | default(8) }},
    "description": "My model",
    "method": {
        "name":"ivf",
        "engine":"faiss",
        "space_type": "l2",
        "parameters":{
            "nlist": {{ nlists | default(8) }},
            "encoder":{
                "name":"pq",
                "parameters":{
                    "code_size": {{ code_size | default(8) }}
                }
            }
        }
    }
}

I know that this works when defining indices, but is there a way for this to work for arbitrary custom JSON file parameters?

Docs: https://opensearch.org/docs/latest/search-plugins/knn/api/#train-model
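
For illustration, a custom train-model runner could look roughly like the sketch below. This is not the code from the k-NN repository. It assumes the workload's operation supplies `model_id` and `body` in its params (both names are hypothetical) and that the client exposes the opensearch-py style `transport.perform_request` method. The registration call follows the custom-runner convention that OpenSearch Benchmark inherits from Rally.

```python
async def train_model(opensearch, params):
    """Submit a k-NN training request built from the operation's params."""
    model_id = params["model_id"]  # hypothetical param name
    body = params["body"]          # hypothetical param name: the rendered train body

    # Train API endpoint as described in the k-NN docs linked above.
    await opensearch.transport.perform_request(
        "POST",
        f"/_plugins/_knn/models/{model_id}/_train",
        body=body,
    )
    # One training request counts as one operation.
    return {"weight": 1, "unit": "ops"}


def register(registry):
    # Called from the workload's workload.py; exposes the runner under the
    # operation type "train-model".
    registry.register_runner("train-model", train_model, async_runner=True)
```

The train API returns before training completes, so a full runner would probably also poll the model until it leaves the training state. That step is omitted here.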

travisbenedict · 2022-03-31T16:57:13Z

I'm not 100% familiar with your use case, but I think you should be able to define this operation in the operations file for your workload and parameterize that. You can see an example of this in the nyc_taxis workload.

jmazanec15 · 2022-03-31T18:21:31Z

Thanks @travisbenedict, that makes sense.

jmazanec15 · 2022-04-15T18:40:51Z

I submitted a PR to add index load tests for k-NN into the repo: opensearch-project/k-NN#364. For now, I think it makes sense to keep them in that repo as they will continue to evolve. Please take a look and let me know what you think.

jmazanec15 · 2022-05-23T17:05:40Z

@travisbenedict I added another PR to add querying functionality to the k-NN runners and param source: opensearch-project/k-NN#409. I am having an issue getting the recall metric to show up in the results. Would you be able to take a look?

IanHoang · 2024-12-11T19:47:09Z

Closing this as OSB now supports the vectorsearch workload.

jmazanec15 added the enhancement New feature or request label Jan 12, 2022

jmazanec15 mentioned this issue Mar 30, 2022

Move benchmarking to OpenSearch Benchmarks opensearch-project/k-NN#341

Closed

jmazanec15 mentioned this issue May 31, 2022

Make output metrics extendable #199

Open

dblock mentioned this issue Sep 8, 2023

Track and report managed/native memory footprint #369

Open

This was referenced Dec 12, 2023

Add vector search as new operation type to search #423

Merged

Add dataset parser for vector search #424

Merged

This was referenced Dec 20, 2023

Add vector search param source #425

Merged

Add support to parse response if fields or _source are enabled #427

Merged

Release 1.20 ( payload with vector search ) #428

Closed

VijayanB mentioned this issue Dec 28, 2023

Add vector search bulk param and runner #431

Merged


IanHoang changed the title ~~Support benchmarking k-NN plugin~~ Support Benchmarking K-NN Plugin and Vector Workload Apr 10, 2024

IanHoang changed the title ~~Support Benchmarking K-NN Plugin and Vector Workload~~ Support Benchmarking K-NN Plugin and Vectorsearch Workload Apr 10, 2024

dblock mentioned this issue May 3, 2024

OpenSearch Performance Experiments Results opensearch-project/OpenSearch#2461

Closed

github-project-automation bot added this to OpenSearch Benchmark Roadmap Aug 30, 2024

github-project-automation bot moved this to Roadmap Project Backlog in OpenSearch Benchmark Roadmap Aug 30, 2024

gkamat added the Medium Priority label Sep 13, 2024

IanHoang closed this as completed Dec 11, 2024
