Add vector search workload with no train procedure as default #144

Merged: 6 commits, Jan 19, 2024

VijayanB
Member

@VijayanB VijayanB commented Nov 22, 2023

Description

Add a vector search workload to benchmark indexing and search performance using knn_vector as the field type.

Issues Resolved

#140

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@VijayanB VijayanB force-pushed the add-knn-vector-workload branch 3 times, most recently from 4cbd141 to d40f7d4 on November 22, 2023 23:52
@VijayanB
Member Author

This PR contains the benchmark workload that was previously added to the knn repository. It includes only the indexing and search components. Other features, such as training, will be added in a subsequent PR.

@VijayanB
Member Author

VijayanB commented Nov 23, 2023

@rishabh6788 To run this workload, we depend on libraries such as numpy and h5py. Should these dependencies be added to this workload or to the opensearch-benchmark repository? Keeping them in this repository would be good, provided that the requirements are also installed when this repository is checked out.
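
To make the dependency question concrete, here is a minimal sketch that reports which of the modules named above (numpy, h5py) cannot be imported in the current environment. The helper name `missing_modules` is hypothetical; it uses only the standard library and is not part of this workload or of opensearch-benchmark.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # numpy and h5py are the dependencies named in the comment above.
    missing = missing_modules(["numpy", "h5py"])
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

A check like this could run before the workload starts, so that a missing library surfaces as a clear message instead of an import error partway through a run.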

@VijayanB VijayanB marked this pull request as draft November 23, 2023 00:06
@VijayanB VijayanB force-pushed the add-knn-vector-workload branch 2 times, most recently from 116f071 to 73425c6 on November 23, 2023 00:11
knnvector/README.md (outdated, resolved)
knnvector/runners.py (outdated, resolved)
knnvector/runners.py (outdated, resolved)
knnvector/README.md (outdated, resolved)
@VijayanB
Member Author

VijayanB commented Nov 30, 2023

[INFO] Executing test with workload [knnvector], test_procedure [no-train-test] and provision_config_instance ['external'] with version [2.10.0].

[WARNING] indexing_total_time is 568 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 517 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 11 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-target-index                                                    [100% done]
Running create-target-index                                                    [100% done]
Running wait-for-cluster-to-be-green                                           [100% done]
Running custom-vector-bulk                                                     [100% done]
Running force-merge-segments                                                   [100% done]
Running refresh-target-index                                                   [100% done]
Running warmup-indices                                                         [100% done]
Running prod-queries                                                           [100% done]

(venv) ➜  opensearch-benchmark git:(main) ✗ curl http://localhost:9200/_cat/indices\?v
health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   target_index              OZ6FQEjHQhafoE8CTFvVPA   3   1     100000            0    146.6mb        146.6mb
green  open   .plugins-ml-config        fgiAUaxuQPa7ShZFMnw6Vg   1   0          1            0      3.9kb          3.9kb
green  open   .opensearch-observability k7ccOjOhRtaS-Y9Yx1kdmA   1   0          0            0       208b           208b

When using the default number of segments:

(venv) ➜  opensearch-benchmark git:(main) ✗ curl http://localhost:9200/_cat/segments\?v
index              shard prirep ip         segment generation docs.count docs.deleted   size size.memory committed searchable version compound
target_index       0     p      172.17.0.2 _5               5      33162            0 48.6mb           0 true      true       9.7.0   false
target_index       1     p      172.17.0.2 _4               4      33324            0 48.9mb           0 true      true       9.7.0   false
target_index       2     p      172.17.0.2 _3               3      33514            0 49.1mb           0 true      true       9.7.0   false
.plugins-ml-config 0     p      172.17.0.2 _0               0          1            0  3.6kb           0 true      true       9.7.0   true
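
As a quick consistency check on the output above, the per-shard docs.count values can be summed and compared with the docs.count reported by _cat/indices. The sketch below does this over a copy of the pasted table; the function name `docs_per_index` is hypothetical and the parsing assumes whitespace-separated columns exactly as shown.

```python
from collections import defaultdict

# Rows copied from the _cat/segments output above.
SEGMENTS = """\
index              shard prirep ip         segment generation docs.count docs.deleted   size size.memory committed searchable version compound
target_index       0     p      172.17.0.2 _5               5      33162            0 48.6mb           0 true      true       9.7.0   false
target_index       1     p      172.17.0.2 _4               4      33324            0 48.9mb           0 true      true       9.7.0   false
target_index       2     p      172.17.0.2 _3               3      33514            0 49.1mb           0 true      true       9.7.0   false
.plugins-ml-config 0     p      172.17.0.2 _0               0          1            0  3.6kb           0 true      true       9.7.0   true
"""


def docs_per_index(cat_output):
    """Sum docs.count over primary-shard segments for each index."""
    lines = cat_output.strip().splitlines()
    header = lines[0].split()
    totals = defaultdict(int)
    for line in lines[1:]:
        # Pair each column name with the value in the same position.
        row = dict(zip(header, line.split()))
        if row["prirep"] == "p":
            totals[row["index"]] += int(row["docs.count"])
    return dict(totals)


print(docs_per_index(SEGMENTS))
```

For target_index the sum is 33162 + 33324 + 33514 = 100000, which matches the docs.count shown by _cat/indices.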

(venv) ➜  opensearch-benchmark git:(main) ✗ curl http://localhost:9200/_plugins/_knn/stats\?pretty              
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "docker-cluster",
  "circuit_breaker_triggered" : false,
  "model_index_status" : null,
  "nodes" : {
    "PrEm8YY6QQ6CNN9HkMUOJQ" : {
      "graph_memory_usage_percentage" : 1.8404406,
      "graph_query_requests" : 960000,
      "graph_memory_usage" : 65308,
      "cache_capacity_reached" : false,
      "load_success_count" : 96,
      "training_memory_usage" : 0,
      "indices_in_cache" : {
        "target_index" : {
          "graph_memory_usage_percentage" : 1.8404406,
          "graph_memory_usage" : 65308,
          "graph_count" : 3
        }
      },
      "script_query_errors" : 0,
      "hit_count" : 960000,
      "knn_query_requests" : 120000,
      "total_load_time" : 224553968,
      "miss_count" : 96,
      "knn_query_with_filter_requests" : 0,
      "training_memory_usage_percentage" : 0.0,
      "lucene_initialized" : false,
      "graph_index_requests" : 108,
      "faiss_initialized" : false,
      "load_exception_count" : 0,
      "training_errors" : 0,
      "eviction_count" : 0,
      "nmslib_initialized" : true,
      "script_compilations" : 0,
      "script_query_requests" : 0,
      "graph_query_errors" : 0,
      "indexing_from_model_degraded" : false,
      "graph_index_errors" : 0,
      "training_requests" : 0,
      "script_compilation_errors" : 0
    }
  }
}
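
The hit_count and miss_count fields in the response above can be turned into a single cache hit ratio. The following is a minimal sketch over a hand-copied subset of that response; the function name `cache_hit_ratio` is hypothetical, and treating these two counters as cache hits and misses for loaded graphs is an assumption about what the fields mean.

```python
import json

# Subset of the node-level fields from the stats response above.
STATS = json.loads("""
{
  "nodes": {
    "PrEm8YY6QQ6CNN9HkMUOJQ": {
      "hit_count": 960000,
      "miss_count": 96,
      "load_success_count": 96,
      "eviction_count": 0
    }
  }
}
""")


def cache_hit_ratio(node_stats):
    """Fraction of lookups served from cache: hits / (hits + misses)."""
    hits = node_stats["hit_count"]
    misses = node_stats["miss_count"]
    total = hits + misses
    # Guard against division by zero on a node that has served no queries.
    return hits / total if total else 0.0


for node_id, node_stats in STATS["nodes"].items():
    print(f"{node_id}: hit ratio {cache_hit_ratio(node_stats):.6f}")
```

With 960000 hits and 96 misses, the ratio is very close to 1. The 96 misses equal load_success_count and eviction_count is 0, which would be consistent with each graph load staying in cache for the rest of the run.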

@VijayanB VijayanB force-pushed the add-knn-vector-workload branch from 73425c6 to 0534761 on November 30, 2023 21:39
@VijayanB VijayanB marked this pull request as ready for review November 30, 2023 21:49
knnvector/params_sources.py (outdated, resolved)
knnvector/params_sources.py (outdated, resolved)
knnvector/runners.py (outdated, resolved)
knnvector/runners.py (outdated, resolved)
knnvector/test_procedures/default.json (outdated, resolved)
@VijayanB VijayanB force-pushed the add-knn-vector-workload branch 3 times, most recently from 33a70b9 to 6f40d5f on December 6, 2023 23:24
@VijayanB VijayanB requested a review from jmazanec15 December 7, 2023 04:35
Member

@jmazanec15 jmazanec15 left a comment

LGTM!

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
@VijayanB VijayanB force-pushed the add-knn-vector-workload branch from 6bc2022 to 48b20ff on January 9, 2024 20:01
@IanHoang
Collaborator

IanHoang commented Jan 9, 2024

There is no coupling between the params file name and the workload. I named it in such a way that it gives a hint about the selected param values.

@VijayanB I understand now. To clarify this, could you add a section in the README stating that the files faiss-sift-128-l2 and nmslib-sift-128-l2 are sample params files that can be used with the workload, and that they can serve as a reference for users who want to write their own custom params file?
EDIT: I see that it has been added now 👍🏻
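
To illustrate the naming hint discussed above, here is a small sketch that splits a name such as faiss-sift-128-l2 into its hyphen-separated parts. Reading the four parts as engine, dataset, dimension, and space type is an assumption, and `ParamsHint` and `parse_params_name` are hypothetical names. As noted in the quoted comment, the workload itself does not depend on the file name, so this is purely a reading aid.

```python
from typing import NamedTuple


class ParamsHint(NamedTuple):
    """Hints assumed to be encoded in a sample params file name."""
    engine: str
    dataset: str
    dimension: int
    space_type: str


def parse_params_name(name):
    """Split a name like 'faiss-sift-128-l2' into its four hyphen-separated hints."""
    engine, dataset, dimension, space_type = name.split("-")
    return ParamsHint(engine, dataset, int(dimension), space_type)


for name in ("faiss-sift-128-l2", "nmslib-sift-128-l2"):
    print(parse_params_name(name))
```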

@VijayanB
Member Author

@IanHoang Are there any pending comments that need to be addressed?

@VijayanB
Member Author

@gkamat can you take a look at this PR? Thanks

vectorsearch/README.md (outdated, resolved)
vectorsearch/README.md (outdated, resolved)
Comment on lines 17 to 18
Currently, we support one test procedures for the vector search workload:
no-train-test that does not have steps to train a model included in the
Collaborator

support only one test procedure for the vector search workload. This is named no-train-test and does not include the steps required to train the model being used.

Please indicate how the training steps are supposed to be carried out. Or if the expectation is that the workload is to be run on an untrained system, please clarify.

Member Author

@VijayanB VijayanB Jan 17, 2024

@gkamat I will update the text, and in the future I will add a new procedure that can use a model. Do you recommend mentioning this future work in the README?

Collaborator

Yes, that would be ideal. Subsequently, you can update the writeup when the new procedure gets added.

vectorsearch/README.md (outdated, resolved)
vectorsearch/README.md (outdated, resolved)
vectorsearch/runners.py (outdated, resolved)
vectorsearch/runners.py (resolved)
vectorsearch/test_procedures/default.json (outdated, resolved)
vectorsearch/test_procedures/default.json (outdated, resolved)
vectorsearch/workload.json (outdated, resolved)
@IanHoang IanHoang added the backport 1 and backport 2 labels Jan 17, 2024
@VijayanB VijayanB force-pushed the add-knn-vector-workload branch from 371fc9c to 026ecc5 on January 18, 2024 00:05
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
@VijayanB VijayanB force-pushed the add-knn-vector-workload branch from 026ecc5 to 6835e95 on January 18, 2024 00:39
@VijayanB VijayanB requested a review from gkamat January 18, 2024 00:41
@gkamat
Collaborator

gkamat commented Jan 18, 2024

Please confirm this is intended for backport to both the 1 and 2 branches.

Collaborator

@gkamat gkamat left a comment

Please confirm the backport labels are set correctly before merging. Thanks.

@VijayanB
Member Author

@gkamat Yes, this is supported for both OpenSearch 1.x and 2.x.

@IanHoang IanHoang merged commit bdbd4bb into opensearch-project:main Jan 19, 2024
3 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 19, 2024
* Add knnvector as new workload

Create new workload to benchmark performance of knn_vector
field type.
Added unit test and procedure for notrain.

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>

* Update README

Update readme to include how to execute this workload.

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>

* Add new param file for faiss engine

Added new param file to index/search vector search using
faiss as engine type

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>

* Rename knnvector to vectorsearch

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>

* Add lucene engine

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>

* fix code review comments

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>

---------

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
(cherry picked from commit bdbd4bb)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 19, 2024
IanHoang pushed a commit that referenced this pull request Jan 19, 2024
…156)

(cherry picked from commit bdbd4bb)

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
IanHoang pushed a commit that referenced this pull request Jan 19, 2024
…157)

(cherry picked from commit bdbd4bb)

Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@gkamat gkamat added the backport 3 label Jan 31, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 31, 2024
gkamat pushed a commit that referenced this pull request Jan 31, 2024
harshavamsi pushed a commit to harshavamsi/opensearch-benchmark-workloads that referenced this pull request Mar 5, 2024
…arch-project#144)

Labels
backport 1, backport 2, backport 3
6 participants