Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TrainKNN Runner/Operation for Benchmarking Approximate KNN Algorithms #556

Merged
merged 9 commits into from
Jul 18, 2024

Conversation

finnroblin
Copy link
Contributor

Description

Customers may want to benchmark approximate k-NN search algorithms that require a training step. For example, the k-NN plugin with the FAISS engine and IVF method requires a training step to cluster database vectors. Then search can be performed against a smaller number of cluster centroids instead of the entire database.

Benchmarking these algorithms requires a new operation-type to train the model. This PR adds an operation that calls the k-NN Train Model API to train a model and then polls the Get Model API until model training succeeds or fails.

Issues Resolved

This PR is a task towards resolving Issue 332 opened in opensearch-benchmarks-workloads to add a train model benchmark workload. Please see this issue for more context about why this feature is important for improving benchmarking workflow.

Testing

  • New functionality includes unit tests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Finn Roblin <finnrobl@amazon.com>
@@ -651,6 +651,72 @@ def error_description(self, error_details):
def __repr__(self, *args, **kwargs):
return "bulk-index"

class TrainKNNModel(Runner):
"""
Trains model named model_id until training is complete or timeout is exceeded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max retry is reached

model_response = await opensearch.transport.perform_request("GET", model_uri)

if 'state' not in model_response.keys() or current_number_retries > max_retries:
request_context_holder.on_client_request_end()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request_context_holder.on_client_request_start() is missing

while True:
model_response = await opensearch.transport.perform_request("GET", model_uri)

if 'state' not in model_response.keys() or current_number_retries > max_retries:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think you can combine both the condition. Can you please break it into two condition and raise appropriate exception message?

Comment on lines 711 to 716
def inspect_model_response(model_response):
if model_response['state'] == 'created':
return 1, "models_trained"

if model_response['state'] == 'failed':
raise Exception("Failed to create model: {}".format(model_response))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

@@ -746,6 +747,8 @@ def from_hyphenated_string(cls, v):
return OperationType.RegisterMlModel
elif v == "deploy-ml-model":
return OperationType.DeployMlModel
elif v == "train-k-n-n-model":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif v == "train-k-n-n-model":
elif v == "train-knn-model":

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this by changing the name to TrainKnnModel. It seems like hyphens are automatically placed between every capitalized letter.

Signed-off-by: Finn Roblin <finnrobl@amazon.com>
if model_response['state'] == 'created':
self.logger.info(
"Training model [%s] was completed successfully.", model_id)
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: return instead of break

break

# training failed.
self.logger.error(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we should add model_response['state'] == 'failed' to validate failure condition, and, raise exception if state contains unexpected value. This will help us in case knn decided to add new status

Signed-off-by: Finn Roblin <finnrobl@amazon.com>
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not mean to include this file, I will discard on the next commit.

Signed-off-by: Finn Roblin <finnrobl@amazon.com>
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
…xception

Signed-off-by: Finn Roblin <finnrobl@amazon.com>
@finnroblin finnroblin requested a review from VijayanB June 26, 2024 21:30

async def __call__(self, opensearch, params):
model_id = parse_string_parameter("model_id", params)
should_ignore_if_model_DNE = params.get("ignore-if-model-does-not-exist", False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
should_ignore_if_model_DNE = params.get("ignore-if-model-does-not-exist", False)
ignore_if_model_does_not_exist = params.get("ignore-if-model-does-not-exist", False)

NAME = "delete-knn-model"

async def __call__(self, opensearch, params):
model_id = parse_string_parameter("model_id", params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you check can we use mandatory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these helpers enforce mandatory if they don't have a third argument. So if model_id is not provided parse_string_parameter will throw an error.

and response["status"] == 404
and not should_ignore_if_model_DNE
):
self.logger.error(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be warning


if (
"error" in response.keys()
and response["status"] == 404
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: create constant for 404


request_context_holder.on_client_request_end()

if (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can flip if condition with if response is success return success, and, check for error condition one by one.

Signed-off-by: Finn Roblin <finnrobl@amazon.com>
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
@finnroblin finnroblin requested a review from VijayanB June 27, 2024 23:22
@VijayanB
Copy link
Member

VijayanB commented Jul 10, 2024

@finnroblin Looks good to me.

@IanHoang IanHoang merged commit 01346bb into opensearch-project:main Jul 18, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants