[FEATURE] Add option to provision synchronously #967
Comments
Thanks @dbwiddis, this approach will help simplify provisioning and automation of OpenSearch resources with minimal client-side code. A few follow-up questions:
Probably the standard OpenSearch default timeout for Rest Requests. We can handle timeout any way we want: cancelling the futures of a workflow in progress will probably suffice. Note that some workflow steps in progress may continue even after a cancellation but the overall workflow would stop executing.
Yes.
Sounds reasonable to provide the same return value as workflow status API.
Can we rollback partially provisioned resources in case of failure?
The deprovision API will do that. We have not yet added an auto-rollback capability, which would be equally appropriate for a failed async provision. Also, regarding cancellation: if we tried an immediate rollback, it might not catch all the in-progress resources. For example, say we are registering and deploying a local model and then creating an agent. Assume registering completes successfully but the deploy step times out because it's a very large model. Registration would create the model resource. Upon failure (the timeout), all the futures would be cancelled, meaning the agent step would never run. However, the model deployment would probably complete eventually. If we tried to deprovision immediately, we'd only see the registered model. (I'm not sure what happens if we try to delete a model that is in the process of deploying.) If we wait for the step to complete, we might have it deployed. In that case you'd have both the register and deploy "resources" and you could successfully deprovision with an undeploy/delete. This is just one simple example; it can get more complex, which is why we haven't gotten to it yet.
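To make the timing problem concrete, here is a minimal sketch of the register/deploy/agent example using Python's `concurrent.futures`. It is not the plugin's implementation; the step names, the single-worker executor, and the `provision_with_timeout` helper are all assumptions chosen for illustration. It shows that cancelling on timeout stops queued steps but not a step already running.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def provision_with_timeout(steps, timeout):
    """Run (name, callable) steps in order; on timeout, cancel what can be cancelled.

    Hypothetical helper: returns (cancel results, resources visible at the
    timeout, resources that exist once in-flight work has finished).
    """
    created = []  # names of steps that finished, i.e. resources that now exist

    def run_step(name, action):
        action()
        created.append(name)

    # A single worker makes each step queue behind the previous one.
    executor = ThreadPoolExecutor(max_workers=1)
    futures = {name: executor.submit(run_step, name, action) for name, action in steps}

    _, not_done = wait(list(futures.values()), timeout=timeout)
    # Future.cancel() returns False for a step that is already running.
    cancelled = {name: f.cancel() for name, f in futures.items() if f in not_done}
    visible_at_timeout = list(created)

    executor.shutdown(wait=True)  # let the in-flight step run to completion
    return cancelled, visible_at_timeout, list(created)


steps = [
    ("register_model", lambda: None),
    ("deploy_model", lambda: time.sleep(0.5)),  # stands in for a very large model
    ("create_agent", lambda: None),
]
cancelled, at_timeout, eventually = provision_with_timeout(steps, timeout=0.1)
```

With these timings, an immediate rollback would see only the registered model in `at_timeout`, while `eventually` also contains the deployment that could not be cancelled. The agent step is cancelled before it starts, so it never creates anything.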
[Catch All Triage - 1, 2, 3, 4]
Is your feature request related to a problem?
Presently, when provisioning a workflow (via either the provision API or the create API with the provision param), the REST call returns immediately with a 200 (OK) response, but the caller must then poll the Workflow Status API to monitor the status of provisioning.
This asynchronous execution of provisioning was intentional to provide the ability for a front end to obtain status throughout provisioning, possibly including a progress bar or similar, and because some provisioning processes take longer than the expected time for a REST response.
However, there are some use cases where the user may be willing to wait for a completed response, and not have to poll. This would be particularly useful in cases similar to the ML Commons Remote Model deployment which provides such a synchronous API.
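For reference, the polling that a caller has to write today looks roughly like the sketch below. The status lookup is passed in as a callable rather than hard-coding a URL, and the `state` field and the `COMPLETED` / `FAILED` values are assumptions about the shape of the Workflow Status API response, not a definitive client.

```python
import time

# Assumed terminal values of the status response's "state" field.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_provisioning(get_status, timeout=30.0, interval=1.0):
    """Poll get_status() until a terminal state is reported or the timeout expires.

    get_status is any zero-argument callable returning a dict with a "state" key,
    for example a thin wrapper around a GET on the Workflow Status API.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status.get('state')} after {timeout}s")
        time.sleep(interval)


# Usage with canned responses standing in for real HTTP calls.
responses = iter([{"state": "PROVISIONING"}, {"state": "PROVISIONING"}, {"state": "COMPLETED"}])
final_status = wait_for_provisioning(lambda: next(responses), timeout=5.0, interval=0.0)
```

Every automation tool that provisions a workflow needs some version of this loop, which is the boilerplate this request aims to remove.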
What solution would you like?
Add optional parameters to the create and provision workflow APIs to wait for the request to complete, with a timeout. Other OpenSearch APIs use wait_for_completion and wait_for_completion_timeout, so I'd suggest these names.
Alter the Provision Workflow Transport action, when this parameter is present, to wait to return until provisioning is complete (or the timeout).
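The intended behaviour could be sketched as follows. This is a Python model of the semantics only, not the Java transport action; the `provision` function, the response dictionaries, and the `timed_out` flag are hypothetical, and only the two parameter names come from the proposal above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=2)


def provision(run_workflow, wait_for_completion=False, wait_for_completion_timeout=1.0):
    """Start run_workflow in the background and optionally wait for it.

    Without wait_for_completion this returns immediately, as provisioning does
    today. With it, the call blocks until the workflow finishes or the timeout
    (in seconds) elapses, then reports a status-like dict.
    """
    future = _executor.submit(run_workflow)
    if not wait_for_completion:
        return {"state": "PROVISIONING"}
    try:
        future.result(timeout=wait_for_completion_timeout)
    except FutureTimeout:
        # The workflow keeps running; the caller can fall back to polling.
        return {"state": "PROVISIONING", "timed_out": True}
    except Exception as exc:
        return {"state": "FAILED", "error": str(exc)}
    return {"state": "COMPLETED"}


async_result = provision(lambda: time.sleep(0.2))
sync_result = provision(lambda: None, wait_for_completion=True)
slow_result = provision(lambda: time.sleep(0.5), wait_for_completion=True,
                        wait_for_completion_timeout=0.05)
```

Returning a status-shaped result in the timeout case means the caller could fall back to the existing polling flow without special handling.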
What alternatives have you considered?
A separate wrapper API that does the retries internally.
Do you have any additional context?
This would be a much simpler approach for automation tools, which would no longer need to code all the polling themselves.