Make some long-running operations execute asynchronously and be trackable by the _tasks API #6228

Closed
gaobinlong opened this issue Feb 8, 2023 · 2 comments
Labels
discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request feature New feature or request

Comments

@gaobinlong
Collaborator

Is your feature request related to a problem? Please describe.
Currently, only the reindex, update_by_query, and delete_by_query operations can be tracked by the _tasks API. Other long-running operations such as shrink/split/clone/open/forcemerge cannot, so users have to wait until the operation completes when they call, for example, the shrink or split API. The problem is that we don't have a uniform way to check the status of all these long-running operations, which makes them hard to monitor and manage. If we want to notify users when these operations complete, we have to check the status with a different method for each operation, which is tricky.

Describe the solution you'd like
I think we can manage all these long-running operations in a uniform way. Similar to the reindex API, we can add a wait_for_completion request parameter to the shrink/split/clone/open/forcemerge APIs. When the parameter is set to false, the API returns a taskId that can be used to check the status of the operation, and the task's result is recorded in the .tasks index. Additionally, some operations may fail because of the settings or the performance of the cluster, so we can add another parameter, such as task_execution_timeout, that sets the task's status to failed when the timeout expires. This reduces resource cost, because we have to launch a thread for each operation to check its status. For example, if a shrink operation starts and some primary shards of the new shrunken index cannot be allocated because of a hardware failure, then after 1h or 2h we set the task's status to failed and unregister the task.
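To make the proposed flow concrete, here is a minimal client-side sketch in Python. It assumes the new APIs would mirror reindex: submitting with wait_for_completion=false returns a body like {"task": "<nodeId>:<id>"}, and GET _tasks/<taskId> returns a document with a "completed" flag. The helper names (async_request_path, extract_task_id, wait_for_task), the base URL, and the polling defaults are made up for illustration and are not part of OpenSearch.

```python
import json
import time
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured cluster


def async_request_path(index, action, target=None):
    """Build the path for an operation submitted with wait_for_completion=false."""
    path = f"/{quote(index)}/{action}"
    if target is not None:  # shrink/split/clone take a target index
        path += f"/{quote(target)}"
    return path + "?" + urlencode({"wait_for_completion": "false"})


def extract_task_id(response):
    """Assumes the same response shape as reindex: {"task": "<nodeId>:<id>"}."""
    return response["task"]


def http_json(method, path, base_url=BASE_URL):
    """Send a request without a body and decode the JSON response."""
    request = urllib.request.Request(base_url + path, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(task_id, send=http_json, poll_seconds=5.0,
                  timeout_seconds=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll GET _tasks/<task_id> until the task reports completed, or give up."""
    deadline = clock() + timeout_seconds
    while True:
        status = send("GET", f"/_tasks/{task_id}")
        if status.get("completed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not complete in time")
        sleep(poll_seconds)


# Hypothetical usage against a running cluster (not executed here):
#   reply = http_json("POST", async_request_path("my-index", "_forcemerge"))
#   result = wait_for_task(extract_task_id(reply))
```

Note that the timeout here is enforced by the client only. The task_execution_timeout parameter suggested above would be enforced by the server instead, and its exact behavior is still open for discussion.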

We get at least two benefits if we implement this function:

  1. We will have a uniform way to check the progress of all the long-running operations, including reindex, shrink, split, and force merge. It becomes easy to monitor an operation's progress and alert the operator if the operation fails or takes too long to execute. In the long term, OpenSearch-Dashboards will add a Task Management page that lists all running and completed tasks; if we implement this function, the frontend node server only needs to call the _tasks API to get all the running tasks.
  2. Because the results of all these long-running operations are recorded in the .tasks index, we can monitor the .tasks index and send a notification to the user when a long-running operation completes or fails.
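The second benefit could look roughly like the sketch below. It assumes that documents in the .tasks index carry a "completed" flag, a "task" object with an "action" field, and either a "response" or an "error" field, as stored reindex results do today. The function names and the message format are invented for illustration, and fetching the documents from the index is left out.

```python
def summarize_task_result(doc):
    """Classify one .tasks document as running, failed, or succeeded."""
    if not doc.get("completed", False):
        state = "running"
    elif "error" in doc:
        state = "failed"
    else:
        state = "succeeded"
    action = doc.get("task", {}).get("action", "unknown")
    return {"action": action, "state": state}


def pending_notifications(docs, already_notified):
    """Return one message per finished task that has not been reported yet.

    docs maps a task ID to its .tasks document; already_notified is a set of
    task IDs that is updated in place so repeated scans do not notify twice.
    """
    messages = []
    for task_id, doc in docs.items():
        if task_id in already_notified:
            continue
        summary = summarize_task_result(doc)
        if summary["state"] == "running":
            continue
        already_notified.add(task_id)
        messages.append(f"{summary['action']} task {task_id} {summary['state']}")
    return messages
```

A periodic job could call pending_notifications with the latest documents and hand the returned messages to whatever notification channel is configured.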

Describe alternatives you've considered
We have to use a different method for each long-running operation to check its progress: for reindex, we use the _tasks/{taskId} API to check the status of the operation; for shrink/split, we use the _recovery API to check the status of all the shards of the new shrunken or split index; for force merge, we currently have no method to check its progress.
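The per-operation differences described above can be summarized as a small lookup, sketched here in Python. The function name and its arguments are hypothetical; the endpoints are the ones named in this issue.

```python
def status_endpoint(operation, task_id=None, target_index=None):
    """Return the path used today to check progress, or None if there is none."""
    if operation == "reindex":
        # Reindex is already tracked by the _tasks API.
        return f"/_tasks/{task_id}"
    if operation in ("shrink", "split"):
        # Progress is inferred from shard recovery on the new target index.
        return f"/{target_index}/_recovery"
    if operation == "forcemerge":
        # No way to check progress at the moment.
        return None
    raise ValueError(f"unsupported operation: {operation}")
```

With the proposed change, every branch would collapse into the first one, since each operation would return a task ID.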

Additional context
This issue originates from #5479. The original idea was to have a generic solution for sending notifications to users when a long-running operation completes. After much internal discussion, we think we should first make all the long-running operations trackable by the _tasks API; then we will have a uniform way to get every operation's result and send notifications. As mentioned above, this function will also benefit future features such as the Task Management page in OpenSearch-Dashboards.

@gaobinlong gaobinlong added enhancement Enhancement or improvement to existing feature or request untriaged labels Feb 8, 2023
@andrross
Member

andrross commented Feb 8, 2023

I think it makes sense to have an asynchronous option for any potentially long running operation, and it also makes sense to have a consistent async experience across APIs. Therefore using the existing _tasks API to bring shrink/split/clone/open/forcemerge in line with the reindex API seems like the right thing to do. I would love to hear if anybody has good reasons not to do this as it seems like an obvious win to me.

@dbwiddis dbwiddis added the discuss Issues intended to help drive brainstorming and decision making label Feb 10, 2023
@Poojita-Raj Poojita-Raj added feature New feature or request and removed untriaged labels Feb 21, 2023
gaobinlong added a commit to gaobinlong/OpenSearch that referenced this issue Feb 22, 2023
andrross pushed a commit that referenced this issue Mar 27, 2023
#6434)

* Add wait_for_completion parameter to resize&open&forcemerge APIs (#6228)

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify changelog

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* change header of new file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* modify changelog

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
gaobinlong added a commit to gaobinlong/OpenSearch that referenced this issue Mar 28, 2023
opensearch-project#6434)

(cherry picked from commit 3fec567)

Modify the yaml test file

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
andrross added a commit that referenced this issue Mar 29, 2023
… forcemerge APIs (#6855)

* Add wait_for_completion parameter to resize, open, and forcemerge APIs (#6434)

* Modify package name

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
@gaobinlong
Collaborator Author

Closing this issue as the related PR has been merged. @andrross, thanks for your support, I really appreciate it.

mitrofmep pushed a commit to mitrofmep/OpenSearch that referenced this issue Apr 5, 2023
opensearch-project#6434)

Signed-off-by: Valentin Mitrofanov <mitrofmep@gmail.com>