-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Link child requests to the root task that initiated the action #48422
Conversation
This commit modifies the hierarchy of tasks to link any child requests executed by a transport action to the root task that initiated the request. For instance today a shard search is linked to the main search task that coordinate the entire request. However if the main search task was created by a multi search action, multiple roundtrips are needed to link the multi search to the shard searches that it creates (by recursively checking the parent of the tasks). This is not convenient when cancelling a task since it requires to send the cancellation for the root task first, then to all nodes to cancel the child task and recursively until there is no more child tasks. Currently we only handle one level so multi _search cancellation cannot reach the grand-children (the shard search). The proposal in this change is to always link child tasks to their root task (the task that initiated the original action). While the change is simple it might be considered as breaking since we loose the association between children and grand-children so it would not be possible to cancel an individual search without cancelling the entire msearch action. I think it's an acceptable trade-off mainly because other solutions that would not break this behavior would require much more changes. We could for instance keep the full list of parents for a specific task but that would require to change all the call to execute a task locally. Another solution would be to add more roundtrips to task cancellations but that's also a big change that would complicate cancellation significantly.
Pinging @elastic/es-search (:Search/Search) |
Pinging @elastic/es-distributed (:Distributed/Task Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just goes to show that the tasks API is not the right tool for what you would like to build here. More discussion about this here: #47378
We discussed another route offline, for reindex there are discussions around a new mechanism to allow cancelling only the root task. For msearch we should use a header and add the ability to cancel a task from its |
This commit modifies the hierarchy of tasks to link any child requests executed
by a transport action to the root task that initiated the request. For instance today
a shard search is linked to the main search task that coordinate the entire request.
However if the main search task was created by a multi search action, multiple roundtrips
are needed to link the multi search to the shard searches that it creates (by recursively checking the parent of the tasks).
This is not convenient when cancelling a task since it requires to send the cancellation for the root task first, then to all nodes to cancel the child task and recursively until there is no more child tasks. Currently we only handle one level so multi_search cancellation cannot reach the grand-children (the shard search).
The proposal in this change is to always link child tasks to their root task (the task that initiated the original action). While the change is simple it might be considered as breaking since we loose the association between children and grand-children so it would not be possible to cancel an individual search without cancelling the entire msearch action.
I think it's an acceptable trade-off mainly because other solutions that would not break this behavior would require much more changes.
We could for instance keep the full list of parents for a specific task but that would require to change all the call to execute a task locally.
Another solution would be to add more roundtrips to task cancellations but that's also a big change that would complicate cancellation significantly.
I didn't change the documentation around cancellation yet since this change might be controversial. Let's discuss it here and we can add documentation if we reach a consensus.