Is your feature request related to a problem? Please describe
Currently, when broadcasting a cluster-level event, TransportNodesAction can be used to send requests to multiple nodes in the cluster. When a target node receives the request, it returns a BaseNodeResponse from its overridden nodeOperation method.
The drawback is that this method is synchronous. If the node needs an async operation, such as querying an index or calling a remote service, there is no easy way to return the response when the async operation completes. For example, code that queries an index and then does something else looks like this:
@Override
public MyBaseNodeResponse nodeOperation(NodeRequest request) {
    // build the getRequest
    // get() returns immediately; responseListener (an ActionListener<GetResponse>) is invoked later
    nodeClient.get(getRequest, responseListener);
    // do something else
    return new MyBaseNodeResponse();
}
In the snippet above, the query may still be running when the method returns, so a success response does not mean that all operations have completed. The caller has to either wait for some time or poll the data to confirm that the final result is as expected.
Describe the solution you'd like
Add a new ListenableTransportRequestHandler to TransportNodesAction so that the nodeOperation method can follow the listener approach. This solves the issue in a lightweight way and places no extra burden on the user.
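To make the idea concrete, here is a minimal, dependency-free sketch of a listener-style nodeOperation. It is not the OpenSearch API: `Listener` is a local stand-in for ActionListener, and `getAsync`, `runAndWait`, and the response strings are hypothetical names used only for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class ListenerNodeOperationSketch {

    /** Local stand-in for an ActionListener-like callback (assumption, not the real class). */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Hypothetical async index read; it completes on another thread. */
    static void getAsync(String id, Listener<String> listener) {
        CompletableFuture.supplyAsync(() -> "metadata-for-" + id)
            .whenComplete((doc, err) -> {
                if (err != null) {
                    listener.onFailure(new RuntimeException(err));
                } else {
                    listener.onResponse(doc);
                }
            });
    }

    /** Listener-style node operation: the node response is produced only after the read finishes. */
    static void nodeOperation(String modelId, Listener<String> responseListener) {
        getAsync(modelId, new Listener<String>() {
            @Override
            public void onResponse(String doc) {
                responseListener.onResponse("DEPLOYED:" + doc);
            }

            @Override
            public void onFailure(Exception e) {
                responseListener.onFailure(e);
            }
        });
    }

    /** Demo helper: bridges the callback to a future so the caller can wait for the result. */
    static String runAndWait(String modelId) {
        CompletableFuture<String> result = new CompletableFuture<>();
        nodeOperation(modelId, new Listener<String>() {
            @Override
            public void onResponse(String r) {
                result.complete(r);
            }

            @Override
            public void onFailure(Exception e) {
                result.completeExceptionally(e);
            }
        });
        return result.orTimeout(5, TimeUnit.SECONDS).join();
    }

    public static void main(String[] args) {
        System.out.println(runAndWait("model-1"));
    }
}
```

With this shape, the response that reaches the coordinating node reflects the outcome of the async work rather than only the fact that it was started.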
Related component
Other
Describe alternatives you've considered
Use a dedicated thread pool: run the time-consuming operation in it and block the thread until the response returns.
This approach requires an extra thread pool, which consumes resources and needs tuning. Cluster settings might even have to be exposed so that users can tune the pool, which is a burden for them.
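For comparison, the blocking alternative might look roughly like the sketch below. The pool size, the timeout, and all names are assumptions chosen for illustration; none of them come from OpenSearch or ml-commons.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BlockingNodeOperationSketch {

    // Hypothetical dedicated pool; its size is exactly the kind of knob that would need tuning.
    static final ExecutorService DEDICATED_POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "deploy-blocking");
        t.setDaemon(true); // do not keep the JVM alive for this demo
        return t;
    });

    /** Synchronous node operation: the calling thread is parked until the work finishes. */
    static String nodeOperation(String modelId) {
        Future<String> pending = DEDICATED_POOL.submit(() -> "metadata-for-" + modelId);
        try {
            return "DEPLOYED:" + pending.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nodeOperation("model-1"));
    }
}
```

The response is accurate here too, but each in-flight request holds a calling thread plus a pool thread for the whole duration of the operation.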
Additional context
Use case:
In ml-commons, one scenario is deploying a model to the cluster. This action first reads the model metadata from an index, then performs a time-consuming operation for a locally running model or a lightweight operation for a remotely running model.
The current TransportNodesAction API requires developers to return a response directly from nodeOperation, so we work around this by forwarding the deploy result to a listener that updates the model status.
This works fine for time-consuming models (run locally) but not for remote models, because deploying a remote model is lightweight: it only needs to create several objects in memory after the metadata is read. Once this API is enhanced to support ListenableTransportRequestHandler, we can pass the listener to nodeOperation and return the actual model status instead of deploying.
We also recently found a model deployment issue caused by the "forwarding deploy result to listener" step. Simply put, if a node crashes during model deployment, the listener cannot receive all responses and does not update the model status; please refer to opensearch-project/ml-commons#2970.
Enhancing this class therefore offers a straightforward solution to this scenario, and I believe it can benefit other similar cases that accept a listener during nodeOperation.
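As a rough illustration of why per-node responses help in the crash case, the sketch below derives a final status from per-node outcomes, where a failed node is recorded as a failure instead of a response that never arrives. The status names and the `aggregate` helper are hypothetical and are not taken from ml-commons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

public class NodeResultAggregationSketch {

    /** Derives one status from per-node futures; every future is assumed to complete or fail. */
    static String aggregate(Map<String, CompletableFuture<String>> perNode) {
        List<String> failedNodes = new ArrayList<>();
        // TreeMap only gives a stable iteration order for the demo.
        for (Map.Entry<String, CompletableFuture<String>> e : new TreeMap<>(perNode).entrySet()) {
            try {
                e.getValue().join();
            } catch (RuntimeException ex) {
                // join() reports a failed node as an unchecked exception
                failedNodes.add(e.getKey());
            }
        }
        if (failedNodes.isEmpty()) {
            return "DEPLOYED";
        }
        if (failedNodes.size() == perNode.size()) {
            return "DEPLOY_FAILED";
        }
        return "PARTIALLY_DEPLOYED";
    }

    public static void main(String[] args) {
        Map<String, CompletableFuture<String>> perNode = new TreeMap<>();
        perNode.put("node-1", CompletableFuture.completedFuture("ok"));
        perNode.put("node-2", CompletableFuture.failedFuture(new RuntimeException("node left")));
        System.out.println(aggregate(perNode));
    }
}
```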