Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Support listenable TransportRequestHandler in TransportNodesAction #15165

Open
zane-neo opened this issue Aug 8, 2024 · 1 comment
Labels
enhancement Enhancement or improvement to existing feature or request Other

Comments

@zane-neo
Copy link
Contributor

zane-neo commented Aug 8, 2024

Is your feature request related to a problem? Please describe

Currently when broadcasting a cluster level event, the TransportNodesActioncan be leveraged to send requests to multiple nodes in the cluster. When target nodes received the request, they should return a BaseNodeResponse by overriding the nodeOperation method.
The drawback is this method is in sync approach and if the node needs async operation like query index or remote service call, there isn't an easy way to return a response when the async operation complete. E.g. query index then do something code looks like below:

@Override
public MyBaseNodeResponse nodeOperation(NodeRequest request) {
  // build the getRequest
  nodeClient.get(getRequest, ActionListener<GetResponse> responseListener)
  // do something else
return new BaseNodeResponse();
}

In the above code snippet, when the method returns, the query might be still running which means when user received a success response, it doesn't mean all the operations are completed, so user needs to either wait for some time or loop query the data to confirm the final result as expected.

Describe the solution you'd like

Add a new ListenableTransportRequestHandler in TransportNodesAction so nodeOperation method can go along the listener approach. This can solve the issue in a lightweight approach and doesn't have extra burden for user.

Related component

Other

Describe alternatives you've considered

Use a dedicated thread pool and put the time consuming operation in it and block the thread until response returns.

This approach we need to introduce extra thread pool which consumes resources and tuning even cluster settings needs to expose to user to tune the thread pool which is a burden for user.

Additional context

Use case:
In ml-commons, a scenario is to deploy model to the cluster, this action needs to read the model metadata from index first and then do time-consuming operation for locally running model and non time-consuming operations for remote running model.

The thing is the current TransportNodesAction limited the API that developers have to return a response when performing nodeOperation, so we have to workaround this by forwarding the deploy result to a listener to update the model status, the workflow looks like below:
deploy_model

This works fine for the time-consuming models(run locally) but not for remote models because deploying remote models is lightweight which only needs create several object in memory after metadata read. Once we enhanced this API to support listenableTransportRequestHandler, then we can pass the listener to the noOperation and return actual model status instead of deploying.

Also recently we found an issue of model deployment which is caused by the forwarding deploy result to listener part, simply put, if a node crash during the model deploy, then the listener will not able to receive all responses and not updating the model status, please refer: opensearch-project/ml-commons#2970.

So enhancing this class offering a straightforward solution to this scenario and I believe it can benefit other similar cases which accept a listener during the nodeOperation.

@zane-neo zane-neo added enhancement Enhancement or improvement to existing feature or request untriaged labels Aug 8, 2024
@github-actions github-actions bot added the Other label Aug 8, 2024
@dblock dblock removed the untriaged label Aug 26, 2024
@dblock
Copy link
Member

dblock commented Aug 26, 2024

[Catch All Triage - 1, 2, 3, 4, 5]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Other
Projects
None yet
Development

No branches or pull requests

2 participants