Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add listenable TransportRequestHandler in TransportNodesAction #15166

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zane-neo
Copy link
Contributor

@zane-neo zane-neo commented Aug 8, 2024

Description

Add listenable TransportRequestHandler in TransportNodesAction.

Use case:
In ml-commons, a scenario is to deploy model to the cluster, this action needs to read the model metadata from index first and then do time-consuming operation for locally running model and non time-consuming operations for remote running model.

The thing is the current TransportNodesAction limited the API that developers have to return a response when performing nodeOperation, so we have to workaround this by forwarding the deploy result to a listener to update the model status, the workflow looks like below:
deploy_model

This works fine for the time-consuming models(run locally) but not for remote models because deploying remote models is lightweight which only needs create several object in memory after metadata read. Once we enhanced this API to support listenableTransportRequestHandler, then we can pass the listener to the noOperation and return actual model status instead of deploying.

Also recently we found an issue of model deployment which is caused by the forwarding deploy result to listener part, simply put, if a node crash during the model deploy, then the listener will not able to receive all responses and not updating the model status, please refer: opensearch-project/ml-commons#2970.

So enhancing this class offering a straightforward solution to this scenario and I believe it can benefit other similar cases which accept a listener during the nodeOperation.

Related Issues

#15165

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

github-actions bot commented Aug 8, 2024

✅ Gradle check result for 11c214f: SUCCESS

Copy link

codecov bot commented Aug 8, 2024

Codecov Report

Attention: Patch coverage is 83.33333% with 3 lines in your changes missing coverage. Please review.

Project coverage is 72.13%. Comparing base (b67cdf4) to head (a11a8a1).

Files with missing lines Patch % Lines
...rch/action/support/nodes/TransportNodesAction.java 83.33% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #15166      +/-   ##
============================================
+ Coverage     72.03%   72.13%   +0.10%     
- Complexity    65230    65261      +31     
============================================
  Files          5318     5318              
  Lines        304051   304069      +18     
  Branches      43990    43991       +1     
============================================
+ Hits         219021   219355     +334     
+ Misses        67121    66747     -374     
- Partials      17909    17967      +58     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

github-actions bot commented Aug 9, 2024

❌ Gradle check result for f17a5c0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 3c68c3f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@zane-neo zane-neo changed the title []Add listenable TransportRequestHandler in TransportNodesAction Add listenable TransportRequestHandler in TransportNodesAction Aug 15, 2024
@zane-neo
Copy link
Contributor Author

@peternied @gaobinlong @dblock Can you help review this PR?

@gaobinlong gaobinlong added the backport 2.x Backport to 2.x branch label Aug 15, 2024
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Sep 14, 2024
@zane-neo zane-neo force-pushed the add-listenable-transport-request-handler branch from 3c68c3f to 62c17d5 Compare September 29, 2024 07:03
Copy link
Contributor

❌ Gradle check result for cdd58e2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Collaborator

@gaobinlong gaobinlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zane-neo Could you give some use cases about this change to help understanding the intention, thanks!

@zane-neo
Copy link
Contributor Author

zane-neo commented Oct 8, 2024

@zane-neo Could you give some use cases about this change to help understanding the intention, thanks!

@gaobinlong Thanks, Binlong. I'll add more cases in the description and the issue.

@zane-neo zane-neo force-pushed the add-listenable-transport-request-handler branch from cdd58e2 to 2e6023c Compare October 8, 2024 02:52
Copy link
Contributor

github-actions bot commented Oct 8, 2024

❌ Gradle check result for 2e6023c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Oct 8, 2024

✅ Gradle check result for 2e6023c: SUCCESS

Copy link
Collaborator

@gaobinlong gaobinlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@gaobinlong
Copy link
Collaborator

@rajiv-kv @shwetathareja could you also help to take a look at this PR, thanks!

@zane-neo zane-neo force-pushed the add-listenable-transport-request-handler branch 3 times, most recently from c9a2500 to 067fc68 Compare October 17, 2024 05:46
Copy link
Contributor

❕ Gradle check result for 067fc68: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
@zane-neo zane-neo force-pushed the add-listenable-transport-request-handler branch from 067fc68 to a11a8a1 Compare December 13, 2024 07:19
Copy link
Contributor

✅ Gradle check result for a11a8a1: SUCCESS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants