-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dispatch ML task to ML node first #346
Conversation
Signed-off-by: Yaliang Wu <ylwu@amazon.com>
@@ -115,14 +113,7 @@ public class MachineLearningPlugin extends Plugin implements ActionPlugin { | |||
private ClusterService clusterService; | |||
private ThreadPool threadPool; | |||
|
|||
public static final Setting<Boolean> IS_ML_NODE_SETTING = Setting.boolSetting("node.ml", false, Setting.Property.NodeScope); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curiosity question: what is the purpose of IS_ML_NODE_SETTING
before and why we don't need it now? I saw this part of logic was moved to TestHelper
class, what's the reason for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That part was from prototype when we tried to support ML node. But actually it's not being used in our formal release as OpenSearch core doesn't support ML role and we have to postpone that. Now OpenSearch plan to support ML role with dynamic role feature in 2.1. We can add this back but we don't need this prototype/experiment code any more. Just move it to test part.
DiscoveryNode[] mlNodes = eligibleMLNodes.toArray(new DiscoveryNode[0]); | ||
log.debug("Find {} dedicated ML nodes: {}", eligibleMLNodes.size(), Arrays.toString(mlNodes)); | ||
return mlNodes; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: This "else" should be redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's code style preference. People have different preference for "No-else-after-return" or not, check https://stackoverflow.com/questions/46875442/unnecessary-else-after-return-no-else-return.
For me, I feel the code is more readable to keep else
to make the returns have same indentation.
/** | ||
* Get eligible node to run ML task. If there are nodes with ml role, will return all these | ||
* ml nodes; otherwise return all data nodes. | ||
* | ||
* @return array of discovery node | ||
*/ | ||
protected DiscoveryNode[] getEligibleNodes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it only preferable to run ML tasks in ml node? I assume ml-common can run in data node as well. Also is there any logic in the ClusterState.nodes() to evaluate if any ml node is overloaded, etc? I just wonder, in the future, if we want to add more priority based strategy here to prioritize ML node, but still use data node if ML node is heavy loaded, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it only preferable to run ML tasks in ml node? I assume ml-common can run in data node as well.
Check the comment "If there are nodes with ml role, will return all these ml nodes; otherwise return all data nodes."
Also is there any logic in the ClusterState.nodes() to evaluate if any ml node is overloaded, etc?
Yes, we check JVM heap usage and how many ML task running on a node. If exceeds limit, will not dispatch new ML task to that node.
I just wonder, in the future, if we want to add more priority based strategy here to prioritize ML node, but still use data node if ML node is heavy loaded, etc.
I think we'd better ask user to scale the cluster by adding more ML node or switch to more powerful node type if ML node is heavy/over loaded. But this is not the one way door, we can always tune the code if cx really needs to run model on data nodes if ML node overloaded.
Signed-off-by: Yaliang Wu <ylwu@amazon.com> (cherry picked from commit 6cbb626)
Signed-off-by: Yaliang Wu ylwu@amazon.com
Description
We plan to support dynamic role in OpenSearch. Refer to this PR opensearch-project/OpenSearch#3436 and issue opensearch-project/OpenSearch#2877
This PR is to change the task dispatcher to support ML node. Will check if cluster has any node with "ml" role first. If yes, will dispatch ML task to "ml" nodes; otherwise will dispatch to data nodes.
Issues Resolved
close #79
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.