Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Integration with Ultrawarm - Follow up #97

Merged

Conversation

kaituo
Copy link
Member

@kaituo kaituo commented May 1, 2020

Issue #, if available:

Description of changes:
This is a follow up PR to address comments.

See context in #95

Testing done:

  1. gradle build passes
  2. Verified AD runs only in hot nodes.
  3. stats API and HourlyCron still works.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

This is a follow up PR to address comments.

Testing done:
1. gradle build passes
2. Verified AD runs only in hot nodes.
3. stats API and HourlyCron still works.
@kaituo kaituo requested a review from wnbts May 1, 2020 05:39

static class HotDataNodePredicate implements Predicate<DiscoveryNode> {
@Override
public boolean test(DiscoveryNode discoveryNode) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question. Just to confirm this if-and-only-if behavior since it is different from the last change, any data nodes that are marked but not marked hot are not eligible? if a data node is marked with some unknown new value, it would be ineligible based on this rule.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, there are only hot and warm data nodes.
Previously, if a data node is marked with warm node, we ignore it
Now, if a data node is marked with hot node or not marked at all, we use it.
If a data node is marked with some unknown new value in the future, it would be ineligible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion. Since this business logic/decision has important effects on system behavior, it is better to document it. Also, the documentation at line 49 is outdated based on this new change.

* @return whether we should use this node for AD
*/
public boolean isEligibleNode(DiscoveryNode node) {
return new HotDataNodePredicate().test(node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue. this predicate should be instantiated, maybe injected, and reused for calls and should not be instantiated once for every call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. Changed.

ClusterState state = this.clusterService.state();
final List<DiscoveryNode> eligibleNodes = new ArrayList<>();
final HotDataNodePredicate eligibleNodeFilter = new HotDataNodePredicate();
for (DiscoveryNode node : state.nodes()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion. stream helps improve code efficiency (eliminating need for intermediary data structures, variables creation and operation) and readability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is personal coding style. Thanks for the suggestions.


import com.amazon.opendistroforelasticsearch.ad.constant.CommonName;

public class DiscoveryNodeFilterer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor. class documentation is missing. For public classes and methods, the responsibilities should be summarized for clients and readers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* model partitions to all data nodes in the cluster randomly, which
* could cause a model performance downgrade issue once warm nodes
* are throttled due to resource limitations. The PR excludes warm nodes to place model partitions.
* @return an array of eligible data nodes
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: indentation is not needed in java docs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@@ -119,7 +119,7 @@ public String getName() {
/**
* Constructor.
*
* @param clusterStateUtils cluster info
* @param nodeFilter utility class to select nodesr
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: typo nodes r

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.


import com.amazon.opendistroforelasticsearch.ad.constant.CommonName;

public class DiscoveryNodeFilterer {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any UT for this class. Would be great to add some.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have coverage tool to check coverage (75% line and 60% branch coverage). Without enough coverage, build would fail. This class is covered by its callers’ tests.

@kaituo kaituo merged commit fbc8a4e into opendistro-for-elasticsearch:opendistro-1.4 May 4, 2020
kaituo added a commit to kaituo/anomaly-detection that referenced this pull request May 19, 2020
This is a follow up PR to address comments.

Testing done:
1. gradle build passes
2. Verified AD runs only in hot nodes.
3. stats API and HourlyCron still works.
kaituo added a commit that referenced this pull request May 19, 2020
* Integration with Ultrawarm (#95)

Ultrawarm introduces warm nodes into the ES cluster. Currently, we distribute model partitions to all data nodes in the cluster randomly, which could cause a model performance downgrade issue once warm nodes are throttled due to resource limitations. The PR excludes warm nodes to place model partitions.

Since index shards are hosted on hot nodes, AD's coordinating nodes are in hot nodes as well. We don't need to send HourlyCron job and stats requests to warm nodes anymore. This PR implements those changes.

Testing done:
1. Verified AD runs only in hot nodes.
2. stats API and HourlyCron still works.

* Integration with Ultrawarm - Follow up (#97)

This is a follow up PR to address comments.

Testing done:
1. gradle build passes
2. Verified AD runs only in hot nodes.
3. stats API and HourlyCron still works.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants