Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Selectively store anomaly results when index pressure is high #241

Merged
merged 4 commits into from
Oct 13, 2020

Conversation

kaituo
Copy link
Member

@kaituo kaituo commented Oct 7, 2020

Issue #, if available:

Description of changes:
Our performance evaluation has observed a large number of EsRejectedExecutionException due to indexing too many anomaly results. This PR adds a customized bulk transport action to deal with high index memory pressure. When the index pressure exceeds 80%, we only index non-zero anomaly grade index results and a random subset of zero anomaly grade index results.

This PR adds a few supporting data structures related changes:1. add constructors in AnomalyResult/FeatureData to use them in the newly added transport action.2. add Entity class to represent an entity in anomaly results.
This PR also renames ANOMALY_RESULT_INDEX to ANOMALY_RESULT_INDEX_ALIAS and moves it to the CommonName class. The name refers to an alias and is accessed in multiple places.

Testing done:

  1. we don't observe EsRejectedExecutionException after applying the change (together with a series of other related changes).
  2. added unit tests.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@codecov
Copy link

codecov bot commented Oct 7, 2020

Codecov Report

Merging #241 into master will increase coverage by 0.27%.
The diff coverage is 86.58%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #241      +/-   ##
============================================
+ Coverage     72.76%   73.04%   +0.27%     
- Complexity     1422     1461      +39     
============================================
  Files           160      164       +4     
  Lines          6680     6834     +154     
  Branches        508      527      +19     
============================================
+ Hits           4861     4992     +131     
- Misses         1571     1591      +20     
- Partials        248      251       +3     
Flag Coverage Δ Complexity Δ
#plugin 72.31% <86.58%> (+0.32%) 1461.00 <40.00> (+39.00)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
...troforelasticsearch/ad/feature/FeatureManager.java 96.68% <ø> (ø) 96.00 <0.00> (ø)
...on/opendistroforelasticsearch/ad/model/Entity.java 50.00% <50.00%> (ø) 4.00 <4.00> (?)
...elasticsearch/ad/transport/ADResultBulkAction.java 75.00% <75.00%> (ø) 2.00 <2.00> (?)
...distroforelasticsearch/ad/model/AnomalyResult.java 90.83% <91.66%> (+0.47%) 36.00 <8.00> (+10.00)
...arch/ad/transport/ADResultBulkTransportAction.java 92.68% <92.68%> (ø) 13.00 <13.00> (?)
...lasticsearch/ad/transport/ADResultBulkRequest.java 95.65% <95.65%> (ø) 9.00 <9.00> (?)
...stroforelasticsearch/ad/AnomalyDetectorPlugin.java 93.54% <100.00%> (ø) 10.00 <0.00> (ø)
...sticsearch/ad/indices/AnomalyDetectionIndices.java 61.87% <100.00%> (-0.45%) 23.00 <1.00> (-1.00)
...stroforelasticsearch/ad/model/AnomalyDetector.java 64.02% <100.00%> (ø) 51.00 <1.00> (ø)
...endistroforelasticsearch/ad/model/FeatureData.java 100.00% <100.00%> (ø) 9.00 <2.00> (+2.00)
... and 7 more

@@ -151,6 +210,7 @@ public AnomalyDetector(
this.uiMetadata = uiMetadata;
this.schemaVersion = schemaVersion;
this.lastUpdateTime = lastUpdateTime;
this.categoryField = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may refactor this constructor to

public AnomalyDetector(....,lastUpdateTime) {
     new AnomalyDetector(....,lastUpdateTime, null);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, changed.

@@ -91,6 +95,55 @@ public AnomalyResult(
this.executionStartTime = executionStartTime;
this.executionEndTime = executionEndTime;
this.error = error;
this.entity = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above to refactor constructor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

*/
public AnomalyDetector(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has some overlap with #240 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they have common dependencies.

@@ -428,7 +497,9 @@ public TimeConfiguration getWindowDelay() {
}

public Integer getShingleSize() {
return shingleSize == null ? DEFAULT_SHINGLE_SIZE : shingleSize;
return shingleSize == null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we move this logic to detector constructor?Always set shingle size as default value if it's null. Then this get method can return shingleSize directly; and we should persist shingle size in detector document even it's default shingle size in case the default shingle size changes in future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. Changed.

public static final String ENTITY_NAME_FIELD = "name";
public static final String ENTITY_VALUE_FIELD = "value";

private final String name;
Copy link
Contributor

@ylwu-amzn ylwu-amzn Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name is category field name and value is the entity's unique value? For example, name will be "IP" and value will be 53.230.198.42 ?
Is it necessary to store same name for every AD result doc ? As high cardinality detector will generate too much AD results, I think it will save some disk usage if we remove such duplicate information (same for feature data).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, your understanding is correct. I am mostly doing this for the future: in case we have multiple categorical fields and the parser knows which value belongs to which field. Open to change if we have a solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it as is currently. Remove these duplicate fields needs some snapshot mechanism to track the old version of detector configuration. Let's discuss more to make a decision.

}
}

client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check bulk request size, if no anomaly result, no need to execute bulk index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. Added.

Copy link
Contributor

@ylwu-amzn ylwu-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the change

…hold.

Our performance evaluation has observed a large number of EsRejectedExecutionException due to indexing too many anomaly results. This PR adds a customized bulk transport action to deal with high index memory pressure. When the index pressure exceeds 80%, we only index non-zero anomaly grade index results and a random subset of zero anomaly grade index results.

This PR adds a few supporting data structures related changes:1. add constructors in AnomalyResult/FeatureData to use them in the newly added transport action.2. add Entity class to represent an entity in anomaly results.
This PR also renames ANOMALY_RESULT_INDEX to ANOMALY_RESULT_INDEX_ALIAS and moves it to the CommonName class. The name refers to an alias and is accessed in multiple places.

Testing done:
1. we don't observe EsRejectedExecutionException after applying the change (together with a series of other related changes).
2. added unit tests.
@kaituo kaituo merged commit 2175fde into opendistro-for-elasticsearch:master Oct 13, 2020
@ohltyler ohltyler added the enhancement New feature or request label Oct 19, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants