Selectively store anomaly results when index pressure is high #241

kaituo · 2020-10-07T21:06:59Z

Issue #, if available:

Description of changes:
Our performance evaluation has observed a large number of EsRejectedExecutionException due to indexing too many anomaly results. This PR adds a customized bulk transport action to deal with high index memory pressure. When the index pressure exceeds 80%, we only index non-zero anomaly grade index results and a random subset of zero anomaly grade index results.

This PR adds a few supporting data structures related changes:1. add constructors in AnomalyResult/FeatureData to use them in the newly added transport action.2. add Entity class to represent an entity in anomaly results.
This PR also renames ANOMALY_RESULT_INDEX to ANOMALY_RESULT_INDEX_ALIAS and moves it to the CommonName class. The name refers to an alias and is accessed in multiple places.

Testing done:

we don't observe EsRejectedExecutionException after applying the change (together with a series of other related changes).
added unit tests.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov · 2020-10-07T21:24:29Z

Codecov Report

Merging #241 into master will increase coverage by 0.27%.
The diff coverage is 86.58%.

@@             Coverage Diff              @@
##             master     #241      +/-   ##
============================================
+ Coverage     72.76%   73.04%   +0.27%     
- Complexity     1422     1461      +39     
============================================
  Files           160      164       +4     
  Lines          6680     6834     +154     
  Branches        508      527      +19     
============================================
+ Hits           4861     4992     +131     
- Misses         1571     1591      +20     
- Partials        248      251       +3

Flag	Coverage Δ	Complexity Δ
#plugin	`72.31% <86.58%> (+0.32%)`	`1461.00 <40.00> (+39.00)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ	Complexity Δ
...troforelasticsearch/ad/feature/FeatureManager.java	`96.68% <ø> (ø)`	`96.00 <0.00> (ø)`
...on/opendistroforelasticsearch/ad/model/Entity.java	`50.00% <50.00%> (ø)`	`4.00 <4.00> (?)`
...elasticsearch/ad/transport/ADResultBulkAction.java	`75.00% <75.00%> (ø)`	`2.00 <2.00> (?)`
...distroforelasticsearch/ad/model/AnomalyResult.java	`90.83% <91.66%> (+0.47%)`	`36.00 <8.00> (+10.00)`
...arch/ad/transport/ADResultBulkTransportAction.java	`92.68% <92.68%> (ø)`	`13.00 <13.00> (?)`
...lasticsearch/ad/transport/ADResultBulkRequest.java	`95.65% <95.65%> (ø)`	`9.00 <9.00> (?)`
...stroforelasticsearch/ad/AnomalyDetectorPlugin.java	`93.54% <100.00%> (ø)`	`10.00 <0.00> (ø)`
...sticsearch/ad/indices/AnomalyDetectionIndices.java	`61.87% <100.00%> (-0.45%)`	`23.00 <1.00> (-1.00)`
...stroforelasticsearch/ad/model/AnomalyDetector.java	`64.02% <100.00%> (ø)`	`51.00 <1.00> (ø)`
...endistroforelasticsearch/ad/model/FeatureData.java	`100.00% <100.00%> (ø)`	`9.00 <2.00> (+2.00)`
... and 7 more

yizheliu-amazon · 2020-10-09T00:28:29Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/model/AnomalyDetector.java

@@ -151,6 +210,7 @@ public AnomalyDetector(
        this.uiMetadata = uiMetadata;
        this.schemaVersion = schemaVersion;
        this.lastUpdateTime = lastUpdateTime;
+        this.categoryField = null;


You may refactor this constructor to

public AnomalyDetector(....,lastUpdateTime) { new AnomalyDetector(....,lastUpdateTime, null); }

Yes, changed.

yizheliu-amazon · 2020-10-09T00:29:45Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/model/AnomalyResult.java

@@ -91,6 +95,55 @@ public AnomalyResult(
        this.executionStartTime = executionStartTime;
        this.executionEndTime = executionEndTime;
        this.error = error;
+        this.entity = null;


same comment as above to refactor constructor

ylwu-amzn · 2020-10-09T07:31:35Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/model/AnomalyDetector.java

     */
+    public AnomalyDetector(


This PR has some overlap with #240 ?

Yes, they have common dependencies.

ylwu-amzn · 2020-10-09T07:42:40Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/model/AnomalyDetector.java

@@ -428,7 +497,9 @@ public TimeConfiguration getWindowDelay() {
    }

    public Integer getShingleSize() {
-        return shingleSize == null ? DEFAULT_SHINGLE_SIZE : shingleSize;
+        return shingleSize == null


How about we move this logic to detector constructor？Always set shingle size as default value if it's null. Then this get method can return shingleSize directly; and we should persist shingle size in detector document even it's default shingle size in case the default shingle size changes in future.

good idea. Changed.

ylwu-amzn · 2020-10-09T08:00:58Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/model/Entity.java

+    public static final String ENTITY_NAME_FIELD = "name";
+    public static final String ENTITY_VALUE_FIELD = "value";
+
+    private final String name;


name is category field name and value is the entity's unique value? For example, name will be "IP" and value will be 53.230.198.42 ?
Is it necessary to store same name for every AD result doc ? As high cardinality detector will generate too much AD results, I think it will save some disk usage if we remove such duplicate information (same for feature data).

yes, your understanding is correct. I am mostly doing this for the future: in case we have multiple categorical fields and the parser knows which value belongs to which field. Open to change if we have a solution.

Let's keep it as is currently. Remove these duplicate fields needs some snapshot mechanism to track the old version of detector configuration. Let's discuss more to make a decision.

ylwu-amzn · 2020-10-09T08:18:42Z

...ain/java/com/amazon/opendistroforelasticsearch/ad/transport/ADResultBulkTransportAction.java

+            }
+        }
+
+        client


Check bulk request size, if no anomaly result, no need to execute bulk index.

good catch. Added.

ylwu-amzn

LGTM. Thanks for the change

…hold. Our performance evaluation has observed a large number of EsRejectedExecutionException due to indexing too many anomaly results. This PR adds a customized bulk transport action to deal with high index memory pressure. When the index pressure exceeds 80%, we only index non-zero anomaly grade index results and a random subset of zero anomaly grade index results. This PR adds a few supporting data structures related changes:1. add constructors in AnomalyResult/FeatureData to use them in the newly added transport action.2. add Entity class to represent an entity in anomaly results. This PR also renames ANOMALY_RESULT_INDEX to ANOMALY_RESULT_INDEX_ALIAS and moves it to the CommonName class. The name refers to an alias and is accessed in multiple places. Testing done: 1. we don't observe EsRejectedExecutionException after applying the change (together with a series of other related changes). 2. added unit tests.

kaituo requested review from ylwu-amzn and yizheliu-amazon October 7, 2020 21:07

yizheliu-amazon approved these changes Oct 9, 2020

View reviewed changes

ylwu-amzn reviewed Oct 9, 2020

View reviewed changes

ylwu-amzn approved these changes Oct 13, 2020

View reviewed changes

kaituo added 3 commits October 13, 2020 14:40

Address comments

89d055c

rebase

1e7f3bc

kaituo force-pushed the resultBulk branch from cafebc2 to 1e7f3bc Compare October 13, 2020 21:48

Fix flaky test

60b94f5

kaituo merged commit 2175fde into opendistro-for-elasticsearch:master Oct 13, 2020

ohltyler added the enhancement New feature or request label Oct 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Selectively store anomaly results when index pressure is high #241

Selectively store anomaly results when index pressure is high #241

kaituo commented Oct 7, 2020

codecov bot commented Oct 7, 2020 •

edited

Loading

yizheliu-amazon Oct 9, 2020

kaituo Oct 12, 2020

yizheliu-amazon Oct 9, 2020

kaituo Oct 12, 2020

ylwu-amzn Oct 9, 2020

kaituo Oct 12, 2020

ylwu-amzn Oct 9, 2020

kaituo Oct 12, 2020

ylwu-amzn Oct 9, 2020 •

edited

Loading

kaituo Oct 12, 2020

ylwu-amzn Oct 12, 2020

ylwu-amzn Oct 9, 2020

kaituo Oct 12, 2020

ylwu-amzn left a comment

+                          }
+                      }
+                      client

Selectively store anomaly results when index pressure is high #241

Selectively store anomaly results when index pressure is high #241

Conversation

kaituo commented Oct 7, 2020

codecov bot commented Oct 7, 2020 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ylwu-amzn Oct 9, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ylwu-amzn left a comment

Choose a reason for hiding this comment

codecov bot commented Oct 7, 2020 •

edited

Loading

ylwu-amzn Oct 9, 2020 •

edited

Loading