add anomaly feature attribution to model output #232
Conversation
Codecov Report
@@ Coverage Diff @@
## master #232 +/- ##
============================================
- Coverage 72.25% 72.08% -0.18%
- Complexity 1278 1289 +11
============================================
Files 139 139
Lines 6045 5993 -52
Branches 469 476 +7
============================================
- Hits 4368 4320 -48
+ Misses 1465 1461 -4
Partials 212 212
vec.renormalize(1d);
double[] attribution = new double[vec.getDimensions()];
for (int i = 0; i < attribution.length; i++) {
    attribution[i] = vec.getHighLowSum(i);
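The snippet above renormalizes the attribution vector so the per-dimension high/low sums add up to 1 before they are read out. A minimal self-contained sketch of that renormalization step, using plain arrays instead of the library's `DiVector` (the helper name and signature here are illustrative assumptions, not the plugin's actual API):

```java
public class AttributionSketch {
    // Scale highLowSums so the entries add up to `total` (1.0 in the PR),
    // mirroring what vec.renormalize(1d) does before the per-dimension read-out.
    static double[] normalize(double[] highLowSums, double total) {
        double sum = 0;
        for (double v : highLowSums) {
            sum += v;
        }
        double[] out = new double[highLowSums.length];
        if (sum == 0) {
            return out; // an all-zero input stays all-zero
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = highLowSums[i] * total / sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // Raw high/low sums of 2 and 6 renormalize to 0.25 and 0.75.
        double[] a = normalize(new double[] {2.0, 6.0}, 1.0);
        System.out.println(a[0] + "," + a[1]); // prints 0.25,0.75
    }
}
```

After this step, each entry can be read directly as that feature's share of the anomaly score.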
Questions:
First, how should we interpret the case where both high and low are non-zero? Is the value really high, or low? Does it mean the RCF trees think the value could be either higher or lower than the recently observed data trend for that column? Do we need a majority-win rule to decide whether it is actually high or low?
Second, when taking a high-low sum, we lose the direction. Is there any way to preserve it?
Third, when users see two features' attributions like x: 1% and y: 99%, it tells them y is where the anomaly happens. It might as well not show x's 1%. I feel an attribution score less than 1/d (where d is the number of features) is not useful to users. Any comments on this?
- For a given sum, the larger value indicates the direction/relative position. Note that even data within the normal range has non-zero values for both.
- Since the current UX design only shows feature contribution, only the contribution is computed here. The direction is not lost; it can be added when it's needed.
- 1/99 or 0/100 probably won't make a difference to users. In general, additional rules should be avoided for simplicity and correctness, as they might introduce their own problems. As an extreme example, suppose the contribution from two features is 49/51: if the 49 is omitted, after renormalization the result becomes 0/100.
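If direction ever becomes needed, one way to preserve it (a sketch of the idea discussed above, not current plugin behavior) is to keep the signed difference `high - low` per dimension alongside the unsigned contribution `high + low`:

```java
public class DirectionSketch {
    // For each feature dimension, a positive result means the trees lean
    // toward "value too high", a negative result toward "value too low".
    static double[] signedAttribution(double[] high, double[] low) {
        double[] out = new double[high.length];
        for (int i = 0; i < high.length; i++) {
            out[i] = high[i] - low[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Feature 0 leans high (3.0 vs 1.0), feature 1 leans low (0.5 vs 2.5).
        double[] s = signedAttribution(new double[] {3.0, 0.5}, new double[] {1.0, 2.5});
        System.out.println(s[0] + "," + s[1]); // prints 2.0,-2.0
    }
}
```

The magnitude still reflects the contribution, so both views can be derived from the same high/low pair.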
} else {
    double score = rcfResults.stream().mapToDouble(r -> r.getScore() * r.getForestSize()).sum() / totalForestSize;
    double confidence = rcfResults.stream().mapToDouble(r -> r.getConfidence() * r.getForestSize()).sum() / Math
        .max(rcfNumTrees, totalForestSize);
    combinedResult = new CombinedRcfResult(score, confidence);
    double[] attribution = combineAttribution(rcfResults, numFeatures, totalForestSize);
    combinedResult = new CombinedRcfResult(score, confidence, combineAttribution(rcfResults, numFeatures, totalForestSize));
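The combining step follows the same forest-size weighting used for score and confidence above. A self-contained sketch of that weighted combination over per-partition attribution vectors (names and signature are assumptions based on the snippet, not the exact plugin code):

```java
import java.util.List;

public class CombineSketch {
    // Combine one attribution vector per forest partition into a single
    // vector, weighting each partition by its share of the total forest.
    static double[] combineAttribution(
            List<double[]> attributions, List<Integer> forestSizes,
            int numFeatures, int totalForestSize) {
        double[] combined = new double[numFeatures];
        for (int p = 0; p < attributions.size(); p++) {
            double weight = (double) forestSizes.get(p) / totalForestSize;
            for (int i = 0; i < numFeatures; i++) {
                combined[i] += attributions.get(p)[i] * weight;
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        // Partition of 10 trees says 0.2/0.8; partition of 30 trees says 0.6/0.4.
        double[] c = combineAttribution(
            List.of(new double[] {0.2, 0.8}, new double[] {0.6, 0.4}),
            List.of(10, 30), 2, 40);
        System.out.println(c[0] + "," + c[1]); // prints 0.5,0.5
    }
}
```

Because each input vector sums to 1 and the weights sum to 1, the combined vector also sums to 1, so no further renormalization is needed.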
[minor] It looks like the variable attribution was meant to be used in line 229, but didn't get used; instead combineAttribution was invoked again with identical args.
Good catch. I fixed it in the new commit.
Description of changes: This PR adds normalized anomaly score attribution to the anomaly detection model output. The anomaly score is attributed to each feature dimension of the current single data point and normalized to sum to 1 for easier consumption. Further changes to the external API and persistence are not included in this PR.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.