Extract a new class for entity frequenty tracking #389

kaituo · 2021-02-22T23:34:12Z

Issue #, if available:

Description of changes:
HC detectors use a 1-pass algorithm for estimating heavy hitters in a stream. Our method maintains a time-decayed count for each entity, which allows us to compare the frequencies of entities from different detectors in the stream. To reuse the code in historical detectors, I created a new class PriorityTracker and moved all related logic there. When an entity is hit, the caller can call PriorityTracker.updatePriority to update the entity's priority. The callers can find the most frequently occurring entities in the stream using PriorityTracker.getTopNEntities.

This PR also adds tests for NodeStateManager.

Testing done:

manually tested basic workflow of HC detectors still works.
added new tests for PriorityTracker.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

HC detectors use a 1-pass algorithm for estimating heavy hitters in a stream. Our method maintains a time-decayed count for each entity, which allows us to compare the frequencies of entities from different detectors in the stream. To reuse the code in historical detectors, I created a new class PriorityTracker and moved all related logic there. When an entity is hit, the caller can call PriorityTracker.updatePriority to update the entity's priority. The callers can find the most frequently occurring entities in the stream using PriorityTracker.getTopNEntities. This PR also adds tests for NodeStateManager. Testing done: 1. manually tested basic workflow of HC detectors still works. 2. added new tests for PriorityTracker.

codecov · 2021-02-22T23:35:30Z

Codecov Report

Merging #389 (ab741f5) into main (a6ce6fc) will increase coverage by 0.00%.
The diff coverage is 96.47%.

@@            Coverage Diff            @@
##               main     #389   +/-   ##
=========================================
  Coverage     79.14%   79.15%           
- Complexity     2662     2680   +18     
=========================================
  Files           247      248    +1     
  Lines         11717    11749   +32     
  Branches       1008     1010    +2     
=========================================
+ Hits           9274     9300   +26     
- Misses         1964     1971    +7     
+ Partials        479      478    -1

Flag	Coverage Δ	Complexity Δ
plugin	`79.14% <96.47%> (+<0.01%)`	`0.00 <27.00> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ	Complexity Δ
...stroforelasticsearch/ad/caching/PriorityCache.java	`83.60% <87.50%> (+0.06%)`	`60.00 <1.00> (ø)`
...distroforelasticsearch/ad/caching/CacheBuffer.java	`74.78% <92.30%> (-4.28%)`	`36.00 <6.00> (-2.00)`
...roforelasticsearch/ad/caching/PriorityTracker.java	`98.43% <98.43%> (ø)`	`20.00 <20.00> (?)`
...stroforelasticsearch/ad/AnomalyDetectorRunner.java	`40.00% <0.00%> (-6.25%)`	`6.00% <0.00%> (-1.00%)`
...pendistroforelasticsearch/ad/cluster/HashRing.java	`78.94% <0.00%> (-5.27%)`	`10.00% <0.00%> (-1.00%)`
...port/SearchAnomalyDetectorInfoTransportAction.java	`59.09% <0.00%> (-4.55%)`	`4.00% <0.00%> (ø%)`
...troforelasticsearch/ad/task/ADBatchTaskRunner.java	`87.82% <0.00%> (-0.87%)`	`63.00% <0.00%> (-1.00%)`
...pendistroforelasticsearch/ad/NodeStateManager.java	`77.45% <0.00%> (+1.96%)`	`32.00% <0.00%> (+1.00%)`
...asticsearch/ad/cluster/ADClusterEventListener.java	`94.00% <0.00%> (+2.00%)`	`15.00% <0.00%> (+1.00%)`
... and 1 more

LiuJoyceC · 2021-02-23T02:10:54Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/caching/PriorityTracker.java

+     * @param n the number of entities to return.  Can be less than n if there are not enough entities stored.
+     * @return top entities in the descending order of priority
+     */
+    public List<String> getTopNEntities(int n) {


Is this method going to be used elsewhere eventually? So far it's only called by tests.

yes, historical detectors are expected to call it.

LiuJoyceC

Other than a minor question I had about a currently unused new method, the refactor looks good.

ylwu-amzn · 2021-02-24T01:34:31Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/caching/PriorityTracker.java

+     * @param entityModelId Entity model Id
+     * @param priority priority
+     */
+    protected void addPriority(String entityModelId, float priority) {


The initial priority of one entity should be 0? I see the line 210 is using new PriorityNode(entityModelId, 0f)

Why need to set the key as entityModelId ? Can we just put entity as key?

The initial priority of one entity should be 0? I see the line 210 is using new PriorityNode(entityModelId, 0f)

Yes

Why need to set the key as entityModelId ? Can we just put entity as key?

You asked me to use this name :) How about entityId?

For realtime HC detector, it stores entity model id (detectorId + "entity" + entity ?). For historical HC detector, plan to use just entity value. As we discussed, as detector will have separate PriorityTracker, so we can just use entity, that can save some memory.

" as detector will have separate PriorityTracker, so we can just use entity, that can save some memory." Could you explain this sentence? I don't understand.

For example, we store "<detector_id>+<entity_value>_entity" in cache, we can change to "<entity_value>" to avoid saving duplicate information of "<detector_id>" and "_entity" as the cache is on detector level.

ylwu-amzn · 2021-02-24T01:51:09Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/caching/PriorityTracker.java

+     * detector is enabled. i - L measures the elapsed periods since detector starts.
+     * 0.125 is the decay constant.
+     *
+     * Since g(p−L) is changing and they are the same for all entities of the same detector,


Seems all other places in this java doc are using g(i - L), change g(p-L) to g(i - L) ?

Good catch. Changed.

…odelId to entityId

kaituo requested review from LiuJoyceC and ylwu-amzn February 22, 2021 23:34

LiuJoyceC reviewed Feb 23, 2021

View reviewed changes

LiuJoyceC approved these changes Feb 23, 2021

View reviewed changes

kaituo added the enhancement New feature or request label Feb 23, 2021

ylwu-amzn reviewed Feb 24, 2021

View reviewed changes

Update comments, add a limit on tracker size, and change from entityM…

ab741f5

…odelId to entityId

kaituo changed the title ~~Extrac a new class for entity frequenty tracking~~ Extract a new class for entity frequenty tracking Feb 24, 2021

ylwu-amzn approved these changes Feb 25, 2021

View reviewed changes

kaituo merged commit 0515a35 into opendistro-for-elasticsearch:main Feb 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extract a new class for entity frequenty tracking #389

Extract a new class for entity frequenty tracking #389

kaituo commented Feb 22, 2021

codecov bot commented Feb 22, 2021 •

edited

Loading

LiuJoyceC Feb 23, 2021

kaituo Feb 23, 2021

LiuJoyceC left a comment

ylwu-amzn Feb 24, 2021 •

edited

Loading

ylwu-amzn Feb 24, 2021

kaituo Feb 24, 2021 •

edited

Loading

ylwu-amzn Feb 24, 2021

kaituo Feb 24, 2021

ylwu-amzn Mar 11, 2021

ylwu-amzn Feb 24, 2021

kaituo Feb 24, 2021

Extract a new class for entity frequenty tracking #389

Extract a new class for entity frequenty tracking #389

Conversation

kaituo commented Feb 22, 2021

codecov bot commented Feb 22, 2021 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LiuJoyceC left a comment

Choose a reason for hiding this comment

ylwu-amzn Feb 24, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaituo Feb 24, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Feb 22, 2021 •

edited

Loading

ylwu-amzn Feb 24, 2021 •

edited

Loading

kaituo Feb 24, 2021 •

edited

Loading