This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Verifying multi-entity detectors #240

Merged 4 commits into opendistro-for-elasticsearch:master on Oct 13, 2020

Member

@kaituo kaituo commented Oct 7, 2020

Issue #, if available:

Description of changes:

This PR adds checks on the number and type of categorical fields. We only support one categorical field, and it must be of type keyword or ip. We also limit the maximum number of multi-entity detectors to 10.
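
As a rough illustration of the three checks described above, here is a minimal sketch. It is not the PR's actual code: the class name, constant names, error messages, and the `fieldTypes` map (standing in for an index mapping lookup) are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class CategoryFieldCheckSketch {
    // Hypothetical constants; the plugin's real names and settings may differ.
    static final int MAX_CATEGORY_FIELDS = 1;
    static final int MAX_MULTI_ENTITY_DETECTORS = 10;
    static final Set<String> ALLOWED_TYPES = Set.of("keyword", "ip");

    /**
     * Returns null when every check passes, otherwise an error message.
     * fieldTypes maps field name to mapping type (an assumed stand-in for the index mapping).
     */
    static String validate(List<String> categoryFields, Map<String, String> fieldTypes, long existingMultiEntityDetectors) {
        if (categoryFields == null || categoryFields.isEmpty()) {
            return null; // single-entity detector: none of these checks apply
        }
        if (categoryFields.size() > MAX_CATEGORY_FIELDS) {
            return "At most " + MAX_CATEGORY_FIELDS + " categorical field is supported";
        }
        if (existingMultiEntityDetectors >= MAX_MULTI_ENTITY_DETECTORS) {
            return "At most " + MAX_MULTI_ENTITY_DETECTORS + " multi-entity detectors are allowed";
        }
        for (String field : categoryFields) {
            String type = fieldTypes.get(field);
            // A missing mapping is treated as a failure as well.
            if (type == null || !ALLOWED_TYPES.contains(type)) {
                return "Categorical field " + field + " must be of type keyword or ip";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate(List.of("host"), Map.of("host", "keyword"), 3));
        System.out.println(validate(List.of("msg"), Map.of("msg", "text"), 3));
    }
}
```

Returning a message instead of throwing is only a convenience for the sketch; the real handler may report failures differently.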

Testing done:

  1. added unit tests
  2. did manual testing.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@codecov

codecov bot commented Oct 7, 2020

Codecov Report

Merging #240 into master will increase coverage by 1.14%.
The diff coverage is 69.53%.


@@             Coverage Diff              @@
##             master     #240      +/-   ##
============================================
+ Coverage     71.70%   72.84%   +1.14%     
- Complexity     1367     1425      +58     
============================================
  Files           157      160       +3     
  Lines          6513     6680     +167     
  Branches        493      508      +15     
============================================
+ Hits           4670     4866     +196     
+ Misses         1610     1567      -43     
- Partials        233      247      +14     
| Flag | Coverage Δ | Complexity Δ |
| --- | --- | --- |
| #cli | 79.27% <ø> (ø) | 0.00 <ø> (ø) |
| #plugin | 72.06% <69.53%> (+1.30%) | 1425.00 <26.00> (+58.00) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | Complexity Δ |
| --- | --- | --- |
| ...stroforelasticsearch/ad/AnomalyDetectorPlugin.java | 93.54% <ø> (ø) | 10.00 <0.00> (ø) |
| ...distroforelasticsearch/ad/constant/CommonName.java | 66.66% <ø> (ø) | 1.00 <0.00> (ø) |
| ...arch/ad/transport/IndexAnomalyDetectorRequest.java | 46.80% <50.00%> (+1.64%) | 11.00 <4.00> (+4.00) |
| ...stroforelasticsearch/ad/model/AnomalyDetector.java | 64.02% <56.52%> (-2.26%) | 51.00 <6.00> (+5.00) ⬇️ |
| ...est/handler/IndexAnomalyDetectorActionHandler.java | 51.41% <73.23%> (+35.93%) | 26.00 <14.00> (+24.00) |
| ...transport/IndexAnomalyDetectorTransportAction.java | 92.00% <83.33%> (+0.33%) | 3.00 <1.00> (+1.00) |
| ...search/ad/rest/RestIndexAnomalyDetectorAction.java | 46.93% <88.88%> (+4.38%) | 3.00 <0.00> (ø) |
| ...ticsearch/ad/settings/AnomalyDetectorSettings.java | 100.00% <100.00%> (ø) | 1.00 <1.00> (ø) |
| ...ransport/DeleteAnomalyDetectorTransportAction.java | 59.03% <0.00%> (-2.41%) | 16.00% <0.00%> (ø%) |
... and 12 more

@kaituo kaituo changed the title from "Verifying Categorical field's type and total number" to "Verifying multi-entity detectors" Oct 8, 2020
}

/**
* Precondition: anomalyDetector.getCategoryField() != null.

Contributor

What does this comment mean?

Member Author

not true anymore. Removed.

if (shingleSize != null && shingleSize < 1) {
throw new IllegalArgumentException("Shingle size must be a positive integer");
}
if (categoryField != null && categoryField.size() > 1) {

Contributor

1 is a magic number. How about we create a CATEGORY_FIELD_LIMIT constant and use it in all other places, like IndexAnomalyDetectorActionHandler.java line 269:
if (categoryField.size() != 1) {
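
A minimal sketch of what this suggestion could look like, assuming the constant name proposed in the comment. The class, method names, and error message are hypothetical and only mirror the two checks quoted in this thread.

```java
import java.util.List;

class CategoryFieldLimitSketch {
    // Name taken from the review suggestion; where the real constant lives is an assumption.
    static final int CATEGORY_FIELD_LIMIT = 1;

    // Model-side check: a null list means "no category field" and stays valid.
    static void checkModel(List<String> categoryField) {
        if (categoryField != null && categoryField.size() > CATEGORY_FIELD_LIMIT) {
            throw new IllegalArgumentException(
                "At most " + CATEGORY_FIELD_LIMIT + " categorical field is supported"
            );
        }
    }

    // Handler-side check, the counterpart of the literal comparison `size() != 1`.
    static boolean hasSupportedCount(List<String> categoryField) {
        return categoryField.size() == CATEGORY_FIELD_LIMIT;
    }

    public static void main(String[] args) {
        checkModel(List.of("host"));
        System.out.println(hasSupportedCount(List.of("host")));
    }
}
```

One caveat with this design: an exact-equality check against the constant only makes sense while the limit is 1. If the limit were raised later, the handler-side check would need to become a range check.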

Contributor

+1 I like the simpler check.

Member Author

changed

@@ -87,6 +88,7 @@
private final Map<String, Object> uiMetadata;
private final Integer schemaVersion;
private final Instant lastUpdateTime;
private final List<String> categoryField;

Contributor

@ylwu-amzn ylwu-amzn Oct 9, 2020

Minor: if we plan to support multiple category fields, should we name this "categoryFields"?

Member Author

yes, changed

Contributor

@saratvemulapalli saratvemulapalli left a comment

Overall the changes look good to me.

@@ -87,6 +88,7 @@
private final Map<String, Object> uiMetadata;
private final Integer schemaVersion;
private final Instant lastUpdateTime;
private final List<String> categoryField;

Contributor

I see we use just one category field today. Do we see possible use cases for multiple category fields?

Member Author

yes, maybe in the future

@@ -22,6 +22,9 @@
// index name for anomaly checkpoint of each model. One model one document.
public static final String CHECKPOINT_INDEX_NAME = ".opendistro-anomaly-checkpoints";

// The alias of the index in which to write single-entity AD result history

Contributor

Do we plan to write HC detector's result into another index?

Member Author

Not in the plan. The code would have to change a lot for that to happen. For example, depending on whether there is a categorical field, the job scheduler would have to save results to different places. Right now, the job scheduler does not have access to the AnomalyDetector object.

Contributor

Got it. How about removing "single-entity" from the comment?

Member Author

removed.

@@ -72,6 +75,7 @@
private static final String SHINGLE_SIZE_FIELD = "shingle_size";
private static final String LAST_UPDATE_TIME_FIELD = "last_update_time";
public static final String UI_METADATA_FIELD = "ui_metadata";
public static final String CATEGORY_FIELD = "category_field";

Contributor

@ylwu-amzn ylwu-amzn Oct 12, 2020

Should we change this to "category_fields"? Same question for lines 113 and 130, and other places. I suggest searching all files and replacing.

Member Author

Can we do it later? Both Yizhe and I would need to make changes for this.

Contributor

Sure, I'm ok

Member Author

Thanks so much for your consideration.

Contributor

@ylwu-amzn ylwu-amzn left a comment

LGTM. Thanks for the change

Contributor

@saratvemulapalli saratvemulapalli left a comment

Changes look good to me.

This PR adds checks on the number and type of categorical fields. We only support one categorical field, and it must be of type keyword or ip. We also limit the maximum number of multi-entity detectors to 10.

Testing done:
1. added unit tests
2. did manual testing.
@kaituo kaituo merged commit 85e38ac into opendistro-for-elasticsearch:master Oct 13, 2020
@ohltyler ohltyler added the enhancement New feature or request label Oct 19, 2020