Forward porting profile API related PRs #128

kaituo · 2020-05-19T23:45:51Z

Issue #, if available:

Description of changes:
e06cf6f
85d2768
a40ccf6

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…API (opendistro-for-elasticsearch#113) The hash ring helps identify that node X runs the AD job for detector Y, whose models are hosted on nodes 1, 2, and 3. This helps on-call engineers locate logs. The total model size gives transparency into current memory usage. In addition, the shingle size helps answer the question "Why does my detector not report anything?" This PR adds the above information to the profile API via a broadcast call that consults ModelManager and FeatureManager about the current state pertaining to a detector. These states are then consolidated into information that humans can parse. This PR also queries all AD result indices instead of only the current result index, so that we can fetch a stopped detector's error after the result index containing that error has been rotated.
Testing done:
1. Added unit tests for the newly added code.
2. Ran end-to-end tests to verify that the new profiles make sense both when a detector is stopped and when it is running.
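The consolidation step above (per-node replies from the broadcast folded into one human-readable view) can be sketched as follows. This is a minimal illustration, not the plugin's code: `NodeProfileReply`, `ProfileConsolidator`, their fields, and the convention that `-1` means "this node holds no shingle" are all hypothetical names and assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical per-node reply to the profile broadcast; the real plugin's
// response classes and field names may differ.
final class NodeProfileReply {
    final String nodeId;
    final Map<String, Long> modelSizeInBytes; // model ID -> size of that model on this node
    final int shingleSize;                    // assumed: -1 when this node holds no shingle for the detector

    NodeProfileReply(String nodeId, Map<String, Long> modelSizeInBytes, int shingleSize) {
        this.nodeId = nodeId;
        this.modelSizeInBytes = modelSizeInBytes;
        this.shingleSize = shingleSize;
    }
}

// Hypothetical helper that folds the per-node replies into detector-level numbers.
final class ProfileConsolidator {
    // Sum of every model's size across all nodes (the "total model size").
    static long totalModelSizeInBytes(List<NodeProfileReply> replies) {
        long total = 0;
        for (NodeProfileReply r : replies) {
            for (long size : r.modelSizeInBytes.values()) {
                total += size;
            }
        }
        return total;
    }

    // Sorted IDs of the nodes that currently host at least one model.
    static TreeSet<String> nodesHostingModels(List<NodeProfileReply> replies) {
        TreeSet<String> nodes = new TreeSet<>();
        for (NodeProfileReply r : replies) {
            if (!r.modelSizeInBytes.isEmpty()) {
                nodes.add(r.nodeId);
            }
        }
        return nodes;
    }

    // Shingle size reported by whichever node keeps the shingle, or -1 if none does.
    static int shingleSize(List<NodeProfileReply> replies) {
        for (NodeProfileReply r : replies) {
            if (r.shingleSize >= 0) {
                return r.shingleSize;
            }
        }
        return -1;
    }
}
```

Keeping the per-node replies raw and consolidating them in one place means a new profile field only needs a new fold function, rather than changes on every node.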

DetectorProfile's merge does not include the newly added fields. This PR fixes that.
Testing done:
* Manually verified that the profile API works as expected.
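The class of bug fixed here is easy to see in a field-by-field merge. The sketch below is a hypothetical, simplified stand-in for DetectorProfile (the field names are illustrative, not the plugin's actual ones) and assumes that merging means "take every non-null field from the other profile".

```java
import java.util.List;

// Hypothetical, simplified stand-in for DetectorProfile; field names are illustrative.
final class ProfileSketch {
    String state;
    String error;
    Integer shingleSize;
    String coordinatingNode;
    Long totalSizeInBytes;
    List<String> modelIds;

    // Copies every non-null field of `other` into this profile. Each field needs its
    // own line: a field added to the class but not to merge() is silently dropped,
    // which is the kind of bug described above.
    void merge(ProfileSketch other) {
        if (other == null) {
            return;
        }
        if (other.state != null) state = other.state;
        if (other.error != null) error = other.error;
        if (other.shingleSize != null) shingleSize = other.shingleSize;
        if (other.coordinatingNode != null) coordinatingNode = other.coordinatingNode;
        if (other.totalSizeInBytes != null) totalSizeInBytes = other.totalSizeInBytes;
        if (other.modelIds != null) modelIds = other.modelIds;
    }
}
```

A unit test that sets every field on one side and asserts it survives the merge is a cheap guard against the same regression when more fields are added.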

…icsearch#117) Previously, the profile API scanned all anomaly result indices to get a detector's most recent error, which can become a performance bottleneck when the anomaly result indices are large. This PR improves this in several ways. First, when a detector is running, we only need to scan the current index, not all of the rolled-over ones, since we are interested in the latest error. Second, when a detector is disabled, we only need to scan the latest anomaly result indices created before the detector's enable time. Third, setting track total hits to false lets ES terminate the search early: ES does not try to count the number of matching documents and can end the query as soon as N documents have been collected per segment.
Testing done:
1. Patched a cluster with 1,000 detectors and 2 GB of anomaly result indices. Without the PR, scanning the anomaly result indices 1,000 times timed out after 30 seconds. With the PR, no timeout occurred.
2. A detector's error message can be on a rotated index. Added a test case to make sure we get the error info from a .opendistro-anomaly-results index that has been rolled over.
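The first two points (which indices to scan) can be sketched as a small selection function. Everything here is hypothetical: `ResultIndex`, `ErrorIndexSelector`, and the use of "newest by creation time" as a proxy for the current index are assumptions. For a disabled detector, the sketch keeps the newest index created before the enable time plus every index created after it, which is one reading of the rule above; the PR's exact rule may differ. The third point is not modeled; in the Elasticsearch 7.x Java API it corresponds to `SearchSourceBuilder#trackTotalHits(false)`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical view of one anomaly result index: its name and creation time (epoch millis).
final class ResultIndex {
    final String name;
    final long creationMillis;

    ResultIndex(String name, long creationMillis) {
        this.name = name;
        this.creationMillis = creationMillis;
    }
}

final class ErrorIndexSelector {
    // Running: only the newest index (assumed to be the current one).
    // Disabled: the newest index created before enabledMillis, plus all later indices.
    static List<String> indicesToSearch(List<ResultIndex> indices, boolean running, long enabledMillis) {
        List<ResultIndex> sorted = new ArrayList<>(indices);
        sorted.sort(Comparator.comparingLong((ResultIndex r) -> r.creationMillis));
        List<String> selected = new ArrayList<>();
        if (sorted.isEmpty()) {
            return selected;
        }
        if (running) {
            selected.add(sorted.get(sorted.size() - 1).name);
            return selected;
        }
        // Position of the newest index created before enabledMillis; -1 if there is none.
        int start = -1;
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i).creationMillis < enabledMillis) {
                start = i;
            }
        }
        // With no earlier index, every index was created after the enable time, so scan all.
        for (int i = Math.max(start, 0); i < sorted.size(); i++) {
            selected.add(sorted.get(i).name);
        }
        return selected;
    }
}
```

Skipping indices that predate the detector's last enable time is what bounds the scan when many rolled-over indices exist.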

kaituo added 3 commits May 19, 2020 16:09

Fix bug in profile API (opendistro-for-elasticsearch#115)

109d1d8


kaituo requested review from ylwu-amzn and yizheliu-amazon May 19, 2020 23:45

yizheliu-amazon approved these changes May 20, 2020


ylwu-amzn approved these changes May 20, 2020


kaituo merged commit 72a59eb into opendistro-for-elasticsearch:master May 20, 2020

kaituo commented May 19, 2020