[ML] Improve sampling and normalization of population chart. #24402

walterra · 2018-10-23T14:21:38Z

Summary

This PR optimizes how contextual data is fetched for the population analysis chart. Previously, the data was fetched using the following nested aggregations:

byTime (date histogram) -> sample (sampling on each time slice) -> entities

This is now changed to the following structure:

sample (sampling on the whole dataset) -> byTime (date histogram) -> entities
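The sketch below shows one way the `aggs` part of a request body could be built for the new nesting. The helper name, the field names, the entity count and the mapping of the hard-coded sampling size onto the `sampler` aggregation's `shard_size` are all assumptions for illustration. This is not the actual code from `results_service.js`.

```javascript
// Hypothetical helper: builds the `aggs` section for the new nesting
// sample (sampler) -> byTime (date_histogram) -> entities (terms).
// Field names and sizes are placeholders, not values taken from the PR.
function buildPopulationChartAggs({
  timeField,
  interval,
  entityField,
  entitiesSize = 10,
  samplerShardSize = 50000, // assumed to correspond to the hard-coded sampling size
}) {
  return {
    // The sampler is the outermost aggregation, so the whole time range
    // is sampled once instead of once per time slice.
    sample: {
      sampler: { shard_size: samplerShardSize },
      aggs: {
        byTime: {
          // `interval` is the 6.x-era parameter; newer Elasticsearch versions
          // use `fixed_interval` or `calendar_interval` instead.
          date_histogram: { field: timeField, interval },
          aggs: {
            // Top entity values within each time slice.
            entities: { terms: { field: entityField, size: entitiesSize } },
          },
        },
      },
    },
  };
}

// Example: hourly buckets over a hypothetical `timestamp` field.
const aggs = buildPopulationChartAggs({
  timeField: 'timestamp',
  interval: '1h',
  entityField: 'clientip',
});
```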

Previously, sampling was done inside each time slice, so the number of sampled documents grew with the number of time slices: the longer the queried time range, the more documents were sampled. The updated version samples once, in the outermost aggregation, which makes both the sample size and the query performance more predictable.
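Here is some illustrative arithmetic for the predictability argument. It assumes sampling uses the Elasticsearch `sampler` aggregation, whose `shard_size` limits how many top-scoring documents each shard contributes. The helper is hypothetical and only computes an upper bound on the documents passed to the sub-aggregations.

```javascript
// Hypothetical helper: upper bound on the number of documents that reach the
// sub-aggregations, given where the sampler sits in the aggregation tree.
// shardSize: the sampler's per-shard limit; numBuckets: date_histogram buckets.
function maxSampledDocs({ samplerPlacement, numBuckets, shardSize, numShards }) {
  const perSampler = shardSize * numShards;
  // Old layout: one sampler inside every time slice, so the bound grows
  // with the number of buckets, i.e. with the queried time range.
  if (samplerPlacement === 'perTimeSlice') {
    return perSampler * numBuckets;
  }
  // New layout: a single sampler above the date histogram.
  return perSampler;
}

// 15-minute buckets: one day is 96 buckets, one week is 672 buckets.
const oneDayOld = maxSampledDocs({ samplerPlacement: 'perTimeSlice', numBuckets: 96, shardSize: 200, numShards: 1 });   // 19200
const oneWeekOld = maxSampledDocs({ samplerPlacement: 'perTimeSlice', numBuckets: 672, shardSize: 200, numShards: 1 }); // 134400
const oneWeekNew = maxSampledDocs({ samplerPlacement: 'outer', numBuckets: 672, shardSize: 200, numShards: 1 });        // 200
```

With the sampler on the outside, the bound stays the same whatever the time range. In the old layout, it scales linearly with the number of buckets.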
Additionally, documents now get a random score so that the sample is more representative. Sampling keeps the top N documents by score. Because the previous query was a bool query with only must clauses, all documents had the same score, so which documents ended up in the sample was not random.
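A small simulation can illustrate the effect of tied scores. The sketch below is not Elasticsearch's scoring code. `sampleTopN` stands in for a top-N-by-score sampler, and ties fall back to document order (roughly how Lucene breaks ties by document id). A seeded FNV-1a hash stands in for a reproducible random score. All names are hypothetical.

```javascript
// FNV-1a (32-bit): used only to turn a seed and a document id into a
// reproducible pseudo-random number. Not Elasticsearch's random_score algorithm.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // as an unsigned 32-bit integer
}

// Keep the n highest-scoring documents; equal scores fall back to document order.
function sampleTopN(docIds, n, scoreFn) {
  return docIds
    .map((id, index) => ({ id, index, score: scoreFn(id) }))
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, n)
    .map((d) => d.id);
}

// Previous situation: every document matches the must clauses with the same score.
const constantScore = () => 1;
// New situation: a seeded, reproducible random score in [0, 1).
const randomScore = (seed) => (id) => fnv1a(`${seed}:${id}`) / 2 ** 32;

const ids = Array.from({ length: 1000 }, (_, i) => i);
const tiedSample = sampleTopN(ids, 5, constantScore);       // [0, 1, 2, 3, 4]
const randomSample = sampleTopN(ids, 5, randomScore('ml')); // 5 pseudo-random ids, identical on every run
```

With tied scores, the "sample" is simply the first documents in order. With a seeded random score, it is spread across the data but still reproducible.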
Finally, values for count- and sum-based metric functions are now normalized. Unlike functions such as mean, counts and sums scale with the number of documents included, so sampling could skew them heavily. The normalization scales these values to account for the total number of documents without sampling.
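Below is a minimal sketch of the normalization. It assumes the scale factor is the ratio of the total number of matching documents to the number of sampled documents. The function name, the list of affected functions and the origin of the two counts are assumptions; this is not the PR's implementation.

```javascript
// Assumed list of ML detector functions whose values scale with the number
// of documents; the exact list the PR normalizes is not stated here.
const DOC_COUNT_SCALED_FUNCTIONS = new Set([
  'count', 'high_count', 'low_count',
  'sum', 'high_sum', 'low_sum',
]);

// Scale a value computed on the sample up to the full, unsampled dataset.
// totalDocs:   documents matching the query without sampling
// sampledDocs: documents that actually went through the sampler
function normalizeValue(functionName, value, totalDocs, sampledDocs) {
  const unaffected = !DOC_COUNT_SCALED_FUNCTIONS.has(functionName);
  // Nothing sampled (avoids division by zero) or nothing was left out.
  const noSampling = sampledDocs <= 0 || sampledDocs >= totalDocs;
  if (unaffected || noSampling) {
    return value; // e.g. mean: the sample average already estimates the full average
  }
  return value * (totalDocs / sampledDocs);
}

const scaledCount = normalizeValue('count', 12, 1000000, 50000); // 240
const scaledSum = normalizeValue('sum', 2.5, 1000000, 50000);    // 50
const mean = normalizeValue('mean', 3.5, 1000000, 50000);        // 3.5
```

With 50000 of 1000000 documents sampled, count and sum values are scaled by 20, while a mean passes through unchanged.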

The screenshots above show results before (left) and after (right) the optimization.

The sampling size is now hard-coded at 50000; the examples above only give an impression of how sampling and normalization behave before and after the change.
The left versions sample within each time slice, so their results look better at low sampling values, but the queries are also much heavier.
In the left versions, the values for the contextual data also grow as the sampling size increases; that is a result of the missing normalization.
The right-hand version with a sampling size of 200 shows very limited results, because sampling is now applied to the outer aggregation. However, the normalization is already noticeable: some dots appear higher than any in the left version.
For comparison, the center bottom chart is done without sampling.

Checklist

Checklist not applicable (existing tests pass, no DOM changes).

Part of #21163.

elasticmachine · 2018-10-23T14:21:40Z

Pinging @elastic/ml-ui

elasticmachine · 2018-10-23T16:39:30Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request


peteharverson · 2018-10-24T08:35:55Z

x-pack/plugins/ml/public/services/results_service.js

+            {
+              random_score: {
+                // static seed to get same randomized results on every request
+                seed: 'ml'


Looks like you now need to use a field parameter here as well as seed to get reproducible results - see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-random
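Here is a sketch of a query wrapped in `function_score` that passes both `seed` and `field` to `random_score`. The `seed: 'ml'` value comes from the diff above. The `_seq_no` field is the choice suggested in the Elasticsearch `function_score` documentation, and it is an assumption that the updated PR uses it. The helper name and the example filter are hypothetical.

```javascript
// Hypothetical helper: give every document matching `query` a reproducible
// random score, so a top-N-by-score sampler picks a pseudo-random subset.
// The function_score docs recommend passing `field` alongside `seed` for
// reproducible random_score results; `_seq_no` is the field they suggest.
function withRandomScore(query, { seed = 'ml', field = '_seq_no' } = {}) {
  return {
    function_score: {
      query,
      functions: [{ random_score: { seed, field } }],
    },
  };
}

// Example with a hypothetical time range filter on a `timestamp` field.
const scoredQuery = withRandomScore({
  bool: { must: [{ range: { timestamp: { gte: 'now-7d', lte: 'now' } } }] },
});
```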

Thanks for pointing that out, I was working off some outdated docs. Pushed an update in 43ced43.

alvarezmelissa87 · 2018-10-24T10:49:55Z

This LGTM overall. 👍 I'm just waiting on responses to Pete's comments since I have less context on that bit.

alvarezmelissa87

LGTM

peteharverson

LGTM

elasticmachine · 2018-10-24T13:17:30Z

💔 Build Failed

continuous-integration/kibana-ci/pull-request

walterra · 2018-10-24T13:18:07Z

retest

elasticmachine · 2018-10-24T15:32:45Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request

…#24402) This optimizes how contextual data is fetched for the population analysis chart.

…#24508) This optimizes how contextual data is fetched for the population analysis chart.

[ML] Improve sampling and normalization of population chart.

bb00836

walterra added the non-issue, v7.0.0, :ml, Feature:ml-results and v6.5.0 labels Oct 23, 2018

walterra self-assigned this Oct 23, 2018

walterra requested review from alvarezmelissa87, peteharverson and jgowdyelastic October 23, 2018 14:21

walterra mentioned this pull request Oct 23, 2018

[ML] Anomaly Explorer charts #21163

Open

39 tasks

peteharverson reviewed Oct 24, 2018


[ML] Tweak random scoring.

43ced43

alvarezmelissa87 approved these changes Oct 24, 2018


peteharverson approved these changes Oct 24, 2018


walterra merged commit 2c7caee into elastic:master Oct 24, 2018

walterra deleted the ml-population-chart-normalization branch October 24, 2018 15:37

walterra mentioned this pull request Oct 24, 2018

[6.x] [ML] Improve sampling and normalization of population chart. (#24402) #24508

Merged

walterra added a commit to walterra/kibana that referenced this pull request Oct 24, 2018

[ML] Improve sampling and normalization of population chart. (elastic…

2ab9bb8


walterra added a commit that referenced this pull request Oct 24, 2018

[ML] Improve sampling and normalization of population chart. (#24402) (…

b616418


sophiec20 added the Feature:Anomaly Detection label Jun 19, 2019
