From 33d083c9fbd3f4b5fba5b0d43be460ae182ecd47 Mon Sep 17 00:00:00 2001
From: István Zoltán Szabó
Date: Wed, 2 Jun 2021 18:02:37 +0200
Subject: [PATCH] [DOCS] Adds item about fields containing arrays to anomaly detection limitations (#1651) (#1683)

Co-authored-by: Lisa Cawley
---
 .../anomaly-detection/ml-limitations.asciidoc | 37 +++++++++++++++++--
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/docs/en/stack/ml/anomaly-detection/ml-limitations.asciidoc b/docs/en/stack/ml/anomaly-detection/ml-limitations.asciidoc
index b61d205ff..90e6b5eef 100644
--- a/docs/en/stack/ml/anomaly-detection/ml-limitations.asciidoc
+++ b/docs/en/stack/ml/anomaly-detection/ml-limitations.asciidoc
@@ -70,6 +70,31 @@ You cannot use the following field names in the `by_field_name` or
 `over_field_name` properties in a job: `by`; `count`; `over`. This limitation
 also applies to those properties when you create advanced jobs in {kib}.
 
+
+[discrete]
+[[ml-arrays-limitations]]
+=== Arrays in analyzed fields are turned into comma-separated strings
+
+If an {anomaly-job} is configured to analyze an aggregatable field (a field that
+is part of the index mapping definition), and this field contains an array, then
+the array is turned into a comma-separated concatenated string. The items in the
+array are sorted alphabetically and the duplicated items are removed. For
+example, the array `["zebra", "dog", "cat", "alligator", "cat"]` becomes
+`alligator,cat,dog,zebra`. The Anomaly Explorer charts don't display any results
+for the job as the string does not exist in the source data. The Single Metric
+Viewer displays results if the model plot is enabled.
+
+If an array field is not aggregatable and is retrieved from `_source`, the array
+is also turned into a comma-separated, concatenated list. However, the list
+items are not sorted alphabetically, nor are they deduplicated. Taking the
+example above, the comma-separated list, in this case, would be
+`zebra,dog,cat,alligator,cat`.
+
+Analyzing large arrays results in long strings which may require more system
+resources. Consider using a query in the {dfeed} that filters on the relevant
+items of the array.
+
+
 [discrete]
 [[ml-frozen-limitations]]
 === Frozen indices are not supported
@@ -109,10 +134,11 @@ For more information about any of these functions, see <>.
 [[ml-limitations-runtime]]
 === {anomaly-detect-cap} performs better on indexed fields
 
-{anomaly-jobs-cap} sort all data by a user-defined time field, which is frequently
-accessed. If the time field is a {ref}/runtime.html[runtime field], the
-performance impact of calculating field values at query time can significantly slow
-the job. Use an indexed field as a time field when running {anomaly-jobs}.
+{anomaly-jobs-cap} sort all data by a user-defined time field, which is
+frequently accessed. If the time field is a {ref}/runtime.html[runtime field],
+the performance impact of calculating field values at query time can
+significantly slow the job. Use an indexed field as a time field when running
+{anomaly-jobs}.
 
 
 [discrete]
@@ -144,6 +170,7 @@ you send to the job must use the JSON format.
 For more information about this API, see
 {ref}/ml-post-data.html[Post Data to Jobs].
 
+
 [discrete]
 === Misleading high missing field counts
 //See x-pack-elasticsearch/#684
@@ -288,6 +315,7 @@ To avoid this behavior, make sure that the aggregation interval in the
 {dfeed} configuration and the bucket span in the {anomaly-job} configuration
 have the same values.
 
+
 [discrete]
 [[ml-space-limitations]]
 === Calendars and filters are visible in all {kib} spaces
@@ -298,6 +326,7 @@ that belong to your space. However, this limited scope does not apply to
 <> and <>; they are visible in all
 spaces.
 
+
 [discrete]
 [[ml-rollup-limitations]]
 === Rollup indices and index patterns are not supported in {kib}