[DOCS] Adds item about fields containing arrays to anomaly detection limitations (elastic#1651) (elastic#1683)

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
lcawl · Jun 2, 2021 · 33d083c
1 parent efcbae2
Showing 1 changed file with 33 additions and 4 deletions.
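
Reviewer note: the two conversions described in the new `ml-arrays-limitations` section of this change can be sketched as follows. This is a minimal Python illustration of the documented behavior only, not the actual implementation. The function names are hypothetical, and "sorted alphabetically" is assumed here to correspond to Python's default string ordering.

```python
def aggregatable_array_to_string(values):
    """Documented result for an aggregatable field:
    duplicates removed, items sorted, then joined with commas.
    (Hypothetical helper; assumes default string ordering.)"""
    return ",".join(sorted(set(values)))


def source_array_to_string(values):
    """Documented result for a non-aggregatable field read from _source:
    original order and duplicates are kept, items joined with commas.
    (Hypothetical helper.)"""
    return ",".join(values)


example = ["zebra", "dog", "cat", "alligator", "cat"]
print(aggregatable_array_to_string(example))  # alligator,cat,dog,zebra
print(source_array_to_string(example))        # zebra,dog,cat,alligator,cat
```

Both outputs match the examples given in the added text. Neither string occurs in the source documents, which is the stated reason the Anomaly Explorer charts show no results for such a job.
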
diff --git a/docs/en/stack/ml/anomaly-detection/ml-limitations.asciidoc b/docs/en/stack/ml/anomaly-detection/ml-limitations.asciidoc
@@ -70,6 +70,31 @@ You cannot use the following field names in the `by_field_name` or
 `over_field_name` properties in a job: `by`; `count`; `over`. This limitation
 also applies to those properties when you create advanced jobs in {kib}.
 
+
+[discrete]
+[[ml-arrays-limitations]]
+=== Arrays in analyzed fields are turned into comma-separated strings
+
+If an {anomaly-job} is configured to analyze an aggregatable field (a field that 
+is part of the index mapping definition), and this field contains an array, then 
+the array is turned into a comma-separated concatenated string. The items in the 
+array are sorted alphabetically and the duplicated items are removed. For 
+example, the array `["zebra", "dog", "cat", "alligator", "cat"]` becomes 
+`alligator,cat,dog,zebra`. The Anomaly Explorer charts don't display any results 
+for the job as the string does not exist in the source data. The Single Metric 
+Viewer displays results if the model plot is enabled.
+
+If an array field is not aggregatable and is retrieved from `_source`, the array 
+is also turned into a comma-separated, concatenated list. However, the list 
+items are not sorted alphabetically, nor are they deduplicated. Taking the 
+example above, the comma-separated list, in this case, would be
+`zebra,dog,cat,alligator,cat`.
+
+Analyzing large arrays results in long strings which may require more system 
+resources. Consider using a query in the {dfeed} that filters on the relevant 
+items of the array.
+
+
 [discrete]
 [[ml-frozen-limitations]]
 === Frozen indices are not supported
@@ -109,10 +134,11 @@ For more information about any of these functions, see <<ml-functions>>.
 [[ml-limitations-runtime]]
 === {anomaly-detect-cap} performs better on indexed fields
 
-{anomaly-jobs-cap} sort all data by a user-defined time field, which is frequently 
-accessed. If the time field is a {ref}/runtime.html[runtime field], the 
-performance impact of calculating field values at query time can significantly slow
-the job. Use an indexed field as a time field when running {anomaly-jobs}.
+{anomaly-jobs-cap} sort all data by a user-defined time field, which is 
+frequently accessed. If the time field is a {ref}/runtime.html[runtime field], 
+the performance impact of calculating field values at query time can 
+significantly slow the job. Use an indexed field as a time field when running 
+{anomaly-jobs}.
 
 
 [discrete]
@@ -144,6 +170,7 @@ you send to the job must use the JSON format.
 For more information about this API, see
 {ref}/ml-post-data.html[Post Data to Jobs].
 
+
 [discrete]
 === Misleading high missing field counts
 //See x-pack-elasticsearch/#684
@@ -288,6 +315,7 @@ To avoid this behavior, make sure that the aggregation interval in the {dfeed}
 configuration and the bucket span in the {anomaly-job} configuration have the 
 same values.
 
+
 [discrete]
 [[ml-space-limitations]]
 === Calendars and filters are visible in all {kib} spaces
@@ -298,6 +326,7 @@ that belong to your space. However, this limited scope does not apply to
 <<ml-calendars,calendars>> and <<ml-rules,filters>>; they are visible in all
 spaces.
 
+
 [discrete]
 [[ml-rollup-limitations]]
 === Rollup indices and index patterns are not supported in {kib}